analysis.process
-
bigbang.analysis.process.ai(m, parts, i)
-
bigbang.analysis.process.bi(m, parts, i)
-
bigbang.analysis.process.consolidate_senders_activity(activity_df, to_consolidate)
Takes a DataFrame in the format returned by activity and a list of tuples of the form (‘from 1’, ‘from 2’) to consolidate. Returns the consolidated DataFrame (a copy; the input is not modified in place).
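As a rough illustration of this kind of consolidation, the sketch below merges pairs of sender columns on a copy of the input. It is not the library's code. It assumes the activity DataFrame has one column per sender and that the first name in each tuple is the one that is kept; neither detail is stated in the description above. The name consolidate_sketch is hypothetical.

```python
import pandas as pd


def consolidate_sketch(activity_df, to_consolidate):
    """Merge pairs of sender columns and return a new DataFrame.

    Assumptions (not confirmed by the documentation): one column per
    sender, and the first name of each tuple survives the merge.
    """
    df = activity_df.copy()  # work on a copy so the input is left untouched
    for keep, merge in to_consolidate:
        if keep in df.columns and merge in df.columns:
            # Add the counts of the merged sender to the kept sender.
            df[keep] = df[keep].add(df[merge], fill_value=0)
            df = df.drop(columns=[merge])
    return df


activity = pd.DataFrame(
    {"a@x.org": [1, 2], "A <a@x.org>": [3, 0], "b@y.org": [5, 5]}
)
merged = consolidate_sketch(activity, [("a@x.org", "A <a@x.org>")])
```

Here merged has the two columns "a@x.org" and "b@y.org", while activity still has all three.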
-
bigbang.analysis.process.containment_distance(a, b)
A case-insensitive distance measure on strings.
- Returns
0 if strings are identical
positive infinity if neither string contains the other
1 / (minimum string length) if one string contains the other.
Good for organization names: for example, “cisco”, “Cisco”, and “Cisco Systems” are all ‘close’ (distance ≤ 0.2).
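The three return cases can be written down directly. The following is a minimal sketch of the rule as described, not the library's source; the name containment_distance_sketch is hypothetical.

```python
import math


def containment_distance_sketch(a: str, b: str) -> float:
    """Case-insensitive containment distance, following the rule above."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 0.0
    if a in b or b in a:
        # Note: an empty string would divide by zero here; the description
        # does not say how that case is handled.
        return 1.0 / min(len(a), len(b))
    return math.inf


same = containment_distance_sketch("cisco", "Cisco")
contained = containment_distance_sketch("Cisco", "Cisco Systems")
unrelated = containment_distance_sketch("Cisco", "IBM")
```

With these inputs, same is 0, contained is 1/5 = 0.2 (the shorter string has five characters), and unrelated is positive infinity.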
-
bigbang.analysis.process.domain_name_from_email(name)
-
bigbang.analysis.process.eij(m, parts, i, j)
-
bigbang.analysis.process.from_header_distance(a, b, verbose=False)
A distance measure specifically for the ‘From’ header of emails. Normalizes based on common differences in client handling of email, then computes Levenshtein distance between components of the field.
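To make the idea concrete, here is one possible sketch of a normalize-then-compare distance. The normalization shown (lowercasing, then splitting into display name and address with the standard library's email.utils.parseaddr) and the choice to sum the two component distances are assumptions for illustration only; the description above does not specify which normalizations are applied or how components are combined. The names levenshtein and from_header_distance_sketch are hypothetical.

```python
from email.utils import parseaddr


def levenshtein(s: str, t: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def from_header_distance_sketch(a: str, b: str) -> int:
    # Assumed normalization: lowercase, then split into (name, address),
    # which also drops the quotes some clients put around the name.
    name_a, addr_a = parseaddr(a.lower())
    name_b, addr_b = parseaddr(b.lower())
    # Assumed combination: sum of the per-component edit distances.
    return levenshtein(name_a, name_b) + levenshtein(addr_a, addr_b)


d_same = from_header_distance_sketch(
    "Jane Doe <jane@example.org>", '"jane doe" <JANE@example.org>'
)
d_typo = from_header_distance_sketch(
    "Jane Doe <jane@example.org>", "Jane Do <jane@example.org>"
)
```

Under these assumptions, two headers that differ only in case and quoting are at distance 0, and a one-character difference in the name gives distance 1.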
-
bigbang.analysis.process.matricize(series, func)
Creates a matrix by applying func to pairwise combinations of the elements in a Series. Returns a square matrix as a DataFrame. The result should be symmetric if func(a, b) == func(b, a), and should be the identity matrix if func is ‘==’.
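The pairwise construction can be sketched as follows. This is an illustration of the idea rather than the library's implementation; in particular, labeling rows and columns with the Series values is an assumption, and matricize_sketch is a hypothetical name.

```python
import operator

import pandas as pd


def matricize_sketch(series: pd.Series, func) -> pd.DataFrame:
    """Apply func to every ordered pair of elements and tabulate the results."""
    values = list(series)
    data = [[func(a, b) for b in values] for a in values]
    # Assumption: rows and columns are labeled by the elements themselves.
    return pd.DataFrame(data, index=values, columns=values)


names = pd.Series(["alice", "bob", "carol"])
eq_matrix = matricize_sketch(names, operator.eq)
```

With operator.eq and distinct elements, eq_matrix is True on the diagonal and False elsewhere, which is the boolean form of the identity matrix mentioned above.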
-
bigbang.analysis.process.minimum_but_not_self(column, dataframe)
-
bigbang.analysis.process.modularity(m, parts)
Computes the modularity of an adjacency matrix, using the metric from:
Zanetti, M. and Schweitzer, F. 2012. “A Network Perspective on Software Modularity” ARCS Workshops 2012, pp. 175-186.
-
bigbang.analysis.process.resolve_entities(significance, distance_function, threshold=0)
Takes a Series mapping entities (index) to significance (values, numerical).
Resolves the entities based on a lexical distance function.
Returns a dictionary of labeled (keys) entity lists (values). Key is the most significant member of the entity list.
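One simple way to obtain a result of this shape is a greedy pass over the entities in order of decreasing significance. The sketch below is only that: a greedy approximation written for illustration, which may group entities differently from the library's implementation. Treating "within threshold" as distance <= threshold is also an assumption. The names resolve_entities_sketch and toy_distance are hypothetical.

```python
import math

import pandas as pd


def toy_distance(a: str, b: str) -> float:
    """Stand-in lexical distance for the example (case-insensitive containment)."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 0.0
    if a in b or b in a:
        return 1.0 / min(len(a), len(b))
    return math.inf


def resolve_entities_sketch(significance, distance_function, threshold=0):
    groups = {}
    # Visit the most significant entities first, so that each group's key
    # is its most significant member.
    for entity in significance.sort_values(ascending=False).index:
        for key in groups:
            if distance_function(entity, key) <= threshold:
                groups[key].append(entity)
                break
        else:
            # No existing group is close enough: start a new one.
            groups[entity] = [entity]
    return groups


significance = pd.Series({"Cisco Systems": 10, "cisco": 3, "IBM": 7})
resolved = resolve_entities_sketch(significance, toy_distance, threshold=0.2)
```

Here "cisco" is folded into the group keyed by "Cisco Systems", and "IBM" forms its own group.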
-
bigbang.analysis.process.resolve_sender_entities(act, lexical_distance=0)
Given an Archive’s activity matrix, returns a dict of lists, each containing message senders (‘From’ fields) that have been grouped as probably the same entity.
-
bigbang.analysis.process.sorted_matrix(from_dataframe, limit=None, sort_key=None)
Takes a DataFrame with ‘from’ fields as column headers. Returns a sorted distance matrix for the column headers, using from_header_distance (see that function).