analysis.utils
-
bigbang.analysis.utils.clean_addresses(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
-
bigbang.analysis.utils.clean_datetime(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
-
bigbang.analysis.utils.clean_subject(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
-
bigbang.analysis.utils.domain_entropy(domain, froms)
Compute the entropy of the distribution of email prefix counts for the given domain within the archive.
- Parameters
domain (string) – An email domain
froms (pandas.DataFrame) – A pandas.DataFrame with From fields, email addresses, and domains. See the Archive method get_froms().
- Returns
entropy
- Return type
float
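As a rough illustration of the quantity described above, the following sketch computes the Shannon entropy of per-prefix message counts for one domain. It is not the library's implementation: the function name `prefix_entropy_sketch`, the plain list of addresses standing in for the froms DataFrame, the zero result for an empty domain, and the use of the natural logarithm are all assumptions made for the example.

```python
import math
from collections import Counter


def prefix_entropy_sketch(domain, addresses):
    """Entropy of email-prefix counts for one domain (hypothetical helper)."""
    # Keep the prefix (text before the last '@') of every address in the domain.
    prefixes = [
        addr.rpartition("@")[0]
        for addr in addresses
        if addr.rpartition("@")[2] == domain
    ]
    total = len(prefixes)
    if total == 0:
        return 0.0  # assumption: a domain with no messages has zero entropy
    counts = Counter(prefixes)
    # Shannon entropy H = sum(p * log(1 / p)); the natural log is an assumption.
    return sum((n / total) * math.log(total / n) for n in counts.values())


# One address per message, as a From column might provide them.
addresses = ["a@x.org", "a@x.org", "b@x.org", "c@y.org"]
print(prefix_entropy_sketch("x.org", addresses))
print(prefix_entropy_sketch("y.org", addresses))
```

Under these assumptions, a domain where a single sender writes every message has entropy 0, and the value grows as messages are spread more evenly across more senders.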
-
bigbang.analysis.utils.extract_domain(from_field)
Returns the domain of an email address from a string.
-
bigbang.analysis.utils.extract_email(from_field)
Returns an email address from a string.
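To make the two extraction helpers above concrete, here is a minimal sketch of how an address and its domain might be pulled out of a From field. It uses the standard library's `email.utils.parseaddr` rather than whatever parsing the library actually performs; the `_sketch` function names and the empty-string result for input without an `@` are assumptions.

```python
from email.utils import parseaddr


def extract_email_sketch(from_field):
    """Return the address part of a From field (hypothetical helper)."""
    # parseaddr splits 'Name <addr>' into a (name, addr) pair.
    _, address = parseaddr(from_field)
    return address


def extract_domain_sketch(from_field):
    """Return the text after the last '@' of the address (hypothetical helper)."""
    address = extract_email_sketch(from_field)
    # Assumption: return an empty string when no '@' is present.
    return address.rpartition("@")[2] if "@" in address else ""


from_field = "Jane Doe <jane@example.org>"
print(extract_email_sketch(from_field))
print(extract_domain_sketch(from_field))
```

Real mailing-list archives often contain obfuscated or malformed From fields, which this sketch does not attempt to handle.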
-
bigbang.analysis.utils.get_index_of_msgs_with_datetime(df: pandas.core.frame.DataFrame, return_boolmask: bool = False) → numpy.array
-
bigbang.analysis.utils.get_index_of_msgs_with_subject(df: pandas.core.frame.DataFrame, return_boolmask: bool = False) → numpy.array