analysis.utils
-
bigbang.analysis.utils.clean_addresses(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
-
bigbang.analysis.utils.clean_datetime(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
-
bigbang.analysis.utils.clean_subject(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
-
bigbang.analysis.utils.domain_entropy(domain, froms)
Compute the entropy of the distribution of counts of email prefixes within the given archive.
- Parameters
domain (string) – An email domain.
froms (pandas.DataFrame) – A pandas.DataFrame with From fields, email addresses, and domains. See the Archive method get_froms().
- Returns
entropy
- Return type
float
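As a rough illustration of the computation described for domain_entropy, the sketch below computes the Shannon entropy of the per-address message counts within one domain. It is not BigBang's implementation: the helper name `domain_entropy_sketch`, the column names `email` and `domain`, and the use of the natural logarithm are all assumptions made for this example.

```python
import numpy as np
import pandas as pd


def domain_entropy_sketch(domain: str, froms: pd.DataFrame) -> float:
    """Shannon entropy (in nats) of message counts per address in a domain.

    Hypothetical helper; assumes `froms` has "email" and "domain" columns.
    """
    # Keep only the messages sent from the requested domain.
    in_domain = froms[froms["domain"] == domain]
    if in_domain.empty:
        return 0.0
    # Relative frequency of each distinct address within the domain.
    probs = in_domain["email"].value_counts(normalize=True).to_numpy()
    # H = -sum(p * log p); every p is > 0 because it comes from observed counts.
    return float(-(probs * np.log(probs)).sum())


# Made-up data for illustration only.
froms = pd.DataFrame(
    {
        "email": ["a@example.org", "b@example.org", "a@example.org", "c@other.net"],
        "domain": ["example.org", "example.org", "example.org", "other.net"],
    }
)
print(domain_entropy_sketch("example.org", froms))
```

Under these assumptions, a domain whose messages all come from a single address has entropy 0, and the value grows as messages are spread more evenly over more addresses.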
-
bigbang.analysis.utils.extract_domain(from_field)
Returns the domain of an email address from a string.
-
bigbang.analysis.utils.extract_email(from_field)
Returns an email address from a string.
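To make the two extraction helpers concrete, here is a minimal sketch of pulling an address and its domain out of a From header value with the standard library's `email.utils.parseaddr`. The names `extract_email_sketch` and `extract_domain_sketch` are hypothetical, and returning `None` for input without an address is an assumption; BigBang's own functions may parse and handle failures differently.

```python
from email.utils import parseaddr
from typing import Optional


def extract_email_sketch(from_field: str) -> Optional[str]:
    """Return the address part of a From header value, or None if absent."""
    # parseaddr splits 'Display Name <addr>' into (display name, address).
    _, address = parseaddr(from_field)
    # Treat anything without an "@" as "no address found".
    return address if "@" in address else None


def extract_domain_sketch(from_field: str) -> Optional[str]:
    """Return the part after the last "@" of the extracted address."""
    address = extract_email_sketch(from_field)
    if address is None:
        return None
    return address.rsplit("@", 1)[1]


print(extract_email_sketch("Jane Doe <jane@example.org>"))
print(extract_domain_sketch("Jane Doe <jane@example.org>"))
```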
-
bigbang.analysis.utils.get_index_of_msgs_with_datetime(df: pandas.core.frame.DataFrame, return_boolmask: bool = False) → numpy.array
-
bigbang.analysis.utils.get_index_of_msgs_with_subject(df: pandas.core.frame.DataFrame, return_boolmask: bool = False) → numpy.array
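The two get_index_of_msgs_with_* signatures suggest a choice between returning row indices and returning a boolean mask. That reading is inferred from the `return_boolmask` parameter name alone, not from documented behavior. The sketch below shows the general pandas pattern under that assumption; the helper name `index_of_msgs_with_column` and the `subject` column name are hypothetical.

```python
import numpy as np
import pandas as pd


def index_of_msgs_with_column(
    df: pd.DataFrame, column: str, return_boolmask: bool = False
) -> np.ndarray:
    """Locate rows whose value in `column` is not missing (hypothetical helper)."""
    # True where the message has a non-missing value in the column.
    mask = df[column].notna().to_numpy()
    if return_boolmask:
        return mask
    # Otherwise return the index labels of the matching rows.
    return df.index[mask].to_numpy()


# Made-up data for illustration only.
df = pd.DataFrame({"subject": ["Hello", None, "Re: Hello"]}, index=[10, 11, 12])
print(index_of_msgs_with_column(df, "subject"))
print(index_of_msgs_with_column(df, "subject", return_boolmask=True))
```

A boolean mask is convenient for combining several conditions with `&` and `|`, while index labels are convenient for `df.loc[...]` lookups.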