analysis.utils

bigbang.analysis.utils.clean_addresses(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
bigbang.analysis.utils.clean_datetime(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
bigbang.analysis.utils.clean_subject(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
bigbang.analysis.utils.domain_entropy(domain, froms)

Compute the entropy of the distribution of message counts across email prefixes (sender addresses) within the given domain, using the From fields of an archive.

Parameters
  • domain (string) – An email domain

  • froms (pandas.DataFrame) – A pandas.DataFrame with From fields, email addresses, and domains. See the Archive method get_froms().

Returns

entropy

Return type

float
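The following is a minimal sketch of the computation described above. It assumes that `froms` has one row per message with `email` and `domain` columns, which is how the get_froms() description reads, and that the natural logarithm is used. The function name `domain_entropy_sketch` and the sample data are hypothetical, and the actual implementation may differ.

```python
import math
from collections import Counter

import pandas as pd


def domain_entropy_sketch(domain: str, froms: pd.DataFrame) -> float:
    """Hypothetical re-implementation: Shannon entropy (natural log) of the
    per-address message counts for a single domain.

    Assumes `froms` has one row per message and columns "email" and "domain".
    """
    # Keep only the sender addresses that belong to the requested domain.
    emails = froms.loc[froms["domain"] == domain, "email"]
    total = len(emails)
    if total == 0:
        return 0.0  # assumption: a domain with no messages has zero entropy
    # H = sum over addresses of p * log(1 / p), where p is the address's share
    # of the domain's messages.
    counts = Counter(emails)
    return sum((n / total) * math.log(total / n) for n in counts.values())


froms = pd.DataFrame(
    {
        "email": ["a@example.org", "a@example.org", "b@example.org", "c@other.net"],
        "domain": ["example.org", "example.org", "example.org", "other.net"],
    }
)
print(round(domain_entropy_sketch("example.org", froms), 4))  # 0.6365
print(domain_entropy_sketch("other.net", froms))  # 0.0
```

Under these assumptions, a domain where every message comes from a single address scores 0. The score grows as messages spread more evenly over more addresses, and it reaches log k when k addresses post equally often.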

bigbang.analysis.utils.extract_domain(from_field)

Returns the domain of the email address contained in a string, such as a From field.
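Here is a rough sketch of this kind of extraction, assuming a simple regular expression is good enough. The pattern, the lower-casing, and the name `extract_domain_sketch` are assumptions for illustration. They are not the library's actual code.

```python
import re
from typing import Optional

# Assumed, deliberately simple pattern (not a full RFC 5322 parser);
# group 1 captures everything after the "@".
EMAIL_RE = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")


def extract_domain_sketch(from_field) -> Optional[str]:
    """Return the lower-cased domain of the first address in `from_field`, or None."""
    # str() lets non-string values such as NaN pass through without raising.
    match = EMAIL_RE.search(str(from_field))
    return match.group(1).lower() if match else None


print(extract_domain_sketch('"Jane Doe" <Jane.Doe@Example.ORG>'))  # example.org
print(extract_domain_sketch("no address here"))  # None
```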

bigbang.analysis.utils.extract_email(from_field)

Returns the email address contained in a string, such as a From field.
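The sketch below shows one way to pull the address out of each raw From field of a message DataFrame, assuming a simple regular expression. The pattern, the lower-casing, the `from` column name, and the function name `extract_email_sketch` are all hypothetical.

```python
import re
from typing import Optional

import pandas as pd

# Assumed, simplified pattern; real-world addresses can be more varied.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_email_sketch(from_field) -> Optional[str]:
    """Return the first email address found in `from_field` (lower-cased), or None."""
    match = EMAIL_RE.search(str(from_field))
    return match.group(0).lower() if match else None


# Typical use: derive an address column from the raw From field of each message.
df = pd.DataFrame({"from": ['"Jane Doe" <Jane.Doe@Example.ORG>', "bob at example dot org"]})
df["email"] = df["from"].map(extract_email_sketch)
print(df["email"].iloc[0])  # jane.doe@example.org
```

Fields that contain no recognisable address, such as obfuscated "at ... dot" forms, come back as missing values. You can then drop or inspect these separately.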

bigbang.analysis.utils.get_index_of_msgs_with_datetime(df: pandas.core.frame.DataFrame, return_boolmask: bool = False) → numpy.array
bigbang.analysis.utils.get_index_of_msgs_with_subject(df: pandas.core.frame.DataFrame, return_boolmask: bool = False) → numpy.array
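The two signatures above only show that these functions take a `return_boolmask` flag and return a numpy.array. The following is a hypothetical sketch of that index-versus-mask convention. It assumes that "has a datetime/subject" means the corresponding column is not missing, and that positions rather than index labels are returned. Neither assumption is confirmed here, and the helper `index_of_rows_with_value_sketch` and the column names are illustrative only.

```python
import numpy as np
import pandas as pd


def index_of_rows_with_value_sketch(
    df: pd.DataFrame, column: str, return_boolmask: bool = False
) -> np.ndarray:
    """Hypothetical helper: locate rows whose `column` is not missing.

    Returns a boolean mask when `return_boolmask` is True, otherwise the
    positional indices of the matching rows.
    """
    mask = df[column].notna().to_numpy()
    if return_boolmask:
        return mask
    # Convert the boolean mask into the positions of its True entries.
    return np.flatnonzero(mask)


df = pd.DataFrame(
    {
        "subject": ["Hello", None, "Re: Hello"],
        "date": [pd.Timestamp("2020-01-01"), pd.NaT, pd.NaT],
    }
)
print(index_of_rows_with_value_sketch(df, "subject"))  # [0 2]
print(index_of_rows_with_value_sketch(df, "subject", return_boolmask=True))  # [ True False  True]
```

A boolean mask can be passed directly to `df[mask]` and combined with other masks using `&` or `|`. An index array is more compact when only a few rows match.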