3GPP¶
This page introduces a collection of simple functions that give a comprehensive overview of 3GPP mailing lists ingressed using bigbang/ingress/listserv.py. Without extensive editing, these functions should also be applicable to IETF, ICANN, W3C, and IEEE mailing lists, although this has not been tested yet.
To start, a ListservList class instance needs to be created using either .from_mbox() or .from_pandas_dataframe(). Using the former as an example:
from bigbang.analysis.listserv import ListservList

mlist_name = "3GPP_TSG_CT_WG1_122E_5G"
mlist = ListservList.from_mbox(
    name=mlist_name,
    filepath=f"/path/to/{mlist_name}.mbox",
    include_body=True,
)
The function argument include_body is True by default, but when working with a large number of Emails, it might be necessary to set it to False to avoid out-of-memory errors.
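To illustrate why dropping the body saves memory, here is a minimal sketch that uses only the Python standard library, not BigBang itself. The helper read_headers_only and the example message are hypothetical; the sketch retains a few header fields per Email and discards everything else.

```python
import mailbox
import os
import tempfile


def read_headers_only(filepath, fields=("from", "subject", "date")):
    """Collect selected header fields of every message, without keeping bodies."""
    records = []
    box = mailbox.mbox(filepath)
    try:
        for message in box:
            # Each message is parsed one at a time; only the requested
            # header values are retained in the result list.
            records.append({field: message.get(field) for field in fields})
    finally:
        box.close()
    return records


# A tiny made-up mbox file, written to a temporary directory for the demo.
raw = (
    "From alice@example.org Mon Mar  7 10:15:00 2011\n"
    "From: Alice <alice@example.org>\n"
    "Subject: OpenPGP keys\n"
    "Date: Mon, 07 Mar 2011 10:15:00 +0000\n"
    "\n"
    "A long body that is not kept in memory.\n"
    "\n"
)
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.mbox")
    with open(path, "w", newline="\n") as handle:
        handle.write(raw)
    records = read_headers_only(path)

print(records)
```

How much memory BigBang actually saves with include_body=False depends on its own implementation; the sketch only shows the general idea.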
Cropping of mailinglist¶
If one is interested in specific subgroups contained in a mailinglist, then the ListservList class instance can be cropped using the following functions:
# select Emails sent in a specific year
mlist.crop_by_year(yrs=[2011])
# select Emails sent within a period
mlist.crop_by_year(yrs=[2011, 2021])
# select Emails sent or received by specified addresses
mlist.crop_by_address(
    header_field='from',
    per_address_field={'domain': ['t-mobile.at', 'nokia.com']}
)
# select Emails containing a string in the subject
mlist.crop_by_subject(match='OpenPGP')
In the third example, the function has a per_address_field argument. This argument is a dictionary in which the top-level keys can be localpart and domain, where the former is the part of an Email address in front of the @ and the latter is the part after it. Thus, for Heinrich.vonKleist@selbst.org, the localpart is Heinrich.vonKleist and the domain is selbst.org.
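The split into localpart and domain can be sketched in a few lines of plain Python. The helper split_address is hypothetical and only illustrates the terminology; it is not necessarily how BigBang parses addresses.

```python
from typing import Tuple


def split_address(address: str) -> Tuple[str, str]:
    """Split an Email address into (localpart, domain) at the last '@'."""
    localpart, _, domain = address.rpartition("@")
    return localpart, domain


localpart, domain = split_address("Heinrich.vonKleist@selbst.org")
print(localpart)  # Heinrich.vonKleist
print(domain)     # selbst.org
```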
Who is sending/receiving?¶
To gain insight into which actors are involved in a mailing list, a ListservList class instance can return the unique Email domains and the unique Email localparts per domain for multiple header fields:
mlist.get_domains(header_fields=['from', 'reply-to'])
mlist.get_localparts(header_fields=['from', 'reply-to'])
This will return a dictionary in which each key (both ‘from’ and ‘reply-to’) maps to a list of all domains. If one wants to see not just who contributes, but also how much, change the default argument return_msg_counts=False to True:
mlist.get_domains(header_fields=['from', 'reply-to'], return_msg_counts=True)
Alternatively, one can get the number of Emails sent or received by specific addresses via:
mlist.get_messagescount(
    header_fields=['from', 'reply-to'],
    per_address_field={
        'domain': ['t-mobile.at', 'nokia.com'],
        'localpart': ['ian.hacking', 'victor.klemperer'],
    }
)
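As a rough illustration of what counting messages per domain involves, the following standard-library sketch tallies the domains found in a list of made-up From headers. The function count_messages_per_domain is hypothetical, and the exact structure returned by the BigBang functions above may differ.

```python
from collections import Counter
from email.utils import parseaddr


def count_messages_per_domain(from_headers):
    """Count how many header values belong to each Email domain."""
    counts = Counter()
    for header in from_headers:
        # parseaddr separates the display name from the bare address.
        _, address = parseaddr(header)
        if "@" not in address:
            continue  # skip headers without a usable address
        domain = address.rpartition("@")[2].lower()
        counts[domain] += 1
    return dict(counts)


# Made-up header values for the demo.
headers = [
    "Ian Hacking <ian.hacking@nokia.com>",
    "victor.klemperer@t-mobile.at",
    "Someone Else <someone.else@nokia.com>",
]
domain_counts = count_messages_per_domain(headers)
print(domain_counts)
```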
Communication Network¶
For a more in-depth view into who is sending (receiving) to (from) whom in a mailing list, one can use the create_sender_receiver_digraph() function as follows:
mlist.create_sender_receiver_digraph()
This will create a new networkx.DiGraph() instance as an attribute of mlist (mlist.dg), which can be used to perform a number of standard calculations using the networkx Python package:
import networkx as nx
nx.betweenness_centrality(mlist.dg, weight="weight")
nx.closeness_centrality(mlist.dg)
nx.degree_centrality(mlist.dg)
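To make the idea of a weighted sender-receiver digraph concrete, here is a minimal sketch in plain Python that does not use networkx or BigBang. It assumes that each edge weight is the number of messages from a sender to a receiver, which may differ from what create_sender_receiver_digraph() stores. The degree centrality follows the networkx definition: a node's degree (incoming plus outgoing edges) divided by n - 1. All names and domains below are made up.

```python
from collections import defaultdict


def build_weighted_edges(pairs):
    """Aggregate (sender, receiver) pairs into edges weighted by message count."""
    edges = defaultdict(int)
    for sender, receiver in pairs:
        edges[(sender, receiver)] += 1
    return dict(edges)


def degree_centrality(edges):
    """Degree (in + out, ignoring weights) of each node divided by n - 1."""
    nodes = {node for edge in edges for node in edge}
    degree = dict.fromkeys(nodes, 0)
    for sender, receiver in edges:
        degree[sender] += 1
        degree[receiver] += 1
    scale = 1 / (len(nodes) - 1) if len(nodes) > 1 else 0
    return {node: value * scale for node, value in degree.items()}


# Made-up sender/receiver domains for the demo.
pairs = [
    ("nokia.com", "t-mobile.at"),
    ("nokia.com", "t-mobile.at"),
    ("t-mobile.at", "nokia.com"),
    ("nokia.com", "example.org"),
]
edges = build_weighted_edges(pairs)
centrality = degree_centrality(edges)
print(edges)
print(centrality)
```

In a directed graph this measure can exceed 1, because a node can have both an incoming and an outgoing edge to the same neighbour.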
Time-series¶
To study, e.g., the continuity of an actor's contribution to a mailing list, many functions have an optional per_year boolean argument.
To simply find out during which period Emails were sent to a mailing list, one can call mlist.period_of_activity().
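The per-year view and the period of activity can be approximated with the standard library alone. The sketch below is not BigBang's implementation; the helpers messages_per_year and activity_span and the Date headers are made up for illustration.

```python
from collections import Counter
from email.utils import parsedate_to_datetime


def messages_per_year(date_headers):
    """Count messages per calendar year from RFC 5322 Date header values."""
    years = [parsedate_to_datetime(header).year for header in date_headers]
    return dict(sorted(Counter(years).items()))


def activity_span(date_headers):
    """Return the earliest and latest timestamps among the Date headers."""
    dates = [parsedate_to_datetime(header) for header in date_headers]
    return min(dates), max(dates)


# Made-up Date headers; all carry a timezone offset so they compare cleanly.
dates = [
    "Tue, 20 Sep 2011 16:30:00 +0000",
    "Mon, 07 Mar 2011 10:15:00 +0000",
    "Fri, 01 Jan 2021 08:00:00 +0000",
]
per_year = messages_per_year(dates)
first, last = activity_span(dates)
print(per_year)
print(first.isoformat(), last.isoformat())
```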