3GPP

This page introduces a collection of simple functions with which a comprehensive overview of 3GPP mailinglists, ingressed using bigbang/ingress/listserv.py, is gained. Without extensive editing, these functions should also be applicable to IETF, ICANN, W3C, and IEEE mailinglists, however it hasn’t been tested yet.

To start, a ListservList class instance needs to be created using either .from_mbox() or .from_pandas_dataframe(). Using the former as an example:

from bigbang.analysis.listserv import ListservList

mlist_name = "3GPP_TSG_CT_WG1_122E_5G"
mlist = ListservList.from_mbox(
    name=mlist_name,
    filepath=f"/path/to/{mlist_name}.mbox",
    include_body=True,
)

The function argument include_body is by default True, but if one has to work with a large quantity of Emails, it might be necessary to set it to False to avoid out-of-memory errors.

Cropping of mailinglist

If one is interested in specific subgroups contained in a mailinglist, then the ListservList class instance can be cropped using the following functions:

# select Emails send in a specific year
mlist.crop_by_year(yrs=[2011])

# select Emails send within a period
mlist.crop_by_year(yrs=[2011, 2021])

# select Emails send or received from specified addresses
mlist.crop_by_address(
    header_field='from',
    per_address_field={'domain': ['t-mobile.at', 'nokia.com']}
)

# select Emails containing string in subject
mlist.crop_by_subject(match='OpenPGP')

In the second example, the function has an per_address_field argument. This argument is a dictionary in which the top-level keys can be localpart and domain, where the former is the part of an Email address that stands in front of the @ and the latter after. Thus for Heinrich.vonKleist@selbst.org, localpart is Heinrich.vonKleist and the domain is selbst.org.

Who is sending/receiving?

To get an insight in which actors are involved in a mailinglist, a ListservList class instance can be return the unique email domains and the unique email localparts per domain for multiple header fields:

mlist.get_domains(header_fields=['from', 'reply-to'])

mlist.get_localparts(header_fields=['from', 'reply-to'])

This will return a dictionary, in which each key (both ‘from’ and ‘reply-to’) contains a list of all domains. If one wants see not just who contributes, but also how much, change the default argument of return_msg_counts=False to True:

mlist.get_domains(header_fields=['from', 'reply-to'], return_msg_counts=True)

Alternatively, one can also get the number of Emails send or received by a certain address via,

mlist.get_messagescount(
    header_fields=['from', 'reply-to'],
    per_address_field={
        'domain': ['t-mobile.at', 'nokia.com'],
        'localpart': ['ian.hacking', 'victor.klemperer'],
    }
)

Communication Network

For a more in-depth view into who is sending (receiving) to (from) whom in a mailing list, one can use the return_msg_counts=False as follows:

mlist.create_sender_receiver_digraph()

This will create a new networkx.DiGraph() instance attribute for mlist, which can be used to perform a number of standard calculations using the networkx python package:

import networkx as nx

nx.betweenness_centrality(mlist.dg, weight="weight")
nx.closeness_centrality(mlist.dg)
nx.degree_centrality(mlist.dg)

Time-series

To study, e.g., the continuity of an actors contribution to a mailinglist, many function have an optional per_year boolean argument.

To simply find out during which period Emails were in a mailinglist, one can call mlist.period_of_activity().