Ancillary Datasets
In addition to providing tools for gathering data from public sources, BigBang
also includes some datasets that have been curated by contributors and researchers.
General
Email domain categories
BigBang comes with a partial list of email domains, categorized as:
Generic. A domain associated with a generic email provider. E.g.
gmail.com
Personal. A domain associated with a single individual. E.g
csperkins.org
Company. A domain associated with a particular company. E.g.
apple.com
Academic. A domain associated with a university or academic professional organization. E.g.
mit.edu
SDO. A domain associated with a Standards Development Organization. E.g.
ietf.org
This data can be loaded as a Pandas DataFrame with indices as email domains and
categories in the category
column with the following code:
import bigbang.datasets.domains as domains
domain_data = domains.load_data()
The sources of this data are a hand-curated list of domains provided by BigBang contributors and a list of generic email domain providers provided by this public gist.
Organization Metadata
BigBang comes with a curated list of metadata about organizations. This data is provided as a DataFrame with the following columns:
name. Organization name. E.g.
gmail.com
Category. Kind of organization. E.g
Infrastructure Company
subsidiary. This column describes when a company is the subsidiary of another company in the list. If the cell in this column is empty, this company can be understood as the parent company.. E.g.
apple.com
stakeholdergroup. Stakeholdergroups are used as they have been defined in the WSIS process and the Tunis-agenda.
nationality. The country name in which the stakeholder or subsidiary is registered.
email domain names. Email domains associated with the organization. May include multiple, comma separated, domain names.
Membership Organization. Membership of regional SDOs, derived from 3GPP data.
This data can be loaded as a Pandas DataFrame with indices as email domains and
categories in the category
column with the following code:
import bigbang.datasets.organizations as organizations
organization_data = organizations.load_data()
The sources of this data are a hand-curated list of domains provided by BigBang contributors and a list of generic email domain providers provided by this public gist.
IETF
Publication date of protocols.
3GPP
Release dates of standards.