.. _ancillary_datasets:
Ancillary Datasets
*********************
In addition to providing tools for gathering data from public sources, ``BigBang`` also includes some datasets that have been curated by contributors and researchers.
General
================
Email domain categories
-----------------------
BigBang comes with a partial list of email domains, categorized as:
- **Generic**. A domain associated with a generic email provider. E.g. ``gmail.com``
- **Personal**. A domain associated with a single individual. E.g ``csperkins.org``
- **Company**. A domain associated with a particular company. E.g. ``apple.com``
- **Academic**. A domain associated with a university or academic professional organization. E.g. ``mit.edu``
- **SDO**. A domain associated with a Standards Development Organization. E.g. ``ietf.org``
This data can be loaded as a Pandas DataFrame with indices as email domains and
categories in the ``category`` column with the following code:
::
import bigbang.datasets.domains as domains
domain_data = domains.load_data()
The sources of this data are a hand-curated list of domains provided by BigBang contributors
and a list of generic email domain providers provided by this `public gist `_.
Organization Metadata
-----------------------
BigBang comes with a curated list of metadata about organizations. This data is provided as a DataFrame with the following columns:
- **name**. Organization name. E.g. ``gmail.com``
- **Category**. Kind of organization. E.g ``Infrastructure Company``
- **subsidiary**. This column describes when a company is the subsidiary of another company in the list. If the cell in this column is empty, this company can be understood as the parent company.. E.g. ``apple.com``
- **stakeholdergroup**. Stakeholdergroups are used as they have been defined in the WSIS process and the Tunis-agenda.
- **nationality**. The country name in which the stakeholder or subsidiary is registered.
- **email domain names**. Email domains associated with the organization. May include multiple, comma separated, domain names.
- **Membership Organization**. Membership of regional SDOs, derived from 3GPP data.
This data can be loaded as a Pandas DataFrame with indices as email domains and
categories in the ``category`` column with the following code:
::
import bigbang.datasets.organizations as organizations
organization_data = organizations.load_data()
The sources of this data are a hand-curated list of domains provided by BigBang contributors
and a list of generic email domain providers provided by this `public gist `_.
IETF
================
Publication date of protocols.
3GPP
================
Release dates of standards.