Datasets
========

This section outlines how various public mailing lists can be scraped from the web and stored to disk for further processing. Currently, the ``BigBang`` repository does not contain personally identifiable information of any kind. The datasets included in ``BigBang`` pertain to organizational entities and provide :ref:`ancillary data <ancillary_datasets>` useful for preprocessing and analyzing those entities. Because mailing-list archives are large and time-consuming to scrape from the web, we are working on a GDPR-compliant method for sharing these datasets with other researchers.


.. toctree::
  :maxdepth: 2

  mailinglists
  drafts
  ancillary
  git