Datasets

This section outlines how various public mailing lists can be scraped from the web and stored to disk for further processing. Currently, the BigBang repository does not contain personally identifiable information of any kind. The datasets included in BigBang pertain to organizational entities and provide ancillary data useful for preprocessing and analyzing those entities. Because mailing-list archives are large and time-consuming to scrape from the web, we are working on a GDPR-compliant method to share the scraped archives with other researchers.