(12 Dec 2024) For nearly twenty-five years, the Library of Congress has been archiving campaign websites for Presidential, Congressional, and gubernatorial elections. Back in 2022, we released a dataset of index files for the United States Elections Web Archive, and we are happy to announce that this dataset is being relaunched as a data package on data.labs.loc.gov/packages/, with new resources to help researchers understand and use the data.
The new data package includes enhanced documentation explaining the contents of the dataset and how it was created, as well as metadata for candidate campaign sites extracted from the United States Elections Web Archive. The general election seasons from 2000 to 2016 are currently available, with more recent data to be added later. As before, the data includes index files (CDX file format) rather than the archived web content itself. These index files list archived document URLs and help users automatically construct URLs for fetching the archived web documents. For help getting started, a Python notebook is available to demonstrate the basics of using the dataset, including how to query the metadata, filter and download CDX files, and analyze the text.
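As a rough illustration of how an index line can be turned into a fetchable URL, here is a minimal Python sketch. It is not taken from the Library's notebook. It assumes a space-delimited CDX line whose first five fields are URL key, 14-digit timestamp, original URL, MIME type, and status code, and it assumes a Wayback-style replay prefix of `https://webarchive.loc.gov/all`. The helper names and the sample line are hypothetical; check the data package documentation for the actual field order and replay address.

```python
from typing import NamedTuple, Optional

# Assumed replay prefix (not confirmed by the announcement).
REPLAY_PREFIX = "https://webarchive.loc.gov/all"


class CdxRecord(NamedTuple):
    """The first five fields of a CDX line, under the assumed field order."""
    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    status: str


def parse_cdx_line(line: str) -> Optional[CdxRecord]:
    """Parse one CDX line; return None for blank, header, or short lines."""
    stripped = line.strip()
    # CDX files usually begin with a header line such as " CDX N b a m s ...".
    if not stripped or stripped.startswith("CDX"):
        return None
    fields = stripped.split()
    if len(fields) < 5:
        return None
    return CdxRecord(*fields[:5])


def replay_url(record: CdxRecord, prefix: str = REPLAY_PREFIX) -> str:
    """Build a Wayback-style URL: <prefix>/<timestamp>/<original URL>."""
    return f"{prefix}/{record.timestamp}/{record.original}"


# Hypothetical sample line, made up for illustration only.
sample = (
    "com,example)/ 20161108120000 http://example.com/ text/html 200 "
    "ABCDEFG - - 1234 5678 example.warc.gz"
)
record = parse_cdx_line(sample)
if record is not None:
    print(replay_url(record))
```

Filtering on the parsed fields before building URLs (for example, keeping only records with a `text/html` MIME type and a `200` status) is one way to limit downloads to pages suitable for text analysis.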
Source: Library of Congress Blogs