25 million free records of bibliographic metadata
(16 May 2017) The Library of Congress announced today that it is making 25 million records in its online catalog available for free bulk download at loc.gov/cds/products/marcDist.php. This is the largest release of digital records in the Library’s history.
The records also will be easily accessible at data.gov, the open-government website hosted by the General Services Administration (GSA). Until now, these bibliographic records have only been available individually or through a paid subscription.
The Library is also joining with George Washington University and George Mason University to host a Hack-to-Learn workshop Wednesday, May 17 through Thursday, May 18, which will bring together librarians, digital researchers and coders to explore how the data (and other interesting data sets) can be used.
“The Library of Congress is our nation’s monument to knowledge and we need to make sure the doors are open wide for everyone, not just physically but digitally too,” said Librarian of Congress Carla Hayden. “Unlocking the rich data in the Library’s online catalog is a great step forward. I’m excited to see how people will put this information to use.”
The new, free service will operate in parallel with the Library’s fee-based MARC Distribution Service, which is used extensively by large commercial customers and libraries. All records use the MARC (Machine-Readable Cataloging) format. MARC is the international standard for representing and communicating bibliographic and related information in machine-readable form, and the Library of Congress maintains it with the participation and support of libraries and librarians worldwide.
The data covers a wide range of Library items including books, serials, computer files, manuscripts, maps, music and visual materials. The free data sets cover more than 45 years, ranging from 1968, during the early years of MARC, to 2014. Each record provides standardized information about an item, including the title, author, publication date, subject headings, genre, related names, summary and other notes.
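The announcement does not describe the file layout, so the following is only a sketch. It assumes the downloaded files hold binary MARC 21 records in UTF-8, using the ISO 2709 structure: a 24-byte leader, a directory of 12-byte entries (tag, field length, offset), and then the fields themselves. From each record it extracts the title (field 245, subfield $a) and the personal-name main entry (field 100, subfield $a). The function names and the synthetic `build_record` helper are hypothetical and exist only for illustration. The distribution may also offer MARCXML or other serializations, so check the download page for the formats that are actually available.

```python
# Minimal MARC 21 (ISO 2709) reader sketch; assumes UTF-8 binary records.
# Hypothetical helper names; not an official Library of Congress tool.
from __future__ import annotations

from typing import Iterator, Optional

FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = b"\x1f"
RECORD_TERMINATOR = b"\x1d"


def iter_records(blob: bytes) -> Iterator[bytes]:
    """Split a file of concatenated binary records on the record terminator."""
    for chunk in blob.split(RECORD_TERMINATOR):
        if chunk:
            yield chunk + RECORD_TERMINATOR


def parse_record(raw: bytes) -> dict[str, list[bytes]]:
    """Map each tag to the raw contents of its fields (terminator removed)."""
    leader = raw[:24]
    base_address = int(leader[12:17])     # byte offset where field data begins
    directory = raw[24:base_address - 1]  # drop the directory's own terminator
    fields: dict[str, list[bytes]] = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length = int(entry[3:7])  # field length in bytes, terminator included
        start = int(entry[7:12])  # offset relative to the base address
        data = raw[base_address + start:base_address + start + length]
        if data.endswith(FIELD_TERMINATOR):
            data = data[:-1]
        fields.setdefault(tag, []).append(data)
    return fields


def subfields(field: bytes) -> dict[str, list[str]]:
    """Split a variable data field: two indicator bytes, then coded subfields."""
    result: dict[str, list[str]] = {}
    for part in field[2:].split(SUBFIELD_DELIMITER)[1:]:
        if part:
            code = chr(part[0])
            result.setdefault(code, []).append(part[1:].decode("utf-8"))
    return result


def title_and_author(raw: bytes) -> tuple[Optional[str], Optional[str]]:
    """Return 245 $a (title) and 100 $a (personal name), or None if absent."""
    fields = parse_record(raw)

    def first_a(tag: str) -> Optional[str]:
        if tag not in fields:
            return None
        values = subfields(fields[tag][0]).get("a")
        return values[0] if values else None

    return first_a("245"), first_a("100")


def build_record(fields: list[tuple[str, bytes]]) -> bytes:
    """Hypothetical helper: assemble a tiny synthetic record for testing."""
    directory, data = b"", b""
    for tag, body in fields:
        body += FIELD_TERMINATOR
        directory += tag.encode("ascii") + b"%04d" % len(body) + b"%05d" % len(data)
        data += body
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)
    record_length = base_address + len(data) + 1
    leader = b"%05d" % record_length + b"nam a22" + b"%05d" % base_address + b" a 4500"
    return leader + directory + data + RECORD_TERMINATOR


if __name__ == "__main__":
    # Synthetic example record, not real catalog data.
    sample = build_record([
        ("001", b"12345"),
        ("100", b"1 \x1faDoe, Jane."),
        ("245", b"10\x1faAn example title /\x1fcJane Doe."),
    ])
    for rec in iter_records(sample + sample):
        print(title_and_author(rec))
```

This sketch is meant for exploration. For larger jobs, an established library such as pymarc (Python) is likely more robust: it handles legacy MARC-8 encoding and malformed records, which this sketch does not attempt.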
In addition to their traditional value to libraries, the rich data included in these records can be used for a wide range of cultural, historical and literary research. “The Library of Congress catalog is literally the gold standard for bibliographic data and we believe this treasure trove of information can be used for much more than its original purpose,” added Beacher Wiggins, the Library’s director for Acquisitions and Bibliographic Access. “From more efficient information-sharing and easier analysis to visualizations and other possibilities we cannot begin to predict, we hope this data will be put to work by social scientists, data analysts, developers, statisticians and everyone else doing innovative work with large data sets to enhance learning and the formation of new knowledge.”
The Hack-to-Learn workshop will bring together experts and enthusiasts to learn more about available research tools and to conduct hands-on exploration of largely unexplored data sets, including the 25 million MARC records; 52,000 index cards of jokes from the Phyllis Diller Gag File; and 8,000 documents from Eleanor Roosevelt’s “My Day” columns. For more information about the Hack-to-Learn, visit digitalpreservation.gov/meetings/hack-to-learn/hack-to-learn-site.html.
The Library of Congress is the world’s largest library, offering access to the creative record of the United States—and extensive materials from around the world—both on site and online. It is the main research arm of the U.S. Congress and the home of the U.S. Copyright Office. Explore collections, reference services and other programs and plan a visit at loc.gov, access the official site for U.S. federal legislative information at congress.gov, and register creative works of authorship at copyright.gov.