(15 Apr 2024) “Collections as data” research is an approach in which researchers treat digital collections as datasets and analyze them with computational methods such as text mining, data visualization, mapping, image analysis, audio analysis, and network analysis. For example, a historian investigating a specific topic over time would face a laborious, time-consuming task if they had to manually review every written source that mentions it. To make this concrete, say that a researcher wants to investigate the history of the Cleveland Play House. This archival collection at Case Western Reserve University comprises 1681 containers (1100 linear feet), far too much to work through physically with ease. A large amount of it is available online, but browsing each digitized object one at a time would be nearly as difficult. To overcome this obstacle, one would need a way to compile and export all the textual data from the collection, use a computer program to analyze the dataset and identify topics, trends, and patterns, and then create a visualization of the results.
Fortunately, the text can be exported, and programs exist to carry out this kind of analysis. This is where collections as data research comes into the picture, and why digital library professionals have worked to adapt their procedures and workflows so that users can access a collection’s data.
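To give a rough sense of what such an analysis can look like, here is a minimal Python sketch. It assumes the exported text has already been loaded as (year, text) pairs; the sample records, the stopword list, and the function names are all invented for illustration and do not come from the Cleveland Play House collection or any particular export tool.

```python
import re
from collections import Counter, defaultdict

# Invented placeholder records standing in for exported collection text.
# A real export would likely be a CSV, JSON, or plain-text dump instead.
SAMPLE_RECORDS = [
    (1930, "The season program lists a new comedy and a drama."),
    (1930, "Ticket sales for the comedy exceeded the budget."),
    (1950, "The board discussed the budget and a new building."),
    (1950, "Budget notes: the building budget was approved."),
]

# A tiny illustrative stopword list; real projects use much longer ones.
STOPWORDS = {"the", "a", "and", "for", "was"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_counts_by_year(records, term):
    """Count how often `term` appears in each year's records."""
    term = term.lower()
    counts = defaultdict(int)
    for year, text in records:
        counts[year] += tokenize(text).count(term)
    return dict(sorted(counts.items()))


def top_terms(records, n, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword terms across all records."""
    counter = Counter()
    for _, text in records:
        counter.update(t for t in tokenize(text) if t not in stopwords)
    return counter.most_common(n)


if __name__ == "__main__":
    # A very plain text "visualization": one bar of # characters per year.
    for year, count in term_counts_by_year(SAMPLE_RECORDS, "budget").items():
        print(year, "#" * count)
    print(top_terms(SAMPLE_RECORDS, 3))
```

Counting one term per year is only the simplest form of trend analysis; topic modeling or network analysis would need dedicated libraries, but the overall shape (load the exported text, tokenize, aggregate, visualize) stays the same.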
Crissandra George, the Digital Collections Manager Librarian at Case Western Reserve University, shares more here.