(6 Mar 2025) In the 12 years that ORCID has been sharing the Public Data File, it has been downloaded more than 190,000 times. It has served as a data source for a diverse range of projects, such as analyses of relationships and individual trajectories within the research community, scientific migration, collaboration networks, and the adoption of ORCID across disciplines and locations. However, we understand that using the Public Data File in its current form requires considerable effort. Would-be users need the knowledge and skills to work with such a large dataset: downloading, extracting, parsing, and loading the data into a local environment, all before analysis can even begin.
Building on our existing relationship with Figshare, which serves as the repository for the Public Data File, ORCID member Digital Science has now generously offered to host the 2024 Public Data File in Dimensions' Google BigQuery (GBQ) environment. This means the data is directly available for exploration and analysis without the need to first create a local copy.
Google BigQuery is a cloud-based, fully managed data analytics platform optimized for handling large datasets efficiently. This makes it an ideal platform for exploring and analyzing the ORCID Public Data File, which contains millions of records. The ORCID Public Data File has been used for projects such as metadata enrichment, visualization of connections between authors, studies of data-sharing practices in a particular region, and analysis of scientist migration patterns.
The beta version of this service is now available, and we hope that the lower effort required to use it will enable our community to explore and develop innovative new use cases for ORCID data, such as reporting on peer review practices or linking ORCID data with World Bank data. While the dataset itself is and will remain freely available, those wishing to use it will need to set up their own GBQ account. Google offers a free usage tier up to a certain level and charges for usage beyond it; within the free tier, it is possible to run many queries before running out of quota. Digital Science has also provided sample queries that show how to query different parts of the ORCID dataset efficiently.
More details can be found here.