DataSeer develops AI system to track dataset reuse

(11 Mar 2026) DataSeer, in collaboration with The Michael J. Fox Foundation (MJFF), has developed a new large language model (LLM)-based system designed to detect and quantify dataset reuse across the scholarly literature at scale.

The system aims to address a longstanding challenge for research funders and institutions: measuring the downstream impact of shared research data.

By automating the detection of dataset reuse in published research, the platform is intended to remove a key bottleneck that has so far limited large-scale analysis of how data is reused.

Developed by DataSeer in collaboration with its Open Science Indicator partner PLOS and with input from the broader open science community, the LLM-based system was piloted on a corpus of 6,000 MJFF-funded articles. Unlike traditional approaches that rely on formal data citations or digital object identifiers (DOIs), the model analyses the full text of research articles to identify reused datasets.

This allows the system to detect reuse even when datasets are referenced indirectly, such as through accession numbers, repository names, URLs, or narrative descriptions.

