(8 Mar 2023) Europe PMC, an open access repository of life science research, has developed a new approach to extracting relevant information from research papers using machine learning. Text-mining algorithms can automatically extract key concepts, relationships, and findings from scientific papers, allowing researchers to identify relevant information quickly and stay up to date with the latest developments in their field.
Europe PMC’s SciLite Annotations tool uses text mining to highlight terms in research articles and preprints, so that users can quickly scan an article for relevant concepts such as diseases, chemicals, or protein interactions. Europe PMC contains ∼1.3 billion annotations, sourced in-house and from 10 external providers. The platform covers multiple annotation types, with bioentities ranging from accession numbers to Open Targets gene–disease relationships. Users can access the annotations programmatically through the Annotations API, which reduces the time needed to extract facts and evidence and helps advance the discovery process.
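As a rough illustration of what programmatic access might look like, here is a minimal Python sketch using only the standard library. The endpoint URL, the `articleIds`/`type`/`format` parameter names, the "SOURCE:ID" identifier form, the "Diseases" type name and the response shape (a list of articles, each with an `annotations` list whose items carry `exact` and `type`) are all assumptions on our part and should be checked against the Annotations API documentation. The helper names are hypothetical, and the identifier used in the demo is purely illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; confirm against the Europe PMC Annotations API documentation.
BASE_URL = "https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds"


def build_annotations_url(article_ids, annotation_type=None, fmt="JSON"):
    """Return a query URL for the annotations on the given articles (hypothetical helper).

    article_ids: identifiers in an assumed "SOURCE:ID" form.
    annotation_type: optional filter such as "Diseases" (assumed type name).
    """
    params = {"articleIds": ",".join(article_ids), "format": fmt}
    if annotation_type is not None:
        params["type"] = annotation_type
    return BASE_URL + "?" + urlencode(params)


def extract_terms(response):
    """Flatten a parsed response into (matched text, annotation type) pairs.

    Assumes a list of article records, each holding an "annotations" list
    whose items carry "exact" (the matched text) and "type" fields.
    """
    terms = []
    for article in response:
        for annotation in article.get("annotations", []):
            terms.append((annotation.get("exact"), annotation.get("type")))
    return terms


def fetch_annotations(article_ids, annotation_type=None):
    """Call the API over HTTP and return the parsed JSON (needs network access)."""
    with urlopen(build_annotations_url(article_ids, annotation_type)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Offline demo on a hand-written response of the assumed shape.
    sample = [{"annotations": [{"exact": "malaria", "type": "Diseases"}]}]
    print(build_annotations_url(["PMC:4771370"], "Diseases"))
    print(extract_terms(sample))
```

Keeping URL construction and response parsing separate from the network call means both can be tried offline before pointing `fetch_annotations` at the live service.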
Europe PMC has developed annotations for Gene/Protein, Disease, Organism and Chemical bioentities. To do this, it uses established ontologies as dictionaries against which entity terms in the text are pattern-matched. Although the dictionary-based approach is easy to understand and implement, it needs an exhaustive list of patterns to achieve high recall and must be updated regularly to stay current. Moreover, because it ignores the context in which a term appears, it cannot resolve ambiguous terms, a frequent problem given how often scientists use acronyms and abbreviations in their papers.
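To make the dictionary-based idea and its limits concrete, here is a toy tagger in Python. It is a generic sketch, not Europe PMC's pipeline: the dictionary entries are hypothetical stand-ins for ontology terms, and compiling them into a single case-insensitive regular expression is simply one common way to do dictionary matching.

```python
import re

# Toy dictionary with hypothetical entries; a real tagger would load terms
# (and synonyms) from ontologies for each entity type.
DICTIONARY = {
    "tumor": ["Disease"],
    "tumor necrosis factor": ["Gene/Protein"],
    "BRCA1": ["Gene/Protein"],
    "tamoxifen": ["Chemical"],
    "Homo sapiens": ["Organism"],
    "CAT": ["Gene/Protein", "Organism"],  # catalase gene symbol, or the animal
}


def build_pattern(dictionary):
    # Longest terms first: regex alternation takes the first alternative that
    # matches, so "tumor necrosis factor" wins over its prefix "tumor".
    terms = sorted(dictionary, key=len, reverse=True)
    body = "|".join(re.escape(term) for term in terms)
    # Word boundaries stop "cat" from matching inside words such as "indicate".
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


PATTERN = build_pattern(DICTIONARY)
# Case-insensitive lookup from a matched span back to its candidate types.
LOOKUP = {term.lower(): types for term, types in DICTIONARY.items()}


def annotate(text):
    """Return (matched text, candidate entity types) for each dictionary hit."""
    return [(m.group(0), LOOKUP[m.group(0).lower()]) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    sentence = ("Tamoxifen lowered tumor necrosis factor levels in the tumor, "
                "and CAT activity was measured.")
    for span, types in annotate(sentence):
        print(span, types)
```

"CAT" comes back with two candidate types: the dictionary alone cannot tell whether a sentence means the catalase gene or the animal, which is exactly the missing-context problem described above. Recall is also bounded by the dictionary, so any term not listed (a new synonym, say) is silently missed.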