An artificial intelligence system trained on nearly 40 years of scientific literature correctly identified 19 out of 20 research papers that had the greatest scientific impact on biotechnology, and has selected 50 recent papers that it predicts will be among the “top 5%” of biotechnology papers in the future.1
Scientists say the system could be used to find “hidden gems” of research that are overlooked by other methods, and even to guide funding decisions so that they are more likely to target promising research.
“Our goal is to develop tools that can help us identify the most interesting, exciting, and impactful research, especially research that existing publication metrics may overlook,” said James Weis, a computer scientist at the Massachusetts Institute of Technology and lead author of a new study of the system.
The study describes a machine-learning system called Delphi (Dynamic Early-warning by Learning to Predict High Impact), which was “trained” on metrics from more than 1.6 million articles published in 42 biotechnology journals between 1982 and 2019.
The system evaluated 29 different features of the articles in those journals, resulting in more than 7.8 million individual machine-learning “nodes” and 201 million relationships.
The features included standard metrics such as an author’s h-index, a measure of research productivity, and the number of citations a research paper has gathered in the five years since its publication. They also included the change in an author’s h-index over time, the number and rankings of a paper’s co-authors, and several metrics about the journals themselves.
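To make the h-index feature concrete: an author has an h-index of h when h of their papers have each been cited at least h times. The sketch below computes that value and its change between two points in time. It is only an illustration of the standard definition, not Delphi's feature pipeline; the function names `h_index` and `h_index_change` and the citation counts are hypothetical.

```python
from typing import List


def h_index(citations: List[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h


def h_index_change(earlier: List[int], later: List[int]) -> int:
    """Change in h-index between two snapshots of an author's citation counts."""
    return h_index(later) - h_index(earlier)


# Made-up citation counts for one author at two points in time.
snapshot_2015 = [3, 2, 1]
snapshot_2019 = [10, 8, 5, 4, 3]

current_h = h_index(snapshot_2019)                      # 4
growth = h_index_change(snapshot_2015, snapshot_2019)   # 4 - 2 = 2
```

A rising h-index over a short period is the kind of time-dependent signal the article describes, as opposed to a single static citation count.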
The researchers then used the system to correctly identify 19 of the 20 “landmark” biotechnology papers from 1980 to 2014 in a blinded study, and to select an additional 50 papers published in 2018 that they predict will be among the top 5% of “impactful” biotechnology research papers in the coming years.
Weis says that the one landmark paper the Delphi system missed concerned the foundational development of chromosome conformation capture (3C), a set of methods for analyzing the spatial organization of chromosomes within a cell. It was missed in part because a large number of the resulting citations appeared in non-biotechnology journals and so were not in the system’s database.
“We don’t expect to be able to identify all of the foundational technologies early on,” he says. “First and foremost, we hope to find technologies that have been overlooked by the current metrics.”
As with all machine-learning systems, careful attention must be paid to reducing systemic biases and ensuring that “malicious actors” cannot manipulate it, he says. “However, we believe that Delphi has the potential to reduce bias by avoiding reliance on simpler metrics,” he says. Weis adds that this will also make Delphi harder to “game”.
According to Weis, the Delphi prototype can easily be expanded to other scientific fields, initially by including additional disciplines and journals, and possibly other sources of high-quality research such as the arXiv online preprint archive.
The intention is not to replace existing methods of assessing the importance of research, but to improve them, he says. “We see Delphi as an additional tool to be integrated into the researcher’s toolkit, not a substitute for human-level expertise and intuition.”
Lutz Bornmann, a sociologist of science at the headquarters of the Max Planck Society in Munich who has studied how research impact can be measured2, notes that many of the publication characteristics assessed by the Delphi system depend heavily on quantifying the citations that the research attracts. “However, the proposed method sounds interesting and has led to the first promising empirical results,” he says. “More extensive empirical testing is needed to confirm these initial results.”