A team of biochemists and computer scientists has developed a new way to accurately predict the three-dimensional structures of RNA molecules, using an artificial intelligence system trained on only a small number of known RNA structures.
Experts have hailed the development as a significant improvement in the computational prediction of RNA structures. They say it could lead both to a better understanding of the role RNA plays in cellular function and to new therapeutic drugs.
Rhiju Das, associate professor of biochemistry at Stanford University in California, says the new machine learning system, called the Atomic Rotationally Equivariant Scorer (Ares), uses an “equivariant” neural network to pick out accurate three-dimensional structures of RNA molecules.
He explains that whereas the computational ‘neurons’ in other types of neural networks are activated by single numbers, those in equivariant neural networks can also carry vectors, tensors and other geometric objects. This enables Ares to recognize structural motifs in RNA molecules, such as different types of helices, “hairpins” and stems. The approach belongs to a field known as “geometric deep learning”.
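To make the idea of equivariance more concrete, here is a toy sketch in Python. It is not Ares itself. It assumes a made-up layer that gives each atom one scalar feature (a distance-weighted count of its neighbours) and one vector feature (a distance-weighted sum of the directions to those neighbours). Rotating the input molecule should leave the scalar unchanged and rotate the vector by the same amount, which is the property equivariant networks are built around. The function names, the Gaussian weighting and the random coordinates are all illustrative assumptions.

```python
import math
import random


def rotation_matrix(alpha, beta):
    """Rotation about the z axis by alpha, followed by rotation about the x axis by beta."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    rz = [[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]]
    rx = [[1.0, 0.0, 0.0], [0.0, cb, -sb], [0.0, sb, cb]]
    # Matrix product rx @ rz: rz is applied first, then rx.
    return [[sum(rx[i][k] * rz[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotate(r, v):
    """Apply a 3x3 rotation matrix to a 3D vector."""
    return tuple(sum(r[i][j] * v[j] for j in range(3)) for i in range(3))


def toy_equivariant_layer(coords, cutoff=5.0):
    """Hypothetical layer returning one scalar and one vector feature per atom."""
    scalars, vectors = [], []
    for i, ri in enumerate(coords):
        s, v = 0.0, [0.0, 0.0, 0.0]
        for j, rj in enumerate(coords):
            if i == j:
                continue
            rel = [rj[k] - ri[k] for k in range(3)]
            d = math.sqrt(sum(x * x for x in rel))
            if d > cutoff:
                continue
            w = math.exp(-d * d)  # radial weight: depends only on the distance
            s += w  # scalar channel: unchanged by rotations
            for k in range(3):
                v[k] += w * rel[k]  # vector channel: rotates with the input
        scalars.append(s)
        vectors.append(tuple(v))
    return scalars, vectors


random.seed(0)
# Five random "atoms" in a small box, so every pair lies well inside the cutoff.
atoms = [tuple(random.uniform(0.0, 2.0) for _ in range(3)) for _ in range(5)]
R = rotation_matrix(0.7, 1.3)
rotated_atoms = [rotate(R, a) for a in atoms]

s1, v1 = toy_equivariant_layer(atoms)
s2, v2 = toy_equivariant_layer(rotated_atoms)
```

In a real equivariant network many such channels are stacked and combined with learned weights; the sketch only shows that features built from relative positions either stay fixed or turn with the molecule when it is rotated.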
The researchers trained Ares on only 18 complex RNAs whose structures had been meticulously determined by experiment. They then tested the system on much larger RNA structures listed on the website of RNA Puzzles, a decade-old scientific competition.
They used a version of the Rosetta molecular modeling software to generate more than 1,500 different structural models for six solved RNAs from the site, while ensuring that at least 1% of the models were “near native”, meaning that they closely matched the true structure of the RNA.
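The article does not say how “near native” is measured. A common yardstick in structure prediction is the root-mean-square deviation (RMSD) between a model and the experimentally determined structure. The minimal Python sketch below assumes that measure, a hypothetical 2 Å cutoff, and models whose atoms are already superimposed on the reference in the same order; it then checks the 1% requirement for a pool of candidate models. All names here are illustrative.

```python
import math


def rmsd(model, reference):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z) atoms.

    Assumes the model is already superimposed on the reference; a full pipeline
    would first find the optimal rotation (for example with the Kabsch algorithm).
    """
    if len(model) != len(reference) or not model:
        raise ValueError("model and reference need the same, non-zero number of atoms")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))


NEAR_NATIVE_CUTOFF = 2.0  # angstroms; a hypothetical threshold for this sketch


def is_near_native(model, reference, cutoff=NEAR_NATIVE_CUTOFF):
    return rmsd(model, reference) < cutoff


def pool_has_enough_near_native(pool, reference, min_fraction=0.01):
    """True if at least `min_fraction` of the candidate models are near native."""
    if not pool:
        return False
    hits = sum(is_near_native(m, reference) for m in pool)
    return hits / len(pool) >= min_fraction
```

For example, a pool of 100 models containing a single near-native model just meets the 1% requirement, while a pool of 200 models with only one would not.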
Using Ares, they then scored each of the model structures and compared the results with those of three other scoring functions: that of the Rosetta software, the statistical potential for ribonucleic acids (Rasp) and 3dRNAscore. Ares clearly outperformed all three. Its 10 best-scoring models included at least one “near native” structure in 81% of cases, compared with 48% for Rosetta, 48% for Rasp and 33% for 3dRNAscore.
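The comparison above boils down to a simple success rate: for each RNA, rank its candidate models by score, keep the 10 best, and check whether any of them is near native. A Python sketch of that bookkeeping follows. It assumes that lower scores are better, and the function name, RNA names and toy scores are invented purely for illustration.

```python
def top_k_success_rate(pools, k=10):
    """Fraction of RNAs whose k best-scoring models include a near-native one.

    `pools` maps an RNA name to a list of (score, is_near_native) pairs.
    Lower scores are treated as better, which is an assumption of this sketch.
    """
    if not pools:
        raise ValueError("no RNAs to evaluate")
    successes = 0
    for models in pools.values():
        best = sorted(models, key=lambda m: m[0])[:k]  # the k best-scoring models
        if any(near_native for _, near_native in best):
            successes += 1
    return successes / len(pools)


# Toy pools of 20 models each, with made-up scores.
pools = {
    "rna_a": [(float(i), i == 3) for i in range(20)],    # near-native model ranks 4th
    "rna_b": [(float(i), i == 15) for i in range(20)],   # near-native model ranks 16th
    "rna_c": [(float(i), False) for i in range(20)],     # no near-native model at all
    "rna_d": [(float(-i), i == 19) for i in range(20)],  # near-native model scores best
}
```

With these toy pools, two of the four RNAs have a near-native model among their 10 best, giving a success rate of 0.5. Running the same function on each scoring function’s rankings is what allows percentages like the ones above to be compared directly.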
Ares also outperformed the other scoring functions in tests with model pools that contained no “near native” models. It was also used to make blind predictions in four rounds of the RNA Puzzles competition, where the true structures of the RNAs were not yet known, and in each case it produced the most accurate of the submitted models.
“It was a surprise that we were able to train the Ares network with so few training examples and then get state-of-the-art results in the blind RNA Puzzles competition,” says Das.
Playing catch-up
The researchers write that scientific knowledge of RNA structure lags far behind that of protein structure, which has benefited from artificial intelligence prediction systems such as AlphaFold, from Google subsidiary DeepMind. Such systems are often trained on huge datasets containing thousands of structures.
“The proportion of the human genome that is transcribed into RNA is roughly 30 times the proportion that encodes proteins, but the number of RNA structures available is less than 1% of that for proteins,” the researchers write. They add that the structures of related RNAs are also less often known than is the case for proteins, so they are less often available as templates for prediction.
They now hope that the geometric deep learning approach behind Ares will help to stimulate research into RNA structures, although so far it addresses only part of the process. “Our paper still relies on model pools generated with an earlier generation of Rosetta software that did not use neural networks,” says Das. “It would be wonderful to now generate the RNA 3D models themselves using tricks from geometric deep learning.”
Because Ares needs only atomic coordinates and chemical element types as its inputs, the same approach can be applied to other areas that involve three-dimensional chemical structures. Similar equivariant neural networks have been used successfully in recent research involving the AlphaFold and Rosetta software, Das says.
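As a rough illustration of how little input this kind of scorer needs, the hypothetical Python sketch below encodes each atom as a one-hot vector over a small, assumed element vocabulary, paired with its 3D coordinates. Nothing in the encoding is specific to RNA, so the same representation could describe a protein fragment or a small molecule. The vocabulary, function name and coordinates are illustrative only and are not necessarily what Ares uses.

```python
ELEMENTS = ("C", "N", "O", "P", "S")  # hypothetical element vocabulary


def featurize(atoms, elements=ELEMENTS):
    """Encode (element, x, y, z) records as (one-hot element, coordinates) pairs.

    Only element symbols and positions are used, so the same function works
    for RNA, proteins or small molecules.
    """
    index = {element: i for i, element in enumerate(elements)}
    features = []
    for element, x, y, z in atoms:
        if element not in index:
            raise ValueError(f"element {element!r} is not in the vocabulary")
        one_hot = [0.0] * len(elements)
        one_hot[index[element]] = 1.0
        features.append((one_hot, (x, y, z)))
    return features


# Made-up coordinates (in angstroms) for a few atoms from an RNA and from a protein.
rna_fragment = [("P", 0.0, 0.0, 0.0), ("O", 1.5, 0.0, 0.0)]
protein_fragment = [("C", 0.0, 0.0, 0.0), ("S", 1.8, 0.0, 0.0)]
```

Both fragments pass through the same function unchanged; only an element outside the vocabulary is rejected.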
Computational biologist Alex Bateman of the European Bioinformatics Institute, who was not involved in the study, notes that RNA structure prediction lags behind the advances in protein structure prediction made possible by AlphaFold. But “the development of Ares represents a big step forward in this area, and we look forward to having access to these models,” he says.
He cautions that the accuracy of Ares still needs to improve. “Perhaps, inspired by the release of the AlphaFold 2.0 method, we’ll see even better methods and models in the months and years to come,” he says. “This is a very exciting time for RNA research.”