Swiss scientists have developed a machine learning method that can predict the enantioselectivities of reactions catalyzed by complex organocatalysts. The key to its strong performance is a clever trick that avoids time-consuming calculations, made possible by a careful choice of molecular descriptors, reaction representations and feature engineering.
The development of new catalysts is essential to ensure faster, more selective, and more reliable reactions. “Experimentally, large-scale screenings remain expensive in terms of staff, time and equipment,” explains Simone Gallarati from the Swiss Federal Institute of Technology in Lausanne (EPFL), one of the lead researchers in the study. “From a computational point of view, performing calculations on hundreds of catalytic systems is still a tedious task, and accurately predicting enantioselectivity using standard methods is an incredibly difficult task.” This is because traditional computational methods must locate the transition states that lead to the different enantiomers.
Cristina Trujillo, who was not involved in the study and researches the computational design of organocatalysts at Trinity College Dublin, Ireland, says that calculating transition states in enantioselective reactions is usually very time-consuming and often error-prone. “Small errors can mean that the predicted enantiomer is the opposite of that observed experimentally. With this in mind, machine learning approaches generally offer an alternative solution to address the current computational cost challenges.”
Gallarati and colleagues investigated whether machine learning methods could be used to determine the enantioselectivity of an organocatalytic asymmetric propargylation, in which an aldehyde reacts with an allene to form a new chiral center. The enantioselectivity results from the relative activation energies of the (R)- and (S)-ligand configurations of the enantiodetermining transition states. Machine learning models, however, are not without their complications. “In principle, we could feed a machine learning model information about an unknown catalyst – in the form of its 3D structure – and receive a prediction of its selectivity within seconds,” says Gallarati. “Unfortunately, the enantioselectivity of a catalyst is an incredibly difficult quantity for machine learning models to predict accurately.”
To overcome this challenge, the team had to choose an appropriate representation of the propargylation reaction and then refine their model to pick out the key features from the structural noise. This allowed the machine learning algorithm to be trained to predict the activation energies of the competing (R) and (S) pathways, which could then be translated into an enantiomeric excess.
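The last step, turning two activation energies into an enantiomeric excess, can be sketched as follows. This is a minimal illustration and not the team's code: it assumes kinetic control, so that the product ratio follows the ratio of the two rate constants, which gives ee = tanh(ΔΔG‡ / 2RT). The function names, the kcal/mol units and the default temperature of 298.15 K are assumptions made for the example.

```python
import math

# Gas constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
R_KCAL = 1.987204e-3


def enantiomeric_excess(ddg_kcal, temperature=298.15):
    """Return the ee (as a fraction between -1 and 1) for a barrier difference.

    ddg_kcal is the difference between the two activation free energies.
    Under kinetic control, k_major / k_minor = exp(ddG / RT), so
    ee = (k_major - k_minor) / (k_major + k_minor) = tanh(ddG / (2 RT)).
    The default temperature is an arbitrary choice for illustration.
    """
    return math.tanh(ddg_kcal / (2.0 * R_KCAL * temperature))


def ee_from_barriers(dg_r, dg_s, temperature=298.15):
    """Signed ee from the (R) and (S) barriers; positive favours the (R) product."""
    return enantiomeric_excess(dg_s - dg_r, temperature)


if __name__ == "__main__":
    # Hypothetical barriers in kcal/mol, not values from the study.
    print(f"ee = {100 * ee_from_barriers(10.0, 11.5):.1f} %")
```

Because the relationship is so steep, an error of a few tenths of a kcal/mol in the predicted barrier difference shifts the ee noticeably, which is consistent with Trujillo's remark that small errors can flip the predicted enantiomer.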
“The excellent ability of the presented strategy to predict energy differences is more than remarkable,” comments Maria Besora, who studies catalysis using computational methods at Rovira i Virgili University in Spain.
Customized reaction representations
Since the cost of computing the enantiodetermining transition states was prohibitive, the EPFL team investigated using the intermediates on either side of a transition state as a reaction representation to train a machine learning model. Starting with transition states from a database developed by Steven Wheeler and colleagues at Texas A&M University in the USA, the EPFL team calculated the intermediates on both sides of each transition state using DFT reaction coordinate calculations. These intermediates were then converted into molecular representations, a version of the vital information about a molecule that machine learning algorithms can understand. Molecular representations “vary from collections of physical-organic parameters to text-based representations and chemoinformatics descriptors,” says Gallarati. The team chose SLATM, which stands for spectral London and Axilrod-Teller-Muto, because this representation can encode 3D molecular structures.
The next step was to find a representation of the enantioselective reaction step that could be used to train the model and predict activation energies. To do this, the team took the difference between the SLATM representations of the two intermediates, which “contains information about all structural features that are changed during the reaction step and eliminates those that remain unchanged,” says Gallarati. This had the advantage that the reaction is appropriately mapped and the amount of data that the machine learning algorithm has to process is reduced. Finally, the team applied a cross-validated feature engineering step to improve accuracy and reduce the noise associated with the reaction representations, which greatly reduced the amount of data required.
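The idea of a difference-based reaction representation can be illustrated with a short sketch. It is not the team's implementation: the vectors below are random placeholders standing in for real SLATM vectors, the function names are hypothetical, and the simple variance filter is only a stand-in for the cross-validated feature engineering step described above.

```python
import numpy as np


def reaction_representation(rep_before, rep_after):
    """Element-wise difference between the representations of two intermediates.

    Features that are identical on both sides of the transition state cancel
    to zero, so only the parts of the structure that change remain.
    """
    rep_before = np.asarray(rep_before, dtype=float)
    rep_after = np.asarray(rep_after, dtype=float)
    if rep_before.shape != rep_after.shape:
        raise ValueError("both representations must have the same shape")
    return rep_after - rep_before


def drop_uninformative_features(X, tol=1e-12):
    """Remove columns whose variance across all reactions is below tol.

    This is a simplified stand-in filter, assumed for illustration only.
    Returns the reduced matrix and the boolean mask of retained columns.
    """
    X = np.asarray(X, dtype=float)
    keep = X.var(axis=0) > tol
    return X[:, keep], keep


# Toy data: 5 reactions, 8 features each (placeholders, not real SLATM vectors).
rng = np.random.default_rng(0)
before = rng.random((5, 8))
after = before.copy()
after[:, :3] += rng.random((5, 3))  # only the first three features change

X = reaction_representation(before, after)
X_reduced, mask = drop_uninformative_features(X)
print(X.shape, X_reduced.shape)
```

In this toy case the five unchanged features cancel exactly and are dropped, leaving a smaller input for whatever regression model is trained on the activation energies.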
As a result, the machine learning model was able to predict the activation energy, and therefore the enantioselectivity, of bipyridine N,N′-dioxides that were not part of the training database, using only intermediate structures. In addition, the model was able to elucidate the key features of the enantioselectivity-determining transition states in the asymmetric propargylation reaction, identifying π stacking and CH/π interactions as key motifs.
Trujillo notes two caveats: training the algorithm required a large number of intermediates (over 1000), and the reaction studied was quite specific. In the future, however, machine learning solutions could expand to a wider range of systems. “I think the use of machine learning models in the field of organocatalysis will increase in the near future. In this context, I think this is a promising development, but it will take a long time for further generalization,” says Trujillo.
“The fact that the strategy is based not on predicting enantiomeric excess, but rather on energy differences, opens the door to its applicability to other chemical problems and also to predicting enantioselectivity, which is extended to predict enantiomers when more complex problems come into play,” notes Besora. “In principle, our approach can be used to develop a machine learning model to predict the enantioselectivity of any catalytic system,” comments Gallarati, “provided that enough structural information is available for training.”