Synlett 2021; 32(18): 1837-1842
DOI: 10.1055/s-0040-1705977
cluster
Machine Learning and Artificial Intelligence in Chemical Synthesis and Catalysis

A Molecular Stereostructure Descriptor Based On Spherical Projection

Li-Cheng Xu, Xin Li, Miao-Jiong Tang, Luo-Tian Yuan, Jia-Yu Zheng, Shuo-Qing Zhang, Xin Hong
Department of Chemistry, Zhejiang University, Zheda Road 38, 310027, Hangzhou, P. R. of China
Financial support from the National Natural Science Foundation of China (21702182 and 21873081), the Fundamental Research Funds for the Central Universities (2020XZZX002-02), and the State Key Laboratory of Clean Energy Utilization (ZJUCEU2020007).
 


Abstract

Description of molecular stereostructure is critical for the machine learning prediction of asymmetric catalysis. Herein we report a spherical projection descriptor of molecular stereostructure (SPMS), which allows precise representation of the molecular van der Waals (vdW) surface. The key features of the SPMS descriptor are presented using chiral phosphoric acids as examples, and the machine learning application is demonstrated on Denmark’s dataset of asymmetric thiol addition to N-acylimines. The SPMS descriptor also offers a color-coded diagram that provides straightforward chemical interpretation of the steric environment.



Stereostructure is one of the most fundamental molecular properties and plays a pivotal role in many areas, including asymmetric catalysis,[1] drug-target interaction,[2] and material design.[3] The description of molecular stereostructure is a long-standing topic in physical organic chemistry, and the classic strategy is to use key geometric parameters (i.e., distances, angles, and dihedral angles).[4] These stereostructure descriptors are readily available from the molecular 3D coordinates and allow straightforward chemical interpretation. Many related descriptors are used in the daily practice of organic chemists, such as the Tolman angle,[5] bite angle,[6] and Sterimol parameters.[7] In addition, the continuous chirality measure[8] (CCM) and the derived electronic chirality measure[9] (ECM) were developed to parameterize molecular chirality, and they have been successfully applied in a wide array of quantitative structure–activity relationship (QSAR) studies in asymmetric catalysis.[10]

Figure 1 Selected examples of molecular stereostructure descriptors and our approach of stereostructure representation based on spherical projection

In addition to the key geometric parameters that can be directly applied for machine learning purposes,[11] various approaches have been developed to create descriptor vectors from molecular 3D coordinates.[12] These descriptor vectors are suited to machine learning applications and include the widely used smooth overlap of atomic positions[13] (SOAP, Figure [1a]) and atom-centered symmetry functions[14] (ACSF, Figure [1b]). However, many of these widely used features in molecular machine learning were not developed for asymmetric catalysis, and some (such as SOAP and ACSF) give identical vectors for enantiomeric molecules. This can limit how well machine learning models learn chiral induction if the only feature that differentiates the enantiomers is a one-hot label (R or S).
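The consequence of this invariance can be seen in a toy example. The sketch below does not compute SOAP or ACSF; as a stand-in for any descriptor built only from interatomic distances, it uses a sorted list of pairwise distances (`distance_fingerprint`, a hypothetical helper), and the coordinates are invented for illustration. Reflection leaves the fingerprint unchanged, whereas a signed quantity such as a triple product changes sign.

```python
import numpy as np

def distance_fingerprint(coords):
    """Sorted interatomic distances: a toy stand-in for distance-based descriptors."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)
    upper = np.triu_indices(len(coords), k=1)  # each atom pair once
    return np.sort(dists[upper])

def handedness(coords):
    """Sign of the triple product of three vectors drawn from the center (row 0)."""
    coords = np.asarray(coords, dtype=float)
    return float(np.sign(np.linalg.det(coords[1:4] - coords[0])))

# Hypothetical coordinates: a center and four different "substituents".
mol = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.5, 0.0],
    [0.0, 0.0, 2.0],
    [-1.2, -0.7, -0.4],
])
mirror = mol * np.array([1.0, 1.0, -1.0])  # reflect through the xy-plane

same = np.allclose(distance_fingerprint(mol), distance_fingerprint(mirror))
print("identical distance fingerprint:", same)
print("handedness:", handedness(mol), handedness(mirror))
```

A model that sees only the fingerprint cannot tell the two structures apart, which is why an orientation-aware representation is needed to encode chirality directly.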

Capturing molecular stereostructure information has a rich research history in the 3D-QSAR study of asymmetric catalysis, and the last two decades have witnessed fruitful results in descriptor development.[15] For molecules with the same scaffold, the alignment-dependent comparative molecular field analysis (CoMFA) approach[16] has proved to be a powerful strategy. By aligning the molecules based on the core structure, the stereostructure information can be retrieved by placing the target molecules into common grids (Figure [1c]). Probing at each grid point results in a grid-based description of stereostructure information. These probes, including the Lennard–Jones potential,[17] Coulombic interaction,[18] average steric occupancy (ASO),[19] and atomic electronic indicator fields (AEIF),[20] provide multidimensional information for the CoMFA approach and support its remarkable success in the 3D-QSAR study of asymmetric catalysis.[21] One of the landmark applications is the recent breakthrough in machine learning prediction of the asymmetric thiol addition to N-acylimines by Denmark and co-workers,[19a] in which the BINOL-derived chiral phosphoric acids are encoded using ASO descriptors. To circumvent the requirement of structural alignment, grid-independent descriptors[22] (GRIND) have also been developed and successfully applied in the modelling of asymmetric catalysis.[23]

Inspired by the success of spherical projection in object recognition,[24] we surmised that the same strategy could be applied to the description of molecular van der Waals (vdW) surfaces, which are critical for enantiomeric discrimination in asymmetric catalysis. Herein we report a spherical projection descriptor of molecular stereostructure (SPMS). This approach creates a readily available matrix descriptor that captures the stereostructure information; its utility in molecular machine learning is demonstrated on Denmark’s dataset of asymmetric thiol addition to N-acylimines. In addition, SPMS offers a color-coded diagram that enables straightforward chemical interpretation.

Figure 2 Generation procedure of SPMS descriptor using l-proline as demonstration

The generation procedure of the SPMS descriptor is demonstrated using l-proline as an example (Figure [2]). The l-proline molecule is first placed in a sphere with a customized center and radius. In this case, the chiral carbon is selected as the sphere center. Subsequent rotation standardizes the orientation of the molecule, which makes the generated SPMS descriptor invariant to rotation and translation (Figure S1). This orientation standardization allows the SPMS descriptor to differentiate enantiomeric compounds. The distance between the molecular vdW surface and the sphere is next projected onto the sphere surface in a color-coded fashion. A red region indicates that this part of the molecular surface is proximal to the sphere and sterically demanding from the sphere perspective. Equirectangular projection of the sphere surface creates the desired SPMS descriptor, which is both a color-coded diagram and a readable matrix for machine learning models. The details of the generation procedure are included in the Supporting Information. We also provide a website[25] where users can create the SPMS descriptor from the uploaded coordinates of a target molecule.
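The projection step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure from the Supporting Information: the molecule is taken to be already centered at the origin and oriented, the vdW surface is approximated as the union of atomic spheres, and each grid direction is probed with a ray from the sphere center. The function name `spms_sketch` and all coordinates and radii are hypothetical.

```python
import numpy as np

def spms_sketch(centers, radii, sphere_radius, n_theta=40, n_phi=80):
    """Return an (n_theta, n_phi) matrix of sphere-to-surface distances.

    Assumptions (not from the paper): atoms are given relative to the sphere
    center, orientation is already standardized, and the outermost ray/atom
    intersection defines the vdW surface in each direction.
    """
    centers = np.asarray(centers, dtype=float)
    radii = np.asarray(radii, dtype=float)

    # Equirectangular grid: polar angle (rows) and azimuth (columns) at cell midpoints.
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta
    phi = (np.arange(n_phi) + 0.5) * 2.0 * np.pi / n_phi
    t, p = np.meshgrid(theta, phi, indexing="ij")
    dirs = np.stack([np.sin(t) * np.cos(p), np.sin(t) * np.sin(p), np.cos(t)], axis=-1)

    # Ray from the origin along unit vector u meets an atom sphere where
    # s = u.c +/- sqrt((u.c)^2 - |c|^2 + r^2); keep the outer root.
    b = dirs @ centers.T                                  # (n_theta, n_phi, n_atoms)
    disc = b**2 - np.sum(centers**2, axis=1) + radii**2
    hit = disc >= 0.0
    s_out = np.where(hit, b + np.sqrt(np.where(hit, disc, 0.0)), 0.0)

    # Farthest surface point per direction; directions with no hit get height 0.
    surface = np.clip(s_out.max(axis=-1), 0.0, None)
    return sphere_radius - surface

# Hypothetical two-atom example inside a sphere of radius 5 Å.
matrix = spms_sketch(centers=[[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]],
                     radii=[1.7, 1.0], sphere_radius=5.0)
print(matrix.shape, matrix.min(), matrix.max())
```

Small matrix values correspond to the red, sterically demanding regions of the diagram. Orientation standardization is deliberately left out of this sketch.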

The resolution of the SPMS descriptor is customizable, and the recommended resolution that balances accuracy and generation efficiency is 40 × 80. Figure [3] compares four resolutions of SPMS descriptors of l-proline. The difference between the 10 × 20 and 80 × 160 resolutions is significant (Diff(a, d), Figure [3]), with a mean absolute deviation of 0.27 Å. This suggests that the 10 × 20 resolution is insufficient to capture the stereostructure information (Figure [3a]). A similar situation exists when comparing the 20 × 40 and 80 × 160 resolutions (Diff(b, d), Figure [3]). When the resolution increases to 40 × 80, the difference is limited, with a mean absolute deviation of only 0.04 Å (Diff(c, d), Figure [3]). Therefore, the 40 × 80 resolution describes the l-proline stereostructure with sub-angstrom accuracy. The SPMS descriptor at the 40 × 80 resolution can be generated within milliseconds for general chiral catalysts in asymmetric synthesis.
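A resolution comparison of this kind can be sketched as follows. How matrices of different sizes were compared is not stated here, so the sketch assumes nearest-neighbor upsampling of the coarse matrix to the fine grid before taking the mean absolute deviation. The map itself is a synthetic smooth function (`sample_field`, a placeholder), not an l-proline descriptor, so the printed numbers are not the values quoted above.

```python
import numpy as np

def sample_field(n_theta, n_phi):
    """Synthetic smooth map sampled at cell midpoints (placeholder for an SPMS matrix)."""
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta
    phi = (np.arange(n_phi) + 0.5) * 2.0 * np.pi / n_phi
    return 3.0 + np.sin(theta)[:, None] * np.cos(phi)[None, :]

def upsample(matrix, target_shape):
    """Nearest-neighbor upsampling by integer factors (assumed comparison scheme)."""
    matrix = np.asarray(matrix, dtype=float)
    fy, ry = divmod(target_shape[0], matrix.shape[0])
    fx, rx = divmod(target_shape[1], matrix.shape[1])
    if ry or rx:
        raise ValueError("target shape must be an integer multiple of the source shape")
    return np.repeat(np.repeat(matrix, fy, axis=0), fx, axis=1)

def mean_abs_dev(coarse, fine):
    """Mean absolute deviation between a coarse map and a finer reference map."""
    fine = np.asarray(fine, dtype=float)
    return float(np.mean(np.abs(upsample(coarse, fine.shape) - fine)))

reference = sample_field(80, 160)
deviations = {shape: mean_abs_dev(sample_field(*shape), reference)
              for shape in [(10, 20), (20, 40), (40, 80)]}
for shape, mad in deviations.items():
    print(shape, round(mad, 4))
```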

Figure 3 Comparisons of SPMS descriptors of l-proline in different resolutions

The SPMS descriptor can accurately capture and represent the molecular vdW surface. Figure [4] includes three scenarios that are typically encountered in asymmetric catalysis, using chiral phosphoric acid catalysts as demonstration. Figure [4a] compares the SPMS diagrams of the spiro phosphoric acid 1 and the BINOL-derived phosphoric acid 2. The change of chiral scaffold is clearly differentiated in the highlighted region. The vdW surface of the BINOL scaffold is closer to the sphere surface than that of the spiro scaffold, thus creating a redder region in the highlighted area. Comparing the BINOL-derived phosphoric acids 3 and 4, Figure [4b] presents the capability of the SPMS descriptor to represent the substituent effect. The shape and steric bulkiness of the two t-Bu substituents are precisely captured in the highlighted regions. Figure [4c] compares the SPMS descriptors of the enantiomeric phosphoric acids, (R)-5 and (S)-5, which is the key application that SPMS is designed for. The two enantiomers have exactly opposite patterns in the SPMS diagrams, which reflects their enantiomeric nature. The diagrams of (R)-5 and (S)-5 are not mirror images because of the spherical coordinates used in the equirectangular projection. This does not affect the application of SPMS descriptors in machine learning, and the spherical coordinates can be adjusted as the user requires. In addition, the change of stereostructure between the two enantiomers can be described by the difference image of the two corresponding SPMS descriptors, as demonstrated in Figure [4c]. This creates a new set of SPMS descriptors, Diff(R, S), which describe how the vdW surfaces change between the two enantiomers from a standardized sphere perspective. This difference matrix is closely related to the nature of chiral induction and should be helpful in future machine learning studies of asymmetric catalysis.

We next demonstrated the application of the SPMS descriptor in the machine learning of asymmetric catalysis. We used the dataset of asymmetric thiol addition to N-acylimines from the study of Denmark and co-workers.[19a] The dataset includes 1075 experimental enantioselectivities from the combinations of five N-acylimines, five thiols, and 43 chiral phosphoric acid (CPA) catalysts (Figure [5a]). For each reactant or catalyst, 20 favorable conformations were identified using the MMFF94 force field.[26] The SPMS descriptors of the 20 conformations were generated and averaged into the final SPMS descriptor of the target molecule. Considering conformational flexibility yields a more descriptive representation of stereostructure than a static single conformer, as demonstrated in previous applications of CoMFA descriptors in asymmetric catalysis.[10e] [19] [27] For each asymmetric transformation, the SPMS descriptors of imine, thiol, and CPA were concatenated into a three-channel matrix input, which was fed to a convolutional neural network[28] for the enantioselectivity training (ΔΔG in kcal/mol).
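The input assembly described above can be sketched as follows. The sketch makes several assumptions: the channels-first layout, the function names, and the random placeholder maps are all hypothetical. The conversion from enantiomeric excess to ΔΔG uses the standard relation ΔΔG = RT ln[(100 + ee)/(100 − ee)] with an assumed temperature of 298.15 K, because the reaction temperature is not given in this text.

```python
import numpy as np

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def average_conformers(conformer_maps):
    """Average per-conformer SPMS matrices of shape (n_conf, H, W) into one (H, W) map."""
    return np.mean(np.asarray(conformer_maps, dtype=float), axis=0)

def reaction_input(imine_maps, thiol_maps, cpa_maps):
    """Stack the averaged imine, thiol, and CPA maps into a (3, H, W) array."""
    return np.stack(
        [average_conformers(m) for m in (imine_maps, thiol_maps, cpa_maps)], axis=0
    )

def ee_to_ddg(ee_percent, temperature=298.15):
    """Convert % ee to ΔΔG in kcal/mol (temperature is an assumed default)."""
    ee = np.asarray(ee_percent, dtype=float)
    return R_KCAL * temperature * np.log((100.0 + ee) / (100.0 - ee))

# Placeholder data: 20 conformers per species at 40 x 80 resolution.
rng = np.random.default_rng(0)
imine, thiol, cpa = (rng.random((20, 40, 80)) for _ in range(3))
x = reaction_input(imine, thiol, cpa)
print(x.shape)                      # one three-channel input per reaction
print(float(ee_to_ddg(90.0)))       # regression target for a 90% ee reaction
```

Averaging before stacking keeps the input size independent of the number of conformers, at the cost of blurring features that differ strongly between conformers.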

Figure 4 Key features of SPMS descriptors in representing the molecular stereostructure, using chiral phosphoric acids as demonstration: (a) change of chiral scaffold; (b) change of substituent; (c) enantiomeric compounds.
Figure 5 Application of SPMS descriptor in machine learning of CPA-catalyzed asymmetric thiol addition to N-acylimines (a to c) and chemical interpretation of steric environment (d).

These 1075 reactions were randomly partitioned into 600 data points for model training and 475 data points for validation. To ensure that the dataset partitioning was unbiased, this process was repeated ten times. The mean absolute error (MAE) averaged over the ten trials is 0.1624 kcal/mol, and the R² of trial 1 is 0.8904 (Figure [5c]). The performance of our model is slightly inferior to Denmark’s results[19a] (averaged MAE of ten trials: 0.1516 kcal/mol), which may be due to the lack of electronic descriptors in our training and the fact that SPMS mainly describes the vdW surface, so the internal structure of the CPA catalyst is insufficiently described. The details of model training are provided in the Supporting Information.
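The evaluation protocol (ten random 600/475 partitions, MAE averaged over trials) can be sketched independently of the model. In the sketch below the convolutional network is replaced by a trivial placeholder, `mean_baseline`, and the features and targets are random numbers; only the splitting and scoring logic is meant to be illustrative.

```python
import numpy as np

def mean_baseline(x_train, y_train, x_test):
    """Placeholder model: predicts the training-set mean for every test reaction."""
    return np.full(len(x_test), np.mean(y_train))

def repeated_split_mae(features, targets, fit_predict, n_train=600, n_trials=10, seed=0):
    """Average test MAE over repeated random train/test partitions."""
    rng = np.random.default_rng(seed)
    maes = []
    for _ in range(n_trials):
        order = rng.permutation(len(targets))          # fresh shuffle per trial
        train, test = order[:n_train], order[n_train:]
        pred = fit_predict(features[train], targets[train], features[test])
        maes.append(float(np.mean(np.abs(pred - targets[test]))))
    return float(np.mean(maes)), maes

# Synthetic stand-in for the 1075 reactions (not the real dataset).
rng = np.random.default_rng(1)
x = rng.random((1075, 8))
y = rng.normal(1.0, 0.5, size=1075)
avg_mae, trial_maes = repeated_split_mae(x, y, mean_baseline)
print(round(avg_mae, 4), len(trial_maes))
```

Any model can be plugged in through `fit_predict`, as long as it is retrained from scratch inside each trial so that no test information leaks between partitions.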

In addition to the machine learning application, the SPMS diagram captures the vdW surface in a fashion that follows the general consensus of organic chemists, which allows straightforward chemical interpretation of the stereostructure. Figure [5d] shows the structure of Ru(II)-(R)-BINAP,[29] whose chiral induction is usually interpreted using the classic four-quadrant diagram. The corresponding SPMS diagram of Ru(II)-(R)-BINAP captures the change of steric environment in the four quadrants, in which the red regions are in the second and fourth quadrants as expected. Quantitative comparison between the four quadrants is also possible through integration over each quadrant (Figure S4). Therefore, the SPMS descriptors can also be used as a tool for understanding stereostructure in the daily practice of experimental chemists as well as in chemical education.
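A quadrant integration of this kind might look like the sketch below. The mapping from the four-quadrant diagram to the matrix is an assumption: the columns are split into four equal azimuth blocks, each cell is weighted by its solid angle, and "steric occupancy" is taken as the sphere radius minus the stored distance, so that a larger integral means a more hindered quadrant. The procedure behind Figure S4 may differ.

```python
import numpy as np

def quadrant_integrals(spms, sphere_radius):
    """Integrate (sphere_radius - distance) over four equal azimuth blocks.

    Assumes rows span the polar angle 0..pi and columns span the azimuth
    0..2*pi at cell midpoints, with the column count divisible by four.
    """
    spms = np.asarray(spms, dtype=float)
    n_theta, n_phi = spms.shape
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta
    # Solid angle of each cell: sin(theta) * d_theta * d_phi.
    cell = np.sin(theta)[:, None] * (np.pi / n_theta) * (2.0 * np.pi / n_phi)
    occupancy = (sphere_radius - spms) * cell
    return [float(block.sum()) for block in np.split(occupancy, 4, axis=1)]

# Hypothetical map: empty everywhere except a bulky region in the second block.
demo = np.full((40, 80), 5.0)
demo[:, 20:40] = 3.0
print([round(q, 3) for q in quadrant_integrals(demo, sphere_radius=5.0)])
```

The solid-angle weight matters because cells near the poles of an equirectangular map cover much less of the sphere than cells near the equator.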

In summary, a molecular stereostructure descriptor based on a spherical projection strategy (SPMS) has been developed. By projecting the distance between the vdW surface and the customized sphere, SPMS descriptors accurately capture the stereostructure information of the vdW surface as a matrix or a color-coded diagram. The key features of SPMS descriptors in asymmetric catalysis are illustrated using chiral phosphoric acids as examples, demonstrating the capability of SPMS to differentiate chiral scaffolds, substituents, and enantiomers. The machine learning application of SPMS descriptors was demonstrated on the dataset of CPA-catalyzed asymmetric thiol addition to N-acylimines, giving a satisfactory regression model of the experimental enantioselectivities. In addition to its application in machine learning, the SPMS diagram follows the general consensus of organic chemists, which allows straightforward chemical interpretation of the steric environment. We envision that SPMS descriptors can serve as a complementary molecular feature to the CoMFA-based descriptors, together supporting the advancement of machine learning predictions of asymmetric catalysis.



Acknowledgment

Calculations were performed on the high-performance computing system at the Department of Chemistry, Zhejiang University.

Supporting Information


Corresponding Authors

Shuo-Qing Zhang
Department of Chemistry, Zhejiang University
Zheda Road 38, 310027, Hangzhou
P. R. of China
Xin Hong
Department of Chemistry, Zhejiang University
Zheda Road 38, 310027, Hangzhou
P. R. of China   

Publication History

Received: 23 July 2020

Accepted after revision: 23 October 2020

Article published online:
18 November 2020

© 2020. Thieme. All rights reserved

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

