DOI: 10.1055/s-0044-1788317
Deep Learning-Based Self-Adaptive Evolution of Enzymes
Funding
This work was supported by the National Natural Science Foundation of China (Grant No. 22208217) for Jiang, and the Shanghai Pujiang Program (Grant No. 21PJ1423200) and the Program of Shanghai Academic/Technology Research Leader (Grant No. 23XD1435000) for Yi.
Abstract
Biocatalysis has been widely used to prepare drug leads and intermediates. Compared with chemical methods, enzymatic synthesis offers advantages mainly in its strict stereoselectivity and regioselectivity. However, the properties of wild-type enzymes often do not meet the requirements of biopharmaceutical applications, so protein engineering is required to improve their catalytic activities. Thanks to advances in algorithmic models and the accumulation of immense biological data, artificial intelligence can provide novel approaches for the functional evolution of enzymes. Deep learning can learn functions that predict the properties of previously unseen protein sequences, and deep learning-based algorithms can navigate sequence space intelligently and reduce the screening burden during evolution. Thus, intelligent computational design combined with laboratory evolution is a powerful and potentially versatile strategy for developing enzymes with novel functions. Herein, we introduce and summarize deep learning-assisted strategies for the adaptive functional evolution of enzymes, based on recent studies that apply deep learning to enzyme design and evolution. As the technology develops and data characterizing enzyme functions accumulate, artificial intelligence may become a powerful tool for the design and evolution of intelligent enzymes.
Keywords
artificial intelligence - deep learning - protein engineering - directed evolution - biopharmaceuticals
Publication History
Received: 19 September 2023
Accepted: 25 June 2024
Article published online: 03 September 2024
© 2024. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)
Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany