Leveraging Machine Learning for Sampling Rare Events in Biomolecular Systems

Start: 17 Sep 2024
Organized by: Martin Girard (Max-Planck-Institut für Polymerforschung), Oleksandra Kukharenko (Max Planck Institute for Polymer Research), Leonardo Medrano Sandonas (Technische Universität Dresden), Adolfo Poma Bernaola (Institute of Fundamental Technological Research Polish Academy of Sciences)

Biomolecular simulations have enriched our physical and chemical understanding of the structure-function relationship of large biomolecular assemblies in crowded cellular environments. An essential feature of these systems is their multicomponent nature and the interplay between different length and time scales from which complexity emerges. Biological processes such as protein unfolding and transitions between metastable states in intrinsically disordered proteins occur on time scales in the range of μs-ms. They are thus orders of magnitude slower than the typical molecular motion (i.e., fs-ps) captured by all-atom molecular dynamics (AA-MD). The length scales of conformational rearrangements accessible to AA-MD simulations are likewise much smaller than those needed to study processes involving large structural changes in biological systems. In this regard, a structure-based coarse-grained (CG) approach makes it possible to reach the realistic time and length scales relevant to experimentally observed phenomena, while maintaining a molecular-level model of the systems under consideration [1-3]. Yet some large conformational changes [4,5] spanning several orders of magnitude are not mapped accurately. In addition, standard AA-MD simulations can hardly target system sizes larger than ~500 million atomistic particles [6] (the biological length scale of interest), and systems of that size can only be simulated on a few HPC clusters worldwide. Hence, there is an evident need to construct biological models from bottom-up approaches based on quantum mechanics (QM), AA-MD, and CG-MD that allow us to push the boundaries and overcome the limitations of biomolecular simulations.
In this context, coupling these approaches with the new simulation paradigm based on machine learning (ML) could lay the foundation for a new way to perform biomolecular simulations [7-9] and automate many aspects of CG modelling [10]. Advances in ML thus herald a new era in understanding rare events in biomolecular systems. One promising application is the construction of ML-based force fields (FFs), which aim to narrow the gap between the accuracy of QM methods and the efficiency of classical FFs [11,12]. Recent work has shown that it is feasible to capture and integrate quantum effects (e.g., electrostatics, polarisation, and dispersion) into fully atomistic MLFFs [13], ML bottom-up coarse-grained models [14-17], and hybrid ML-based QM/MM models [18] to gain dynamical insights into biomolecular systems. While these models predict thermodynamic properties and folding-unfolding processes with superior accuracy and run faster than QM-based MD simulations, they remain limited [10] in their transferability and scalability, and by their reliance on extensive training data. Overcoming these shortcomings would facilitate precise free energy calculations [19], yield a deeper understanding of rare events and, accordingly, allow this knowledge to be put to the ultimate test by suggesting new experiments. By gathering young researchers together with world-class scientists who have spearheaded groundbreaking advances in ML and multiscale simulations, our primary objective is to cultivate dynamic discussions that could lead to novel ideas for the systematic coupling of ML, AA, and CG methods. We strive to contribute to the development of estimators for assessing the results of ML-based rare-event sampling, with a focus on breaking through the length- and time-scale limits of current biomolecular simulations.
We therefore envision this conference as a pivotal milestone with the potential to ignite new collaborative efforts aimed at further developing ML-based methods for large-scale MD simulations of intricate biological systems.

[1] P. Souza, R. Alessandri, J. Barnoud, S. Thallmair, I. Faustino, F. Grünewald, I. Patmanidis, H. Abdizadeh, B. Bruininks, T. Wassenaar, P. Kroon, J. Melcr, V. Nieto, V. Corradi, H. Khan, J. Domański, M. Javanainen, H. Martinez-Seara, N. Reuter, R. Best, I. Vattulainen, L. Monticelli, X. Periole, D. Tieleman, A. de Vries, S. Marrink, Nat. Methods, 18, 382-388 (2021)
[2] S. Ołdziej, C. Czaplewski, A. Liwo, M. Chinchio, M. Nanias, J. Vila, M. Khalili, Y. Arnautova, A. Jagielska, M. Makowski, H. Schafroth, R. Kaźmierkiewicz, D. Ripoll, J. Pillardy, J. Saunders, Y. Kang, K. Gibson, H. Scheraga, Proc. Natl. Acad. Sci. U.S.A., 102, 7547-7552 (2005)
[3] L. Darré, M. Machado, A. Brandner, H. González, S. Ferreira, S. Pantano, J. Chem. Theory Comput., 11, 723-739 (2015)
[4] A. Poma, M. Cieplak, P. Theodorakis, J. Chem. Theory Comput., 13, 1366-1374 (2017)
[5] Z. Liu, R. Moreira, A. Dujmović, H. Liu, B. Yang, A. Poma, M. Nash, Nano Lett., 22, 179-187 (2021)
[6] A. Dommer, L. Casalino, F. Kearns, M. Rosenfeld, N. Wauer, S. Ahn, J. Russo, S. Oliveira, C. Morris, A. Bogetti, A. Trifan, A. Brace, T. Sztain, A. Clyde, H. Ma, C. Chennubhotla, H. Lee, M. Turilli, S. Khalid, T. Tamayo-M

