Rendering protein mutation movies with MutAmore

Weissenow, Konstantin; Rost, Burkhard

doi:10.1186/s12859-023-05610-8

Software
Open access
Published: 12 December 2023

Konstantin Weissenow^1,2 &
Burkhard Rost^1,3,4

BMC Bioinformatics volume 24, Article number: 469 (2023)

Abstract

Background

The success of AlphaFold2 in reliable protein three-dimensional (3D) structure prediction assists the move of structural biology toward studies of protein dynamics and of the mutational impact on structure and function. This transition needs tools that qualitatively assess alternative 3D conformations.

Results

We introduce MutAmore, a bioinformatics tool that renders individual images of protein 3D structures for, e.g., sequence mutations into a visually intuitive movie format. MutAmore streamlines a pipeline casting single amino-acid variations (SAVs) into a dynamic 3D mutation movie providing a qualitative perspective on the mutational landscape of a protein. By default, the tool first generates all possible variants of the sequence reachable through SAVs (L*19 for proteins with L residues). Next, it predicts the structural conformation for all L*19 variants using state-of-the-art models. Finally, it visualizes the mutation matrix and produces a color-coded 3D animation. Alternatively, users can input other types of variants, e.g., from experimental structures.

Conclusion

MutAmore samples alternative protein configurations to study the dynamical space accessible from SAVs in the post-AlphaFold2 era of structural biology. As the field shifts towards the exploration of alternative conformations of proteins, MutAmore aids in the understanding of the structural impact of mutations by providing a flexible pipeline for the generation of protein mutation movies using current and future structure prediction models.

Background

AI has changed structural biology and its impact

The remarkable success of AlphaFold2 [1] in effectively predicting protein three-dimensional (3D) structure from sequence is shifting paradigms in structural biology and beyond. AlphaFold2 combines advanced Artificial Intelligence (AI) with evolutionary information from multiple sequence alignments (MSAs). Reliable 3D predictions for over 200 million proteins [2] have begun to change molecular biology. Concurrently, protein Language Models (pLMs) have emerged as a new approach to represent protein sequences [3,4,5]. Downstream prediction methods based on pLM embeddings (vectors representing the last hidden layers of the pLMs: Fig. 1 [4]) find new solutions, e.g., in structure prediction methods such as ESMFold [6]. Embeddings allow predictions at unprecedented speed [7, 8], often outperforming state-of-the-art methods [9] using the combination of evolutionary information introduced three decades ago for secondary structure prediction [10] and eclipsed by AlphaFold2.

From available 3D model to dynamics?

In the post-AlphaFold2 era, structural biology moves toward a deeper exploration of alternative protein conformations and mutational landscapes [11]. Many proteins significantly change function upon minor sequence changes [12,13,14]. Understanding these changes will be critical to unlocking the complexities of protein function, protein evolution, and disease progression at the molecular level.

Despite its immense success, AlphaFold2 often fails to correctly predict the effect of missense mutations upon 3D structure [7, 8, 15]. Even if it correctly captured such effects, it would still be too resource-intensive to explore the mutational space of an average protein, e.g., by evaluating all possible single amino-acid variants (SAVs), even for short proteins [7, 8]. Structure prediction methods based on pLMs, such as ESMFold or EMBER2 [7], offer the necessary speed and promise higher sensitivity to small changes in input sequences (Fig. 2 [8]).

Many methods have been developed to predict the effects of sequence variants upon protein function, including SIFT [16], PolyPhen [17], SNAP2 [18], GEMME [19], DeepSequence [20], Packpred [21], Tranception [22] and VESPA [23] (for a longer list: [22, 24]). On the other hand, tools such as I-Mutant [25, 26], FoldX [27], PoPMuSiC [28], DUET [29], or INPS-MD [30] aim at predicting the structural impact of mutations by providing estimates for changes in stability, folding or dynamics, which greatly helps our understanding of disease emergence. Despite convincing examples for how to use the numerical output from such methods to rationalize on static images about possible dynamical changes [21, 29, 31], none of these methods directly displays the actual structural change in the 3D conformation.

Imagine we had fast and accurate structure predictions for all L*19 SAV mutants of a protein with L residues. How to visualize those data? Today, no comprehensive and accessible visualization tools for alternative protein conformations are available. To fill this void, we introduced MutAmore (MutAtion movie renderer), a tool that provides a pipeline rendering a mutated protein sequence into a dynamic 3D protein mutation movie (PMM), thereby making the analysis of mutational landscapes accessible and tangible. The implementation and results presented highlight the potential of MutAmore to fill a growing need within the structural biology field. Its efficient visualizations could aid research of protein dynamics, function, evolution, and disease mechanisms.

Implementation

MutAmore is designed to create an animated visualization of the mutational landscape of proteins. Given one or more protein amino acid sequences as a FASTA file [32], MutAmore first generates all possible single amino acid variants (SAVs) for each sequence and uses a structure prediction model to predict the 3D structure of each SAV (Fig. 1). We provide a ready-made interface to ESMFold [6] and ColabFold/AlphaFold2 [33], along with documentation on how to plug in any other prediction model that takes FASTA files as input and writes PDB files as output. Optionally, users can provide experimental structures for some of the mutants; these variants are then skipped during the prediction stage.
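
The variant-generation step can be illustrated with a minimal Python sketch. It enumerates all L*19 SAVs of a sequence and writes them as FASTA records. The function names and the XnY record labels are hypothetical choices made for this illustration and are not taken from the MutAmore code base; the sketch also assumes that the input contains only the 20 standard amino acids.

```python
# Illustrative sketch only; helper names are hypothetical, not MutAmore's API.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def generate_savs(sequence):
    """Return {label: mutated_sequence} for every single amino-acid variant.

    Labels follow the XnY convention: wild-type residue X, 1-based
    position n, substituted residue Y.
    """
    variants = {}
    for index, wild_type in enumerate(sequence):
        for substitute in AMINO_ACIDS:
            if substitute == wild_type:
                continue  # the wild-type residue is not a variant
            label = f"{wild_type}{index + 1}{substitute}"
            variants[label] = sequence[:index] + substitute + sequence[index + 1:]
    return variants


def to_fasta(variants):
    """Serialize the variants as FASTA records, one record per SAV."""
    return "".join(f">{label}\n{seq}\n" for label, seq in variants.items())


savs = generate_savs("MKT")
print(len(savs))    # 57, i.e. 3 residues * 19 substitutions
print(savs["K2H"])  # MHT
```

The resulting FASTA text could then be passed to any prediction model that reads FASTA and writes PDB files.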

MutAmore computes a mutation profile by assessing the structural divergences between each mutated protein and the (predicted) wild-type structure. The local Distance Difference Test (lDDT) [34] tallies scores across all residues to derive a single score for each structure pair (typically wild-type/native vs. SAV). Structures similar to the wild-type have scores near one while divergent structures approach zero. These data are then visualized as a mutation matrix (19 × protein length) using Python-Pillow [35].
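
To make the scoring concrete, the following sketch computes a simplified global lDDT between a reference and a model. It is a deliberate simplification: it uses only one coordinate per residue (e.g., the C-alpha atom), whereas the published lDDT [34] is defined over all atoms, and it is not the implementation used by MutAmore. The 15 Å inclusion radius and the four tolerance thresholds are the defaults described for lDDT; the function name is hypothetical.

```python
import math

# Default lDDT settings: 15 A inclusion radius, four tolerance thresholds.
INCLUSION_RADIUS = 15.0
THRESHOLDS = (0.5, 1.0, 2.0, 4.0)


def ca_lddt(reference, model):
    """Simplified global lDDT using one (x, y, z) coordinate per residue.

    Returns the fraction of local reference distances that are preserved
    in the model, averaged over the four thresholds (1.0 = identical).
    """
    if len(reference) != len(model):
        raise ValueError("structures must have the same number of residues")
    preserved = 0
    total = 0
    for i in range(len(reference)):
        for j in range(i + 1, len(reference)):
            ref_dist = math.dist(reference[i], reference[j])
            if ref_dist >= INCLUSION_RADIUS:
                continue  # only local contacts of the reference are scored
            diff = abs(ref_dist - math.dist(model[i], model[j]))
            preserved += sum(diff < t for t in THRESHOLDS)
            total += len(THRESHOLDS)
    return preserved / total if total else 0.0


wild_type = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
print(ca_lddt(wild_type, wild_type))  # 1.0
```

Because the score depends only on internal distances, no superposition of the two structures is needed before scoring.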

3D visualizations of all variants are created via the PyMOL API [36] and color-coded to indicate predicted confidence levels (blue for high confidence, yellow–red for low confidence; following the AlphaFold2 standard). All recent 3D structure prediction methods include predicted lDDT values as confidence scores, including systems which do not build on top of the AlphaFold2 architecture, such as RoseTTAFold [37] and EMBER3D [8]. PyMOL aligns all 19*L SAV structures to the original wild-type prediction prior to rendering to maintain uniformity in viewing angles in the final animated visualization.
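
The confidence coloring can be sketched as a simple binning of per-residue predicted lDDT values. The thresholds and color names below follow the convention commonly used for AlphaFold2 models (for example in the AlphaFold Protein Structure Database); they are an assumption for illustration, and the exact palette applied by MutAmore may differ.

```python
def confidence_color(plddt):
    """Map a predicted lDDT value (0-100) to a confidence color name.

    Bins are an assumption based on the widely used AlphaFold2 convention.
    """
    if plddt > 90:
        return "dark blue"   # very high confidence
    if plddt > 70:
        return "light blue"  # confident
    if plddt > 50:
        return "yellow"      # low confidence
    return "orange"          # very low confidence


print([confidence_color(v) for v in (95, 80, 60, 30)])
# ['dark blue', 'light blue', 'yellow', 'orange']
```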

MutAmore then assembles the final frames of the animation in residue index order (from first at the N-terminus to the last at the C-terminus), merging the 3D renderings with the mutation profile to create the PMM with ffmpeg [38]. Rendered at a rate of 19 frames per second, each residue remains in focus for one second and shows all potential SAVs for this position in the protein. The mutated position in each frame is rendered in black in both the 3D visualization and the mutation profile. Details of the SAVs displayed in each frame are indicated in the top-left corner through the standard single-letter amino acid code: XnY meaning that the wild-type amino acid X at residue position n is mutated to amino acid Y (Fig. 2: K76H).
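
The frame schedule described above can be sketched as follows. The sketch lists XnY labels in playback order and converts a frame index into a timestamp at 19 frames per second. The alphabetical order of substitutions within one residue, the helper names, the file names, and the encoder options in the ffmpeg argument list are assumptions made for this illustration; the command is only assembled, not executed, and may differ from what MutAmore actually runs.

```python
FRAMES_PER_SECOND = 19  # 19 substitutions per residue, so one second each
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frame_labels(sequence):
    """List XnY labels in playback order, from N- to C-terminus."""
    labels = []
    for position, wild_type in enumerate(sequence, start=1):
        for substitute in AMINO_ACIDS:  # order within a residue is assumed
            if substitute != wild_type:
                labels.append(f"{wild_type}{position}{substitute}")
    return labels


def timestamp(frame_index):
    """Playback time in seconds of a 0-based frame index."""
    return frame_index / FRAMES_PER_SECOND


def ffmpeg_command(frame_pattern, output_path):
    """Build (but do not run) an ffmpeg call that joins numbered frames."""
    return ["ffmpeg", "-framerate", str(FRAMES_PER_SECOND),
            "-i", frame_pattern,
            "-c:v", "libx264", "-pix_fmt", "yuv420p", output_path]


labels = frame_labels("MKT")
print(len(labels) / FRAMES_PER_SECOND)  # 3.0 seconds for 3 residues
print(timestamp(labels.index("K2A")))   # 1.0, residue 2 starts after one second
```

With this schedule, the length of a full movie in seconds equals the number of residues in the protein.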

When users provide experimental structures—or otherwise their own labels—in a PDB-formatted file, the visual output highlights the differences between predictions and experimental models by showing the latter at full opacity while applying slight transparency to the former, giving the visual impression of “filling in the gaps” between known structures with predictions.

Users can adapt the resolution of the final animation to their needs, e.g., choosing high-quality for publications or lower resolution clips for web sharing. The mutation profile automatically scales to the specified vertical resolution for optimal visual interpretability. The 3D visualization rendered by PyMOL automatically chooses a zoom level, which allows enough space to accommodate structural changes caused by mutations. MutAmore lets experienced users override the zoom level manually where needed.

MutAmore also lets users render subsets of the most impactful mutations, e.g., top-50: those with the highest effect upon 3D. In this mode, the framerate is slightly reduced (slowed down) for better visual comparison.
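
Selecting such a subset amounts to ranking the variants by their similarity to the wild-type. A minimal sketch, assuming that the per-variant lDDT scores are available as a dictionary keyed by XnY labels (a hypothetical data layout, with made-up example scores), could look like this:

```python
def most_impactful(scores, k=50):
    """Return the k SAV labels with the lowest lDDT to the wild-type,
    i.e. the variants with the largest predicted structural change."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return [label for label, _ in ranked[:k]]


# Made-up scores for illustration only.
scores = {"K2H": 0.62, "M1A": 0.97, "T3P": 0.41, "K2R": 0.93}
print(most_impactful(scores, k=2))  # ['T3P', 'K2H']
```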

Many advanced structure prediction systems require robust and substantial GPU resources. Therefore, MutAmore provides an option for a two-step process: computation of all mutation predictions using a server machine, and subsequent processing and rendering of the animated visualization on a desktop computer.

Results

We evaluated MutAmore on an Intel Xeon Gold 6248 CPU with an NVIDIA Quadro RTX 8000 GPU (48 GB) using twelve proteins ranging in length from 72 to 639 residues, including both globular and membrane proteins. Although AlphaFold2 [1] or its faster spin-off ColabFold [33] outperform ESMFold [6], the latter seems slightly better at capturing the effects of SAVs upon structure [7, 8]. We also used ESMFold to predict 3D structures because it is substantially faster, which mattered for the 55,879 SAVs in all twelve samples. Then we generated movies at the default resolution of 1280 × 720 (720p) and at 3840 × 2160 (4K, Table 1).

Table 1 Benchmark of MutAmore in 720p and 4K resolution for twelve proteins

The time required for structure predictions varies substantially with the method used. ColabFold, used for the five smallest proteins only, accumulated over a week of GPU time. Given the non-linear scaling with protein length, this rapidly becomes infeasible for longer proteins. ESMFold obtained predictions for all twelve proteins in about 16 days, with the bulk of the computation time dedicated to the longest samples (11 days for the protein with 639 residues). In contrast, the predictions for the nine smaller proteins were computed by ESMFold in 30 h. The tremendous increase in runtime with protein length (Additional file 1: Fig. S13) is due to GPU memory limitations. Structure prediction systems such as ESMFold compute multiple samples in parallel to fill memory as efficiently as possible. For shorter proteins, this allows batches of up to dozens of simultaneous predictions. Longer sequences require more computation time and more memory. This limits the number of samples that can be processed concurrently, leading to an exponential increase in total runtime. Overall, these numbers highlight the need for future structure prediction methods with both increased speed and memory efficiency to properly explore the mutational landscape of longer proteins.

Given the 3D predictions, creating the animated visualizations is considerably faster. The majority of MutAmore’s processing time is devoted to the rendering of 3D visualizations, followed by the composition of final frames.

After generating structure predictions on our server hardware, we applied MutAmore’s rendering pipeline on a consumer-grade laptop with an Intel Core i7 6700HQ CPU to compare performance with the server environment. Performance decreased by roughly 30%, showing that MutAmore runs on readily available hardware, at least for shorter proteins.

Enhancing the resolution required additional runtime. Rendering at 4K increased the processing time for the 3D visualization, compositing, and rendering by a factor of three to seven over rendering at 720p (Table 1), generally less than the increase in the number of pixels (4K/720p = 9×). The time for computing structural similarity and for generating the profile did not differ much between 720p and 4K (Table 1). We provide a detailed breakdown of the runtime of all pipeline steps for the twelve individual samples in Additional file 1: Tables S1–S12. Even at 4K, the total processing time for MutAmore remained substantially below that needed to predict structures even with ESMFold for all but the shortest protein sequences (Additional file 1: Fig. S13).
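
The ninefold pixel ratio quoted above follows directly from the two resolutions, as this short check shows (the helper name is chosen for illustration):

```python
def pixel_count(width, height):
    """Number of pixels in one frame."""
    return width * height


ratio = pixel_count(3840, 2160) / pixel_count(1280, 720)
print(ratio)  # 9.0
```

A runtime increase by a factor of three to seven therefore means that rendering cost grows more slowly than the number of pixels per frame.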

Limitations

A profound consequence of all attempts to visualize the dynamics of 3D objects lies in the obstruction of internal parts. For instance, buried residues and changes of local regions around these will remain obscured from our 3D movies. For some proteins, such as beta barrels, a cleverly chosen viewing angle might provide a better perspective, but globular proteins do not provide such an alternative. Transparency might address such issues, but transparency tends to cause a lack of depth perception and too much visual clutter to clearly comprehend the visual information being presented. Thus, it remains unclear how to show internal structural changes of proteins in a visually concise manner apart from going back to two-dimensional distance maps, which are only intuitive to a well-trained structural expert.

Another implicit limitation of using 3D predictions for all 19 non-native SAVs at each position lies in the substantial demand on computing resources of today’s prediction methods. This becomes particularly challenging for long proteins (Additional file 1: Fig. S13). MutAmore would greatly benefit from future prediction systems that are leaner without sacrificing performance [8].

Too few high-resolution experimental structures establish the effect of point mutations upon 3D to evaluate how well prediction methods capture SAV effects. Proxying deep mutational scanning data [39] might suffice to establish correlations between observed and predicted impact upon function without probing how well methods predict the effect of SAVs upon 3D structure and dynamics. Once more experimental data on variant structures become available, MutAmore could be extended to serve as a benchmarking tool for the sensitivity of structure prediction.

A webserver for MutAmore might ease access for users with less experience in computational biology. While we are looking for the resources to realize such a project, we provide a Google Colab Notebook, linked on the MutAmore website, that allows the creation of PMMs without the need for a local installation.

Conclusions

We designed MutAmore to bridge a crucial gap in the post-AlphaFold2 era. By rendering conceivable and visually comprehensible protein mutation movies (PMMs) of single amino-acid variants (SAVs), MutAmore enhances the exploration and understanding of alternative protein conformations brought on by mutations. This is particularly significant in translating the depth and complexity of the protein mutational landscape. Our benchmark demonstrated the efficiency and versatility of MutAmore even for high-definition 4K video settings. The tool effectively balances computational load by allowing multi-step operation across multiple systems, ensuring usability across varied system capabilities. We hope that structural biology increasingly shifts toward routine analysis of alternative protein conformations. MutAmore supports such a more dynamic perspective on protein structures and might aid in analyzing protein function and studying protein evolution.

Availability

MutAmore is publicly available and is free for all users.

Project name: MutAmore.

Project home page: https://github.com/kWeissenow/MutAmore

Operating systems: Linux.

Programming language: Python.

Other requirements: ffmpeg 4.1 or higher, PyMOL 2.2 or higher.

License: MIT.

Any restrictions to use by non-academics: None.

Availability of data and materials

The benchmark dataset analyzed during the current study is available in the MutAmore Github repository: https://github.com/kWeissenow/MutAmore/blob/main/benchmark/benchmark_set.fasta.

Abbreviations

3D: Three-dimensional (coordinates)
3D structure: Three-dimensional coordinates of protein structure
AI: Artificial Intelligence
API: Application programming interface
Embeddings: Fixed-size vectors derived from pre-trained pLMs
GPU: Graphical processing unit
PDB: Protein Data Bank
PIDE: Percentage pairwise sequence identity
pLM: Protein Language Model
PMM: Protein mutation movie

References

1. Jumper J, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–9.
2. Varadi M, et al. AlphaFold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2021;50(D1):D439–44.
3. Rao R, et al. Transformer protein language models are unsupervised structure learners. bioRxiv, 2020: p. 2020.12.15.422761.
4. Elnaggar A, et al. ProtTrans: towards cracking the language of life's code through self-supervised deep learning and high performance computing. IEEE Trans Pattern Anal Mach Intell. 2021.
5. Heinzinger M, et al. Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinform. 2019;20(1):723.
6. Lin Z, et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv, 2022: p. 2022.07.20.500902.
7. Weissenow K, Heinzinger M, Rost B. Protein language-model embeddings for fast, accurate, and alignment-free protein structure prediction. Structure. 2022;30(8):1169-1177.e4.
8. Weissenow K, et al. Ultra-fast protein structure prediction to capture effects of sequence variation in mutation movies. bioRxiv, 2022: p. 2022.11.14.516473.
9. Bordin N, et al. Novel machine learning approaches revolutionize protein knowledge. Trends Biochem Sci. 2023;48(4):345–59.
10. Rost B, Sander C. Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol. 1993;232:584–99.
11. Sala D, et al. Modeling conformational states of proteins with AlphaFold. Curr Opin Struct Biol. 2023;81:102645.
12. Vedithi SC, et al. Structural implications of mutations conferring rifampin resistance in Mycobacterium leprae. Sci Rep. 2018;8(1):5016.
13. Portelli S, et al. Understanding molecular consequences of putative drug resistant mutations in Mycobacterium tuberculosis. Sci Rep. 2018;8(1):15356.
14. Gerasimavicius L, Livesey BJ, Marsh JA. Loss-of-function, gain-of-function and dominant-negative mutations have profoundly different effects on protein structure. Nat Commun. 2022;13(1):3895.
15. Buel GR, Walters KJ. Can AlphaFold2 predict the impact of missense mutations on structure? Nat Struct Mol Biol. 2022;29(1):1–2.
16. Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003;31(13):3812–4.
17. Adzhubei I, Jordan DM, Sunyaev SR. Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet. 2013;7:7–20.
18. Hecht M, Bromberg Y, Rost B. Better prediction of functional effects for sequence variants. BMC Genom. 2015;16(8):S1.
19. Laine E, Karami Y, Carbone A. GEMME: a simple and fast global epistatic model predicting mutational effects. Mol Biol Evol. 2019;36(11):2604–19.
20. Riesselman AJ, Ingraham JB, Marks DS. Deep generative models of genetic variation capture the effects of mutations. Nat Methods. 2018;15(10):816–22.
21. Tan KP, et al. Packpred: predicting the functional effect of missense mutations. Front Mol Biosci. 2021;8:646288.
22. Notin P, Dias M, Frazer J, Hurtado JM, Gomez AN, Marks D, Gal Y. Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval. In: Proceedings of the 39th international conference on machine learning. PMLR.
23. Marquet C, et al. Embeddings from protein language models predict conservation and variant effects. Hum Genet. 2022;141(10):1629–47.
24. Livesey BJ, Marsh JA. Updated benchmarking of variant effect predictors using deep mutational scanning. Mol Syst Biol. 2023;19(8):e11474.
25. Capriotti E, Fariselli P, Casadio R. I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res. 2005;33:W306–10.
26. Capriotti E, et al. A three-state prediction of single point mutations on protein stability changes. BMC Bioinform. 2008;9(2):S6.
27. Schymkowitz J, et al. The FoldX web server: an online force field. Nucleic Acids Res. 2005;33:W382–8.
28. Dehouck Y, et al. PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC Bioinform. 2011;12(1):151.
29. Pandurangan AP, Blundell TL. Prediction of impacts of mutations on protein structure and interactions: SDM, a statistical approach, and mCSM, using machine learning. Prot Sci. 2020;29(1):247–57.
30. Savojardo C, et al. INPS-MD: a web server to predict stability of protein variants from sequence and structure. Bioinformatics. 2016;32(16):2542–4.
31. Hecht M, Bromberg Y, Rost B. News from the protein mutability landscape. J Mol Biol. 2013;425(21):3937–48.
32. Pearson WR, Lipman DJ. Improved tools for biological sequence comparison. Proc Natl Acad Sci USA. 1988;85(8):2444–8.
33. Mirdita M, et al. ColabFold: making protein folding accessible to all. bioRxiv, 2021: p. 2021.08.15.456425.
34. Mariani V, et al. lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics. 2013;29(21):2722–8.
35. Clark A. Python-Pillow. 2010.
36. Schrodinger, LLC. The PyMOL Molecular Graphics System, Version 1.8. 2015.
37. Baek M, et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021;373(6557):871–6.
38. Tomar S. Converting video formats with FFmpeg. Linux J. 2006;2006(146):10.
39. Fowler DM, Fields S. Deep mutational scanning: a new style of protein science. Nat Methods. 2014;11(8):801–7.

Acknowledgements

Thanks to Tim Karl (TUM) for invaluable help with hardware and software; to Nikita Kugut (TUM) for support with many other aspects of this work; to Michael Heinzinger (TUM) for helpful discussions and comments on work and manuscript. Furthermore, we thank all those who make experimental and predicted structures and other resources publicly available, in particular, thanks to DeepMind (AlphaFold2) and Meta (ESMFold), as well as to the creators and community of PyMOL and ffmpeg.

Funding

Open Access funding enabled and organized by Projekt DEAL. This work was supported by the Alexander von Humboldt Foundation (BMBF), and by the German Research Foundation (DFG–GZ: RO1320/4–1).

Author information

Authors and Affiliations

Department of Informatics, Bioinformatics and Computational Biology i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching, Munich, Germany
Konstantin Weissenow & Burkhard Rost
TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Konstantin Weissenow
Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching, Munich, Germany
Burkhard Rost
TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany
Burkhard Rost

Contributions

KW designed, implemented and benchmarked the software in this work and drafted the manuscript. BR supervised the work and substantially revised the manuscript.

Corresponding author

Correspondence to Konstantin Weissenow.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1. Figures S1–S12 show screenshots of all protein mutation movies used for benchmarking. The corresponding Tables S1–S12 list runtimes for all individual samples, including prediction time using ESMFold and all MutAmore rendering pipeline steps. Figure S13 visualizes the scaling of runtime for prediction and rendering steps with sequence length.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Weissenow, K., Rost, B. Rendering protein mutation movies with MutAmore. BMC Bioinformatics 24, 469 (2023). https://doi.org/10.1186/s12859-023-05610-8

Received: 17 September 2023
Accepted: 08 December 2023
Published: 12 December 2023
DOI: https://doi.org/10.1186/s12859-023-05610-8
