Rendering protein mutation movies with MutAmore

Abstract

Background

The success of AlphaFold2 in reliably predicting protein three-dimensional (3D) structure is helping structural biology move toward studies of protein dynamics and of the impact of mutations on structure and function. This transition needs tools that qualitatively assess alternative 3D conformations.

Results

We introduce MutAmore, a bioinformatics tool that renders individual images of protein 3D structures for, e.g., sequence mutations into a visually intuitive movie format. MutAmore streamlines a pipeline casting single amino-acid variations (SAVs) into a dynamic 3D mutation movie providing a qualitative perspective on the mutational landscape of a protein. By default, the tool first generates all possible variants of the sequence reachable through SAVs (L*19 for proteins with L residues). Next, it predicts the structural conformation for all L*19 variants using state-of-the-art models. Finally, it visualizes the mutation matrix and produces a color-coded 3D animation. Alternatively, users can input other types of variants, e.g., from experimental structures.

Conclusion

MutAmore samples alternative protein configurations to study the dynamical space accessible from SAVs in the post-AlphaFold2 era of structural biology. As the field shifts towards the exploration of alternative conformations of proteins, MutAmore aids in the understanding of the structural impact of mutations by providing a flexible pipeline for the generation of protein mutation movies using current and future structure prediction models.

Background

AI has changed structural biology and its impact

The remarkable success of AlphaFold2 [1] in effectively predicting protein three-dimensional (3D) structure from sequence is shifting paradigms in structural biology and beyond. AlphaFold2 combines advanced Artificial Intelligence (AI) with evolutionary information from multiple sequence alignments (MSAs). Reliable 3D predictions for over 200 million proteins [2] have begun to change molecular biology. Concurrently, protein Language Models (pLMs) have emerged as a new approach to represent protein sequences [3,4,5]. Downstream prediction methods based on pLM embeddings (vectors representing the last hidden layers of the pLMs: Fig. 1 [4]) find new solutions, e.g., in structure prediction methods such as ESMFold [6]. Embeddings allow predictions at unprecedented speed [7, 8], often outperforming state-of-the-art methods [9] using the combination of evolutionary information introduced three decades ago for secondary structure prediction [10] and eclipsed by AlphaFold2.

Fig. 1

MutAmore pipeline. The tool generates mutated (all possible SAVs) versions of the input sequences and predicts 3D structure for each, e.g., using ColabFold [33] or ESMFold [6]. Optionally, experimental structures for mutants can be input as an alternative to predictions. MutAmore computes the structural difference between each mutant and the predicted wild-type structure and renders mutation profiles along with 3D visualizations. After merging both, the tool renders the final protein mutation movie (PMM). MutAmore can be run in a two-step process, e.g., predicting structure (orange) on a server machine and running the rendering steps (blue) on a desktop machine

From available 3D model to dynamics?

In the post-AlphaFold2 era, structural biology moves toward a deeper exploration of alternative protein conformations and mutational landscapes [11]. Many proteins significantly change function upon minor sequence changes [12,13,14]. Understanding these changes will be critical to unlocking the complexities of protein function, protein evolution, and disease progression at the molecular level.

Despite its immense success, AlphaFold2 often fails to correctly predict the effect of missense mutations upon 3D structure [7, 8, 15]. Even if it captured such effects correctly, it would still be too resource-intensive for exploring the mutational space of an average protein, e.g., by evaluating all possible single amino-acid variants (SAVs), even for short proteins [7, 8]. Structure prediction methods based on pLMs, such as ESMFold or EMBER2 [7], offer the necessary speed and promise higher sensitivity to small changes in input sequences (Fig. 2 [8]).

Fig. 2

Single frame of a protein mutation movie (PMM). Left panel: 3D visualization of the prediction for one single amino-acid variant (SAV). The respective SAV is indicated in the top-left corner (residue position 83: native lysine (K) mutated into isoleucine (I)) and the affected residue is rendered in black in the visualization. All other residues are colored by the predicted confidence in the AlphaFold2 color scheme (blue: high confidence; yellow–red: low confidence). Right panel: the mutation profile shows the structural difference between mutant and wild-type structure, with residue indices running along the vertical axis and substitution amino acids along the horizontal axis. The SAV currently displayed in each frame is indicated by a black border. The movie frame shown was rendered at a screen resolution of 3840 × 2160

Many methods have been developed to predict the effects of sequence variants upon protein function, including SIFT [16], PolyPhen [17], SNAP2 [18], GEMME [19], DeepSequence [20], Packpred [21], Tranception [22] and VESPA [23] (for a longer list: [22, 24]). On the other hand, tools such as I-Mutant [25, 26], FoldX [27], PoPMuSiC [28], DUET [29], or INPS-MD [30] aim at predicting the structural impact of mutations by providing estimates for changes in stability, folding or dynamics, which greatly helps our understanding of disease emergence. Despite convincing examples of how to use the numerical output from such methods to reason, based on static images, about possible dynamical changes [21, 29, 31], none of these methods directly displays the actual structural change in the 3D conformation.

Imagine we had fast and accurate structure predictions for all L*19 SAV mutants of a protein with L residues. How would we visualize those data? Today, no comprehensive and accessible visualization tools for alternative protein conformations are available. To fill this void, we introduce MutAmore (MutAtion movie renderer), a tool that provides a pipeline rendering a mutated protein sequence into a dynamic 3D protein mutation movie (PMM), thereby making the analysis of mutational landscapes accessible and tangible. The implementation and results presented here highlight the potential of MutAmore to fill a growing need within the structural biology field. Its efficient visualizations could aid research on protein dynamics, function, evolution, and disease mechanisms.

Implementation

MutAmore is designed to create an animated visualization of the mutational landscape of proteins. Inputting a protein amino acid sequence (or a set thereof) as a FASTA file [32], MutAmore first generates all possible single amino acid variants (SAVs) for this sequence(s) and uses a structure prediction model to predict 3D structure for each SAV (Fig. 1). We provide a ready-made interface to ESMFold [6] and ColabFold/AlphaFold2 [33] along with documentation on how to easily use any other prediction model inputting FASTA files and outputting PDB files. Optionally, a user can provide experimental structures for some of the mutants to MutAmore. These variants are then skipped during the prediction stage.
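The enumeration step described above can be illustrated with a minimal sketch. It is not MutAmore's own code: the function names `generate_savs` and `to_fasta` are hypothetical, and the sketch assumes that the input contains only the 20 standard amino acids.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def generate_savs(sequence):
    """Yield (label, mutant_sequence) for every single amino-acid variant.

    Labels follow the XnY convention with 1-based residue positions.
    Assumes the sequence contains only the 20 standard amino acids.
    """
    for index, wild_type in enumerate(sequence):
        for substitute in AMINO_ACIDS:
            if substitute == wild_type:
                continue  # skip the native residue, leaving 19 substitutions
            label = f"{wild_type}{index + 1}{substitute}"
            mutant = sequence[:index] + substitute + sequence[index + 1:]
            yield label, mutant


def to_fasta(variants):
    """Format (label, sequence) pairs as FASTA text (hypothetical helper)."""
    return "".join(f">{label}\n{seq}\n" for label, seq in variants)


savs = list(generate_savs("MKT"))
print(len(savs))  # 3 residues * 19 substitutions = 57
print(savs[0])    # ('M1A', 'AKT')
```

For a protein with L residues the generator yields L*19 variants, which can then be written to a FASTA file and passed to any predictor that maps FASTA input to PDB output.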

MutAmore computes a mutation profile by assessing the structural divergences between each mutated protein and the (predicted) wild-type structure. The local Distance Difference Test (lDDT) [34] tallies scores across all residues to derive a single score for each structure pair (typically wild-type/native vs. SAV). Structures similar to the wild-type have scores near one while divergent structures approach zero. These data are then visualized as a mutation matrix (19 × protein length) using Python-Pillow [35].
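To make the scoring idea concrete, the following sketch computes a simplified, global lDDT-like score on C-alpha coordinates only. It is an illustration under stated assumptions rather than the reference lDDT implementation [34] or MutAmore's code: the published score considers all atoms, whereas this version uses one point per residue, and the function name `ca_lddt` is hypothetical. The inclusion radius of 15 Å and the tolerance thresholds of 0.5, 1, 2 and 4 Å follow the commonly used lDDT defaults.

```python
import math


def ca_lddt(reference, model, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified global lDDT on C-alpha coordinates (illustrative only).

    reference, model: equal-length lists of (x, y, z) tuples, one per residue.
    Returns the fraction of local reference distances preserved in the model,
    averaged over all tolerance thresholds.
    """
    if len(reference) != len(model):
        raise ValueError("structures must have the same number of residues")
    preserved = 0
    considered = 0
    n = len(reference)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= radius:
                continue  # only local distances in the reference are scored
            d_model = math.dist(model[i], model[j])
            considered += len(thresholds)
            preserved += sum(abs(d_ref - d_model) < t for t in thresholds)
    return preserved / considered if considered else 0.0


wild_type = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
mutant = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 3.0, 0.0)]
print(ca_lddt(wild_type, wild_type))  # identical structures score 1.0
print(ca_lddt(wild_type, mutant))     # displaced third residue lowers the score
```

Because the score compares internal distances, it needs no superposition of the two structures. One such value per SAV fills one cell of the 19 × L mutation matrix.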

3D visualizations of all variants are created via the PyMOL API [36] and color-coded to indicate predicted confidence levels (blue for high confidence, yellow–red for low confidence; following the AlphaFold2 standard). All recent 3D structure prediction methods include predicted lDDT values as confidence scores, including systems which do not build on top of the AlphaFold2 architecture, such as RoseTTAFold [37] and EMBER3D [8]. PyMOL aligns all 19*L SAV structures to the original wild-type prediction prior to rendering to maintain uniformity in viewing angles in the final animated visualization.
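The confidence coloring can be sketched as a simple mapping from predicted lDDT (pLDDT) to a color band. The cut-offs (90, 70, 50) and hex values below are the ones commonly associated with the AlphaFold Protein Structure Database and are assumptions here; MutAmore's exact palette may differ, for instance by using a continuous gradient. The function name `plddt_color` is hypothetical, and the sketch assumes pLDDT values on a 0 to 100 scale.

```python
def plddt_color(plddt):
    """Map a pLDDT value (0-100 scale) to a hex color band.

    Bands and colors are assumptions modeled on the AlphaFold database
    scheme; they are not taken from the MutAmore source code.
    """
    if plddt > 90:
        return "#0053D6"  # very high confidence (dark blue)
    if plddt > 70:
        return "#65CBF3"  # confident (light blue)
    if plddt > 50:
        return "#FFDB13"  # low confidence (yellow)
    return "#FF7D45"      # very low confidence (orange)


for value in (95.0, 80.0, 60.0, 30.0):
    print(value, plddt_color(value))
```

Structure predictors typically store pLDDT in the B-factor column of the PDB file, so a renderer can read the value per residue and assign the color accordingly.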

MutAmore then assembles the final frames of the animation in residue index order (from first at the N-terminus to the last at the C-terminus), merging the 3D renderings with the mutation profile to create the PMM with ffmpeg [38]. Rendered at a rate of 19 frames per second, each residue remains in focus for one second and shows all potential SAVs for this position in the protein. The mutated position in each frame is rendered in black in both the 3D visualization and the mutation profile. Details of the SAVs displayed in each frame are indicated in the top-left corner through the standard single-letter amino acid code: XnY meaning that the wild-type amino acid X at residue position n is mutated to amino acid Y (Fig. 2: K76H).
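The frame ordering and the relation between frame rate and movie length can be sketched as follows. This is an assumption-laden illustration: the order of the 19 substitutions within one residue (alphabetical here), the frame file pattern, and the encoder settings are not taken from MutAmore, and the helper names are hypothetical. The ffmpeg options used (`-framerate`, `-i`, `-c:v`, `-pix_fmt`, `-y`) are standard ffmpeg options; the command is only built, not executed.

```python
def frame_order(sequence, amino_acids="ACDEFGHIKLMNPQRSTVWY"):
    """Return SAV labels (XnY) in display order, from N- to C-terminus.

    The order of substitutions within a position is an assumption.
    """
    return [
        f"{wild_type}{position}{substitute}"
        for position, wild_type in enumerate(sequence, start=1)
        for substitute in amino_acids
        if substitute != wild_type
    ]


def ffmpeg_command(frame_pattern, output, fps=19):
    """Build (but do not run) an ffmpeg call encoding numbered PNG frames."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),   # 19 frames per second: one residue per second
        "-i", frame_pattern,      # e.g. frames/frame_%05d.png (hypothetical)
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        output,
    ]


labels = frame_order("MKT")
duration_seconds = len(labels) / 19
print(labels[:3])          # ['M1A', 'M1C', 'M1D']
print(duration_seconds)    # 3.0, i.e. one second per residue
print(" ".join(ffmpeg_command("frames/frame_%05d.png", "movie.mp4")))
```

With 19 frames per residue and 19 frames per second, the duration of a full movie in seconds equals the protein length L.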

When users provide experimental structures—or otherwise their own labels—in a PDB-formatted file, the visual output highlights the differences between predictions and experimental models by showing the latter at full opacity while applying slight transparency to the former, giving the visual impression of “filling in the gaps” between known structures with predictions.

Users can adapt the resolution of the final animation to their needs, e.g., choosing high-quality for publications or lower resolution clips for web sharing. The mutation profile automatically scales to the specified vertical resolution for optimal visual interpretability. The 3D visualization rendered by PyMOL automatically chooses a zoom level, which allows enough space to accommodate structural changes caused by mutations. MutAmore lets experienced users override the zoom level manually where needed.

MutAmore also lets users render only a subset of the most impactful mutations, e.g., the top 50 SAVs with the largest effect upon 3D structure. In this mode, the frame rate is slightly reduced (the movie is slowed down) to ease visual comparison.
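Selecting the most impactful mutations amounts to ranking SAVs by their structural similarity to the wild type. The sketch below assumes that impact is measured by the lDDT score from the mutation profile, with lower values meaning larger structural change. The function name `top_impact` and the example scores are made up for illustration.

```python
def top_impact(scores, k=50):
    """Return the k SAV labels with the lowest lDDT to the wild type.

    scores: dict mapping SAV label (XnY) to lDDT in [0, 1].
    """
    return sorted(scores, key=scores.get)[:k]


# Illustrative values only, not results from the benchmark.
example_scores = {"K83I": 0.62, "K83R": 0.97, "G12P": 0.41, "A5S": 0.99}
print(top_impact(example_scores, k=2))  # ['G12P', 'K83I']
```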

Many advanced structure prediction systems require robust and substantial GPU resources. Therefore, MutAmore provides an option for a two-step process: computation of all mutation predictions using a server machine, and subsequent processing and rendering of the animated visualization on a desktop computer.

Results

We evaluated MutAmore on a machine with an Intel Xeon Gold 6248 CPU and an NVIDIA Quadro RTX 8000 GPU (48 GB), using twelve proteins ranging in length from 72 to 639 residues and including both globular and membrane proteins. Although AlphaFold2 [1] and its faster spin-off ColabFold [33] outperform ESMFold [6], the latter seems slightly better at capturing the effects of SAVs upon structure [7, 8]. We therefore used ESMFold to predict 3D structures, also because it is substantially faster, which mattered for the 55,879 SAVs across all twelve samples. We then generated movies at the default resolution of 1280 × 720 (720p) and at 3840 × 2160 (4K; Table 1).

Table 1 Benchmark of MutAmore in 720p and 4K resolution for twelve proteins

The time required for structure prediction varies substantially with the method used. ColabFold, which we ran for the five smallest proteins only, accumulated over a week of GPU time. Given its non-linear scaling with protein length, it rapidly becomes infeasible for longer proteins. ESMFold obtained predictions for all twelve proteins in about 16 days, with the bulk of the computation time spent on the longest samples (11 days for the protein with 639 residues). In contrast, ESMFold computed the predictions for the nine smaller proteins in 30 h. The tremendous increase in runtime with protein length (Additional file 1: Fig. S13) is due to GPU memory limitations. Structure prediction systems such as ESMFold compute multiple samples in parallel to fill memory as efficiently as possible. For shorter proteins, this allows batches of up to dozens of simultaneous predictions. Longer sequences require more computation time and more memory per prediction, which limits the number of samples that can be processed concurrently and leads to an exponential increase in total runtime. Overall, these numbers highlight the need for future structure prediction methods with both increased speed and memory efficiency to properly explore the mutational landscape of longer proteins.

Given the 3D predictions, creating the animated visualizations is considerably faster. The majority of MutAmore’s processing time is devoted to the rendering of 3D visualizations, followed by the composition of final frames.

After generating structure predictions on our server hardware, we ran MutAmore's rendering pipeline on a consumer-grade laptop with an Intel Core i7 6700HQ CPU to compare performance with the server environment. Performance decreased by roughly 30%, showing that MutAmore runs well on readily available hardware, at least for shorter proteins.

Increasing the resolution required additional runtime. Rendering at 4K increased the processing time for the 3D visualization, compositing, and rendering by a factor of three to seven over rendering at 720p (Table 1), which is generally less than the increase in the number of pixels (4K/720p = 9×). The time for computing structural similarity and for generating the profile did not differ much between 720p and 4K (Table 1). We provide a detailed breakdown of the runtime of all pipeline steps for the twelve individual samples in Additional file 1: Tables S1–S12. Even at 4K, the total processing time for MutAmore remained substantially below that needed to predict structures, even with ESMFold, for all but the shortest protein sequences (Additional file 1: Fig. S13).
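The factor of nine quoted above follows directly from the two frame sizes, as this small check shows (the helper name `pixel_ratio` is hypothetical):

```python
def pixel_ratio(high, low):
    """Ratio of pixel counts between two (width, height) resolutions."""
    return (high[0] * high[1]) / (low[0] * low[1])


print(pixel_ratio((3840, 2160), (1280, 720)))  # 9.0
```

A measured slowdown of three to seven therefore means that rendering time grows more slowly than the number of pixels per frame.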

Limitations

A fundamental problem for all attempts to visualize the dynamics of 3D objects is the occlusion of internal parts. For instance, buried residues and changes in the local regions around them remain hidden in our 3D movies. For some proteins, such as beta barrels, a cleverly chosen viewing angle might provide a better perspective, but globular proteins do not offer such an alternative. Transparency might address such issues, but it tends to reduce depth perception and to add too much visual clutter for the information presented to remain clear. Thus, it remains unclear how to show internal structural changes of proteins in a visually concise manner, apart from going back to two-dimensional distance maps, which are intuitive only to well-trained structural experts.

Another limitation of using 3D predictions for all 19 non-native SAVs at every position is the substantial demand on computing resources of today's prediction methods. This becomes particularly challenging for long proteins (Additional file 1: Fig. S13). MutAmore would greatly benefit from future prediction systems that are leaner without sacrificing performance [8].

Too few high-resolution experimental structures capture the effect of point mutations upon 3D structure to evaluate how well prediction methods capture SAV effects. Using deep mutational scanning data [39] as a proxy might suffice to establish correlations between observed and predicted impact upon function, but it does not probe how well methods predict the effect of SAVs upon 3D structure and dynamics. Once more experimental structures of variants become available, MutAmore could be extended to serve as a benchmarking tool for the sensitivity of structure prediction.

A web server for MutAmore might ease access for users with less experience in computational biology. While we are looking for the resources to realize such a project, we provide a Google Colab notebook, linked on the MutAmore website, that allows the creation of PMMs without a local installation.

Conclusions

We designed MutAmore to bridge a crucial gap in the post-AlphaFold2 era. By rendering visually comprehensible protein mutation movies (PMMs) of single amino-acid variants (SAVs), MutAmore enhances the exploration and understanding of alternative protein conformations brought on by mutations. This helps convey the depth and complexity of the protein mutational landscape. Our benchmark demonstrated the efficiency and versatility of MutAmore, even for high-definition 4K video settings. The tool balances computational load by allowing multi-step operation across multiple systems, ensuring usability across varied system capabilities. We hope that structural biology increasingly shifts toward routine analysis of alternative protein conformations. MutAmore supports such a more dynamic perspective on protein structures and might aid in analyzing protein function and studying protein evolution.

Availability

MutAmore is publicly available and is free for all users.

Project name: MutAmore.

Project home page: https://github.com/kWeissenow/MutAmore

Operating systems: Linux.

Programming language: Python.

Other requirements: ffmpeg 4.1 or higher, PyMOL 2.2 or higher.

License: MIT.

Any restrictions to use by non-academics: None.

Availability of data and materials

The benchmark dataset analyzed during the current study is available in the MutAmore GitHub repository: https://github.com/kWeissenow/MutAmore/blob/main/benchmark/benchmark_set.fasta.

Abbreviations

3D:

Three-dimensional (coordinates)

3D structure:

Three-dimensional coordinates of protein structure

AI:

Artificial Intelligence

API:

Application programming interface

Embeddings:

Fixed-size vectors derived from pre-trained pLMs

GPU:

Graphics processing unit

PDB:

Protein Data Bank

PIDE:

Percentage pairwise sequence identity

pLM:

Protein Language Model

PMM:

Protein mutation movie

References

  1. Jumper J, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–9.

  2. Varadi M, et al. AlphaFold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2021;50(D1):D439–44.

  3. Rao R, et al. Transformer protein language models are unsupervised structure learners. bioRxiv, 2020: p. 2020.12.15.422761.

  4. Elnaggar A, et al. ProtTrans: towards cracking the language of life's code through self-supervised deep learning and high performance computing. IEEE Trans Pattern Anal Mach Intell. 2021.

  5. Heinzinger M, et al. Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinform. 2019;20(1):723.

  6. Lin Z, et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv, 2022: p. 2022.07.20.500902.

  7. Weissenow K, Heinzinger M, Rost B. Protein language-model embeddings for fast, accurate, and alignment-free protein structure prediction. Structure. 2022;30(8):1169-1177.e4.

  8. Weissenow K, et al. Ultra-fast protein structure prediction to capture effects of sequence variation in mutation movies. bioRxiv, 2022: p. 2022.11.14.516473.

  9. Bordin N, et al. Novel machine learning approaches revolutionize protein knowledge. Trends Biochem Sci. 2023;48(4):345–59.

  10. Rost B, Sander C. Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol. 1993;232:584–99.

  11. Sala D, et al. Modeling conformational states of proteins with AlphaFold. Curr Opin Struct Biol. 2023;81:102645.

  12. Vedithi SC, et al. Structural implications of mutations conferring rifampin resistance in Mycobacterium leprae. Sci Rep. 2018;8(1):5016.

  13. Portelli S, et al. Understanding molecular consequences of putative drug resistant mutations in Mycobacterium tuberculosis. Sci Rep. 2018;8(1):15356.

  14. Gerasimavicius L, Livesey BJ, Marsh JA. Loss-of-function, gain-of-function and dominant-negative mutations have profoundly different effects on protein structure. Nat Commun. 2022;13(1):3895.

  15. Buel GR, Walters KJ. Can AlphaFold2 predict the impact of missense mutations on structure? Nat Struct Mol Biol. 2022;29(1):1–2.

  16. Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003;31(13):3812–4.

  17. Adzhubei I, Jordan DM, Sunyaev SR. Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet. 2013;7:7–20.

  18. Hecht M, Bromberg Y, Rost B. Better prediction of functional effects for sequence variants. BMC Genom. 2015;16(8):S1.

  19. Laine E, Karami Y, Carbone A. GEMME: a simple and fast global epistatic model predicting mutational effects. Mol Biol Evol. 2019;36(11):2604–19.

  20. Riesselman AJ, Ingraham JB, Marks DS. Deep generative models of genetic variation capture the effects of mutations. Nat Methods. 2018;15(10):816–22.

  21. Tan KP, et al. Packpred: predicting the functional effect of missense mutations. Front Mol Biosci. 2021;8:646288.

  22. Notin P, Dias M, Frazer J, Hurtado JM, Gomez AN, Marks D, Gal Y. Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval. In: Proceedings of the 39th international conference on machine learning. PMLR.

  23. Marquet C, et al. Embeddings from protein language models predict conservation and variant effects. Hum Genet. 2022;141(10):1629–47.

  24. Livesey BJ, Marsh JA. Updated benchmarking of variant effect predictors using deep mutational scanning. Mol Syst Biol. 2023;19(8):e11474.

  25. Capriotti E, Fariselli P, Casadio R. I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res. 2005;33:W306–10.

  26. Capriotti E, et al. A three-state prediction of single point mutations on protein stability changes. BMC Bioinform. 2008;9(2):S6.

  27. Schymkowitz J, et al. The FoldX web server: an online force field. Nucleic Acids Res. 2005;33:W382–8.

  28. Dehouck Y, et al. PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC Bioinform. 2011;12(1):151.

  29. Pandurangan AP, Blundell TL. Prediction of impacts of mutations on protein structure and interactions: SDM, a statistical approach, and mCSM, using machine learning. Prot Sci. 2020;29(1):247–57.

  30. Savojardo C, et al. INPS-MD: a web server to predict stability of protein variants from sequence and structure. Bioinformatics. 2016;32(16):2542–4.

  31. Hecht M, Bromberg Y, Rost B. News from the protein mutability landscape. J Mol Biol. 2013;425(21):3937–48.

  32. Pearson WR, Lipman DJ. Improved tools for biological sequence comparison. Proc Natl Acad Sci USA. 1988;85(8):2444–8.

  33. Mirdita M, et al. ColabFold: making protein folding accessible to all. bioRxiv, 2021: p. 2021.08.15.456425.

  34. Mariani V, et al. lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics. 2013;29(21):2722–8.

  35. Clark A. Python-Pillow. 2010.

  36. Schrödinger, LLC. The PyMOL Molecular Graphics System, Version 1.8. 2015.

  37. Baek M, et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021;373(6557):871–6.

  38. Tomar S. Converting video formats with FFmpeg. Linux J. 2006;2006(146):10.

  39. Fowler DM, Fields S. Deep mutational scanning: a new style of protein science. Nat Methods. 2014;11(8):801–7.

Acknowledgements

Thanks to Tim Karl (TUM) for invaluable help with hardware and software; to Nikita Kugut (TUM) for support with many other aspects of this work; to Michael Heinzinger (TUM) for helpful discussions and comments on work and manuscript. Furthermore, we thank all those who make experimental and predicted structures and other resources publicly available, in particular, thanks to DeepMind (AlphaFold2) and Meta (ESMFold), as well as to the creators and community of PyMOL and ffmpeg.

Funding

Open Access funding enabled and organized by Projekt DEAL. This work was supported by the Alexander von Humboldt Foundation (BMBF), and by the German Research Foundation (DFG–GZ: RO1320/4–1).

Author information

Contributions

KW designed, implemented and benchmarked the software in this work and drafted the manuscript. BR supervised the work and substantially revised the manuscript.

Corresponding author

Correspondence to Konstantin Weissenow.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1. Figures S1–S12 show screenshots of all protein mutation movies used for benchmarking. The corresponding Tables S1–S12 list the runtimes for all individual samples, including the prediction time using ESMFold and all steps of the MutAmore rendering pipeline. Figure S13 visualizes how the runtime of the prediction and rendering steps scales with sequence length.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Weissenow, K., Rost, B. Rendering protein mutation movies with MutAmore. BMC Bioinformatics 24, 469 (2023). https://doi.org/10.1186/s12859-023-05610-8

Keywords