EvoFreq: visualization of the Evolutionary Frequencies of sequence and model data
BMC Bioinformatics volume 20, Article number: 710 (2019)
High-throughput sequence data have provided an in-depth means of molecularly characterizing populations. When recorded at numerous time steps, such data can reveal the evolutionary dynamics of the population under study by tracking changes in genotype frequencies over time. This necessitates a simple and flexible means of visualizing an increasingly complex set of data.
Here we offer EvoFreq as a comprehensive tool set for visualizing the evolutionary and population frequency dynamics of clones, either at a single point in time or as population frequencies over time, using a variety of informative methods. EvoFreq expands substantially on previous means of visualizing clonal temporal dynamics and offers users a range of options for displaying their sequence or model data.
EvoFreq, implemented in R with robust user options and few dependencies, offers a high-throughput means of quickly building and interrogating the temporal dynamics of hereditary information across many systems. EvoFreq is freely available via https://github.com/MathOnco/EvoFreq.
Changes in genotype frequencies are often visualized using Muller plots, wherein each polygon represents a genotype (clone), and the thickness of the polygon indicates either the number of individuals with that genotype, or the frequency of the genotype in the total population at each time point. Nesting of genotypes represents evolutionary relationships, i.e. one genotype emerging from within another genotype’s polygon indicates that the former was created by an individual with the latter genotype. Muller plots thus provide an excellent way to visualize how the genetic composition of a population changes over time. As these changes are governed by mutation, selection, drift, and gene flow (evolutionary forces), the visualization of these dynamics facilitates an understanding of which forces are dominating the population.
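The two quantities a Muller plot encodes, genotype frequency and nesting, can be illustrated with a minimal sketch. It is written in plain Python rather than R and is not EvoFreq's API; the clone names, sizes, and function names are hypothetical. It assumes every clone has a single parent and that the region a parent's polygon encloses is its own frequency plus that of all its descendants.

```python
from collections import defaultdict

# Hypothetical toy data: parent of each clone (None marks the founder)
parents = {"A": None, "B": "A", "C": "B"}
# Hypothetical clone sizes (number of individuals) at three time points
sizes = {
    "A": [100, 80, 40],
    "B": [0, 20, 40],
    "C": [0, 0, 20],
}

def frequencies(sizes):
    """Convert per-clone sizes to frequencies in the total population."""
    n_times = len(next(iter(sizes.values())))
    totals = [sum(s[t] for s in sizes.values()) for t in range(n_times)]
    return {
        clone: [s[t] / totals[t] if totals[t] else 0.0 for t in range(n_times)]
        for clone, s in sizes.items()
    }

def nested_extent(values, parents):
    """Add each clone's values to those of all of its descendants."""
    children = defaultdict(list)
    for clone, parent in parents.items():
        if parent is not None:
            children[parent].append(clone)

    def extent(clone):
        total = list(values[clone])
        for child in children[clone]:
            total = [a + b for a, b in zip(total, extent(child))]
        return total

    return {clone: extent(clone) for clone in values}

freqs = frequencies(sizes)               # band thickness of each clone
extents = nested_extent(freqs, parents)  # region enclosing clone and descendants
```

In this toy example the founder clone A shrinks in frequency over time, yet the region it encloses always spans the whole population because B and C are nested within it.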
One example where this form of data visualization has proven insightful is in understanding tumor evolution and subclonal composition from both mechanistic models and bulk or multi-region sequencing. The hereditary nature of mutations in somatic tissue and cancers allows us to recapitulate the temporal dynamics associated with mutational events through subclonal reconstruction. To date, two packages exist in R to visualize such temporal data: fishplot [1] and ggmuller [2]. We constructed an alternative library that includes the features of these two libraries while expanding functionality (see Table 1). EvoFreq offers users ease of implementation and a way to generate many visualizations for assessing the temporal dynamics of clonal Evolutionary Frequencies. EvoFreq is available at https://github.com/MathOnco/EvoFreq.
EvoFreq is built on top of the leading visualization library in R, ggplot2 [3]. Input data can be either clone sizes or frequencies, in long or wide format. Given such data, EvoFreq can create Muller plots and/or dendrograms that reveal the clonal dynamics over time. Additional customizations include the ability to color clones by a user-defined attribute (such as fitness), provide custom colors, and label polygons. Because EvoFreq uses ggplot2 as its underlying plotting library, its plots can be customized further. Using the optional dependency gganimate [4], users may also create animations of evolving Muller plots and "growing" dendrograms. EvoFreq can visualize frequency dynamics at a single point in time, as a phylogenetic tree representation, as a graph representation, or as a frequency plot over time similar to fishplot and ggmuller, but with extended capabilities.
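The difference between wide and long input can be sketched as follows. This is a plain-Python illustration, not EvoFreq's R interface; the column names `clone`, `parent`, `time`, and `size` and both function names are assumptions made for the example. In wide format each clone is one row with one column per time point; in long format each record holds a single (clone, time) observation.

```python
# Hypothetical wide-format table: one row per clone, one column per time point
wide = [
    {"clone": "A", "parent": None, 0: 100, 10: 80, 20: 40},
    {"clone": "B", "parent": "A", 0: 0, 10: 20, 20: 60},
]

def wide_to_long(rows, id_cols=("clone", "parent")):
    """Turn one row per clone into one record per (clone, time) pair."""
    long_records = []
    for row in rows:
        ids = {col: row[col] for col in id_cols}
        for col, value in row.items():
            if col not in id_cols:
                long_records.append({**ids, "time": col, "size": value})
    return long_records

def long_to_wide(records, id_cols=("clone", "parent")):
    """Inverse of wide_to_long: regroup records into one row per clone."""
    rows = {}
    for rec in records:
        key = tuple(rec[col] for col in id_cols)
        row = rows.setdefault(key, {col: rec[col] for col in id_cols})
        row[rec["time"]] = rec["size"]
    return list(rows.values())

long = wide_to_long(wide)      # 2 clones x 3 time points = 6 records
restored = long_to_wide(long)  # round trip back to the wide table
```

Both layouts carry the same information, so a tool that accepts either one only needs a conversion step like this before plotting.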
At the core of EvoFreq is the ability to visualize relational data structures over time, whether these come from simulations or are inferred from data. Here we illustrate the usefulness of generating these results using EvoFreq. Our first example uses data generated by West et al., who quantify how spatial constraints alter the evolutionary trajectory of a single tumor using a passenger-driver model [5]. EvoFreq was used extensively within this publication, and a subset of the data is shown in Fig. 1. This figure highlights how the user can manually or automatically add labels to provide informative details of clones, a particularly useful feature of EvoFreq.
Second, we analyze sequence data from 15 initial engraftments and serial propagation of primary and metastatic breast cancers [6] using three different clonal reconstruction tools, and visualize the results using EvoFreq. We applied ClonEvol [7], PhyloWGS [8], and CALDER [9] to infer the clonal dynamics of each of the longitudinal xenografts (select inferences are illustrated in Fig. 2). Initial processing and reformatting of the somatic single nucleotide variant (SNV), copy-number alteration (CNA), and loss of heterozygosity (LOH) data from whole-genome shotgun sequences (WGSS) and Affymetrix SNP Array 6.0 was carried out with custom Python scripts to extract read data and prepare inputs for PhyloWGS and CALDER. Each of these tools requires different pre-processing and infers subclonal reconstructions in a different way. We have incorporated functions within EvoFreq that parse the outputs of each of these tools so that the inferred clonal dynamics can be visualized. When the output provides numerous solutions for the subclonal reconstruction, the user has the option of returning one or all of these solutions. Examples of this process are provided in EvoFreq's documentation.
We present EvoFreq as a versatile library capable of generating publication- and presentation-ready images, as well as video, for visualizing clonal frequencies over time. EvoFreq's design allows broad access for all users through robust support for input data and input validation functions. EvoFreq can be used for all relational data structures, providing many different visualizations from one user-facing library, which makes it applicable in a number of fields. EvoFreq has been adopted by a number of research groups focused on cancer genomics and mechanistic modelling, and has been used in more than four studies already [10–13]. All source code, the README, and issue support are available at https://github.com/MathOnco/EvoFreq.
Availability and requirements
Project name: EvoFreq
Project home page: https://github.com/MathOnco/EvoFreq
Operating system: Operating system independent.
Programming languages: R
Other requirements: ggplot2.
License: GNU GPLv3
Any restrictions to use by non-academics: None
Availability of data and materials
All data used in this manuscript is freely available at the European Genome-phenome Archive under accession number EGAS00001000952. The code repository can be found at https://github.com/MathOnco/EvoFreq.
Abbreviations
SNV: Single nucleotide variant
CNA: Copy number alteration
LOH: Loss of heterozygosity
References
1. Miller CA, McMichael J, Dang HX, Maher CA, Ding L, Ley TJ, Mardis ER, Wilson RK. Visualizing tumor evolution with the fishplot package for R. BMC Genomics. 2016; 17(1):880. https://doi.org/10.1186/s12864-016-3195-z.
2. Noble R. ggmuller: Create Muller Plots of Evolutionary Dynamics. 2018. R package version 0.5.1. https://CRAN.R-project.org/package=ggmuller.
3. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer; 2016. http://ggplot2.org.
4. Pedersen TL, Robinson D. gganimate: A Grammar of Animated Graphics. 2019. R package version 126.96.36.19900. http://github.com/thomasp85/gganimate.
5. West J, Schenck R, Gatenbee C, Robertson-Tessi M, Anderson ARA. Tissue structure accelerates evolution: premalignant sweeps precede neutral expansion. Cold Spring Harbor Laboratory; 2019. https://doi.org/10.1101/542019.
6. Eirew P, Steif A, Khattra J, Ha G, Yap D, Farahani H, Gelmon K, Chia S, Mar C, Wan A, Laks E, Biele J, Shumansky K, Rosner J, McPherson A, Nielsen C, Roth AJL, Lefebvre C, Bashashati A, de Souza C, Siu C, Aniba R, Brimhall J, Oloumi A, Osako T, Bruna A, Sandoval JL, Algara T, Greenwood W, Leung K, Cheng H, Xue H, Wang Y, Lin D, Mungall AJ, Moore R, Zhao Y, Lorette J, Nguyen L, Huntsman D, Eaves CJ, Hansen C, Marra MA, Caldas C, Shah SP, Aparicio S. Dynamics of genomic clones in breast cancer patient xenografts at single-cell resolution. Nature. 2014; 518:422.
7. Dang HX, White BS, Foltz SM, Miller CA, Luo J, Fields RC, Maher CA. ClonEvol: clonal ordering and visualization in cancer sequencing. Ann Oncol. 2017; 28(12):3076–82. https://doi.org/10.1093/annonc/mdx517.
8. Deshwar AG, Vembu S, Yung CK, Jang GH, Stein L, Morris Q. PhyloWGS: Reconstructing subclonal composition and evolution from whole-genome sequencing of tumors. Genome Biol. 2015; 16(1):35. https://doi.org/10.1186/s13059-015-0602-8.
9. Myers MA, Satas G, Raphael BJ. Inferring tumor evolution from longitudinal samples. Cold Spring Harbor Laboratory; 2019. https://doi.org/10.1101/526814.
10. West J, You L, Zhang J, Gatenby RA, Brown J, Newton PK, Anderson ARA. Towards multi-drug adaptive therapy. Cold Spring Harbor Laboratory; 2019. https://doi.org/10.1101/476507.
11. West J, Schenck RO, Gatenbee C, Robertson-Tessi M, Anderson ARA. Tissue structure accelerates evolution: premalignant sweeps precede neutral expansion. Cold Spring Harbor Laboratory; 2019. https://doi.org/10.1101/542019.
12. Schenck RO, Kim E, Bravo RR, West J, Leedham S, Shibata D, Anderson ARA. How homeostasis limits keratinocyte evolution. 2019. https://doi.org/10.1101/548131.
13. Gatenbee CD, Baker A-M, Schenck RO, Neves MP, Hasan SY, Martinez P, Cross WC, Jansen M, Rodriguez-Justo M, Sottoriva A, Leedham S, Robertson-Tessi M, Graham TA, Anderson ARA. Niche engineering drives early passage through an immune bottleneck in progression to colorectal cancer. 2019. https://doi.org/10.1101/623959.
The authors gratefully acknowledge funding from both the Cancer Systems Biology Consortium (CSBC) and the Physical Sciences Oncology Network (PSON) at the National Cancer Institute, through grants U01CA232382 (supporting ARAA) and U54CA193489 (supporting CDG, ARAA). ARAA would also like to acknowledge support from the Moffitt Center of Excellence for Evolutionary Therapy. ROS is supported by the Wellcome Trust (grant no. 108861/7/15/7) and the Wellcome Centre for Human Genetics (grant no. 203141/7/16/7). No funding body played a role in the design of the study, analysis and interpretation of data, or in writing the manuscript.
The authors declare that they have no competing interests.
Cite this article
Gatenbee, C.D., Schenck, R.O., Bravo, R.R. et al. EvoFreq: visualization of the Evolutionary Frequencies of sequence and model data. BMC Bioinformatics 20, 710 (2019). https://doi.org/10.1186/s12859-019-3173-y