Enhancing cancer clonality analysis with integrative genomics

It is understood that cancer is a clonal disease initiated by a single cell, and that metastasis, which is the spread of cancer from the primary site, is also initiated by a single cell. The seemingly natural capability of cancer to adapt dynamically in a Darwinian manner is a primary reason for therapeutic failures. Survival advantages may be induced by cancer therapies and also occur as a result of inherent cell and microenvironmental factors. The selected "more fit" clones outmatch their competition and then become dominant in the tumor via propagation of progeny. This clonal expansion leads to relapse, therapeutic resistance and eventually death. The goal of this study is to develop and demonstrate a more detailed clonality approach by utilizing integrative genomics. Patient tumor samples were profiled by Whole Exome Sequencing (WES) and RNA-seq on an Illumina HiSeq 2500 and methylation profiling was performed on the Illumina Infinium 450K array. STAR and the Haplotype Caller were used for RNA-seq processing. Custom approaches were used for the integration of the multi-omic datasets. Reported are major enhancements to CloneViz, which now provides capabilities enabling a formal tumor multi-dimensional clonality analysis by integrating: i) DNA mutations, ii) RNA expressed mutations, and iii) DNA methylation data. RNA and DNA methylation integration were not previously possible, by CloneViz (previous version) or any other clonality method to date. This new approach, named iCloneViz (integrated CloneViz) employs visualization and quantitative methods, revealing an integrative genomic mutational dissection and traceability (DNA, RNA, epigenetics) thru the different layers of molecular structures. The iCloneViz approach can be used for analysis of clonal evolution and mutational dynamics of multi-omic data sets. Revealing tumor clonal complexity in an integrative and quantitative manner facilitates improved mutational characterization, understanding, and therapeutic assignments.


Background
It is recognised that cancer is a clonal disease instigated by a single cell and that metastasis is also commenced thru a single cell [1][2][3]. Tumors are composed of a variety of clones or subpopulations of cancer cells that may differ, for instance in their expression of cell surface markers, sensitivity to therapeutic agents, karyotype, proliferation rate. A cancer clone or subclone is a cell or group of cells that have formed from an original cell as a result of a new mutation [4]. Many cancers including multiple myeloma (MM) are difficult to treat due to their dynamic adaptability resulting from clonal evolution [5].
Evolution is an important scientific concept because it works. It provides a framework to explain changes in biological systems. Cancer is the result of an evolutionary process, but it is destructive, since it involves the loss of mechanisms that are implemented to protect against uncontrolled and undifferentiated growth. Ultimately, natural selection has a harsh reality that worried Darwin, namely, all that seems to matter is reproductive success [6].
MM is a cancer of the bone marrow characterized by a malignant transformation and proliferation of plasma cells [7]. Definitive therapies include combination chemotherapy, autologous transplant regimens [8], and two new classes of agents called immunomodulatory drugs (IMiDs) and proteasome inhibitors [9][10][11]. A significant improvement in patient survival has occurred over the last 15 years [12]. However, there is still significant variation in outcome and an explanation is tumor heterogeneity with associated complex genomic landscapes [5,13,14].
There exist an array of computational methods/tools that allow one to characterize various aspects of the clonal architecture of a tumor(s). Each method employs different computational and visualization techniques, and are very briefly described. SciClone [15] allows for the characterization of the clonal structure of a tumor using multiple samples, in an attempt to shed light on "cryptic subclones", which can appear when only one sample is analyzed. Several visualizations are available within SciClone, and clustering on the variant allele frequencies of somatic mutations (using a variational Bayesian mixture model on copy number neutral regions) attempts to infer the number and composition of subclones. PyClone [16] makes use of hierarchical Bayesian clustering models to estimate the number and cellular prevalence of subclones from the variant allele frequencies of somatic mutations. The method also takes into account copy number variation and possible normal cell contamination in the model. TrAp [17], in addition to inferring the subclones and their abundances within a single "aggregate sample" (via "aberrant frequencies" aka variant allele frequencies), constructs a phylogenetic tree-describing the evolution of clones within the tumor. The problem is modeled, and solutions based mathematically, on the deconvolution of a single aggregate signal.
Clomial [18] is another method used to infer subclonal structure using a binomial expectation maximization based approach (via somatic variant allele frequencies), and was specifically designed to deal with multiple samples from a single tumor sample. Rec-BTP [19] casts the problem of uncovering the clonal structure of a single sample, using the variant allele frequencies of somatic mutations, as a combinatorial one. A recursive algorithm using a binary tree partition is developed which approximates the originally formulated NP-complete problem. THetA [20] uses copy number aberrations to discover most likely clonal subpopulations. In particular, the problem of subclonal characterization is solved using a maximum likelihood mixture decomposition method, in order to find the genotypes "whose mixture best explains the observed sequencing data." PhyloWGS [21] is unique, in that it makes use of both somatic mutational and copy number variation data in its subclonal analysis. It additionally infers a phylogenetic tree (using non-parametric mathematical approaches) explaining the evolutionary history of a tumor's clonal composition, as well as, each clone's relative abundance. The method is also able to take multiple sample inputs to aid subclonal reconstruction.
All of the aforementioned referenced methods only make use of a single modality in their characterizations of clonal architecture, namely, DNA-based mutational data, culled from whole genome or whole exome sequencing (WES) experiments. Some methods also attempt to account for the effect of copy number variation on clonal architecture. This is contrasted to the multiple modality datasets employed in the characterization of clonal structure used within iCloneViz. iCloneViz is the only known computational method for inferring clonal architecture, which integrates multiple modality datasets to derive deeper biological meaning. For example, RNA variant calling is performed (a relatively new method of variant calling) to detect whether or not a mutation found within the DNA (via a WES or WGS experiment) is detectable within a RNA transcript. Further, if a mutation is found to be present at the RNA level, the expression values associated with the transcript(s) containing the mutation can be quantified and visualized. Finally, DNA methylation data is integrated into the analysis, which could lead to hypotheses regarding methylation suppressing the expression of a tumor suppressor gene (TSG). iCloneViz tracks TSGs both by mutations and epigenetic/methylation events.
In the presented analysis, integrated clonal dynamics from a single patient diagnosed with MM is explored. Tumor material is investigated at initial diagnosis (i.e., Presentation) and then later when the cancer recurred (i.e., Relapse). A novel bioinformatic approach named CloneViz [22] has been enhanced to allow for the rapid integration, quantitation, visualization, and investigation of the mutational dynamics from i) WES, ii) RNA-seq, and iii) DNA methylation.
The methodology is independent of any specific cancer type and MM is used as a demonstrative example due to its intrinsic heterogeneity. The novelty of the approach concerns the rapid integration and dissection of large and complex multi-omic datasets. Discovered in this study is first, the existence of an amplified and mutated MYC oncogene in both Presentation and Relapse. Second, the occurrence of hotspot mutations in MTOR with rationale to treat, and in KRAS, with evidence not to treat. Third, is progressive evidence of tumor suppressor gene silencing. Overall, the molecular profiling and advanced bioinformatics (iCloneViz) provide for a more precise understanding of tumorigenesis with potential for an improved outcome via precision medicine-based approaches.

Sample and library preparation
All samples were obtained from a single patient with MM at the Myeloma Institute, University of Arkansas for Medical Sciences (UAMS), Little Rock, AR. Sample collection protocols were approved by the UAMS Institutional Review Board (IRB). Collected samples include bone marrow aspirate with biopsy and peripheral blood. Tumor plasma cells were obtained from the bone marrow aspirate after enrichment by anti-CD 138 immunomagnetic bead selection at a central laboratory, in a manner previously described [23]. Selected tumor material had a purity greater than 96% for the presentation sample and 99% for relapse. CD-138 is a marker for malignant plasma cells in a patient with clinical MM. The patient's first (Presentation) and second (Relapse) tumor samples were obtained in 2010 and 2013 respectively. Normal (germline) material was obtained from the buffy coat of peripheral blood in 2010, after density gradient centrifugation. To ensure the absence of plasma cells buffy coat material was also examined by flow cytometry.
Tumor and normal whole exome libraries were constructed from 50 ng of DNA material after shearing, end repair, phosphorylation, and ligation to bar coded sequencing adapters. DNA material was further fragmented using the S220 focused-ultrasonicator (Covaris), for a target base pair (bp) length of 300. DNA was size selected for lengths between~250 -330 bp. DNA regions were captured using the Agilent SureSelect QXT Human All Exon v5 Plus hybrid capture kit. Samples were then multiplexed and subjected to sequencing (101 bp paired-end reads) on an Illumina HiSeq 2500. Whole exome libraries were sequenced to an average depth of 100x.
RNA-seq tumor libraries were constructed using 200 ng of total RNA material, using the Illumina TruSeq mRNA v2 kit according to the manufacturer's instructions. Poly-A selection for mRNA was first performed using streptavidin-coated magnetic beads, which was followed by thermal mRNA fragmentation. mRNA was then subjected to cDNA synthesis using reverse transcriptase. Resulting cDNA was converted to double stranded cDNA, followed by end repair, and then ligated to paired-end adapters. Size selection was performed using AMPure XP beads (Beckman Coulter), for sequences~300 -350 bp in length. The library was further enriched using 15 cycles of PCR and purified again with AMPure XP beads. The concentration of material run was 8 pM. The libraries were then multiplexed and subjected to sequencing (101 bp paired-end reads) on an Illumina HiSeq 2500. RNA-seq libraries were sequenced to a total of 100M reads.
Genome-wide DNA methylation was assessed in bisulfite-converted genomic DNA using the Illumina Infinium HumanMethylation450 (HM450K) BeadChip array, which contains 485,577 probes covering 99% of RefSeq genes, 96% of CpG islands (CGI) and coverage across promoters, 5' and 3'-UTRs, first exons and gene bodies. Genomic DNA (500 ng) was bisulfite treated and purified using the EZ DNA Methylation-Gold kit (Zymo Research, Irvine, CA) according to the manufacturer's protocol. The resultant bisulfite-converted DNA was processed, hybridized to Illumina HumanMethyla-tion450 BeadChips, fluorescently stained and scanned according to the Infinium HD Assay Methylation Protocol User's Guide provided by Illumina. Processed BeadChips were scanned on an Illumina iScan and methylation values were determined for all probes using the GenomeStudio Methylation module (Illumina).

RNA sequencing processing
Transcriptome reconstruction RNA-seq samples were first demultiplexed and FASTQ files were created from BCL files using CASAVA v1.8.2. Quality control and assessment was performed on FASTQ files using FastQC v0.11.2. Trimmomatic v0.32 [34] (utilizing a sliding window approach) was used to trim low quality reads and remove possible adapter sequences. Alignment of reads and transcriptome reconstruction was performed using the Tuxedo suite of tools. TopHat v2.0.12 [35] was used to align each sample's paired-end reads to the Ensembl reference genome build GRCh37.75. Quality control and assessment of resulting BAM files was performed using SAMtools v0.1.8 [36]. BAM files were then used to reconstruct the transcriptome and quantify each isoform's fragments per kilobase of transcript per million mapped reads (FPKM) using Cufflinks v2.2.1 [37], and Cufflinks was run in a mode which allows for the discovery of novel isoforms. HTSeq v0.6.1 [38] was used to quantify raw (non-normalized) gene-based read counts.

RNA-seq based variant calling
After demultiplexing and creating FASTQ files using the previous subsection's description, RNA variants were called using the Broad Institute's GATK Best Practices for RNA-seq variant calling [39]. These steps include the following: STAR v2.3.0e [40] was used to align reads to the Ensembl reference genome (build GRCh37.75), using the recommended "2-pass" approach. Duplicates were marked and the aligned reads sorted with Picard Tools. Next, the tool SplitNCigarReads (GATK component) was used to split reads into exon segments, clip reads which overhang intronic regions, and assign a default MAPQ score of 60 to all reads. Variants were called using the HaplotypeCaller tool (GATK component).

Variant quantification and classification
The variant allele frequency (VAF) was determined by dividing the total reads for the variant (TRV) by the sum of the total reads for the variant (TRV) plus total reads for the reference (TRR). Copy number data was derived using ExomeCNV v1.4. The selection and retention of variants were based on the following filtering parameters: i) VAF ≥ 4%, and ii) 20 ≤ DP ≤ 1000. A manual evaluation of the read alignments using the Integrative Genomics Viewer (IGV) v2.3.32 was also performed [41]. At times a second selection of variants utilized an intersection against a key gene (KG) list. The KG group was constructed from the following public sources: i) known drivers and cancer predisposition genes cited in Vogelstein, et. al. [42], ii) Foundation One Heme™ Genes (http://foundationone.com/genelist2.php) and, iii) the MD Anderson listing of human DNA repair genes [43].

Quantifying clonal diversity
Diversity measures from ecology were adapted to quantify clonal diversity in MM serial samples [44]. Each sample is not a single organism/species, but rather consists of thousands of cells from a purified bone marrow aspirate. The abundance of a molecular species (variant/mutation) is the product of VAF * DP * CN. The number of clones in a neoplasm is a simple measure of diversity. Diversity measures typically incorporate both the number and abundance of clones [44]. The Shannon diversity index (SDI) [45] is where p(i) is the frequency of clone i in the neoplasm. The SDI computes a single quantitative value based on the number of different mutations in the cancer sample and how evenly distributed each mutation is among the entire group. The SDI value will increase when the number of distinct mutations increases and also when the evenness among the mutations increases [44]. There are other diversity measures (e.g., Simpson, Berger-Parker) but, the Shannon diversity index is preferable because it is not dominated by the most frequent clone, and it has been utilized in previous studies of cancer [46,47].

Software engineering and integrative analysis
This section and its constituent sub-sections documents and describes all software components, data structures, and algorithms used to construct the extended and integrated CloneViz (iCloneViz), especially the data calculations, visualizations and the rationale for their use.

General technology and frameworks
The C# programming language, targeting the .NET Framework v4.5 [48] (utilizing the integrated development environment Visual Studio 2013 [49]), was used to construct iCloneViz. This programming platform was used to facilitate the rapid development of an interactive Windows-based application. To minimize the amount of custom data access layer code needed to perform the dataintensive functions of iCloneViz, the object-relational mapping software Entity Framework v6.1.1 [50] (using the database-first model) was utilized along with the molecular profiling modality database (MPMDB), which is further described later in this study. The built-in .NET Framework charting package (i.e., namespace System. Windows.Forms.DataVisualization.Charting), was used to provide visualization primitives. This package was used in favor of others, because of its relative ease-of-use and built-in functions, in particular, its ability to perform smart label positioning. The relational database management system SQL Server 2012 Developer Edition [51] was used to store all patient data and meta-data. The database system was chosen for its robustness, ability to scale, and its low cost for academic and research use. (The database system is described in further detail in the "Data storage, retrieval and annotation" section below).

Mathematics and statistics
To assist in kernel density estimation (KDE) and perform probability density calculations necessary to estimate the unknown distribution of the mutations found, the package Math.NET Numerics v3.2.3 [52] was used. The standard normal (Gaussian) kernel was used in the estimation. The equation used to estimate the probability density function induced by the mutations is as follows: Where x is the point at which density is to be estimated, (x 1 , x 2 ,..., x n ) is the array of independent and identically distributed (i.i.d) sample (mutations) of some unknown distribution, n is the array size, h is the bandwidth, and K is the standard normal kernel function.
To account for the effect of copy number variation (CNV) on mutations found within regions containing copy number alterations, the following was done: i) copy number information was inferred using ExomeCNV for each loci of the tumor exome; ii) each mutation's loci was associated with those regions found in (i); iii) each mutation was weighted based on the copy number for the loci it lied within. The statistical software package R v3.1.3 [53], in particular, the KernSmooth (ks) package v1.9.4 [54] (hpi function) was used to calculate a data-derived univariate kernel density "plug-in" bandwidth [55]. The package R.NET v1.6 [56] was utilized to provide an application programming interface from .NET to R and the ks package.

Data storage, retrieval and annotation
At the Myeloma Institute the MPMDB is a pre-existing, research-driven molecular profiling repository. The design philosophy of the database was to provide a centralized information architecture for interfacing additional custom tools (e.g., iCloneViz) via various abstractions. The MPMDB provides many other features to facilitate the analysis of complex next generation sequencing (NGS) data, including extensive extract-transform-load (ETL) capabilities for data cleaning as well as data association and integration with various knowledge databases.
The integration of various modality datasets (aka tables or relations), which allow for the visualizations of iCloneViz are now described (N.B. not all attributes are defined for brevity). Standard database related theory and relational algebra notation are used [57]. A diagram titled, Multi-Omic Relational Integration (Additional File 1) illustrates the relations between the various relational algebra equations.
Let the schema of a relation be denoted as A(a 0 , a 1 , ..., a n ), where A is the relation's name and each a i (0 ≤ i ≤ n) is a name or identifier of an attribute within A (the domain or data type for each attribute a i are excluded for brevity ). Let: i. W be the resulting data from a whole exome sequencing experiment, that is, the SnpEff annotated results from the variant-calling application Strelka; ii. M be the data resulting from a methylation experiment; iii. C be the copy number data computationally inferred from a WES experiment using ExomeCNV; iv. H be the raw count (unnormalized) data for each gene in an RNA-seq experiment using the htseqcount function from the HTSeq application; v. RV be the data resulting from RNA-seq variant calling using the Haloptype Caller (GATK); vi. R be RNA-seq transcript quantification data (FPKM) following transcriptome reconstruction by Cufflinks; vii. K be a list of key genes representing cancer drivers and Variants of Uncertain Significance (VUS) the user is interested in labeling in each visualization; and viii. P be patient and experimental meta-data.
The schema of each of the aforementioned relations to be integrated is defined as follows: Let ⋈ A.a=B.b denote the theta-join (in particular the equijoin) between relations A and B on attributes a and b respectively. Let L A.a=B.b denote the left-outer join between relations A and B on attributes a and b respectively. Let π (a 0 ,...,a n ) denote the projection operator over attributes (a 0 ,...,a n ). Finally, let s E (A) denote the selection operator, where A is the relation to be selected from and E is the conditional expression used for selecting tuples/rows from relation A. Given these definitions, and assuming each relation is selected for the experiment that is to be integrated, the final integration of the various multi-omic datasets can be defined as: Where IntegratedRelation denotes the final integrated relation. Given this final relation (and the key gene list K to visually label), aggregations and selections can be performed to filter data and encode all visual aspects within iCloneViz. Some of those aggregations and selections are now defined for later reference: Where W i denotes s W.ExperimentId=i (W) and ∩(a 0 ,...,a n ) is the intersection operator based on common attributes (a 0 ,...,a n ).
Where Glyph = [1: Only WES Data, 2: WES + RNA Data], is a computed attribute on IntegratedRelation indicating what type of data is available (not null) in each tuple of IntegratedRelation.ˆR Whereŵ is a tuple within relationŴ from Eq. (11).
Pseudocode describing the algorithms of iCloneViz can be found in the appendix.

Results and discussion
This study illustrates the integrated analysis of temporal-based clonal dynamics from a patient with MM. Examination of Presentation and Relapse samples from purified bone marrow aspirates are compared and contrasted using a bioinformatic approach named iClone-Viz, which has been enhanced and redesigned to perform clonality analysis utilizing an integrative genomic (DNA, RNA, methylation) methodology, not previously possible. All graphs and visualizations in this study were generated by iCloneViz, unless indicated otherwise. It has been established that there is intraclonal heterogeneity at the level of single nucleotide variants (SNVs) in myeloma [58]. As a means to study the genetic heterogeneity and Darwinian nature that is germane to cancer, clonal analysis and integrative genomics has been advocated [59,60].
iCloneViz makes available a global view of all mutational events in a WES experiment. This is illustrated in Additional File 2. Subfigure A corresponds to the Presentation and B to the Relapse sample. The x-axis contains an ordered list of chromosomes (1-22, X, Y). Each chromosomal region is sized by the number of base pairs it contains. The y-axis is organized by variant allele frequency (VAF). Each variant is a point on the plot and the color scale indicates the depth of coverage (depth). A global view of mutations on a chromosomal basis is provided by these plots. A higher degree of mutations are evident on chromosomes eight and 14 for both Presentation and Relapse. Chromosome 18 contains more mutations in the Relapse sample vs. Presentation. Global views may provide rapid insights, especially as newly discovered types of complex genomic rearrangements, such as chromothripsis [61], katageis and the underlying roles for APOBEC and similar family enzymes [62], have their signatures or motifs become more understood.
A scatter plot of paired samples from Presentation and Relapse are profiled in Additional File 3. The x-axis contains the Presentation and y-axis the Relapse, and both axes are ordered by VAF. Key genes are labelled. Color discriminates paired mutations (blue) from those appearing only in the Presentation (green) and Relapse (red). A summary illustration of the mutational landscape is provided by this graphic, namely what is common and different between the two samples. There are a noteworthy fraction of shared mutations but also a reasonable amount of variants unique to each sample.
A visual exploration of integrated mutational data revealing the evolutionary trajectories is provided by iCloneViz. Figure 1 demonstrates Gaussian kernel density plots with paired scatter plots for the Presentation (A) and Relapse (B) samples. The paired plots serve to profile the magnitude and frequency of DNA mutations found in these tumor samples. The x-axis for each graph is VAF. The output of the kernel density function is contained on the y-axis. Peaks on the kernel density plots may infer dominant clones and subclones. A measure of relative abundance (Depth) [15] is contained on the y-axis of the scatter plot. Key genes are labelled, and two glyphs are used to signify if a mutation was found only in WES (blue circle) or found in both WES and RNA-seq (red star). A level of heterogeneity across the Presentation and Relapse samples is evident by the distinct peaks seen in the two kernel density plots.
iCloneViz has a number of filtering options. Figure 2 shows a series of kernel density plots with associated scatter plots for the key genes found in the Presentation (A) and Relapse (B) samples. The units associated with the × and y axes are unchanged. The mutational glyph assignments are also without change. Distinct peaks are evident in both samples indicating heterogeneity.
The temporal DNA-based mutational dynamics and clonal evolution observed in the Presentation and Relapse samples are shown in Figure 3A. This is an extension of the iCloneViz analysis. The Presentation sample was obtained when the patient was initially diagnosed with MM in year 2010. It contains 89 mutations that passed filtering criteria and was found to have a Shannon Diversity Index (SDI) of 3.89. There are five mutations classified from the key genes (KG) group and are: DNA2, MYC, SETD2, TET2, and TNFRS11A.
An evolutionary selection event occurred later in year 2010, as a result of the patient receiving definitive therapy for MM. The patient did well until year 2013 when a relapse occurred and a new bone marrow aspirate was obtained. Comparing the Presentation to Relapse reveals that all the key genes survive, including the amplified MYC oncogene, which also contains a VUS missense mutation. In addition three more mutated genes are gained including the histone methyltransferase MLL3, along with the KRAS and MTOR oncogenes. Gaussian kernel density plot and a paired scatter plot, for Presentation (A) and Relapse (B) samples, for all mutations found within the filtering criteria. The x-axis for all plots is the variant allele frequency (VAF). A Gaussian kernel density estimator is used to approximate the mutational density (weighted by copy number) of the mutations plotted in the upper subfigures. Lower subfigures (scatter plots) plot each mutation found (with key genes labelled). The y-axis of each lower subfigure is the depth of coverage. Mutations in the scatter plot are given a glyph that indicates the level of multi-omic data available for the given mutation, blue circle (WES only) and red star (WES and RNA-seq) A higher number of mutations (122) are found in the Relapse sample. It also has an increased SDI (4.25), which implies more uncertainty or randomness in the process, from an information theory viewpoint [45]. From a cancer biology view, it indicates a progression of disease or diversification in the mutational landscape. Table 1 lists the summary information for the Presentation key genes, and Table 2 for the Relapse key genes. Columns are Gene for gene symbol, DNA Mutation, AA Change for the corresponding amino acid change, CN for copy number, DP for depth, AF for allelic frequency, RNA Mutation to indicate expressed mutations, and Notes that contains the gene class. Noted are three mutations in the Presentation group (MYC, SETD2, TET2), which have their DNA mutations also found to be expressed in mRNA. The Relapse key genes also contain three DNA variants found to have their mutations expressed in RNA, and include the still amplified and mutated MYC oncogene, SETD2, and the MTOR oncogene that contains a hotspot mutation (p.Val2006Leu). Figure 3B illustrates the temporal RNA-based mutational dynamics and clonal evolution. This is also an extension of the iCloneViz analysis. The MYC oncogene and SETD2, in addition to having their DNA mutations found in mRNA, have two transcripts in the Presentation that are also found in the Relapse sample. SETD2 also has two new transcripts appearing in the Relapse sample. CUFF.29743.3 is a novel transcript due to a new exon. However, this exon was found to have only one supporting read by IGV, and thus not seriously considered to be viable. The TET2 transcripts do not survive and are listed in the "Non Surviving Mutations" box along with passengers. MTOR, which was a new DNA mutation found in the Relapse sample, has two viable mRNA transcripts now appearing in the Relapse. Table 3 lists the expressed  RNA mutated key gene transcripts for the Presentation sample and Table 4 for Relapse. Table 5 lists the DNA methylation of tumor suppressor genes (TSGs) for the Presentation sample passing iCloneViz filtering criteria from~485,000 possible candidates. The filtering criteria consisted of: i) Avg-Beta ≥ 25%, ii) "Promoter Associated" regulatory feature, and iii) being from a CpG Island. The methylation data set was integrated with RNA-seq to directly associate the raw (unnormalized) RNA Read Counts for the TSGs identified by filtering criteria. The majority of TSGs in the Presentation (Table 5) and Relapse (Table 6) have zero or very low read counts, indicating a lack of mRNA expression and thus no protein production. Loss of TSG(s) is associated with both the onset and progression of many cancers [63,64].
Why did the patient's cancer recur following definitive therapy? Does the experimental information from DNA, RNA, and methylation analyses provide answers or insights concerning the relapse? Regarding the Presentation sample, associating the findings from Figure 1A, and Figure 2A, the founder clone contains an amplified and mutated MYC oncogene. Table 1 reports a missense mutation in MYC having the base substitution of c.226G>A, resulting in alanine being replace by threonine at position 76 (p.Ala76Thr) in the amino acid chain. The copy number is 3 (amplified), depth of coverage is 207 and the allelic fraction is 31. This mutation is also expressed in mRNA. Additional File 4 shows a MYC lolliplot diagram made by the Protein Paint [65] application  Figure 3A lists changes in allelic frequency, depth and copy number for the Presentation sample. Also indicated is whether or not the DNA mutation was expressed in the sample's associated mRNA.

N HMT
A tabular version of the clonal dynamics showed in Figure 3A lists changes in allelic frequency, depth and copy number for the Relapse sample. A tabular version of the clonal dynamics as shown in Figure 3B, lists changes in depth, allelic frequency and FPKM in the Presentation sample. Abbreviations: DP (Depth) and Allelic Frequency (AF). A tabular version of the clonal dynamics as shown in Figure 3B, lists changes in depth, allelic frequency and FPKM for the Relapse sample.
showing the possible missense mutations, depending on splicing, in the coding region. Table 3 shows the MYC transcripts with good depth of coverage and FPKMs. Finally, Table 5 shows four TSGs (BRCA1, CEBPA, MSH2, SOCS1) with a CpG Island, promoter associated regulatory feature being methylated by more than 25% and having RNA read counts of zero, indicating these genes have been silenced. Concerning the Relapse sample, integrating findings from Figure 1, B and Figure 2, B shows that the founder clone continues to contain the mutated MYC oncogene. Table 2 reports the same MYC DNA mutation, the copy number is still amplified at three, DP is 257, allelic frequency is 33%, and the mutation is present in mRNA. The MTOR oncogene is new, and contains a hotspot mutation (c.6016G>C, p.Val2006Leu), which is also found in RNA. Figure 3B shows survival of MYC related transcripts from Presentation to Relapse, and the transcripts for MTOR. Table 4 shows the two MYC transcripts with good depth of coverage and FPKMs, and the MTOR transcripts have reasonable values. Table 6 displays six TSGs (AXIN1, BRCA1, MEN1, MSH2, RUNX1, SOCS1) with CpG Island promoter associated regulatory feature being methylated by more than 25% and having RNA read counts of zero, indicating a progression of gene silencing. The KRAS oncogene appears in the Relapse but the mutation was not expressed in RNA (Table 2) therefore would not be considered for therapeutic targeting.
Deregulated expression of MYC is a hallmark feature of cancer and serves to uncouple growth factor dependent proliferation [66], and may occur through a variety of mechanisms (e.g., gene amplification, translocation, focal enhancer amplification, or constitutive activation of Listed are the DNA methylation of tumor suppressor genes (TSGs) passing iCloneViz filtering criteria from~485,000 possible candidates, for the Presentation sample. The filtering criteria consisted of: i) Avg-Beta ≥ 25%, ii) "Promoter Associated" regulatory feature, and iii) being from a CpG Island. The methylation data set was integrated with RNA-seq to directly associate the raw (unnormalized) RNA Read Counts for the identified TSGs. Listed are the DNA methylation of tumor suppressor genes (TSGs) passing iCloneViz filtering criteria from~485,000 possible candidates, for the Relapse sample. Filtering criteria and integration with RNA-seq is the same as is reported for the Presentation sample (Table 5).
upstream signalling pathways) [67]. MYC over-expression occurs in~30% of human cancers and commonly is a harbinger for a poor clinical outcome, aggressive biological behavior, increased chance of relapse and advanced stage of disease at initial diagnosis [68]. Studies in transgenic mouse models have identified MYC inactivation leads to prompt tumor regressions [69].
There are now a number of new agents in clinical trials for targeting MYC, for instance the BET bromodomain inhibition [70,71], and this should be considered for subsequent therapy. Additionally, the Relapse sample contained a hotspot mutation in MTOR and targeting with a "rapalog" (rapamycin and its analogs) should also be considered. Precision medicine and therapeutic combinations with new and more targeted agents are challenging and an active area in clinical trials and translational research [72,73]. Essential to these efforts are advanced bioinformatics with abilities to integrate multi-omic datasets, combat cancer heterogeneity via clonal and evolutionary approaches, and ultimately provide clinical utility thru an improved understanding of the disease process at hand. This was demonstrated in this study.

Conclusions
Illustrated in this study has been the temporal-based analysis of integrated clonal dynamics of a single patient with MM by examining the Presentation and Relapse tumor samples. This involved the visualization and quantification of variant/mutational dynamics in the context of integrated evolution of WES, RNA-seq, and DNA methylation. Subpopulations of mutations will evolve over time due to natural selection events related to cell intrinsic or micro-environmental factors, as well as selection events induced therapeutically. Selection events eliminate some mutations and provide a survival advantage to others. iCloneViz provides global views of mutational events as well as rapid data integration of i) WES, ii) RNA-seq, and iii) DNA methylation. This results in an enriched picture with a more focused specificity of mutational events, and provides evidence and more confidence for therapeutic assignments.
Heterogeneity is found in many cancers and limits aggregate approaches for scientific and clinical utility. MM is known as a heterogeneous cancer with a complex molecular landscape. Clonality and integrated genomics has been advocated as means to combat heterogeneity [59,60]. The illustrated novelty and precise contribution of iCloneViz is the ability to perform integrative clonality analysis utilizing data sets from: i) WES, ii) RNA-seq, and iii) DNA methylation. To date, there is no other software based clonality tool available (commercially or open source) that can perform integrative clonality analysis. Given the repeated findings of multiple TCGA studies reporting heterogeneity with a complex mutational landscape [14,74], additional analytical approaches are needed; since it is evident that DNA analysis is necessary but not sufficient. Our approach, which builds on previous work now employs clonality and integrative genomics, both of which have been recommended to combat heterogeneity.
In this study, observed in all serial MM samples was the presence of an amplified MYC oncogene species with a VUS missense mutation. MYC is a transcription factor and master regulator of~15% of all gene expression, and also functions to regulate chromatin structure. A permanent remission or cure was not achieved despite definitive therapy. The dominant genetic alterations in the founder clone was never targeted specifically for therapy thus cure or a lasting remission was unlikely. For major advances in cancer management, a systematic approach to collect tissue samples at diagnosis, and serially at relapse(s), in order to profile the dynamic clonal evolution is critical.

Appendix iCloneViz pseudocode
The pseudocode for iCloneViz is now presented, and is described and documented using a PDL (Program Design Language) structure as described in Pressman [75]. A C-style notation was utilized. A flow diagram of the pseudocode is shown in Additional File 5. The multi-omic relational integration is illustrated in Additional File 1.
• button_selection: An instance of the button that was clicked.
• button_click_filter: Boolean indicating whether an instance of the button labeled "Filter" was clicked.
• button_click_show_tsg_methylation_table: Boolean indicating whether an instance of the button labeled "Show TSG Methylation Table" was clicked.
• button_click_show_wes_table: Boolean indicating whether an instance of the button labeled "Show WES Table" was clicked.