The diagnostic application of RNA sequencing in patients with thyroid cancer: an analysis of 851 variants and 133 fusions in 524 genes

Background Thyroid carcinomas are known to harbor oncogenic driver mutations and advances in sequencing technology now allow the detection of these in fine needle aspiration biopsies (FNA). Recent work by The Cancer Genome Atlas (TCGA) Research Network has expanded the number of genetic alterations detected in papillary thyroid carcinomas (PTC). We sought to investigate the prevalence of these and other genetic alterations in diverse subtypes of thyroid nodules beyond PTC, including a variety of samples with benign histopathology. This is the first clinical evaluation of a large panel of TCGA-reported genomic alterations in thyroid FNAs. Results In FNAs, genetic alterations were detected in 19/44 malignant samples (43 % sensitivity) and in 7/44 histopathology benign samples (84 % specificity). Overall, after adding a cohort of tissue samples, 38/76 (50 %) of histopathology malignant samples were found to harbor a genetic alteration, while 15/75 (20 %) of benign samples were also mutated. The most frequently mutated malignant subtypes were medullary thyroid carcinoma (9/12, 75 %) and PTC (14/30, 47 %). Additionally, follicular adenoma, a benign subtype of thyroid neoplasm, was also found to harbor mutations (12/29, 41 %). Frequently mutated genes in malignant samples included BRAF (20/76, 26 %) and RAS (9/76, 12 %). Of the TSHR variants detected, (6/7, 86 %) were in benign nodules. In a direct comparison of the same FNA also tested by an RNA-based gene expression classifier (GEC), the sensitivity of genetic alterations alone was 42 %, compared to the 91 % sensitivity achieved by the GEC. The specificity based only on genetic alterations was 84 %, compared to 77 % specificity with the GEC. Conclusions While the genomic landscape of all thyroid neoplasm subtypes will inevitably be elucidated, caution should be used in the early adoption of published mutations as the sole predictor of malignancy in thyroid. The largest set of such mutations known to date detects only a portion of thyroid carcinomas in preoperative FNAs in our cohort and thus is not sufficient to rule out cancer. Due to the finding that variants are also found in benign nodules, testing only GEC suspicious nodules may be helpful in avoiding false positives and altering the extent of treatment when selected mutations are found. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0849-9) contains supplementary material, which is available to authorized users.


Introduction
With the wide adoption of next-generation sequencing (NGS) technologies, the number of genome variants and fusions detected in disease tissues has increased dramatically. The Catalog of Somatic Mutations in Cancer (COSMIC) database contains >100,000 somatic mutations found across >400,000 tumors [1]. The publicly accessible ClinVar database contains >170,000 variant submissions, and >118,000 of these are with clinical interpretations [2]. The Cancer Gene Atlas (TCGA) project has generated extensive genomic data on >11,000 cases across 34 different cancers, including thyroid cancer [3]. With the discovery of so many variants in tumor tissues, expectations of their utility in diagnosing disease are high. In a recent statement by the American Thyroid Association [4], the suggestion is made that the expansion of gene sets informed by TCGA variant discovery "will likely improve the diagnostic accuracy of molecular analyses of thyroid cytology specimens and offer promise for personalizing surgical therapy, with the potential for cost and risk reduction in the diagnostic and therapeutic approaches to treating DTC (differentiated thyroid cancer)." However, the discovery of variants of uncertain significance (VUS), or variants found in benign conditions may temper these expectations.
We and others have studied the genomic landscape of thyroid neoplasms in the pursuit of diagnostic tools for managing patients with thyroid nodules. Several well-studied mutations in genes such as BRAF, N-, H-, K-RAS, and translocations of RET/PTC and PAX8/PPARG have been evaluated for use in the clinical decision-making of these patients. In recent years, a growing list of amino acid substitutions, deletions, insertions, frameshifts, truncations, extensions, gene-pair fusions, and other complex changes have been described in thyroid cancer [5][6][7][8][9][10][11][12]. The effort implemented by TCGA identified >500 novel somatic variants in papillary thyroid carcinoma (PTC), the most common type of thyroid cancer. While TCGA used malignant tumor vs. matched normal surgical tissue as the comparison for variant discovery and analysis, a comparison of malignant vs. benign thyroid neoplasms was not conducted, and an understanding of the frequency of these variants in benign thyroid nodules as well as their frequency in fine needle aspiration (FNA) biopsies is lacking. Here we used a deep RNA sequencing approach to test the hypothesis that diagnostic accuracy of thyroid cancer would be improved by testing a large number of TCGA and literature curated variants.
The management of thyroid nodules larger than 1 cm generally includes ultrasound-guided FNA for clinical decision-making. Evaluation of FNA by cytology leads to definitive benign or malignant results in 64% of cases [13] with both high negative predictive value (NPV) >95%, and high positive predictive value (PPV) >98% [14][15][16][17]. Thirteen percent of cases are non-diagnostic [13].
Among asymptomatic thyroid nodules, a benign cytological diagnosis typically averts the need for diagnostic surgery. However, 23% of nodules cannot be definitively diagnosed [13] and are instead grouped into one of three cytology indeterminate categories (Bethesda III, IV, V) [14] according to defined microscopic features. As most cytology indeterminate thyroid nodules are pathologically benign, consensus has arisen on the value of high sensitivity/high NPV value tests to "rule-out" cancer and safely avoid unnecessary diagnostic surgery [18].
The advent of molecular testing with an RNA gene expression classifier (GEC) to rule-out malignancy with high NPV has greatly reduced the number of diagnostic thyroid surgeries of cytology indeterminate nodules, as a majority of these are indeed benign [19][20][21][22][23][24][25][26][27]. Others have approached this problem using gene panels whose variants have been associated with malignancy [28,29]. Whether these rule-in tests (associated with high specificity/high PPV) could be improved to have high sensitivity/high NPV for detecting malignancy (and thus become both a rule-in and rule-out test) has become a topic of interest. A panel measuring variants in seven genes failed to achieve the high sensitivity and NPV required to safely rule-out cancer, as more than half the cancers did not harbor an identified variant [29]. Panels measuring an increasing number of variants have been tested, but not validated in blinded or comprehensive studies, so their ability to safely rule-out cancer has yet to be determined.
The large-scale effort employed by TCGA, which generated exome sequencing, copy number and transcriptional profiling data on a large number of surgical tumor specimens, was a landmark study in elucidating the genomic landscape of many important cancers [3]. One of those studied was PTC, the most common thyroid malignancy [5]. Other types of thyroid cancer were not analyzed, therefore we extended the TCGA effort, in part, by evaluating other subtypes of thyroid cancer and by studying clinically relevant biopsies (thyroid FNA) collected pre-operatively. Here we report for the first time the prevalence of many somatic variants reported by TCGA and others across a wide spectrum of thyroid neoplasms, including benign lesions, in both preoperative FNA and surgical tissue. We studied FNA samples that were collected as part of a prospective, multicenter, blinded cohort, subjected to expert histopathology review [19]; these samples were subsequently blinded and then tested for known somatic genetic alterations using the largest panel of variants yet described. We demonstrate that this panel of 851 somatic variants and 133 somatic fusion gene-pairs has the power to detect only a portion of thyroid carcinomas in preoperative FNAs, raising doubt that such panels can safely rule-out cancer and avoid unnecessary diagnostic surgery, as has been suggested [28,4]. Importantly, some of these genetic alterations were also commonly found in histopathology benign tumors, suggesting that as variant panels grow in size, an increasing fraction of benign nodules will be falsely identified as cancerous unless careful curation of variants based on their frequencies in benign and malignant nodules is incorporated into clinical decision-making.

Sample cohorts
FNAs were prospectively collected from 88 patients as part of a previously reported, blinded multi-center study (VERA FNA) [19], or from a multi-center consecutive series of 110 de-identified, pre-operative specimens with remnant nucleic acids archived by Veracyte's CLIA laboratory (CLIA FNA). An additional 85 post-surgical snap-frozen tissues were procured from multiple clinical centers across the US, including samples obtained from Asterand (Detroit, MI), Cureline (South San Francisco, CA), Proteogenex (Culver City, CA) and CHTN (http://www.chtn.nci.nih.gov/). All clinical protocols were approved by both central and intuition-specific investigational review boards, and when applicable patients provided written informed consent. All FNAs were blindly evaluated by a cytopathologist and were categorized according to the Bethesda System [14].
Blinded histopathology reference standard labels were determined by a panel of experts (VERA001 FNA) [19], or by blinded secondary review to confirm vendor labels (R. Monroe, tissue cohort) independent of all molecular testing. After secondary review 63/85 (74%) of tissues had histopathology truth labels, while 22 were designated as follicular neoplasm (FN) with uncertain histopathology (Histo U). CLIA FNA were categorized as GEC Benign or GEC Suspicious, according to their original Afirma GEC test result. RNA was extracted using the AllPrep Micro kit (Qiagen) per the manufacturer's instructions. Sequencing libraries were prepared and pooled in groups of 16 using 10-20 ng of RNA per the manufacturer's instructions using the TruSeq RNA Access kit, followed by sequencing with a NextSeq 500 instrument (Illumina) to an average depth of 50 million (from 25 million fragments) using 75bp paired-end reads.

Detection of genetic alterations
The genomic location of all variants and fusions was extracted directly from their source publications [6,5,28,29,7] or determined using reported gene ID, amino acid change and mutation annotation files (MAF) generated by Broad Institute, British Columbia Cancer Research Centre, or Baylor University and obtained through the TCGA public access portal. When this information was not available, variants were mapped to their genomic location using COSMIC, and/or cBioportal databases. The initial panel was comprised of 987 distinct genetic alterations, including 854 somatic variants and 133 somatic fusion gene-pairs comprising 524 genes, however 3 variants could not be captured by our platform. All genetic alterations included in the final panel (n=984) are known to result in protein level changes, including amino acid substitutions, deletions, insertions, frameshifts, truncations, extensions, and other complex changes. Copy number alterations were not measured. While this study focused exclusively on curated somatic gene alteration reports [6,28,7,29,5], it is possible that some mutations are present in nodules due to germline variation. Samples were scored positive using the variant and fusion detection pipelines described below.
Reads were aligned to the reference genome, build 37, via STAR v2.4.1b [30] using state of the art best practices for variant and fusion detection as follows; splice-aware alignments used annotations of all known splice junctions from Ensembl 75. De-duplication was carried out with picard v1.123 MarkDuplicates and aligned reads were processed in GATK v3.3. Variants were detected using the GATK Haplotype Caller [31]. Low quality calls were filtered out with GATK VariantFiltration. Fusions were detected from STAR outputs using the Bioconductor package chimera [32] and all filtering settings turned on. Asymptotic standard logit intervals were used to calculate positive predictive values (PPV) and negative predictive values (NPV) along with their associated confidence intervals as previously described [33]. Whenever a zero occurs in a twoby-two contingency table, this method returns estimated adjusted logit intervals that are nonzero.

Results
Our analysis of genetic alterations using genome-wide transcriptome sequencing included 133 fusion pairs and 854 somatic variants. Of these, 133/133 (100%) of fusions and 851/854 (99.6%) of variants were detectable by our RNA sequencing effort. Only 3 variants could not be captured, because they are located at two genomic sites within the non-transcribed promoter region of the TERT gene. Hence, our results center on a final panel of 984 distinct genetic alterations spanning 524 genes. We measured sequencing quality metrics in this study, capturing a median of 48 million paired-end reads and 37 million aligned reads per sample, and computed the range of variants covered at various read depths (Additional file 1). We measured the median read-depth per variant across the pathology subtypes and found variation in coverage, consistent with biological variation in expression of transcripts harboring the genetic variants (Additional file 2).
Given this biological variation among subtypes, it is important to assess individual level coverage.
We found 93% of the 851 variant locations have at least one sample with 20-fold coverage (Additional file 1). Variants discussed in this paper were identified using standard variant calling methods (GATK) and for the sake of comparison, additional variants identified at a more sensitive but less specific setting using another variant calling method (SamTools) are reported in additional file 3.
We evaluated an FNA cohort with surgical truth and found that 19/44 of histopathology malignant samples scored positive for at least one of the genetic alterations, corresponding to 43% sensitivity ( Figure 1, Table 1). Histopathology benign FNAs scored positive in 7/44 cases, corresponding to 84% specificity ( Table 1). The prevalence of malignant samples in this study cohort was 50%, resulting in estimates of NPV of 60% (53-66, CI), and PPV of 73% (56-85, CI). The performance of the molecular panel was also evaluated for each Bethesda cytology category ( In the post-surgical tissue cohort, 19/32 histopathology malignant samples had a detectable genetic alteration (59% sensitivity), while 8/31 histopathology benign samples scored positive, a specificity of 74% ( Figure 2). A single GNAS variant was observed to occur in a follicular adenoma.
As this variant has been considered to be a marker for benign nodules, we excluded it from performance calculations to avoid falsely lowering the estimate of specificity. The frequency of malignant samples in this small cohort was 51% and its NPV was estimated at 64% (53-74, CI), while PPV was 70% (55-82, CI).
We also studied a consecutive FNA series (n=110) processed through our CLIA laboratory. This clinically representative cohort was comprised primarily of cytology indeterminate samples, and was evaluated to characterize the frequency of somatic genetic alterations in a broad, multicenter, US population ( Figure 3). In CLIA FNA with AUS/FLUS cytology, (Bethesda III) 9/52 (17%) of samples scored positive for a genetic alteration, similar to FNAs with FN/SFN cytology (Bethesda IV) where 4/24 samples harbored a variant (17%). Since histopathology truth was not available for any CLIA FNA, we could not compute sensitivity, specificity, NPV or PPV.
We compared the results obtained from testing for genetic alterations with results obtained from testing with the Afirma GEC, an RNA-based molecular classifier designed to rule-out malignancy with 90% sensitivity and 94-95% NPV [19]. The test reports either a GEC Benign or GEC Suspicious result. In a subset of 86 VERA FNA samples with truth labels and both GEC and genetic alteration results available, testing for genetic alterations alone identified 42% of malignant samples, compared to the GEC, which identified 91% of the true malignant samples (Figure 4). These results are consistent with previously published reports confirming high sensitivity of the GEC [19,34,[21][22][23][24][25][26]. In this cohort, testing for genetic alterations resulted in 84% specificity, compared to the GEC, which resulted in 77% specificity, higher than the 52% reported in a large, prospective, blinded multicenter study [19]. The proportion of detected genetic alterations was far higher in for the BRAF V600E mutation, and in surgical tissues with these same diagnoses, 7/12 (54%) harbored this mutation (Additional file 5). Also frequently mutated was TSHR, found in 6/75 (8%) of histology benign samples and only once in a malignant follicular carcinoma (Figures 1 and 2).
Given the rarity of most variants measured in this cohort, we investigated the effect of sample size and its impact on mutation rate estimation. At a true underlying mutation rate of 3%, 95% confidence intervals are large for sample sizes below 100 (Additional file 6), suggesting that larger cohorts would be required to achieve more precise estimates. However, the large proportion of malignant samples with no variant detected (50%), along with the reasonably high rate of benign nodules with a variant detected (20%) cannot be explained solely by an under-representation of these variants in a small sample size cohort. The variant detection rate observed in benign nodules limits the performance of the panel, in particular confers a specificity penalty that limits the panel's ability to attain both high sensitivity and high specificity.

Discussion
In this study we demonstrate that interrogation of a large number of somatic genomic alterations, as reported by TCGA [5,6] and others [28,7], is limited in its ability to detect cancer with high sensitivity when applied to a diverse set of thyroid carcinomas. Published data from TCGA shows 59.7% of their PTC tissue cohort harbored a BRAF mutation, similar to the 54% we found for the PTC tissues in our study. Despite this similarity, extension of our study to other subtypes and to FNAs reveals that of the histopathology malignant samples, 41% of tissues and 57% of FNAs scored negative for any of the 984 genetic alterations tested. These results are consistent with the observations of others [4] noting that a panel of the 17 most common genetic alterations [29] failed to detect 53% of carcinomas with Bethesda III or IV cytology. Despite significant overlap with the mutations evaluated here and in those of other smaller panels [35,36], a recent single-center study that was not blinded, reported both high sensitivity and high specificity measuring similar genetic alterations on FN/SFN [28] and/or AUS/FLUS samples [37].
While the difference in sensitivity may be partially explained by the measurement of very rare variants in these small cohorts, which is expected statistically to result in a wide range in performance, this would not explain the difference in specificity, as we detect many of the same variants in benign nodules. In addition, the impact of blinded versus unblinded histopathology truth labels, or local histopathology diagnostic trends among challenging neoplasms, may account for differing performance among similar genomic panels. Collectively, our results and those of other groups [38,29,36,35,39] do not provide sufficient evidence to support the notion that detection of known genetic alterations alone has sufficient power at this time to safely rule out malignancy in all thyroid subtypes. Additional studies in larger FNA cohorts are clearly needed to evaluate further the utility of mutational testing in ruling out malignancy.
This analysis of a broad cohort of expertly diagnosed thyroid neoplasms establishes for the first time that recent TCGA-reported oncogenic driver alterations are commonly found in nodules with benign histopathology. While we recognize that our sample cohorts are of modest size and that more accurate mutation-detection methods may be developed and applied in the future, the overall mutation frequencies observed in histopathology benign FNA (16%) and benign tissues (26%) suggest that the use of panels with increasingly larger numbers of discovered variants may not help rule out malignancy in thyroid nodules. Without proper curation, such panels may increase the frequency of variant detection in benign nodules and generate false positive results. Many markers detected in the current study have been reported to vary widely in their individual specificity to detect malignancy [39,29,40]. Unfortunately, variants with high specificity such as BRAF V600E or RET/PTC fusions are the exception rather than the rule. Somatic genetic alterations such as those found in NRAS, HRAS, and PAX/PPARG are known to occur in benign nodules [35,[41][42][43]. The long-term significance of these mutations in benign nodules has not been fully studied [44], and some have suggested that these represent pre-malignant lesions [45,46,36,35,29]. Despite this suggestion, no clinical evidence exists to demonstrate that today's mutation-positive, but histopathology-benign tumors have a clinically meaningful rate of developing into cancer, or a higher rate of malignant transformation than mutation-negative benign lesions [44]. Thus, no evidence exists that patients benefit from the identification and treatment of mutation-positive benign nodules. In contrast, there may be a role in identifying variants in GEC Suspicious nodules, as these genomic alterations may alter the extent of treatment when selected mutations are found.
Because many gene variants are also detected in our histologically benign nodules, an approach whereby any mutation-positive nodule is designated as "test-positive" has the risk of artificially and erroneously raising the frequency of suspected thyroid carcinoma, thus negatively impacting patients who could have benefited from a watch-and-wait approach. An additional concern is that a number of studies in the literature use mutation status to assign a diagnosis, bypassing blinded expert review to confirm that a mutation positive nodule was indeed histopathology malignant [36,47]. This chicken-egg conundrum can result in artificially high reported sensitivity and specificity, as truly benign nodules are declared malignant based on mutation status [36,29,47]. A recent report indicates that a diagnosis changes from benign to malignant in 30% of RAS mutated cases when the pathologist is aware that a mutation is present [47].
Our approach measures gene variants found only in expressed genes, and therefore has the advantage of capturing biological processes such as allele-specific gene expression and imprinting. For example, variants found in DNA may have little relevance to a disease if the gene harboring the variant is not expressed in the tissue being sampled. This could be due to general lack of expression in a particular tissue, or could be due to a parental allele-of-origin effect. Our finding that gene transcripts harboring these variants shows expression patterns that differ amongst thyroid subtypes is consistent with the power of this approach to capture potential biological processes (Additional file 2). Thus in contrast to measuring variants in DNA, our RNA transcriptome approach allows the detection of variants and fusions that have the potential to be biologically relevant and clinically meaningful.
As some genomic variants are modestly or highly associated with thyroid malignancy, there may be value to their detection when their presence would increase the extent of initial surgery from a hemi-thyroidectomy to a total thyroidectomy to avoid the need for a second completion thyroidectomy surgery amongst those with cancer. Alternatively, in seven studies evaluating a total of 730 patients whose thyroid nodules were tested with the highly sensitive RNA-based GEC, 49% were scored as benign [34,23,22,21,24,25]. These patients are considered for clinical follow-up in lieu of diagnostic surgery with high accuracy [26], a data-driven approach that spares unnecessary morbidity. The implementation of a genomic panel in patients destined for surgery may reduce the need for this second (completion) surgery [48], especially when positive for highly specific variants.

Conclusions
Our studies in this cohort show that evaluating thyroid nodules for genetic variants increases the risk of false positives without rescuing any GEC false negatives. Our data suggest that even when testing with a large variant panel alone, a negative mutation result does not reasonably exclude cancer. However, our data also indicates that when malignancy cannot be ruled out with variant panels, a GEC benign result can first be implemented to safely exclude patients without malignancy and patients with suspicious GEC results may then benefit from evaluation with a genomic variant panel, if this new information results in a change to clinical management. A majority of nodules with indeterminate cytology prove benign on post-operative histopathology.
Hence, tests that use genomic profiling to rule-out cancer with high sensitivity have demonstrated clinical utility in the management of cytology indeterminate nodules [19-23, 25, 24, 26]. Given the extremely low frequency of many of the variants and fusions tested here, accurate estimates of test performance will require the testing of many more patient samples and cohorts.