A benchmark for microRNA quantification algorithms using the OpenArray platform

McCall, Matthew N.; Baras, Alexander S.; Crits-Christoph, Alexander; Ingersoll, Roxann; McAlexander, Melissa A.; Witwer, Kenneth W.; Halushka, Marc K.

doi:10.1186/s12859-016-0987-8

Research Article
Open access
Published: 22 March 2016

BMC Bioinformatics volume 17, Article number: 138 (2016)

Abstract

Background

Several techniques have been tailored to the quantification of microRNA expression, including hybridization arrays, quantitative PCR (qPCR), and high-throughput sequencing. Each of these has certain strengths and limitations depending both on the technology itself and the algorithm used to convert raw data into expression estimates. Reliable quantification of microRNA expression is challenging in part due to the relatively low abundance and short length of the miRNAs. While substantial research has been devoted to the development of methods to quantify mRNA expression, relatively little effort has been spent on microRNA expression.

Results

In this work, we focus on the Life Technologies TaqMan OpenArray® system, a qPCR-based platform to measure microRNA expression. Several algorithms currently exist to estimate expression from the raw amplification data produced by qPCR-based technologies. To assess and compare the performance of these methods, we performed a set of dilution/mixture experiments to create a benchmark data set. We also developed a suite of statistical assessments that evaluate many different aspects of performance: accuracy, precision, titration response, number of complete features, limit of detection, and data quality. The benchmark data and software are freely available via two R/Bioconductor packages, miRcomp and miRcompData. Finally, we demonstrate use of our software by comparing two widely used algorithms and providing assessments for four other algorithms.

Conclusions

Benchmark data sets and software are crucial tools for the assessment and comparison of competing algorithms. We believe that the miRcomp and miRcompData packages will facilitate the development of new methodology for microRNA expression estimation.

Background

MicroRNAs (miRNAs) are a class of small (18–24 nucleotide) regulatory RNAs. They are essential regulators that act as translational repressors throughout many eukaryotic species [1]. Several thousand miRNAs have been described in humans and other species, although in practice only 350–400 are present at robust levels in mature cells and tissues [2]. MiRNAs are known to alter their expression levels in disease, malignancy, and cell stress [3] and exhibit tissue and cell-type specific patterns of expression [4, 5].

Many expression platforms, originally designed to quantify mRNA expression, have been adapted to globally assay miRNA expression including hybridization arrays, quantitative PCR (qPCR), and sequencing [6]. However, each of these approaches must overcome several challenges specific to miRNAs: short sequence length, low abundance of target molecules, and sequence homology between miRNAs. Comparative performance assessments are crucial to understanding the strengths and limitations of each approach to miRNA quantification. A group of investigators recently systematically evaluated 12 available miRNA platforms across 20 standardized control samples [7]. This study, called miRQC, established metrics to assay reproducibility, sensitivity, accuracy, specificity and concordance across the different methods. Although a single platform was not found to be uniformly superior, there was substantial variability in performance across assessments. For each of the platforms, performance depends on both the instrument and the algorithm used to convert raw measurements into expression estimates. For example, one platform assessed in the miRQC study was RNA-seq performed on an Illumina GAIIx instrument. The sample prep used the TruSeq Small RNA Prep Kit and results were aligned to the hg19 reference sequence allowing one mismatch, without further delineation of the alignment method [7]. A previous performance evaluation of miRNA expression arrays noted a strong dependency between technology and signal processing methodology [8]. More recently, we have demonstrated that different miRNA RNA-seq alignment algorithms produce different alignments, impacting the quality of the data [9]. We surmise that many miRNA expression platforms are not yet optimized to yield consistent and maximally accurate data.

Another platform evaluated in the miRQC study was the Life Technologies TaqMan OpenArray® system. This is a qPCR-based miRNA array platform that currently has coverage for 754 human miRNAs across two sets of primer pools. While qPCR is considered the gold standard for low-throughput measurement of gene expression, microarray- and sequencing-based platforms are preferable for most high-throughput applications. Given the relatively small number of common miRNAs, it is possible to use a qPCR-based platform to measure the expression of all abundant miRNAs in many tissues and cells.

The primary advantage of qPCR-based technologies is the ability to simultaneously amplify and quantify a target transcript over sequential PCR cycles. The greater the initial amount of the target transcript present in a sample, the more rapidly the target will reach a threshold at which it can be detected by fluorescence (e.g. from amplicon-associated intercalating dyes or freed, unquenched hydrolysis probes). As such, the raw data produced by qPCR-based technologies are fluorescence signal intensities captured at the end of each amplification cycle (typically 1–40). Analysis of these data typically begins by assigning a threshold cycle number to each amplification. These threshold cycles can then be used to estimate target abundance, either relative or in reference to values for a standard curve. For example, Life Technologies provides the ExpressionSuite software package, which uses the shape of the amplification curve to estimate a relative threshold cycle and corresponding expression estimate [10]. While a substantial number of software tools have been developed to estimate gene expression from raw amplification data [11–13], these focused on mRNA rather than miRNA targets. Whether these methods perform similarly when estimating miRNA expression is an area of ongoing research.
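As a rough illustration of assigning a threshold cycle, the sketch below uses a fixed fluorescence threshold with linear interpolation between cycles. It is a deliberately simplified stand-in and not the relative threshold (Crt) algorithm used by ExpressionSuite; the function name, the threshold value, and the toy readings are assumptions made for illustration only.

```python
def threshold_cycle(fluorescence, threshold):
    """Return the fractional cycle at which fluorescence first reaches threshold.

    fluorescence[i] is the reading at the end of cycle i + 1.
    Returns None when the curve never reaches the threshold (no detection).
    """
    if fluorescence and fluorescence[0] >= threshold:
        # Already at or above the threshold after the first cycle.
        return 1.0
    for i in range(1, len(fluorescence)):
        lo, hi = fluorescence[i - 1], fluorescence[i]
        if lo < threshold <= hi:
            # lo belongs to cycle i and hi to cycle i + 1; interpolate linearly.
            return i + (threshold - lo) / (hi - lo)
    return None


# Hypothetical readings for six cycles; the threshold is crossed between cycles 4 and 5.
toy_curve = [0.0, 0.0, 0.1, 0.3, 0.7, 1.5]
print(threshold_cycle(toy_curve, threshold=0.5))
```

A target present at a higher starting amount crosses the threshold earlier, which is why a lower threshold cycle corresponds to more copies of the target.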

The software presented in this manuscript provides tools to assess and compare the performance of methods to transform raw amplification data into expression estimates and determine optimal quality thresholds. While the miRQC study focused on comparing many different platforms, here we focus on a single platform but provide a much larger and more diverse data set for evaluation. We believe that the availability of these data and corresponding software will greatly accelerate the development of improved methodology for the OpenArray® miRNA platform. Furthermore, seamless integration with the R/Bioconductor [14, 15] suite of analysis packages will enhance the value of OpenArray® miRNA data. Therefore, we developed miRcomp, an R package to assess and compare microRNA expression estimation methods using a benchmark data set.

Methods

Experimental design

Selection of tissues

Two separate RNA pools were prepared by blending two tissues each: (1) kidney and placenta and (2) skeletal muscle and brain (frontal cortex). These sources of RNA were chosen based on our prior analysis of Agilent V3 miRNA array data that suggested this collection of tissues would capture a large number of microRNAs, including several unique to each sample, such as miR-133a for skeletal muscle and the chromosome 19 miRNA cluster for placenta [2].

The surgical pathology archives of the Department of Pathology at Johns Hopkins Hospital were used to obtain formalin fixed paraffin-embedded (FFPE) tissues from four distinct tissue sources. All tissues were verified as normal by review of tissue histology on an adjacent hematoxylin and eosin stained slide. These anonymized human samples were used based on an exemption from the Institutional Review Board of Johns Hopkins Hospital.

RNA extraction

We extracted RNA from FFPE sections of kidney, placenta, skeletal muscle, and brain using the AllPrep DNA/RNA FFPE protocol (Qiagen). Xylene was chosen for deparaffinization. Extra xylene and ethanol washes were performed, and DNase digestion was done on-column.

RNA quality control

Concentration of eluted RNA was assessed by NanoDrop. Due to the low quality of longer RNA molecules extracted from FFPE tissues, including the ribosomal RNAs, the presence of several ubiquitous and tissue-enriched small RNAs or miRNAs was confirmed by stem-loop reverse transcription quantitative PCR using 10 ng RNA per reaction. For example, miR-1 and miR-133a were enriched in skeletal muscle, miR-516b was enriched in placenta, and miR-200b was enriched in kidney (Additional file 7: Figure S1). RNA was stored at −80 °C.

Reverse transcription and pre-amplification

The kidney/placenta (KP) and skeletal muscle/brain (MB) mixtures were made by combining equal masses of kidney and placenta or skeletal muscle and frontal cortex RNA, respectively, and diluting to an equal concentration of 3.3 ng/μl. 10 ng of RNA was used as the input for reverse transcription using the A and B primer pools, following the Life Technologies OpenArray® protocol modification for low-concentration and FFPE RNA. Separate reverse transcription and pre-amplification reactions were performed for the Life Technologies MegaPlex Pools A and B primer pools, which reverse transcribe and pre-amplify specific microRNAs. Following pre-amplification, 30 μl from the A and B reactions for both KP and MB were mixed with 570 μl of 0.1x TE. Further dilutions and combinations of the KP and MB mixtures were then prepared. To keep the non-nucleic acid components equal after mixing KP and MB, we added a diluent C mix as needed (Fig. 1). The diluent C included the same proportions of RT buffer and Pre-Amp mix components as in the Life Technologies protocol-specified dilution of nucleic acid-containing post-pre-amp mixture. The final concentrations were 50, 40, 20, 10, 5, and 0.5 % for each sample (Fig. 1). The sample numbers (1–10 in Fig. 1) are used throughout the manuscript to refer to specific mixture/dilution sample types.

Life Technologies OpenArray® assay

Standard TaqMan® OpenArray® Human MicroRNA Panel, QuantStudio™ 12K Flex chips (part number 4470187) and other necessary reagents were provided by Life Technologies for this experiment. This panel contains 754 human miRNA sequences from miRBase v14, all of which have been previously functionally validated with miRNA artificial templates. For conversion of notation from miRBase v14 style to current miRNA style, the webtool miRiadne can be used [16]. The specially prepared post-pre-amp dilution mixtures were added to the sample plates and then loaded onto the chips using the Accufill robot following the standard protocols (Life Technologies part number 4461306 Rev. B). A modified MicroRNA.edt file, provided by Life Technologies, was used to extend the run from the standard 40 cycles to 46 cycles. This was done to make sure all amplifications went to completion, as we had noted that some microRNA amplicons had not reached their maximal intensity at 40 cycles, causing a slight left shift to lower Crt values in prior experiments. The additional cycles do not increase the detection limit of the system. Due to human error, three samples on one chip (the first replicate from sample types 1, 3, and 9) were run using the standard MicroRNA.edt file provided with the instrument. This did not have a noticeable effect on the expression estimates from any of the algorithms. Additional information on the TaqMan® OpenArray® MicroRNA Panels can be found in the technical manual (Additional file 6).

Expression estimation algorithms

There are a wide variety of algorithms available to estimate expression from qPCR amplification curves. To facilitate comparisons between these algorithms, we have applied many of these algorithms to our benchmark data set. The resulting expression estimates and quality scores are available as data objects in the miRcomp package.

Specifically, we provide expression estimates from the following methods:

LifeTech ExpressionSuite
4 parameter sigmoidal model (b4)
5 parameter sigmoidal model (b5)
4 parameter log sigmoidal model (l4)
5 parameter log sigmoidal model (l5)
Linear exponential model (linexp)

Additionally, the raw amplification data are available in the miRcompData package allowing researchers to easily generate expression estimates using other current or future algorithms.

Statistical assessments

The primary goal of the mixture/dilution experiment described above is to provide a benchmark data set with which to assess the performance of methods that estimate miRNA expression from qPCR amplification curves. Specifically, we propose assessments of accuracy, precision, data quality, titration response, limit of detection, and number of complete features. Each of these is described in detail below. To avoid any confusion due to naming conventions (expression estimates from amplification curves have been called Ct values, Crt values, and Cq values to name a few), we refer to the reported values as expression estimates or simply expression.

Quality scores

When estimating expression from amplification data, it is crucial for methods to provide both an expression estimate and a corresponding quality score. These quality scores are often used to filter, flag, or down-weight poor quality expression estimates in subsequent analyses. The qualityAssessment function in the miRcomp package allows one to examine the relationship between quality scores and expression estimates, the distribution of quality scores across samples, and the relationship between quality scores from two different methods.

Expression comparison

When comparing two methods, a natural starting point is to compare the expression estimates produced by each method. By examining the features and samples for which expression estimates differ substantially, one can better understand the strengths and limitations of each method. The expressionComp function in the miRcomp package allows one to examine the relationship between expression estimates produced by two different methods. Feature/sample combinations for which the expression estimates differ by more than a given threshold are flagged for further investigation.
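The flagging step can be sketched in a few lines. This is an illustrative sketch and not the expressionComp implementation; the function name, the dictionary layout keyed by (feature, sample), and the default threshold of 1.0 are assumptions.

```python
def flag_discrepancies(estimates_a, estimates_b, max_diff=1.0):
    """Return the (feature, sample) keys whose two estimates differ by more than max_diff.

    Each argument maps a (feature, sample) tuple to an expression estimate,
    with None standing in for an NA estimate. Keys with an NA in either
    method are skipped because no difference can be computed.
    """
    flagged = []
    for key in sorted(estimates_a.keys() & estimates_b.keys()):
        a, b = estimates_a[key], estimates_b[key]
        if a is None or b is None:
            continue
        if abs(a - b) > max_diff:
            flagged.append(key)
    return flagged


# Hypothetical estimates from two methods.
method_a = {("miR-1", "s1"): 20.0, ("miR-133a", "s1"): 25.0, ("miR-200b", "s1"): None}
method_b = {("miR-1", "s1"): 20.4, ("miR-133a", "s1"): 27.5, ("miR-200b", "s1"): 30.0}
print(flag_discrepancies(method_a, method_b))
```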

Complete features

A measure of the amount of readily usable data produced by a method is the number of complete features (here miRNAs). Complete features are defined as detected (non-NA expression estimate) and of good quality (above a given threshold) across all samples in a given experiment. The completeFeatures function allows one to assess a single method or compare two methods.
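The definition of a complete feature translates directly into code. The following is a minimal sketch, not the completeFeatures function itself; the data layout (one list of (expression, quality) pairs per feature) and the strict "greater than" comparison are assumptions.

```python
def complete_features(data, quality_threshold):
    """Return the features that are detected and of good quality in every sample.

    data maps a feature name to a list of (expression, quality) pairs, one per
    sample. An expression of None stands in for an NA estimate.
    """
    complete = []
    for feature, measurements in data.items():
        if all(expr is not None and qual > quality_threshold
               for expr, qual in measurements):
            complete.append(feature)
    return complete


# Hypothetical measurements across three samples.
toy = {
    "miR-1": [(20.1, 1.40), (20.3, 1.35), (20.0, 1.50)],     # complete
    "miR-133a": [(24.0, 1.30), (None, 1.45), (24.2, 1.40)],  # one NA estimate
    "miR-200b": [(29.0, 1.30), (29.5, 1.10), (29.1, 1.30)],  # one low-quality value
}
print(complete_features(toy, quality_threshold=1.25))
```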

Limit of detection

The limit of detection is an estimate of the smallest signal that can be reliably measured. We propose assessing the limit of detection in two ways: (1) examining the distribution of average observed expression stratified by the proportion of values within a set of replicates that are good quality, and (2) comparing the average observed vs expected expression in the two low input sample types (9 & 10). The expected expression for both low input sample types (9 & 10) can be calculated based on the pure sample types (1 & 5) or, in the case of the 0.01/0.01 dilution (sample type 10), it can be calculated based on the expression in the 0.1/0.1 dilution (sample type 9). Visual representations of these comparisons are produced by the limitOfDetection function.

The limitOfDetection function also reports several potential limits of detection based on each of the following comparisons:

1.
Average observed expression in the 0.1/0.1 dilution samples (sample type 9) vs expected expression based on the pure samples (sample types 1 & 5).
2.
Average observed expression in the 0.01/0.01 dilution samples (sample type 10) vs expected expression based on the pure samples (sample types 1 & 5).
3.
Average observed expression in the 0.01/0.01 dilution samples (sample type 10) vs expected expression based on the 0.1/0.1 dilution samples (sample type 9).

For each of these comparisons, we calculate the difference between the observed and expected expression estimates. To assess the limit of detection, we compute the expression threshold such that the median difference (between observed and expected) of all features exceeding that threshold is equal to a predetermined tolerance. The limitOfDetection function returns these potential limits of detection for each comparison and three tolerances (0.5, 0.75, and 1.00).
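One way to read this threshold search is sketched below. It is not the limitOfDetection implementation: the use of absolute differences, the choice of candidate thresholds (the distinct expected values), and the rule of returning the first threshold at which the median reaches the tolerance are all assumptions, since exact equality is rarely attainable with a finite set of features.

```python
from statistics import median


def limit_of_detection(expected, observed, tolerance):
    """Return the smallest expression threshold at which the median absolute
    observed-vs-expected difference of features above it reaches tolerance.

    expected and observed are parallel lists of expression estimates, where a
    higher value means fewer copies of the target. Returns None if the
    tolerance is never reached.
    """
    diffs = [abs(o - e) for e, o in zip(expected, observed)]
    for t in sorted(set(expected)):
        above = [d for e, d in zip(expected, diffs) if e > t]
        if above and median(above) >= tolerance:
            return t
    return None


# Hypothetical features whose disagreement grows as expression values increase.
expected = [20.0, 22.0, 24.0, 26.0, 28.0, 30.0]
observed = [20.1, 22.1, 24.2, 26.4, 28.9, 31.5]
for tol in (0.5, 0.75, 1.0):
    print(tol, limit_of_detection(expected, observed, tol))
```

In this toy example a looser tolerance yields a higher threshold, that is, a limit of detection at lower abundance.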

Titration response

The titration response is defined as the ability of a method to produce monotone increasing expression estimates in response to increasing amounts of input RNA. We consider sample types 2–4 and 6–8 as two separate titration series. In each of these series, one mixture component is held constant at 80 μl and the other is doubled twice from 16 μl to 32 μl to 64 μl. Because this response will depend heavily on the underlying expression of a given feature in each mixture component, the titration response is stratified by the difference in expression between the component being titrated and the component being held constant. For example, in the sample type 2–4 titration series, mixture component A is held constant and mixture component B is titrated. To assess the difference in expression between mixture components A and B, we use the expression estimates in the pure sample types: sample type 1 (pure A) and sample type 5 (pure B).
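A monotonicity check over a titration series might look like the sketch below. The function names and data layout are hypothetical, and the stratification by expression difference between components is omitted. The direction is left as a parameter because reported qPCR values typically fall as the amount of target rises, so which direction counts as a correct response depends on the scale in use.

```python
def is_monotone(values, increasing=True):
    """True if values are strictly monotone in the requested direction."""
    pairs = list(zip(values, values[1:]))
    if increasing:
        return all(a < b for a, b in pairs)
    return all(a > b for a, b in pairs)


def titration_response(series_by_feature, increasing=True):
    """Fraction of features whose estimates are strictly monotone across a series.

    series_by_feature maps a feature name to its replicate-averaged estimates,
    ordered by increasing amount of the titrated component (16, 32, 64 ul).
    Assumes at least one feature is supplied.
    """
    results = [is_monotone(v, increasing) for v in series_by_feature.values()]
    return sum(results) / len(results)


# Hypothetical averaged estimates for one titration series.
series = {
    "miR-1": [27.0, 26.1, 25.0],     # falls steadily as input rises
    "miR-133a": [30.2, 30.5, 29.8],  # not monotone
}
print(titration_response(series, increasing=False))
```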

Accuracy

To assess accuracy, we calculate the signal detect slope, defined as the slope of the regression line of observed expression on expected expression, for the two titration series (sample types 2–4 & 6–8). The ideal signal detect slope is one, representing agreement between observed and expected expression. The signal detect slopes are stratified by pure sample expression. A signal detect slope captures the average relationship between observed and expected expression; however, some features may perform well on average but be highly variable. In the plots produced, features are displayed in grey if the signal detect slope is not statistically significantly different from zero at the 0.05 significance level. As such, a grey point corresponding to a signal detect slope well above zero represents a particularly noisy (large residual variance) response.
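The signal detect slope is an ordinary least-squares slope, which can be sketched as follows. The function name and the toy values are assumptions, and the significance test used to colour points in the plots is not reproduced here.

```python
def signal_detect_slope(expected, observed):
    """Least-squares slope of observed expression regressed on expected expression."""
    n = len(expected)
    mean_x = sum(expected) / n
    mean_y = sum(observed) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    sxx = sum((x - mean_x) ** 2 for x in expected)
    return sxy / sxx


# Hypothetical titration series: perfect agreement gives a slope of one.
print(signal_detect_slope([24.0, 25.0, 26.0], [24.0, 25.0, 26.0]))
# A compressed response gives a slope below one.
print(signal_detect_slope([24.0, 25.0, 26.0], [24.5, 25.0, 25.5]))
```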

Precision

To assess precision, we calculate both the within-replicate standard deviation and coefficient of variation (the within-replicate standard deviation divided by the within-replicate mean). Both statistics are calculated for each set of replicates (unique feature/sample type combinations) that are of acceptable quality. For both summaries, the values are stratified by the average observed expression.
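Both precision statistics are simple to compute for a single set of replicates. The sketch below assumes the sample (n − 1) standard deviation and that low-quality replicates have already been removed; the function name is hypothetical.

```python
from statistics import mean, stdev


def replicate_precision(replicates):
    """Return (standard deviation, coefficient of variation) for one set of replicates.

    replicates holds the expression estimates for one feature/sample type
    combination and must contain at least two values.
    """
    sd = stdev(replicates)  # sample standard deviation (n - 1 denominator)
    cv = sd / mean(replicates)
    return sd, cv


# Hypothetical replicate estimates for one feature in one sample type.
sd, cv = replicate_precision([20.0, 22.0, 24.0])
print(sd, cv)
```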

Software

Software implementing the assessments described in this manuscript was written in the open-source statistical language R (v3.2.1) [14]. The R software package, miRcomp, and the R data package, miRcompData, are available as part of the Bioconductor project [15] (v3.2 and later), a collaborative effort to develop software for computational biology and bioinformatics. In addition to the primary functionality described above, the miRcomp package contains many additional options for customizable use of these assessment functions. These are described in the miRcomp package vignette (included here as Additional file 1).

Results

In the following, we compare two methods to generate expression estimates and quality scores from raw miRNA qPCR amplification data. The first method is an algorithm developed by Life Technologies and implemented in the ExpressionSuite software package. This software package produces estimates of expression (called Crt values) and a measure of quality (called the AmpScore). The second method is a four-parameter log-sigmoid curve-fitting algorithm [17] implemented as the default method in the qpcR R package [18] and referred to in this manuscript as simply qpcR. This open-source R package produces expression estimates by fitting a four-parameter log-sigmoidal curve to the amplification data and computing the point at which the second derivative of this curve is maximized (cpD2 method) [19]; it also reports a measure of quality (the R² from the model fit).
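To make the second-derivative maximum concrete, the sketch below evaluates a four-parameter log-logistic curve on a fine grid and locates the cycle at which a numerical second derivative peaks. It is an illustration, not the qpcR code: the parametrization shown is one common form of the four-parameter log-logistic model and is assumed here, the parameter values are invented, and the curve-fitting step is skipped entirely.

```python
import math


def l4(x, b, c, d, e):
    """Four-parameter log-logistic curve (assumed parametrization).

    c and d are the lower and upper asymptotes, e is the midpoint on the
    cycle axis, and b controls steepness (negative b gives a rising curve).
    """
    return c + (d - c) / (1.0 + math.exp(b * (math.log(x) - math.log(e))))


def cp_d2(b, c, d, e, first_cycle=1.0, last_cycle=40.0, step=0.01):
    """Cycle at which the central-difference second derivative of l4 is largest."""
    best_x, best_d2 = None, -math.inf
    n_steps = int(round((last_cycle - first_cycle) / step))
    for i in range(1, n_steps):
        x = first_cycle + i * step
        d2 = (l4(x + step, b, c, d, e) - 2.0 * l4(x, b, c, d, e)
              + l4(x - step, b, c, d, e)) / (step * step)
        if d2 > best_d2:
            best_x, best_d2 = x, d2
    return best_x


# Hypothetical fitted parameters for a curve with its midpoint at cycle 25.
print(cp_d2(b=-20.0, c=0.0, d=1.0, e=25.0))
```

The second-derivative maximum falls where the curve bends upward most sharply, which is earlier than the midpoint of the rise.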

Four additional algorithms (see Methods) were applied to the benchmark data set, and the resulting expression estimates and quality scores are available in the miRcomp R package. For clarity of presentation in this manuscript, we will focus on comparing two widely-used algorithms, the default algorithms from Life Technologies and the qpcR R package, in the following results.

Quality assessment

Given the interdependence between the expression estimates and quality scores produced by a method, we begin by examining this relationship for each method (Fig. 2). As one might expect, quality scores decrease as the expression estimates increase (recall that for qPCR-based technologies, a higher expression value corresponds to fewer copies of the target transcript). Another feature of note is that both methods occasionally fail to produce an expression estimate (denoted as NA in Fig. 2). However, while the qpcR method assigns all of these values fairly low quality (Fig. 2b), the Life Technologies method produces a substantial number of NA expression estimates with high quality scores (Fig. 2a).

When comparing two methods, it is also interesting to examine the relationship between the quality scores produced by each method (Fig. 3). Examination of this figure highlights regions of consensus high quality (upper right) and consensus low quality (lower left) as well as regions of disagreement between the methods (upper left and lower right). While there are relatively few data points that are estimated with a high quality AmpScore and low quality R², there are a substantial number of high quality R² and low quality AmpScore data points.

Taken together, Figs. 2 & 3 suggest quality thresholds of AmpScore = 1.25 and R² = 0.99. While we will use these thresholds throughout the remainder of this manuscript, all functions in the miRcomp package allow the user to set their own quality thresholds. Furthermore, for many functions, one can compare results from a single method using two different quality thresholds to examine the effect of changing the quality threshold on each assessment. Lastly, when comparing two methods, we typically restrict the assessment to data considered to be good quality by both methods. This provides the most direct comparison between the expression estimates produced by the two methods; however, for many of these assessments, the miRcomp package allows one to perform these comparisons using each method’s own quality assessment independently.
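Restricting a comparison to data judged good quality by both methods amounts to intersecting two filters. The sketch below uses the thresholds quoted above as defaults; the function name, the dictionary layout, and the strict "greater than" comparison are assumptions rather than miRcomp behaviour.

```python
def good_by_both(amp_scores, r_squared, amp_threshold=1.25, r2_threshold=0.99):
    """Return the (feature, sample) keys that pass both quality thresholds.

    amp_scores and r_squared map a (feature, sample) tuple to the quality
    score reported by each method for that data point.
    """
    shared = amp_scores.keys() & r_squared.keys()
    return sorted(k for k in shared
                  if amp_scores[k] > amp_threshold and r_squared[k] > r2_threshold)


# Hypothetical quality scores for three data points.
amp = {("miR-1", "s1"): 1.40, ("miR-133a", "s1"): 1.10, ("miR-200b", "s1"): 1.30}
r2 = {("miR-1", "s1"): 0.998, ("miR-133a", "s1"): 0.995, ("miR-200b", "s1"): 0.950}
print(good_by_both(amp, r2))
```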