Experimental design
Selection of tissues
Two separate RNA pools were prepared by blending two tissues each: (1) kidney and placenta and (2) skeletal muscle and brain (frontal cortex). These sources of RNA were chosen based on our prior analysis of Agilent V3 miRNA array data that suggested this collection of tissues would capture a large number of microRNAs, including several unique to each sample, such as miR-133a for skeletal muscle and the chromosome 19 miRNA cluster for placenta [2].
The surgical pathology archives of the Department of Pathology at Johns Hopkins Hospital were used to obtain formalin fixed paraffin-embedded (FFPE) tissues from four distinct tissue sources. All tissues were verified as normal by review of tissue histology on an adjacent hematoxylin and eosin stained slide. These anonymized human samples were used based on an exemption from the Institutional Review Board of Johns Hopkins Hospital.
RNA extraction
We extracted RNA from FFPE sections of kidney, placenta, skeletal muscle, and brain using the AllPrep DNA/RNA FFPE protocol (Qiagen). Xylene was chosen for deparaffinization. Extra xylene and ethanol washes were performed, and DNase digestion was done on-column.
RNA quality control
Concentration of eluted RNA was assessed by NanoDrop. Due to the low quality of longer RNA molecules extracted from FFPE tissues, including the ribosomal RNAs, the presence of several ubiquitous and tissue-enriched small RNAs or miRNAs was confirmed by stem-loop reverse transcription quantitative PCR using 10 ng RNA per reaction. For example, miR-1 and miR-133a were enriched in skeletal muscle, miR-516b was enriched in placenta, and miR-200b was enriched in kidney (Additional file 7: Figure S1). RNA was stored at −80C.
Reverse transcription and pre-amplification
The kidney/placenta (KP) and skeletal muscle/brain (MB) mixtures were made by combining equal masses of kidney and placenta or skeletal muscle and frontal cortex RNA, respectively, and diluting to an equal concentration of 3.3 ng/ul. 10 ng of RNA was used as the input for reverse transcription using the A and B primer pools, following the Life Technologies OpenArrayⓇ protocol modification for low-concentration and FFPE RNA. Separate reverse transcription and pre-amplification reactions were performed for the Life Technologies MegaPlex Pools A and B primer pools, which reverse transcribe and pre-amplify specific microRNAs. Following pre-amplification, 30 ul from the A and B reactions for both KP and MB were mixed with 570 ul of 0.1x TE. Further dilutions and combinations of the KP and MB mixtures were then prepared. To keep the non-nucelic acid components equal after mixing KP and MB, we added a diluent C mix as needed (Fig. 1). The diluent C included the same proportions of RT buffer and Pre-Amp mix components as in the Life Technologies protocol-specified dilution of nucleic acid-containing post-pre-amp mixture. The final concentrations were 50, 40, 20, 10 and 5 and 0.5 % for each sample (Fig. 1). The sample numbers (1–10 in Fig. 1) are used throughout the manuscript to refer to specific mixture/dilution sample types.
Life technologies openArrayⓇ assay
Standard Human TaqManⓇ OpenArrayⓇ Human MicroRNA Panel, QuantStudio TM 12K flex chips (part number 4470187) and other necessary reagents were provided by Life Technologies for this experiment. This panel contains 754 human miRNA sequences from miRBase v14 which have all been previously functionally validated with miRNA artificial templates. For conversion of notation from miRBase v14 style to current miRNA style, the webtool miRiadne can be used [16]. The specially prepared post-pre-amp dilution mixtures were added to the sample plates and then loaded onto the chips using the Accufill robot following the standard protocols (Life Technologies part number 4461306 Rev. B). A modified MicroRNA.edt file, provided by Life Technologies, was used to extend the cycles from the standard 40 to 46 cycles. This was done to make sure all amplifications went to completion, as the authors noted that some microRNA amplicons had not reached their maximal intensity at 40 cycles, causing a slight left shift to lower Crt values in prior experiments. The additional cycles do not increase the detection limit of the system. Three samples on one chip (the first replicate from sample types 1, 3, and 9) were run using the standard MicroRNA.edt file provided with the instrument, due to human error. This did not have a noticeable effect on the expression estimates from any of the algorithms. Additional information on the TaqManⓇ OpenArrayⓇ MicroRNA Panels can be found in the technical manual (Additional file 6).
Expression estimation algorithms
There are a wide variety of algorithms available to estimate expression from qPCR amplification curves. To facilitate comparisons between these algorithms, we have applied many of these algorithms to our benchmark data set. The resulting expression estimates and quality scores are available as data objects in the miRcomp package.
Specifically, we provide expression estimates from the following methods:
-
LifeTech ExpressionSuite
-
4 parameter sigmoidal model (b4)
-
5 parameter sigmoidal model (b5)
-
4 parameter log sigmoidal model (l4)
-
5 parameter log sigmoidal model (l5)
-
Linear exponential model (linexp)
Additionally, the raw amplification data are available in the miRcompData package allowing researchers to easily generate expression estimates using other current or future algorithms.
Statistical assessments
The primary goal of the mixture/dilution experiment described above is to provide a benchmark data set with which to assess the performance of methods that estimate miRNA expression from qPCR amplification curves. Specifically, we propose assessments of accuracy, precision, data quality, titration response, limit of detection, and number of complete features. Each of these is described in detail below. To avoid any confusion due to naming conventions (expression estimates from amplification curves have been called Ct values, Crt values, and Cq values to name a few), we refer to the reported values as expression estimates or simply expression.
Quality scores
When estimating expression from amplification data, it is crucial for methods to provide both an expression estimate and a corresponding quality score. These quality scores are often used to filter, flag, or down-weight poor quality expression estimates in subsequent analyses. The qualityAssessment function in the miRcomp package allows one to examine the relationship between quality scores and expression estimates, the distribution of quality scores across samples, and the relationship between quality scores from two different methods.
Expression comparison
When comparing two methods, a natural starting point is to compare the expression estimates produced by each method. By examining the features and samples for which expression estimates differ substantially, one can better understand the strengths and limitations of each method. The expressionComp function in the miRcomp package allows one to examine the relationship between expression estimates produced by two different methods. Feature/sample combinations for which the expression estimates differ by more than a given threshold are flagged for further investigation.
Complete features
A measure of the amount of readily usable data produced by a method is the number of complete features (here miRNAs). Complete features are defined as detected (non-NA expression estimate) and of good quality (above a given threshold) across all samples in a given experiment. The completeFeatures function allows one to assess a single method or compare two methods.
Limit of detection
The limit of detection is an estimate of the smallest signal that can be reliably measured. We propose assessing the limit of detection in two ways: (1) examining the distribution of average observed expression stratified by the proportion of values within a set of replicates that are good quality, and (2) comparing the average observed vs expected expression in the two low input sample types (9 & 10). The expected expression for both low input sample types (9 &10) can be calculated based on the pure sample types (1 & 5) or, in the case of the 0.01/0.01 dilution (sample type 10), it can be calculated based on the expression in the 0.1/0.1 dilution (sample type 9). Visual representations of these comparisons are produced by the limitOfDetection function.
The limitOfDetection function also reports several potential limits of detection based on each of the following comparisons:
-
1.
Average observed expression in the 0.1/0.1 dilution samples (sample type 9) vs expected expression based on the pure samples (sample types 1 & 5).
-
2.
Average observed expression in the 0.01/0.01 dilution samples (sample type 10) vs expected expression based on the pure samples (sample types 1 & 5).
-
3.
Average observed expression in the 0.01/0.01 dilution samples (sample type 10) vs expected expression based on the 0.1/0.1 dilution samples (sample type 9).
For each of these comparisons, we calculate the difference between the observed and expected expression estimates. To assess the limit of detection, we compute the expression threshold such that the median difference (between observed and expected) of all features exceeding that threshold is equal to a predetermined tolerance. The limitOfDetection returns these potential limits of detection for each comparison and three tolerances (0.5, 0.75, and 1.00).
Titration response
The titration response is defined as the ability of a method to produce monotone increasing expression estimates in response to increasing amounts of input RNA. We consider sample types 2–4 and 6–8 as two separate titration series. In each of these series, one mixture component is held constant at 80 μl and the other is doubled twice from 16 μl to 32 μl to 64 μl. Because this response will depend heavily on the underlying expression of a given feature in each mixture component, the titration response is stratified by the difference in expression between the component being titrated and the component being held constant. For example, in the sample type 2–4 titration series, mixture component A is held constant and mixture component B is titrated. To assess the difference in expression between mixture components A and B, we use the expression estimates in the pure sample types: sample type 1 (pure A) and sample type 5 (pure B).
Accuracy
To assess accuracy, we calculate the signal detect slope, defined as the slope of the regression line of observed expression on expected expression, for the two titration series (sample types 2–4 &6–8). The ideal signal detect slope is one, representing agreement between observed and expected expression. The signal detect slopes are stratified by pure sample expression. A signal detect slope captures the average relationship between observed and expected expression; however, some features may perform well on average but be highly variable. In the plots produced, features are displayed in grey if the signal detect slope is not statistically significantly different from zero (p-value <0.05). As such, a grey point corresponding to a signal detect slope well above zero represents a particularly noisy (large residual variance) response.
Precision
To assess precision, we calculate both the within-replicate standard deviation and coefficient of variation (the within-replicate standard deviation divided by the within-replicate mean). Both statistics are calculated for each set of replicates (unique feature/sample type combinations) that are of acceptable quality. For both summaries, the values are stratified by the average observed expression.
Software
Software implementing the assessments described in this manuscript was written in the open-source statistical language R (v3.2.1) [14]. The R software package, miRcomp, and the R data package, miRcompData, are available as part of the Bioconductor project [15] (v3.2 and later), a collaborative effort to develop software for computational biology and bioinformatics. In addition to the primary functionality described above, the miRcomp package contains many additional options for customizable use of these assessment functions. These are described in the miRcomp package vignette (included here as Additional file 1).