A benchmark for microRNA quantification algorithms using the OpenArray platform

Background Several techniques have been tailored to the quantification of microRNA expression, including hybridization arrays, quantitative PCR (qPCR), and high-throughput sequencing. Each of these has certain strengths and limitations depending both on the technology itself and the algorithm used to convert raw data into expression estimates. Reliable quantification of microRNA expression is challenging in part due to the relatively low abundance and short length of the miRNAs. While substantial research has been devoted to the development of methods to quantify mRNA expression, relatively little effort has been spent on microRNA expression. Results In this work, we focus on the Life Technologies TaqMan OpenArrayⓇ system, a qPCR-based platform to measure microRNA expression. Several algorithms currently exist to estimate expression from the raw amplification data produced by qPCR-based technologies. To assess and compare the performance of these methods, we performed a set of dilution/mixture experiments to create a benchmark data set. We also developed a suite of statistical assessments that evaluate many different aspects of performance: accuracy, precision, titration response, number of complete features, limit of detection, and data quality. The benchmark data and software are freely available via two R/Bioconductor packages, miRcomp and miRcompData. Finally, we demonstrate use of our software by comparing two widely used algorithms and providing assessments for four other algorithms. Conclusions Benchmark data sets and software are crucial tools for the assessment and comparison of competing algorithms. We believe that the miRcomp and miRcompData packages will facilitate the development of new methodology for microRNA expression estimation. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-0987-8) contains supplementary material, which is available to authorized users.

Many expression platforms, originally designed to quantify mRNA expression, have been adapted to globally assay miRNA expression including hybridization arrays, quantitative PCR (qPCR), and sequencing [6]. However, each of these approaches must overcome several challenges specific to miRNAs: short sequence length, low abundance of target molecules, and sequence homology between miRNAs. Comparative performance assessments are crucial to understanding the strengths and limitations of each approach to miRNA quantification. A group of investigators recently systematically evaluated 12 available miRNA platforms across 20 standardized control samples [7]. This study, called miRQC, established metrics to assay reproducibility, sensitivity, accuracy, specificity and concordance across the different methods. Although a single platform was not found to be uniformly superior, there was substantial variability in performance across assessments. For each of the platforms, performance depends on both the instrument and the algorithm used to convert raw measurements into expression estimates. For example, one platform assessed in the miRQC study was RNA-seq performed on an Illumina GAIIx instrument. The sample prep used the TruSeq Small RNA Prep Kit and results were aligned to the hg19 reference sequence allowing one mismatch, without further delineation of the alignment method [7]. A previous performance evaluation of miRNA expression arrays noted a strong dependency between technology and signal processing methodology [8]. More recently, we have demonstrated that different miRNA RNA-seq alignment algorithms produce different alignments, impacting the quality of the data [9]. We surmise that many miRNA expression platforms are not yet optimized to yield consistent and maximally accurate data.
Another platform evaluated in the miRQC study was the Life Technologies TaqMan OpenArray system. This is a qPCR-based miRNA array platform that currently has coverage for 754 human miRNAs across two sets of primer pools. While qPCR is considered the gold standard for low-throughput measurement of gene expression, microarray-and sequencing-based platforms are preferable for most high-throughput applications. Given the relative small number of common miRNAs, it is possible to use a qPCR-based platform to measure the expression of all abundant miRNAs in many tissues and cells.
The primary advantage of qPCR-based technologies is the ability to simultaneously amplify and quantify a target transcript over sequential PCR cycles. The greater the initial amount of the target transcript present in a sample, the more rapidly the target will reach a threshold at which it can be detected by flourescence (e.g. from amplicon-associated intercalating dyes or freed, unquenched hydrolysis probes). As such, the raw data produced by qPCR-based technologies are fluorescence signal intensities captured at the end of each amplification cycle (typically 1-40). Analysis of these data typically begins by assigning a threshold cycle number to each amplification. These threshold cycles can then be used to estimate target abundance, either relative or in reference to values for a standard curve. For example, Life Technologies provides the ExpressionSuite software package, which uses the shape of the amplification curve to estimate a relative threshold cycle and corresponding expression estimate [10]. While a substantial number of software tools have been developed to estimate gene expression from raw amplification data [11][12][13], these focused on mRNA rather than miRNA targets. Whether these methods perform similarly when estimating miRNA expression is an area of ongoing research.
The software presented in this manuscript provides tools to assess and compare the performance of methods to transform raw amplification data into expression estimates and determine optimal quality thresholds. While the miRQC study focused on comparing many different platforms, here we focus on a single platform but provide a much larger and more diverse data set for evaluation. We believe that the availability of these data and corresponding software will greatly accelerate the development of improved methodology for the OpenArray miRNA platform. Furthermore, seamless integration with the R/Bioconductor [14,15] suite of analysis packages will enhance the value of OpenArray miRNA data. Therefore, we developed miRcomp, an R package to assess and compare microRNA expression estimation methods using a benchmark data set.

Selection of tissues
Two separate RNA pools were prepared by blending two tissues each: (1) kidney and placenta and (2) skeletal muscle and brain (frontal cortex). These sources of RNA were chosen based on our prior analysis of Agilent V3 miRNA array data that suggested this collection of tissues would capture a large number of microRNAs, including several unique to each sample, such as miR-133a for skeletal muscle and the chromosome 19 miRNA cluster for placenta [2].
The surgical pathology archives of the Department of Pathology at Johns Hopkins Hospital were used to obtain formalin fixed paraffin-embedded (FFPE) tissues from four distinct tissue sources. All tissues were verified as normal by review of tissue histology on an adjacent hematoxylin and eosin stained slide. These anonymized human samples were used based on an exemption from the Institutional Review Board of Johns Hopkins Hospital.

RNA extraction
We extracted RNA from FFPE sections of kidney, placenta, skeletal muscle, and brain using the AllPrep DNA/RNA FFPE protocol (Qiagen). Xylene was chosen for deparaffinization. Extra xylene and ethanol washes were performed, and DNase digestion was done oncolumn.

RNA quality control
Concentration of eluted RNA was assessed by NanoDrop. Due to the low quality of longer RNA molecules extracted from FFPE tissues, including the ribosomal RNAs, the presence of several ubiquitous and tissue-enriched small RNAs or miRNAs was confirmed by stem-loop reverse transcription quantitative PCR using 10 ng RNA per reaction. For example, miR-1 and miR-133a were enriched in skeletal muscle, miR-516b was enriched in placenta, and miR-200b was enriched in kidney (Additional file 7: Figure S1). RNA was stored at −80C.

Reverse transcription and pre-amplification
The kidney/placenta (KP) and skeletal muscle/brain (MB) mixtures were made by combining equal masses of kidney and placenta or skeletal muscle and frontal cortex RNA, respectively, and diluting to an equal concentration of 3.3 ng/ul. 10 ng of RNA was used as the input for reverse transcription using the A and B primer pools, following the Life Technologies OpenArray protocol modification for low-concentration and FFPE RNA. Separate reverse transcription and pre-amplification reactions were performed for the Life Technologies MegaPlex Pools A and B primer pools, which reverse transcribe and pre-amplify specific microRNAs. Following pre-amplification, 30 ul from the A and B reactions for both KP and MB were mixed with 570 ul of 0.1x TE. Further dilutions and combinations of the KP and MB mixtures were then prepared. To keep the non-nucelic acid components equal after mixing KP and MB, we added a diluent C mix as needed (Fig. 1). The diluent C included the same proportions of RT buffer and Pre-Amp mix components as in the Life Technologies protocol-specified dilution of nucleic acid-containing post-pre-amp mixture. The final concentrations were 50, 40, 20, 10 and 5 and 0.5 % for each sample (Fig. 1). The sample numbers (1-10 in Fig. 1) are used throughout the manuscript to refer to specific mixture/dilution sample types.

Standard Human TaqMan
OpenArray Human MicroRNA Panel, QuantStudio TM 12K flex chips (part number 4470187) and other necessary reagents were provided by Life Technologies for this experiment. This panel contains 754 human miRNA sequences from miRBase v14 which have all been previously functionally validated with miRNA artificial templates. For conversion of notation from miRBase v14 style to current miRNA style, the webtool miRiadne can be used [16]. The specially prepared post-pre-amp dilution mixtures were added to the sample plates and then loaded onto the chips using the Accufill robot following the standard protocols (Life Technologies part number 4461306 Rev. B). A modified MicroRNA .edt file, provided by Life Technologies, was used to extend the cycles from the standard 40 to 46 cycles. This was done to make sure all amplifications went to completion, as the authors noted that some microRNA amplicons had not reached their maximal intensity at 40 cycles, causing a slight left shift to lower Crt values in prior experiments. The additional cycles do not increase the detection limit of the system. Three samples on one chip (the first replicate from sample types 1, 3, and 9) Fig. 1 Two RNA pools were formed by blending two tissues each: (A) kidney and placenta and (B) skeletal muscle and brain. These pools were combined in different mixtures and dilutions to form the 10 sample types shown here. Gene expression in each sample type was assessed in quadruplicate (for a total of 40 samples) using the Life Technologies OpenArray platform modified to perform 46 amplification cycles were run using the standard MicroRNA .edt file provided with the instrument, due to human error. This did not have a noticeable effect on the expression estimates from any of the algorithms. Additional information on the TaqMan OpenArray MicroRNA Panels can be found in the technical manual (Additional file 6).

Expression estimation algorithms
There are a wide variety of algorithms available to estimate expression from qPCR amplification curves. To facilitate comparisons between these algorithms, we have applied many of these algorithms to our benchmark data set. The resulting expression estimates and quality scores are available as data objects in the miRcomp package.
Specifically, we provide expression estimates from the following methods: • LifeTech ExpressionSuite Additionally, the raw amplification data are available in the miRcompData package allowing researchers to easily generate expression estimates using other current or future algorithms.

Statistical assessments
The primary goal of the mixture/dilution experiment described above is to provide a benchmark data set with which to assess the performance of methods that estimate miRNA expression from qPCR amplification curves. Specifically, we propose assessments of accuracy, precision, data quality, titration response, limit of detection, and number of complete features. Each of these is described in detail below. To avoid any confusion due to naming conventions (expression estimates from amplification curves have been called Ct values, Crt values, and Cq values to name a few), we refer to the reported values as expression estimates or simply expression.

Quality scores
When estimating expression from amplification data, it is crucial for methods to provide both an expression estimate and a corresponding quality score. These quality scores are often used to filter, flag, or down-weight poor quality expression estimates in subsequent analyses. The qualityAssessment function in the miRcomp package allows one to examine the relationship between quality scores and expression estimates, the distribution of quality scores across samples, and the relationship between quality scores from two different methods.

Expression comparison
When comparing two methods, a natural starting point is to compare the expression estimates produced by each method. By examining the features and samples for which expression estimates differ substantially, one can better understand the strengths and limitations of each method. The expressionComp function in the miRcomp package allows one to examine the relationship between expression estimates produced by two different methods. Feature/sample combinations for which the expression estimates differ by more than a given threshold are flagged for further investigation.

Complete features
A measure of the amount of readily usable data produced by a method is the number of complete features (here miRNAs). Complete features are defined as detected (non-NA expression estimate) and of good quality (above a given threshold) across all samples in a given experiment. The completeFeatures function allows one to assess a single method or compare two methods.

Limit of detection
The limit of detection is an estimate of the smallest signal that can be reliably measured. We propose assessing the limit of detection in two ways: (1) examining the distribution of average observed expression stratified by the proportion of values within a set of replicates that are good quality, and (2) comparing the average observed vs expected expression in the two low input sample types (9 & 10). The expected expression for both low input sample types (9 &10) can be calculated based on the pure sample types (1 & 5) or, in the case of the 0.01/0.01 dilution (sample type 10), it can be calculated based on the expression in the 0.1/0.1 dilution (sample type 9). Visual representations of these comparisons are produced by the limitOfDetection function.
The limitOfDetection function also reports several potential limits of detection based on each of the following comparisons: For each of these comparisons, we calculate the difference between the observed and expected expression estimates. To assess the limit of detection, we compute the expression threshold such that the median difference (between observed and expected) of all features exceeding that threshold is equal to a predetermined tolerance. The limitOfDetection returns these potential limits of detection for each comparison and three tolerances (0.5, 0.75, and 1.00).

Titration response
The titration response is defined as the ability of a method to produce monotone increasing expression estimates in response to increasing amounts of input RNA. We consider sample types 2-4 and 6-8 as two separate titration series. In each of these series, one mixture component is held constant at 80 μl and the other is doubled twice from 16 μl to 32 μl to 64 μl. Because this response will depend heavily on the underlying expression of a given feature in each mixture component, the titration response is stratified by the difference in expression between the component being titrated and the component being held constant. For example, in the sample type 2-4 titration series, mixture component A is held constant and mixture component B is titrated. To assess the difference in expression between mixture components A and B, we use the expression estimates in the pure sample types: sample type 1 (pure A) and sample type 5 (pure B).

Accuracy
To assess accuracy, we calculate the signal detect slope, defined as the slope of the regression line of observed expression on expected expression, for the two titration series (sample types 2-4 &6-8). The ideal signal detect slope is one, representing agreement between observed and expected expression. The signal detect slopes are stratified by pure sample expression. A signal detect slope captures the average relationship between observed and expected expression; however, some features may perform well on average but be highly variable. In the plots produced, features are displayed in grey if the signal detect slope is not statistically significantly different from zero (p-value < 0.05). As such, a grey point corresponding to a signal detect slope well above zero represents a particularly noisy (large residual variance) response.

Precision
To assess precision, we calculate both the within-replicate standard deviation and coefficient of variation (the within-replicate standard deviation divided by the withinreplicate mean). Both statistics are calculated for each set of replicates (unique feature/sample type combinations) that are of acceptable quality. For both summaries, the values are stratified by the average observed expression.

Software
Software implementing the assessments described in this manuscript was written in the open-source statistical language R (v3.2.1) [14]. The R software package, miRcomp, and the R data package, miRcompData, are available as part of the Bioconductor project [15] (v3.2 and later), a collaborative effort to develop software for computational biology and bioinformatics. In addition to the primary functionality described above, the miRcomp package contains many additional options for customizable use of these assessment functions. These are described in the miRcomp package vignette (included here as Additional file 1).

Results
In the following, we compare two methods to generate expression estimates and quality scores from raw miRNA qPCR amplification data. The first method is an algorithm developed by Life Technologies and implemented in the ExpressionSuite software package. This software package produces estimates of expression (called Crt values) and a measure of quality (called the AmpScore). The second method is a four-parameter log-sigmoid curve-fitting algorithm [17] implemented as the default method in the qpcR R package [18] and referred to in this manuscript as simply qpcR. This open-source R package produces expression estimates by fitting a four parameter log sigmoidal curve to the amplification data and computing the point at which the second derivative of this curve is maximized (cpD2 method) [19] and a measure of quality (the R 2 from the model fit).
Four additional algorithms (see Methods) were applied to the benchmark data set, and the resulting expression estimates and quality scores are available in the miRcomp R package. For clarity of presentation in this manuscript, we will focus on comparing two widely-used algorithms, the default algorithms from Life Technologies and the qpcR R package, in the following results.

Quality assessment
Given the interdependence between the expression estimates and quality scores produced by a method, we begin by examining this relationship for each method (Fig. 2). As one might expect, quality scores decrease as the expression estimates increase (recall that for qPCR based technologies, a higher expression value corresponds to fewer copies of the target transcript). Another feature of note is that both methods occasionally fail to produce an expression estimate (denoted as NA in Fig. 2). However, while the qpcR method assigns all of these values fairly low quality (Fig. 2b), the Life Technologies method produces a substantial number of NA expression estimates with high quality scores (Fig. 2a).
When comparing two methods, it is also interesting to examine the relationship between the quality scores produced by each method (Fig. 3). Examination of this figure highlights regions of consensus high quality (upper right) and consensus low quality (lower left) as well as regions of disagreement between the methods (upper left and lower right). While there are relatively few data points that are estimated with a high quality AmpScore and low quality R 2 , there are a substantial number of high quality R 2 and low quality AmpScore data points.
Taken together, Figs. 2 & 3 suggest quality thresholds of AmpScore = 1.25 and R 2 = 0.99. While we will use these thresholds throughout the remainder of this manuscript, all functions in the miRcomp package allow the user to set their own quality thresholds. Furthermore, Each point represents the quality values for a unique miRNA/sample combination. Two-dimensional scatter-plot smoothing is used to avoid over-plotting and convey the distribution of points across the plotting region for many functions, one can compare results from a single method using two different quality thresholds to examine the effect of changing the quality threshold on each assessment. Lastly, when comparing two methods, we typically restrict the assessment to data considered to be good quality by both methods. This provides the most direct comparison between the expression estimates produced by the two methods; however, for many of these assessments, the miRcomp package allows one to perform these comparisons using each method's own quality assessment independently.

Expression comparison
When comparing two methods, it is also interesting to examine the relationship between the expression estimates produced by each method (Fig. 4a). Overall, the expression estimates produced by the Life Technologies and qpcR methods are quite similar; however, one feature/sample shows a substantial difference between methods (miR-155 in sample KW10:2). One can investigate this difference further by examining the raw amplification data provided in the miRcompData package. For comparison, we selected a feature from the same sample for which the two methods were approximate agreement (miR-29a in sample KW10:2). The amplification curve for miR-155 (Fig. 4b) shows the probable cause of the discrepency between expression estimates -the amplification curve is Note that the amplification curve is still increasing at cycle 46 for miR-155 whereas for miR-29a is has leveled off still increasing at the maximum observed cycle. In contrast, the amplification curve for miR-29a (Fig. 4c) has clearly reached a plateau before the maximum cycle. This results in expression estimates that are fairly similar across methods.

Complete features
Given the difficulty in measuring the expression of many miRNAs, we examine the number of complete features (good quality across all 40 samples), partial features (good quality in at least one but not all samples), and absent features (poor quality in all 40 samples). Examining the number of features that fall into each of these  (Table 1) provides a straightforward assessment of the amount of easily usable data produced by each method. Of the 754 features in these data, there are 165 complete features using the Life Technologies method and 251 complete features using the qpcR method; however, this may simply be a result of the choice of quality threshold. Of note, there are 162 complete features in common between both methods and 103 features that are considered absent by both methods. The latter may represent miRNAs that are not expressed in any of the four tissues included in these data.

Limit of detection
Next, we examine the limit of detection for a given method. While this is related to the previous assessment of complete features, here the focus is on determining the minimum signal that can be reliably detected. The first assessment compares the difference between observed and expected expression vs the expected expression based on three different comparisons (Fig. 5). One can visually assess the limit of detection by identifying the expected expression value (x-axis) at which the expression difference (y-axis) begins to diverge from zero and the data are of increasingly poor quality. A quantitative version of this approach is described in the Methods Section. Potential limits of detection for each comparison and each  Table 2. In general, the three comparisons produce similar limits of detection for a given tolerance. An alternative approach to assess the limit of detection is to examine the distribution of average observed expression among replicates (unique feature/sample type combinations) stratified by the proportion of poor quality (Fig. 6). This allows one to easily see that higher expression values result in increasing amounts of poor quality data; however, for both methods, there is substantial overlap between the distributions.

Titration response
We now turn to assessments of the expression estimates themselves. For both methods, we calculated the proportion of features that show a titration response (see Methods for definition) for each titration series (Table 3). We also stratify these results by the difference in pure sample expression between the mixture component being titrated and the component being held constant (Fig. 7). As one might expect, if an miRNA is far less abundant in the component being titrated than in the component being held constant, it is difficult to detect a titration response for this miRNA. If an miRNA is far more abundant in the component being titrated than in the component being held constant, it is relatively easy to detect a titration response for this miRNA. While both methods perform fairly well when the features are expressed higher in the titrating component (x-axis value > 0), the Life Technologies method appears to perform slightly better for features expressed higher in the component being held constant; however, neither method performs particularly well for such features.

Accuracy
Related to the titration response is the accuracy of the expression estimates. Rather than simply requiring Here we report three potential limits of detection based on the three assessments monotone increasing expression estimates in response to increasing input RNA, here we are interested in the actual magnitude of the increase in expression. Specifically, to assess accuracy, we calculate the signal detect slope, defined as the slope of the regression line of observed expression on expected expression. We stratify the signal detect slopes into three equally sized bins to highlight the dependence on the underlying difference in expression noted in the previous subsection. Similar to the results seen in Fig. 7, accuracy is better for features with higher relative expression in the titrating component (Fig. 8). For each of the bins shown in Fig. 8, we can summarize the accuracy by computing robust measures of center and spread -median and median absolute deviation (MAD)of the signal detect slopes ( Table 4). The qpcR method appears to have slightly better accuracy, although this is not significant.

Precision
Finally, we consider a measure of precision, the withinreplicate coefficient of variation. Here, replicates are defined as each unique feature/sample type combination consisting of four data points each. The data are divided into three bins of equal size based on the average expression of each replicate (Fig. 9). Both methods appear to have comparable precision.

Performance of other algorithms
To this point, we have focused on assessing two algorithms to demonstrate the functionality of the miRcomp R/Bioconductor package. However, we also applied four additional algorithms (see Methods) to this benchmark data set. The resulting expression estimates and quality scores are available in the miRcomp R package.  Performance assessments comparing all six algorithms are shown in Additional file 7: Figures S2-S7.
The relationship between expression estimates and quality scores was fairly similar across methods with the 5 parameter log sigmodial method producing slightly better quality scores and the linear exponential method producing overall lower quality scores (Additional file 7: Figure  S2). The limit of detection was roughly 30 for all six algorithms, although the limit of detection appears slightly lower for the Life Technologies and linear exponential methods (Additional file 7: Figures S3 & S4). The Life Technologies and 5 parameter log sigmodial methods per-  formed somewhat better in terms of titration response, while the linear exponential method performed significantly worse (Additional file 7: Figure S5). The linear exponential and 5 parameter log sigmoid methods had the best accuracy on average, but they also had the most variation in accuracy (Additional file 7: Figure S6). All methods performed better when the target transcript was highly expressed. The Life Technologies method had consistently the best precision and the 5 parameter log sigmoidal method had consistently the worst precision (Additional file 7: Figure S7).

Discussion
In the previous section, we focused on assessing two algorithms to demonstrate the functionality of the miRcomp R/Bioconductor package. Additionally, we evaluated the performance of four other algorithms. In summary, the 4-parameter and 5-parameter sigmodial and logsigmodial algorithms performed similarly, with the 5-parameter log-sigmoidal achieving slightly better accuracy by sacrificing some precision. In contrast the Life Technologies method appears to have sacrificed a bit of accuracy to achieve better precision. The linear exponential algorithm performed significantly worse than the other five methods.
However, the primary purpose of this work is to facilitate the development of new expression estimation algorithms by allowing researchers to assess their own method(s). The benchmark data set, described in the Methods section and provided in the miRcomp-Data R/Bioconductor package, provides a rich resource for novel methods development. One can use the assessment functions in the miRcomp package to examine the effect of higher or lower quality thresholds for a currently available algorithm. We anticipate that these packages will facilitate the comparison of competing algorithms and guide the selection of those most suitable for a specific experiment. Furthermore, we anticipate that the development of open-source software to estimate expression from the raw amplification data will lead to increased integration between expression estimation procedures and subsequent statistical analyses, often performed with R/Bioconductor software packages.
Software to estimate expression from amplification data are implemented across a wide variety of operating systems (e.g. Windows, Mac OS, and Unix/Linx) and programming languages (e.g. R, Python, Perl, and SAS) [20]. As there is currently no standard data structure for raw qPCR amplification data, we store the raw data in a simple table that can be easily exported for the miRcom-pData package and converted into formats required by other expression estimation software. While the assessments described in this manuscript are implemented in R, To assess accuracy, we calculate the signal detect slope: the slope of the regression line of observed expression on expected expression for both the Life Technologies (LifeTech) and qpcR algorithms. The ideal signal detect slope is one, representing agreement between observed and expected expression. The signal detect slopes are stratified by pure sample expression. Each point represents an miRNA. Points in the figures below are grey if the signal detect slope is not statistically signficantly different from zero. As such, a grey point corresponding to a signal detect slope well above zero represents a particularly noisy (large residual variance) response the assessment functions accept simple matrices as input, which should be easy to create from the output of any other software package.
The assessment strategy proposed in this manuscript is based on a mixture/dilution experimental design. An alternative strategy would have been to spike-in known amounts of several miRNAs. Assessing accuracy for genomic technologies presents a challenge because the correct outcome for a given measurement must be known a priori. Spike-in experiments represent a natural way  Fig. 8, robust estimates of the center (median) and spread (MAD) of the signal detect slopes are reported for each method, Life Technologies (LifeTech) and qpcR to accomplish this by comparing the nominal spike-in concentrations with the observed expression estimates. This is crucial to assess the accuracy of absolute expression estimates. However, performance assessments based on spike-in data depend strongly on the chosen spikein concentrations [21,22]. One must select spike-ins that span the entire dynamic range of observed expression, are unexpressed in all biological samples being analyzed, and are measured by the technology under consideration.
In a mixture/dilution experiment, the focus is on relative rather than absolute expression. While the absolute expression of a given feature is unknown, the relative expression is known. Therefore, in the assessments proposed in this manuscript, we have focused on the ability of various methods to accurately measure relative expression. Additionally, mixture/dilution experiments are inherently more realistic with respect to the dynamic range of observed expression levels by using mixtures of two biologically distinct RNA pools at varying proportions. Such a mixture/dilution approach allows one to characterize performance of assays across the full spectrum of expression, which is necessary to assess the performance of any analytic strategy. While one is restricted to assessments of relative expression, this corresponds to the standard use of qPCR expression assays -to measure differential rather than absolute expression. Precision Assessment. To assess precision, we calculate the within-replicate coefficient of variation, calculated as the within-replicate standard deviation divided by the within-replicate mean, for both the Life Technologies (LifeTech) and qpcR algorithms. These are calculated for each set of replicates (unique feature/sample type combination) that are of good quality and stratified by the observed expression

Conclusions
In this manuscript, we present a benchmark data set and software to assess the performance of methods that estimate expression from raw data on the Life Technologies OpenArray microRNA platform. The raw data and software packages, miRcomp and miRcom-pData, are open-source and freely available as part of the Bioconductor project. We believe that the data and assessment software will facilitate the develop of novel methods for quantification of microRNA expression from the Life Technologies OpenArray platform. Using these packages, we assessed the performance of six expression estimation algorithms. The 5-parameter log-sigmodial algorithm had the best accuracy; the Life Technologies method had the best titration response and precision.
of MNM was partially funded by the National Institutes of Health (HG006853). The work of MKH was supported by the American Heart Association (13GRNT16420015).