puma 3.0: improved uncertainty propagation methods for gene and transcript expression analysis
© Liu et al.; licensee BioMed Central Ltd. 2013
Received: 13 September 2012
Accepted: 18 January 2013
Published: 5 February 2013
Microarrays have been a popular tool for genome-scale gene expression profiling for over a decade because of their low cost, short turn-around time, excellent quantitative accuracy and ease of data generation. The Bioconductor package puma incorporates a suite of analysis methods for determining uncertainties from Affymetrix GeneChip data and propagating these uncertainties to downstream analysis. As isoform-level expression profiling has received growing interest within genomics in recent years, exon microarray technology offers an important tool for quantifying the expression of most exons and makes it possible to measure isoform-level expression. However, puma does not include methods for the analysis of exon array data. Moreover, the current expression summarisation method for Affymetrix 3’ GeneChip data suffers from instability for low expression genes. For the downstream analysis, the method for differential expression detection is computationally intensive and the original expression clustering method does not consider the variance across replicated technical and biological measurements. It is therefore necessary to develop improved uncertainty propagation methods for gene and transcript expression analysis.
We extend the previously developed Bioconductor package puma with a new method especially designed for GeneChip Exon arrays and a set of improved downstream approaches. The improvements include: (i) a new gamma model for exon arrays which calculates isoform and gene expression measurements and a level of uncertainty associated with the estimates, using the multi-mappings between probes, isoforms and genes, (ii) a variant of the existing approach for the probe-level analysis of Affymetrix 3’ GeneChip data to produce more stable gene expression estimates, (iii) an improved method for detecting differential expression which is computationally more efficient than the existing approach in the package and (iv) an improved method for robust model-based clustering of gene expression, which takes technical and biological replicate information into consideration.
With the extensions and improvements, the puma package is now applicable to the analysis of both Affymetrix 3’ GeneChips and Exon arrays for gene and isoform expression estimation. It propagates the uncertainty of expression measurements into more efficient and comprehensive downstream analysis at both gene and isoform level. Downstream methods are also applicable to other expression quantification platforms, such as RNA-Seq, when uncertainty information is available from expression measurements. puma is available through Bioconductor and can be found at http://www.bioconductor.org.
Microarrays have been applied to high-throughput gene expression profiling for over a decade because of several advantages, e.g. high coverage, low cost, short turn-around time, excellent quantitative accuracy and ease of data generation. It has recently been shown that microarrays remain an efficient and reliable tool for expression quantification, especially for low-abundance targets. We previously developed the Bioconductor package puma for Affymetrix GeneChip data analysis. In the initial probe-level analysis, puma uses the multi-mgMOS method to obtain an expression estimate for each gene and a level of uncertainty associated with this estimate. In the downstream analysis, puma propagates these uncertainties to principal component analysis, differential expression detection and gene expression clustering using the methods NPPCA, PPLR and PUMA-CLUST, respectively, and obtains improved analysis results. In addition to expression measurements obtained from microarrays, these downstream methods are also applicable to other expression quantification platforms, e.g. RNA-Seq based on high-throughput sequencing technology, provided that a level of uncertainty is associated with each measurement.
As the analysis of alternative splicing has gained increasing interest in recent years, exon microarray technology, such as Affymetrix GeneChip Exon arrays, provides an option for measuring isoform-level expression. It is therefore necessary for puma to include methods for propagating isoform expression uncertainty in the analysis of exon array data. Furthermore, the current probe-level analysis method, multi-mgMOS, obtains unstable expression estimates for low expression genes, which can adversely affect the downstream analysis results. For the downstream analysis, the PPLR method for differential expression detection is computationally expensive, and the PUMA-CLUST method for expression clustering does not consider the variance across replicated technical and biological measurements. For all these reasons, we present here a new version of the puma package which incorporates a suite of improved probe-level analysis methods for gene and transcript expression summarisation and uncertainty propagation methods for the downstream analysis. The new version of the package covers a wide range of quantitative microarray expression analyses at both gene and isoform level, and benefits from propagating the uncertainty associated with expression estimates into various downstream analyses.
Affymetrix microarrays use 25-base probes to measure transcript abundance. Traditional 3’ GeneChips use two types of probes, perfect match (PM) and mismatch (MM) probes. A PM probe matches the target sequence exactly, whereas the corresponding MM probe differs from the PM probe at the middle base, which is replaced by its complement. MM probes are introduced to act as a control for cross-hybridisation and other types of background signal. GeneChip Exon arrays use only PM probes to obtain a higher density of coverage and make exon, isoform and gene level profiling possible. Many probe-level analysis methods for 3’ arrays which do not use MM probe intensities, such as PLIER and RMA, can be applied to exon arrays directly for exon or gene level expression calculation by using probe-to-exon or probe-to-gene mappings, respectively. With the estimated exon and gene expression, alternative splicing can be detected by measuring exon-gene expression ratios [9-11]. Beyond exon-gene expression ratios, isoform expression levels can also be quantified for a more refined downstream analysis.
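To make the exon-gene ratio idea concrete, the sketch below computes a splicing index in the spirit of the Affymetrix exon array white paper: the exon signal is normalised by its gene signal in each condition, and the log2 ratio of these normalised intensities is compared between conditions. This is a minimal Python illustration, not code from puma; the function name and the input values are hypothetical.

```python
import math


def splicing_index(exon_a, gene_a, exon_b, gene_b):
    """Log2 ratio of exon-to-gene expression ratios between conditions A and B.

    All inputs are positive expression levels on the linear scale.
    A value near 0 means the exon's share of its gene's expression is unchanged;
    a large absolute value suggests differential inclusion of that exon.
    """
    ni_a = exon_a / gene_a  # exon level normalised by gene level in condition A
    ni_b = exon_b / gene_b  # exon level normalised by gene level in condition B
    return math.log2(ni_a / ni_b)


# Hypothetical values: the gene doubles in B while the exon stays flat,
# so the exon's relative inclusion is halved in B.
print(splicing_index(exon_a=200.0, gene_a=400.0, exon_b=200.0, gene_b=800.0))  # 1.0
```

Because the index is a ratio of ratios, any chip-wide scaling factor shared by the exon and gene summaries of one condition cancels out.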
The expression calculation at isoform level is non-trivial since one probe can be mapped to multiple transcripts or gene loci. Also, an important characteristic of Affymetrix microarray probes is that they differ in their sensitivity to transcript abundance according to their sequence content. Many probe-level analysis approaches for 3’ arrays account for these probe-specific effects and have obtained improved results [3, 13]. Moreover, a level of uncertainty associated with estimated isoform expression would help downstream analyses to obtain more biologically relevant results. Using the available multi-mappings between probes and Ensembl transcripts, some methods have recently been proposed to calculate expression for known isoforms, such as MMBGX and MEAP. MMBGX uses a hierarchical Bayesian model to calculate the expression level of target transcripts, yielding a posterior distribution for each isoform's expression. MMBGX is fitted by Markov chain Monte Carlo (MCMC) and is therefore computationally intensive. After background removal, MEAP adopts a non-negative matrix factorisation approach to summarise isoform expression as a point estimate and does not provide a level of uncertainty associated with this estimate. MMBGX and MEAP perform cross-hybridisation correction according to probe GC content, removing probe-specific effects to a certain extent. However, it has been shown that specific hybridisation also exhibits probe-specific variation [8, 16]. We developed a new gamma model for exon array data (GME), which accounts for probe effects in specific hybridisation and for multi-mappings between probes, transcripts and genes. The GME model parameters are estimated by maximum a posteriori (MAP) optimisation to give isoform and gene level expression measurements, together with the uncertainty of these estimates provided by a MAP-Laplace approximation. The new method has been implemented as an R function in the new version of the puma package.
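The difficulty with shared probes can be illustrated with a toy additive model in which each probe's signal is the sum of the abundances of the isoforms it matches. The sketch below recovers non-negative isoform levels by non-negative least squares with multiplicative updates, which is closer in spirit to MEAP's matrix factorisation than to GME: it has no gamma noise model, no probe-specific effects and no uncertainty estimate. The function name, mapping matrix and intensities are hypothetical.

```python
def deconvolve_shared_probes(mapping, intensities, n_iter=500):
    """Estimate non-negative isoform levels x such that intensities ~= M x.

    mapping[p][k] is 1 if probe p matches isoform k and 0 otherwise.
    Uses the multiplicative update x_k <- x_k * (M^T y)_k / (M^T M x)_k,
    which keeps x non-negative and does not increase the squared error
    when M and y are non-negative.
    """
    n_probes, n_iso = len(mapping), len(mapping[0])
    # M^T y does not change between iterations.
    mty = [sum(mapping[p][k] * intensities[p] for p in range(n_probes)) for k in range(n_iso)]
    x = [1.0] * n_iso  # strictly positive starting point
    for _ in range(n_iter):
        mx = [sum(mapping[p][k] * x[k] for k in range(n_iso)) for p in range(n_probes)]
        mtmx = [sum(mapping[p][k] * mx[p] for p in range(n_probes)) for k in range(n_iso)]
        x = [x[k] * mty[k] / mtmx[k] if mtmx[k] > 0 else 0.0 for k in range(n_iso)]
    return x


# Probe 2 is shared by isoforms A and B; true levels are A = 4, B = 2.
mapping = [[1, 0],   # probe 1 -> A only
           [1, 1],   # probe 2 -> A and B
           [0, 1]]   # probe 3 -> B only
print(deconvolve_shared_probes(mapping, [4.0, 6.0, 2.0]))  # approximately [4.0, 2.0]
```

The uniquely mapped probes anchor each isoform, and the shared probe's signal is then apportioned consistently with them.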
For traditional 3’ GeneChips, PM probes are thought to mainly measure specific hybridisation, while MM probes measure non-specific hybridisation and other background. However, probes for low expression genes often obtain higher background than true signal. When PM and the corresponding MM probe intensities are combined to calculate gene expression, the resulting measurements can be unstable for low expression genes, especially on a log scale. For this reason, most popular methods, such as PLIER, dChip and RMA, provide an option of using PM probes only in order to obtain more stable expression values on the log scale. The previous method for 3’ GeneChips in puma, multi-mgMOS, combines both PM and MM probe intensities to calculate gene expression values and provides a level of uncertainty associated with the measurements. For low expression genes the estimated logarithmic expression values are usually negative and the associated variance is typically large. These expression measurements with large error can further affect downstream analyses and may lead to incorrect biological conclusions. This is especially the case when the mean expression estimates are processed by methods outside the puma package which do not account for measurement uncertainty. To alleviate this problem, we propose PM-only multi-mgMOS for 3’ arrays, which uses only PM probe intensities and obtains more stable gene expression estimates for low expression genes.
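A first-order (delta-method) approximation shows why subtracting MM from PM destabilises log-scale estimates for weakly expressed genes: Var[log X] ≈ Var[X] / E[X]^2, so the same absolute noise is magnified when the background-corrected signal is close to zero. The numbers below are hypothetical, and for simplicity the sketch assigns the same noise level to PM - MM and to PM alone, although the difference of two noisy probes is in fact noisier. It is not the multi-mgMOS model.

```python
def delta_var_log(signal, noise_sd):
    """Delta-method variance of log(X) for X with mean `signal` and sd `noise_sd`.

    Var[log X] ~= Var[X] / E[X]^2 (natural log). Returns inf when the mean
    signal is not positive, because log X then has no usable approximation.
    """
    if signal <= 0:
        return float("inf")
    return (noise_sd / signal) ** 2


pm, mm, noise_sd = 105.0, 100.0, 20.0  # hypothetical weakly expressed probe pair
print(delta_var_log(pm - mm, noise_sd))  # PM - MM = 5: variance 16.0
print(delta_var_log(pm, noise_sd))       # PM only: variance about 0.036
```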
For the downstream analyses of gene expression, the new version of puma includes two improved approaches, one for finding differentially expressed (DE) genes and one for gene expression clustering. The previous method for finding DE genes, PPLR, considers probe-level measurement error, which can improve results when few replicates are available [5, 18]. PPLR uses an importance sampling procedure in its variational EM solver, which is computationally inefficient because the number of samples must be increased to gain better accuracy. By adding a layer of hidden variables to the hierarchical Bayesian model, inference in the PPLR model becomes faster because this inefficient importance sampling step is eliminated. The PUMA-CLUST method provided by the previous version of puma propagates probe-level uncertainty to improve the results of standard Gaussian mixture clustering of gene expression. The recently proposed PUMA-CLUSTII approach improves on PUMA-CLUST in several respects. First, it considers the variance across replicated technical and biological measurements for the same experimental condition. Second, it adopts Student’s t-distributions as the mixture components to improve robustness. Finally, it can automatically find the optimal number of components, which is especially important when the ground truth in the data is unknown.
Extended and improved function components in puma
puma includes two levels of analysis for expression data: expression summarisation and downstream analysis. At the summarisation level, the previous version of puma, as described previously, could only process 3’ GeneChip data, mainly using multi-mgMOS. With the obtained gene expression measurements and the associated measurement uncertainty from microarrays or other platforms, puma propagates uncertainty into the downstream analyses, including PPLR for finding DE genes, PUMA-CLUST for gene expression clustering and NPPCA for principal component analysis of gene expression. The function components of the previous puma are shown in the upper part of Figure 1. After the extensions and improvements described in this paper, the functions of the new version of puma are illustrated in the lower part of Figure 1. The new version provides the following contributions:
GME - In addition to traditional 3’ GeneChip data, the new version is capable of processing Exon array data using a new model GME at the summarisation level of analysis. From the Exon array data analysis, both gene and isoform expression can be computed.
PM-only multi-mgMOS - PM-only multi-mgMOS is included to improve the stability of multi-mgMOS for gene expression estimation.
IPPLR - For downstream analysis, the new version of the package contains IPPLR, an improvement that speeds up PPLR for detecting differential expression.
PUMA-CLUSTII - For expression clustering, PUMA-CLUSTII is introduced to consider the technical and biological variance across experimental replicates. The new clustering method increases the robustness of clustering and automatically selects the optimal number of clusters by model selection.
With these contributions, methods in puma can process both gene and isoform expression, making puma useful in the analysis of alternative splicing. See Methods for more details on these algorithms.
Multi-mappings between probes and isoforms
The increasing availability of mappings between microarray probes and isoforms in the Ensembl database can be exploited for isoform expression estimation. In particular, multi-mappings between probes and isoforms help to separate the intensity contributions of probes shared by multiple isoforms, and transcript expression estimation may benefit from this separation. The database GATExplorer integrates information from multiple biological sources (including the Ensembl database and probe sequences of Affymetrix microarrays) to provide mappings between microarray probes and functional transcriptional entities, i.e. gene loci, transcripts, exons and ncRNAs. We include the multi-mappings between Exon array probes, isoforms and genes obtained from GATExplorer in the separate Bioconductor data package pumadata, which contains example and annotation data used by puma. Mappings for human, mouse and rat exon arrays are included, which makes puma applicable to all types of Affymetrix Exon arrays.
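The multi-mappings can be pictured as two lookup tables, probe to isoforms and isoform to gene. The sketch below uses made-up probe and transcript identifiers (they are not taken from GATExplorer or pumadata) and shows the queries that matter for isoform estimation: which probes each isoform owns, which probes are shared and therefore need their intensity split, and which probes even cross gene loci.

```python
from collections import defaultdict

# Hypothetical toy mappings; real ones would use Affymetrix probe IDs and Ensembl IDs.
probe_to_isoforms = {
    "p1": {"GENE1-201"},
    "p2": {"GENE1-201", "GENE1-202"},
    "p3": {"GENE1-202"},
    "p4": {"GENE1-202", "GENE2-201"},
}
isoform_to_gene = {"GENE1-201": "GENE1", "GENE1-202": "GENE1", "GENE2-201": "GENE2"}


def invert(probe_map):
    """Return isoform -> set of probes, the view needed to summarise each isoform."""
    iso_to_probes = defaultdict(set)
    for probe, isoforms in probe_map.items():
        for iso in isoforms:
            iso_to_probes[iso].add(probe)
    return dict(iso_to_probes)


def shared_probes(probe_map):
    """Probes whose intensity must be split between more than one isoform."""
    return sorted(p for p, isos in probe_map.items() if len(isos) > 1)


def multi_gene_probes(probe_map, iso_gene):
    """Probes that hit isoforms of more than one gene locus."""
    return sorted(p for p, isos in probe_map.items()
                  if len({iso_gene[i] for i in isos}) > 1)


print(shared_probes(probe_to_isoforms))                       # ['p2', 'p4']
print(multi_gene_probes(probe_to_isoforms, isoform_to_gene))  # ['p4']
```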
Using the extended functions in puma
The new version of puma and the related pumadata package can be found at http://www.bioconductor.org. The GME model is implemented in the function gmoExon to calculate gene and isoform level expression for Exon arrays. The PM-only multi-mgMOS method is implemented in the function PMmmgmos to estimate stable gene expression for Affymetrix GeneChips. The improved PPLR for detecting DE genes is implemented in the function pumaCombImproved. PUMA-CLUSTII is implemented in the function pumaClustii for robust expression clustering. To use these functions, type library(puma) and library(pumadata) at the R prompt to load the puma package and the data package. A quick start for each of these functions is described below. For detailed use of these functions, please refer to the user manual of the puma package.
Gamma model for Exon arrays
The expression summarisation method for Exon arrays is GME. The method makes use of multi-mappings between probes, isoforms and genes obtained from GATExplorer to aid the calculation of gene and isoform expression. The mappings are included in the separate data package pumadata. The following code shows a quick start of this method.
The above code loads the exon array data (CEL files) in the working directory as an AffyBatch object and processes it using the GME method. Among the parameters, exontype can be one of “Human”, “Mouse” and “Rat”, indicating the exon chip type. GT can be one of “gene” and “transcript”, specifying whether expression is estimated at gene or isoform level, respectively. gsnorm specifies the algorithm used for global scaling normalisation and can be one of “mean”, “median”, “meanlog” and “none”: “mean” and “meanlog” are mean-centred normalisation on the raw and log scales, respectively, “median” is median-centred normalisation and “none” means no global scaling normalisation. The value of gmoExon is an object of class exprReslt, which stores the estimated expression and a level of uncertainty associated with this measurement.
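The gsnorm options can be illustrated with a generic global scaling sketch. Here each chip is rescaled so that its mean, median or mean log intensity matches the average of that statistic across chips. Using the across-chip average as the target, and the function itself, are assumptions for illustration rather than puma's implementation.

```python
import math
import statistics


def global_scale(chips, method="median"):
    """Rescale chips so that a location statistic agrees across chips.

    chips: list of chips, each a list of positive raw intensities.
    method: "mean", "median", "meanlog" or "none" (names mirror gsnorm).
    The target level is the average of the per-chip statistic (an assumption).
    """
    if method == "none":
        return [list(chip) for chip in chips]
    if method == "meanlog":
        # Centre on the log scale: shift each chip's log values to a common mean.
        logs = [[math.log(v) for v in chip] for chip in chips]
        centres = [statistics.mean(chip) for chip in logs]
        target = statistics.mean(centres)
        return [[math.exp(v - c + target) for v in chip] for chip, c in zip(logs, centres)]
    if method == "mean":
        locs = [statistics.mean(chip) for chip in chips]
    elif method == "median":
        locs = [statistics.median(chip) for chip in chips]
    else:
        raise ValueError(f"unknown method: {method}")
    target = statistics.mean(locs)
    # Multiplicative scaling on the raw scale keeps intensities positive.
    return [[v * target / loc for v in chip] for chip, loc in zip(chips, locs)]


chips = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
print(global_scale(chips, "median"))  # [[1.5, 3.0, 4.5], [1.5, 3.0, 4.5]]
```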
PM-only multi-mgMOS for Affymetrix GeneChips
PM-only multi-mgMOS increases the stability of the original multi-mgMOS method, especially for weakly expressed genes. We use an example dataset included in the pumadata package to demonstrate the use of this method.
The first parameter of the function PMmmgmos is an AffyBatch object containing the raw probe intensities. The parameter gsnorm has the same meaning as that in the function gmoExon. The value of PMmmgmos is an object of class exprReslt which contains the estimated gene expression and the corresponding estimation uncertainty.
Improved PPLR for finding DE genes
IPPLR is designed to improve the computational efficiency of the original PPLR for finding differential expression. Like PPLR, it detects DE genes in two steps. In the first step, the function pumaCombImproved combines expression from replicates to give a single measurement for each condition. In the second step, the existing function pumaDE calculates the PPLR (probability of positive log-ratio) values used to identify DE genes. We use an example dataset in the puma package to demonstrate the use of this method as below.
The input to pumaCombImproved is an object of class ExpressionSet, which can also be the output of GME, PM-only multi-mgMOS or multi-mgMOS. The function pumaDE generates lists of genes ranked by their PPLR values, which indicate the significance of differential expression.
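As a rough stand-in for the combining step, the sketch below pools replicate estimates of log expression, each with its own measurement-error variance, into one condition-level mean and variance. It swaps in DerSimonian-Laird random-effects pooling from meta-analysis, which adds a moment-based estimate of between-replicate variance; pumaCombImproved instead fits a hierarchical Bayesian model, so this function and its outputs are illustrative only.

```python
def combine_replicates(means, variances):
    """Pool replicate log-expression estimates into one condition-level value.

    Swapped-in technique (not puma's model): DerSimonian-Laird random-effects
    pooling. variances are per-replicate measurement-error variances; tau2 is
    a method-of-moments estimate of extra between-replicate variance.
    Returns (pooled_mean, pooled_variance, tau2).
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * m for wi, m in zip(w, means)) / sw  # fixed-effect mean
    q = sum(wi * (m - fixed) ** 2 for wi, m in zip(w, means))  # heterogeneity
    k = len(means)
    denom = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / denom) if denom > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    sws = sum(w_star)
    pooled = sum(wi * m for wi, m in zip(w_star, means)) / sws
    return pooled, 1.0 / sws, tau2


# Hypothetical replicate means and measurement-error variances for one gene.
print(combine_replicates([7.9, 8.1, 8.4], [0.04, 0.05, 0.20]))
```

Replicates with large measurement error receive less weight, while disagreement beyond what the measurement error explains inflates the pooled variance through tau2.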
PUMA-CLUSTII for robust clustering
The existing clustering method PUMA-CLUST in puma considers uncertainty of gene expression but does not take into account the technical and biological variance when replicates are available. PUMA-CLUSTII is proposed to address this problem. It also adopts more robust components by using a Student’s t distribution instead of the Gaussian components used by PUMA-CLUST. We use an example dataset in the puma package to show the use of this method.
The first two parameters of pumaClustii are data frames containing the expression measurements and the associated uncertainty, respectively. The minimum and maximum numbers of clusters are specified by the parameters mincls and maxcls, respectively. The parameter conds indicates the number of conditions involved in the data, and reps is a vector specifying which condition each column of the input data frame belongs to. The result is a list containing the centres of the clustering components, the component membership of each data point, the optimal number of clusters and other auxiliary information.
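The robustness argument for Student’s t components can be seen directly from the densities. The sketch below compares the log density of a point six scale units from the centre under a Gaussian and under a Student’s t with 4 degrees of freedom, and computes the per-point weight that EM for a t model assigns; outlying points receive small weights, so they pull the component centre less. It omits the measurement-error and replicate terms of PUMA-CLUSTII, and the choice of 4 degrees of freedom is arbitrary.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def student_t_logpdf(x, mu, sigma, nu):
    """Log density of a location-scale Student's t with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))


def t_weight(x, mu, sigma, nu):
    """EM weight E[u | x] for a t model; it shrinks towards 0 for outliers."""
    z = (x - mu) / sigma
    return (nu + 1.0) / (nu + z * z)


print(gaussian_logpdf(6.0, 0.0, 1.0))        # about -18.9
print(student_t_logpdf(6.0, 0.0, 1.0, 4.0))  # about -6.7
print(t_weight(6.0, 0.0, 1.0, 4.0))          # 0.125
```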
Results and discussion
We use the well-studied MicroArray Quality Control (MAQC) dataset to evaluate most of the extensions in the new version of puma at the gene expression level. The MAQC project measured gene expression levels in high-quality RNA samples to assess comparability across multiple platforms. We select two RNA samples, the universal human reference RNA (UHRR) and the human brain reference RNA (HBRR), from the Affymetrix Exon array and Affymetrix U133 GeneChip platforms. Each sample type has five replicates on both platforms. Exon array experiments were carried out in two independent labs: McGill University (MU) and Virginia Tech (VT). We randomly selected the data from MU for the evaluation of GME. For U133 GeneChips, we use data AFX_1_[A-B][1-5] from GSE5350. Apart from microarray experiments, the MAQC project also conducted qRT-PCR experiments for around one thousand genes, which can serve as a gold standard to benchmark gene expression values estimated from other platforms [22, 23].
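Since qRT-PCR supplies DE and non-DE labels, each summarisation method can be scored by the area under the ROC curve. A convenient equivalent of the AUC is the Mann-Whitney statistic: the probability that a randomly chosen DE gene scores higher than a randomly chosen non-DE gene, with ties counted as one half. The sketch assumes hypothetical scores such as absolute log fold changes computed from a method's expression estimates; how DE and non-DE genes are defined from qRT-PCR is not reproduced here.

```python
def auc(scores_de, scores_non_de):
    """ROC AUC via the Mann-Whitney statistic.

    Returns P(score of a DE gene > score of a non-DE gene), counting ties as 1/2.
    This O(n*m) double loop is fine for the ~1000 qRT-PCR genes.
    """
    wins = 0.0
    for s in scores_de:
        for t in scores_non_de:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(scores_de) * len(scores_non_de))


# Hypothetical |log fold change| scores for qRT-PCR DE and non-DE genes.
print(auc([2.1, 1.4, 0.9], [0.2, 1.0, 0.1]))  # 8/9, about 0.889
```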
Number of qRT-PCR validated non-DE and DE genes and probe-sets for Exon arrays and U133 GeneChips
The qRT-PCR validated head and neck squamous cell carcinoma (HNSCC) dataset is used to verify the isoform expression calculated by GME. In the HNSCC dataset, 15 cell lines from tongue and larynx were cultured and samples were assayed using Affymetrix Human Exon 1.0 ST microarrays. Amplification of the chromosome region 11q13 is a common genomic alteration in HNSCC. The 15 cell lines are divided into two sample groups, with 11q13 amplification (11q13+) and without 11q13 amplification (11q13-). The 11q13+ group contains seven cell lines and the 11q13- group contains eight. qRT-PCR experiments were performed for four alternatively spliced variants of two genes (ORAOV1 and NEO1) located in the 11q13 amplified region and associated with HNSCC. We use GME to calculate the expression levels of the four isoforms in all 15 cell lines and then apply PPLR to identify differentially expressed transcripts (DETs). The detected DETs are compared with the qRT-PCR findings to verify the performance of GME.
Accuracy of gene expression estimation for Exon array data
Area under ROC curves from different methods for Exon array data
Validation of isoform expression estimation
GME results for the qRT-PCR validated transcripts
$\max(\mathrm{PPLR}, 1-\mathrm{PPLR})$
11q13+ vs. 11q13-
ORAOV1-201 vs. 202
NEO1-201 vs. 202
Improvements for detection of differential expression
Run time of PPLR and IPPLR
Accuracy of gene expression estimation for 3’ GeneChips
Area under ROC curves from different methods for U133 GeneChip data
Number of probe-sets
Robust clustering considering technical and biological variance
PUMA-CLUSTII is a robust Student’s t mixture model that takes into account expression measurement error as well as technical and biological variance. Our previous work has demonstrated that PUMA-CLUSTII obtained more accurate partitions than other alternatives on synthetic data. Furthermore, the method was shown to obtain numbers of clusters close to the number of underlying groups in realistic simulated data. Applications of PUMA-CLUSTII to yeast metabolic cycle and cell cycle datasets have also shown that the method leads to more biologically relevant clusters in terms of both GO category and TF-gene interaction.
We have presented the extended and improved functions of the new version of the puma package and demonstrated their usefulness on the well-studied MAQC dataset and the qRT-PCR validated HNSCC dataset. With these extensions and improvements, puma is able to provide accurate expression estimates for both Affymetrix 3’ GeneChips and Exon arrays. In addition to gene expression measurements, the new puma can also provide reliable estimates of isoform expression from Exon array data. For 3’ GeneChip data, the stability of expression measurements for low expression genes is improved. Furthermore, a level of uncertainty associated with these expression estimates can be obtained, and this measurement error can be propagated into our downstream analysis approaches to obtain improved results. Accounting for expression measurement error can make downstream methods computationally demanding; the new puma package significantly improves the computational efficiency of the previous method for finding DE genes while also improving its accuracy. As the final contribution, the new puma provides a robust clustering method which considers both within-chip measurement error and across-chip technical and biological variance.
There are two main advantages of the new puma package. First, the package processes Affymetrix 3’ GeneChips and Exon arrays to obtain accurate gene and isoform expression estimates with a level of uncertainty associated with these measurements. Second, the package offers various downstream analysis approaches which make use of the measurement error of expression to produce improved results at both gene and isoform level. Note that the data used for these downstream analyses are not limited to expression measurements from microarrays: they can be expression measurements obtained from any other platform, so long as a reasonable level of uncertainty can be associated with each measurement. For example, RNA-Seq is increasingly applied for transcript quantification. Some methods proposed to analyse RNA-Seq data are able to provide both expression estimates and measurement uncertainty [25, 26]. The transcript expression estimates and the related measurement error output by these methods can be used directly by the downstream analysis methods of puma. For all these reasons, puma should be useful to the many researchers who are interested in gene and transcript expression analysis.
Gamma model for Affymetrix GeneChip Exon array data
The posterior distributions of the logged gene and isoform expression can be estimated from equations (2) and (3), respectively. The expectation of the logged expression level is then computed and approximated by a Gaussian. The Gaussian approximation to the posterior distribution is useful for propagating the probe-level measurement error in subsequent downstream analyses of both gene and isoform expression.
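For a gamma-distributed quantity the log-scale moments are available in closed form, which makes a Gaussian approximation on the log scale cheap: if X ~ Gamma(shape a, rate b), then E[log X] = ψ(a) - log b and Var[log X] = ψ'(a), where ψ and ψ' are the digamma and trigamma functions. The sketch below moment-matches such a Gaussian using hand-rolled series for ψ and ψ' (the Python standard library provides neither). It assumes a plain gamma distribution with known shape and rate; how the GME posteriors reduce to such parameters is not shown here.

```python
import math


def digamma(x):
    """psi(x) for x > 0: upward recurrence to x >= 6, then the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def trigamma(x):
    """psi'(x) for x > 0: upward recurrence to x >= 6, then the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (
        1.0 / 6 - inv2 * (1.0 / 30 - inv2 * (1.0 / 42 - inv2 / 30)))


def log_gamma_gaussian(shape, rate):
    """Mean and variance of the moment-matched Gaussian for log X, X ~ Gamma(shape, rate)."""
    return digamma(shape) - math.log(rate), trigamma(shape)


print(log_gamma_gaussian(2.0, 0.5))  # about (1.116, 0.645)
```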
PM-only multi-mgMOS for Affymetrix 3’ GeneChip data
where $b_{ij}$ is a latent variable which models probe-specific effects for the same type of chip.
We use a Gaussian with a mean and a variance to approximate the posterior distribution of the expectation of $\log(y_{ijc})$. The mean of the Gaussian is taken as the estimated gene expression and the variance gives the measurement error associated with this estimate.
Improved PPLR for finding differentially expressed genes
where is the probe-level measurement error, which can be obtained from multi-mgMOS or PM-only multi-mgMOS.
We use the EM algorithm combined with a variational method to fit the model. In the E-step of PPLR, the variational distribution of $\lambda$ is obtained by importance sampling, which slows down the computation. In contrast, the computation in the E-step of IPPLR is analytical owing to the introduction of the latent variable $x_{ij}$. IPPLR is therefore more computationally efficient than PPLR.
where $D$ is the observed dataset and is the set of ML estimates of hyperparameters. The examined transcript is up-regulated in the treatment when $\mathrm{PPLR} > 0.5$ and down-regulated when $\mathrm{PPLR} < 0.5$.
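If the posterior distributions of the two condition means are approximated by independent Gaussians, the PPLR reduces to a normal CDF of the standardised difference in means. The sketch below makes that independence assumption, which the full model need not make, and also computes $\max(\mathrm{PPLR}, 1-\mathrm{PPLR})$, the direction-free score that heads the GME results for the qRT-PCR validated transcripts. The function names and example values are hypothetical.

```python
import math


def pplr_gaussian(mean_treat, var_treat, mean_ctrl, var_ctrl):
    """P(mu_treat > mu_ctrl | D) assuming independent Gaussian posteriors.

    The log-ratio mu_treat - mu_ctrl is then Gaussian with mean
    mean_treat - mean_ctrl and variance var_treat + var_ctrl.
    """
    z = (mean_treat - mean_ctrl) / math.sqrt(var_treat + var_ctrl)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z


def de_score(pplr):
    """Direction-free evidence of differential expression: max(PPLR, 1 - PPLR)."""
    return max(pplr, 1.0 - pplr)


p = pplr_gaussian(mean_treat=7.2, var_treat=0.10, mean_ctrl=6.5, var_ctrl=0.15)
print(p, de_score(p), "up" if p > 0.5 else "down")  # z is about 1.4, so PPLR is about 0.92
```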
PUMA-CLUSTII for clustering of replicated gene expression
Inference can be carried out using the variational EM algorithm. Given the maximum and minimum numbers of components, the algorithm automatically converges to the optimal number of mixture components by employing the minimum message length (MML) principle for model selection.
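The mechanism by which MML-based model selection prunes clusters can be sketched with the mixing-weight update of Figueiredo and Jain (listed in the references): each component's expected count is reduced by half the number of its free parameters, and components left with no support are annihilated. This is the generic update for finite mixtures; the exact objective used by PUMA-CLUSTII, with t components and measurement error, may differ, and the responsibilities below are invented.

```python
def mml_weight_update(responsibilities, n_params_per_component):
    """Figueiredo-Jain M-step for mixing weights under an MML criterion.

    responsibilities: one list per data point, giving that point's posterior
    membership probabilities over the components.
    Components whose expected count does not exceed half their parameter
    count get weight 0, which is how the number of clusters shrinks.
    """
    k = len(responsibilities[0])
    counts = [sum(r[m] for r in responsibilities) for m in range(k)]
    shrunk = [max(0.0, c - n_params_per_component / 2.0) for c in counts]
    total = sum(shrunk)
    if total == 0:
        raise ValueError("all components annihilated")
    return [s / total for s in shrunk]


# Hypothetical responsibilities: 10 points in component 0, 1 in component 1, 4 in component 2.
resp = [[1.0, 0.0, 0.0]] * 10 + [[0.0, 1.0, 0.0]] + [[0.0, 0.0, 1.0]] * 4
print(mml_weight_update(resp, 4))  # [0.8, 0.0, 0.2]
```

In the full algorithm, annihilated components are removed and EM continues with the survivors, so the final number of clusters lies between the specified minimum and maximum.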
Availability and requirements
XL acknowledges support from NSFC (61170152) and Qing Lan Project. LZ acknowledges support by “the Fundamental Research Funds for the Central Universities” (CXZZ11_0217). MR was supported by BBSRC award BB/H018123/2.
- Łabaj PP, Leparc GG, Linggi BE, Markillie LM, Wiley HS, Kreil DP: Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling. Bioinformatics 2011, 27(13): i383-i391. 10.1093/bioinformatics/btr247
- Pearson RD, Liu X, Sanguinetti G, Milo M, Lawrence ND, Rattray M: puma: a Bioconductor package for propagating uncertainty in microarray analysis. BMC Bioinformatics 2009, 10: 211. 10.1186/1471-2105-10-211
- Liu X, Milo M, Lawrence ND, Rattray M: A tractable probabilistic model for Affymetrix probe-level analysis across multiple chips. Bioinformatics 2005, 21: 3637-3644. 10.1093/bioinformatics/bti583
- Sanguinetti G, Milo M, Rattray M, Lawrence ND: Accounting for probe-level noise in principal component analysis of microarray data. Bioinformatics 2005, 21: 3748-3754. 10.1093/bioinformatics/bti617
- Liu X, Milo M, Lawrence ND, Rattray M: Probe-level measurement error improves accuracy in detecting differential gene expression. Bioinformatics 2006, 22: 2107-2113. 10.1093/bioinformatics/btl361
- Liu X, Lin KK, Andersen B, Rattray M: Including probe-level uncertainty in model-based gene expression clustering. BMC Bioinformatics 2007, 8: 98.
- Affymetrix: Guide to Probe Logarithmic Intensity Error. 2008. [Technical note]
- Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 2003, 4: 249-264. 10.1093/biostatistics/4.2.249
- Affymetrix: Alternative Transcript Analysis Methods for Exon Arrays. 2005. [http://media.affymetrix.com/support/technical/whitepapers/exon_alt_transcript_analysis_whitepaper.pdf] (last revised 11 October 2005)
- Purdom E, Simpson KM, Robinson MD, Conboy JG, Lapuk AV, Speed TP: FIRMA: a method for detection of alternative splicing from exon array data. Bioinformatics 2008, 24: 1707-1714. 10.1093/bioinformatics/btn284
- Xing Y, Stoilov P, Kapur K, Han A, Jiang H, Shen S, Black DL, Wong WH: MADS: a new and improved method for analysis of differential alternative splicing by exon-tiling microarrays. RNA 2008, 14: 1470-1479. 10.1261/rna.1070208
- Risueño A, Fontanillo C, Dinger ME, De Las Rivas J: GATExplorer: genomic and transcriptomic explorer; mapping expression probes to gene loci, transcripts, exons and ncRNAs. BMC Bioinformatics 2010, 11: 221. 10.1186/1471-2105-11-221
- Wu Z, Irizarry RA, Gentleman R, Martinez-Murillo F, Spencer F: A model-based background adjustment for oligonucleotide expression arrays. J Am Stat Assoc 2004, 99: 909-917. 10.1198/016214504000000683
- Turro E, Lewin A, Rose A, Dallman MJ, Richardson S: MMBGX: a method for estimating expression at the isoform level and detecting differential splicing using whole-transcript Affymetrix arrays. Nucleic Acids Res 2010, 38: e4. 10.1093/nar/gkp853
- Chen P, Lepikhova T, Hu Y, Monni O, Hautaniemi S: Comprehensive exon array data processing method for quantitative analysis of alternative spliced variants. Nucleic Acids Res 2011, 39: e123. 10.1093/nar/gkr513
- Li C, Wong W: Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proc Natl Acad Sci USA 2001, 98: 31-36. 10.1073/pnas.98.1.31
- Bishop CM: Pattern Recognition and Machine Learning. New York: Springer; 2006.
- Pearson RD: A comprehensive re-analysis of the Golden Spike data: Towards a benchmark for differential expression methods. BMC Bioinformatics 2008, 9: 164. 10.1186/1471-2105-9-164
- Zhang L, Liu X: An improved probabilistic model for finding differential gene expression. In Proceedings of the 2nd International Conference on BioMedical Engineering and Informatics, BMEI 2009. Tianjin, China; 2009.
- Liu X, Rattray M: Including probe-level measurement error in robust mixture clustering of replicated microarray gene expression. Stat Appl Genet Mol Biol 2010, 9: 42.
- MAQC Consortium: The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 2006, 24: 1151-1161. 10.1038/nbt1239
- Canales RD, Luo Y, Willey JC, Austermiller B, Barbacioru CC, Boysen C, Hunkapiller K, Jensen RV, Knight CR, Lee KY, Ma Y, Maqsodi B, Papallo A, Peters EH, Poulter K, Ruppel PL, Samaha RR, Shi L, Yang W, Zhang L, Goodsaid FM: Evaluation of DNA microarray results with quantitative gene expression platforms. Nat Biotechnol 2006, 24: 1115-1122. 10.1038/nbt1236
- Bullard JH, Purdom E, Hansen KD, Dudoit S: Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 2010, 11: 94. 10.1186/1471-2105-11-94
- Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 2008, 320: 1344-1349. 10.1126/science.1158441
- Katz Y, Wang ET, Airoldi EM, Burge CB: Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods 2010, 7: 1009-1015. 10.1038/nmeth.1528
- Glaus P, Honkela A, Rattray M: Identifying differentially expressed transcripts from RNA-seq data with biological variation. Bioinformatics 2012, 28: 1721-1728. 10.1093/bioinformatics/bts260
- Figueiredo MAT, Jain AK: Unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 2002, 24: 381-396.
- Spellucci PDB: An SQP method for general nonlinear programs using only equality constrained subproblems. Math Program 1998, 82: 413.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.