puma 3.0: improved uncertainty propagation methods for gene and transcript expression analysis
 Xuejun Liu^{1}Email author,
 Zhenzhu Gao^{1},
 Li Zhang^{1} and
 Magnus Rattray^{2}Email author
https://doi.org/10.1186/147121051439
© Liu et al.; licensee BioMed Central Ltd. 2013
Received: 13 September 2012
Accepted: 18 January 2013
Published: 5 February 2013
Abstract
Background
Microarrays have been a popular tool for gene expression profiling at genomescale for over a decade due to the low cost, short turnaround time, excellent quantitative accuracy and ease of data generation. The Bioconductor package puma incorporates a suite of analysis methods for determining uncertainties from Affymetrix GeneChip data and propagating these uncertainties to downstream analysis. As isoform level expression profiling receives more and more interest within genomics in recent years, exon microarray technology offers an important tool to quantify expression level of the majority of exons and enables the possibility of measuring isoform level expression. However, puma does not include methods for the analysis of exon array data. Moreover, the current expression summarisation method for Affymetrix 3’ GeneChip data suffers from instability for low expression genes. For the downstream analysis, the method for differential expression detection is computationally intensive and the original expression clustering method does not consider the variance across the replicated technical and biological measurements. It is therefore necessary to develop improved uncertainty propagation methods for gene and transcript expression analysis.
Results
We extend the previously developed Bioconductor package puma with a new method especially designed for GeneChip Exon arrays and a set of improved downstream approaches. The improvements include: (i) a new gamma model for exon arrays which calculates isoform and gene expression measurements and a level of uncertainty associated with the estimates, using the multimappings between probes, isoforms and genes, (ii) a variant of the existing approach for the probelevel analysis of Affymetrix 3’ GeneChip data to produce more stable gene expression estimates, (iii) an improved method for detecting differential expression which is computationally more efficient than the existing approach in the package and (iv) an improved method for robust modelbased clustering of gene expression, which takes technical and biological replicate information into consideration.
Conclusions
With the extensions and improvements, the puma package is now applicable to the analysis of both Affymetrix 3’ GeneChips and Exon arrays for gene and isoform expression estimation. It propagates the uncertainty of expression measurements into more efficient and comprehensive downstream analysis at both gene and isoform level. Downstream methods are also applicable to other expression quantification platforms, such as RNASeq, when uncertainty information is available from expression measurements. puma is available through Bioconductor and can be found at http://www.bioconductor.org.
Keywords
Background
Microarrays have been applied to highthroughput gene expression profiling for over a decade due to several advantages, e.g. high coverage, low cost, short turnaround time, excellent quantitative accuracy and ease of data generation. It has been shown recently that microarrays still remain an efficient and reliable tool for expression quantification especially for lowabundance targets [1]. We previously developed the Bioconductor package puma[2] for Affymetrix GeneChip data analysis. In the initial probelevel analysis, puma uses the multimgMOS method [3] to obtain an expression estimate for each gene and a level of uncertainty associated with this estimate. In the downstream analysis, puma propagates these uncertainties to principal component analysis, differential expression detection and gene expression clustering using methods NPPCA [4], PPLR [5] and PUMACLUST [6], respectively, and obtains improved analysis results. In addition to expression measurements obtained from microarrays, these downstream methods are also applicable to other expression quantification platforms, e.g. RNASeq based on high throughput sequencing technology, providing a level of uncertainty is associated with each measurement.
As the analysis of alternative splicing gains more and more interest in recent years, exon microarray technology, such as Affymetrix GeneChip Exon arrays, provides an option for measuring isoform level expression. It is therefore necessary for puma to include methods for propagating isoform expression uncertainty in the analysis of exon array data. Furthermore, the current probelevel analysis method, multimgMOS, obtains unstable expression estimates for low expression genes which can adversely affect the downstream analysis results. For the downstream analysis, the PPLR method for differential expression detection is computationally expensive and the PUMACLUST method for expression clustering does not consider the variance across the replicated technical and biological measurements. For all these reasons, we present here a new version of the puma package which incorporates a suite of improved probelevel analysis methods for gene and transcript expression summarisation and uncertainty propagation methods for the downstream analysis. The new version of the package covers the wide range of quantitative expression analysis of microarray at both gene and isoform level with the great benefit from propagating uncertainty associated with expression estimates into various advanced downstream analyses.
Affymetrix microarrays use 25base long probes to measure transcript abundance. Traditional 3’ GeneChips use two types of probes, perfect match (PM) and mismatch (MM) probes. A PM probe matches the target sequence exactly, whereas the corresponding MM probe differs from the PM probe in the middle base which is changed to the complementary one. MM probes are introduced to act as a control for cross hybridisation and other types of background signal. The GeneChip Exon arrays use only PM probes to obtain higher density of coverage and make exon, isoform and gene level profiling possible. Many probelevel analysis methods for 3’ arrays such as PLIER [7] and RMA [8] which do not use MM probe intensities, can be applied to exon arrays directly for exon or gene level expression calculation by using probetoexon or probetogene mappings, respectively. With the estimated exon and gene expression, it is possible to perform alternative splicing detection by measuring exongene expression ratios [911]. In addition to calculating exon and gene expression ratios, isoform expression levels can also be quantified for a more refined downstream analysis.
The expression calculation at isoform level is nontrivial since one probe can be mapped to multiple transcripts or gene loci [12]. Also, an important characteristic of Affymetrix microarray probes is that they have different sensitivity to transcript abundance according to their sequence content. Many probelevel analysis approaches for 3’ arrays account for these probespecific effects and have obtained improved results [3, 13]. Moreover, a level of uncertainty associated with estimated isoform expression would help downstream analyses to obtain more biologically relevant results. With available multimappings between probes and Ensembl transcripts, some methods have recently been proposed to address the expression calculation for known isoforms, such as MMBGX [14] and MEAP [15]. MMBGX uses a hierarchical Bayesian model to calculates the expression level of target transcripts and results in a posterior distribution of each isoform expression. MMBGX is solved by MCMC method and is therefore computationally intensive. After background removal, MEAP adopts a nonnegative matrix factorisation approach to summarise isoform expression as a point estimate and does not provide a level of uncertainty associated with this estimate. MMBGX and MEAP perform crosshybridisation correction according to different GC content for probes, removing probespecific effects to a certain extent. However, it has been shown that specific hybridisation also presents probespecific variations [8, 16]. We developed a new gamma model for exon array data (GME), which accounts for probeeffects in specific hybridisation and multimappings between probes, transcripts and genes. The GME model parameters are estimated by Maximum a Posteriori (MAP) optimisation to give isoform and gene level expression measurements with a level of uncertainty of these estimates, provided by a MAPLaplace approximation [17]. The new method has been implemented as an R function in the new version of the puma package.
For traditional 3’ GeneChips, PM probes are thought to mainly measure specific hybridisation and MM probes measure nonspecific hybridisation and other background. However, probes for low expression genes often obtain higher background than true signal. When combining PM and the corresponding MM probe intensities to calculate gene expression, the resulting gene expression measurements can be unstable for low expression genes, especially on a log scale. For this reason, most popular methods provide an option of using PM probes only in order to obtain more stable expression values on the log scale, such as PLIER [7], dCHIP [16] and RMA [8]. The previous method for 3’ GeneChips in puma, multimgMOS [3], combines both PM and MM probe intensities to calculate gene expression values and provide a level of uncertainty associated with the measurements. For low expression genes the estimated logarithmic expression values are usually negative and the associated variance is typically large. These expression measurements with large error can further affect downstream analyses and may lead to incorrect biological conclusions. This is especially the case when the mean expression estimates are processed by methods outside of the puma package which do not account for measurement uncertainty. To alleviate this problem, we propose PMonly multimgMOS for 3’ arrays, which uses only PM probe intensities and obtains more stable gene expression estimates for low expression genes.
For the downstream analyses of gene expression, the new version of puma includes two newly improved approaches for finding differentially expressed (DE) genes and gene expression clustering. The previous method PPLR for finding DE genes considers the probelevel measurement error, which can improve results when there are few replicates available [5, 18]. PPLR uses an importance sampling procedure in the variational EM solver which leads to computational inefficiency since the number of samples needs to be increased to gain better accuracy. By adding a layer of hidden variables to the hierarchical Bayesian model, inference in the PPLR model is faster due to the elimination of this inefficient importance sampling step [19]. The PUMACLUST method provided by the previous version of puma propagates probelevel uncertainty to improve results of standard Gaussian mixture clustering of gene expression [6]. The recently proposed PUMACLUSTII [20] approach improves PUMACLUST in several aspects. First, variance across the replicated technical and biological measurements for the same experimental condition is considered. Second, a Student’s tdistribution is adopted as the clustering components to improve the robustness of the method. Finally, the optimal number of components can be automatically found, and this is especially important for the clustering when the ground truth in the data is unknown.
Implementation
Extended and improved function components in puma
puma includes two levels of analyses for expression data, expression summarisation and downstream analyses. At the summarisation level of analysis, the previous version of puma as described in [2] can only processe 3’ GeneChip data using mainly multimgMOS. With the obtained gene expression measurements and the associated measurement uncertainty from microarrays or other platforms, puma propagates uncertainty into the downstream analyses, including PPLR for finding DE genes, PUMACLUST for gene expression clustering and NPPCA [4] for principal component analysis of gene expression. The diagram of function components for the previous puma is shown in the upper part of Figure 1. After the extension and improvement in this paper, the functions of the new version of puma are illustrated in the lower part of Figure 1. The new version provides the following contributions:

GME  In addition to traditional 3’ GeneChip data, the new version is capable of processing Exon array data using a new model GME at the summarisation level of analysis. From the Exon array data analysis, both gene and isoform expression can be computed.

PMonly multimgMOS  PMonly multimgMOS is included to improve the stability of multimgMOS for gene expression estimation.

IPPLR  At the downstream analyses, the new version of the package contains IPPLR as an improvement to speed up PPLR for detecting differential expression.

PUMACLUSTII  For expression clustering, PUMACLUSTII is introduced to consider the technical and biological variance across experimental replicates. The new clustering method increases the robustness of clustering and automatically selects the optimal number of clusters by model selection.
With these contributions, methods in puma can process both gene and isoform expression, making puma useful in the analysis of alternative splicing. See Methods for more details on these algorithms.
Multimappings between probes and isoforms
The increasing availability of mappings of microarray probes to isoforms in the Ensembl database can be used to perform isoform expression estimation. In particular, multimappings between probes and isoforms are helpful in separating the intensity contributions from probes shared by multiple isoforms. Transcript expression estimation may benefit from this intensity separation. The database GATExplorer [12] integrates information from multiple biological sources (including Ensembl database and probe sequences of Affymetrix microarrays) to provide the mappings between microarray probes and the functional transcriptional entities, i.e. gene loci, transcripts, exons and ncRNAs. We include the multimappings between Exon array probes, isoforms and genes obtained from GATExplorer into the separate Bioconductor data package pumadata which contains example and annotation data used by puma. Mappings for human, mouse and rat exon arrays are included and this makes puma applicable to all types of Affymetrix Exon arrays.
Using the extended functions in puma
The new version of puma and the related pumadata package can be found at http://www.bioconductor.org. The GEM model is implemented in the function gmoExon to calculate gene and isoform level expression for Exon arrays. The PMonly multimgMOS method is implemented in the function PMmmgmos to estimate stable gene expression for Affymetrix GeneChips. The improved PPLR for detecting DE genes is implemented in the function pumaCombImproved. The PUMACLUSTII is implemented in the function pumaclustii for robust expression clustering. To use these functions, type library(puma) and library(pumadata) at R prompt to load puma package and the data package. A quick start of each of these functions is described below. For detailed use of these functions, please refer to the user manual of the puma package.
Gamma model for Exon arrays
The expression summarisation method for Exon arrays is GME. The method makes use of multimappings between probes, isoforms and genes obtained from GATExplorer to aid the calculation of gene and isoform expression. The mappings are included in the individual package pumadata. The following code shows a quick start of this method.
The above code loads exon array data (CEL files) in the working directory as an AffyBatch object and processes it using GME method. Among the parameters, exontype can be one of “Human”, “Mouse” and “Rat”, indicating the exon chip type. GT can be one of “gene” and “transcript”, specifying the expression estimated at gene and isoform level, respectively. gsnorm specifies the algorithm used by the global scaling normalisation and can be one of “mean”, “median”, “meanlog” and “none”. “mean” and “meanlog” are meancentered normalisation on raw and the log scale, respectively, “median” is mediancentered normalisation and “none” means no global scaling normalisation. The value of gmoExon is an object of class exprReslt which stores the estimated expression and a level of uncertainty associated with this measurement.
PMonly multimgMOS for Affymetrix GeneChips
PMonly multimgMOS increases the stability of the original multimgMOS method, especially for weakly expressed genes. We use an example dataset included in the pumadata package to demonstrate the use of this method.
The first parameter of the function PMmmgmos is an AffyBatch object containing the raw probe intensities. The parameter gsnorm has the same meaning as that in the function gmoExon. The value of PMmmgmos is an object of class exprReslt which contains the estimated gene expression and the corresponding estimation uncertainty.
Improved PPLR for finding DE genes
IPPLR is designed to improve the computational efficiency of the original PPLR for finding differential expression. Similar to PPLR, it includes two steps to detect DE genes. At the first step, the function pumaCombImproved is used to combine expression from replicates to give a single measurement for the related condition. At the second step, the existing function pumaDE is used to calculate the PPLR (probability of positive logratio) values to identify DE genes. We use an example dataset in the puma package to demonstrate the use of this method as below.
The parameter of pumaCombImproved is an object of class ExpressionSet and can also be the outputs from GME, PMonly multimgMOS or multimgMOS. The function pumaDE generates lists of genes ranked by the PPLR values which indicate the significance of differential expression.
PUMACLUSTII for robust clustering
The existing clustering method PUMACLUST in puma considers uncertainty of gene expression but does not take into account the technical and biological variance when replicates are available. PUMACLUSTII is proposed to address this problem. It also adopts more robust components by using a Student’s t distribution instead of the Gaussian components used by PUMACLUST. We use an example dataset in the puma package to show the use of this method.
The first two parameters of pumaClustii are data frames containing the expression measurements and the associated uncertainty respectively. The minimum and maximum numbers of clusters are specified by the parameters mincls and maxcls, respectively. The parameter conds indicates the number of conditions involved in the data and reps is a vector specifying which condition each column of the input data frame belongs to. The result is a list containing the center of clustering components, the membership of components for each data point, the optional number of clusters and other auxiliary information.
Results and discussion
Datasets
MAQC dataset
We use the well studied Microarray Quality Control (MAQC) dataset [21] to evaluate most of the extensions of the new version of puma at gene expression level. MAQC project measured gene expression levels from highquality RNA samples to assess the comparability across multiple platforms. We select two RNA samples, the universal human reference RNA (UHRR) and the human brain reference RNA (HBRR), from Affymetrix Exon array and Affymetrix U133 GeneChip platforms. Each sample type has five replicates for both platforms. Experiments of Exon arrays were carried out in two independent labs: McGill University (MU) and Virginia Tech (VT). We randomly selected data from MU for the evaluation of GME. For U133 GeneChips, we use data AFX_1_[AB][15] from GSE5350. Apart from microarray experiments, MAQC project also conducted qRTPCR experiments for around one thousand genes which can be served as a goldstandard to benchmark gene expression values estimated from other platforms [22, 23].
Number of qRTPCR validated nonDE and DE genes and probesets for Exon arrays and H133 GeneChips
nonDE  DE  

DE+  DE  
Exon arrays  87  116  102 
U133 GeneChips  204  185  267 
HNSCC dataset
The qRTPCR validated head and neck squamous cell carcinoma (HNSCC) dataset [15] is used to verify the isoform expression calculated by GME. In HNSCC dataset, 15 cell lines from tongue and larynx were cultured and samples were assayed using Affymetrix Human Exon 1.0 ST microarrays. Amplification of the chromosome region 11q13 is a common genomic alteration in HNSCC. The 15 cell lines are divided into two sample groups, with 11q13 amplification (11q13+) and without 11q13 amplification (11q13). 11q13+ group contains seven cell lines and 11q13 group contains eight. qRTPCR experiments were performed for four alternatively spliced variants of two genes (ORAOV1 and NEO1) located in the 11q13 amplified region and associated with HNSCC. We use GME to calculate the expression levels for the four isoforms in all 15 cell lines and then apply PPLR to identify the differential expressed transcripts (DETs). The detected DETs are compared with qRTPCR findings to verify the performance of GME.
Accuracy of gene expression estimation for Exon array data
Area under ROC curves from different methods for Exon array data
Methods  2 replicates  5 replicates  

1  2  3  4  5  Average  
ttest  RMA  0.8945  0.8909  0.9107  0.9346  0.9316  0.9118  0.9475 
PLIER  0.8806  0.8852  0.9004  0.9084  0.9083  0.8937  0.9291  
GME  0.9082  0.9044  0.9415  0.9544  0.9427  0.9287  0.9580  
PPLR_1000  RMA  0.9243  0.9234  0.9385  0.9417  0.9387  0.9323  0.9489 
GME  0.9208  0.9093  0.9365  0.9297  0.8969  0.9188  0.9447  
*PPLR_10000  RMA  0.9227  0.9226  0.9419  0.9453  0.9432  0.9348  0.9492 
GME  0.9353  0.9317  0.9474  0.9374  0.9324  0.9274  0.9503  
IPPLR  RMA  0.9246  0.9301  0.9464  0.9468  0.9463  0.9382  0.9493 
GME  0.9379  0.9391  0.9457  0.9597  0.9549  0.9475  0.9589 
Validation of isoform expression estimation
GME results for the qRTPCR validated transcripts
Comparisons  qRTPCR  GME  m a x( P P L R,1− P P L R)  Consistency  

11q13+ vs. 11q13  ORAOV1201  +  +  1.0000  Y 
ORAOV1202  +  +  0.9968  Y  
NEO1201  +  +  0.8961  Y  
NEO1202  +  +  0.9719  Y  
ORAOV1201 vs. 202  11q13+  +  +  0.9154  Y 
11q13  +  +  0.5782  Y  
NEO1201 vs. 202  11q13+      0.9999  Y 
11q13      1.0000  Y 
Improvements for detection of differential expression
Run time of PPLR and IPPLR
Datasets  PPLR_1000  PPLR_10000  IPPLR 

2 replicates  73.1  1330.8  27.5 
5 replicates  125.4  3127.4  15.9 
Accuracy of gene expression estimation for 3’ GeneChips
Area under ROC curves from different methods for U133 GeneChip data
Groups  # of probesets  PMonly multimgMOS  multimgMOS  RMA  

nonDE  DE+  DE  
High  65  21  14  0.9842  0.9952  0.9308 
Medium  90  73  82  0.9062  0.8880  0.8827 
Low  49  91  171  0.8363  0.8180  0.8147 
All  204  185  267  0.8227  0.8130  0.7971 
Robust clustering considering technical and biological variance
PUMACLUSTII is a robust Student’s t mixture model and takes into accounts expression measurement error, and technical and biological variance. Our work in [20] has already demonstrated that PUMACLUSTII obtained more accurate partitions compared with other alternatives on synthetic data. Furthermore, the method was shown to obtain numbers of clusters similar to the number of underlying groups in realistic simulated data. Applications of PUMACLUSTII on yeast metabolic cycle and cell cycle datasets have already shown that the method led to more biologically relevant clusters in terms of both GO category and TFgene interaction.
Conclusions
We have presented the extended and improved functions of the new version of the puma package and demonstrated the usefulness of these new functions on the well studied MAQC dataset and the qRTPCR validated HNSCC dataset. With these extensions and improvements, puma is able to provide accurate expression estimates for both Affymetrix 3’ GeneChips and Exon arrays. In addition to gene expression measurements, the new puma can also provide reliable estimation of isoform expression from Exon array data. For 3’ GeneChip data, the stability of expression measurements for low expression genes was improved. Furthermore, a level of uncertainty associated with these expression estimates can also be obtained and this measurement error can be propagated into our downstream analysis approaches to obtain improved results. With the consideration of expression measurement error in the downstream analyses, methods can be computationally demanding. The new puma package significantly improves the computational efficiency of the previous method for finding DE genes and obtains even better accuracy. As the final contribution, the new puma provides a robust clustering method which considers the withinchip measurement error and acrosschip technical and biological variance.
There are two main advantages of the new puma package. One is that the package processes Affymetrix 3’ GeneChips and Exon arrays to obtain accurate gene and isoform expression estimates with a level of uncertainty associated with these measurements. The other is that the package offers various downstream analysis approaches which make use of measurement error of expression to produce improved results at both gene and isoform level. Note that the data used for these downstream analyses is not limited to expression measurements from microarrays. The data can be expression measurement obtained from any other platform so long as a reasonable level of uncertainty can be associated with each measurement. For example, RNASeq is increasingly applied for transcript quantification [24]. Some methods proposed to analyse RNASeq data are able to provide both expression estimates and measurement uncertainty [25, 26]. The transcript expression estimates and the related measurement error output by these methods can be used directly by the downstream analysis methods of puma. For all these reasons, puma is very useful to a large number of researchers who are interested in gene and transcript expression analysis.
Methods
Gamma model for Affymetrix GeneChip Exon array data
The posterior distributions of the logged gene/isoform expression can be estimated from equation (2) and (3), respectively. The expectation of the logged expression level is then computed and approximated by a Gaussian. The Gaussian approximation to the posterior distribution is useful for propagating the probelevel measurement error in subsequent downstream analyses of both gene and isoform expression.
PMonly multimgMOS for Affymetrix 3’ GeneChip data
where b_{ i j } is a latent variable which models probespecific effects for the same type of chip.
We use a Gaussian with a mean ${\widehat{\mu}}_{\mathit{\text{ic}}}$ and a variance ${\widehat{\sigma}}_{\mathit{\text{ic}}}$ to approximate the posterior distribution of the expectation of log(y_{ i j c }). The mean of the Gaussian is taken as the estimated gene expression and the variance shows the measurement error associated with this estimate.
Improved PPLR for finding differential expressed genes
where ${s}_{\mathit{\text{ij}}}^{2}$ is the probelevel measurement error, which can be obtained from multimgMOS or PMonly multimgMOS.
We use the EM algorithm combined with a variational method to work out the model. In the Estep of PPLR, the variational distribution of λ is obtained by importance sampling which slows down the computation of the method. In contrast, the computation in the Estep of IPPLR is analytical due to the introduction of the latent variable x_{ i j }. IPPLR is therefore more computationally efficient than PPLR.
where D is the observed dataset and $\widehat{\varphi}$ is the set of ML estimates of hyperparameters. The examined transcript is upregulated in the treatment when P P L R>0.5 while downregulated when P P L R<0.5.
PUMACLUSTII for clustering of replicated gene expression
Inference can be carried out using the variational EM algorithm. Specifying the maximum and minimum numbers of components, the algorithm automatically converged to the optimal number of mixture components by employing the minimum message length (MML) principle [27] for model selection.
Availability and requirements
Project name: puma Software
Project home page: http://www.bioinf.manchester.ac.uk/resources/puma
Operating systems: Platform independent
Programming language: R, C
Other requirements: R
Any restrictions to use: it is available for free download except that puma uses C scripts of donlp [28].
Declarations
Acknowledgements
XL acknowledges support from NSFC (61170152) and Qing Lan Project. LZ acknowledges support by “the Fundamental Research Funds for the Central Universities” (CXZZ11_0217). MR was supported by BBSRC award BB/H018123/2.
Authors’ Affiliations
References
 Łabaj PP, Leparc GG, E LB, Markillie LM, S WH, P KD: Characterization and improvement of RNASeq precision in quantitative transcript expression profiling. Bioinformatics 2011,27(13):i383i391. 10.1093/bioinformatics/btr247PubMed CentralView ArticlePubMedGoogle Scholar
 Pearson RD, Liu X, Sanguinetti G, Milo M, D LN, Rattray M: puma: a bioconductor package for propagating uncertainty in microarray analysis. BMC Bioinformatics 2009, 10: 211. 10.1186/1471210510211PubMed CentralView ArticlePubMedGoogle Scholar
 Liu X, Milo M, Lawrence ND, Rattray M: A tractable probabilistic model for Affymetrix probelevel analysis across multiple chips. Bioinformatics 2005, 21: 36373644. 10.1093/bioinformatics/bti583View ArticlePubMedGoogle Scholar
 Sanguinetti G, MIlo M, Rattray M, Lawrence ND: Accounting for probelevel noise in principal component analysis of mmicroarray data. Bioinformatice 2005, 21: 37483754. 10.1093/bioinformatics/bti617View ArticleGoogle Scholar
 Liu X, Milo M, Lawrence ND, Rattray M: Probelevel measurement error improves accuracy in detecting differential gene expression. Bioinformatics 2006, 22: 21072113. 10.1093/bioinformatics/btl361View ArticlePubMedGoogle Scholar
 Liu X, Lin KK, Andersen B, Rattray M: Including probelevel uncertainty in modelbased gene expression clustering. BMC Bioinformatics 2007, 9: 98.View ArticleGoogle Scholar
 Affymetrix: Guide to Probe Logarithmic Intensity Error. 2008. [Technical note] [Technical note]Google Scholar
 Irizarry RA, Hobbs B, Collin F, BeazerBarclay YD, Antonellis KJ: Exploreation, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 2003, 4: 249264. 10.1093/biostatistics/4.2.249View ArticlePubMedGoogle Scholar
 Affymetrix: Alternative Transcript Analysis Methods for Exon Arrays. 2005. (11 October 2005, date last revised) [Http://media.affymetrix.com/support/technical/whitepapers/exon_alt_transcript_analysis_whitepaper.pdf] (11 October 2005, date last revised) []Google Scholar
 Purdom E, Simpson KM, Robinson MD, Conboy JG, Lapuk AV, Speed TP: FIRMA: a method for detection of alternative splicing from exon array data. Bioinformatics 2008, 24: 17071714. 10.1093/bioinformatics/btn284PubMed CentralView ArticlePubMedGoogle Scholar
 Xing Y, Stoilov P, Kapur K, Han A, Jiang H, Shen S, Black DL, Wong WH: MADS: a new and improved method for analysis of differential alternative splicing by exontiling microarrays. RNA 2008, 14: 14701479. 10.1261/rna.1070208PubMed CentralView ArticlePubMedGoogle Scholar
 Risueño A, Fontanillo C, E DM, J DLR: GATExplorer: genomic and transcriptomic explorer; mapping expression probe to gene loci, transcripts, exons and ncRNAs. BMC Bioinformatics 2010, 11: 221. 10.1186/1471210511221PubMed CentralView ArticlePubMedGoogle Scholar
 Wu Z, Irizarry RA, Gentleman R, MartinezMurillo F, Spencer F: A modelbased background adjustment for oligonucleotide expression arrays. J Am Stat Assoc 2004, 99: 909917. 10.1198/016214504000000683View ArticleGoogle Scholar
 Turro E, Lewin A, Rose A, Dallman MJ, Richardson S: MMBGX: a method for estimating expression at the isoform level and detecting differential splicing using wholetranscript Affymetrix arrays. Nucleic Acids Res 2010, 38: e4. 10.1093/nar/gkp853PubMed CentralView ArticlePubMedGoogle Scholar
 Chen P, Lepikhova T, Hu Y, Monni O, Hautamiemi S: Comprehensive exon array data processing method for quantitative analysis of alternative spliced variants. Nucleic Acids Res 2011, 39: e123. 10.1093/nar/gkr513PubMed CentralView ArticlePubMedGoogle Scholar
 Li C, Wong W: Modelbased analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proc Natl Acad Sci USA 2001, 98: 3136. 10.1073/pnas.98.1.31PubMed CentralView ArticlePubMedGoogle Scholar
 Bishop CM: Pattern Recognition and Machine Learning. New York: Springer; 2006.Google Scholar
 Pearson RD: A comprehensive reanalysis of the Golden Spike data: Towards a benchmark for differential expression methods. BMC Bioinformatice 2008, 9: 164. 10.1186/147121059164View ArticleGoogle Scholar
 Zhang L, Liu X: An improved probabilistic model for finding differential gene expression. In Proceedings of the 2nd International Conference on BioMedical Engineering and Informatics, BMEI 2009. Tianjin, China; 2009.Google Scholar
 Liu X, Rattray M: Including probelevel measurement error in robust mixture clustering of replicated microarray gene expression. Stat Appl Genet Mol Biol 2010, 9: 42.Google Scholar
 Consortium M: The MicroArray Quality Control (MAQC) project shows inter and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 2006, 24: 11511161. 10.1038/nbt1239View ArticleGoogle Scholar
 Canales RD, Luo Y, Willey JC, Austermiller B, Barbacioru CC, Boysen C, Hunkapiller K, Jensen RV, Knight CR, Y LK, Ma Y, Maqsodi B, Papallo A, Peters EH, Poulter K, L RP, Samaha RR, Shi L, Yang W, Zhang L, M GF: Evaluation of DNA microarray results with quantitative gene expression platforms. Nat Biotechnol 2006, 24: 11151122. 10.1038/nbt1236View ArticlePubMedGoogle Scholar
 Bullard JH, Purdom E, Hansen KD, Dudoit S: Evaluation of statistical methods for normalization and differential expression in mRNASeq experiments. BMC Bioinformatics 2010, 11: 94. 10.1186/147121051194PubMed CentralView ArticlePubMedGoogle Scholar
 Nagalakshmi U, Wang Z, Waem K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional lanscape of the yeast genome defined by RNA sequencing. Science 2008, 320: 13441349. 10.1126/science.1158441PubMed CentralView ArticlePubMedGoogle Scholar
 Katz Y, Wang ET, Airoldi EM, Burge CB: Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods 2010, 7: 10091015. 10.1038/nmeth.1528PubMed CentralView ArticlePubMedGoogle Scholar
 Glaus P, Honkela A, Rattray M: Identifying differentially expressed transcripts from RNAseq data with biological variation. Bioinformatics 2012, 28: 17211728. 10.1093/bioinformatics/bts260PubMed CentralView ArticlePubMedGoogle Scholar
 Figueiredo MAT, Jain AK: Unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 2002, 24: 381396.View ArticleGoogle Scholar
 Spellucci PDB: An SQP method for general nonlinear programs using only equality constrained subproblems. Math Program 1998, 82: 413.Google Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.