A semi-nonparametric mixture model for selecting functionally consistent proteins
- Lianbo Yu^{1} and
- RW Doerge^{2}
DOI: 10.1186/1471-2105-11-486
© Yu and Doerge; licensee BioMed Central Ltd. 2010
Received: 11 April 2010
Accepted: 28 September 2010
Published: 28 September 2010
Abstract
Background
High-throughput technologies have led to a new era of proteomics. Although protein microarray experiments are becoming more commonplace, there are a variety of experimental and statistical issues that have yet to be addressed, and that will carry over to new high-throughput technologies unless they are investigated. One of the largest of these challenges is the selection of functionally consistent proteins.
Results
We present a novel semi-nonparametric mixture model for classifying proteins as consistent or inconsistent while controlling the false discovery rate and the false non-discovery rate. The performance of the proposed approach is compared to current methods via simulation under a variety of experimental conditions.
Conclusions
We provide a statistical method for selecting functionally consistent proteins in the context of protein microarray experiments, but the proposed semi-nonparametric mixture model method can certainly be generalized to solve other mixture data problems. The main advantage of this approach is that it provides the posterior probability of consistency for each protein.
Background
Over the last decade or longer, microarray technology has been used for measuring gene expression and has greatly impacted biomarker discovery [1], transcription factor identification [2], the assessment of gene interactions [3], and the detection of biological pathways [4]. Despite the widespread application of microarrays to the transcriptome, there are limitations to the extent of the conclusions that can be made. Messenger RNA (mRNA) is the intermediate product of genes, with proteins being the final products and the key factors of metabolism. Although the levels of mRNA and protein for a gene are related, they are not always highly correlated, for reasons that include translation rate, protein stability, and post-translational modification [5]. Given that the motivation and goal of many experiments is to understand not only the function of genes, but the network of genes that encode proteins, the abundance of proteins themselves is of increasing interest. Toward this end, microarray technology has been adapted to proteins; the resulting protein microarrays have been developed and widely used to assess the abundance of proteins [6–12]. Microarray technology as applied to gene expression [13] and as applied to protein abundance is similar in that improved accuracy and precision, as well as design issues and normalization techniques, have also been established for protein microarrays [14, 15].
Screening and identifying proteins as potential medical diagnostics and disease classification biomarkers is the main motivation of many protein microarray experiments [16–21]. The precursor to any successful screening application, and an essential issue that must be resolved to ensure that accurate protein abundance measurements can be obtained by protein microarrays, is the consistency with which a protein reports hybridization abundance. The protein itself is the probe on the array, and since proteins have a complex three-dimensional structure, the structure itself, as well as the orientation of a protein, needs to be retained. It is highly unlikely that every protein will be functional, since different proteins often require different environmental conditions for maintaining their structures, and proteins are typically much less stable than DNA. If the three-dimensional structure is lost, or the required functional portion of the protein is not available to bind its target protein (i.e., the sample), the target protein abundance measurement will be much smaller than it should be, or missed altogether. Proteins whose structure or function is not maintained when attached to the array as probes are called inconsistent proteins and, if used, inflate biomarker error rates (i.e., the false positive rate and false negative rate). Alternatively, proteins that retain their structure and function are called consistent proteins and are desirable as probes on the array, and ultimately as potential biomarkers. As such, the selection of proteins that maintain functional consistency across experiments is a major and necessary requirement in the design and analysis of protein microarray experiments [17].
Certainly, high-throughput chemical validation of protein consistency is possible, but it is expensive and time consuming. Toward this end it is possible to statistically estimate protein consistency. In its simplest form, Pearson's correlation coefficient has been employed as a consistency measure in an antibody microarray study by Miller et al. [17], but it only measures the linearity of repeated measurements, and therefore is limited in its usefulness. A concordance correlation coefficient that is able to measure the consistency of repeated measurements was proposed by Lin [22], and later expanded to a total deviation index (TDI) [23], which provides a boundary within which a certain required percentage of differences between paired observations is obtained while controlling the error rate. As described by Lin [24], TDI and the concordance correlation coefficient provide the same information, but from different perspectives, and thus share their limitations. Namely, both the concordance correlation coefficient and TDI only demonstrate good asymptotic properties under the assumption of normality; a reality that is often questionable in application. Furthermore, the comparability of concordance correlation coefficients across proteins requires the ranges of the abundance measurements of proteins to be similar, which is not practical in large scale experiments [25]. To address the challenges and issues that are associated with identifying functionally consistent proteins, we propose a new statistic based on variance components from an analysis of variance (ANOVA) model. We rely on a mixture model to achieve this goal. Applications of mixture models in biology have proven to be excellent for separating data into the correct number of classes. For example, Efron et al. [26] proposed a two-component mixture model for testing differential expression. 
In this application the distributions of the t-statistics from both differentially expressed genes and non-differentially expressed genes were estimated by a nonparametric method, but the tail probabilities could not be estimated accurately. The accuracy of the tail probability estimates was subsequently improved by the two-component mixture model of Pan et al. [27], in which a finite normal mixture was assumed for each component. For microarray data it certainly is possible to simulate test statistics under the null hypothesis (i.e., a single component) using permutation theory since the treatment conditions for testing differences are known. However, for protein array data the first challenge is to identify proteins that are consistent, and then work only with these data. In other words, we are focusing on separating proteins into inconsistent and consistent classes, and then using only the informative proteins (i.e., consistent proteins) to address the biological question(s). To achieve this we propose a novel two-component semi-nonparametric mixture model. Simulations demonstrate the performance of the proposed approach and provide food for thought when designing future protein microarray experiments. We also apply the proposed approach to real data for the purpose of demonstrating its usefulness.
Results and Discussion
Simulations were conducted for the purpose of providing insight into the performance and value of the proposed semi-nonparametric approach. Data were simulated from known consistency classifications and analysed with the proposed approach, and the number of times proteins were correctly classified was recorded. From these simulation results, the false discovery rate and the false non-discovery rate were calculated and are discussed.
A power study
Nine different simulation scenarios. Columns ϕ_{11} through σ_{1} describe Component 1; columns ϕ_{21} through σ_{2} describe Component 2.
model | distance | ϕ_{11} | ϕ_{12} | µ_{1} | σ_{1} | ϕ_{21} | ϕ_{22} | µ_{2} | σ_{2} |
---|---|---|---|---|---|---|---|---|---|
1 | D = 1 | π/2 | π/2 | 12 | 2 | π/2 | π/2 | 17 | 3 |
1 | D = 2 | π/2 | π/2 | 12 | 2 | π/2 | π/2 | 24 | 4 |
2 | D = 1 | π/2 | π/2 | 12 | 2 | 2.17 | π/2 | 19.4 | 3 |
2 | D = 2 | π/2 | π/2 | 12 | 2 | 2.17 | π/2 | 26.7 | 4 |
3 | D = 1 | π/2 | π/2 | 12 | 2 | 2 | 2.75 | 18 | 2 |
3 | D = 2 | π/2 | π/2 | 12 | 2 | 2 | 2.75 | 30.3 | 4 |
4 | D = 1 | 0.97 | π/2 | 9.8 | 2.3 | π/2 | π/2 | 17 | 3 |
4 | D = 2 | 0.97 | π/2 | 9.8 | 2.3 | π/2 | π/2 | 24 | 4 |
5 | D = 1 | 0.97 | π/2 | 10 | 2 | 2.17 | π/2 | 18.15 | 2.5 |
5 | D = 2 | 0.97 | π/2 | 10 | 2 | 2.17 | π/2 | 26.15 | 4 |
6 | D = 1 | 0.97 | π/2 | 9.8 | 2.3 | 2 | 2.75 | 19.8 | 2.5 |
6 | D = 2 | 0.97 | π/2 | 9.8 | 2.3 | 2 | 2.75 | 31.3 | 4 |
7 | D = 1 | 4.1 | 0.9 | 9.7 | 1.8 | π/2 | π/2 | 17 | 3 |
7 | D = 2 | 4.1 | 0.9 | 9.7 | 1.8 | π/2 | π/2 | 24 | 4 |
8 | D = 1 | 4.1 | 0.9 | 9.7 | 1.8 | 2.17 | π/2 | 20.5 | 3.5 |
8 | D = 2 | 4.1 | 0.9 | 9.7 | 1.8 | 2.17 | π/2 | 28.5 | 4.6 |
9 | D = 1 | 4.1 | 0.9 | 10 | 2 | 2 | 2.75 | 19.8 | 2.5 |
9 | D = 2 | 4.1 | 0.9 | 10 | 2 | 2 | 2.75 | 31.3 | 4 |
Here μ_{1} and μ_{2} represent the means of the two components, while σ_{1} and σ_{2} represent their standard deviations. Under each combination of model settings, 1000 data sets were generated.
As expected, higher power is associated with larger sample size. Dramatically higher power is achieved when the distance between the two components is increased from 1 to 2 simply because the null hypothesis (18) is easier to reject when the mixture components are well separated. Furthermore, AIC tends to choose a larger model that has a larger likelihood ratio test statistic (19) when compared to the smaller model chosen by BIC or HQ [31], therefore the use of AIC yields higher power than BIC or HQ.
Simulated Data Scenario
where $C=-\log_{2}\frac{\sum_{j,k}2^{G_{j}+S_{k(j)}}}{40}$, i = 1,2, j = 1,2,3,4, k = 1,2, ⋯, 10, l = 1,2, ⋯, 6. y_{ijkl} represents the l th log signal ratio of patient k to the reference sample in group j for experiment i, θ_{jk} represents the mean log signal ratio of the patient k sample to the reference in group j, $\bar{\mu}_{\cdot\cdot}$ represents the average of the μ_{jk}'s over j and k, δ_{ijk} represents the random error of experiment i for patient k in group j, and ϵ_{ijkl} represents the l th random error within experiment i for patient k in group j. Assume that δ_{ijk} is from a normal distribution with mean zero and variance $\sigma_{\delta}^{2}$, and that ϵ_{ijkl} is from a normal distribution with mean zero and variance $\sigma_{\epsilon}^{2}$.
The model parameter settings for the simulation were taken from the aforementioned Zhou et al. antibody microarray data [32], such that the $G_{j}$'s were sampled from the uniform distribution U[-1, 1], $\sigma_{Sj}^{2}$ = (0.4v)^{2}, where v was sampled from U[0.5, 2] for each j, $\sigma_{\delta}^{2}$ = 0.2^{2}, and $\sigma_{\epsilon}^{2}$ = 0.15^{2}. The hybridization abundance data for 300 functionally consistent proteins on each array were simulated from models (2) and (3) (see Methods). Fifty percent of these simulated proteins were randomly chosen to be functionally inconsistent proteins by adding a random between-array deviation with mean 0 and standard deviation drawn from U[0.05, 0.5], as well as a random within-array deviation with mean 0 and standard deviation drawn from U[0.1, 0.4], to a randomly chosen number of separate arrays. Proteins were classified by estimating the variance components in the ANOVA model (4; see Methods), and modelling the between- and within-array variance component statistic with a semi-nonparametric mixture model. The main advantage of the proposed mixture model approach is that it provides the posterior probability of consistency for each protein, which in turn establishes the classification rule and allows the respective error rates to be estimated.
We compare the proposed semi-nonparametric approach to the work of Miller et al. [17], who selected functionally consistent proteins using an arbitrary cutoff value for Pearson's correlation coefficient. It is important to realize that their cutoff value is not statistically justified, nor does it provide error rate control. We calculated Pearson's correlation coefficients (PCC) for each of the 1000 simulated data sets, and report the average FDR and FNR results in Figures 2 and 3, respectively. Not surprisingly, larger true error rates are experienced for Pearson's correlation coefficient when compared to the variance component (VC) statistic that is based on the between- and within-array variation. Essentially, the variation in the random error(s) captures the difference between consistent and inconsistent proteins, allowing the variance estimate based on between- and within-array variation to provide information about protein consistency. Based on this rationale, the misclassification error rates of the proposed approach are expected to be smaller than those of Pearson's correlation coefficient. As can be seen for Pearson's correlation coefficient, when the number of inconsistent proteins is 160, the false discovery rate is 0.310 (Figure 2) and the false non-discovery rate is 0.283 (Figure 3). By comparison, based on the between- and within-array variation statistic, the false discovery rate is 0.083 and the false non-discovery rate is 0.024. The same pattern holds for every other number of inconsistent proteins (Figures 2 - 3).
Biological and technical replication
A case study
We applied our method to data from an antibody microarray experiment from Zhou et al. [32]. Two-color rolling-circle amplification (RCA) was used to assess thirty-five antibody proteins from duplicate sets of twenty-four serum samples using antibody microarrays prepared on nitrocellulose. The twenty-four serum samples consist of six liver cancer patients, six pre-cirrhotic patients, six cirrhotic patients, and six normal subjects. Each antibody has 5 replicates on the array.
Discussion
The challenge of selecting and employing functionally consistent proteins for protein microarray experiments is complicated by the three-dimensional structure of the protein itself. Specifically, the proteins that are spotted on to the array as probes (during the fabrication of the array) need to maintain functional consistency for each sample hybridized to the array, as well as across experiments. Identifying and employing functionally consistent proteins continues to be a major and necessary concern in both the design and analysis of protein microarray experiments. To address this concern, a novel statistical approach based on modelling the between- and within-array variation, using a semi-nonparametric mixture model, is presented for the purpose of discriminating functionally consistent proteins. Of course, once functionally consistent proteins have been identified and the array fabricated, it is then necessary to develop additional statistical methods that can detect proteins of differing abundance.
After classifying proteins as consistent or inconsistent, the abundance data from functionally consistent proteins can be used for differential protein abundance/expression analysis. The semi-nonparametric mixture model that was initially proposed to select functionally consistent proteins (5) can also be adapted for detecting differentially expressed proteins. Specifically, one component of the mixture identifies the non-differentially expressed proteins, while the other component identifies the differentially expressed proteins. The semi-nonparametric mixture model lies between parametric and nonparametric approaches since it places distributional assumptions not on the data themselves, but on the test statistics. The semi-nonparametric mixture model as applied to differential expression analysis was investigated and was found to perform well [33].
The proposed semi-nonparametric mixture model is a novel and broadly applicable approach in the mixture model literature. For applications to either identifying functionally consistent proteins, or testing for differential protein abundance between samples, only two-component mixture models are employed. The extension of the semi-nonparametric mixture model to a multiple-component and multivariate mixture model has potential to address high-dimensional problems for the purpose of classification, and it has potential to work for a variety of data problems since it provides the flexibility necessary for model fitting.
Conclusions
A novel semi-nonparametric mixture model is proposed for the purpose of selecting functionally consistent proteins that can be used for protein microarray experiments. The proposed approach is able to attach a posterior probability of being inconsistent to each protein, from which false discovery and false non-discovery rates can be estimated. We validated the performance of our method through simulations. Additionally, the characteristics of the semi-nonparametric mixture model were studied by a power analysis. Our novel method provides an improvement in the accuracy of proteins that are selected as probes on a protein microarray, as well as an alternative approach to studying a variety of additional mixture data problems.
Methods
ANOVA model
Consider a repeated protein microarray experiment. There are m proteins (probes) spotted on n arrays. These n arrays are used to hybridize material for n test samples from J different patient groups. The same amount of a reference sample is mixed with each test sample, and each mixture is hybridized on one of n arrays. The background corrected abundance ratios of sample to reference are obtained for each probe on each array and properly normalized. Several normalization methods have been proposed for protein microarray data, and a comparison of them is presented by Hamelinck et al. [14].
For each protein, the ANOVA model is

$$Y_{ijkl}=\mu+T_{j}+S_{k(j)}+\delta_{ijk}+\epsilon_{ijkl}, \qquad (4)$$

where Y_{ijkl} represents the protein abundance ratio between sample and reference of replicate l for sample k within group j in experiment i, μ represents the overall mean of the expression ratios, T_{j} represents the fixed effects of group j with constraint ∑_{j} T_{j} = 0, S_{k(j)} represents the random effects of sample k within group j with mean 0 and variance $\sigma_{Sj}^{2}$, δ_{ijk} represents the normally distributed random between-experiment effect of experiment i for sample k in group j with mean 0 and variance $\sigma_{\delta}^{2}$, and ϵ_{ijkl} represents the normally distributed random error with mean 0 and variance $\sigma_{\epsilon}^{2}$.
The total of the between-array variation $\sigma_{\delta}^{2}$ and the within-array variation $\sigma_{\epsilon}^{2}$ represents the variation due to random error. Inconsistent proteins inflate both the between-array and within-array variation. By least-squares estimation of the ANOVA model (4), an estimate of $\sigma_{\delta}^{2}+\sigma_{\epsilon}^{2}$ is obtained for each protein and used for classification via a novel semi-nonparametric mixture model approach.
Semi-nonparametric mixture model
The total of the between-array variation, $\sigma_{\delta}^{2}$, and the within-array variation, $\sigma_{\epsilon}^{2}$, represents the variation due to random error in the ANOVA model (4) and can be estimated for each protein. To select functionally consistent proteins, we assume that the proteins spotted on the arrays include both functionally consistent and functionally inconsistent proteins, in proportions that are not negligibly small. By modelling the collection of consistency statistics ($\hat{\sigma}_{\delta}^{2}+\hat{\sigma}_{\epsilon}^{2}$) from each protein, using a mixture distribution, it is possible to estimate the consistent or inconsistent status for every protein that is represented on the array. Biologically and technically, consistent proteins are very reliable and are able to generate reproducible measurements between experiments. Because the proposed consistency statistic captures the difference between consistent and inconsistent proteins, it should be smaller for consistent proteins simply because they have less variation (i.e., are reliable and reproducible) than the inconsistent proteins. Furthermore, statistically, when fitting a two-component mixture model and estimating two components simultaneously, the components have to be identifiable. Therefore, we assume that the mean of the statistics for consistent proteins is smaller than the mean of the statistics for inconsistent proteins, and that the statistics from the same class will be aggregated. For this application, defining a selection criterion is equivalent to finding the classification rule between two classes.
A mixture model with semi-nonparametric densities is proposed, and the Expectation-maximization (EM) quasi-Newton algorithm [34, 35] is employed to estimate the parameters. Inferences are then drawn from the estimated mixture model.
The two-component mixture model is

$$f(z)=\lambda_{0}f_{0}(z|\theta_{0})+\lambda_{1}f_{1}(z|\theta_{1}), \qquad (5)$$

where θ_{0} and θ_{1} are the parameters for the two densities, f_{0}(z|θ_{0}) is the density of the $z_{i}$'s that are the statistics for functionally consistent proteins, f_{1}(z|θ_{1}) is the density of the $z_{i}$'s that are the statistics for functionally inconsistent proteins, λ_{0} is the proportion of the functionally consistent proteins, λ_{1} is the proportion of the functionally inconsistent proteins, and the sum of λ_{0} and λ_{1} is 1. For the mixture model in (5), an order between the means of the two components is assumed. Specifically, let μ_{0} and μ_{1} represent the means of the two components, respectively, and assume μ_{0} ≤ μ_{1}.
where i = 0, 1, K represents a tuning parameter that is nonnegative, ϕ(·) represents a standard normal density.
Here θ_{ i } = (ϕ_{1}, ϕ_{2}, u_{ i }, v_{ i } ), where i = 0, 1.
Based on the log-likelihood (11), maximization techniques can be employed to find the estimates of model parameters, and then classification methods can be implemented based on the estimates of the mixture model.
EM-algorithm
After substituting in the expectations of missing values, the log likelihood in (11) is maximized (the M-step) by a gradient algorithm that is accelerated by a quasi-Newton method [35]. Given initial values of the parameters, the EM-algorithm iterates between the E-step and the M-step until a convergence criterion is met or until a maximum iteration number is reached.
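The alternation between the E-step and the M-step described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes plain normal component densities in place of the semi-nonparametric densities, and it uses the closed-form normal M-step rather than a quasi-Newton update. The names `normal_pdf` and `em_two_normal` are hypothetical.

```python
import math
import random


def normal_pdf(z, mean, sd):
    """Density of a normal distribution with the given mean and sd at z."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def em_two_normal(z, max_iter=500, tol=1e-8):
    """Fit lam0*N(mu0, sd0^2) + lam1*N(mu1, sd1^2) to the values in z by EM.

    Assumption: normal components replace the semi-nonparametric densities.
    Returns the estimates and, for each z[i], the posterior probability of
    belonging to component 1 (the component with the larger mean).
    """
    n = len(z)
    zs = sorted(z)
    half = n // 2
    grand_mean = sum(zs) / n
    # Starting values: means of the lower and upper halves, common spread.
    mu0 = sum(zs[:half]) / half
    mu1 = sum(zs[half:]) / (n - half)
    sd0 = sd1 = max(1e-6, math.sqrt(sum((x - grand_mean) ** 2 for x in zs) / n))
    lam1 = 0.5
    old_loglik = -math.inf
    post, loglik = [0.5] * n, old_loglik
    for _ in range(max_iter):
        # E-step: posterior membership probabilities under current parameters.
        post, loglik = [], 0.0
        for x in z:
            a = (1.0 - lam1) * normal_pdf(x, mu0, sd0)
            b = lam1 * normal_pdf(x, mu1, sd1)
            post.append(b / (a + b))
            loglik += math.log(a + b)
        if abs(loglik - old_loglik) < tol:  # convergence criterion
            break
        old_loglik = loglik
        # M-step: weighted proportion, means and standard deviations.
        w1 = sum(post)
        w0 = n - w1
        lam1 = w1 / n
        mu0 = sum((1.0 - p) * x for p, x in zip(post, z)) / w0
        mu1 = sum(p * x for p, x in zip(post, z)) / w1
        sd0 = max(1e-6, math.sqrt(sum((1.0 - p) * (x - mu0) ** 2 for p, x in zip(post, z)) / w0))
        sd1 = max(1e-6, math.sqrt(sum(p * (x - mu1) ** 2 for p, x in zip(post, z)) / w1))
    if mu0 > mu1:  # enforce the ordering mu0 <= mu1 by swapping labels
        mu0, mu1, sd0, sd1 = mu1, mu0, sd1, sd0
        lam1 = 1.0 - lam1
        post = [1.0 - p for p in post]
    return {"lam1": lam1, "mu0": mu0, "sd0": sd0, "mu1": mu1, "sd1": sd1,
            "posterior1": post, "loglik": loglik}


if __name__ == "__main__":
    rng = random.Random(1)
    stats = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(6.0, 1.0) for _ in range(200)]
    fit = em_two_normal(stats)
    flagged = sum(p > 0.5 for p in fit["posterior1"])
    print(fit["lam1"], fit["mu0"], fit["mu1"], flagged)
```

In this sketch a statistic is flagged as belonging to the larger-mean (inconsistent) component when its posterior probability exceeds 0.5; that cutoff is used only for illustration and is not the classification rule of the paper.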
Determining the number of mixture components
Before applying the two-component mixture model to classify proteins (as consistent or inconsistent), we need to test that the number of components is compatible with the two component mixture model, and that the components can be identified.
where $\hat{\theta}_{0}$ and $\hat{\theta}_{\alpha}$ represent the estimated parameters under the null and alternative hypotheses, respectively. $\hat{\theta}_{0}$ and $\hat{\theta}_{\alpha}$ are obtained by choosing the best model via model selection when the two-component mixture model is fit to the data. A bootstrap method is used to approximate the null distribution of -2 log λ [39], and to provide a significance threshold for the likelihood ratio test statistic. Specifically, when estimating the null distribution of the likelihood ratio test statistic, we first bootstrap 500 data sets from the estimated distribution under the null hypothesis, and then perform a likelihood ratio test (19) for each simulated data set.
If the test statistic is significant, the two-component mixture model is suitable to fit the data in order to select functionally consistent proteins. Failure to reject the null hypothesis (18) indicates that consistent and inconsistent proteins are not separable, or that there is only one type of protein on the array. For this situation, specific chemical validation techniques have to be employed in order to provide additional consistency information.
Model selection
where log L is the log-likelihood, p is the number of free parameters in the model, and C(N) is a function of the sample size N. AIC sets C(N) = 2, BIC sets C(N) = log N, and HQ sets C(N) = 2 log log N.
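The three criteria can be computed side by side. The sketch below assumes the common penalized form -2 log L + C(N) p, with smaller values preferred; that form and the function name `information_criteria` are assumptions made for illustration.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """Return AIC, BIC and HQ, assuming the form -2*logL + C(N)*p.

    loglik   : maximized log-likelihood (log L)
    n_params : number of free parameters (p)
    n_obs    : sample size (N)
    """
    if n_obs < 3:
        raise ValueError("n_obs must be at least 3 so that log(log(N)) is positive")
    penalties = {
        "AIC": 2.0,                                # C(N) = 2
        "BIC": math.log(n_obs),                    # C(N) = log N
        "HQ": 2.0 * math.log(math.log(n_obs)),     # C(N) = 2 log log N
    }
    return {name: -2.0 * loglik + c * n_params for name, c in penalties.items()}


if __name__ == "__main__":
    print(information_criteria(loglik=-100.0, n_params=5, n_obs=300))
```

For N of 16 or more the penalties are ordered 2 < 2 log log N < log N, so AIC penalizes additional parameters least, which agrees with the observation that AIC tends to choose larger models than HQ or BIC.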
Classification rule and error rate control
where γ ∈ [0, 1] is the penalty for false positive, (1 - γ) is the penalty for false negative, and d is the number of declared inconsistent proteins by the critical value c*.
Classification error rates
Classification outcomes: consistent and inconsistent proteins.
True status | Classified as consistent | Classified as inconsistent | Total |
---|---|---|---|
Consistent | U | V | m_{0} |
Inconsistent | T | S | m - m_{0} |
Total | m - R | R | m |
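With the counts in the table above, and treating a protein classified as inconsistent as a discovery, the realized false discovery proportion is V/R and the realized false non-discovery proportion is T/(m - R). The sketch below assumes these standard definitions, sets a proportion to 0 when its denominator is 0, and uses a hypothetical function name.

```python
def error_proportions(U, V, T, S):
    """Realized false discovery and false non-discovery proportions.

    U: consistent proteins classified as consistent
    V: consistent proteins classified as inconsistent (false discoveries)
    T: inconsistent proteins classified as consistent (false non-discoveries)
    S: inconsistent proteins classified as inconsistent
    """
    R = V + S            # number classified as inconsistent
    m = U + V + T + S    # total number of proteins
    fdp = V / R if R > 0 else 0.0
    fnp = T / (m - R) if (m - R) > 0 else 0.0
    return fdp, fnp


if __name__ == "__main__":
    print(error_proportions(U=140, V=10, T=5, S=145))
```

Averaging these proportions over repeated simulated data sets gives empirical estimates of the false discovery rate and the false non-discovery rate.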
Declarations
Acknowledgements
We wish to thank Brian B. Haab (The Van Andel Research Institute) for stimulating discussions on exploring the number of replicates in design of experiments.
Authors’ Affiliations
References
- Halvorsen O, Oyan A, Bo T, Olsen S, Rostad K, Haukaas S, Bakke A, Marzolf B, Dimitrov K, Stordrange L, Lin B, Jonassen I, Hood L, Akslen L, Kalland K: Gene expression profiles in prostate cancer: association with patient subgroups and tumour differentiation. International Journal of Oncology 2005, 26: 329–336.
- Lee S, Huang K, Palmer R, Truong V, Herzlinger D, Kolquist K, Wong J, Paulding C, Yoon S, Gerald W, Oliner J, Haber D: The Wilms tumor suppressor WT1 encodes a transcriptional activator of amphiregulin. Cell 1999, 98: 663–673. 10.1016/S0092-8674(00)80053-7
- Nakahara H, Nishimura S, Inoue M, Hori G, Amari S: Gene interaction in DNA microarray data is decomposed by information geometric measure. Bioinformatics 2003, 19: 1124–1131. 10.1093/bioinformatics/btg098
- Darvish A, Najarian K: Prediction of regulatory pathways using mRNA expression and protein interaction data: application to identification of galactose regulatory pathway. Biosystems 2006, 83: 125–135. 10.1016/j.biosystems.2005.06.013
- Gygi S, Rochon Y, Franza B, Aebersold R: Correlation between protein and mRNA abundance in yeast. Molecular Cell Biology 1999, 19: 1720–1730.
- Lueking A, Horn M, Eickhoff H, Bussow K, Lehrach H, Walter G: Protein microarrays for gene expression and antibody screening. Anal Biochem 1999, 270: 103–111. 10.1006/abio.1999.4063
- Ge H: UPA, a universal protein array system for quantitative detection of protein-protein, protein-DNA, protein-RNA and protein-ligand interactions. Nucleic Acids Res 2000, 28: e3. 10.1093/nar/28.2.e3
- MacBeath G, Schreiber S: Printing proteins as microarrays for high-throughput function determination. Science 2000, 289: 1760–1763.
- Zhu H, Klemic J, Chang S, Bertone P, Casamayor A, Klemic K, Smith D, Gerstein M, Reed M, Snyder M: Analysis of yeast protein kinases using protein chips. Nature Genetics 2000, 26: 283–289. 10.1038/81576
- Kusnezow W, Banzon V, Schroder C, Schaal R, Hoheisel J, Ruffer S, Luft P, Duschl A, Syagailo Y: Antibody microarray-based profiling complex specimens: systematic evaluation of labeling strategies. Proteomics 2007, 7: 1786–1799. 10.1002/pmic.200600762
- Domnanich P, Sauer U, Pultar J, Preininger C: Protein microarray for the analysis of human melanoma biomarkers. Sensors and Actuators B: Chemical 2009, 139: 2–8. 10.1016/j.snb.2008.06.043
- Rimini R, Schwenk J, Sundberg M, Sjoberg R, Klevebring D, Gry M, Uhlen M, Nilsson P: Validation of serum protein profiles by a dual antibody array approach. Journal of Proteomics 2009, 73: 252–266. 10.1016/j.jprot.2009.09.009
- Yang Y, Speed T: Design issues for cDNA microarray experiments. Nature Reviews - Genetics 2002, 3: 579–588.
- Hamelinck D, Zhou H, Li L, Verweij C, Dillon D, Feng Z, Costa J, Haab B: Optimized normalization for antibody microarrays and applications to serum-protein profiling. Molecular and Cellular Proteomics 2005, 4: 773–784. 10.1074/mcp.M400180-MCP200
- Daly D, Anderson K, Seurynck-Servoss S, Gonzalez R, White A, Zangar R: An Internal Calibration Method for Protein-Array Studies. Statistical Applications in Genetics and Molecular Biology 2010, 9: Article 14. 10.2202/1544-6115.1506
- Sreekumar A, Nyati M, Varambally S, Barrette T, Ghosh D, Lawrence S, Chinnaiyan A: Profiling of cancer cells using protein microarrays: Discovery of novel radiation-regulated proteins. Cancer Research 2001, 61: 7585–7593.
- Miller J, Zhou H, Kwekel J, Cavallo R, Burke J, Butler E, Teh B, Haab B: Antibody microarray profiling of human prostate cancer sera: Antibody screening and identification of potential biomarkers. Proteomics 2003, 3: 56–63. 10.1002/pmic.200390009
- Belov L, Mulligan S, Barber N, Woolfson A, Scott M, Stoner K, Chrisp J, Sewell W, Bradstock K, Bendall L, Pascovici D, Thomas M, Erber W, Huang P, Sartor M, Young G, Wiley J, Juneja S, Wierda W, Green A, Keating M, Christopherson R: Analysis of human leukaemias and lymphomas using extensive immunophenotypes from an antibody microarray. British Journal of Haematology 2006, 135: 184–197. 10.1111/j.1365-2141.2006.06266.x
- Ingvarsson J, Wingren C, Carlsson A, Ellmark P, Wahren B, Engstrom G, Harmenberg U, Krogh M, Peterson C, Borrebaeck C: Detection of pancreatic cancer using antibody microarray-based serum protein profiling. Proteomics 2008, 8: 2211–2219. 10.1002/pmic.200701167
- Han M, Oh Y, Kang J, Kim Y, Seo S, Kim J, Park K, Kim H: Protein profiling in human sera for identification of potential lung cancer biomarkers using antibody microarray. Proteomics 2009, 9: 5544–5552. 10.1002/pmic.200800777
- Song Q, Liu G, Hu S, Zhang Y, Tao Y, Han Y, Zeng H, Huang W, Li F, Chen P, Zhu J, Hu C, Zhang S, Li Y, Zhu H, Wu L: Novel autoimmune hepatitis-specific autoantigens identified using protein microarray technology. Journal of proteome research 2010, 9: 30–39. 10.1021/pr900131e
- Lin L: A concordance correlation coefficient to evaluate reproducibility. Biometrics 1989, 45: 255–268. 10.2307/2532051
- Lin L: Total deviation index for measuring individual agreement with applications in laboratory performance and bioequivalence. Statistics in Medicine 2000, 19: 255–270. 10.1002/(SICI)1097-0258(20000130)19:2<255::AID-SIM293>3.0.CO;2-8
- Lin L, Hedayat A, Sinha B, Yang M: Statistical methods in assessing agreement: models, issues, tools. Journal of the American Statistical Association 2002, 97: 257–270. 10.1198/016214502753479392
- Lin L, Chinchilli V: Rejoinder to the letter to the editor from Atkinson and Nevill. Biometrics 1997, 53: 777–778.
- Efron B, Tibshirani R, Storey J, Tusher V: Empirical Bayes analysis of a microarray experiment. J Amer Statist Assoc 2001, 96: 1151–1160. 10.1198/016214501753382129
- Pan W, Lin J, Le C: A mixture model approach to detecting differentially expressed genes with microarray data. Funct Integr Genomics 2003, 3: 117–124. 10.1007/s10142-003-0085-7
- Akaike H: Information theory and an extension of the maximum likelihood principle. 2nd International Symposium on Information Theory 1973, 473–476.
- Schwarz G: Estimating the dimension of a model. Annals of Statistics 1978, 6: 461–464. 10.1214/aos/1176344136
- Hannan E: Rational transfer function approximation. Statistical Science 1987, 2: 1029–1054.
- Zhang D, Davidian M: Linear mixed models with flexible distributions of random effects for longitudinal data. Biometrics 2001, 57: 795–802. 10.1111/j.0006-341X.2001.00795.x
- Zhou H, Bouwman K, Schotanus M, Verweij C, Marrero J, Dillon D, Costa J, Lizardi P, Haab B: Two-color, rolling-circle amplification on antibody microarrays for sensitive, multiplexed serum-protein measurements. Genome Biology 2004, 5: R28. 10.1186/gb-2004-5-4-r28
- Yu L: Statistical issues in protein microarray analysis. PhD thesis. Purdue University, West Lafayette, IN, USA; 2006.
- Dempster A, Laird N, Rubin D: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B 1977, 39: 1–38.
- Lange K: A quasi-newton acceleration of the EM algorithm. Statistica Sinica 1995, 5: 1–18.
- Gallant A, Nychka D: Seminonparametric maximum likelihood estimation. Econometrica 1987, 55: 363–390. 10.2307/1913241
- Davidon W: Variable metric methods for minimization. AEC Research and Development Report ANL-5990, Argonne National Laboratory 1959.
- Ledwina T: Data-driven version of Neyman's smooth test of fit. Journal of the American Statistical Association 1994, 89: 1000–1005. 10.2307/2290926
- McLachlan G: On bootstrapping the likelihood ratio test statistics for the number of components in a normal mixture. Journal of the Royal Statistical Society Series C 1987, 36: 318–324.
- Newton M, Noueiry A, Sarkar D, Ahlquist P: Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics 2004, 5: 155–176. 10.1093/biostatistics/5.2.155
- Storey J: A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B 2002, 64: 479–498. 10.1111/1467-9868.00346
- Genovese C, Wasserman L: Operating characteristics and extensions of the false discovery rate procedure. Journal of Royal Statistical Society, Ser B 2002, 64: 499–517. 10.1111/1467-9868.00347
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.