Development of sparse Bayesian multinomial generalized linear model for multi-class prediction

Madahian, Behrouz; Deng, Lih Y; Homayouni, Ramin

doi:10.1186/1471-2105-15-S10-P14

Volume 15 Supplement 10

UT-KBRIN Bioinformatics Summit 2014: Abstracts

Poster presentation
Open access
Published: 29 September 2014

Behrouz Madahian¹,
Lih Y Deng¹ &
Ramin Homayouni²,³

BMC Bioinformatics volume 15, Article number: P14 (2014)

Background

Gene expression profiling has been used for many years to classify samples and to gain insights into the molecular mechanisms of phenotypes and diseases. A major challenge in expression analysis is caused by the large number of variables assessed compared to relatively small sample sizes. In addition, identification of markers that accurately predict multiple classes of samples, such as those involved in the progression of cancer or other diseases, remains difficult.

Materials and methods

In this study, we developed a Bayesian multinomial probit model that uses a double exponential (Laplace) prior to induce shrinkage and reduce the number of covariates in the model [1, 2]. The model was formulated as a fully Bayesian hierarchy to facilitate Gibbs sampling, and it takes into account the progressive nature of the response variable. Gibbs sampling was performed in R for 100,000 iterations, and the first 20,000 were discarded as burn-in. The method was applied to a published prostate cancer progression dataset (GSE6099) downloaded from the Gene Expression Omnibus at NCBI [3]. The dataset contained 99 prostate cancer cell samples in four progressive stages. It was randomly divided into training (N=50) and test (N=49) groups such that each group contained an equal number of each cell type. Before applying our model, we performed ordinal logistic regression for each gene and ranked the genes by the p-value of association. A cutoff of 0.05 after Benjamini-Hochberg FDR correction yielded a final set of 398 genes.
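
The gene pre-filtering step can be illustrated with a minimal Python sketch of the Benjamini-Hochberg adjustment. The analysis itself was run in R; the function names, the toy p-values, and the strict "less than 0.05" comparison are assumptions made for illustration, and the per-gene ordinal logistic regression that would produce the p-values is not reproduced here.

```python
from typing import Dict, List, Sequence


def bh_adjust(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down; the running minimum keeps the
    # adjusted values monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(p_by_gene: Dict[str, float], cutoff: float = 0.05) -> List[str]:
    """Keep genes whose adjusted p-value is below the cutoff, ranked by raw p-value."""
    genes = list(p_by_gene)
    adjusted = bh_adjust([p_by_gene[g] for g in genes])
    kept = [g for g, q in zip(genes, adjusted) if q < cutoff]
    return sorted(kept, key=lambda g: p_by_gene[g])


# Toy p-values for hypothetical genes (illustrative only, not taken from GSE6099).
toy = {"geneA": 0.001, "geneB": 0.008, "geneC": 0.039, "geneD": 0.041, "geneE": 0.60}
selected = select_genes(toy)
print(selected)
```

In this toy input, geneC and geneD have raw p-values below 0.05 but are dropped after adjustment, which is the behaviour the FDR correction is meant to provide.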

Results

Figure 1 shows the posterior means of the parameters associated with each gene. Using the top ten genes obtained from our model, we achieved 86% classification accuracy in the training group and 82% in the test group. To test the robustness of the model, we swapped the training and test groups and re-evaluated the classification accuracy, obtaining 88% on the new training group and 86% on the new test group. The classification accuracy by tumor type is shown in Table 1. Taken together, these results suggest that the Bayesian multinomial probit model applied to cancer progression data allows for reasonable subclass prediction.
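
For readers who want to reproduce this kind of summary, overall accuracy and accuracy by class can be computed as in the following Python sketch. The function names and the toy labels (stages coded 1 to 4) are assumptions for illustration; they are not the study's predictions.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Accuracy computed separately within each true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be equally long")
    totals = Counter(y_true)
    # Count correct predictions, keyed by the true class of the sample.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


# Toy labels for four hypothetical stages (illustrative only).
y_true = [1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [1, 1, 2, 3, 3, 3, 4, 3]
overall = accuracy(y_true, y_pred)
by_stage = per_class_accuracy(y_true, y_pred)
print(overall, by_stage)
```

Applying the same two functions after exchanging the roles of the two groups gives the swapped-group figures.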

Table 1. Classification accuracy of prostate cancer subtypes in the training and test groups.

Conclusion

We plan to resample the selection of training and test groups to obtain more robust results, and to compare the model's performance with that of other popular classifiers such as support vector machines and random forests.
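
One way such a resampling scheme could look is sketched below in Python: the stratified split is repeated many times and the test accuracy is summarised by its mean and standard deviation. Everything here is an assumption for illustration. In particular, `evaluate` is a hypothetical callback that would fit a classifier on the training indices and return its test accuracy, and the majority-class baseline stands in for a real model only so that the sketch runs on its own.

```python
import random
import statistics
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple


def stratified_split(labels: Sequence[int], rng: random.Random) -> Tuple[List[int], List[int]]:
    """Split sample indices into two groups with near-equal counts of each class."""
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    train, test = [], []
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)
        half = (len(idxs) + 1) // 2  # an odd class size gives the extra sample to training
        train.extend(idxs[:half])
        test.extend(idxs[half:])
    return train, test


def resampled_accuracy(
    labels: Sequence[int],
    evaluate: Callable[[List[int], List[int]], float],
    n_repeats: int = 100,
    seed: int = 0,
) -> Tuple[float, float]:
    """Mean and standard deviation of test accuracy over repeated stratified splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        train, test = stratified_split(labels, rng)
        scores.append(evaluate(train, test))
    return statistics.mean(scores), statistics.stdev(scores)


# Toy labels (illustrative only): three classes of sizes 10, 6 and 4.
labels = [0] * 10 + [1] * 6 + [2] * 4


def majority_baseline(train: List[int], test: List[int]) -> float:
    """Placeholder for a real classifier: always predict the most common training class."""
    guess = Counter(labels[i] for i in train).most_common(1)[0][0]
    return sum(labels[i] == guess for i in test) / len(test)


mean_acc, sd_acc = resampled_accuracy(labels, majority_baseline, n_repeats=20)
print(mean_acc, sd_acc)
```

Replacing `majority_baseline` with a function that fits each classifier of interest would allow the competing methods to be compared on identical splits.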

References

1. Park T, Casella G: The Bayesian lasso. J Am Stat Assoc. 2008, 103: 681-686. 10.1198/016214508000000337.
2. Albert J, Chib S: Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc. 1993, 88: 669-679. 10.1080/01621459.1993.10476321.
3. Tomlins SA, Mehra R, Rhodes DR, Cao X, Wang L, Dhanasekaran SM, Kalyana-Sundaram S, Wei JT, Rubin MA, Pienta KJ, Shah RB, Chinnaiyan AM: Integrative molecular concept modeling of prostate cancer progression. Nat Genet. 2007, 39 (1): 41-51. 10.1038/ng1935.

Acknowledgments

This work was supported by the University of Memphis Center for Translational Informatics and the Assisi Foundation of Memphis.

Author information

Authors and Affiliations

Department of Mathematical Sciences, University of Memphis, Memphis, TN, 38152, USA
Behrouz Madahian & Lih Y Deng
Bioinformatics Program, University of Memphis, Memphis, TN, 38152, USA
Ramin Homayouni
Department of Biology, University of Memphis, Memphis, TN, 38152, USA
Ramin Homayouni

Corresponding author

Correspondence to Ramin Homayouni.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Madahian, B., Deng, L.Y. & Homayouni, R. Development of sparse Bayesian multinomial generalized linear model for multi-class prediction. BMC Bioinformatics 15 (Suppl 10), P14 (2014). https://doi.org/10.1186/1471-2105-15-S10-P14

