Semi-supervised class discovery using quantitative phenotypes – CVD as a case study

Steinfeld, Israel; Navon, Roy; Ardigò, Diego; Zavaroni, Ivana; Yakhini, Zohar

doi:10.1186/1471-2105-8-S8-S6

Volume 8 Supplement 8

Highlights from the Third International Society for Computational Biology (ISCB) Student Council Symposium at the Fifteenth Annual International Conference on Intelligent Systems for Molecular Biology (ISMB)

Oral presentation
Open access
Published: 20 November 2007

Israel Steinfeld¹, Roy Navon¹, Diego Ardigò², Ivana Zavaroni² & Zohar Yakhini¹

BMC Bioinformatics volume 8, Article number: S6 (2007)

Background

Genomic studies typically compare a diseased population with a healthy one. In our work, various parameters, including expression profiles of peripheral blood mononuclear cells (PBMCs), were collected solely from healthy subjects. To analyze the data we developed a semi-supervised class discovery method that constrains the search space to patterns respecting an order induced by the rich quantitative annotations. We show that the method is robust enough to detect known clinical parameters in accordance with expected values. We also use it to elucidate putative risk factors for cardiovascular disease (CVD).

Methods

One of the basic tasks in gene expression data analysis is finding genes that are differentially expressed between two classes (such as tumor vs. normal). Among the various methods for measuring differential expression (e.g. Student's t-test), we focus on the threshold number of misclassifications (TNoM) score [1], a non-parametric statistical score that affords an exact p-value.
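As an illustration of the score described above, the following is a minimal sketch and not the authors' implementation. It assumes one expression value per sample and a binary class label, and takes the TNoM score of a gene to be the smallest number of samples misclassified by any single threshold, in either direction. The function names are hypothetical. The p-value here is obtained by brute-force enumeration of all relabelings with the same class sizes, which is exact but only practical for small sample sets; reference [1] should be consulted for the efficient computation.

```python
from itertools import combinations


def tnom_score(values, labels):
    """Fewest misclassified samples over all thresholds and both directions."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(1 for _, is_pos in pairs if is_pos)
    total_neg = n - total_pos
    # A threshold below every sample assigns all samples to one class.
    best = min(total_pos, total_neg)
    pos_left = neg_left = 0
    i = 0
    while i < n:
        # Move past tied values together so a threshold never splits a tie.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                pos_left += 1
            else:
                neg_left += 1
            j += 1
        # Rule "high means positive": positives at or below the cut and
        # negatives above it are errors.
        err_high_pos = pos_left + (total_neg - neg_left)
        # Rule "low means positive": the mirror image.
        err_low_pos = neg_left + (total_pos - pos_left)
        best = min(best, err_high_pos, err_low_pos)
        i = j
    return best


def exact_p_value_bruteforce(values, labels):
    """P(score <= observed) over all relabelings with the same class sizes.

    Assumption: brute force, feasible only for small n.
    """
    n = len(values)
    n_pos = sum(1 for is_pos in labels if is_pos)
    observed = tnom_score(values, labels)
    hits = total = 0
    for pos_idx in combinations(range(n), n_pos):
        chosen = set(pos_idx)
        relabeled = [k in chosen for k in range(n)]
        total += 1
        if tnom_score(values, relabeled) <= observed:
            hits += 1
    return hits / total


# Toy gene: perfectly separated classes give a score of 0.
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
classes = [0, 0, 0, 1, 1, 1]
print(tnom_score(expression, classes))
print(exact_p_value_bruteforce(expression, classes))
```

With six samples split three and three, only 2 of the 20 possible labelings separate perfectly, so even a perfect score cannot yield a p-value below 0.1. This shows why the exact p-value, rather than the raw score, is the quantity to compare across partitions of different sizes.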

When many partitions of the sample set are possible, one would like to assess the statistical significance of each partition considered and to compare partitions with one another. In overabundance analysis [2], the exact p-value of the TNoM score is used to estimate the number of differentially expressed genes expected by chance. Comparing this expectation with the number actually observed gives the overabundance of differentially expressed genes. This quantity can be used as a figure of merit: higher overabundance indicates a more profound change in the cell state.
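The comparison of expected and observed counts can be sketched as follows. This is a simplified reading, not the procedure of [2]: it assumes one p-value per gene and that, if no gene were truly differentially expressed, the number of genes with p-value at or below a cutoff t would be at most m·t for m genes. The function name and the choice of reporting the difference (rather than, say, a ratio) are assumptions made for illustration.

```python
def overabundance(p_values, cutoff):
    """Return (observed, expected, excess) counts of genes at a p-value cutoff.

    Assumption: under the null, at most a fraction `cutoff` of genes
    reach p <= cutoff, so the expected count is len(p_values) * cutoff.
    """
    observed = sum(1 for p in p_values if p <= cutoff)
    expected = len(p_values) * cutoff
    return observed, expected, observed - expected


# Toy example with five genes; the values are made up for illustration.
toy_p_values = [0.001, 0.002, 0.2, 0.5, 0.9]
observed, expected, excess = overabundance(toy_p_values, cutoff=0.01)
print(observed, expected, excess)
```

Because the excess is computed per partition, it gives a single number by which candidate partitions can be ranked.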

Typical class discovery in gene expression data searches over all possible partitions of the sample set and relies on heuristics to do so [2, 3]. Given any quantitative phenotype, we can constrain the search space to patterns that respect the order it induces on the set of samples. This reduces the search space from O(3ⁿ) to O(n²), making the search feasible (Figure 1).
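One way to read the reduction to O(n²) is sketched below, under stated assumptions rather than as the authors' algorithm. Samples are sorted by the phenotype, and a candidate partition is defined by two cut points: samples below the first cut form the "low" class, samples from the second cut upward form the "high" class, and anything in between is left unassigned. The names `ordered_partitions`, `best_partition`, `min_size`, and `score_fn` are hypothetical. In the setting described here `score_fn` would be an overabundance-based figure of merit; the toy example uses a difference of class means on a single made-up expression vector as a stand-in.

```python
def ordered_partitions(phenotype, min_size=1):
    """Yield (low, high) sample-index lists that respect the phenotype order.

    Simplification: tied phenotype values may be split by a cut point.
    """
    order = sorted(range(len(phenotype)), key=lambda s: phenotype[s])
    n = len(order)
    for i in range(min_size, n - min_size + 1):      # end of the low class
        for j in range(i, n - min_size + 1):         # start of the high class
            yield order[:i], order[j:]


def best_partition(phenotype, score_fn, min_size=1):
    """Return the order-respecting partition with the highest score."""
    return max(ordered_partitions(phenotype, min_size),
               key=lambda part: score_fn(*part))


# Toy data (made up): four samples with a phenotype and one expression value.
phenotype = [0.9, 0.4, 0.7, 0.5]
expression = [5.0, 1.1, 5.2, 1.0]


def mean_gap(low, high):
    """Stand-in score: difference between the class means of `expression`."""
    mean_low = sum(expression[s] for s in low) / len(low)
    mean_high = sum(expression[s] for s in high) / len(high)
    return mean_high - mean_low


candidates = list(ordered_partitions(phenotype))
print(len(candidates))                      # n(n-1)/2 candidates for n = 4
print(best_partition(phenotype, mean_gap))
```

For n samples and `min_size=1` the generator yields n(n-1)/2 candidates, so every candidate can be scored exhaustively instead of relying on a heuristic search.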

Results

We applied our method to PBMC gene expression profiling data collected from 49 healthy subjects. Clinical and laboratory measurements and CVD prognostic indicators were also collected, adding more than 160 phenotypic parameters for each subject. One of the interesting phenotypic parameters is carotid intima-media thickness (IMT) [4], a CVD prognostic indicator. Using semi-supervised class discovery with the IMT values, we obtained IMT threshold levels that agree with the known prognostic values (Figure 1). The genes differentially expressed in this partition were enriched for GO terms related to vesicle-mediated transport (p < 10⁻⁸) and glycolysis (p < 10⁻⁶), giving mechanistic insight into the difference between the two cell states.

References

1. Ben-Dor A, Bruhn L, Friedman N, Nachman I, Schummer M, Yakhini Z: Tissue classification with gene expression profiles. J Comput Biol 2000, 7(3–4):559–83. 10.1089/106652700750050943
2. Ben-Dor A, Friedman N, Yakhini Z: Overabundance Analysis and Class Discovery in Gene Expression Data. Agilent Technical Report 2002. AGL-2002–4
3. von Heydebreck A, Huber W, Poustka A, Vingron M: Identifying splits with clear separation: A new class discovery method for gene expression data. Bioinformatics 2001, 17: S107-S114. 10.1093/bioinformatics/17.1.107
4. Lorenz MW, Markus HS, Bots ML, Rosvall M, Sitzer M: Prediction of clinical cardiovascular events with carotid intima-media thickness: a systematic review and meta-analysis. Circulation 2007, 115(4):459–67. 10.1161/CIRCULATIONAHA.106.628875

Acknowledgements

This study is supported by an EU FP6 grant in the framework of the Multi Knowledge Project. We thank Eran Eden, Doron Lipson and Benny Chor for useful discussions.

Author information

Authors and Affiliations

Agilent Laboratories, Tel Aviv, Israel
Israel Steinfeld, Roy Navon & Zohar Yakhini
Departments of Internal Medicine and Biomedical Sciences, University of Parma, Italy
Diego Ardigò & Ivana Zavaroni

Corresponding author

Correspondence to Roy Navon.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Steinfeld, I., Navon, R., Ardigò, D. et al. Semi-supervised class discovery using quantitative phenotypes – CVD as a case study. BMC Bioinformatics 8 (Suppl 8), S6 (2007). https://doi.org/10.1186/1471-2105-8-S8-S6
