- Poster presentation
- Open Access
Application of Pearson correlation coefficient (PCC) and Kolmogorov-Smirnov distance (KSD) metrics to identify disease-specific biomarker genes
BMC Bioinformatics volume 11, Article number: P23 (2010)
DNA microarrays have been widely applied in cancer research to improve diagnosis and the prediction of disease states. Traditionally, most microarray studies aim to identify differentially expressed genes (DEGs) by comparing average gene expression levels between two groups (e.g., treated vs. control or disease vs. non-disease) using statistical methods such as the t-test and Significance Analysis of Microarrays (SAM) [1, 2].
Materials and methods
In this study, we defined the gene expression profile (GEP) of a gene as the distribution of the log2 values of its normalized expression signal intensities across the samples in the similarly studied microarrays. We hypothesized that biomarker genes that distinguish disease samples from normal samples might show distinct GEPs in the two comparison groups. We applied the Pearson correlation coefficient (PCC) and Kolmogorov-Smirnov distance (KSD) metrics to compare GEPs between normal and disease states, and then applied this approach to disease-related (e.g., cancer) studies to identify disease genes as biomarker candidates. The GEPs of these biomarker genes in normal and disease samples might be used to diagnose or monitor a patient's disease state through routine gene expression analysis.
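The comparison of two GEPs described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how the PCC between two distributions was computed, so the sketch assumes each GEP is binned into a histogram over shared bin edges and the PCC is taken between the two binned densities. The function names `gep_pcc` and `ks_distance`, the bin count, and the simulated expression values are all hypothetical.

```python
import numpy as np


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance: the largest absolute
    difference between the empirical CDFs of the two samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    pooled = np.concatenate([a, b])
    # Empirical CDF of each sample, evaluated at every observed value.
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def gep_pcc(a, b, bins=20):
    """Pearson correlation between two GEPs.

    Assumption (not stated in the abstract): each GEP is summarized as a
    density histogram over bin edges shared by both groups.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    edges = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), bins + 1)
    hist_a, _ = np.histogram(a, bins=edges, density=True)
    hist_b, _ = np.histogram(b, bins=edges, density=True)
    return float(np.corrcoef(hist_a, hist_b)[0, 1])


# Synthetic log2 intensities for one hypothetical gene (not study data).
rng = np.random.default_rng(0)
normal = rng.normal(loc=8.0, scale=1.0, size=81)
disease = rng.normal(loc=10.0, scale=1.0, size=90)

print("PCC:", gep_pcc(normal, disease))
print("KSD:", ks_distance(normal, disease))
```

Under this reading, a gene whose two GEPs overlap heavily gives a PCC near 1 and a KSD near 0, while a gene whose GEP shifts between normal and disease samples gives a lower PCC and a larger KSD.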
Results and conclusion
We applied the PCC and KSD metrics to three prostate cancer-related microarray datasets. They were generated from the same study and are available in the GEO database (a total of 81 normal samples and 90 prostate cancer samples) [3]. Using the cutoff values KSD > 0.4 and PCC < 0.7, we found 230 biomarker candidate genes. Our Gene Ontology (GO) analysis found that the top-ranked biomarker candidate genes for prostate cancer were highly enriched in molecular function categories such as “cytoskeletal protein binding”. We used the two top-ranked genes (ACTA1, encoding an actin subunit, and HPN, encoding hepsin) to demonstrate that prostate cancer might be diagnosed and monitored using marker genes. Furthermore, we selected the top 20 significantly up-regulated and the top 20 down-regulated genes based on PCC and KSD sorting. We found that gene pairs comprising one up-regulated and one down-regulated gene always had the best prediction performance (Table 1). Our study provides a promising tool for identifying potential biomarker genes for disease diagnosis and prognosis.
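The cutoff step reported above (KSD > 0.4 and PCC < 0.7) can be illustrated with a short filter. This is a sketch under stated assumptions: the gene names and metric values are placeholders rather than results from the study, the function name `select_candidates` is hypothetical, and the ranking rule (largest KSD first, ties broken by smallest PCC) is a guess, since the abstract does not say how the two metrics were combined for sorting.

```python
from typing import Dict, List, Tuple

KSD_CUTOFF = 0.4  # cutoff reported in the abstract
PCC_CUTOFF = 0.7  # cutoff reported in the abstract


def select_candidates(
    metrics: Dict[str, Tuple[float, float]],
    ksd_cutoff: float = KSD_CUTOFF,
    pcc_cutoff: float = PCC_CUTOFF,
) -> List[str]:
    """Return genes with KSD > ksd_cutoff and PCC < pcc_cutoff.

    `metrics` maps a gene name to a (pcc, ksd) pair. The ordering of the
    result (KSD descending, then PCC ascending) is an assumed ranking.
    """
    hits = [
        (gene, pcc, ksd)
        for gene, (pcc, ksd) in metrics.items()
        if ksd > ksd_cutoff and pcc < pcc_cutoff
    ]
    hits.sort(key=lambda hit: (-hit[2], hit[1]))
    return [gene for gene, _, _ in hits]


# Placeholder (pcc, ksd) values for hypothetical genes, not study data.
example_metrics = {
    "GENE_A": (0.20, 0.90),
    "GENE_B": (0.95, 0.10),
    "GENE_C": (0.60, 0.50),
    "GENE_D": (0.80, 0.60),
}

print(select_candidates(example_metrics))
```

Both conditions must hold for a gene to pass, so a gene with a large KSD but a PCC of 0.7 or higher is still excluded.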
Jafari P, Azuaje F: An assessment of recently published gene expression data analyses: reporting experimental design and statistical factors. BMC Med Inform Decis Mak 2006, 6: 27. 10.1186/1472-6947-6-27
Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 2001, 98: 5116–5121. 10.1073/pnas.091062498
Chandran UR, Ma C, Dhir R, Bisceglia M, Lyons-Weiler M, Liang W, Michalopoulos G, Becich M, Monzon FA: Gene expression profiles of prostate cancer reveal involvement of multiple molecular pathways in the metastatic process. BMC Cancer 2007, 7: 64. 10.1186/1471-2407-7-64
About this article
Cite this article
Huang, H., Zheng, S. & Zhao, Z. Application of Pearson correlation coefficient (PCC) and Kolmogorov-Smirnov distance (KSD) metrics to identify disease-specific biomarker genes. BMC Bioinformatics 11, P23 (2010). https://doi.org/10.1186/1471-2105-11-S4-P23
- Prostate Cancer
- Pearson Correlation Coefficient
- Disease Sample
- Prostate Cancer Sample
- Average Gene Expression Level