Reordering based integrative expression profiling for microarray classification
- Xiaogang Wu†1, 2, 3,
- Hui Huang†1, 2,
- Madhankumar Sonachalam1, 2,
- Sina Reinhard1,
- Jeffrey Shen1,
- Ragini Pandey2 and
- Jake Y Chen1, 2, 3
© Wu et al.; licensee BioMed Central Ltd. 2012
- Published: 13 March 2012
Current network-based microarray analysis uses information about the interactions among the genes or gene products of interest, but still considers the expression of each gene individually. We propose an organized knowledge-supervised approach, Integrative eXpression Profiling (IXP), to improve microarray classification accuracy and to help discover groups of genes whose signals are too weak to detect individually by traditional methods. To implement IXP, we use the ant colony optimization reordering (ACOR) algorithm to group functionally related genes in an ordered way.
Using Alzheimer's disease (AD) as an example, we demonstrate how to apply the ACOR-based IXP approach to microarray classification. With the microarray dataset GSE1297 (31 samples) as the training set, blinded classification on another microarray dataset, GSE5281 (151 samples), shows that our approach improves accuracy from 74.83% to 82.78%. A recently published 1372-probe signature for AD achieves only 61.59% accuracy under the same conditions. The ACOR-based IXP approach also outperforms IXP based on classic network ranking, graph clustering, and random ordering in an overall classification performance comparison.
The ACOR-based IXP approach can serve as a knowledge-supervised feature transformation approach that increases classification accuracy dramatically, by transforming each gene expression profile into an integrated expression profile that is used as the input feature set for standard classifiers. The IXP approach integrates both gene expression information and organized knowledge, namely disease gene/protein network topology information, which is represented as both network node weights (local topological properties) and network node orders (global topological characteristics).
- Microarray Dataset
- Seed Gene
- Increase Classification Accuracy
- Network Topology Information
- Classification Performance Comparison
Network-based gene expression analysis has been proposed for candidate biomarker discovery by integrating disease susceptibility genes, gene expressions, and gene/protein interaction networks [1, 2]. Current network-based gene expression analysis methods do utilize the information of the interactions among concerned genes or gene products, but they still consider each single gene expression individually, without taking into account the expression values of neighbor genes with similar or related functions in a given network.
We propose a concept, Integrative eXpression Profiling (IXP), which can not only improve microarray classification accuracy by serving as a feature transformation approach, but also help discover groups of genes whose signals are too weak to detect individually through traditional methods. Functionally related genes that are individually expressed with low differentials, and that have often been treated as noise and ignored in traditional studies, can be readily identified by virtue of their coordinated expression within IXP profiles. To implement IXP, we first need to group functionally related genes together in an ordered way. Traditional network analyses often fail to find patterns in the ranked or clustered adjacency matrix of a network when the network is complex and highly inseparable, so that no "clear cluster" or "absolute rank" exists. Here we use the ant colony optimization reordering (ACOR) algorithm [3, 4] instead of conventional network-based gene ranking or graph clustering. In the ACOR algorithm, the task of reordering nodes is represented as the problem of finding optimal density distributions of "ant colonies" on all nodes of the network, in which simulated ants roam all possible network paths iteratively. The nodes are ranked according to this density distribution, and the adjacency matrix of the network with ranked nodes is shown as a map in order to reveal the system-level features of the network. The ACOR algorithm has been tested in both yeast protein networks and human disease protein networks.
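The idea of ranking nodes by the density of simulated walkers can be illustrated with a much simpler stand-in. The sketch below is not the ACOR algorithm of [3, 4]: it has no pheromone update and performs no optimization, and the function names, parameters, and toy graph are all hypothetical. It only shows, under those assumptions, how a visit-density distribution over nodes can be turned into a node order and a permuted adjacency matrix.

```python
import random

def visit_density(adj, n_walkers=50, n_steps=200, seed=0):
    """Fraction of all walker steps spent on each node.

    adj: dict mapping node -> list of neighbor nodes (undirected graph).
    This is a plain random walk, used here only as a stand-in for ACOR.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    counts = {v: 0 for v in nodes}
    for _ in range(n_walkers):
        current = rng.choice(nodes)
        for _ in range(n_steps):
            counts[current] += 1
            if adj[current]:  # a walker on an isolated node stays in place
                current = rng.choice(adj[current])
    total = n_walkers * n_steps
    return {v: c / total for v, c in counts.items()}

def reorder(adj, density):
    """Return nodes sorted by decreasing density and the permuted 0/1 matrix."""
    order = sorted(adj, key=lambda v: (-density[v], v))
    matrix = [[1 if u in adj[v] else 0 for u in order] for v in order]
    return order, matrix

# Hypothetical toy graph: a hub "A" with three leaves, plus a separate pair.
toy = {
    "A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"],
    "E": ["F"], "F": ["E"],
}
density = visit_density(toy)
order, matrix = reorder(toy, density)
print(order)
for row in matrix:
    print(row)
```

Plotting the permuted matrix as an image is the step that would correspond to showing the reordered adjacency matrix as a map.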
In this work, we use Alzheimer's disease (AD) as a case study to illustrate how to apply the ACOR-based IXP approach to blinded classification on a microarray dataset, GSE5281, with 151 samples (testing set, 67 controls and 84 AD patients), using another, much smaller microarray dataset, GSE1297, with 31 samples (9 controls and 22 AD patients) as the training set. The result of the blinded classification on GSE5281 shows that our approach improves accuracy from the original 74.83% to 82.78% using an SVM classifier. A recently published 1372-probe signature for AD achieves only 61.59% accuracy under the same conditions. The ACOR-based IXP approach also performs better than IXP based on ranking, clustering, and random ordering in an overall performance comparison.
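As a quick arithmetic check, the three reported accuracies are consistent with whole numbers of correctly classified samples out of the 151 test samples. The counts used below (113, 125, and 93) are inferred from the percentages and are an assumption; the text reports only the percentages.

```python
# Test set size reported for GSE5281 (67 controls + 84 AD patients).
N_TEST = 67 + 84

def accuracy_pct(n_correct, n_total=N_TEST):
    """Classification accuracy as a percentage, rounded to two decimals."""
    return round(100.0 * n_correct / n_total, 2)

# Correct-sample counts inferred from the reported percentages (assumption).
inferred_correct = {
    "original expression profiles": 113,
    "ACOR-based IXP": 125,
    "1372-probe signature": 93,
}

for name, k in inferred_correct.items():
    print(f"{name}: {k}/{N_TEST} = {accuracy_pct(k)}%")
```

Under this reading, the improvement from 74.83% to 82.78% corresponds to 12 additional test samples classified correctly.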
AD-specific PPI network
We construct the AD-specific PPI network and visualize the network layout in Figure 1c-e. We also calculate the average differential expression values for the three AD status groups (incipient, moderate, and severe) vs. the control group in GSE1297, and map them onto the genes in the network as node colors. In total, 969 genes (90.2%) in the network have expression values. Comparing Figure 1c-e, we can see that differential expression increases from incipient to moderate, and then to severe AD status. This finding supports the validity of our network construction method, since the network is built specifically for AD and the node color change directly reflects average gene expression shifts from incipient to severe AD. Moreover, not only are hub genes (large nodes) and seed genes (circled in green) differentially expressed across AD statuses, but many non-hub genes (small nodes) surrounding hub genes are also highly differentially expressed. This is why IXP can make these "trivial" genes contribute to microarray classification.
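The per-group averaging described above can be sketched as follows. This is a minimal illustration, assuming that "average differential expression" means the group mean minus the control mean for each gene; the text does not state the exact definition, and the gene name, sample IDs, and values are hypothetical.

```python
from statistics import mean

def average_differential(expr, groups, control="control"):
    """Mean expression in each non-control group minus the control mean, per gene.

    expr:   dict gene -> dict sample_id -> expression value
    groups: dict group_name -> list of sample_ids
    """
    result = {}
    for gene, values in expr.items():
        ctrl_mean = mean(values[s] for s in groups[control])
        result[gene] = {
            g: mean(values[s] for s in samples) - ctrl_mean
            for g, samples in groups.items()
            if g != control
        }
    return result

# Hypothetical toy data: one gene, two controls, and the three AD status groups.
groups = {
    "control": ["c1", "c2"],
    "incipient": ["i1"],
    "moderate": ["m1"],
    "severe": ["s1", "s2"],
}
expr = {"GENE_X": {"c1": 1.0, "c2": 3.0, "i1": 2.5, "m1": 3.0, "s1": 4.0, "s2": 5.0}}

diff = average_differential(expr, groups)
print(diff["GENE_X"])
```

Each resulting value could then be mapped to a node color for the corresponding gene in the network layout.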
Reordered adjacency matrix
We use the ACOR algorithm under populated mode to reorder the AD-specific PPI network. The reordered adjacency matrix is plotted in Figure 1f, which shows a fractal-like pattern that was also reported in another study on an AD-specific PPI network built from different seed genes. The data indicate that the ACOR algorithm is robust to differences in seed gene selection and network construction. Since both the X and Y axes in Figure 1f denote reordering indexes (1-1074) of proteins, we also investigate the relative position of each protein. From the genes labeled in Figure 1g (in the same order as Figure 1f), we find that almost all the I-class seed genes appear in the fringe of the left-bottom "head", while most II-class seed genes appear in the fringe of the "main body". This finding implies that the ACOR algorithm can not only group functionally related genes together (clustering capability), but also put them in a meaningful order (ranking capability). This combined characteristic (generating relative ranks within clusters, which ultimately produces fractal-like patterns) is exactly what IXP needs. We also show that this order performs better than both classical ranking and clustering in microarray classification by IXP.
Integrated expression profiles
We map the average differential expression values for the three AD status groups onto the gene list reordered by the ACOR algorithm. We then integrate all the expression values for each group using the IXP transformation described by Equation (2) in Additional file 1. The integrated average expression profiles for the three AD status groups in GSE1297 are shown in Figure 1e. The profiles clearly distinguish the three AD status groups and indicate that the genes' differential expression increases from incipient to moderate, and then to severe AD status. This result not only verifies the usefulness of our IXP method, but also validates our network construction method more neatly than network visualization does.
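Equation (2) is given in Additional file 1 and is not reproduced here, so the following is only a stand-in that conveys the general idea of integrating expression values along an ordered gene list. It assumes a weighted moving average over a window of neighboring positions, with node weights as the averaging weights; the function name, the window size, and the toy values are all hypothetical and should not be read as the authors' formula.

```python
def integrate_profile(values, weights, half_window=2):
    """Weighted moving average along an ordered gene list (illustrative only).

    values:  expression values, already arranged in the reordered gene order
    weights: non-negative node weights in the same order
    """
    n = len(values)
    out = []
    for i in range(n):
        # Window of positions around i, clipped at the ends of the list.
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        w_sum = sum(weights[lo:hi])
        if w_sum == 0:
            out.append(0.0)
        else:
            weighted = sum(w * x for w, x in zip(weights[lo:hi], values[lo:hi]))
            out.append(weighted / w_sum)
    return out

# Hypothetical toy profile: one strong value between two zeros, equal weights.
profile = integrate_profile([0.0, 3.0, 0.0], [1.0, 1.0, 1.0], half_window=1)
print(profile)
```

In a transformation of this kind, a gene with a weak signal of its own can still receive a clear integrated value when its neighbors in the ordering are coordinately expressed, which is the behavior the text attributes to IXP profiles.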
Classification performance comparisons
The blinded classifications were run on a testing microarray dataset that has more than four times as many samples as the training dataset (151 vs. 31) and was generated on a different microarray platform. The results show that the ACOR-based IXP approach can serve as a knowledge-supervised feature transformation approach that increases classification accuracy dramatically, by transforming gene expression profiles into integrated expression profiles that are used as input features for standard classifiers. The ACOR-based IXP approach also performs better than IXP based on ranking, clustering, and random ordering. Since gene weights represent local topological properties and gene orders represent global topological characteristics, we find that both local and global network topology information can help the IXP approach improve classification accuracy. The order generated by the ACOR algorithm provides the most help for sample classification, a finding that implies the ACOR algorithm can group functionally related genes together in an ordered way.
This article has been published as part of BMC Bioinformatics Volume 13 Supplement 2, 2012: Proceedings from the Great Lakes Bioinformatics Conference 2011. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S2
- Pujana MA, Han JDJ, Starita LM, Stevens KN, Tewari M, Ahn JS, Rennert G, Moreno V, Kirchhoff T, Gold B: Network modeling links breast cancer susceptibility and centrosome dysfunction. Nature Genetics. 2007, 39 (11): 1338-1349. 10.1038/ng.2007.2.
- Chuang HY, Lee E, Liu YT, Lee D, Ideker T: Network-based classification of breast cancer metastasis. Molecular Systems Biology. 2007, 3 (1): 140-149.
- Wu X, Huan T, Pandey R, Zhou T, Chen JY: Finding fractal patterns in molecular interaction networks: a case study in Alzheimer's disease. International Journal of Computational Biology and Drug Design. 2009, 2 (4): 340-352. 10.1504/IJCBDD.2009.030765.
- Wu X, Pandey R, Chen JY: Network topological reordering revealing systemic patterns in yeast protein interaction networks. Conf Proc IEEE Eng Med Biol Soc. 2009, 2009: 6954-6957.
- Morrison JL, Breitling R, Higham DJ, Gilbert DR: GeneRank: using search engine technology for the analysis of microarray experiments. BMC Bioinformatics. 2005, 6 (1): 233. 10.1186/1471-2105-6-233.
- Bar-Joseph Z, Gifford DK, Jaakkola TS: Fast optimal leaf ordering for hierarchical clustering. Bioinformatics. 2001, 17 (Suppl 1): S22. 10.1093/bioinformatics/17.suppl_1.S22.
- Ravetti MG, Rosso OA, Berretta R, Moscato P: Uncovering molecular biomarkers that correlate cognitive decline with the changes of hippocampus' gene expression profiles in Alzheimer's disease. PLoS One. 2010, 5 (4): e10153. 10.1371/journal.pone.0010153.
- Chen JY, Shen C, Sivachenko AY: Mining Alzheimer disease relevant proteins from integrated protein interactome data. Pac Symp Biocomput. 2006, 367-378.
- Chen JY, Mamidipalli S, Huan T: HAPPI: an online database of comprehensive human annotated and predicted protein interactions. BMC Genomics. 2009, 10 (Suppl 1): S16. 10.1186/1471-2164-10-S1-S16.
- Köhler S, Bauer S, Horn D, Robinson PN: Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet. 2008, 82 (4): 949-958. 10.1016/j.ajhg.2008.02.013.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.