- Methodology article
- Open Access
Knowledge-guided multi-scale independent component analysis for biomarker identification
© Chen et al; licensee BioMed Central Ltd. 2008
- Received: 23 May 2008
- Accepted: 06 October 2008
- Published: 06 October 2008
Many statistical methods have been proposed to identify disease biomarkers from gene expression profiles. However, from gene expression profile data alone, statistical methods often fail to identify biologically meaningful biomarkers related to a specific disease under study. In this paper, we develop a novel strategy, namely knowledge-guided multi-scale independent component analysis (ICA), to first infer regulatory signals and then identify biologically relevant biomarkers from microarray data.
Since gene expression levels reflect the joint effect of several underlying biological functions, disease-specific biomarkers may be involved in several distinct biological functions. To identify disease-specific biomarkers that provide unique mechanistic insights, a meta-data "knowledge gene pool" (KGP) is first constructed from multiple data sources to provide important information on the likely functions (such as gene ontology information) and regulatory events (such as promoter responsive elements) associated with potential genes of interest. The gene expression and biological meta-data associated with the members of the KGP can then be used to guide subsequent analysis. ICA is then applied to multi-scale gene clusters to reveal regulatory modes reflecting the underlying biological mechanisms. Finally, disease-specific biomarkers are extracted by their weighted connectivity scores associated with the extracted regulatory modes. A statistical significance test is used to evaluate the significance of transcription factor enrichment for the extracted gene set based on motif information. We applied the proposed method to yeast cell cycle microarray data and Rsf-1-induced ovarian cancer microarray data. The results show that our knowledge-guided ICA approach can extract biologically meaningful regulatory modes and outperforms several baseline methods for biomarker identification.
We have proposed a novel method, namely knowledge-guided multi-scale ICA, to identify disease-specific biomarkers. The goal is to infer knowledge-relevant regulatory signals and then identify corresponding biomarkers through a multi-scale strategy. The approach has been successfully applied to two expression profiling experiments to demonstrate its improved performance in extracting biologically meaningful and disease-related biomarkers. More importantly, the proposed approach shows promise for inferring novel biomarkers for ovarian cancer and extending current knowledge.
- Independent Component Analysis
- Linear Mode
- Yeast Cell Cycle
- Independent Component Analysis Method
- Optimal Cluster Number
Under their broadest definition, biomarkers include any biological or chemical indicator of a specific underlying process. In genetics, biomarkers are defined as a set of genes that are associated with a disease or with the susceptibility to develop a specific disease. Microarray technology makes it possible to measure simultaneously the expression levels of thousands of genes, and identifying meaningful and useful biomarkers from these large data sets is a common goal. Specifically, investigators attempt to detect genes differentially expressed across different types of tissue samples or across samples obtained under different experimental conditions. Traditional biomarker identification methods have mainly relied on statistical analysis of microarray data alone; the t-test and significance analysis of microarrays (SAM) are frequently used to detect differentially expressed genes between two phenotypes. Several new statistical methods have been developed to analyze time-course microarray data. Storey et al. proposed an algorithm (EDGE) to fit the time-course microarray data with natural cubic splines, followed by a goodness-of-fit test to detect differentially expressed genes. Conesa et al. also proposed a two-step regression approach to sequentially identify differentially expressed genes from time-course microarray data under different conditions. However, these and many related approaches do not incorporate knowledge of gene function, with respect to the phenotypes of interest, into their statistical models.
Ideally, biomarkers should not only exhibit differential gene expression between normal and disease samples but, more importantly, should also reflect their biological role in the disease phenotype. Most significance analysis methods applied to population (static) or time-course microarray data have the limitation that genes are analyzed independently and the interactions among them are ignored. Clustering methods, such as k-means clustering and self-organizing maps (SOMs), were introduced to group genes with similar expression patterns. A shortcoming of these clustering methods is that they do not allow genes to be shared by multiple clusters, whereas a single gene can be involved in multiple distinct biological processes. One solution to this problem is to first infer gene regulatory networks [8–12] that appear to control or regulate phenotypically relevant biological functions, and then to extract the most biologically and statistically relevant biomarkers.
The application of Independent Component Analysis (ICA) to microarray data has shown some utility in regulatory network inference [10, 13]. ICA is a statistically principled linear decomposition method that models the observations as a linear combination of some latent (or hidden) variables. From the perspective of gene regulation, any gene expression value can be regarded as the combined effect of regulatory inputs such as transcription factors, cellular functions, or responses to experimental conditions [10, 12]. As demonstrated in our previous work [15, 16] along with that of others [10, 12], novel applications of ICA to high-throughput data from microarray technology can help reveal dominant regulatory mechanisms.
It is not a trivial task to link the latent variables estimated by ICA to real biological functions. To identify biologically relevant biomarkers for a specific disease, incorporating prior knowledge is of great importance for improving the accuracy of computational methods. However, complete prior knowledge is often difficult to obtain. Some prior knowledge, such as regulatory motif information (promoter responsive element sequences), is available and can be incorporated into microarray data analysis to assist in regulatory module identification [18, 19]. Recently, we developed an approach called motif-directed network component analysis (mNCA) to infer transcription factor activities (TFAs). This approach incorporates a stability analysis procedure to overcome the problem of many false positives in motif information. Since only known motifs can be used, a clear limitation of the mNCA method is that it cannot infer any new potential regulatory biomarkers beyond the prior knowledge built into the model.
In this paper, we propose a novel method, namely knowledge-guided multi-scale ICA, to identify disease-specific biomarkers beyond partial prior knowledge. We propose that a latent variable estimated by ICA from the entire gene expression population represents the joint effect of several biological functions. Disease-specific biomarkers could be involved in several different biological functions, each represented by an ICA latent variable or linear regulatory mode. Therefore, we first cluster the whole gene population into multiple sub-populations in which only a few biological processes are involved. We then uncover the knowledge-relevant regulatory modes in each sub-population based on the partial prior knowledge. Finally, disease-specific biomarkers are extracted according to the strength of their association with the extracted regulatory modes. A statistical test is applied to evaluate the significance of transcription factor enrichment for the extracted biomarkers based on motif information.
For algorithm validation, we applied our approach to two time-course microarray data sets to demonstrate its improved performance. The first data set is a yeast cell cycle microarray data set with 104 well known cell cycle-related genes; the second is a remodeling and spacing factor 1 (Rsf-1) induced microarray data set from a profiling study of ovarian cancer. The experimental results show that our approach can identify biologically meaningful disease-specific biomarkers related to ovarian cancer, as compared to other gene selection methods with or without prior knowledge.
Independent component analysis (ICA)
Consider a gene expression data matrix $X = [x_{ji}]$, whose rows correspond to different microarray samples and whose columns correspond to individual genes. The ICA decomposition model can be formulated mathematically as (assuming noiselessness for simplicity):
$$X_{N \times L} = A_{N \times M}\, S_{M \times L}, \qquad (1)$$
$$U_{M \times L} = W_{M \times N}\, X_{N \times L}, \qquad (2)$$
where Equation (1) describes the linear combination model with mixing matrix A, and Equation (2) the decomposition model with de-mixing matrix W. S, X and U are independent components, mixtures, and estimated independent components, respectively. M is the number of independent components, N the number of samples and L the number of genes.
The linear modes in A might reflect distinct regulatory mechanisms involved in gene regulation, such as transcription factor (TF) activities. The FastICA algorithm can be utilized to obtain A and S based on the assumption that the components are statistically independent and have non-normal distributions (typically super-Gaussian). This assumption is biologically plausible, as most genes are not expected to change dramatically; only the genes involved in distinct regulatory mechanisms will change, producing super-Gaussian distributions in microarray data.
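The relationship between Equations (1) and (2) can be illustrated with a toy example. The sketch below is not FastICA: it assumes the mixing matrix A is known and square (N = M = 2), so the ideal de-mixing matrix is simply its inverse. All numbers and helper names (`matmul`, `inverse_2x2`) are made up for illustration; in practice W is estimated from X alone.

```python
# Toy illustration of X = A S (Equation 1) and U = W X (Equation 2).
# Assumption: A is known, square and invertible, so W = A^{-1}.

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]

def inverse_2x2(A):
    """Invert a 2 x 2 matrix; raises if it is singular."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# S: M x L components (rows = regulatory modes, columns = genes).
# Most loadings are zero, mimicking a sparse, super-Gaussian source.
S = [[3.0, 0.0, 0.0, -2.0],
     [0.0, 1.0, -4.0, 0.0]]

# A: N x M mixing matrix (rows = samples, columns = linear modes).
A = [[1.0, 0.5],
     [0.2, 1.0]]

X = matmul(A, S)      # observed expression matrix, N x L
W = inverse_2x2(A)    # ideal de-mixing matrix, M x N
U = matmul(W, X)      # recovered components, M x L (equals S up to rounding)

for row in U:
    print([round(v, 6) for v in row])
```

A real ICA estimate recovers S only up to permutation, sign and scale of the components, which is one reason the linear modes need additional information to be interpreted.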
Several methods have been developed to associate a set of genes with a specific linear mode [10, 12, 22]. These methods each assume that genes with the highest absolute loading values are the significant genes associated with linear mode $a_k$. In this paper, genes are ranked by a modified criterion based on the same assumption, as described in the next subsection.
Knowledge-guided multi-scale ICA
Since ICA is an unsupervised method, it is difficult to determine which linear modes are related to specific biological functions. To identify the biomarkers relevant to a specific biological function, prior knowledge could provide guidance for any computational method. In this approach, we will collect a KGP containing genes strongly associated with the disease and use these to guide the ICA approach for disease-relevant biomarker identification. Notice that the total connection strength of the knowledge genes associated with a disease-relevant linear mode would be larger, in principle, than that of irrelevant linear modes. Based on this observation, the most knowledge-relevant linear mode can be determined from the estimated ICA modes and the associated genes can then be extracted.
However, if we apply ICA to the entire molecular profile, the estimated linear modes will likely reflect the joint effect of several biological functions, even for the most knowledge-relevant mode, because many disease-irrelevant but differentially expressed genes co-exist in the data. Conversely, biomarkers should be involved in several different linear modes in relation to underlying biological processes. Therefore, it is reasonable to first separate the entire profile into sub-populations. We can then find the specific ICA linear modes from different subsets of genes rather than from the whole gene population; this approach is referred to as the "multi-scale ICA" approach in this paper. Since these modes will be associated with different parts of the knowledge genes in the KGP, they are more suitable for biomarker identification. Clustering methods, such as k-means clustering and SOMs, can be used to form the subsets of genes, with the assumption that genes involved in similar biological functions are more likely to exhibit similar expression patterns than genes involved in different biological functions.
where $s_{gm}$ is the loading factor for gene g associated with linear mode $a_m$, $K_i$ the subset of knowledge genes in the i-th cluster, and $M_i$ the number of independent components in the i-th cluster.
where $w_i$ is a weight representing the significance of the linear mode in the i-th subset associated with the prior knowledge. Here we define $w_i$ as the proportion of all knowledge genes in this subset with respect to the entire KGP (K). Once the knowledge-relevant linear modes in all subsets are determined, each gene is assigned a score and the genes are ranked by their scores. The larger the score, the more strongly the gene is related to the biological process.
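The selection and scoring steps described above can be sketched in code. The displayed equations are not reproduced in this text, so the sketch assumes one reading of the definitions: within cluster i, the knowledge-relevant mode is the one with the largest summed absolute loading over $K_i$; the weight is $w_i = |K_i| / |K|$; and the score of gene g is $w_i |s_{gm^*}|$ for the selected mode $m^*$. The function name `rank_genes` and the toy loadings are hypothetical.

```python
def rank_genes(clusters, knowledge):
    """Rank genes by weighted absolute loading on the knowledge-relevant mode.

    clusters:  list of dicts {gene: [loading on each mode of that cluster]}
    knowledge: set of knowledge genes (the KGP)
    Assumed scoring, not taken verbatim from the paper:
        m*    = argmax_m sum_{g in K_i} |s_gm|
        w_i   = |K_i| / |K|
        score = w_i * |s_{g, m*}|
    """
    scores = {}
    for loadings in clusters:
        known_here = [g for g in loadings if g in knowledge]
        if not known_here or not knowledge:
            # No knowledge genes in this cluster, so the weight is zero.
            scores.update({g: 0.0 for g in loadings})
            continue
        n_modes = len(next(iter(loadings.values())))
        best_mode = max(
            range(n_modes),
            key=lambda m: sum(abs(loadings[g][m]) for g in known_here),
        )
        weight = len(known_here) / len(knowledge)
        for gene, row in loadings.items():
            scores[gene] = weight * abs(row[best_mode])
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores

# Toy example: two clusters, two ICA modes each, three knowledge genes.
clusters = [
    {"g1": [0.9, 0.1], "g2": [0.8, -0.2], "g3": [0.1, 0.7]},
    {"g4": [0.2, -1.0], "g5": [0.05, 0.6]},
]
knowledge = {"g1", "g2", "g4"}
ranking, scores = rank_genes(clusters, knowledge)
print(ranking)
```

In this toy case the non-knowledge gene g5 is ranked above g3 because it loads strongly on the mode selected by g4, which is the mechanism by which candidate genes outside the KGP can surface.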
A key issue in this method is how to determine the optimal cluster number when forming the subsets of genes. In this paper, we determine the optimal cluster number by a cross-validation approach. Specifically, we assume the optimal cluster number is in some range, from 1 to an upper limit. For each cluster number, the knowledge genes are randomly stratified into a training gene set (as our partial prior knowledge gene set) and a test gene set by a ten-fold cross-validation approach. The method is applied with the partial prior knowledge genes to rank the whole gene population, and prediction accuracy is tested on the test gene set. The above procedure is repeated 10 times, once for each left out fold, and an average accuracy over the ten folds is reported. We select the number with the highest average accuracy as the optimal cluster number for clustering. The upper limit of cluster numbers should be cautiously determined by the number of knowledge genes and the number of genes in the full profile. If the number of clusters is too large, it will lose the ability to infer novel biomarkers. An extreme case is that each individual gene forms a cluster and then we can only obtain the correct ranks for known genes. Genes not in the KGP will be randomly ranked, which is not informative at all for biomarker identification. If the cluster number is too small, the estimated linear modes may be incorrect due to the presence of many irrelevant genes. In our experiments, we set the upper limit as 10 for the yeast cell cycle data set and 15 for the ovarian cancer microarray data set, respectively.
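The cross-validation loop for choosing the cluster number can be outlined as follows. This is a minimal sketch: the folds are formed by a plain random split rather than a stratified one, and `evaluate` is a hypothetical callable standing in for the full pipeline (cluster into k groups, run ICA, rank genes with the training knowledge genes, and return the accuracy on the held-out knowledge genes).

```python
import random

def k_folds(items, n_folds, seed=0):
    """Shuffle the items and deal them into n_folds nearly equal folds."""
    items = list(items)
    random.Random(seed).shuffle(items)
    return [items[i::n_folds] for i in range(n_folds)]

def select_cluster_number(knowledge_genes, max_clusters, evaluate,
                          n_folds=10, seed=0):
    """Return the cluster number with the highest mean held-out accuracy.

    evaluate(k, train_genes, test_genes) -> float is supplied by the caller
    (hypothetical interface).
    """
    folds = k_folds(knowledge_genes, n_folds, seed)
    mean_accuracy = {}
    for k in range(1, max_clusters + 1):
        accuracies = []
        for i, test_genes in enumerate(folds):
            # Training set = all knowledge genes outside the held-out fold.
            train_genes = [g for j, fold in enumerate(folds) if j != i
                           for g in fold]
            accuracies.append(evaluate(k, train_genes, test_genes))
        mean_accuracy[k] = sum(accuracies) / len(accuracies)
    best_k = max(mean_accuracy, key=mean_accuracy.get)
    return best_k, mean_accuracy

# Usage with a dummy evaluation function that happens to peak at k = 5.
genes = [f"gene{i}" for i in range(104)]
best_k, table = select_cluster_number(
    genes, max_clusters=10,
    evaluate=lambda k, train, test: 1.0 - abs(k - 5) / 10,
)
print(best_k)
```

Passing the evaluation step in as a function keeps the selection logic independent of the particular clustering and ICA implementations used.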
Knowledge gene pool (KGP)
Each KGP is a collection of the genes that are potentially most strongly related to a specific disease. Microarray data usually contain thousands of genes, and most of them are not relevant to a specific disease even though they exhibit changes in gene expression level. The knowledge gene pool is an important asset for data analysis since it helps reduce false positives. However, in most cases little prior knowledge can be obtained, and the available knowledge is usually neither complete nor sufficiently accurate to fully define the specific disease under study. Thus, the KGP is best used as a guide for biomarker identification. In our studies, the KGP is primarily constructed from the published biological literature or from databases such as Ingenuity Pathway Analysis (IPA; Ingenuity Systems: http://www.ingenuity.com) and the TRANSFAC 11.1 Professional Database.
Evaluation by motif enrichment analysis
For microarray data analysis, there is often no ground truth (i.e., true biomarkers known to be related to a specific biological process or disease under study) available to evaluate the performance of a biomarker identification method. However, we know that gene expression is often regulated by transcription factors (TFs), proteins that bind to promoter or enhancer sequence elements upstream of genes and either activate or inhibit gene expression. In this paper, using the available motif information, we have designed a statistical test to evaluate the enrichment of transcription factors for an identified gene set. A gene-transcription factor matrix M is generated in which each element, $m_{gf}$, represents how well the upstream sequence of gene g matches the motif bound by transcription factor f. For human genes, 2 kbp upstream regions from the transcription start sites (TSSs) of the genes are extracted from the UCSC genome databases. Match™ is then used to search for transcription factor binding sites (TFBSs) in a gene's upstream region using position weight matrices (PWMs); it outputs core similarity and matrix similarity scores for each matched motif. Since one TF may have multiple TFBSs, we use the summation of average scores of core similarity and matrix similarity to set the final value of $m_{gf}$.
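As a small illustration of how an entry $m_{gf}$ might be computed, the sketch below reads "summation of average scores" as the mean core similarity plus the mean matrix similarity over all binding sites found for one TF in one gene's upstream region. That reading, the zero score for a TF with no hits, and the names `motif_score` and `build_matrix` are assumptions made for this example; the input is taken to be already parsed from the motif search output.

```python
def motif_score(hits):
    """Combine binding-site hits for one (gene, TF) pair into a single score.

    hits: list of (core_similarity, matrix_similarity) tuples, one per site.
    Assumed rule: mean core similarity + mean matrix similarity.
    Assumption: a pair with no hits scores 0.0.
    """
    if not hits:
        return 0.0
    mean_core = sum(core for core, _ in hits) / len(hits)
    mean_matrix = sum(matrix for _, matrix in hits) / len(hits)
    return mean_core + mean_matrix

def build_matrix(hits_by_pair, genes, tfs):
    """Assemble the gene-by-TF matrix as a nested dict {gene: {tf: score}}."""
    return {g: {f: motif_score(hits_by_pair.get((g, f), [])) for f in tfs}
            for g in genes}

# Toy input: geneA has two predicted sites for TF1, geneB has none.
hits_by_pair = {("geneA", "TF1"): [(1.0, 0.9), (0.8, 0.7)]}
M = build_matrix(hits_by_pair, genes=["geneA", "geneB"], tfs=["TF1"])
print(M)
```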
Baseline experiments and evaluation method
To evaluate the performance of our proposed approach, the EDGE algorithm was first considered as a comparison method since it was specifically designed to identify statistically significant genes from time-course microarray data. However, this comparison is insufficient because EDGE does not incorporate knowledge genes to guide biomarker identification. On the other hand, given partial prior knowledge genes, traditional supervised classification methods are not suitable for predicting whether a gene is related to the prior knowledge, because no true negative genes are available. Therefore, we designed three baseline biomarker identification methods that incorporate partial prior knowledge for a fair comparison. The first, a baseline ICA method, is designed to evaluate whether our multi-scale strategy based on clustering improves biomarker identification. Two correlation methods, with and without clustering, are then implemented to identify genes exhibiting patterns similar to those of the partial prior knowledge genes, in contrast to the ICA approach, which focuses on regulatory mode identification. Specifically, the first method is a baseline ICA method in which ICA is applied to the entire expression profile and the partial prior knowledge is used to find the most knowledge-relevant linear mode by Equation (4). Genes are ranked according to their absolute connection strengths associated with this linear mode. The second method estimates the correlation with the partial prior knowledge genes without clustering (baseline correlation method-1). Genes are ranked by the absolute correlation coefficient between an individual gene expression profile and the average profile of the partial prior knowledge genes. However, taking the average profile of all knowledge genes may reduce the sensitivity of detection, especially when the genes in the KGP are not similar to each other.
To overcome this problem, the third baseline method is a weighted correlation method based on a clustering approach (baseline correlation method-2). Similar to the multi-scale ICA method, the entire gene population is grouped into several sub-populations and a gene in each cluster is assigned a score. The score is the weighted absolute correlation coefficient between an individual gene expression profile and the average profile of partial prior knowledge genes in this cluster. The weight is then calculated using Equation (5) and genes are ranked according to their scores.
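Baseline correlation method-1 is simple enough to sketch directly. The example below assumes the Pearson correlation coefficient (the text says only "correlation coefficient") and uses made-up expression profiles; `pearson` and `rank_by_correlation` are illustrative names.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def rank_by_correlation(profiles, knowledge):
    """Rank genes by |correlation| with the mean knowledge-gene profile."""
    known = [profiles[g] for g in knowledge if g in profiles]
    if not known:
        raise ValueError("no knowledge genes found in the expression data")
    average = [sum(column) / len(column) for column in zip(*known)]
    scores = {g: abs(pearson(p, average)) for g, p in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

# Toy profiles over four time points; g1 and g2 are the knowledge genes.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],    # anti-correlated with the knowledge genes
    "g4": [1.0, -1.0, 1.0, -1.0],  # unrelated oscillation
}
ranking, scores = rank_by_correlation(profiles, {"g1", "g2"})
print({g: round(v, 3) for g, v in scores.items()})
```

Using the absolute value means that an anti-correlated gene such as g3 scores as highly as the knowledge genes themselves. The clustered variant (method-2) would apply the same calculation within each cluster and multiply by the cluster weight.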
We applied our knowledge-guided multi-scale ICA method to two gene expression profiling studies: (1) a yeast cell cycle microarray data set and (2) an Rsf-1-induced microarray data set. The yeast cell cycle data set consists of the expression of 6178 Open Reading Frames (ORFs) during the cell replication cycle in the budding yeast (Saccharomyces cerevisiae). The data set consists of 77 samples corresponding to various experimental conditions. Approximately 800 genes have been identified as cell cycle-regulated genes; among these, 104 genes have been well studied. The goal of this experiment is to identify the cell cycle-regulated linear modes and then extract the corresponding genes associated with the cell cycle. We used the 104 genes as our training knowledge gene set and the remaining 704 genes as an independent test set for evaluation.
The Rsf-1-induced microarray data set was acquired and analyzed in our experiment. The data set was generated using Affymetrix Human Genome U133 Plus 2.0 Arrays in an expression profiling study of ovarian cancer at the Johns Hopkins Medical Institutions. The study was designed to identify Rsf-1-regulated genes in ovarian cancer. Rsf-1 (also known as HBXAP) is a newly discovered gene frequently amplified in ovarian cancer; its protein participates in chromatin remodeling, which is essential for a variety of cellular functions including transcription, DNA replication, and DNA repair. The data set is composed of 7 samples with two biological conditions (Rsf-1-induced and not Rsf-1-induced) and four time points at 0 hours, 6 hours, 18 hours, and 30 hours. We used Affymetrix's Probe Logarithmic Intensity Error (PLIER) algorithm with quantile normalization to preprocess the original intensity data for gene expression measurements. After preprocessing, we obtained expression measurements of 54,675 probe sets for each sample.
The EDGE algorithm was first applied to select statistically significant genes from the yeast cell cycle data and the Rsf-1-induced ovarian cancer data, respectively. After ranking all genes by their q-values estimated by EDGE, we calculated AUC values for yeast cell cycle-related genes and ovarian cancer-related genes, respectively (see below). Both AUC values were relatively low (around 0.5), which indicates that genes identified by purely data-driven methods (such as EDGE, without prior knowledge guidance) may not show strong biological relevance.
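To make the "around 0.5" reading concrete, the sketch below computes an AUC directly from a gene ranking. It assumes the standard ROC definition (the probability that a randomly chosen relevant gene is ranked above a randomly chosen other gene) and a strict ranking without ties; the exact procedure used in the study is not spelled out in this text. For an EDGE-style ranking, genes would be ordered by ascending q-value before calling the hypothetical `auc_from_ranking`.

```python
def auc_from_ranking(ranked_genes, positives):
    """Area under the ROC curve for a strict ranking (best gene first).

    Counts, for every positive gene, how many negative genes are ranked
    below it, then divides by the number of positive-negative pairs.
    """
    n_pos = sum(1 for g in ranked_genes if g in positives)
    n_neg = len(ranked_genes) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative gene")
    negatives_seen = 0
    correctly_ordered = 0
    for gene in ranked_genes:
        if gene in positives:
            # This positive outranks every negative not yet encountered.
            correctly_ordered += n_neg - negatives_seen
        else:
            negatives_seen += 1
    return correctly_ordered / (n_pos * n_neg)

ranking = ["a", "b", "c", "d", "e", "f"]
print(auc_from_ranking(ranking, {"a", "c"}))  # positives near the top
print(auc_from_ranking(ranking, {"c", "d"}))  # positives in the middle
```

A ranking that places the relevant genes near the top gives a value close to 1, whereas relevant genes scattered through the middle of the list give a value near 0.5, which is what a ranking unrelated to the gene set would be expected to produce.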
Yeast cell cycle data
Table: P-values of the Kolmogorov-Smirnov (K-S) test for different methods on yeast cell cycle data using ten-fold cross-validation. Column: P-value of K-S test. Rows: Correlation method 1; Correlation method 2.
To further test the generalizability of our method, we conducted ten-fold cross-validation on the 104 genes using a subset of samples. The original data set includes 77 samples synchronized by three independent methods: α factor arrest, elutriation, and arrest of a cdc15 temperature-sensitive mutant. We selected 63 samples by excluding those obtained under the elutriation condition. The resulting average AUC value is 0.9157 with a standard deviation of 0.0458. The most frequent optimal cluster number is five (with a frequency of 65%), which is consistent with the result obtained using all the samples.
Table: P-values of the Kolmogorov-Smirnov (K-S) test for different methods on yeast cell cycle data using an independent test gene set. Column: P-value of K-S test. Rows: Correlation method 1; Correlation method 2.
Top 10 genes selected by the proposed multi-scale ICA method on yeast cell cycle data
CycLiN; G1 cyclin involved in regulation of the cell cycle
Chitin Synthesis Involved; protein of unknown function
Pathogen Related in Yeast; protein of unknown function
Mitotic Chromosome Determinant; expression is cell cycle regulated and peaks in S phase
Homeodomain-containing transcriptional repressor
POLymerase; proliferating cell nuclear antigen (PCNA)
Target of SBF; promoters of some genes involved in pheromone response and cell cycle;
AXiaL budding pattern; glycosylated by Pmt4p; potential Cdc28p substrate
Congo Red Hypersensitive; cell wall protein; putative chitin transglycosidase
RiboNucleotide Reductase; the RNR complex catalyzes the rate-limiting step in dNTP synthesis and is regulated by DNA replication and DNA damage checkpoint pathways via localization of the small subunits
Rsf-1-induced gene expression data
Knowledge gene pool (KGP)
To construct the KGP, we started with the known gene Rsf-1 and its related genes, NF-kappa B (NFKB1) and SMARCA5 (also known as hSNF2H), as previously reported, to search the databases. We used Ingenuity Pathway Analysis (IPA) to extract 95 genes that are thought to be directly related to NFKB1 and SMARCA5. Note that there is no network related to Rsf-1 in the current IPA database. We also included 43 genes from the TRANSFAC 11.1 Professional Database whose protein products are transcription factors biologically relevant to ovarian cancer as reported in the literature. Hence, our KGP consists of 141 distinct Affymetrix probe set identifiers that represent the expression values for the 138 genes.
Multi-scale ICA results
Table: P-values of the Kolmogorov-Smirnov (K-S) test for different methods on Rsf-1-induced ovarian cancer microarray data. Column: P-value of K-S test. Rows: Correlation method 1; Correlation method 2.
Evaluation by motif analysis
Ovarian cancer-related TFs and their TRANSFAC entry IDs & descriptions
PWM Access No.
Consensus Binding Site
Activator protein 2
Activating enhancer binding protein 2 alpha
Activating protein 2, AP-2A, Ker-1
Activator protein 2gamma, ERF-1
Specificity protein1, stimulating protein 1
Breast cancer type 1 susceptibility protein
EIIF protein, activator of myc, important for p107 promoter activity
Elk1, member of ETS oncogene family
Nuclear factor kappa B, p50
Specificity protein1, stimulating protein 1
5'-TG-3' interacting factor, TG-interacting factor, TGFB-induced factor
Nuclear factor kappa B c-Rel, p68
Tumor protein p53, TRP53
Discussion with biological interpretation
Top 10 genes selected by the proposed multi-scale ICA on Rsf-1-induced microarray data
Probe Set ID
Gene Full Name
FBJ murine osteosarcoma viral oncogene homolog B
v-fos FBJ murine osteosarcoma viral oncogene homolog
chemokine (C-C motif) ligand 20
heat shock protein 90 kDa alpha (cytosolic), class B member 1
Early growth response 1
cyclin-dependent kinase 2
Biomarker identification is an important goal in many microarray data analyses. We propose a novel method, knowledge-guided multi-scale ICA, to find relevant biomarkers associated with specific biological functions. We aimed to infer knowledge-relevant regulatory signals and then identify corresponding biomarkers through a multi-scale strategy. A knowledge gene pool is constructed from multiple knowledge sources to help identify disease-specific gene clusters. ICA is then applied to multi-scale gene clusters, and the revealed regulatory modes can be examined to uncover the underlying biological regulatory mechanisms. In addition, we have designed a statistical test procedure to measure the transcription factor enrichment of a selected gene set based on motif information. The approach was successfully applied to two gene expression profile data sets to identify biomarkers: yeast cell cycle microarray data and Rsf-1-induced microarray data. The experimental results show that our method can extract apparently biologically meaningful and condition-related biomarkers. The proposed method significantly outperforms several baseline methods for biomarker identification. More importantly, the proposed method has notable potential to discover novel biomarkers beyond any partial prior knowledge.
This research was supported in part by NIH Grants (NS29525-13A, EB000830, CA109872, CA096483 and CA129080) and DoD/CDMRP Grant (BC030280). We also thank the anonymous reviewers for their invaluable input, which led to several important improvements of this manuscript.
- Devore J, Peck R: Statistics: The Exploration and Analysis of Data. CA Duxbury Press; 1997.
- Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 2001, 98(9):5116–5121. 10.1073/pnas.091062498
- Storey JD, Xiao W, Leek JT, Tompkins RG, Davis RW: Significance analysis of time course microarray experiments. Proc Natl Acad Sci USA 2005, 102(36):12837–12842. 10.1073/pnas.0504609102
- Conesa A, Nueda MJ, Ferrer A, Talon M: maSigPro: a method to identify significantly differential expression profiles in time-course microarray experiments. Bioinformatics 2006, 22(9):1096–1102. 10.1093/bioinformatics/btl056
- Hartigan JA, Wong MA: A K-means clustering algorithm. App Statist 1978, 28: 100–108. 10.2307/2346830
- Kohonen T: Self-Organizing Maps. NY: Springer; 1997.
- Clarke R, Ressom HW, Wang A, Xuan J, Liu MC, Gehan EA, Wang Y: The properties of high-dimensional data spaces: implications for exploring gene and protein expression data. Nat Rev Cancer 2008, 8(1):37–49. 10.1038/nrc2294
- Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A: Reverse engineering of regulatory networks in human B cells. Nat Genet 2005, 37(4):382–390. 10.1038/ng1532
- Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, Friedman N: Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet 2003, 34(2):166–176. 10.1038/ng1165
- Liebermeister W: Linear modes of gene expression determined by independent component analysis. Bioinformatics 2002, 18(1):51–60. 10.1093/bioinformatics/18.1.51
- Hori G, Inoue M, Nishimura S, Nakahara H: Blind gene classification on ICA of microarray data. ICA: 2001; San Diego, CA; 2001:332–336.
- Lee SI, Batzoglou S: Application of independent component analysis to microarrays. Genome Biol 2003, 4(11):R76. 10.1186/gb-2003-4-11-r76
- Saidi SA, Holland CM, Kreil DP, MacKay DJ, Charnock-Jones DS, Print CG, Smith SK: Independent component analysis of microarray data in the study of endometrial cancer. Oncogene 2004, 23(39):6677–6683. 10.1038/sj.onc.1207562
- Hyvarinen A, Karhunen J, Oja E: Independent Component Analysis. John Wiley & Sons; 2001.
- Gong T, Xuan J, Wang C, Li H, Hoffman E, Clarke R, Wang Y: Gene module identification from microarray data using nonnegative independent component analysis. Gene Regulation and Systems Biology 2007, 1: 349–363.
- Wang C, Xuan J, Gong T, Clarke R, Hoffman E, Wang Y: Stability based dimension estimation of ICA with application to microarray data analysis. The International Conference on Bioinformatics & Computational Biology: 2007.
- Liao JC, Boscolo R, Yang YL, Tran LM, Sabatti C, Roychowdhury VP: Network component analysis: reconstruction of regulatory signals in biological systems. Proc Natl Acad Sci USA 2003, 100(26):15522–15527. 10.1073/pnas.2136632100
- Conlon EM, Liu XS, Lieb JD, Liu JS: Integrating regulatory motif discovery and genome-wide expression analysis. Proc Natl Acad Sci USA 2003, 100(6):3339–3344. 10.1073/pnas.0630591100
- Joung JG, Shin D, Seong RH, Zhang BT: Identification of regulatory modules by co-clustering latent variable models: stem cell differentiation. Bioinformatics 2006, 22(16):2005–2011. 10.1093/bioinformatics/btl343
- Wang C, Chen L, Zhao P, Hoffman E, Wang Y, Clarke R, Xuan J: Motif-directed network component analysis for regulatory network inference. Sixth International Conference on Bioinformatics: 2007; Hong Kong, China 2007.
- Hyvarinen A, Oja E: A fast fixed-point algorithm for independent component analysis. Neural Computation 1997, 9: 1483–1492. 10.1162/neco.1997.9.7.1483
- Frigyesi A, Veerla S, Lindgren D, Hoglund M: Independent component analysis reveals new and biologically significant structures in micro array data. BMC Bioinformatics 2006, 7: 290. 10.1186/1471-2105-7-290
- Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K, et al.: TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res 2006, (34 Database):D108–110. 10.1093/nar/gkj143
- Karolchik D, Baertsch R, Diekhans M, Furey TS, Hinrichs A, Lu YT, Roskin KM, Schwartz M, Sugnet CW, Thomas DJ, et al.: The UCSC Genome Browser Database. Nucleic Acids Res 2003, 31(1):51–54. 10.1093/nar/gkg129
- Kel AE, Gossling E, Reuter I, Cheremushkin E, Kel-Margoulis OV, Wingender E: MATCH: A tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res 2003, 31(13):3576–3579. 10.1093/nar/gkg585
- Witten I, Frank E: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann; 2000.
- Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 1998, 9(12):3273–3297.
- Shih Ie M, Sheu JJ, Santillan A, Nakayama K, Yen MJ, Bristow RE, Vang R, Parmigiani G, Kurman RJ, Trope CG, et al.: Amplification of a chromatin remodeling gene, Rsf-1/HBXAP, in ovarian carcinoma. Proc Natl Acad Sci USA 2005, 102(39):14004–14009. 10.1073/pnas.0504195102
- Affymetrix: Guide to Probe Logarithmic Intensity Error (PLIER) Estimation. Santa Clara, CA: Affymetrix Inc; 2005.
- Huang JY, Shen BJ, Tsai WH, Lee SC: Functional interaction between nuclear matrix-associated HBXAP and NF-kappaB. Exp Cell Res 2004, 298(1):133–143. 10.1016/j.yexcr.2004.04.019
- Milde-Langosch K: The Fos family of transcription factors and their role in tumourigenesis. European journal of cancer 2005, 41: 2449–2461. 10.1016/j.ejca.2005.08.008
- Sharma SC, Richards JS: Regulation of AP1 (Jun/Fos) factor expression and activation in ovarian granulosa cells. Relation of JunD and Fra2 to terminal differentiation. J Biol Chem 2000, 275(43):33718–33728. 10.1074/jbc.M003555200
- Lee LF, Hellendall RP, Wang Y, Haskill JS, Mukaida N, Matsushima K, Ting JP: IL-8 reduced tumorigenicity of human ovarian cancer in vivo due to neutrophil infiltration. J Immunol 2000, 164(5):2769–2775.
- Xu L: Ovarian cancer angiogenesis, biology and therapy. University of Texas; 2000.
- Topilko P, Schneider-Maunoury S, Levi G, Trembleau A, Gourdji D, Driancourt MA, Rao CV, Charnay P: Multiple pituitary and ovarian defects in Krox-24 (NGFI-A, Egr-1)-targeted mice. Mol Endocrinol 1998, 12(1):107–122. 10.1210/me.12.1.107
- Hayami R, Sato K, Wu W, Nishikawa T, Hiroi J, Ohtani-Kaneko R, Fukuda M, Ohta T: Down-regulation of BRCA1-BARD1 ubiquitin ligase by CDK2. Cancer Res 2005, 65(1):6–10.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.