LSI based framework to predict gene regulatory information

Roy, Sujoy; Xu, Lijing; Homayouni, Ramin

doi:10.1186/1471-2105-10-S7-A5

Volume 10 Supplement 7

UT-ORNL-KBRIN Bioinformatics Summit 2009

Meeting abstract
Open access
Published: 25 June 2009

Sujoy Roy^1,2,
Lijing Xu^1,2 &
Ramin Homayouni^1,2

BMC Bioinformatics volume 10, Article number: A5 (2009)

Background

Latent Semantic Indexing (LSI), a vector space model for information retrieval, has shown promise in predicting functional relationships between genes using textual information in MEDLINE abstracts. The underlying principle is that genes can be represented as document vectors in a multi-dimensional hyperspace, and that the conceptual relationship between any two genes is measured by the cosine of the angle between their vectors [1]. In this study, we sought to extend this concept to the identification of putative transcription factors (TFs) that regulate a group of co-regulated genes. We hypothesized that co-expressed genes identified by microarray experiments are functionally related and that at least some of these genes have previously been linked, explicitly or implicitly, to TFs in the literature. A transcriptional module is then defined as a set of genes clustered together in LSI space with closely related TFs (Figure 1). Using these assumptions, we devised a framework to identify transcriptional modules from microarray and promoter motif data (Figure 2). The framework takes as input a set of co-expressed genes from a microarray dataset and a set of TFs that have consensus motifs in the promoter regions of those genes. The set of such motif-derived TFs is usually large, which makes it difficult to identify the critical ones. The framework first identifies functionally related clusters of co-expressed genes based on their latent relationships in the literature, and then adds to each cluster the TFs that are closely associated with its genes. The putative transcriptional modules are ranked by the relative literature coherence among their members.
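
The last two steps of the framework, attaching TFs to a gene cluster and scoring the resulting module, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that gene and TF vectors in the reduced LSI space are already available (the truncated SVD of the term-by-document matrix is not shown) and that the gene cluster has already been formed. The entity names, the 2-D coordinates, the `min_sim` threshold, and the use of mean pairwise cosine as the coherence score are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def attach_tfs(vecs, cluster, tfs, min_sim):
    """Return (tf, mean cosine to the cluster genes) pairs, best first.

    Only TFs whose mean cosine is at least min_sim are kept.
    The threshold is an assumption of this sketch.
    """
    scored = []
    for tf in tfs:
        mean_sim = sum(cosine(vecs[tf], vecs[g]) for g in cluster) / len(cluster)
        if mean_sim >= min_sim:
            scored.append((tf, mean_sim))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def coherence(vecs, members):
    """Mean pairwise cosine among module members (assumed coherence score)."""
    sims = [cosine(vecs[a], vecs[b]) for a, b in combinations(members, 2)]
    return sum(sims) / len(sims)


# Hypothetical 2-D LSI coordinates for three genes and two motif-derived TFs.
vecs = {
    "geneA": (4.0, 3.0),
    "geneB": (3.0, 4.0),
    "geneC": (-3.0, 4.0),
    "TF1": (1.0, 1.0),
    "TF2": (-1.0, 1.0),
}

cluster = ["geneA", "geneB"]  # assumed output of the gene clustering step
attached = attach_tfs(vecs, cluster, ["TF1", "TF2"], min_sim=0.5)
module = cluster + [tf for tf, _ in attached]

print(attached)
print(round(coherence(vecs, module), 3))
```

With several candidate modules, the same `coherence` value could serve as a sort key to rank them, which mirrors the ranking step described above under the stated assumptions.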

Results and discussion

The LSI-based algorithm allows prediction of TFs based on latent (implicit) relationships in the literature. A preliminary evaluation of our method using previously published knock-out experiments revealed that it has reasonable recall and precision. A more rigorous evaluation of the method will require several additional TF knock-out microarray experiments. This work provides proof of principle that the combination of motif analysis and LSI may be used to identify putative transcriptional modules from microarray data.
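
For readers unfamiliar with the two evaluation measures named above, the following Python sketch shows how precision and recall could be computed for one predicted module. It assumes, for illustration only, that the TFs confirmed by knock-out experiments serve as the reference set; the TF names and set contents are hypothetical and are not results from this study.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) of a predicted TF set against a reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical example: TFs predicted for a module vs. TFs supported by knock-out data.
predicted_tfs = {"TF1", "TF2", "TF3", "TF5"}
knockout_tfs = {"TF1", "TF3", "TF4"}

precision, recall = precision_recall(predicted_tfs, knockout_tfs)
print(precision, recall)
```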

References

1. Homayouni R, Heinrich K, Wei L, Berry MW: Gene clustering by latent semantic indexing of MEDLINE abstracts. Bioinformatics 2005, 21(1):104–115. doi:10.1093/bioinformatics/bth464

Author information

Authors and Affiliations

Department of Biology, University of Memphis, Memphis, TN, 38152, USA
Sujoy Roy, Lijing Xu & Ramin Homayouni
Bioinformatics Program, University of Memphis, Memphis, TN, 38152, USA
Sujoy Roy, Lijing Xu & Ramin Homayouni

Corresponding author

Correspondence to Ramin Homayouni.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Roy, S., Xu, L. & Homayouni, R. LSI based framework to predict gene regulatory information. BMC Bioinformatics 10 (Suppl 7), A5 (2009). https://doi.org/10.1186/1471-2105-10-S7-A5
