Exact biclustering algorithm for the analysis of large gene expression data sets

Voggenreiter, Oliver; Bleuler, Stefan; Gruissem, Wilhelm

doi:10.1186/1471-2105-13-S18-A10

Volume 13 Supplement 18

Highlights from the Eighth International Society for Computational Biology (ISCB) Student Council Symposium 2012

Meeting abstract
Open access
Published: 14 December 2012

Oliver Voggenreiter¹,
Stefan Bleuler² &
Wilhelm Gruissem¹

BMC Bioinformatics volume 13, Article number: A10 (2012)

Background

Biclustering of gene expression data is used to discover groups of genes that are co-expressed over a subset of tested conditions. The objective is to detect as many significant biclusters as possible; because an exhaustive search has non-polynomial computational complexity, most approaches employ a heuristic approximation.

Previous algorithms have focused on enabling the discovery of biologically relevant results within the scope of single studies, where data size and complexity are limited. New methods and algorithms are required to apply biclustering to larger-scale data sets that can span multiple experiments and that are potentially far more heterogeneous.

Results

The BiMax algorithm [1] uses a binary representation of the gene expression matrix, an approach that has been shown to discover enriched modules of biologically relevant genes in gene expression data. This model of biclustering allows for exact solutions; however, the BiMax algorithm performs best on input data of restricted size. The biclustering formulation of BiMax can be viewed as the search for all maximal bicliques in a bipartite graph in which the nodes are genes or experiments, and an edge between a gene and an experiment exists if the gene was significantly expressed in that experiment. We propose a new algorithm capable of enumerating all biclusters on such a graph. To solve the maximal biclique enumeration problem, we build on the backtracking Bron-Kerbosch algorithm [2] for maximal clique enumeration. We have developed and successfully tested a new algorithm, the Bipartite Bron-Kerbosch algorithm, which uses principles similar to those of Bron-Kerbosch but traverses the bicliques of bipartite graphs. This approach enables the algorithm to explore all maximal bicliques without visiting branches of the search tree that contain previously discovered biclusters.
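The correspondence between biclusters of a binary matrix and maximal bicliques of a bipartite graph can be illustrated with a minimal Python sketch. It is not the Bipartite Bron-Kerbosch algorithm itself, whose details are not given in this abstract. Instead it uses a standard reduction as a stand-in: every pair of genes and every pair of experiments is connected, so that maximal cliques of the augmented graph with at least one gene and one experiment are exactly the maximal bicliques, and the classic Bron-Kerbosch recursion (without pivoting) enumerates them. The function name `maximal_bicliques` and the toy matrix are hypothetical.

```python
from itertools import product


def maximal_bicliques(matrix):
    """Return all maximal bicliques of a binary gene x experiment matrix.

    Each result is a pair (genes, experiments) of frozensets of indices.
    Illustrative reduction to plain Bron-Kerbosch, not the authors' method.
    """
    n_genes = len(matrix)
    n_exps = len(matrix[0]) if matrix else 0
    genes = [("g", i) for i in range(n_genes)]
    exps = [("e", j) for j in range(n_exps)]

    # Augmented graph: nodes on the same side are all mutually adjacent.
    adj = {}
    for side in (genes, exps):
        for node in side:
            adj[node] = set(side) - {node}
    # A gene and an experiment are adjacent if the matrix entry is 1.
    for i, j in product(range(n_genes), range(n_exps)):
        if matrix[i][j]:
            adj[("g", i)].add(("e", j))
            adj[("e", j)].add(("g", i))

    results = []

    def bron_kerbosch(r, p, x):
        # r: current clique, p: candidates, x: already explored nodes.
        if not p and not x:
            gene_set = frozenset(i for kind, i in r if kind == "g")
            exp_set = frozenset(j for kind, j in r if kind == "e")
            # Skip degenerate cliques that lie entirely on one side.
            if gene_set and exp_set:
                results.append((gene_set, exp_set))
            return
        for v in list(p):
            bron_kerbosch(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}   # v has been fully explored ...
            x = x | {v}   # ... so exclude it from later branches

    bron_kerbosch(set(), set(adj), set())
    return results


if __name__ == "__main__":
    toy = [[1, 1, 0],
           [1, 1, 1],
           [0, 1, 1]]  # made-up 3 genes x 3 experiments example
    for gene_set, exp_set in maximal_bicliques(toy):
        print(sorted(gene_set), sorted(exp_set))
```

The exclusion set `x` is what keeps the recursion from reporting the same maximal clique twice, which mirrors the property described above of not revisiting branches that contain previously discovered biclusters. The reduction adds a quadratic number of same-side edges, so it is meant only for small illustrative inputs.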

Conclusions

Our results (see Table 1) show that the new algorithm is significantly faster at bicluster exploration than BiMax, demonstrating a factor n improvement in running time (where n is proportional to the input data size). For instance, with input data of 800 genes and 800 experiments, BiMax enumerated the more than 500 thousand biclusters in just over three minutes, whereas the Bipartite Bron-Kerbosch algorithm took approximately 3 seconds.

Table 1 BiMax vs. Bipartite Bron-Kerbosch Running Times. Running times of the Bipartite Bron-Kerbosch (BBK) algorithm compared to BiMax on binary matrices derived from A. thaliana gene expression data. Each matrix had a density of around 12% and the algorithms were given a maximum of 1 hour to complete on the same computer. The number of biclusters in each matrix is listed in the last column.
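The notions of a binary matrix and its density used in the caption can be made concrete with a short Python sketch. The abstract does not state how "significantly expressed" was decided, so the fixed cutoff used here is purely an assumption, and the helper names `binarize` and `density` as well as the numbers are hypothetical.

```python
def binarize(expression, threshold):
    """Map an expression matrix to 0/1 entries.

    Assumption: a gene counts as expressed in an experiment when its
    value is at least `threshold`; the source does not specify the rule.
    """
    return [[1 if value >= threshold else 0 for value in row]
            for row in expression]


def density(binary):
    """Fraction of entries equal to 1 (0.0 for an empty matrix)."""
    cells = sum(len(row) for row in binary)
    ones = sum(sum(row) for row in binary)
    return ones / cells if cells else 0.0


if __name__ == "__main__":
    # Made-up values for 2 genes x 2 experiments, for illustration only.
    toy_expression = [[0.2, 2.5],
                      [3.1, 0.0]]
    toy_binary = binarize(toy_expression, threshold=2.0)
    print(toy_binary, density(toy_binary))
```

A density of around 12% in this sense means that roughly 12 of every 100 gene-experiment pairs are set to 1, that is, are edges in the bipartite graph.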

References

1. Prelic A, Bleuler S, Zimmermann P, Wille A, Buhlmann P, Gruissem W, Hennig L, Thiele L, Zitzler E: A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 2006, 22(9):1122–1129. doi:10.1093/bioinformatics/btl060
2. Bron C, Kerbosch J: Finding All Cliques of an Undirected Graph. Communications of the ACM 1973, 16(9):575–577.

Author information

Authors and Affiliations

Department of Biology, ETH Zürich, Zürich, Switzerland
Oliver Voggenreiter & Wilhelm Gruissem
Nebion AG, Zürich, Switzerland
Stefan Bleuler

Corresponding author

Correspondence to Oliver Voggenreiter.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Voggenreiter, O., Bleuler, S. & Gruissem, W. Exact biclustering algorithm for the analysis of large gene expression data sets. BMC Bioinformatics 13 (Suppl 18), A10 (2012). https://doi.org/10.1186/1471-2105-13-S18-A10
