Discovering biclusters in gene expression data based on high-dimensional linear geometries
 Xiangchao Gan^{1},
 Alan Wee-Chung Liew^{2} and
 Hong Yan^{3, 4}
DOI: 10.1186/1471-2105-9-209
© Gan et al; licensee BioMed Central Ltd. 2008
Received: 27 July 2007
Accepted: 23 April 2008
Published: 23 April 2008
Abstract
Background
In DNA microarray experiments, discovering groups of genes that share similar transcriptional characteristics is instrumental in functional annotation, tissue classification and motif identification. However, in many situations a subset of genes only exhibits a consistent pattern over a subset of conditions. Conventional clustering algorithms that deal with the entire row or column in an expression matrix would therefore fail to detect these useful patterns in the data. Recently, biclustering has been proposed to detect a subset of genes exhibiting a consistent pattern over a subset of conditions. However, most existing biclustering algorithms are based on searching for submatrices within a data matrix by optimizing certain heuristically defined merit functions. Moreover, most of these algorithms can only detect a restricted set of bicluster patterns.
Results
In this paper, we present a novel geometric perspective for the biclustering problem. The biclustering process is interpreted as the detection of linear geometries in a high dimensional data space. Such a new perspective views biclusters with different patterns as hyperplanes in a high dimensional space, and allows us to handle different types of linear patterns simultaneously by matching a specific set of linear geometries. This geometric viewpoint also inspires us to propose a generic bicluster pattern, i.e., the linear coherent model, which unifies the seemingly incompatible additive and multiplicative bicluster models. As a particular realization of our framework, we have implemented a Hough transform-based hyperplane detection algorithm. The experimental results on the human lymphoma gene expression dataset show that our algorithm can find biologically significant subsets of genes.
Conclusion
We have proposed a novel geometric interpretation of the biclustering problem. We have shown that many common types of bicluster are just different spatial arrangements of hyperplanes in a high dimensional data space. An implementation of the geometric framework using the Fast Hough transform for hyperplane detection can be used to discover biologically significant subsets of genes under subsets of conditions for microarray data analysis.
Background
In DNA microarray experiments, discovering groups of genes that share similar transcriptional characteristics is instrumental in functional annotation, tissue classification and motif identification [1, 2]. In many situations, an interesting cellular process is active only under a subset of conditions, or a single gene may participate in multiple pathways that may or may not be coactive under all conditions [3, 4]. In addition, the data to be analyzed often include many heterogeneous conditions from many experiments. In these instances, it is often unrealistic to require that related genes behave similarly across all measured conditions, and conventional clustering algorithms, such as the k-means and hierarchical clustering algorithms [5, 6] and the self-organizing map [7], often cannot produce a satisfactory solution.
In this work, we deal with numerical biclusters only. There are also works [9, 10] that focus on biclusters containing symbolic data or the so-called coherent evolution biclusters, where the evolution (i.e., up, down, or no change) of the elements in a numerical data matrix is considered instead of the numerical values themselves. We choose to focus on the numerical data based on the following considerations. First, a numerical biclustering algorithm can be used to analyze symbolic data by assigning appropriate numerical values to the symbols. Second, many gene expression data analysis tasks, such as gene regulation network analysis, require numerical biclustering results.
Previous work on biclustering
Throughout the paper, we use F ∈ ℜ^{N × M} to denote a gene expression data matrix with N genes and M arrays or experimental conditions. In the matrix F, a row F_{ i } ∈ ℜ^{1 × M} represents the expression of gene i in M arrays. For simplicity, we only describe biclustering algorithms for constant/coherent rows below; the corresponding algorithms for constant/coherent columns are similar and can be easily deduced.
A bicluster of constant values is the simplest type. It can be modeled as
F(i, j) = u_{ IJ } + ε(i, j),
where u_{ IJ } is the typical value of the bicluster and ε(i, j) is a small perturbation. Hartigan [11] split the original matrix into a predetermined set of submatrices and used the variance $VAR(I,J)={\sum _{i\in I,j\in J}{(F(i,j)-{u}_{IJ})}^{2}}$ to evaluate each submatrix and determine whether it should be accepted as a bicluster.
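The variance criterion above can be sketched as follows (a minimal illustration; the function name and the test matrix are ours, not from [11]):

```python
import numpy as np

def submatrix_variance(F, rows, cols):
    """Hartigan-style score: sum of squared deviations of a submatrix
    from its mean. A low score suggests a constant-value bicluster."""
    sub = F[np.ix_(rows, cols)]
    u = sub.mean()                       # estimate of u_IJ
    return float(((sub - u) ** 2).sum())

# A constant submatrix scores exactly zero.
F = np.arange(16, dtype=float).reshape(4, 4)
F[:2, :2] = 3.0                          # embed a constant 2x2 bicluster
print(submatrix_variance(F, [0, 1], [0, 1]))  # -> 0.0
```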
If the noise is additive, a bicluster of constant rows can be modeled as
F(i, j) = u_{ IJ } + f_{ i } + ε(i, j),
where f_{ i } is the i-th row offset. The straightforward method to detect a bicluster of constant rows is to normalize the rows of the bicluster using the row means. By doing so, a bicluster of constant rows can be transformed into a bicluster of constant values, and hence becomes detectable using algorithms for biclusters of constant values. Getz et al. [12] have developed a method based on this idea and even extended it to detect biclusters of coherent values. However, methods based on data normalization face a dilemma: for a good normalization, we need to estimate the parameter f_{ i } for each row of a bicluster, but for an accurate estimate of f_{ i }, we need to know the location of the bicluster, which is exactly the problem we need to solve. The noise ε(i, j) in the data further complicates the estimation of f_{ i }. Instead of relying on data normalization, Califano et al. [13] have developed a method that first finds small biclusters with each row satisfying
max(F(i, j)) − min(F(i, j)) < δ, ∀ j ∈ J,
and then add additional rows or columns into it to produce a bicluster that is as large as possible. Sheng et al. [14] have assumed that the multinomial distributions for different columns in a bicluster are mutually independent and used the Gibbs sampling for parameter estimation.
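To illustrate the row-normalization idea discussed above, the following sketch assumes the bicluster's rows and columns are already known (the very information the dilemma concerns) and shows that subtracting each row's mean reduces a constant-row bicluster to a constant-value one:

```python
import numpy as np

# A noise-free constant-row bicluster: each row i equals u + f_i.
u = 2.0
f = np.array([0.5, -1.0, 3.0])
B = u + f[:, None] + np.zeros((3, 4))

# Subtracting each row's mean (an estimate of u + f_i) leaves a
# constant-value bicluster, detectable by the simpler methods above.
B_norm = B - B.mean(axis=1, keepdims=True)
print(np.allclose(B_norm, 0.0))  # -> True
```

With noise present, the row means only approximate u + f_{ i }, which is exactly why the estimation problem described above is hard.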
A bicluster of additive coherent values with additive noise can be modeled as
F(i, j) = u_{ IJ }+ f_{ i }+ g_{ j }+ ε(i, j).
Cheng and Church [15] were the first to apply biclustering to microarray data analysis. In their method, the mean squared residue $\frac{1}{\left|I\right|\left|J\right|}{\sum _{i\in I,j\in J}\epsilon {(i,j)}^{2}}$ in (2) is minimized. Cho et al. [16] have improved this mean-squared-residue based method by using the variance as a second measure. Lazzeroni and Owen [17] have introduced a plaid model and proposed the general additive model to identify biclusters of constant rows, constant columns and additive coherent values. Prelic et al. [18] have compared many biclustering algorithms using the additive model.
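The mean squared residue can be computed for a candidate submatrix roughly as follows (a sketch using the standard estimates of u_{ IJ }, f_{ i } and g_{ j } as the overall, row and column means):

```python
import numpy as np

def mean_squared_residue(sub):
    """Cheng-Church style H(I,J): average squared residue left after
    removing row means, column means, and the overall mean."""
    row_mean = sub.mean(axis=1, keepdims=True)   # estimates u + f_i
    col_mean = sub.mean(axis=0, keepdims=True)   # estimates u + g_j
    residue = sub - row_mean - col_mean + sub.mean()
    return float((residue ** 2).mean())

# A perfect additive bicluster F(i,j) = u + f_i + g_j has ~zero MSR.
f = np.array([0.0, 1.0, -2.0])[:, None]
g = np.array([0.5, 2.0, -1.0, 0.0])[None, :]
print(mean_squared_residue(5.0 + f + g))  # ~0 for a perfect additive bicluster
```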
A bicluster of multiplicative coherent values with additive noise can be modeled as
F(i, j) = u_{ IJ }× f_{ i }× g_{ j }+ ε(i, j)
Kluger et al. [19] have studied the checkerboard structure of this type of bicluster using a normalization scheme based on the above equation. Tang et al. [20] have developed a method that computes the cosine of the angle between each normalized row vector and a predefined stable pattern to measure the similarity between two rows or two columns. Getz et al. [12] have introduced Coupled Two-Way Clustering, which repeatedly performs one-way clustering on the rows and columns of the data matrix.
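One way to see the link between the multiplicative and additive models is a log transform (our illustration, assuming strictly positive entries; this is not the normalization scheme of [19]): taking logarithms turns F(i, j) = u_{ IJ } × f_{ i } × g_{ j } into an additive bicluster.

```python
import numpy as np

# A multiplicative bicluster F(i,j) = u * f_i * g_j with positive entries.
u = 2.0
f = np.array([1.0, 3.0, 0.5])[:, None]
g = np.array([2.0, 1.0, 4.0])[None, :]
B = u * f * g

# log F = log u + log f_i + log g_j, i.e., an additive bicluster:
# the additive residue (as in the mean-squared-residue score) vanishes.
L = np.log(B)
residue = (L - L.mean(axis=1, keepdims=True)
             - L.mean(axis=0, keepdims=True) + L.mean())
print(np.allclose(residue, 0.0))  # -> True
```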
Madeira and Oliveira [8] were the first to systematically classify existing numerical biclustering algorithms based on the additive and multiplicative bicluster models. It should be pointed out that some symbolic, coherent evolution or numerical biclusters, such as those produced by cMonkey [9], SAMBA [10] and some statistical criteria, cannot be classified as additive or multiplicative patterns directly. For example, in cMonkey, additional information besides the usual gene expression values, such as motif co-occurrence and association network relationships, is taken into account. Moreover, cMonkey attempts to ensure that a greater percentage of the genes observed in the data set are included in at least one cluster, while reducing redundancy between overlapping biclusters and maximizing the number of conditions included in each bicluster. These features cannot be modeled directly using the additive and multiplicative coherent patterns.
Although the classification into additive or multiplicative patterns is not perfect, it is nevertheless applicable to many existing biclustering algorithms, which can all be formulated using the general linear model proposed in this paper. In fact, in most biclustering algorithms that deal with expression values only, the underlying theme is the coherency of expression values within the biclusters. Our general linear model of Fig. 2(f) therefore conveniently captures the zero- and first-order coherent relationships within a bicluster.
A highdimensional geometric method for biclustering
As pointed out in [8], existing approaches are often based on searching for submatrices within a data matrix by optimizing certain heuristically defined merit functions. Obviously, the form of the merit function depends greatly on the bicluster pattern to be uncovered. In these methods, when the data contain different types of biclusters, multiple merit functions or different data normalizations or transformations are needed. This often results in high computational complexity; the optimization procedure is NP-hard in general and can easily be trapped at a local optimum.
In this paper, we extend our previous work [21] and present a novel perspective for the biclustering problem through a geometric interpretation. Such a new perspective allows us to regard biclusters with different coherent patterns as hyperplanes in a high dimensional space, and facilitates the use of any generic plane finding algorithm for detecting them. The geometric viewpoint of our approach provides a unified framework to handle different types of linear patterns simultaneously by matching a specific set of linear geometries. It also reveals the existence of the general linear model, which can unify the additive and the multiplicative models. As a particular realization of our framework, we implemented a Hough transform-based hyperplane detection algorithm. The experimental results on the human lymphoma gene expression dataset show that our algorithm is highly effective for gene expression data biclustering.
Results
We tested our algorithm using a synthetic dataset and the human lymphoma dataset. For the synthetic dataset, we use a test model proposed in [14], but deal with both additive and multiplicative biclusters; in the Gibbs sampling method [14], only additive biclusters are used. For the human lymphoma dataset, we detect biclusters based on the additive, multiplicative and general linear models, and investigate whether the detected biclusters are biologically meaningful. Our experiments show that the proposed linear coherent model can produce biologically significant groups enriched by the genes in biclusters.
Synthetic dataset
In this experiment, the three biclusters contain additive coherent values, and both the Gibbs sampling method [14] and our algorithm can identify all of them, but with different accuracies. The Gibbs sampling method misses 2 genes in bicluster 2 and 4 genes in bicluster 3, whereas our algorithm detects all genes perfectly (Fig. 3). Interestingly, a new bicluster with 3 conditions and 60 rows was also reported by our method (Fig. 3f). This bicluster is located in the overlap region of biclusters 2 and 3 and comprises the last three columns of bicluster 2, the first three columns of bicluster 3, and all rows of the two biclusters. Although unexpected, this is a reasonable result since the extra bicluster detected is a valid bicluster by itself. In contrast, the Gibbs sampling method fails to detect this extra, but valid, bicluster. The detection of this new bicluster further shows the efficacy of our algorithm in handling overlapping biclusters.
Biological Data: Human Lymphoma Dataset
We apply our algorithm to the lymphoma dataset [22]. This dataset is characterized by well-defined expression patterns differentiating three types of lymphoma: diffuse large B-cell lymphoma, chronic lymphocytic leukaemia and follicular lymphoma. The dataset consists of expression data from 128 Lymphochip microarrays for 4026 genes in 96 normal and malignant lymphocyte samples. Missing values in the dataset are imputed using POCSimpute [23].
We compare our algorithm with six existing algorithms, i.e., OPSM [24], Bimax [18], the Iterative Signature Algorithm (ISA) [25], SAMBA [10], Cheng and Church's algorithm (CC) [15] and xMotif [26], using the procedure proposed by Prelic et al. [18]. Since most existing numerical biclustering algorithms do not detect biclusters with general linear coherent values, we only compare the performance for the additive model. Similar to the validation method proposed by Tanay et al. [10], we investigate whether the gene groups produced by different algorithms show significant enrichment with respect to a specific Gene Ontology (GO) annotation. Since biclustering algorithms aim to group together the genes involved in the same Molecular Function or Biological Process, a better biclustering algorithm should find more or larger groups that show significant enrichment. Specifically, in our experiment, biclusters are evaluated by computing the hypergeometric functional enrichment score [27] based on the GO Biological Process annotations, and the resulting scores are adjusted for multiple testing using the Westfall and Young procedure [27, 28].
In Additional File 1, we provide the algorithmic parameters used in the experiment for the lymphoma dataset. In our experiment, more than 600 biclusters are detected. In Additional File 2, we provide a list of all biclusters, where 1 indicates that the corresponding gene/array is covered by the bicluster and 0 indicates that it is not. In Additional File 3, we select 6 biclusters as examples for GO annotation. All the biclusters with full data are given in Additional File 4. The full-sized image of Fig. 5a is shown in Additional File 5.
Conclusion
We analyzed the different types of numerical biclusters and proposed a general linear coherent bicluster model that effectively captures the zero- and first-order coherent relationships within a bicluster. Then, we presented a novel interpretation of the biclustering problem in terms of the geometric distributions of data points in a high dimensional data space. In this perspective, the biclustering problem becomes that of detecting structures of known linear geometries, i.e., hyperplanes, in the high dimensional data space. We have shown that many common types of bicluster are just different spatial arrangements of the hyperplanes in the high dimensional data space. This novel perspective allows us to perform biclustering geometrically using a hyperplane detection algorithm. The experimental results on both synthetic and real gene expression datasets have demonstrated that our algorithm is very effective.
Method
Although the six patterns in Fig. 2 appear to be substantially different from each other, if we treat each measurement (column) as a variable in the 4D space [x, y, z, w] and each object (row) as a point in the 4D space, the six patterns in Figs. 2(a) to 2(f) would correspond to the following six geometric structures, respectively: (a) a cluster at a single point with coordinates [x, y, z, w] = [1.2, 1.2, 1.2, 1.2], (b) a cluster on the line x = y = z = w, (c) a cluster at a single point with coordinates [x, y, z, w] = [1.2, 2.0, 1.5, 3.0], (d) a cluster on the line x = y − 1 = z + 1 = w − 2, (e) a cluster on the line x = 0.5y = 2z = 2w/3, and (f) a cluster on the line x = 0.5(y − 0.1) = 2(z − 0.1) = 2(w − 0.2)/3. Each object (row) in a cluster is a point lying at one of these points or on one of these lines.
When a pattern is embedded in a larger data matrix with extra measurements, i.e., a bicluster that covers only part of the measurements in the data, the points or lines defined by the bicluster sweep out a hyperplane in the high dimensional data space. Assume that we have a three-measurement experiment with the measurements denoted by x, y, and z. If a bicluster covers measurements x and z, then there exists a plane on which all data points in the bicluster lie. The plane is defined by:
β_{0} + β_{1}x + β_{3}z = 0,
where the coefficient β_{2} of the uncovered measurement y is zero.
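The following sketch illustrates this geometry for the three-measurement example above (the specific offset and coefficients are illustrative): rows satisfying z = x + 2, with arbitrary values in the uncovered measurement y, all lie on the plane 2 + x − z = 0.

```python
import numpy as np

# Rows of an additive bicluster over measurements x and z: z = x + 2.
# y is not covered, so its values are arbitrary; the points sweep out
# the plane beta0 + beta1*x + beta3*z = 2 + x - z = 0 in (x, y, z)-space.
rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=10)
y = rng.uniform(-5, 5, size=10)          # uncovered measurement
z = x + 2.0
points = np.column_stack([x, y, z])

beta = np.array([2.0, 1.0, 0.0, -1.0])   # (beta0, beta1, beta2, beta3)
vals = beta[0] + points @ beta[1:]       # plane equation at each point
print(np.allclose(vals, 0.0))  # -> True
```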
Based on the geometric perspective discussed above, we propose a geometric gene expression biclustering framework that involves the following two steps. First, we detect the hyperplanes that exist in the gene expression data. Then we analyze whether a required pattern exists for the genes that lie in these hyperplanes.
A powerful technique for line detection in noisy 2D images and for plane detection in noisy 3D data, the Hough transform (HT) [30], is widely used in pattern recognition. The HT has been extensively studied in image processing and is well known to be robust against noise for line detection in poor quality images. This robustness is especially useful in microarray data analysis since the data are often heavily corrupted by noise. The method has recently been applied successfully to two- and three-color microarray data analysis [31, 32]. Interested readers are referred to the survey paper [33] on the properties and general applications of the HT.
However, it may be difficult to use the standard HT for more than 3 dimensions because of the large computational complexity and storage requirement. In this work, we use the Fast Hough transform (FHT) [34] as our plane detection algorithm since it gives considerable speedup and requires less storage than the conventional HT. The FHT has a very simple and efficient high dimensional extension. Furthermore, the FHT uses a coarse-to-fine mechanism and provides good robustness to noise. In the following, we briefly discuss the basic principles of the FHT.
Plane detection using the fast Hough transform
We use {F_{0}, F_{1}, ..., F_{M-1}} to denote the coordinates of the M arrays. For each gene j (j = 1, 2, ..., N), the expression vector is given as [F_{0}(j), F_{1}(j), ..., F_{M-1}(j)].
In a 2D space, a line can be described by
y = mx + c,
where (m, c) are two parameters: the slope and the intercept of the line with the y axis. However, a problem with the (m, c) parameterization of lines is its inability to describe vertical lines, i.e., m → ∞. Therefore, Eq. (5) is only used for lines with |m| ≤ 1. The second equation, which swaps the roles of the x and y axes,
x = m'y + c'
is used for lines with |m| > 1. With |m| ≤ 1 and |m'| < 1, these two equations describe all lines in a 2D space without overlap. A similar method can be used to describe hyperplanes in a high dimensional space. In this paper, this parameterization method is used for our hyperplane detection algorithm.
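A sketch of this dual parameterization (the helper function is ours): given two points, choose y = mx + c when the slope is shallow and x = m'y + c' otherwise, so vertical lines pose no problem.

```python
def line_params(p1, p2):
    """Parameterize the line through p1 and p2 with bounded slope:
    ('y', m, c) means y = m*x + c with |m| <= 1;
    ('x', m, c) means x = m*y + c with |m| < 1 (covers steep lines)."""
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x2 - x1, y2 - y1
    if abs(dy) <= abs(dx):               # shallow line: use y = m*x + c
        m = dy / dx
        return ('y', m, y1 - m * x1)
    m = dx / dy                          # steep (incl. vertical): x = m*y + c
    return ('x', m, x1 - m * y1)

print(line_params((0, 0), (4, 2)))   # -> ('y', 0.5, 0.0)
print(line_params((3, 0), (3, 5)))   # -> ('x', 0.0, 3.0)  (a vertical line)
```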
We find that the parameters {β_{1}, β_{2}, ..., β_{ M }} are determined by the intersection of the many hyperplanes defined by Eq. (8).
Suppose that we know the initial ranges of the values {β_{1}, β_{2}, ..., β_{ M }}, centered at {P_{1}, P_{2}, ..., P_{ M }} with half-lengths {L_{1}, L_{2}, ..., L_{ M }}. We can divide these ranges into very small "array accumulators" so that each array accumulator determines a unique array of values {β_{1}, β_{2}, ..., β_{ M }} within the acceptable tolerance. According to Eq. (8), one feature point in the observed signal space is mapped into many points (i.e., a hyperplane) in the parameter space. An accumulator in the parameter space containing many mapped points (i.e., the intersection of many hyperplanes) reveals a potential feature of interest.
According to the above analysis, the FHT-based plane detection method consists of three parts. First, we need a hyperplane formulation as in Eq. (8). Second, we divide the parameter space into accumulators small enough to satisfy the desired resolution. Third, we let every point in the observed data vote for the accumulators. If the number of votes an accumulator receives exceeds a selected threshold, we detect a hyperplane in the observed data space as given by Eq. (7), where the values of {β_{1}, β_{2}, ..., β_{ M }} are given by the accumulator. We now introduce each part of the algorithm in detail.
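The vote-counting idea can be sketched in the 2D special case of lines y = b1·x + b0 (grid sizes, tolerance and data are illustrative choices of ours, not the FHT itself):

```python
import numpy as np

# Three of the four points lie exactly on y = x + 1; the fourth is noise.
points = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (0.5, -4.0)]
b1_bins = np.linspace(-1, 1, 21)        # slope accumulator centers
b0_bins = np.linspace(-5, 5, 41)        # intercept accumulator centers
votes = np.zeros((len(b1_bins), len(b0_bins)), dtype=int)

tol = 0.05                               # accumulator half-width
for x, y in points:
    for i, b1 in enumerate(b1_bins):
        for j, b0 in enumerate(b0_bins):
            if abs(y - (b1 * x + b0)) < tol:   # point supports this cell
                votes[i, j] += 1

i, j = np.unravel_index(votes.argmax(), votes.shape)
print(b1_bins[i], b0_bins[j], votes[i, j])   # -> 1.0 1.0 3
```

The cell with the most votes recovers the line's parameters; an exhaustive grid like this is exactly what the FHT's coarse-to-fine mechanism avoids in high dimensions.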
Hyperplane formulation
The hyperplanes in the parameter space take the form $\sum _{i=1}^{k}{a}_{i}(j){X}_{i}+{a}_{0}(j)=0$, where X_{ i } is the i-th dimension of the parameter space. Each a_{ i }(j) is a function of the observed feature points and is normalized such that ${\sum}_{i=1}^{k}{a}_{i}^{2}(j)=1$. The initial range for each X_{ i } is an interval of length 2 centered at P_{ i }/L_{ i }. All these ranges comprise a hypercube in the parameter space (X_{1}, ..., X_{ k }).
Vote counting scheme
As mentioned before, every point in the observed data votes for supporting accumulators. Each accumulator corresponds to a group of range values of (X_{1}, X_{2}, ..., X_{ M }). For each point j in the observed data, if $\sum _{i=1}^{k}{a}_{i}(j){X}_{i}+{a}_{0}(j)=0$ can be satisfied for values of (X_{1}, X_{2}, ..., X_{ M }) lying in an accumulator, the point gives a vote to that accumulator. An accumulator receiving more votes than a threshold reveals a corresponding hyperplane in the observed data space.
If Eq. (12) is satisfied, gene j will give a vote to the corresponding accumulator.
Ktree representation
For simplicity, we have assumed above that the parameter space is directly divided into very small accumulators. Actually, this is not necessary. The FHT algorithm recursively divides the parameter space into hypercubes from low to high resolutions, performing the subdivision and the subsequent "vote counting" only in hypercubes with votes exceeding a selected threshold. This hierarchical approach leads to a significant reduction in both computational time and storage space compared to the conventional HT.
where S_{l+1} is the side length of a child node at level l + 1 and S_{l+1} = S_{ l }/2.
If this condition is satisfied, gene j will generate a vote for this child node.
According to the above analysis, the FHT is a mapping from an observed data space into a parameter space. Each feature point in the data space generates "votes" for a set of subareas (hypercubes) in the parameter space. A subarea in the parameter space that receives many votes reveals the feature of interest. The FHT algorithm recursively divides the parameter space into hypercubes from low to high resolutions, performing the subdivision and the subsequent "vote counting" only in hypercubes with the number of votes exceeding a selected threshold. A hypercube at an acceptable resolution with votes exceeding the threshold indicates a detected hyperplane in the observed data.
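The coarse-to-fine pruning can be sketched with a one-parameter toy version of the k-tree recursion (a hypothetical simplification; the threshold T, resolution q and data are illustrative):

```python
# Recursively halve an interval of parameter values, keeping only
# halves whose vote count exceeds the threshold T, down to level q.
def refine(interval, data, T, q, level=0):
    lo, hi = interval
    votes = [x for x in data if lo <= x < hi]
    if len(votes) < T:
        return []                        # prune: too few votes
    if level == q:
        return [(lo, hi, len(votes))]    # finest resolution reached
    mid = (lo + hi) / 2
    return (refine((lo, mid), votes, T, q, level + 1)
            + refine((mid, hi), votes, T, q, level + 1))

# Ten parameter estimates clustered near 0.3, plus two outliers.
data = [0.29, 0.30, 0.31, 0.30, 0.295, 0.305,
        0.3, 0.31, 0.29, 0.30, 0.9, -0.7]
print(refine((-1.0, 1.0), data, T=5, q=4))  # -> [(0.25, 0.375, 10)]
```

Only one branch of the tree survives to the finest level, which is the "thin tree" behavior the complexity analysis below relies on.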
The proposed geometric biclustering algorithm and parameter selection
To summarize, given gene expression data [F_{0}(j), F_{1}(j), ..., F_{M-1}(j)], j = 1, 2, ..., N, measured under diverse experimental conditions, our geometric biclustering algorithm proceeds as follows:
 Input:
 (1)
The minimum vote count "T" used as the threshold, and the desired finest resolution "q".
 (2)
A transformation that maps the gene expression data [F_{0}(j), F_{1}(j), ..., F_{M-1}(j)] into a hyperplane in the parameter space represented by $\sum _{i=1}^{k}{a}_{i}(j){X}_{i}+{a}_{0}(j)=0$ for j = 1, 2, ⋯, N. Based on the transformation, determine the initial bound of each X_{ i } and the root hypercube.
 Steps:
 (1)
Map gene expression data onto the parameter space.
 (2)
Compute the initial normalized distance from each hyperplane to the root node and perform the voting procedure for the root node. For each gene, if Eq. (16) is satisfied, add one to the vote count of the root node. If the vote count for the root node is larger than the threshold T and the resolution is coarser than q, subdivide the root node into its K-tree child nodes.
 (3)
Vote for each child node and subdivide it if needed. A similar vote-and-subdivide mechanism is performed for each new node until no new node appears.
 (4)
When there is no node with resolution equal to q and vote count larger than T, record the node with the finest resolution; this is the most probable solution. When there are several nodes with resolution equal to q and vote counts larger than T, collect the planes associated with those nodes that share the same genes into a bundle.
 (5)
For each bundle of hyperplanes, check the common conditions (variables) and compare the hyperplanes with the models corresponding to the different types of biclusters. A bundle of hyperplanes that is not consistent with any pattern in Fig. 2, or whose corresponding bicluster covers too few samples, is discarded. If the bundle survives this process, it is output as a bicluster. Repeat this step until all bundles are processed.
In the procedure above, there are two parameters: the minimum vote count "T" and the desired finest resolution "q". The minimum vote count "T" denotes the minimum number of genes in a bicluster; it depends on the experimental objective and may be selected by the user. For example, the minimum may be 4, that is, a bicluster must contain at least 4 genes. The desired finest resolution "q" depends on the variance of the noise in the data. For a perfect bicluster (for example, a perfect constant bicluster where all values are equal), "q" can be arbitrarily large, that is, one can use an arbitrarily fine resolution. However, in practical applications, perfect biclusters are rarely found, and "q" reflects how much noise (or inconsistency) is permitted in the detected biclusters. If we wish to detect strongly coherent biclusters (i.e., near-perfect biclusters with very little noise), q should be set to a large value. A smaller q can be used to detect biclusters that exhibit more inconsistency due to noise. In general, a larger q results in biclusters of smaller size.
In many situations, one has no knowledge about the noise in the data. An appropriate range of q can then be determined experimentally so that meaningful biclusters are returned. Recall that the FHT uses a coarse-to-fine mechanism. At coarse resolutions, there are fewer accumulator cells and the number of hyperplanes detected is small. At finer resolutions, there are more accumulator cells, but the cells are also smaller and it is more difficult for a feature point to generate a hit; many accumulators therefore cannot gain enough votes (exceeding the threshold) to confirm the existence of the corresponding hyperplane. So, if q is set too large or too small, fewer hyperplanes will be detected. Hence, the range of q can be chosen as one that returns a reasonably large number of hyperplanes.
Computational complexity
For the FHT, the following theorem from [34] bounds the computational complexity. The "thin tree" property resulting from the theorem guarantees that the complexity of the FHT does not exceed the bound implied by the chosen q.
Theorem [34]:
The FHT algorithm is highly parallel. As shown, the processing of the hypercubes or accumulators is independent of each other. Furthermore, the intersection test for a hyperplane does not depend on that of other hyperplanes. In our implementation, simple multiprocessing optimizations, such as using the OpenMP or Open MPI libraries, can achieve a high level of speedup.
In the above discussion, we assume that all possible linear hyperplanes are to be detected using the FHT. In practice, detecting a small portion of the hyperplanes is already enough for our biclustering algorithm. For example, in a dataset [F_{0}(j), F_{1}(j), ..., F_{M-1}(j)], j = 1, 2, ..., N, using the equation ${F}_{0}={\sum _{i=1}^{M-1}{\beta}_{i}{F}_{i}+{\beta}_{M}}$ we can find all the biclusters covering F_{0}. However, using the same equation with β_{ i } = 0 or t, where t is a scalar, i.e., a constraint that requires all the nonzero gradients to be equal, we can also find all the biclusters covering F_{0}. The second scheme can significantly lower the computational burden^{1}. Another optimization direction is to take advantage of the properties of gene expression data. Since gene expression data values are distributed in the range [−5, 5], hyperplanes such as $\sum _{i=1}^{M}0.3{X}_{i}+20=0$ or $60{X}_{1}+{\sum _{i=2}^{M}0.3{X}_{i}+0.2=0}$ do not have any practical significance and can be disregarded. So scanning the dataset to determine the range of hyperplane parameters before biclustering can significantly lower the computational burden.
In certain special cases, we can simplify the problem according to the bicluster model. For example, if we extract biclusters of constant rows, we only need to detect the hyperplanes with $\sum _{i=1}^{M}{a}_{i}{X}_{i}=0$, a_{ i } = 0, 1 or −1, and if we extract multiplicative biclusters, we only need to detect hyperplanes without an intercept.
In terms of CPU time, our algorithm is computationally intensive in its unoptimized general form. Based on the complexity of the FHT, the computational demands of the proposed biclustering algorithm depend on how many biclusters exist in the dataset. To give an indication of the computational cost, we run the unoptimized algorithm on a small test dataset on a personal computer (Linux OS with a 2.0 GHz Intel Core 2 Duo processor and 1 GB memory) and record the CPU time.
We randomly select 16 conditions in the Human Lymphoma Dataset to produce a 4026 × 16 matrix. The CPU time for detecting over 800 biclusters is 1953 seconds (32.55 minutes). We can adjust the parameters to exclude small and noisy biclusters and reduce the computing time. For example, the CPU time reduces to 397 seconds (6.62 minutes) if we discard biclusters with fewer than 8 conditions.
For larger datasets, we need to run our algorithm on a computer cluster. For the entire 4026 × 96 Human Lymphoma Dataset, we run our algorithm on a computer cluster of 8 nodes with 2 processors each, and it takes about 22 hours. Hence, the proposed algorithm is very time-consuming for large datasets if we search through the entire high dimensional Hough space to obtain the optimal solution and detect all possible additive and multiplicative coherent patterns in the data.
The computing time can be substantially reduced if we allow the solution to be suboptimal. For example, we can divide the 96 conditions into 6 sets with 16 conditions each. Then, only 39.7 (6 × 6.62) minutes are needed on the Linux computer described above for the biclustering process. The biclusters from the 6 sets can then be combined. Such a strategy has already been used in [18]. We can also consider two conditions at a time and then combine sub-biclusters gradually to form larger ones [32].
Note
^{1} This method is easy to implement by testing only the hyperplanes/accumulators with equal nonzero gradients. Assume there are t subranges for each β_{ i }. If we do not consider the coarse-to-fine optimization of the FHT, the first equation needs to process t^{ M } accumulators while the second needs only about t^{2}·2^{M−1}. In the case of t = 5 and M = 40, the computational burden of the first scheme is roughly 10^{15} times that of the second.
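The accumulator counts in this note can be checked directly (ignoring, as the note does, the FHT's coarse-to-fine pruning):

```python
# t subranges per parameter, M conditions.
t, M = 5, 40

# Unconstrained scheme: every beta_i free -> t^M accumulators.
unconstrained = t ** M

# Equal-gradient scheme: choose which of the M-1 gradients are nonzero
# (2^(M-1) subsets) times ~t^2 for the shared scalar and the intercept.
equal_gradients = t ** 2 * 2 ** (M - 1)

print(f"{unconstrained / equal_gradients:.1e}")  # ratio on the order of 1e15
```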
Abbreviations
GO: Gene Ontology; 2D: two dimensional; 3D: three dimensional; NP: non-deterministic polynomial time; HT: Hough transform.
Declarations
Acknowledgements
This work is supported by a grant from the Hong Kong Research Grant Council (project CityU122506). X. Gan is now supported by EPSRC grant EP/D062012/1.
References
 1. Rew DA: DNA microarray technology in cancer research. European Journal of Surgical Oncology. 2001, 27 (5): 504-508.
 2. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999, 286 (5439): 531-537.
 3. Laub MT, McAdams HH, Feldblyum T, Fraser CM, Shapiro L: Global analysis of the genetic network controlling a bacterial cell cycle. Science. 2000, 290 (5499): 2144-2148.
 4. DeRisi JL, Iyer VR, Brown PO: Exploring the metabolic and genetic control of gene expression on a genomic scale. Science. 1997, 278 (5338): 680-686.
 5. Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture. Nature Genetics. 1999, 22 (3): 281-285.
 6. Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America. 1998, 95 (25): 14863-14868.
 7. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR: Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proceedings of the National Academy of Sciences of the United States of America. 1999, 96 (6): 2907-2912.
 8. Madeira SC, Oliveira AL: Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans Comput Biol Bioinform. 2004, 1 (1): 24-45.
 9. Reiss DJ, Baliga NS, Bonneau R: Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks. BMC Bioinformatics. 2006, 7: 280.
 10. Tanay A, Sharan R, Shamir R: Discovering statistically significant biclusters in gene expression data. Bioinformatics. 2002, 18 (Suppl 1): S136-S144.
 11. Hartigan JA: Direct clustering of a data matrix. Journal of the American Statistical Association. 1972, 67 (337): 123-129.
 12. Getz G, Levine E, Domany E: Coupled two-way clustering analysis of gene microarray data. Proceedings of the National Academy of Sciences of the United States of America. 2000, 97 (22): 12079-12084.
 13. Califano A, Stolovitzky G, Tu Y: Analysis of gene expression microarrays for phenotype classification. Proceedings of the International Conference on Intelligent Systems for Molecular Biology. 2000, 75-85.
 14. Sheng Q, Moreau Y, De Moor B: Biclustering microarray data by Gibbs sampling. Bioinformatics. 2003, 19 (Suppl 2): ii196-ii205.
 15. Cheng Y, Church GM: Biclustering of expression data. Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology. 2000, AAAI Press.
 16. Cho H, Dhillon I, Guan Y, Sra S: Minimum sum-squared residue co-clustering of gene expression data. Proceedings of the Fourth SIAM International Conference on Data Mining. 2004, 114-125.
 17. Lazzeroni L, Owen AB: Plaid models for gene expression data. Statistica Sinica. 2002, 12 (1): 61-86.
 18. Prelic A, Bleuler S, Zimmermann P, Wille A, Buhlmann P, Gruissem W, Hennig L, Thiele L, Zitzler E: A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics. 2006, 22 (9): 1122-1129.
 19. Kluger Y, Basri R, Chang JT, Gerstein M: Spectral biclustering of microarray data: co-clustering genes and conditions. Genome Res. 2003, 13 (4): 703-716.
 20. Tang C, Zhang L, Zhang A, Ramanathan M: Interrelated two-way clustering: an unsupervised approach for gene expression data analysis. Proceedings of the IEEE 2nd International Symposium on Bioinformatics and Bioengineering. 2001, 41-48.
 21. Gan X, Liew AWC, Yan H: Biclustering gene expression data based on a high-dimensional geometric method. Proceedings of the International Conference on Machine Learning and Cybernetics. 2005, 3388-3393.
 22. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000, 403 (6769): 503-511.
 23. Gan X, Liew AWC, Yan H: Microarray missing data imputation based on a set theoretic framework and biological knowledge. Nucleic Acids Res. 2006, 34 (5): 1608-1619.
 24. Ben-Dor A, Chor B, Karp R, Yakhini Z: Discovering local structure in gene expression data: the order-preserving submatrix problem. Proceedings of the Sixth Annual International Conference on Computational Biology. 2002, 49-57.
 25. Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N: Revealing modular organization in the yeast transcriptional network. Nature Genetics. 2002, 31 (4): 370-377.
 26. Murali TM, Kasif S: Extracting conserved gene expression motifs from gene expression data. Proceedings of the Pacific Symposium on Biocomputing. 2003, 77-88.
 27. Berriz GF, King OD, Bryant B, Sander C, Roth FP: Characterizing gene sets with FuncAssociate. Bioinformatics. 2003, 19 (18): 2502-2504.
 28. Westfall PH, Young SS: Resampling-Based Multiple Testing: Examples and Methods for P-Value Adjustment. 1993, New York, Chichester, Wiley.
 29. Bondell HD, Reich BJ: Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR. Biometrics. 2008, 64 (1): 115-123.
 30. Ballard DH, Brown CM: Computer Vision. 1982, Englewood Cliffs, N.J., Prentice-Hall.
 31. Zhao H, Yan H: HoughFeature, a novel method for assessing drug effects in three-color cDNA microarray experiments. BMC Bioinformatics. 2007, 8: 256.
 32. Zhao H, Liew AW, Xie X, Yan H: A new geometric biclustering algorithm based on the Hough transform for analysis of large-scale microarray data. J Theor Biol. 2008, 251 (2): 264-274.
 33. Illingworth J, Kittler J: A survey of the Hough transform. Comput Vision Graph Image Process. 1988, 44 (1): 87-116.
 34. Li H, Lavin MA, Le Master RJ: Fast Hough transform: A hierarchical approach. Comput Vision Graph Image Process. 1986, 36 (2-3): 139-161.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.