Identification of coherent patterns in gene expression data using an efficient biclustering algorithm and parallel coordinate visualization

Background The DNA microarray technology allows the measurement of expression levels of thousands of genes under tens/hundreds of different conditions. In microarray data, genes with similar functions usually co-express under certain conditions only [1]. Thus, biclustering which clusters genes and conditions simultaneously is preferred over the traditional clustering technique in discovering these coherent genes. Various biclustering algorithms have been developed using different bicluster formulations. Unfortunately, many useful formulations result in NP-complete problems. In this article, we investigate an efficient method for identifying a popular type of biclusters called additive model. Furthermore, parallel coordinate (PC) plots are used for bicluster visualization and analysis. Results We develop a novel and efficient biclustering algorithm which can be regarded as a greedy version of an existing algorithm known as pCluster algorithm. By relaxing the constraint in homogeneity, the proposed algorithm has polynomial-time complexity in the worst case instead of exponential-time complexity as in the pCluster algorithm. Experiments on artificial datasets verify that our algorithm can identify both additive-related and multiplicative-related biclusters in the presence of overlap and noise. Biologically significant biclusters have been validated on the yeast cell-cycle expression dataset using Gene Ontology annotations. Comparative study shows that the proposed approach outperforms several existing biclustering algorithms. We also provide an interactive exploratory tool based on PC plot visualization for determining the parameters of our biclustering algorithm. Conclusion We have proposed a novel biclustering algorithm which works with PC plots for an interactive exploratory analysis of gene expression data. Experiments show that the biclustering algorithm is efficient and is capable of detecting co-regulated genes. The interactive analysis enables an optimum parameter determination in the biclustering algorithm so as to achieve the best result. In future, we will modify the proposed algorithm for other bicluster models such as the coherent evolution model.


Gene expression matrix
Data from microarray experiments [2,3] is frequently given as a large matrix showing expression levels of genes (rows) under different experimental conditions (columns). The so-called gene expression data can thus be written as a matrix of size m × n where m is the number of genes and n is the number of experimental conditions. Typically m is much greater than n. For example, (m, n) is (6220, 15) and (4026, 47) respectively for the time series yeast samples [4] and lymphoma specimens [5]. One of the challenges in microarray data analysis is to identify groupings of genes with similar behaviours/functions. Several clustering algorithms have been applied to DNA gene expression data to identify biologically relevant groupings based on similarity in expression profiles [6][7][8][9][10]. However, traditional clustering techniques are global in nature in which the expression patterns are grouped either along the entire row or along the entire column [1,11]. This implies that one would find the grouping of genes that would express similarly for all conditions, or the groupings of conditions in which all genes exhibit similar behaviour. However, in practice only a subset of genes is highly correlated under a subset of conditions. This requires simultaneous clustering along both the row and column directions, and is often called biclustering [11][12][13][14][15][16]. A bicluster often exhibit certain kinds of homogeneity, for example constant level of expression throughout the whole bicluster (constant bicluster), constant level of expression along either rows or columns (constant rows and constant columns), and rows/columns that are related by additions or multiplications [15], as shown in Figure 1. We have recently shown that the different bicluster patterns have a simple geometric interpretation as linear objects in a high dimensional feature space [14,15]. A comprehensive survey on different biclustering algorithms was given in references [11,13,16].

Parallel coordinate plots
The parallel coordinate (PC) technique is a powerful method for visualizing and analyzing high-dimensional data under a two-dimensional setting [17,18]. In this technique, each dimension is represented as a vertical axis, and then the N-dimensional axis is arranged in parallel to each other. By giving up the orthogonal representation, the number of dimensions that can be visualized is not restricted to only two [19][20][21]. Studies have found that geometric structure can still be preserved by the PC plot despite that the orthogonal property is destroyed [17][18][19][20][21]. In gene expression matrix, each gene is represented by a vector of conditions (i.e., row) and each condition is considered as a vector of genes (i.e., column). Since gene expression data always involves a large number of genes as well as a certain number of experimental conditions, the PC technique is well suited to their analysis. Moreover, visualization of gene expression data is an important problem for biological knowledge discovery [22]. Thus, the PC plots have been studied for gene expression data visualization [23,24]. Further details about visualization of biclusters using PC plots are provided in Additional file 1. In section "Method", a new greedy algorithm for bicluster identification is presented. Meanwhile, an interactive approach of parameter determination for the proposed biclustering algorithm based on PC visualization is discussed.

Identification of biclusters from difference matrix
The biclusters given in Figure 1(A)-(D) can be described by an additive model in which each pair of rows has the same difference in all the related columns or each pair of columns has the same difference in all the related rows. Thus, a difference matrix, each column of which represents the column differences between a pair of columns in a data matrix, provides useful information for identification of additive-related biclusters. Consider the data in Figure 2, there are two biclusters: the first one (shown in blue color) is a constant bicluster while the second one (shown in yellow color) is an additive-related bicluster. As the rows in a bicluster is supposed to correlate in a subset of columns, the column difference between every two columns is computed so as to identify this column subset. There are altogether 6(6-1)/2 = 15 permutations as shown in the difference matrix in Figure 3. In the difference matrix, we can find special features that are related to the biclusters. For example, consider column "C5-C3". There are only three distinct difference values: 0 (5 counts), 1 (1 count), 2 (5 counts). This suggests the existence of three biclusters formed between "C5" and "C3": • the first bicluster is for rows R1, R3, R5, R9 and R11 in which the difference between "C5" and "C3" is zero, i.e., a constant bicluster;

Examples of different biclusters
• the second bicluster is for rows R2, R4, R6, R8 and R10 in which the difference between "C5" and "C3" is two, i.e., an additive bicluster; and • the third bicluster involves row R7 only, thus it is not considered to be a valid bicluster.
Analyzing the distribution along the column direction in the difference matrix thus helps to identify possible biclusters. In the above example, we have two valid biclusters. Thus, C3 and C5 are merged to form two groups as shown in Figure 4. The analysis can be repeated for each of these two groups to find out whether any other columns can be merged to {C3, C5}, i.e., using either C3 or C5 as a reference, we check whether C1, C2, C4 and C6 can be merged with {C3, C5}. In particular, if C3 is used as a reference, two difference matrices as shown in Figure  5 can be obtained. Note that their difference values can be read directly from the original difference matrix of Figure  3. By examining the first difference matrix in Figure 5, we see that two paired columns, "C1-C3" and "C2-C3", show a single bicluster with a difference value equals to zero. This suggests that columns C1 and C2 can be merged to {C3, C5} for rows R1, R3, R5, R9 and R11. The second difference matrix also has a single cluster with a difference value equal to 1 at paired column "C6-C3". Therefore, C6 can be merged to {C3, C5} for rows R2, R4, R6, R8 and R10. Thus by this repeated bicluster growing processexpanding the column set and refining the row set, we can identify possible biclusters embedded in the dataset. Also, note that the difference matrix needs to be calculated only once. This greatly reduces the computational complexity of our algorithm.

Proposed algorithm for additive models
Additive-related biclusters can be found by progressively merging columns through studying the data distribution along each column in the difference matrix. If there is just one bicluster between two columns in the gene expression matrix, the distribution will have a single peak in one of the columns of the difference matrix. Related rows for this bicluster can then be identified. If there are multiple biclusters formed between two columns in the gene expression matrix, we can separate the rows into different groups by examining the distribution in the corresponding columns of the difference matrix. Therefore, by analyzing the distributions of difference values along columns of the difference matrix, peaks that correspond to different biclusters can be identified.
An example of gene expression matrix with two embedded biclusters  C1 C2 C3 C4 C5 C6  R1  1  1  1  5  1  0  R2  1  3  2  2  4  3  R3  1  1  1  2  1  2  R4  3  1  3  6  5  4  R5  1  1  1  0  1  3  R6  2  3  3  1  5  4  R7  0  3  6  7  7  1  R8  4  5  2  1  4  3  R9  1  1  1  3  1  3  R10  6  0  1  6  3  2  R11  1  1  1  2  1  4 The difference matrix for the dataset shown in Figure 2  The difference matrix for the dataset shown in Figure 2. 0  0  2  0  2  0  2  0  2  2  0  2  -2  0  2  R10  -6  -5  0  -3  -4  1  6  3  2  5  2  1  -3  -4  -1  R11  0  0  1  0  3  0  1  0  3  1  0  3  -1  2  3 An overview of the procedure of our proposed biclustering algorithm is shown in the flow chart provided in Figure 6 while the details are described in the pseudo-code in Figure 7. There are four parameters in our algorithm: noise threshold ε, minimum number of rows N r , minimum number of columns N c and maximum bicluster overlap in percentage P o . The parameter ε specifies the noise tolerance as well as the homogeneity in the identified biclusters. On the other hand, N r and N c set the lower bounds of the number of rows and columns of the identified biclusters respectively. P o determines the maximum degree of overlap between identified biclusters. More specifically, no overlap exceeding P o percentage in both the row and column dimensions simultaneously is allowed. In this paper, a bicluster with a subset of rows R and a subset of columns C is denoted by (R, C). At the beginning, the first-level difference matrix D 1 is calculated for the input expression matrix E as described in line 4 in Figure 7. Supposed that E has size m rows by n columns. There is altogether n(n -1)/2 different number of permutations so the size of D 1 is m × n(n -1)/2. In order to derive possible biclusters, a simple clustering algorithm can be applied to identify clusters for each column (lines [6][7][8][9][10][11][12]. Let X = {x 1 , x 2 ,...,x N } be a set of N expression values. By comparing x i with all values in X, a set of values S i similar to x i can be found as follows, (1) where i = 1, 2, ..., N. Also, the set of indices Q i associated with the values in S i can be obtained. Q i can be expressed by As an example, given that X = {1, 2, 9, 3} and ε = 2. S 2 = {1, 2, 3} and Q 2 = {1, 2, 4}. A clustering algorithm based on equation (1) would generate N clusters but these clusters may be very close to each other and have large overlap. In order to reduce unnecessary clusters, we adopt a two-step clustering approach presented in lines 56-84. In addition to the definitions in (1) and (2), let us denote the current collections of clusters and corresponding sets of indices by S and Q, which are both set to be empty initially. In the first step (lines 58-75), for i = 1, 2, ..., N, x i and its associated cluster S i are tested for the following three conditions with each S j ∈ S: (1) |x i -| ≥ ε where denotes the average operation of a set.

S j •
The two different groups formed by merging columns "C5" and "C3" Figure 4 The two different groups formed by merging columns "C5" and "C3".
The flow chart of the proposed biclustering algorithm Figure 6 The flow chart of the proposed biclustering algorithm.
The pseudo-code of the proposed biclustering algorithm Figure 7 The pseudo-code of the proposed biclustering algorithm. (3) .
If the above three conditions are satisfied for all S j ∈ S, S i and Q i are added to S and Q respectively. In the second step (lines 76-82), the clusters are refined. Denote the sets of output clusters and the corresponding indices by S' and Q' respectively. First, S' and Q' are set to be empty. For each S j ∈ S, a new cluster S' j is derived as The corresponding set of indices Q' j is given by S' j and Q' j are added to S' and Q' respectively if |S j '| ≥ N r . For the first-level difference matrix D 1 , each Q' j contains the row indices of the cluster S' j . Each column of D 1 consists of difference values between column i and j of the original expression matrix. Define the collection of row indices sets of the clusters to be U ij . After finding all U ij for all distinct column pairs (i, j), the row indices set of the clusters and their associated column pairs are collected to form a list of possible biclusters L 1 which can be expressed by .., n -1 and j = i As one always tries to find the biggest bicluster, a sorting is performed for the possible biclusters in L 1 based on the number of rows in line 13 so that a bicluster with the largest number of rows can be processed first.
Starting from the biggest bicluster l 1 in the sorted list of possible biclusters L 2 , the second-level difference matrix D 2 is formed as in line 24 in which one of the bicluster columns (column c k1 or c k2 ) is compared with all the remaining columns on those chosen rows (e.g. difference matrices illustrated in Figure 3). Note that the secondlevel difference matrix D 2 can be obtained directly from the first-level difference matrix D 1 . Before D 2 calculation, early termination can be introduced as presented in lines 17-23 as an optional step. In the early termination, the biclusters in L 2 which significantly overlap with the identified biclusters are skipped as they are unlikely to derive a well-distinguishable bicluster according to the given parameter P o . Similar to the clustering done for D 1 , clustering and sorting are performed for D 2 as described in lines 26-33. As a result, a list of possible column segments H 2 for growing the current bicluster is obtained. In lines 34-41, a possible bicluster (R, C) is constructed based on the row intersection with each column segment in H 2 . Initially, (R, C) is set to be the current bicluster l k in L 2 . If the size of the row set R does not fall below the user-defined threshold N r after the row intersects with a column segment, the column is included in C and R is updated. Otherwise, the process is moved to the next column segments until the last one is examined. Finally, the bicluster is validated with respect to the given requirements in bicluster size and degree of overlap as depicted in lines 42-50. Only a valid bicluster is output (lines 51-53).

Relation to existing δ-pCluster approaches
The proposed algorithm identifies biclusters which are homogeneous in each column pair. In this section, we show that the biclusters can be expressed as δ-pClusters [25]. Hence, any sub-matrix in an identified bicluster has similar homogeneity to that bicluster and the problem of outliers as in Cheng and Church algorithm [12] can be avoided. Denote a bicluster with a subset of rows U and a subset of columns V by B = (U, V). The bicluster B is a δ-pCluster if for each 2 × 2 sub-matrix M, the following condition holds where , a ij denotes a value of the expression matrix at position (i, j), i, m ∈ U and j, n ∈ V. In our algorithm, the clustering (the second step) performed in the second-level difference matrix ensures that there exists a column k ∈ V such that |a ij -a ik -l jk | <ε for ∀j ∈ V and some constant I jk (7) where ε is the noise threshold parameter of the proposed algorithm. Hence, for any i, m ∈ U, we have where the last inequality follows from inequality (7). For a column n ∈ V with n ≠ k, using inequality (8), it is shown that |a ij -a in -(a mj -a mn )| = |a ij -a ik -(a mj -a mk ) + a ik -a in -(a mk -a mn )| ≤ |a ij -a ik -(a mj -a mk )| + |(a ik -a in ) -(a mk -a mn )| < 4ε (9) This means that the bicluster B is a δ-pCluster with δ = 4ε. Although the biclusters identified by our algorithm are δ-pClusters, it should be emphasized that our algorithm is not designed specially for detecting δ-pClusters but rather is based on the clustering results in the difference matrix. Hence, there are some differences between our bicluster- ing strategy and the other δ-pClusters algorithms like pCluster algorithm [25] and S. Yoon et al. approach [13]. Specifically, our algorithm takes into account the cluster density in which cluster centroids are considered. In contrast, the other two δ-pCluster based algorithms rely only on the inter-distances between elements in the difference matrix as defined by the inequality (6). This results in an exponential-time complexity in the worst case. Our proposed algorithm can be regarded as a greedy version of the other two algorithms. In particular, for each column-pair bicluster, our proposed algorithm derives a possible bicluster by greedily finding a larger column set through sequential intersection with other column-pair biclusters. The large column-pair biclusters usually contain the whole or a large part of the true gene set. On the other hand, these simplifications significantly reduce the complexity from exponential-time to polynomial-time.

Complexity estimation
In general, a biclustering problem is NP-complete [11]. However, we have adopted a simple clustering algorithm and bicluster growing strategy to reduce the complexity. Given a matrix of size m × n, the complexity of obtaining the difference matrix is O(mn 2 ). The simple clustering algorithm applied on each column requires operations on the order of O(m 2 ) because it involves comparing the value of each element with the others and the centroids of the found clusters. In addition, the total number of clusters found would not exceed m. Therefore, the complexity in obtaining clusters in the difference matrix is O(m 2 n 2 ) and the number of clusters is at most mn(n-1)/2. The sorting of the clusters requires a complexity of O(mn 2 log mn 2 ). After that, each identified cluster is used as a seed to construct a bicluster. In the biclusters growing process, a seed is first checked if it has significant overlap with other identified biclusters for early termination. The overlapping in rows can be checked by sorting followed by element-wise comparison. The complexity is thus O(mlogm).
For columns, as a seed has only two columns, the complexity is O(n). Note that the number of identified biclusters is bounded by the number of seeds. Thus, the complexity for checking overlaps in all identified biclusters is O(mn 2 (n + m log m)). If the seed is valid, a submatrix of the difference matrix is extracted as the secondlevel difference matrix. This step requires no arithmetic operations due to data reuse. Clustering and sorting procedure are then performed on this second-level difference matrix. As the matrix has n-1 columns only, the clustering and the sorting processes need operations on the order of O(m 2 n) and O(mn log(mn)), respectively. Note that there are at most (n-1)m clusters detected in the second-level difference matrix. In the bicluster construction, row intersection is performed. In total, the complexity is O(m 2 n log m). Finally, the new identified bicluster is validated (i.e. filtered) with respect to the number of columns and degree of overlap with other biclusters. The validation requires an additional complexity of O(mn 2 (m log m + n log n)). Among the operations for obtaining each biclusters from the first-level difference matrix, the validation step dominates. So the entire processing for bicluster formation from seeds is O(m 2 n 4 (m log m + n log n)). Since this cost dominates all other costs in previous steps, our algorithm has a polynomial-time complexity of O(m 2 n 4 (m log m + n log n)). The above estimation shows the worst case complexity, in which the validation process dominates. In practice, the number of biclusters is far less than mn(n -1)/2. Moreover, some of the validation steps can be avoided through early termination of invalid biclusters. Elimination of invalid biclusters reduces the number of potential biclusters and this in turn reduces the complexity inside the validation step.

Modification for multiplicative models
As seen in Figure 1(E), a multiplicative-related bicluster is a bicluster in which any two rows are related by the same ratio in all the related columns or any two columns are related by the same ratio in all the related rows. In order to modify the proposed framework for multiplicative models, the difference matrix is replaced by a ratio matrix which is in the form of c i /c j or c j /c i for all the n(n -1)/2 distinct combinations between columns i and j where c k represents the values in the k-th column. In practice, we select the column which has the largest average magnitude as the denominator because quotient is sensitive to noise when the divisor is small. Thus, the major change for detecting multiplicative-related biclusters is to replace the difference matrix by a ratio matrix. Note that the complexity for multiplicative models is essentially the same as that for additive models.

Interactive adjustment of noise threshold using PC plots
The setting of the noise threshold ε is important for the proposed algorithm as it balances the homogeneity requirement and the noise tolerance in the identified biclusters. The noise threshold is determined through visual inspection of the homogeneity of the detected biclusters in the PC plots [26]. The PC visualization for a data matrix embedded with biclusters can be found in Additional file 1. Consider a noisy 100 × 10 dataset which contains uniformly distributed values between -5 and 5 embedded with a 30 × 4 additive-related bicluster shown in Figure 8. Furthermore, an additive Gaussian noise with variance of 0.2, which was chosen empirically for clear demonstration, was introduced. In this example, we var- We can see that these three rows are missed because the noise threshold is not large enough. In practice, since we do not know the bicluster in advance, we should adopt an exploratory approach for setting the parameter ε. Start with the current value of ε, we gradually increase ε while visualizing the bicluster using the PC plot. Initially, we would see more and more related rows being included into the bicluster. Then, at some point, unrelated rows start to creep into the bicluster. When this is observed in the PC plot, we stop increasing the noise threshold. Using this procedure, we found that when ε is set to 1.5, all the rows are correctly detected. This example shows that the PC plot can be a powerful visualization and interactive tool that allows us to examine the biclusters found.

Evaluation methods
We analyze the performance of our algorithm on both artificial datasets and a real dataset. For artificial datasets, biclusters information is known in advance. So accuracy in bicluster discovery can be measured using the overall match score [16]. The overall match score of a set of biclusters M 1 with respect to another set of biclusters M 2 is defined as, where and are gene and condition match scores respectively. is calculated as, The PC plots of the bicluster identified using noise threshold of 1. , the red color indicates rows of the true bicluster that are found by our algorithm while the blue color represents the three missed rows of the true bicluster.
The PC plot of the true additive-related bicluster Figure 8 The PC plot of the true additive-related bicluster.
where a bicluster with a subset of genes U i and a subset of To unify the two measures into a single quantity for evaluation, their average is computed as the biclustering accuracy.
The performance of the proposed algorithm for artificial datasets has been compared with two existing algorithms with the additive model assumption, namely the Cheng and Church (C&C) algorithm [12] and the pCluster algorithm [25]. We considered the biclustering accuracy together with other measures such as number of biclusters, bicluster size and processing time. The programs for both algorithms are publicly available [27,28]. The proposed algorithm was implemented in a C MEX-file and ran in Matlab 6.5. All the experiments were conducted on the Window XP platform in a computer with 2.4 GHz Intel Pentium 4 CPU and 512 MB RAM. In identification of multiplicative-related biclusters, since C&C algorithm and the pCluster algorithm are designed for additive models, logarithm operation was applied to the expression data so that the multiplicative models become additive models. For comparison, we also applied the proposed algorithm for additive models to the logarithm values. Henceforth, the proposed algorithm for additive models and multiplicative models will be referred to as PA and PM respectively while the proposed algorithm for additive models with the logarithm operation as pre-processing will be referred to as PAL.
The evaluation on real datasets was performed on three aspects: biological, homogeneity and statistical assessment. In the biological assessment, we used the Gene Ontology (GO) annotations [29] to determine the functional enrichment of biclusters. The measure was the percentage of overrepresented biclusters in one or more GO annotation. A bicluster is said to be overrepresented in a functional category if it gives a small p-value. Given that a bicluster B with k genes is identified in a gene expression matrix with a gene set S of size N. For a functional category with C genes in S, the bicluster B possesses r genes. The p-value is defined as the probability of choosing k genes from S with r genes in that category [30], i.e., In other words, the p-value is the probability of including genes of a given category in a cluster by chance. Thus, the overrepresented bicluster is a cluster of genes which is very unlikely to be obtained randomly. The annotations consist of three ontologies, namely biological process, cellular component and molecular function.
For the homogeneity aspect, mean squared residue score (MSRS) [12] and average correlation value (ACV) [31] were computed. For an m × n bicluster, the MSRS is defined as where a ij is the value of the bicluster at position (i, j), is the average of the i-th row, is the average of the j-th column and is the overall average. ACV is defined by where c_row ij is the correlation coefficient between rows i and j and c_col pq is the correlation coefficient between columns p and q. ACV is applicable to additive models as well as multiplicative models but the MSRS is valid only for additive models. In order to measure homogeneity of multiplicative-related biclusters, logarithm was applied onto the expression values before calculating MSRS values so that a multiplicative-related bicluster can be formulated using an additive model. In order to avoid confusion, the MSRS for the logarithm of expression values is denoted by MSRS l . A bicluster with high homogeneity in expression levels should have a low MSRS/MSRS l value but a high ACV value. The minimum value of MSRS/ MSRS l is zero while ACV has a maximum value of one.
The statistical properties of the biclustering results refer to quantities including the number of discovered biclusters and the bicluster size. Comparative studies were performed in the three aspects with several existing biclustering algorithms such as C&C, iterative signature algorithm (ISA) [32,33], order-preserving submatrix (OPSM) approach [1] and xMotifs [34], which are available in [27]. In addition, the computational complexity of the proposed algorithm and other approaches is estimated using processing time as done for the artificial datasets. Despite the dependence of factors such as programming language and parameter settings, a rough comparison in complexity can still be achieved.

Datasets
Two types of artificial datasets were considered, one for the additive models and the other for the multiplicative models. The first type of dataset TD1 had a size of 200 rows by 40 columns. Uniformly distributed random values were first generated. Then four biclusters were embedded. Their details are as follows: • bicluster A is a constant row bicluster of size 40 × 7; • bicluster B is a constant row bicluster of size 25 × 10; • bicluster C is a constant column bicluster of size 35 × 8; and • bicluster D has coherent values related by additions of size 40 × 8.
Biclusters A and B have two columns in common but in different rows; bicluster B overlaps with bicluster C in five rows and three columns; biclusters C and D have one column in common but in different rows. Finally, Gaussian noise with different standard deviation (s.d.) was added to the dataset. At each non-zero noise level, five expression matrices were generated. Figure 10 shows the dataset TD1 with 4 embedded biclusters before noise was added.
The second type of dataset TD2 consists of 60 × 15 positive values embedded with two 25 × 7 multiplicativerelated biclusters. The two biclusters overlap in two columns. A positive-biased Gaussian noise was added to the dataset so that all the values in the resultant datasets remained positive. The positive-valued dataset was essential for Cheng and Church algorithm, the pCluster algorithm and our proposed algorithm for the additive models PAL due to the use of the logarithm operation. It should be noted that the proposed algorithm for multiplicative-related biclusters PM can be applied on datasets with negative values because no logarithm operation is needed. Figure 11 shows the dataset TD2 with two embedded multiplicative biclusters before noise was added. These two artificial datasets allowed us to test the performance of our algorithm in realistic situations as real expression data often involves various types of biclusters with overlaps (i.e. regulatory complexity) and noise.
The real dataset used was the yeast Saccharomyces cerevisiae cell cycle dataset as used in [12], which contains 2884 genes and 17 conditions. The non-missing values were all non-negative. As multiplicative models were also investigated, those zero non-missing values were set to some small positive values. The missing values were filled with positive uniformly distributed random values to minimize the influence to our analysis.

Performance on artificial datasets
For the artificial datasets with additive-related biclusters, biclusters with rows and columns more than or equal to 21 and 5 respectively were identified. It was further required that any detected bicluster cannot have more than 50% overlap with another bicluster simultaneously in the row and column dimensions. Since the Cheng and Church (C&C) algorithm and the pCluster algorithm cannot be directly configured to discover biclusters with all the given requirements, a post-filtering procedure was adopted to eliminate those invalid biclusters. The post-filtering parameters are provided in Table 1 together with the parameters of the biclustering algorithms. Note that parameters for noise tolerance (ε/δ) were determined for optimal performance under different noise levels. The biclustering accuracies are plotted against various noise levels in Figure 12. As can be seen, the proposed algorithm always has higher biclustering accuracy than C&C and the pCluster algorithm. For the expression dataset with noise of standard deviation at or below 0.1, we detect the four embedded biclusters perfectly. The pCluster algorithm did not attain perfect discovery even in the noise-free case because more than one maximal δ-pCluster (defined by equation (6)) exists for one or more column pair due to column overlap between some biclusters in the datasets [35]. In more noisy case such as when the noise s.d. is 0.5, the biclustering accuracy of our algorithm still has a high value of 0.89. In contrast, the accuracies of C&C and the pCluster algorithm are 0.70 and 0.26 respectively.
Statistical properties of the biclustering results before filtering are given in Table 2. Unlike the pCluster algorithm, the number of biclusters identified by the proposed algorithm is insensitive to noise level. On average, there were 6.6 biclusters identified at the highest noise level which was close to the true number 4. For the pCluster algorithm, a large number of biclusters with high overlap were detected under noisy situation. The post-filtering procedure was therefore necessary for the pCluster algorithm to extract the significant biclusters. The number of biclusters identified by C&C was 40 which is the same as that specified in its parameter setting. In fact, this parameter setting was necessary to acquire high biclustering accuracy. With respect to the biclusters size, the proposed algorithm shows the closest agreement to those embedded in the datasets. The average numbers of rows and columns in the biclustering results are always around 34 and 7.8 respectively while the actual average numbers of rows and col- The first type of dataset with four additive-related biclusters before noise is added Figure 10 The first type of dataset with four additive-related biclusters before noise is added.
umns are 35 and 8.25 respectively. The pCluster algorithm also produced good results. C&C gave the worst performance as it does not allow any constraints to be imposed on the biclusters dimensions. Therefore, the post-filtering procedure is essential for C&C to find the embedded biclusters.
For the datasets with two multiplicative-related biclusters, a bicluster was considered to be valid if its size is no smaller than 18 and 4 in row and column dimensions respectively and the overlap with other valid biclusters is less than or equal to 25%. The settings for the biclustering algorithms and the post-filtering procedure are also included in Table 1. The biclustering accuracies of the proposed algorithms PAL and PM, together with C&C and the pCluster algorithm (applied on log values) at various noise levels is shown in Figure 13. At all the noise levels, our two proposed algorithms outperform C&C and the pCluster algorithm. Both PAL and PM can exactly detect the true biclusters in the noise-free case while the other two algorithms fail to do so. In particular, the failure of perfect discovery in the pCluster algorithm can be attributed to the column overlap in the datasets. The performance of PM is slightly better than that of PAL in general. The biclustering accuracy decreases when the noise level increases except in the case of C&C when noise level changes from 0.4 to 0.5. It was probably because outlier is less likely to be included in biclusters at high noise levels.
In terms of the statistical properties given in Table 3, the two proposed algorithms exhibit closest match to the true embedded biclusters, with PM performs slightly better than PAL. Similar to the case of the additive models, the proposed algorithms can return more reasonable number of biclusters with similar dimensions to those embedded than the other two algorithms without any post-filtering procedure.
In order to justify the efficiency of our proposed algorithms, processing time for the artificial datasets with The second type of dataset with two multiplicative-related biclusters before noise is added Figure 11 The second type of dataset with two multiplicativerelated biclusters before noise is added.   Table 4. The proposed algorithm PA required an average of 3.8 sec for the artificial datasets with additive-related biclusters. This showed substantial improvement over the pCluster algorithm which needed 2716 sec to finish. The reduction in computational complexity is achieved by the bicluster growing strategy in which similar patterns in column-pair are combined to form biclusters through row intersection. The proposed algorithm is also more efficient than C&C by a factor of 4.5. For datasets embedded with multiplicative-related biclusters, the matrices sizes are smaller than those used for additive-model experiments so less processing time was obtained in all the algorithms. However, it can be seen that the proposed approach PM has the lowest computational complexity. The average processing time was 0.0232, 1 and 105.2 sec for PM, C&C and the pCluster algorithm respectively. In conclusion, the results on artificial datasets demonstrate that our proposed algorithms have high accuracy in detecting additive-related and multiplicative-related biclusters, even in the presence of overlap and noise contamination. The computational complexity of the proposed algorithms is lower than several biclustering algorithms with similar model assumption.

Performance on a real dataset
Experiments have been conducted on the yeast cell cycle dataset using the proposed algorithms and Cheng and Church (C&C) algorithm [12], iterative signature algorithm (ISA) [32,33], order-preserving submatrix (OPSM) approach [1] and xMotifs [34]. Post-filtering was applied to the biclustering results in order to eliminate insignificant biclusters as well as impose common constraints for comparison. The parameter settings of various algorithms and post-filtering are provided in Table 5. These values were selected based on the guideline in [16] and our experimental work. The functional enrichment was studied over a number of upper bounds on p-value, p 0 and illustrated in Figure 14. Compared with C&C which possesses the same model assumption as the proposed algorithm for additive model (PA), higher percentage of functionally-enriched biclusters were identified by the proposed algorithm at p 0 ≥ 5 × 10 -4 , 1 × 10 -2 and 5 × 10 -3 in the biological process, cellular component and molecular function ontologies respectively. In particular, at p 0 = 1 × 10 -2 , the percentage of functionally-enriched biclusters found by PA is 96.0%, 88.0% and 80.0% which correspond to an improvement of 18.6%, 13.8% and 9.0% to C&C in the biological process, cellular component and molecular function ontologies, respectively. At the lowest value of p 0 = 1 × 10 -5 , our proposed algorithm PA outperforms C&C in the cellular component ontology but not in Biclustering accuracy against noise level for additive models Figure 12 Biclustering accuracy against noise level for additive models. The biclustering accuracies of the proposed algorithm, Cheng and Church algorithm and pCluster algorithm are represented by the curves 'Proposed', 'C&C' and 'pCluster' respectively.  the other two ontologies. However, the reduction in the percentage of functionally-enriched biclusters is less than 7.5% in the biological process ontology and 2.5% in the molecular function ontology, which is relatively small compared with the improvement at the large values of p 0 . The homogeneous analysis provided in Table 6 shows that the biclusters identified by PA are more homogeneous than C&C with the average MSRS lower by 142.4 and the average ACV higher by 0.0199. From the statistical results in Table 7, it can be found that PA can also avoid identification of very large bicluster as is in the case of C&C. The largest bicluster size found using PA is 597 × 17 while that found using C&C is 1391 × 17.
When multiplicative model is concerned, i.e. the proposed algorithm for multiplicative model (PM) and C&C applied on log value (C&C (log)), the functional enrichment drops in general. At first glance, PM gives poorer performance in term of functional enrichment. Nonetheless, if the number of identified biclusters is also considered, PM actually outperforms C&C (log) by identifying more significant biclusters. The total number of biclusters identified by PM was 59 but CC only found 5 biclusters. In addition, the biclusters identified by PM exhibit higher homogeneity. The average values of MSRSl and ACV are 9.573 × 10-3 and 0.9219 for PM respectively. In comparison, the average values of MSRSl and ACV are 6.262 × 10-2 and 0.5740 for the C&C (log) respectively.
In addition to C&C based algorithms, Figure 14 shows  son is that xMotifs is designed to find biclusters with coherent state in each gene, which is only a subclass of additive models. The homogeneity analysis suggests that the difference in biological relevance of identified biclusters between various algorithms such as the proposed algorithm PA and ISA is not merely due to implementation architecture but also due to the model assumption.
In addition to the identification of biologically-significant biclusters, the efficiency of the proposed algorithm is justified by the processing time provided in Table 8. PA and PM require 0.72 and 1.35 sec respectively to finish. The results are the best and show improvement by a factor of at least 23.7 compared with the others. This implies that our algorithms have low computational complexity.
Details of annotation results of the proposed algorithms PA and PM are shown in Tables 9, 10, 11 and Tables 12,  13, 14 at p-value < 0.001 respectively. In these tables, Bonferroni correction of p-value which adjusts the probability of random annotation for multiple tests [30] is provided. Consideration of the corrected p-value is important when multiple terms are tested for annotation in a single bicluster. The 4-th additive-related bicluster identified by PA has the lowest p-value in all the three ontologies. For the biological process ontology, 69 out of 201 genes are assigned to category "translation" at p-value of 6.92 × 10 -44 . The annotation is also significant after multiple test correction as it has a low corrected The experiments on the real dataset show that our proposed algorithms PA and PM can identify biclusters with high biological relevance efficiently. Furthermore, PA can always give a reasonable number of biclusters, and with a good degree of homogeneity. Although GO annotation only provides descriptions currently known in the biolog-   ical community, the results still give a reasonable indication of performance. Furthermore, the biclusters which have no GO terms assigned should be investigated for any new biological discoveries.

Determination of biclusters homogeneity
In previous experiments, the homogeneity parameter, i.e.
noise threshold ε of our algorithms is determined empirically. In fact, the aforementioned exploratory approach based on the PC plots can be employed to determine this parameter in an interactive manner for a given dataset. This exploratory approach uses an assumption that the homogeneity decreases monotonically with ε while the biclustering accuracy is a concave function of ε. To see this, we apply the proposed algorithm for additive models to artificial datasets with noise s.d. of 0.3 over a wide range of ε. Figure 15 shows the graphs of biclustering accuracy and ACV against ε. The biclustering accuracy first rises rapidly to its maximum value when ε changes from 0.5 to 1. The biclustering accuracy then decreases slightly until ε becomes 1.75. A steeper drop is found when ε is larger than 1.75. In other words, the biclustering accuracy is approximately concave with respect to ε. On the other hand, when ε increases, the average ACV of detected biclusters decreases as expected. From the graph, it can be observed that the ACV decreases faster when ε exceeds 1.25. Meanwhile, the biclustering accuracy remains high for ε between 1 and 1.25. These observations support the use of the interactive approach for parameter determination.

Conclusion
In this paper, a novel biclustering algorithm for additive models is proposed. First, we performed analysis on the difference matrix computed from a gene expression matrix. It was shown that the column-wise differences of an additive-related bicluster appear as clusters in each corresponding column in the difference matrix. Similarly, clusters can be found from the column-wise ratios calculated from multiplicative-related biclusters. These observations were then explored to construct biclusters greedily from the clustering results in column-wise differences or ratios in the proposed algorithms.
The proposed algorithms have been analyzed by comparing with pCluster algorithm. The results suggest that the proposed algorithms can be regarded as a greedy version of the pCluster algorithm. The biclusters found by the proposed algorithms can be expressed as δ-pClusters but clustering density is utilized in pattern discovery. Although the identified δ-pClusters is not guaranteed to be maximal, the proposed algorithm is much more efficient.  Experiments showed that the computational time of the proposed algorithms is lower than that of the pCluster algorithm by a factor of hundreds or more. Moreover, we have verified that the worst case complexity of the proposed algorithms is polynomial-time instead of exponential-time as in the case of the pCluster algorithm or other δ-pCluster based approaches.
The robustness of our algorithms to noise and regulatory complexity has been verified empirically using artificial datasets. It was found that our algorithm is capable of discovering overlapping biclusters under noisy condition. Biological significance of biclustering results has been verified on the yeast cell-cycle dataset using Gene Ontology annotations. Comparative study shows that the proposed algorithm is the best or close to be the best one among several existing algorithms in terms of the percentage and the number of functionally-enriched biclusters for p-values below a range of value from 5 × 10 -3 to 5 × 10 -2 . In particular, there are 96.0%, 88.0% and 80.0% of the biclusters annotated with p-value below 0.01. The proposed algorithm can identify biclusters with less deviation from the additive models. The identified biclusters also have reasonable size ranged from 10 to 597 genes and 5 to 17 conditions. Comparison in processing time suggests that the proposed algorithm has the highest efficiency.
In the proposed algorithm, the noise threshold is a crucial parameter as it balances the homogeneity requirement and the noise tolerance in the identified biclusters. In order to determine an appropriate value for the noise threshold, an exploratory approach based on the PC plots is adopted. We believe that the proposed biclustering algorithm and the interactive PC plots offer an effective data analysis tool for gene expression data. In future, our research will be focused on detecting bicluster types other than additive or multiplicative models, e.g. biclusters of coherent evolution.