Factor analysis (FA), like principal component analysis (PCA), describes a number of observed variables in terms of a smaller number of unobserved variables. Unlike PCA, however, FA also includes independent additive measurement errors on the observed variables. FA assumes that the observed variables become uncorrelated given a set of hidden variables called *factors*. It can also be seen as a clustering method: variables described by the same factors are highly correlated and thus belong to the same cluster, while variables depending on different factors are uncorrelated and are placed in different clusters.

FA has been successfully used in a number of areas such as computer vision, pattern recognition and economics, and more recently in bioinformatics [1–4]. The suitability of FA for gene expression analysis is also the motivation for this work. Genes are transcribed into mRNAs, which in turn are translated into proteins. Some of these proteins act as transcription factors (TFs), activating or inhibiting the transcription of a number of other genes and thereby creating a complex *gene regulatory network*. The number of transcription factors is much smaller than the number of transcribed genes, and most genes are regulated by only a small number of transcription factors. Hence, the matrix that describes the connections between the transcription factors and the regulated genes is sparse. Using microarrays, mRNA levels of thousands of genes can be measured simultaneously, but no direct information is obtained about TF activity. Our aim is two-fold: to identify the genes regulated by a common TF, that is, to reconstruct the connectivity structure and weights in a two-layer network, and to reconstruct the activity profile of each TF.

Liao et al. [5] have suggested the use of a network component analysis (NCA) algorithm for reconstructing the profiles of the TFs (see also [6] and [7]), while Boulesteix and Strimmer [8] have used an approach based on partial least squares regression. Both groups have shown that such methods can faithfully reconstruct the expression profiles of the TFs. However, both methods rely heavily on the availability of connectivity information. Nonzero positions in the *factor loadings matrix*, which describes the connections between the factors and the genes, need to be specified in advance. The algorithms then estimate the values at these positions (which might turn out to be zero). This is a strong limitation, since often only little information about the genes regulated by specific TFs is available. FA models face a much harder task, where both the structure of the factor loadings matrix and the activity profiles of the factors have to be reconstructed. Independent component analysis (ICA) has also been widely used in bioinformatics (see for example [9, 10] and [11]). This approach assumes that the transcription factors are statistically independent. A comparison of NCA and ICA can be found in Liao et al. [5], and thus ICA will not be considered further here. A further advantage of Bayesian FA models is that any information about the underlying structure can be easily incorporated through priors. Such information improves performance but, unlike in the case of NCA and its generalisations, is not required for the algorithms to be applicable in the first place.

Hinton et al. [12] first introduced an EM algorithm for factor analysis in order to model the manifolds of digitised images of handwritten digits. Later, Ghahramani and Hinton [13] presented an exact EM algorithm for both factor analyzers and mixtures of factor analyzers. More recently, Utsugi and Kumagai [14] used a Gibbs sampler instead of the EM algorithm suggested by Ghahramani and Hinton [13] for mixtures of factor analyzers. West [3] was the first to introduce Bayesian factor analysis in the bioinformatics field. To accommodate the required sparsity of the connections between the factors and the genes, he suggested the use of a mixture prior on the factor loadings matrix. As is shown in the results section, the predicted factor loadings matrix has the desired sparsity, at the expense of computing time that increases with the number of hidden variables. Recently, Sabatti and James [4] have used the framework of West [3] for the reconstruction of transcription factor profiles. In order to avoid the computational burden of estimating the factor loadings matrix at each step of the Gibbs sampler and to facilitate the reconstruction process, they set a large number of entries to zero based on information obtained from the Vocabulon algorithm [15]. This algorithm scans DNA sequences for multiple motifs and associates with each transcription factor a probability of binding to a specific site. This resembles the approach of Liao et al. [5] and Boulesteix and Strimmer [8], where the structure of the factor loadings matrix is given in advance.

Note that the algorithms of Ghahramani and Hinton [13], and Utsugi and Kumagai [14] have not previously been applied to biological data, and that the algorithm of Sabatti and James [4] is an adaptation of the algorithm of West [3] with the difference that an informative prior is used for the factor loadings matrix. Also, Sabatti and James [4] applied the FA model to yeast and *E. coli* data, while West [3] applied his algorithm to cancer data.

In this paper, we suggest the use of Fokoue's algorithm [16] as an alternative to West's algorithm [3]. This algorithm utilises a Gamma prior distribution on the variance of the factor loadings matrix that imposes the required sparsity but, at the same time, avoids the computational burden introduced by the use of a mixture prior [3]. Since this algorithm avoids the combinatorial problem of West's algorithm, prior knowledge of the underlying model is not required. At the same time, we give a thorough review of all FA algorithms mentioned above and examine their applicability to biological data. To the best of our knowledge, such a comparison of FA algorithms in the context of microarray data analysis has not been presented before. Moreover, we extend these algorithms by suggesting a further factor rotation analysis which produces additional sparsity in the factor loadings matrix. This additional sparsity not only facilitates the interpretation of the results, but is also useful in a biological context where a very sparse matrix is required. Finally, we show that merging the information provided by each algorithm into a combined result leads to better performance. The algorithms are compared based on their ability to reconstruct the underlying factor loadings matrix and the profiles of the transcription factors.

The comparison is carried out both on simulated data, where the true answer is known, and on experimental data. We evaluate the performance of the algorithms on the Hemoglobin data obtained by Liao et al. [5] and on the Escherichia coli (*E. coli*) data in Kao et al. [6]. Although time series data show correlations that are ignored in factor analysis, which assumes independence across data points, we used these data sets to compare our results with those in Liao et al. [5], Kao et al. [6], and Boulesteix and Strimmer [8].

### Factor analysis model

Let us assume that we have a random observed vector variable *x* of *P* dimensions, *x* = (*x*_{1},..., *x*_{*P*})'. We denote an instance of this vector with a superscript *n* and we assume that we have *N* such instances, *x*^{*n*} where *n* = 1,..., *N*. Similarly, *f* = (*f*_{1},..., *f*_{*K*})' is a vector of *K* hidden variables, known as *factors*. Note that the number *K* of factors is always smaller than or equal to the number *P* of observed variables. The factor analysis model states that the observed variables are a linear combination of the factors plus a mean and an error term. For case *n*

*x*^{*n*} = Λ*f*^{*n*} + *μ* + *ε*^{*n*}     (1)

where *μ* = (*μ*_{1},..., *μ*_{*P*})' and *ε*^{*n*} = (*ε*^{*n*}_{1},..., *ε*^{*n*}_{*P*})' are column vectors of dimension *P* with elements corresponding to the mean and the error of the *P* observed variables. The vector *μ* is the same for all cases. Λ is the unobserved *transition matrix*, also referred to as the *factor loadings matrix*. The factor loadings matrix has *P* × *K* dimensions. That is, each column corresponds to a factor and each row corresponds to an observed variable. The entries of the factor loadings matrix indicate the strength of the dependence of each observed variable on each factor. For example, if *λ*_{*pk*} is zero, then variable *x*_{*p*} is independent of factor *f*_{*k*}. In matrix form, equation 1 is

*X* = Λ*F* + *M* + *E*     (2)

where *X* = (*x*^{1},..., *x*^{*N*}), *F* = (*f*^{1},..., *f*^{*N*}), *E* = (*ε*^{1},..., *ε*^{*N*}) and *M* = *μe*_{*N*}, with *e*_{*N*} an *N*-dimensional row vector of ones. FA models assume that the error terms *ε*^{*n*} are independent and multivariate normally distributed with mean zero and covariance matrix Ψ, *ε*^{*n*} ~ N(0, Ψ), where Ψ = diag(*ψ*^{2}_{1},..., *ψ*^{2}_{*P*}). Thus the probability distribution of *x* for each observed case *n* has a multivariate normal density given by

*p*(*x*^{*n*} | *f*^{*n*}, *μ*, Λ, Ψ) = (2*π*)^{-*P*/2} |Ψ|^{-1/2} exp{-(1/2)(*x*^{*n*} - Λ*f*^{*n*} - *μ*)' Ψ^{-1} (*x*^{*n*} - Λ*f*^{*n*} - *μ*)}     (3)

or in matrix notation

*p*(*X* | *F*, *μ*, Λ, Ψ) = (2*π*)^{-*NP*/2} |Ψ|^{-*N*/2} exp{-(1/2) tr[Ψ^{-1}(*X* - Λ*F* - *M*)(*X* - Λ*F* - *M*)']}     (4)

where tr is the trace, the sum of the diagonal elements. In the methods section, we discuss in detail the prior and posterior probabilities of the parameters *F*, *μ*, Λ and Ψ, as well as algorithms for their estimation.
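As a concrete illustration, the generative model and its Gaussian likelihood can be sketched in a few lines of NumPy. The dimensions *P*, *K*, *N* and all parameter values below are arbitrary choices for illustration only, not taken from the data sets analysed in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
P, K, N = 6, 2, 100                    # observed variables, factors, cases

Lambda = rng.normal(size=(P, K))       # factor loadings matrix (P x K)
mu = rng.normal(size=P)                # mean vector
psi = rng.uniform(0.1, 0.5, size=P)    # diagonal of Psi (error variances)

F = rng.normal(size=(K, N))            # factors, Sigma_f = I_K
E = rng.normal(size=(P, N)) * np.sqrt(psi)[:, None]  # errors ~ N(0, Psi)
X = Lambda @ F + mu[:, None] + E       # X = Lambda F + M + E

def log_likelihood(X, F, Lambda, mu, psi):
    """Log of the matrix-form normal density with diagonal Psi."""
    P, N = X.shape
    R = X - Lambda @ F - mu[:, None]        # residuals X - Lambda F - M
    quad = np.sum(R**2 / psi[:, None])      # tr(Psi^{-1} R R')
    return -0.5 * (N * P * np.log(2 * np.pi)
                   + N * np.sum(np.log(psi)) + quad)

ll = log_likelihood(X, F, Lambda, mu, psi)
```

Evaluating the log-likelihood at the true parameters gives a higher value than at perturbed ones, which is the quantity the EM and Gibbs-sampling algorithms discussed later effectively work with.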

### Identifiability problems

As shown in equation 5 in the methods section, the complete density of the data, when the factors are integrated out, is a normal distribution with covariance matrix ΛΣ_{*f*}Λ' + Ψ. There is a scale identifiability problem associated with Λ and Σ_{*f*}: rescaling the columns of Λ can be exactly compensated by rescaling Σ_{*f*}, leaving the covariance unchanged. In order to avoid this problem, we could either restrict the columns of Λ to unit vectors or set Σ_{*f*} to the identity matrix. The second approach is often preferred in factor analysis.
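The scale indeterminacy is easy to verify numerically. The following sketch (arbitrary illustrative dimensions and values) rescales the loadings by a diagonal matrix *D* and compensates in the factor covariance, leaving the marginal covariance of the data unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)
P, K = 5, 2
Lambda = rng.normal(size=(P, K))                   # loadings
Sigma_f = np.diag(rng.uniform(0.5, 2.0, size=K))   # factor covariance
Psi = np.diag(rng.uniform(0.1, 0.3, size=P))       # error covariance

# Rescale the loadings and compensate in the factor covariance:
# Lambda* = Lambda D,  Sigma_f* = D^{-1} Sigma_f D^{-1}
D = np.diag([3.0, 0.5])
Lambda_s = Lambda @ D
Sigma_f_s = np.linalg.inv(D) @ Sigma_f @ np.linalg.inv(D)

# Both parameterisations give the same marginal covariance of x
C1 = Lambda @ Sigma_f @ Lambda.T + Psi
C2 = Lambda_s @ Sigma_f_s @ Lambda_s.T + Psi
```

Fixing Σ_{*f*} = *I* simply pins *D* to the identity, which removes this degree of freedom.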

There is also an identifiability problem associated with equation 2. Let us assume that we have an orthogonal matrix *Q* of dimensions *K* × *K* with *QQ*' = *Q*'*Q* = *I*_{*K*}. Then we can write

Λ*F* = Λ*QQ*'*F* = Λ\**F*\*

where Λ\* = Λ*Q* and *F*\* = *Q*'*F*, with cov(*F*\*) = cov(*F*). That is, it is not possible to distinguish between Λ and all its possible orthogonal transformations Λ\* based on knowledge of the product Λ*F* alone. However, as we show in the results section, if the loadings matrix underlying the data-generating process is sparse enough, it can often be reconstructed. This can be done either by using sparsity priors on the entries of the loadings matrix in a Bayesian setting or by orthogonal rotations enforcing sparsity (see methods section).
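The rotation indeterminacy can likewise be demonstrated directly. In this sketch (dimensions chosen arbitrarily), a random orthogonal *Q* obtained from a QR decomposition rotates the loadings and counter-rotates the factors without changing either the product Λ*F* or the factor covariance structure:

```python
import numpy as np

rng = np.random.default_rng(2)
P, K, N = 5, 3, 200
Lambda = rng.normal(size=(P, K))              # loadings
F = rng.normal(size=(K, N))                   # factors with cov I_K

Q, _ = np.linalg.qr(rng.normal(size=(K, K)))  # random orthogonal K x K
Lambda_star = Lambda @ Q                      # rotated loadings
F_star = Q.T @ F                              # counter-rotated factors
```

Since every orthogonal *Q* yields an equally likely solution, only additional structure, such as sparsity of Λ, can single out the one underlying the data.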

Note that orthogonal transformations also include permutations of the factors. Factors could be ordered by the amount of variance they explain or, as in the case of regulatory networks, known TFs have to be mapped to the inferred factors. In Sabatti and James [4], the factors are constrained by assigning a priori zero values to the factor loadings matrix. Here, we map the TFs to the inferred factors based on prior knowledge of their activity profiles, as for example reported in Kao et al. [6].
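One simple way to perform such a mapping, assuming reference activity profiles are available, is to pair each inferred factor with the reference TF whose profile it most strongly correlates with (in absolute value, since signs are also not identifiable). The helper below is a hypothetical greedy sketch, not the exact procedure used in the papers cited above:

```python
import numpy as np

def match_factors(F_inferred, F_reference):
    """Greedily map inferred factor profiles (rows of F_inferred) to
    reference TF profiles (rows of F_reference) by absolute correlation.
    Returns a list whose k-th entry is the reference index for factor k."""
    K = F_inferred.shape[0]
    # Cross-correlation block between inferred and reference profiles
    corr = np.corrcoef(F_inferred, F_reference)[:K, K:]
    mapping = [-1] * K
    available = set(range(K))
    # Assign the strongest correlations first
    for i, j in sorted(np.ndindex(K, K), key=lambda ij: -abs(corr[ij])):
        if mapping[i] == -1 and j in available:
            mapping[i] = j
            available.remove(j)
    return mapping
```

A greedy assignment suffices when the matched correlations clearly dominate the unmatched ones; an optimal assignment would otherwise be preferable.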