Learning a hierarchical representation of the yeast transcriptomic machinery using an autoencoder model
- Lujia Chen^{1},
- Chunhui Cai^{1},
- Vicky Chen^{1} and
- Xinghua Lu^{1}Email author
https://doi.org/10.1186/s12859-015-0852-1
© Chen et al. 2016
Published: 11 January 2016
Abstract
Background
A living cell has a complex, hierarchically organized signaling system that encodes and assimilates diverse environmental and intracellular signals, and it further transmits signals that control cellular responses, including a tightly controlled transcriptional program. An important and yet challenging task in systems biology is to reconstruct cellular signaling system in a data-driven manner. In this study, we investigate the utility of deep hierarchical neural networks in learning and representing the hierarchical organization of yeast transcriptomic machinery.
Results
We have designed a sparse autoencoder model consisting of a layer of observed variables and four layers of hidden variables. We applied the model to over a thousand of yeast microarrays to learn the encoding system of yeast transcriptomic machinery. After model selection, we evaluated whether the trained models captured biologically sensible information. We show that the latent variables in the first hidden layer correctly captured the signals of yeast transcription factors (TFs), obtaining a close to one-to-one mapping between latent variables and TFs. We further show that genes regulated by latent variables at higher hidden layers are often involved in a common biological process, and the hierarchical relationships between latent variables conform to existing knowledge. Finally, we show that information captured by the latent variables provide more abstract and concise representations of each microarray, enabling the identification of better separated clusters in comparison to gene-based representation.
Conclusions
Contemporary deep hierarchical latent variable models, such as the autoencoder, can be used to partially recover the organization of transcriptomic machinery.
Keywords
Yeast Transcription Gene expression Transcriptomic machinery Signal transduction Deep learning Deep hierarchical neural network Unsupervised learning Data miningBackground
A cell constantly responds to its changing environment and intracellular homeostasis. This is achieved by a signal transduction system that detects the signals, assimilates the information of diverse signals, and finally transmits its own signals to orchestra cellular responses. Many of such cellular responses involve tightly regulated transcriptomic activities, which can be measured by microarray or RNA-seq technology and used as readouts reflecting the state of the cellular signaling system.
Reverse engineering the signaling system controlling gene expression has been a focus area of bioinformatics and systems biology. However, this task is significantly hindered by the following difficulties: 1) a transcriptomic profile of a cell (with contemporary technology, often a population of cells) at a given time represents a convolution of all active signaling pathways regulating transcription in the cells, and 2) the states of the majority of these signaling pathway are not observed, making it a challenging task to infer which genes are regulated by a common signal pathway, and it is even more challenging to reveal the relationships among signaling pathways.
Different latent variable models, such as principle component analysis [1], independent component analysis [2], Bayesian vector quantizer model [3], network component analysis [4]–[6], and non-negative matrix factorization [5], [6] models have been applied to analyze transcriptomic data, with an aim to represent the states of latent pathways using latent variables. Despite the different strengths and limitations of these models, they share a common drawback: the latent variables in these models are assumed to be independent, i.e., the latent variables are organized in single “flat” layer without any connection among them; as such the models lack the capability of representing the hierarchical organization of cellular signaling system.
In this family of deep hierarchical models, multiple layers of hidden (latent) variables are organized as a hierarchy, which can be used to capture the compositional relationships embedded in the transcriptomic data in a distributed fashion, i.e., different layers can capture different degrees of detail. For example, the relationships between TFs and their target genes can be captured by a hidden variable layer (hereafter referred to as hidden layer) immediately above the observed the layer of observed gene expression variables, whereas the function of pathways regulating TFs can be represented by higher hidden layers. Therefore, deep hierarchical models provide an abstract representation of the statistical structure of the transcriptomic data with flexibility and different degrees of granularity. We hypothesize that, if accurately trained, a deep hierarchical model can potentially represent the information of real biological entities and further reveal the relationships among them.
In this study, we designed and trained a sparse deep autoencoder model to learn how the information is encoded in yeast cells when subjected to diverse perturbations. Our results indicate that deep learning models can reveal biologically sensible information, thus learning a better representation of the transcriptomic machinery of yeast, and we believe that the approach is applicable to more complex organisms.
Methods
In this study, we investigated using the autoencoder model [7] and sparse autoencoder model [8] to represent the encoding system of the signal transduction systems of yeast cells. Before introducing the autoencoder model and sparse autoencoder model, we will first briefly review restricted Boltzmann machines (RBMs) as building blocks for the autoencoder.
Restricted Boltzmann Machines (RBMs)
In this equation, the binary state of visible variable i is represented by v_{i}, the binary state of hidden variable j is by h_{j} and the model parameters are θ = {a, b, W}. The bias for visible variable i is a_{i} the bias for hidden variable j is b_{j} and the weight between visible variable i and hidden variable j is w_{ij}.
where σ(x) is the logistic function 1/(1 + exp(−x)), m is the total number of visible variables and n is the total number of hidden variables.
The efficient algorithm for learning parameters of the RBM model was introduced in detail in literature and our previous work [7], [9], [10].
Autoencoder
Unlike a RBM, which captures the statistical structure of data using a single layer of hidden nodes, an autoencoder uses multiple layers in a distributed manner, such that each layer captures the structure of different degrees of abstraction. As shown in Fig. 1c, an autoencoder contains one visible (input) layer and one or more hidden layers. To efficiently train the autoencoder, we treat it as a series of two-layered restricted Boltzmann machines (RBM) stacked on top of each other [7], [9]. The inference of the hidden node states and learning of model parameters are performed by learning the RBM stacks bottom-up, which is followed by a global optimization of generative parameters using the back-propagation algorithm. More details of the algorithm and pseudo code for training an autoencoder were discussed in both literature and our previous work [7], [9], [10].
Sparse autoencoder
Where λ is the regularization constant and p is a constant (usually representing the percent of nodes desired to be on) controlling the sparseness of the hidden units h_{j}. For the traditional RBM, the parameters are updated just based on the gradient of the log-likelihood term. But for the sparse RBM, the parameters are updated not only based on the gradient of the log-likelihood term but also the gradient of the regularization term.
Non-negative matrix factorization
Given that the gene expression data is represented as matrix V, NMF factorizes it into a basis matrix (W) and a coefficient matrix (H). All three matrices should have non-negative elements. The number of hidden regulators is pre-defined, and is usually much smaller then the number of genes. In this study, we used the Matlab function nonnegative matrix factorization “nnmf” to perform NMF analysis.
Model selection of autoencoder and sparse autoencoder
We performed a series of cross-validation experiments to search for an “optimal” structure for autoencoders and sparse autoencoders. We adopted a four-layered autoencoder to represent the hierarchical structure of biological processes shown in Fig. 1c. We then explored models with different numbers of hidden units in each hidden layer. We set the initial structure of both autoencoder and sparse autoencoder to the following ranges: h^{(1)}: 100–428; h^{(2)}: 50–100; h^{(3)}: 50; and h^{(4)}: 25. We iteratively modified the structure of the model by changing the number of hidden nodes within a layer using a step size of 50 for the first and second hidden layer. Then we explored all combinations in the range stated above. In this case, the total number of models tested is 14 (7*2) for both autoencoder and sparse autoencoder. For the sparse autoencoder, we chose three sparsity constants that are 0.05, 0.1 and 0.15. Under each particular setting, we performed ten fold cross-validation to assess the performance of a model.
where $\widehat{\mathrm{L}}$ is the maximized value of the likelihood function of the model, k is the number of free parameters to be estimated, n is the number of samples, p is the probability predicted from the model for a gene to be active in an experiment, and m is the true binary state of a unit in the input data.
Mapping between the hidden units and known biological components
Based on the weights between each hidden unit in the first hidden layer and all the visible units (genes), we used a threshold (top 15 % of the absolute values of weights) to cut the edges between a hidden node and the observed genes, such that an edge indicates that the hidden node regulates the gene. We then identified all genes predicted to be regulated by a hidden node as a gene set. Based on the DNA-Protein interaction table [14], [15], we also identified the gene set regulated by a known TF. We then assessed the significance of overlapping of gene sets regulated by hidden nodes and TFs using hypergeometric testing.
Consensus clustering of experiment samples
Consensus cluster clustering [16] was used to cluster the experiment samples using different datasets as input. The R implementation of ClusterCons [17] was downloaded from CRAN (https://cran.r-project.org/src/contrib/Archive/clusterCons/). The inputs for consensus clustering are the samples represented using original gene expression values, NMF megagenes values and the states of hidden variables under all experiment samples respectively. The partition around medoids (PAM) and K-means algorithms were used as base clustering algorithms. The inputs for cluster by cluster consensus clustering are the samples represented using samples clusters derived from the nodes from different hidden layers as features. If one sample belongs to a sample cluster, its input value is 1. Otherwise, its input value is 0.
Finding pheromone related hidden units
We calculated the significance between the state of a hidden node and the state of proteins related to pheromone signaling pathway by using the chi-square test. First, we used a threshold (top 15 %) to designate the state of a hidden unit as active or inactive based on its activation probability. Then, for each hidden unit, we created a contingency table to collect the counts of the joint state of the hidden node and whether any member of the pheromone pathway is perturbed in a specific experiment. We used the contingency table to perform the chi-square test. We used a p-value of 0.01 as the significance threshold.
Gene ontology analysis
GO [18] provides a standard description of what gene products do and where they are located. One of the frequently used databases that provide GO information for yeast is Saccharomyces Genome Database SGD. We first used the combination of weights [19] between neighboring hidden layers to get the weights between the hidden units in a particular hidden layer and the genes. A gene is regarded as being regulated by a hidden unit if their weight is in the top 15 % of all weights. When a gene set of interest associated with a hidden unit is available, we used the method mentioned in [20] to summarize the GO terms capturing as much as semantic information associated with those genes [21]. We identified the GO terms that could summarize the largest number of genes, while undergoing a minimal information loss.
Results and discussion
Training different models for representing yeast transcriptomic machinery
We collected a compendium of 1609 yeast cDNA microarrays from the Princeton University Microarray Database (puma.princeton.edu), and we combined them with 300 microarrays from the study by Hughes et al. [22], which was used in a previous study of the yeast signaling system [23]. The combined dataset is ideal for studying the yeast signaling system because it represents a large collection of perturbation experiments that are of biological interest. For example, the data from the study by Hughes et al. [22] were collected from yeast cells with genetic perturbations (deletion of genes) or chemical treatments, and similarly the microarrays from the database were collected from specific conditions and contrasted with “normal” growth condition. Taking advantage of the experiment-vs-control design of cDNA microarrays, we identified differentially expressed genes (3-fold change) in each array and retained 2228 genes that were changed in at least 5 % of the microarrays. We then represented the results of each microarray experiment as a binary vector, in which an element represented a gene, and its value was set to 1 if the gene was differentially expressed, and 0 otherwise. Thus, each microarray represented the transcriptomic changes in response to a certain condition, presumably regulated by certain specific signaling components, which is unknown to us.
We investigated the utility of the autoencoder model (also known as deep belief network) [9], with one observed layer representing the microarray results and 4 hidden variable layers (hereafter referred to as hidden layers) representing the yeast signaling components in yeast transcriptomic machinery. In this model, a hidden node is a binary variable, which may reflect the state of a collection of signaling molecules or a pathway, such that the switching of the node state between 1 and 0 can reflect the changing state of a pathway.
The probabilistic distribution of the state of a node in a given layer is dependent on the nodes in the adjacent parent layers, defined by a logistic function. The directed edges between nodes of adjacent layers indicate that, conditioning on the state of nodes in parent layer, the nodes in a child layer are independent. In other words, the statistical structure (patterns of joint probability of nodes) among the nodes in a child layer is captured by the nodes in the parent layer. For example, in our case, if the nodes in the 1^{st} hidden layer (directly above the gene expression layer) represent the states of transcription factors, then co-differential expression (covariance) of a set of genes is solely dependent on (or explained by) the TFs that regulate the genes. Similarly, the co-regulation of TFs is determined by its parent layer, which may reflect the state of signaling pathways. Thus, this model is suited to capture the context-specific changes and compositional relationship among signaling components in a distributed manner. The model is referred to as autoencoder because, when given a collection of observed data, it learns to use hidden nodes to encode the statistical structure of observed data, and it is capable of probabilistically reconstructing the observed data.
Since the autoencoder model in our study is biologically motivated, we hypothesize that the nodes in the first hidden layer would likely capture the signal of TFs. Thus the number of nodes in this layer should be close to the number of known TFs for yeast, of which there are around 200 well-characterized yeast TFs [24]. However, for a given microarray from a perturbation experiment, genes that respond to a specific perturbation are likely regulated by a few transcription factors. Thus we also investigated a model referred to as sparse autoencoder [8], [25], which performs regularized learning of model parameters and allows a user to constrain the percent of nodes in a layer that can be set to the “on” state, see Methods for details. In our experiment, we constrained that, in the first hidden layer, around 10 % of hidden nodes should be used to encode the changes in a microarray.
Reconstruction error of models with different architectures
Reconstruction error | Architecture 1 (100:100:50:25) | Architecture 2 (150:100:50:25) | Architecture 3 (214:100:50:25) | Architecture 4 (428:100:50:25) |
---|---|---|---|---|
Autoencoder training | 150.20 | 150.23 | 148.94 | 150.81 |
Autoencoder test | 188.57 | 189.12 | 190.76 | 189.63 |
Sparse autoencoder training (0.1) | 170.19 | 170.07 | 170.41 | 171.94 |
Sparse autoencoder test (0.1) | 206.58 | 208.40 | 203.99 | 203.27 |
BIC scores of different models
Arch 1 (100,100,50,25) | Arch 2 (214, 100, 50, 25) | |
---|---|---|
Autoencoder | 3.25e + 006 = 1.99e + 006 + 1.26e + 006; | 4.41e + 006 = 1.71e + 006 + 2.70e + 006; |
Sparse autoencoder | 2.06e + 006 = 1.93e + 006 + 1.26e + 005; | 1.96e + 006 = 1.69e + 006 + 2.70e + 005; |
Distributed representation enhances discovery of signals of TFs
Indeed, the results indicate that sparse autoencoder is capable of capturing and representing the information of TFs, in that there is an almost one-to-one mapping between hidden nodes and known TFs as shown in Fig. 3a. For a few hidden variables that are significantly mapped to multiple TFs, we further investigated if these TFs are members of known TF complexes [27]. As an example, we found that a hidden node is significantly mapped to AFT1 and PUT3, which are two yeast TFs known to cooperatively regulate genes involved in ion transportation [28]. As another example, a hidden node is mapped to both MSN2 and MSN4 [29], which belong to the same family and form hetero-dimers to regulate genes involved in general stress response.
To demonstrate the advantage of hierarchical and distributed representations, we compare our results with another latent variable model commonly used to represent microarray data, the non-negative matrix factorization model (NMF) [6]. The NMF can be thought of as a model consisting of an observed layer (gene expression) and a single hidden layer (hidden variables/metagenes [5]), which is used to capture all signals in embedded in microarrays, whereas the same information is distributed to multiple layers of hidden variables in the sparse autoencoder. We trained a NMF model with 214 “metagenes”, which is the same as the number of hidden nodes in the 1^{st} hidden layer of the sparse autoencoder, and the results of mapping between latent factors and TFs are shown in Fig. 3b.
Indeed, the results clearly demonstrate the expected difference between the two models. With the capability of capturing the context-specific and compositional relationship of signals regulating expression in a distributed manner, the hidden nodes in the 1^{st} hidden layer clearly capture the specific signals of TFs, whereas the signals regulated TFs are delegated to the higher level hidden nodes. In contrast, with only a single layer of latent variables, all signals in the data need to be “squeezed” into these latent variables, such that a latent factor (a “metagene”) has to represent the signal of multiple TFs. Therefore, the results support our hypothesis that, benefitting from the distributed representation of the statistical structures across multiple hidden layers, the sparse encoder can concisely learn and represent the information of biological entities, in this case the TFs.
Latent variables can capture the information of signaling pathways
We further investigated whether certain hidden nodes can represent the states of well-known yeast signaling pathways, i.e., whether the state of a hidden node can be mapped to the state of a collection of proteins in a pathway. In a previous study [23], we were able to recover the pheromone signaling pathway and a set of target genes whose transcription were regulated by the pheromone pathway, by mining the systematic perturbation data from the study by Hughes et al. [22] in which 14 genes involved in yeast pheromone signaling were perturbed by gene deletion. In the current study, we identified the microarray experiments in which the aforementioned 14 pheromone-pathway genes were perturbed, and we examined if the state of any hidden node was statistically associated with perturbation of pheromone pathway, using the chi-square test (see Methods). Interestingly, we found 2 hidden nodes in the 1^{st} hidden layer that are significantly associated with the perturbation experiments, one with a chi-square test p ≤ 2.47e-05, and the other with a p ≤ 3.82e-02. Further inspecting the genes predicted to be regulated by these hidden nodes, we found a significant overlap between the pheromone target genes from our previous study and the genes regulated by these hidden nodes (data not shown). These results indicate that the hidden nodes of the sparse autoencoder model are capable of capturing the signals of specific yeast pathways. However, it should be noted that, by design, a hidden node in the high level layers of the sparse autoencoder might encode the signals of multiple pathways that share strong covariance.
The hierarchical structure captures signals of different degrees of abstraction
Concise representation enhances the discovery of global patterns
The results clearly demonstrate that, if samples are represented in the high-dimensional gene expression space, the majority of the samples cannot be grouped into distinct clusters. When we use the low-dimensional NMF metagenes space, it performs slightly better than using original gene expression space. But, the clusters derived are still not well separated. In contrast, when samples were represented using the expected states of hidden nodes of the 1^{st} hidden layer, the samples can be consistently separated into distinct clusters. Representing samples using the expected states of the nodes from other hidden layers also produced clearly separated clusters (data not shown). The results indicate that the states of hidden nodes are much more informative in terms of representing distinct characteristics of individual samples, thus enabling clean separation of samples by the clustering algorithms. Although it would be interesting to systematically inspect the common characteristics of the samples in terms of whether the perturbation experiments affect common signaling pathways, such an analysis requires broad and in depth knowledge of yeast genetics, which is beyond the expertise of our group.
Information embedded in data is consistently represented in different hidden layers
Conclusion
In this study, we investigated the utility of contemporary deep hierarchical models to learn a distributed representation of statistical structures embedded in transcriptomic data. We show that such a model is capable of learning biologically sensible representations of the data and revealing novel insights regarding the machinery regulating gene expression. We anticipate that such a model can be used to model more complex systems, such as perturbed signaling systems in cancer cells, thus directly contributing to the understanding of disease mechanisms in translational medicine.
Declarations
Acknowledgements
The authors would like to thank Dr. Ruslan Salakhutdinov for discussion. This work is partially supported by the following NIH grants: R01LM012011, R01LM010144, R01LM011155. The publication cost is covered by the NIH grant R01LM011155.
Declarations
This article has been published as part of BMC Bioinformatics Volume 17 Supplement 1, 2016: Selected articles from the Fourteenth Asia Pacific Bioinformatics Conference (APBC 2016). The full contents of the supplements are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/17/S1.
Authors’ Affiliations
References
- Raychaudhuri S, Stuart JM, Altman RB. Principal components analysis to summarize microarray experiments: application to sporulation time series. Pac Symp Biocomput. 2000:455–466.Google Scholar
- Liebermeister W: Linear modes of gene expression determined by independent component analysis. Bioinformatics. 2002, 18 (1): 51-60. 10.1093/bioinformatics/18.1.51.View ArticlePubMedGoogle Scholar
- Lu X, Hauskrecht M, Day RS. Modeling cellular processes with variational Bayesian cooperative vector quantizer. Pac Symp Biocomput. 2004:533–544.Google Scholar
- Liao JC, Boscolo R, Yang YL, Tran LM, Sabatti C, Roychowdhury VP: Network component analysis: reconstruction of regulatory signals in biological systems. Proc Natl Acad Sci U S A. 2003, 100 (26): 15522-7. 10.1073/pnas.2136632100.View ArticlePubMedPubMed CentralGoogle Scholar
- Brunet JP, Tamayo P, Golub TR, Mesirov JP: Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci U S A. 2004, 101 (12): 4164-9. 10.1073/pnas.0308531101.View ArticlePubMedPubMed CentralGoogle Scholar
- Devarajan K: Nonnegative Matrix Factorization: An Analytical and Interpretive Tool in Computational Biology. Plos Comput Biol. 2008, 4 (7): e1000029-10.1371/journal.pcbi.1000029.View ArticlePubMedPubMed CentralGoogle Scholar
- Hinton GE, Osindero S, Teh YW: A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18 (7): 1527-54. 10.1162/neco.2006.18.7.1527.View ArticlePubMedGoogle Scholar
- Lee HE, Ekanadham, C, Ng, A.Y. Sparse deep belief net model for visual area V2. Advances in Neural Information Processing Systems 2008.Google Scholar
- Hinton GE, Salakhutdinov RR: Reducing the dimensionality of data with neural networks. Science. 2006, 313 (5786): 504-7. 10.1126/science.1127647.View ArticlePubMedGoogle Scholar
- Chen L, Cai C, Chen V, Lu X. Trans-species learning of cellular signaling systems with bimodal deep belief networks. Bioinformatics. 2015.Google Scholar
- Goh HT, Thome, N, Cord M. Biasing Restricted Boltzmann Machines to Manipulate Latent Selectivity and Sparsity. NIPS. 2010.Google Scholar
- Vincent P, Larochelle H, Bengio, Y.. Extracting and composing robust features with denoising autoencoders. Proceedings of the 25th international conference on Machine learning 2008.Google Scholar
- Posada D, Buckley TR: Model selection and model averaging in phylogenetics: Advantages of akaike information criterion and Bayesian approaches over likelihood ratio tests. Syst Biol. 2004, 53 (5): 793-808. 10.1080/10635150490522304.View ArticlePubMedGoogle Scholar
- Huang SS, Fraenkel E: Integrating proteomic, transcriptional, and interactome data reveals hidden components of signaling and regulatory networks. Sci Signal. 2009, 2 (81): ra40-10.1126/scisignal.2000350.View ArticlePubMedPubMed CentralGoogle Scholar
- Yeger-Lotem E, Riva L, Su LJ, Gitler AD, Cashikar AG, King OD, et al: Bridging high-throughput genetic and transcriptional data reveals cellular responses to alpha-synuclein toxicity. Nat Genet. 2009, 41 (3): 316-23. 10.1038/ng.337.View ArticlePubMedPubMed CentralGoogle Scholar
- Monti S, Pablo T, Mesirov J, Golub T: Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Mach Learn. 2003, 52 (1–2): 91-118. 10.1023/A:1023949509487.View ArticleGoogle Scholar
- Simpson TI, Armstrong JD, Jarman AP: Merged consensus clustering to assess and improve class discovery with microarray data. BMC bioinformatics. 2010, 11: 590-10.1186/1471-2105-11-590.View ArticlePubMedPubMed CentralGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al: Gene Ontology: tool for the unification of biology. Nat Genet. 2000, 25 (1): 25-9. 10.1038/75556.View ArticlePubMedPubMed CentralGoogle Scholar
- Dumitru EC, A., Bengio,Y. Understanding representations learned in deep architectures. Technical report,1355, Universite de Montreal/DIRO. 2010.Google Scholar
- Chen V, Lu X: Conceptualization of molecular findings by mining gene annotations. BMC Proc. 2013, 7 (Suppl 7): S2-10.1186/1753-6561-7-S7-S2.View ArticlePubMedPubMed CentralGoogle Scholar
- Lu S, Lu X: Integrating genome and functional genomics data to reveal perturbed signaling pathways in ovarian cancers. AMIA Joint Summits on Translational Science proceedings AMIA Summit on Translational Science. 2012, 2012: 72-8.PubMedPubMed CentralGoogle Scholar
- Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, et al: Functional discovery via a compendium of expression profiles. Cell. 2000, 102 (1): 109-26. 10.1016/S0092-8674(00)00015-5.View ArticlePubMedGoogle Scholar
- Lu S, Jin B, Cowart LA, Lu X: From data towards knowledge: revealing the architecture of signaling systems by unifying knowledge mining and data mining of systematic perturbation data. PLoS ONE. 2013, 8 (4): e61134-10.1371/journal.pone.0061134.View ArticlePubMedPubMed CentralGoogle Scholar
- Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, et al: Transcriptional regulatory code of a eukaryotic genome. Nature. 2004, 431 (7004): 99-104. 10.1038/nature02800.View ArticlePubMedPubMed CentralGoogle Scholar
- Ng A. Sparse autoencoder. CS294A Lecture notes. 2011.Google Scholar
- Barron AR, Rissanen J, Yu B: The Minimum Description Length Principle in Coding and Modeling. Information Theory, IEEE Transactions on. 1998, 44 (6): 2743-60. 10.1109/18.720554.View ArticleGoogle Scholar
- Hill CS, Marais R, John S, Wynne J, Dalton S, Treisman R: Functional analysis of a growth factor-responsive transcription factor complex. Cell. 1993, 73 (2): 395-406. 10.1016/0092-8674(93)90238-L.View ArticlePubMedGoogle Scholar
- Alkim C, Benbadis L, Yilmaz U, Cakar ZP, Francois JM: Mechanisms other than activation of the iron regulon account for the hyper-resistance to cobalt of a Saccharomyces cerevisiae strain obtained by evolutionary engineering. Metallomics. 2013, 5 (8): 1043-60. 10.1039/c3mt00107e.View ArticlePubMedGoogle Scholar
- Fabrizio P, Pozza F, Pletcher SD, Gendron CM, Longo VD: Regulation of longevity and stress resistance by Sch9 in yeast. Science. 2001, 292 (5515): 288-90. 10.1126/science.1059497.View ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.