Gene expression information has been widely used to elucidate complex biological mechanisms, including the prediction of protein functions, the precise classification of phenotypes at the modular level, the study of expression modes under certain experimental conditions, and the reduction of experimental noise, with the ultimate aim of affecting the direction of biological research. RNA-Seq is a revolutionary DNA sequencing technology recently developed that provides a high throughput method for cDNA sequencing, generating information about mRNA content and quantifying gene expression. This kind of novel sequencing technology when contrasted with traditional microarray hybridization technology, reduces background noise and is sensitive enough to detect a wider range (>90%) of the transcriptome, even mRNA that are expressed at very low levels or that are rapidly degraded . Not only can RNA-Seq more accurately measure gene expression levels , but this new technology promises to deliver more advantages, such as investigation of alternative splicing  and allele specific expression . In addition, the combination of strand-specific array data and sequencing data reveals information on new, non-coding transcripts and gene structures distinct to each case , which largely benefits the study of condition specific sub-networks or modules in biological applications.
The widespread and growing application of RNA-Seq techniques to the study of various biological systems emphasize the need for computational methods to analyze the huge amount of RNA-Seq data, with the ultimate goal of obtaining a greater understanding of biological mechanisms at a systems level. In order to partially address this challenge, we developed and applied an array of bioinformatics methods to analyze the RNA-Seq transcriptome data obtained through studies of soybean nodulation. Soybean (Glycine max L. merr.), a major crop providing an important source of protein and oil, is very important in biological nitrogen fixation research. The symbiosis between leguminous plants and rhizobia leads to the formation of a novel root organ, the nodule. In mature nodules, rhizobia provide the host plant with ammonium, which is produced through bacterial nitrogen fixation. In recent years, research progress on understanding nodule formation has accelerated through the application of modern molecular methods. For example, using high-throughput sequencing technologies, we obtained gene expression data derived from different conditions (tissues) in soybean. With these data we constructed nodule-related gene regulatory networks as a tool to aid biologists to formulate testable hypothesis about how nodule development is regulated.
Several algorithms exist to infer regulatory networks from microarray gene expression data [5–8]. Among of them, the method based on the Bayesian probabilistic network  to infer co-regulated genes and their putative regulators, transcription factors, was successfully applied to the microarray data of a model species: Saccharomyces cerevisiae. However, the application of computational methods to predict plant gene regulatory networks is still at an early stage . Specifically, there is a lack of bioinformatic tools or integration methods to combine RNA-Seq data with other data sources to study gene modules and their regulatory relationships. In the case of soybean, the availability of the complete genome sequence [9, 10] and numerous annotation resources (e.g. SoyDB, a functional annotation database of all putative transcription factors ), makes it now possible to develop and integrate a set of bioinformatic methods to reliably construct gene regulatory modules by integrating the vast soybean RNA-Seq data with functional genomics data .
In line with an integrative bioinformatics framework for predicting gene regulatory networks from microarray gene expression data , here we developed and applied an integrated protocol for differential expression analysis, gene clustering, co-regulated gene module and regulator construction, DNA binding motif identification, and gene function prediction to construct and verify gene regulatory modules from RNA-Seq data. Although the basic framework of constructing regulatory module network is the same as our previous method  developed for microarray data, the preprocessing and normalization of RNA-Seq data, the filtering of differentially expressed genes, and the biological application are quite different. Here, considering the nodulation may have three separate stages, we separately selected the differentially expressed genes for each stage and also studied the differentially expressed genes commonly present in all the three stages. Thus, this work is a new application and adaptation of the previous framework for increasingly important RNA-Seq data analysis in a new biological context. Furthermore, we added a new random computational method to evaluate the predicted network models.
For the 10 regulatory modules constructed based on genes which respond at all the three stages of nodulation formation, we validated them from different aspects, such as, by existing literature, function enrichment and binding site analysis. The results demonstrated that we can obtain reliable results about regulatory mechanisms in the process of soybean nodulation formation by constructing regulatory networks and modules from RNA-Seq data. In addition, a series of condition specific regulatory networks and modules separately based on the different stages of nodulation were produced by our method. The experiments demonstrated that our computational methods can effectively integrate RNA-Seq transcriptome data with other data sources to construct gene regulatory networks for a cell responding to different biological conditions.