The GEDI toolbox provides a user-friendly environment for performing both well-known basic analyses and advanced methods published in the last few years. GEDI supports the analysis of gene expression data in four major steps: removal of the bias introduced by the microarray technique (normalization), identification of differentially expressed genes, classification of samples based on molecular profiles to identify potential biomarkers or drug targets, and, finally, inference of gene functionality by constructing gene expression regulatory networks.
1. Microarray data normalization: global and quantile [5] normalization methods are implemented in this version of GEDI. In addition, several normalization methods based on non-parametric regressions are also implemented, comprising the following methods: Loess [6], Splines [7, 8], Wavelets [9] and SVR (Support Vector Regression) [1]. Also, for more than two microarrays, the cyclic normalization is performed as described in [5].
2. Identification of differentially expressed genes: the t-test, the t-test with permutation and the non-parametric Wilcoxon test, each with FDR (False Discovery Rate) [10] adjustment, are available. Moreover, the recently published SAM (Significance Analysis of Microarrays) is also available [11]. Putative differentially expressed genes are listed in ascending order of FDR-adjusted p-value, i.e., most significant first.
3. Sample clustering and classification: clinical applications often require the identification of biomarkers that discriminate between pathological and normal samples. GEDI therefore implements the k-means clustering method, linear/quadratic Fisher discriminant analysis [12], hierarchical clustering [13] and the recently described SVM (Support Vector Machine) approach [14] with a cross-validation procedure.
4. Construction of gene regulatory networks: it is usually of interest to identify the pathways to which the identified genes are related. Unfortunately, depending on the treatment conditions, cell lines or tissues, these pathways may not yet have been studied or may not yet be known. GEDI offers several approaches to infer regulatory networks from gene expression data alone, with no additional a priori biological information. The methods employed are Pearson and Spearman partial correlation analysis, to infer instantaneous associations, and advanced methods based on Granger causality [15], such as VAR (Vector Autoregressive) [16], DVAR (Dynamic Vector Autoregressive) [2] and SVAR (Sparse Vector Autoregressive) [3]. The VAR methods are of great interest because they allow Granger causalities to be inferred from time-series gene expression data. DVAR may infer time (cell cycle)-varying connectivities, while SVAR may allow large networks to be constructed from only a few samples.
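To make the VAR idea in step 4 concrete, the following is a minimal Python sketch, not GEDI's own R implementation. It fits a first-order VAR model by ordinary least squares on simulated data; the function names `fit_var1` and `simulate_two_genes`, the model order and the toy coefficients are all assumptions made for illustration. A coefficient A[i, j] that is clearly non-zero suggests that gene j at time t-1 helps predict gene i at time t, which is the quantity a Granger causality test would assess; the significance test itself is omitted here.

```python
import numpy as np


def fit_var1(series):
    """Least-squares fit of the VAR(1) model x_t = A @ x_{t-1} + c + noise.

    series: array of shape (T, n_genes); rows are consecutive time points.
    Returns (A, c), where A[i, j] is the estimated effect of gene j at
    time t-1 on gene i at time t.
    """
    series = np.asarray(series, dtype=float)
    past, future = series[:-1], series[1:]
    # Append a column of ones so the intercept c is estimated jointly.
    design = np.hstack([past, np.ones((past.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, future, rcond=None)
    # coef has shape (n_genes + 1, n_genes); transpose the gene block.
    return coef[:-1].T, coef[-1]


def simulate_two_genes(n_time=2000, seed=0):
    """Toy data (hypothetical): gene 0 drives gene 1, not the reverse."""
    rng = np.random.default_rng(seed)
    true_a = np.array([[0.5, 0.0],
                       [0.8, 0.2]])
    x = np.zeros((n_time, 2))
    for t in range(1, n_time):
        x[t] = true_a @ x[t - 1] + rng.normal(scale=0.1, size=2)
    return x, true_a


if __name__ == "__main__":
    data, true_a = simulate_two_genes()
    a_hat, _ = fit_var1(data)
    # Compare the estimate with the coefficients used in the simulation.
    print(np.round(a_hat, 2))
    print(true_a)
```

In this toy setting the estimate of A[1, 0] should be close to the simulated value while A[0, 1] stays near zero, reflecting the one-directional influence. Real time-series expression data are far shorter, which is the situation the sparse (SVAR) variant mentioned above is meant to address.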
Figure 1 illustrates the GEDI interface. With a few clicks, the user can access any analytical method implemented in GEDI. The graphical user interface (GUI) is built with the Tcl/Tk library and opens interactive windows in which the parameters required for each method can easily be entered.
The input data format is very simple and independent of the microarray platform: it consists of text files organized as a matrix, where each column corresponds to one microarray and each row to one gene. For the user's convenience, the input files have the same format for all functionalities.
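As an illustration of this layout, the Python sketch below reads a genes-by-arrays text matrix and applies a basic quantile normalization, one of the normalization methods named earlier. It is not GEDI code. It assumes a purely numeric, whitespace- or tab-delimited file with no header row and no gene-name column, which the description above does not specify, and the function names `read_expression_matrix` and `quantile_normalize` are hypothetical.

```python
import numpy as np


def read_expression_matrix(text):
    """Parse a numeric text matrix: one gene per row, one microarray per column.

    Assumption: values are separated by whitespace (tabs or spaces) and the
    file contains no header row or gene identifiers.
    """
    rows = [[float(value) for value in line.split()]
            for line in text.strip().splitlines() if line.strip()]
    return np.array(rows, dtype=float)


def quantile_normalize(matrix):
    """Give every column (microarray) the same empirical distribution.

    Each value is replaced by the mean, across arrays, of the values that
    share its within-array rank. Ties are broken arbitrarily in this sketch.
    """
    matrix = np.asarray(matrix, dtype=float)
    order = np.argsort(matrix, axis=0)            # row indices sorting each column
    sorted_vals = np.take_along_axis(matrix, order, axis=0)
    rank_means = sorted_vals.mean(axis=1)         # reference distribution
    ranks = np.argsort(order, axis=0)             # rank of each entry in its column
    return rank_means[ranks]


if __name__ == "__main__":
    example = "5\t4\t3\n2\t1\t4\n3\t3\t6\n4\t2\t8\n"
    expression = read_expression_matrix(example)
    print(quantile_normalize(expression))
```

After normalization, every column contains the same set of values, and only their assignment to genes differs between arrays.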
The outputs consist of plots and numerical results. The plots may be saved as vector PostScript files, which can be zoomed without loss of resolution. The numerical results may be saved as a plain tab-delimited text file, which can be viewed in any text editor.
- Differentially expressed genes: given an FDR-adjusted p-value threshold, GEDI provides a list of genes ordered by FDR-adjusted p-value [10], from the lowest (the most differentially expressed genes) to the highest.
- Sample clustering and classification: statistics are provided for each kind of analysis, such as the number of correctly classified samples after cross-validation.
- Gene expression regulatory networks: GEDI plots graphs that represent the regulatory networks (Figure 3). Each node of the graph represents a gene, and the edges represent Granger causalities (VAR, DVAR and SVAR) or correlations (Pearson and Spearman). For the DVAR method, it also plots connectivity against time to show how the connectivity changes over time (Figure 4).
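The ordered list of differentially expressed genes described above can be sketched in Python as follows. This is an illustrative implementation of the Benjamini-Hochberg step-up adjustment, which is assumed here to be the FDR procedure of [10]; it is not GEDI's code, and the names `bh_adjust` and `ranked_genes`, the gene labels and the p-values are hypothetical.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


def ranked_genes(gene_names, pvalues, threshold=0.05):
    """Genes whose adjusted p-value is at most `threshold`, most significant first."""
    adjusted = bh_adjust(pvalues)
    kept = [(name, q) for name, q in zip(gene_names, adjusted) if q <= threshold]
    return sorted(kept, key=lambda item: item[1])


if __name__ == "__main__":
    names = ["g1", "g2", "g3", "g4"]          # hypothetical gene labels
    raw_p = [0.01, 0.04, 0.03, 0.005]          # hypothetical raw p-values
    for name, q in ranked_genes(names, raw_p, threshold=0.05):
        print(name, round(q, 4))
```

The threshold plays the role of the user-supplied FDR cut-off: lowering it shortens the list without changing the order of the genes that remain.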
GEDI is very user-friendly: the user only needs to load GEDI into the R environment, after which it starts running automatically. Moreover, new functionalities can easily be added to extend GEDI.
As future work, we intend to continue developing GEDI by incorporating new functionalities as new algorithms and statistical methods for analyzing gene expression data are developed, thereby making advanced methods more accessible to biomedical researchers.