Data aggregation
Data aggregation is a core feature of the DEvis package that makes it possible to examine and extract meaningful information from complex result sets. For example, analysis of time series data often requires identifying differentially expressed genes at each time point relative to a control sample, which may or may not itself consist of time-controlled data. Such experiments produce potentially dissimilar sets of differentially expressed genes for each time point contrast, so direct comparisons between gene sets are not immediately possible. DEvis combines results from multiple contrasts using union- or intersection-based merging, and the merged data can be transformed and reshaped automatically to display different aspects of the data, such as sample similarity or group-wise expression changes, without requiring the user to manipulate or reformat data. As a result, researchers can efficiently investigate the similarities and differences among various combinations of aggregated result sets and determine a master result set that can be filtered, subset, sorted, and visualized using other DEvis methods. For example, in a study with multiple time points for two experimental conditions, a researcher might want to identify the differentially expressed genes unique to each condition regardless of time point and then view expression changes in both conditions for those genes across all time points. Researchers can visualize the degree of overlap among significant genes when all time points are aggregated for each condition and make informed decisions about the next steps.
For instance, if there is little agreement among the significant genes in a union-based aggregation of multiple time points for a single condition, a researcher could conclude that such a merge would likely introduce noise into the data set and instead consider an intersection-based approach or an aggregation of different combinations of result sets that better characterizes the condition under investigation. Because DEvis provides easy and fast aggregation along with an array of tools for generating and manipulating data, such as metadata generation, filtration, and subset functions, researchers can make educated decisions about their data analysis methods, the burden of data preparation and formatting is substantially reduced, and the potential for exploratory analysis is enhanced.
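The difference between union- and intersection-based merging, and the idea of checking agreement between contrasts before choosing one, can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the DEvis API: the function names `aggregate` and `pairwise_jaccard`, the contrast labels, and the gene identifiers are all hypothetical, and the Jaccard index is only one possible way to quantify overlap.

```python
from itertools import combinations


def aggregate(contrasts, mode="union"):
    """Merge per-contrast sets of significant gene IDs (hypothetical helper)."""
    gene_sets = [set(genes) for genes in contrasts.values()]
    if not gene_sets:
        return set()
    if mode == "union":
        # Every gene that is significant in at least one contrast.
        return set().union(*gene_sets)
    if mode == "intersection":
        # Only genes that are significant in every contrast.
        return set.intersection(*gene_sets)
    raise ValueError(f"unknown mode: {mode!r}")


def pairwise_jaccard(contrasts):
    """Jaccard overlap (shared / combined) for each pair of contrasts."""
    scores = {}
    for a, b in combinations(sorted(contrasts), 2):
        set_a, set_b = set(contrasts[a]), set(contrasts[b])
        combined = set_a | set_b
        scores[(a, b)] = len(set_a & set_b) / len(combined) if combined else 1.0
    return scores


# Made-up significant genes for one condition at three time points.
contrasts = {
    "t1_vs_ctrl": {"geneA", "geneB", "geneC"},
    "t2_vs_ctrl": {"geneB", "geneC", "geneD"},
    "t3_vs_ctrl": {"geneC", "geneE"},
}

master_union = aggregate(contrasts, "union")
master_intersection = aggregate(contrasts, "intersection")
overlap = pairwise_jaccard(contrasts)
```

In this toy example the union contains five genes while the intersection contains only one, and the pairwise overlap scores are low. This is the kind of disagreement that might prompt a researcher to prefer an intersection-based merge or a different combination of result sets.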
Visualization
DEvis incorporates multiple visualization methods, each featuring easily configurable parameters that provide direct control over the data being visualized. DEvis also uses sample-specific metadata to enable cross-group and multi-group visualization based on user-defined parameters, making it possible to observe the effect of combining experimental factors on transcriptomic differences. Each visualization offers configurable layout and color schemes, and many provide filtering, sorting, and subsetting parameters for displaying and retrieving data, which facilitate exploratory analysis and manipulation of data within the context of a complex experiment.
Visualizations for different aspects of transcriptomic analysis, such as batch effect identification, overall expression pattern investigation, sorting and filtration of significant differentially expressed genes, and co-expression analysis, are unique features of our tool. Prior to the investigation of differentially expressed genes, the overall data set can be examined using hierarchical clustering, heat maps, multi-dimensional scaling (MDS) plots, and box plots. Clustering-based distance plots and dendrograms that display metadata about each sample make it possible to quickly examine the relationships among samples and to identify outlying samples and batch effects. Additional plots, such as dispersion plots and box plots, that depict expression metrics with respect to group-wise metadata can be used to identify large-scale changes in expression between conditions or to examine the effects of normalization. Individual genes can also be examined using box plots that split the data by factors of interest, such as time point or experimental treatment condition, making it possible to identify the factors responsible for expression differences in genes of interest. MDS plots that incorporate convex hulls and confidence interval information further allow multi-factor causes of large-scale differences in the data to be identified.
After differential expression contrasts are performed and a master data set consisting of all comparisons of interest is aggregated, many additional visualizations become available. Density plots that display p-values or log fold-changes for the aggregated result set can be used to examine the impact of data aggregation, highlighting similarities or differences in each individual contrast with respect to the differentially expressed genes identified in the aggregated master data set. Also available are summary plots that display differentially expressed gene counts and their respective expression levels for each contrast, plots that show group-wise changes in differential gene expression, and volcano plots. Heat maps and expression profile plots visualize gene expression changes across contrasts and offer sorting and filtering options that make it possible to identify the most important genes of interest with minimal effort. Finally, series plots cluster genes by similarity of expression across multiple contrasts, making it possible to visualize and extract groups of co-expressed genes. Examples of some of these plots can be seen in Fig. 1.
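The idea of grouping genes by similarity of expression across contrasts can be sketched in a deliberately simplified form. The Python example below does not reproduce the clustering used for series plots. As a simple stand-in, it groups genes that share the same direction of change (up, down, or flat) in every contrast. The function names, the fold-change threshold of 1.0, and the values in the table are hypothetical.

```python
from collections import defaultdict


def direction(log2fc, threshold=1.0):
    """Classify one log2 fold-change as 'up', 'down', or 'flat'."""
    if log2fc >= threshold:
        return "up"
    if log2fc <= -threshold:
        return "down"
    return "flat"


def group_by_pattern(log2fc_table, threshold=1.0):
    """Group genes whose direction of change matches in every contrast."""
    groups = defaultdict(list)
    for gene, values in log2fc_table.items():
        pattern = tuple(direction(value, threshold) for value in values)
        groups[pattern].append(gene)
    return {pattern: sorted(genes) for pattern, genes in groups.items()}


# Made-up log2 fold-changes for three contrasts (for example, three time points).
log2fc_table = {
    "geneA": [2.1, 1.8, 0.2],
    "geneB": [1.5, 2.4, 0.1],
    "geneC": [-1.2, -2.0, -1.7],
    "geneD": [0.3, -0.4, 0.0],
}

patterns = group_by_pattern(log2fc_table)
```

Here `geneA` and `geneB` fall into the same group because both are up-regulated in the first two contrasts and unchanged in the third. A distance-based clustering method would additionally take the magnitude of change into account.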
Data organization
Several data organization functions are built into the DEvis package. Upon initialization, a directory structure is created with folders for data files, such as differentially expressed gene lists, and for plots saved in either high-resolution PNG or PDF format. Users can toggle whether data and plots are saved automatically to the appropriate directory whenever a visualization is generated. This feature simplifies the management of analysis results, standardizes their format and structure, and reduces the chance of human error, supporting future reproducibility and providing a storage layout that can be easily navigated and understood long after the analysis is complete.
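The general pattern described above, a standard directory layout created at initialization combined with a toggle for automatic saving, can be sketched as follows. This is a Python illustration under stated assumptions and not the DEvis implementation: the class name `ResultStore`, the folder names, and the file name are hypothetical.

```python
import tempfile
from pathlib import Path


class ResultStore:
    """Hypothetical helper that keeps results in a fixed directory layout."""

    def __init__(self, base_dir, auto_save=True):
        self.base = Path(base_dir)
        self.auto_save = auto_save
        # Folder names are assumptions chosen for this example.
        self.dirs = {
            "de_gene_lists": self.base / "results" / "de_gene_lists",
            "plots": self.base / "results" / "plots",
        }
        for directory in self.dirs.values():
            directory.mkdir(parents=True, exist_ok=True)

    def save_text(self, kind, filename, text):
        """Write text into the folder for `kind`; do nothing if saving is off."""
        if not self.auto_save:
            return None
        target = self.dirs[kind] / filename
        target.write_text(text, encoding="utf-8")
        return target


# Usage in a temporary directory so that the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    store = ResultStore(tmp, auto_save=True)
    saved = store.save_text("de_gene_lists", "t1_vs_ctrl.txt", "geneA\ngeneB\n")
    saved_exists = saved.exists()
    saved_parent = saved.parent.name

    silent = ResultStore(tmp, auto_save=False)
    skipped = silent.save_text("de_gene_lists", "t2_vs_ctrl.txt", "geneC\n")
    skipped_file_exists = (silent.dirs["de_gene_lists"] / "t2_vs_ctrl.txt").exists()
```

Routing every write through a single helper is what keeps the layout consistent: callers state only what kind of result they are saving, never where it goes.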