MiBiOmics is implemented in R (Version 3.6.0) as a Shiny app providing an interactive interface to perform each step of a single- or multi-omics data analysis (Fig. 1). MiBiOmics is also accessible as a standalone application that can be easily installed via Conda (Version 4.6.12). The application is divided into five sections as described below:
Data upload
Within MiBiOmics, the user can upload up to three omics datasets, allowing the data exploration and network analysis of a single- or multi-omics dataset. The omics datasets must share common samples in order to perform all analyses provided by the application. An annotation table describing external parameters (e.g. pH, site of extraction, physiological measures) needs to be provided. These parameters may be quantitative or qualitative, and must be available for each sample. An additional taxonomic annotation table can be uploaded when one omics table corresponds to microbial lineages [e.g. Operational Taxonomic Units (OTUs) or Amplicon Sequence Variants (ASVs)].
Following data upload, the user can filter, normalize and transform each data matrix using common methods, such as the centered log-ratio (CLR) transformation to deal with the compositional nature of sequencing data, or filtering based on prevalence. In this section, it is also possible to detect and remove potential outlier samples.
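As an illustration of the preprocessing described above, the following minimal Python sketch applies a prevalence filter followed by a CLR transformation. It is not the code used in MiBiOmics (which is written in R); the function names, the pseudocount of 1, the 50% prevalence cut-off and the toy counts are all assumptions made for this example.

```python
import math


def filter_by_prevalence(counts, min_prevalence):
    """Keep features detected (count > 0) in at least `min_prevalence` of samples.

    `counts` maps a feature name to a list of counts, one per sample.
    """
    kept = {}
    for feature, values in counts.items():
        prevalence = sum(1 for v in values if v > 0) / len(values)
        if prevalence >= min_prevalence:
            kept[feature] = values
    return kept


def clr_transform(counts, pseudocount=1.0):
    """Centered log-ratio transform, applied sample by sample.

    Each log(count + pseudocount) is centered on the mean log value of its
    sample, i.e. on the log of the geometric mean of that sample.
    The pseudocount (an assumption of this sketch) avoids log(0).
    """
    features = list(counts)
    n_samples = len(next(iter(counts.values())))
    transformed = {f: [] for f in features}
    for s in range(n_samples):
        logs = [math.log(counts[f][s] + pseudocount) for f in features]
        mean_log = sum(logs) / len(logs)
        for f, value in zip(features, logs):
            transformed[f].append(value - mean_log)
    return transformed


# Hypothetical OTU table: four features measured in three samples.
otu_counts = {
    "OTU_1": [10, 0, 25],
    "OTU_2": [0, 0, 3],
    "OTU_3": [120, 80, 95],
    "OTU_4": [7, 12, 0],
}
filtered = filter_by_prevalence(otu_counts, min_prevalence=0.5)
clr = clr_transform(filtered)
```

In this toy example OTU_2 is detected in only one of three samples and is therefore removed before the transformation. After the CLR transformation, the values of each sample sum to zero, which is the defining property of the transform.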
To allow new users to easily test the functionality of MiBiOmics, we provide two example datasets: the breast cancer dataset from The Cancer Genome Atlas (TCGA) [14], which allows the exploration of associations between miRNAs, mRNAs and proteins in different breast cancer subtypes; and a dataset from the Tara Oceans Expeditions [15, 16], which allows the exploration of prokaryotic community compositions across depths and geographic locations.
Data exploration
In this section, two ordination plots [Principal Component Analysis (PCA), Principal Coordinates Analysis (PCoA)] [17] are dynamically produced to visualize and explore relationships between samples, and to identify main axes of variation in each dataset. When OTUs or ASVs are uploaded with their taxonomic annotations, it is possible to obtain a relative abundance plot describing the proportion of lineages at a given taxonomic level (e.g. Phylum, Family, Genus or Species) in each sample.
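The aggregation behind such a relative abundance plot can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the MiBiOmics implementation: the function name, the dictionary layout of the taxonomy table and the toy values are hypothetical.

```python
from collections import defaultdict


def relative_abundance(counts, taxonomy, level):
    """Sum counts per lineage at `level`, then convert to per-sample proportions.

    `counts` maps a feature (OTU/ASV) to a list of counts, one per sample.
    `taxonomy` maps a feature to a dict of taxonomic ranks.
    """
    n_samples = len(next(iter(counts.values())))
    totals = defaultdict(lambda: [0.0] * n_samples)
    for feature, values in counts.items():
        lineage = taxonomy[feature][level]
        for s, value in enumerate(values):
            totals[lineage][s] += value
    sample_sums = [sum(t[s] for t in totals.values()) for s in range(n_samples)]
    return {
        lineage: [
            value / sample_sums[s] if sample_sums[s] else 0.0
            for s, value in enumerate(values)
        ]
        for lineage, values in totals.items()
    }


# Hypothetical ASV counts for two samples, with phylum-level annotations.
asv_counts = {"ASV_1": [30, 10], "ASV_2": [20, 30], "ASV_3": [50, 60]}
asv_taxonomy = {
    "ASV_1": {"Phylum": "Proteobacteria"},
    "ASV_2": {"Phylum": "Proteobacteria"},
    "ASV_3": {"Phylum": "Cyanobacteria"},
}
proportions = relative_abundance(asv_counts, asv_taxonomy, level="Phylum")
```

Each list in `proportions` holds one value per sample, and the proportions of all lineages sum to one within a sample, which is what a stacked bar plot displays.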
Network inference
The network inference section allows the user to perform a Weighted Gene Co-expression Network Analysis (WGCNA [18]). Help sections are available to assist the user with parametrization, notably for optimizing the scale-free topology of the network. Here, WGCNA networks can be inferred for each uploaded omics dataset. We strongly advise users to read the original WGCNA publication and associated tutorials before this step of the analysis.
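To make the role of the main parameter more concrete, the sketch below illustrates soft thresholding as used for unsigned WGCNA networks: absolute pairwise correlations are raised to a power, which shrinks weak correlations much more than strong ones. This is a simplified Python illustration, not the WGCNA R code used by MiBiOmics; the function names and toy profiles are assumptions, and the scale-free topology fit itself (performed by `pickSoftThreshold` in the WGCNA R package) is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(features, power):
    """Unsigned adjacency: a_ij = |cor(x_i, x_j)| ** power, with a_ii = 1."""
    names = list(features)
    return {
        a: {
            b: 1.0 if a == b else abs(pearson(features[a], features[b])) ** power
            for b in names
        }
        for a in names
    }


def connectivity(adjacency):
    """Sum of adjacencies of each feature to all other features."""
    return {
        a: sum(weight for b, weight in row.items() if b != a)
        for a, row in adjacency.items()
    }


# Hypothetical profiles of three features across five samples.
profiles = {
    "gene_1": [1, 2, 3, 4, 5],
    "gene_2": [2, 4, 5, 8, 10],
    "gene_3": [5, 1, 4, 2, 3],
}
k_power_1 = connectivity(soft_threshold_adjacency(profiles, power=1))
k_power_6 = connectivity(soft_threshold_adjacency(profiles, power=6))
```

Comparing `k_power_1` and `k_power_6` shows that a higher power mostly removes the contribution of the weakly correlated feature (`gene_3`), while the strong link between `gene_1` and `gene_2` is largely preserved.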
Network exploration
The network exploration section allows the user to compute and explore significant associations between subnetworks or modules (e.g. of genes, transcripts, metabolites) and communities (of lineages) delineated from each omics layer, which contain highly correlated features. Each module is associated with all external parameters provided in the annotation table, and the correlations are visualized as a heatmap (Fig. 2a). Modules associated with parameters of interest can be further analyzed. The user can also identify which samples contribute the most to the delineation of a specific module (Fig. 2b), using a method provided by the WGCNA R package, which computes module eigengenes and allows the relative contribution of a given sample to the inference of a module to be quantified. If an OTU/ASV table is provided with taxonomic annotations, the relative abundance of lineages contributing to each module can be visualized as bar plots.
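The module-parameter association described above can be sketched in a few lines. In the following Python illustration, a module is summarized by the first principal component of its standardized features (one value per sample), which is then correlated with an external parameter. This is an assumption-laden sketch, not the WGCNA or MiBiOmics code: the power-iteration solver, the function names and the toy data are choices made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(values):
    """Center a profile on zero and scale it to unit sample variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def module_summary(module, n_iter=200):
    """First principal component of a module, one value per sample.

    Computed by power iteration on the sample-by-sample Gram matrix of the
    standardized features. Starting from the mean standardized profile keeps
    the sign of the result aligned with the average profile of the module
    (this start fails if that mean profile is all zeros).
    """
    z = [standardize(values) for values in module.values()]
    n_samples = len(z[0])
    gram = [
        [sum(f[i] * f[j] for f in z) for j in range(n_samples)]
        for i in range(n_samples)
    ]
    vec = [sum(f[i] for f in z) / len(z) for i in range(n_samples)]
    for _ in range(n_iter):
        vec = [
            sum(gram[i][j] * vec[j] for j in range(n_samples))
            for i in range(n_samples)
        ]
        norm = math.sqrt(sum(v * v for v in vec))
        if norm == 0:
            raise ValueError("degenerate module: cannot compute a summary profile")
        vec = [v / norm for v in vec]
    return vec


# Hypothetical module of three features measured in six samples, and a trait.
module = {
    "feature_1": [1.0, 2.1, 2.9, 4.2, 5.1, 5.8],
    "feature_2": [2.0, 2.5, 3.9, 4.1, 5.5, 6.3],
    "feature_3": [0.5, 1.4, 1.6, 2.9, 3.1, 4.0],
}
ph = [6.1, 6.4, 6.9, 7.2, 7.6, 8.0]

summary = module_summary(module)
module_trait_correlation = pearson(summary, ph)
```

The values in `summary` indicate where each sample lies along the main axis of variation of the module, and `module_trait_correlation` corresponds to one cell of a module-parameter heatmap.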
In addition, Orthogonal Partial Least Squares (OPLS) regressions [19] can be performed using the features of a selected module as predictors in order to estimate the capacity of the module to predict a given contextual parameter; these regressions are useful to cross-validate a module-parameter association. The results of this analysis are represented as hive plots with two axes. On the x-axis, the module features are ordered according to their Variable Importance in Projection (VIP) score (a measure of their weight in the OPLS regression), while on the y-axis they are ordered according to their correlation with an external parameter of interest (Fig. 2c).
Multi-omics analysis
Here, MiBiOmics allows users to detect and study associations across omics datasets. Multivariate statistical tools, including Procrustes analysis [17] and multiple co-inertia analysis [20], are used to compute and visualize the main axes of covariance, to extract the multi-omics features driving this covariance, and to assess how the distributions of the multi-omics datasets compare.
This central section of MiBiOmics implements an innovative approach for detecting robust links between omics layers. Building upon the WGCNA pipeline, we provide an applied methodology to link groups of variables of different omics types to external variables capturing a trait of interest. To do so, all modules delineated within each omics-specific network are associated with each other by directly correlating their eigenvectors. The dimensionality reduction of each omics dataset through module definition keeps the number of correlations to be tested small, which limits the multiple-testing burden and thereby increases the statistical power for detecting significant associations between omics layers.
For visualization, a hive plot summarizes significant associations between modules as a multilayer network integrating links between omics-specific modules as well as their associations with contextual parameters (traits or phenotypic characteristics). In this hive plot, each axis represents the network of a given omics layer. Corresponding modules are ordered on the axes according to their association with a contextual parameter of interest selected by the user. Modules with no significant associations are not depicted. Significant associations between omics-specific modules are represented, and individual associations between modules can also be visualized as heatmaps and data frames. Conveniently, the user can also select modules of interest to investigate pairwise correlations between the features of these modules, and delineate groups of modules associated with each other and with an external parameter of choice.
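The module-to-module association step can be illustrated with the following Python sketch, in which the summary profiles of the modules of two omics layers are correlated pairwise and the resulting tests are corrected for multiplicity. The permutation test and the Bonferroni correction are assumptions chosen to keep the example self-contained; they are not necessarily the procedures used in MiBiOmics, and all names and values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def cross_layer_associations(layer_a, layer_b, alpha=0.05):
    """Correlate every module of layer A with every module of layer B.

    Returns tuples (module_a, module_b, r, p, significant), where significance
    uses a Bonferroni threshold over the number of module pairs.
    """
    n_tests = len(layer_a) * len(layer_b)
    results = []
    for name_a, profile_a in layer_a.items():
        for name_b, profile_b in layer_b.items():
            r = pearson(profile_a, profile_b)
            p = permutation_pvalue(profile_a, profile_b)
            results.append((name_a, name_b, r, p, p <= alpha / n_tests))
    return results


# Hypothetical module summary profiles for eight shared samples.
mrna_modules = {
    "mRNA_M1": [1, 2, 3, 4, 5, 6, 7, 8],
    "mRNA_M2": [1, 1, -1, -1, -1, -1, 1, 1],
}
protein_modules = {
    "protein_M1": [1.2, 1.9, 3.1, 3.8, 5.3, 5.9, 7.2, 7.8],
    "protein_M2": [1, -1, -1, 1, 1, -1, -1, 1],
}
associations = cross_layer_associations(mrna_modules, protein_modules)
significant = {(a, b) for a, b, r, p, sig in associations if sig}
```

With two modules per layer, only four correlations are tested, whereas correlating, for instance, 1,000 features of one layer with 1,000 features of another would require one million tests and a correspondingly stricter correction.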
Following the identification of multi-omics modules related to a parameter of interest, the user can further investigate the pairwise correlations between variables of both modules inferred from different omics layers through the bipartite network represented in Fig. 3c or with the correlation heatmap.
Herein, we developed and implemented a novel multi-omics integration tool called multi-WGCNA. By reducing the dimensionality of each omics dataset in order to increase statistical power, multi-WGCNA is able to efficiently detect robust associations across omics layers. In addition, these multi-omics associations are linked to external traits (categorical or continuous) within a network of features from which robust biomarkers can be extracted. We also implemented new visualization graphics to represent these multi-omics associations, an important addition in our opinion since representing multilayer associations is often challenging. Importantly, all figures generated by the application (PCA, PCoA, relative abundance plots, WGCNA outputs, hive plots, multiple co-inertia, Procrustes plots, correlograms, bipartite networks) can be downloaded (as svg or pdf files), and network features can be downloaded as csv files (WGCNA modules information, eigenvalues and co-inertia drivers).