Multi-omics experimental approaches are becoming common practice in biological and medical sciences underlining the need to design new integrative techniques and applications to enable the multi-scale characterization of biological systems. The integrative analysis of heterogeneous datasets generally allows to acquire additional insights and generate novel hypotheses about a given biological system. However, it can become challenging given the often-large size of omics datasets and the diversity of existing techniques. Moreover, visualization tools for interpretation are usually non-accessible to biologists without programming skills.
Here, we present MiBiOmics, a web-based and standalone application that facilitates multi-omics data visualization, exploration, integration, and analysis by providing easy access to dedicated and interactive protocols. It implements classical ordination techniques and the inference of omics-based (multilayer) networks to mine complex biological systems, and identify robust biomarkers linked to specific contextual parameters or biological states.
The multi-scale characterization of biological systems is extending our knowledge about the functioning of organisms and natural ecosystems. Today, their multi-omics characterization is becoming standard, thus novel methodologies and easily accessible tools are required to facilitate the study of associations and interactions within and across omics layers [e.g. (meta-)genome, (meta-) transcriptome, metabolome] and scales (e.g. cells, organs, holobionts, communities). The analysis of single omics datasets has helped to identify molecular signatures associated to phenotypes of interest . However, it usually does not allow to predict mechanisms underlying phenotypic variabilities . Although multi-omics information is not sufficient to identify causes and consequences of a biological process, it can contribute to delineate key players sustaining it . Indeed, exploring a biological system across several omics layers enable to capture additional sources of variability associated with a variation of interest and potentially to infer the sequence of events leading to a specific process or state . Within the last decade, multi-omics integrative approaches have been applied across various fields including microbial ecology , genetics  and personalized medicine . As of today, several integrative methods have been developed, but are often specific to a given experimental design, data type or a precise biological question . Indeed, tools such as MONGKIE , are based on prior knowledge and integrate data by projecting them on known metabolic networks and biological pathways. More generally, existing multi-omics pipelines are focusing on certain data types (Metabolomics with MetaboAnalyst ) or on disease-related mechanisms (MergeOmics ). More widely applicable methods exist, such as the R package mixOmics  that provides several semi-supervised methodologies often based on ordination techniques. Considering the multiplicity of existing techniques, the selection of an appropriate workflow is challenging for biologists, especially when it comes to the representation of several system-level omics layers and its interpretation. There is a clear need for accessible (web) tools to facilitate the integration, analysis and representation of multi-omics datasets through an intuitive and guided approach.
MiBiOmics aims to provide established and novel techniques to reveal robust signatures in high dimensional datasets  through a graphical user interface allowing to perform widely applicable multi-omics analyses for the detection and description of associations across omics layers. Available as a web-based and a stand-alone application, it gives access to several R packages and tools to help users who are not familiar with programming to load and explore their data in a simple and intuitive way. MiBiOmics allows the parallel study of up to three omics datasets, as well as the in-depth exploration of each single dataset. It also provides easy access to exploratory ordination techniques and to the inference of (multilayer) correlation networks enabling useful dimensionality reduction and association to contextual parameters. The user can then compare results from these different approaches and cross-validate multi-omics signatures to generate confident novel hypotheses.
MiBiOmics is implemented in R (Version 3.6.0) as a Shiny app providing an interactive interface to perform each step of a single- or multi-omics data analysis (Fig. 1). MiBiOmics is also accessible as a standalone application that can be easily installed via Conda (Version 4.6.12). The application is divided into five sections as described below:
Within MiBiOmics, the user can upload up to three omics datasets, allowing the data exploration and network analysis of a single- or multi-omics dataset. There must be common samples between omics datasets in order to perform all analyses provided by the application. An annotation table describing external parameters (e.g. pH, site of extraction, physiological measures) needs to be provided. These parameters may be quantitative or qualitative, and available for each sample. An additional taxonomic annotations table can be uploaded when one omics table corresponds to microbial lineages [e.g. as Operational Taxonomic Units (OTUs) or Amplicon Sampling Variants (ASVs)].
Following data upload, the user can filter, normalize and transform each data matrix using common methods, such as the center log ratio (CLR) transformation to deal with the compositional nature of sequencing data, or filtration based on prevalence. In this section, it is also possible to detect and remove potential outlier samples.
To allow new users to easily test the functionality of MiBiOmics, we provide two example datasets: the breast TCGA datasets from The Cancer Genome Atlas  allows to explore associations between miRNAs, mRNAs and proteins in different breast cancer subtypes; and a dataset from the Tara Oceans Expeditions [15, 16] to explore prokaryotic community compositions across depth and geographic locations.
In this section, two ordination plots [Principal Component Analysis (PCA), Principal Coordinates Analysis (PCoA)]  are dynamically produced to visualize and explore relationships between samples, and to identify main axes of variation in each dataset. When OTUs or ASVs are uploaded with their taxonomic annotations, it is possible to obtain a relative abundance plot describing the proportion of lineages at a given taxonomic level (e.g. Phylum, Family, Genus or Species) in each sample.
The network inference section allows to perform a Weighted Gene Correlation Network Analysis (WGCNA ). Help sections are available to assist the user with parametrization, notably for optimizing the scale-free topology of the network. Here, WGCNA networks can be inferred for each uploaded omics dataset. We strongly advise users to read the WGCNA original publication and associated tutorials for this step of the analysis.
The network exploration section allows to compute and explore significant associations between subnetworks or modules (e.g. of genes, transcripts, metabolites), and communities (of lineages) delineated from each omics layer, which contain highly correlated features. Each module is associated to all external parameters provided in the annotation table and correlations are visualized as a heatmap (Fig. 2a). Modules associated to parameters of interest can be further analyzed. The user can also identify which samples are contributing the most to the delineation of a specific module (Fig. 2b), a method provided by the WGCNA R package, which computes modules eigenvalues and allows to quantify the relative contribution of a given sample to the inference of a module. In case an OTUs/ASVs table is provided with taxonomic annotations, the relative abundance of lineages contributing to each module can be visualized as bar plots.
In addition, OPLS (Orthogonal Partial Least Square) regressions  can be performed using a selected module component as features in order to estimate its capacity to predict a given contextual parameter, and are useful to cross-validate a module-parameter association. The results of this analysis are represented as hive plots with two axes. On the x-axis, the module features are ordered according to their Variable Importance Projection (VIP) score (a measure of their weight in the OPLS regression), while on the y-axis they are ordered according to their correlations to an external parameter of interest (Fig. 2c).
Here, MiBiOmics allows users to detect and study associations across omics datasets. Multivariate statistical tools including Procrustes analysis  and multiple co-inertia  are useful to compute and visualize the main axes of covariance, to extract multi-omics features driving this covariance, and to assert how the distribution of multi-omics sets can be compared. This central section of MiBiOmics implements an innovative approach for detecting robust links between omics layers. Building upon the WGCNA pipeline we innovate here by providing an applied methodology to link groups of variables from different omics nature to external variables capturing a trait of interest. To do so, all modules delineated within each omics-specific network are associated to each other by directly correlating their eigenvectors. Here, the dimensionality reduction of each omics dataset through module definition ensures a small number of correlations, thereby increasing the statistical power for detecting significant associations between omics layers. For visualization, a hive plot helps summarizing significant associations between each module as a multilayer network integrating links between omics-specific modules as well as their association to contextual parameters (traits or phenotypic characteristics). In this hive plot, each axis represents the network of a given omics layer. Corresponding modules are ordered on the axes according to their association to a contextual parameter of interest selected by the user. Modules with no significant associations are not depicted. Significant associations between omics-specific modules are represented, and individual associations between modules can also be visualized as heatmaps and data frame. Conveniently, the user can also select modules of interest to investigate pairwise correlations between modules' features and delineate groups of modules associated together and to an external parameter of choice. Following the identification of multi-omics modules related to a parameter of interest, the user can further investigate the pairwise correlations between variables of both modules inferred from different omics layers through the bipartite network represented in Fig. 3c or with the correlation heatmap.
Herein, we developed and implemented a novel multi-omics integration tool called multi-WGCNA. By reducing the dimensionality of each omics dataset in order to increase statistical power, multi-WGCNA is able to efficiently detect robust associations across omics layers. In addition, these multi-omics associations are linked to external traits (categorical or continuous) into a network of features for extracting robust biomarkers. We also implemented new visualization graphics to represent these multi-omics associations, an important addition in our opinion since representing multilayer associations is often challenging. Importantly, all figures generated by the application (PCA, PCoA, relative abundance plots, WGCNA outputs, hive plots, multiple co-inertia, Procrustes plots, correlograms, bipartite networks) can be downloaded (as svg or pdf files), as well as network features as csv files (WGCNA modules information, eigenvalues and co-inertia drivers).
Results and Discussion
MiBiOmics enables the exploration, integration, analysis and visualization of up to three omics datasets. Through the primary exploration of a dataset, the inference of biological networks and the extraction of multi-omics associated features, the application provides a ready-to-use analysis pipeline to interactively explore sources of variability and variables of interest in a given biological dataset, as well as associations between multi-omics features in multi-scale studies.
The inference of networks from omics features is useful to represent and model the complex architecture of putative interactions in biological systems. In addition, networks provide a way to reduce the dimensionality of a dataset by delineating cohesive groups of co-varying, often functionally related features, that can then be associated to contextual or phenotypic characteristics of interest . A key functionality of MiBiOmics is the multi-omics adaptation of WGCNA  to explore association across omics datasets via a network-based approach. As shown in Fig. 2a, the interface provides the ability to interactively probe associations in each omics layers of different breast cancer subtypes  within each network and their association to patient parameters. We further used these associations to external parameters to infer relation across multi-omics modules. The original WGCNA outputs are provided by the application to deepen the analysis between modules and external parameters (Fig. 2b). In addition, we provide the user with the possibility to perform an OPLS regression for modules of interest to evaluate the robustness of these variables to predict a given trait or phenotype. Figure 2c is an example of an OPLS regression using WGCNA module variables as features. On the x-axis the features of the red module are ordered according to their VIP score (their importance for the module), and on the y-axis according to their correlation to the subtype parameter. This figure highlights how central features of a WGCNA module relate to an external parameter.
The exploratory multi-omics analysis allows to study the main axes of covariance across omics profiles and give the ability to discover and select variables implicated in an association between omics datasets. The concomitant application of (multiple) co-inertia (Fig. 3a) and/or Procrustes multivariate techniques with the exploration of multi-omics correlations between WGCNA modules of distinct omics layer (Fig. 3b), provides a complementary vision of multi-omics relationships. The MiBiOmics interface allows to explore WGCNA modules of interest to directly infer significant associations between features from distinct omics layers (Fig. 3c). In a multi-omics adaptation, WGCNA can be used to delineate a group of modules associated together and to a parameter of interest and extract features of different omics nature but related to each other. While an interactive version of WGCNA already exists , MiBiOmics goes beyond by providing a multi-omics strategy to identify correlated modules across omics layers and generate novel hypotheses. Associating modules across different datasets has already been performed in the original WGCNA article  and reproduced in several studies. For example, the overlap of modules between transcriptional profiles of different tissue  was assessed, as well as a comparison between proteomics and gene expression profile of modules in a cohort of Alzheimer patients . In both cases, the association between modules was determined by overlapping identical features (e.g. same genes in a given reference genome) within each module, a method which is not applicable when omics datasets do not contain similar data types or refer to the same biological system. In MiBiOmics, we enable the inference of relationships between omics layers within an entire biological system (e.g. holobiont) or ecosystem (e.g. the plankton), which makes it more widely applicable and especially suited for omics-based environmental studies.
We compared methods integrated in MiBiOmics (see Additional file 1 for details) to the mixOmics DIABLO methodology . Within MiBiOmics, the multiple co-inertia analysis and the multi-WGCNA procedures provide the user with two integrative and exploratory methods, which can be applied to any type of data, and associated to not only categorical traits, but also quantitative traits. To highlight the complementarity of our application with DIABLO, we performed an in-depth comparison of biomarkers extracted by each method when analyzing the TCGA dataset. Only few multi-omics features associated to breast cancer subtypes in the TCGA dataset were extracted by all three methods (n = 32, Fig. 4a). Both methods integrated in MiBiOmics (i.e. multiple co-inertia and multi-omics WGCNA) and DIABLO extracted mostly distinct features (Fig. 4a) underlining the probable complementarity of these multi-omics integrative strategies. Scores attributed by each method to the common set of extracted features were also dissimilar (Fig. 4b–d and Additional file 1: Table S1). This may be explained by the fact that these methods implement fundamentally different approaches to features extraction and selection, which confirms the complementary nature of each analysis. For comparing the predictive power of models integrating features extracted by each method, we performed Sparse Partial Least Square Discriminant Analysis (sPLS-DA) and computed the corresponding mean AUC scores (Fig. 4a and Additional file 1: Figure S1). All models can be considered to be highly predictive of the cancer subtype phenotypes, with the miBiOmics multi-omics WGCNA methodology obtaining the highest AUC score (AUC = 0.9945), while the multiple co-inertia analysis performed very well too (AUC = 0.9903). Features extracted by the DIABLO method from mixOmics resulted into a lowest score (AUC = 0.9808) but remained highly predictive. Generally, these methods may benefit from an enrichment method applied to the list of extracted drivers .
Through a gene-disease functional enrichment analysis (see Additional file 1), only the multi-WGCNA and multiple co-inertia methods were able to extract several biomarkers significantly associated to breast cancer while DIABLO found no mRNA related to breast cancer (Fig. 5a). In proportion, MiBiOmics tools extracted more mRNAs related to several stage of breast cancer development or tumor type (Fig. 5a). Some of these terms, such as the Carcinoma breast stage IV, were only retrieved by mRNAs extracted via multi-WGCNA. Also, the results obtained with the multiple co-inertia were more specific with close to 40% of mRNAs related to breast cancer (Fig. 5a). We performed a similar analysis on extracted miRNA features by retrieving their targeted genes. Similarly, for subsets of validated gene targets, we performed a functional enrichment analysis to find their association to diseases (Fig. 5b). Here, most breast cancer associated terms were found by all three methods. Notably, both multi-WGCNA and multiple co-inertia analyses were also able to highlight specific annotations related to male disposition in breast cancer or basal-like phenotype of breast tumor. The ratio of breast cancer related terms against other pathologies related terms was low for all methods but may be explained by the generally wide targeting nature of miRNAs. The functional enrichment analysis on extracted proteins by DIABLO (mixOmics), multi-WGCNA and multiple co-inertia analysis (MiBiOmics) (Fig. 5c) was also performed, and most of the breast related annotations were found by all three methods. However, DIABLO extracted several proteins associated to additional terms related to different stage of breast cancer evolution, while multi-WGCNA extracted the highest proportion of breast cancer related proteins compare to other pathologies.
This comparison of disease related annotations of extracted features showed the complementarity of the three methods. While the analyses extracted mostly different features related to tumor type, all of them were found highly predictive of the tumor subtype and were often associated to the same disease. Features extracted exclusively by one of the three method also participated in the enrichment of specific breast cancer stage annotations, and highlighted the potential of these methods in complementing each other in the analysis and characterization of multi-omics associations. We also evaluated the potential of each method to extract mRNAs, miRNAs, and proteins related to breast cancer annotations by computing the accuracy, recall, and F1-score for each method and each omics data type (See Additional file 1 and Additional file 1: Table S1). The performance of each method was found dependent of the nature of the data and both pipelines performed differently in terms of accuracy to extract features associated to breast cancer. Overall, the multi-WGCNA approach was found more accurate with regards to the mRNA features extraction associated to breast cancer, while DIABLO was found more accurate in extracting proteins associated to breast cancer (Additional file 1: Table S1).
Overall, MiBiOmics provides two complementary methods to extract associated variables between omics layers and in relationship with a trait of interest. Both multi-WGCNA and multiple co-inertia analyses highlighted specific protein biomarkers that were not identified by DIABLO. For example, both multi-WGCNA and multiple co-inertia analyses highlighted specific annotations related to male disposition in breast cancer or basal-like phenotype of breast tumor. A more specific example is the identification of the SYK protein only by the multi-WGCNA method. The SYK protein appears to have a dual role: depending on the alternative splicing of the mRNA it may act as a pro-oncogene or a tumor suppressor protein, and can interact differentially with its targeted genes . The mechanisms surrounding this dual role of the SYK protein are still being largely studied . Here, the multi-omics hive plots and bi-partite networks provided by MiBiOmics can be useful to generate new hypothesis on the associations and potential interactions between SYK and specific genes and miRNAs linked in the multi-omics network. While MiBiOmics may be useful to generate new hypotheses about molecular processes, it cannot infer causal mechanisms between omics features and phenotypes. This would require experimental validations, which can actually be guided by MiBiOmics results. To provide an exploratory and integrative framework for multi-omics studies, MiBiOmics distinguishes itself by providing a powerful dimensionality reduction and unsupervised method combining both ordination and graph-based techniques, which enables to study complex biological systems as a whole. Importantly, it also integrates contextual information by linking multi-omics signatures to qualitative and quantitative contextual parameters.
MiBiOmics is an interactive web-based (and standalone) application to easily and dynamically explore associations across omics datasets. Through an innovative network-based integrative strategy, it can help biologists to identify putative mechanisms of interactions and generate novel hypotheses. The core of the application lies behind the reduction of dimensionality across omics datasets to efficiently link them at the molecular level, and to identify biomarkers associated with a given trait or phenotype. The MiBiOmics pipeline facilitates the exploration, integration, and analysis of multi-omics datasets to a broad audience by providing scientists a powerful way to predict and explore putative molecular mechanisms underlying complex phenotypes across a wide range of biological scales and systems.
Heintz-Buschart A, May P, Laczny CC, Lebrun LA, Bellora C, Krishna A, et al. Integrated multi-omics of the human gut microbiome in a case study of familial type 1 diabetes. Nat Microbiol. 2016;2:1–12. https://doi.org/10.1038/nmicrobiol.2016.180.
Shu L, Zhao Y, Kurt Z, Byars SG, Tukiainen T, Kettunen J, et al. Mergeomics: Multidimensional data integration to identify pathogenic perturbations to biological systems. BMC Genomics. 2016;17:1–16. https://doi.org/10.1186/s12864-016-3198-9.
Sunagawa S, Coelho LP, Chaffron S, Kultima JR, Labadie K, Salazar G, et al. Ocean plankton. Structure and function of the global ocean microbiome. Science (New York, NY). 2015;348:1261359. https://doi.org/10.1126/science.1261359.
Xiao X, Moreno-Moral A, Rotival M, Bottolo L, Petretto E. Multi-tissue analysis of co-expression networks by higher-order generalized singular value decomposition identifies functionally coherent transcriptional modules. PLoS Genet. 2014;10:e1004006.
Lamb DJ, Rust A, Rudisch A, Glüxam T, Harrer N, Machat H, et al. Inhibition of SYK kinase does not confer a pro-proliferative or pro-invasive phenotype in breast epithelium or breast cancer cells. Oncotarget. 2020;11:1257–72.
The authors wish to thank Mélanie Fouesnard, Damien Eveillard, Philippe Bordron, Gaëlle Boudry, Catherine Michel and Simon Beck for their useful feedback while testing and using the application. We also thank the bioinformatics core facility of Nantes (BiRD—Biogenouest) for providing computing resources and support.
This work has received financial support from the Region Pays de la Loire (MiBioGate 2016-11179 to M.N.), the French National Institute of Health and Medical Research, and the CNRS through the MITI interdisciplinary program Modélisation du Vivant [GOBITMAP to S.C.].
Authors and Affiliations
INSERM, TENS, Université de Nantes, Nantes, France
Johanna Zoppi & Michel Neunlist
CHU Nantes, Inserm, CNRS, SFR Santé, Inserm UMS016, CNRS UMS 3556, Université de Nantes, 44000, Nantes, France
CNRS UMR6004, LS2N, Université de Nantes, 44000, Nantes, France
Research Federation (FR2022) Tara Oceans GO-SEE, Paris, France
JZ and SC participated equally in the design and development of MiBiOmics as a standalone web-application and writing the manuscript. MN contributed to the writing of the manuscript. JFG was responsible for the distribution and maintenance of MiBiOmics webserver at https://shiny-bird.univ-nantes.fr/app/Mibiomics. All authors read and approved the final version of the manuscript.
Supplementary material (methods, table S1, and figure S1).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Zoppi, J., Guillaume, JF., Neunlist, M. et al. MiBiOmics: an interactive web application for multi-omics data exploration and integration.
BMC Bioinformatics22, 6 (2021). https://doi.org/10.1186/s12859-020-03921-8