MBECS: Microbiome Batch Effects Correction Suite

Olbrich, Michael; Künstner, Axel; Busch, Hauke

doi:10.1186/s12859-023-05252-w

Software
Open access
Published: 03 May 2023

BMC Bioinformatics volume 24, Article number: 182 (2023)

Abstract

Despite the availability of batch effect correcting algorithms (BECAs), no comprehensive tool that combines batch correction and evaluation of the results exists for microbiome datasets. This work outlines the development of the Microbiome Batch Effects Correction Suite (MBECS), which integrates several BECAs and evaluation metrics into a software package for the R statistical computing environment.

Introduction

Unwanted variation in next-generation sequencing applications is a well-researched challenge. Batch effects (BEs) are a particular form of unwanted technical variation that can result from any distinct grouping of samples during the processing steps. The introduced variability reflects differences in, for example, environmental conditions, batches of reagents, sequencing machines, or sample handling between batches [1, 2]. Consequently, unwanted variation can negatively affect downstream statistical analyses because it acts as a confounding factor that can obscure or exaggerate the biological signal in a dataset [3]. The extensive research into the causes of batch effects and into strategies for preventing and correcting them indicates the importance of this topic [4, 5]. While appropriate measures during the planning and execution of an experiment can limit the emergence and magnitude of batch effects, they are not entirely preventable and thus need to be accounted for before statistical analyses [6]. Despite the availability of batch effect correcting algorithms (BECAs) and instructive guides on mitigating BEs [4], no comprehensive tool that combines batch correction and evaluation of the results exists for microbiome datasets. This work introduces the Microbiome Batch Effects Correction Suite (MBECS), which integrates several established BECAs and evaluation metrics into a software package for the R statistical computing environment.

Features

The Microbiome Batch Effects Correction Suite is designed as a software toolbox that enables users to estimate the severity of batch effects, facilitates the use of different BECAs, and provides comparative metrics to evaluate the success of each method. To that end, the package offers a convenient 5-step workflow that produces a report to guide the user in selecting the optimal results for downstream analyses.

The software builds upon the phyloseq [7] package, which facilitates the intuitive import and export of existing microbiome datasets and enables the use of other count-based datasets. The package's data object extends the phyloseq class with additional fields that store normalized and batch-corrected feature abundance tables. All operations are performed on this single data object, which keeps track of the results, promotes tidy scripts, and enables the comparative reporting in MBECS.

The normalization methods implemented in MBECS are total-sum scaling (TSS) and centered log-ratio transformation (CLR) [8]. Available BECAs include established correction algorithms such as ComBat from the sva package [9], removeBatchEffect from the limma package, and Remove Unwanted Variation 3 (RUV-3) as implemented in the ruv package [10]. Additionally, the package implements batch mean centering, percentile normalization, and singular value decomposition as correction approaches [11].
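To make the two normalization methods concrete, the following is a minimal Python sketch of TSS and CLR applied to the counts of a single sample. It is not the MBECS implementation (the package itself is written in R); the function names, the example counts, and the pseudocount used to avoid taking the logarithm of zero are assumptions made for illustration.

```python
import math


def tss(counts):
    """Total-sum scaling: divide each count by the sample total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no counts")
    return [c / total for c in counts]


def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log of each value minus the sample's mean log.

    The pseudocount is an assumption of this sketch; it avoids log(0)
    for features that were not observed in the sample.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


sample = [10, 0, 30, 60]  # hypothetical feature counts for one sample
print(tss(sample))        # proportions that sum to 1
print(clr(sample))        # log-ratios that sum to 0
```

TSS maps every sample onto proportions, whereas CLR expresses each feature relative to the geometric mean of the sample, so the transformed values of a sample always sum to zero.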

Quantifying the variability in a dataset that can be attributed to batch effects is not trivial. A relative log expression (RLE) plot, for example, can indicate the presence of batch effects, yet it is not a suitable approach for determining whether they have been removed successfully by a correction algorithm [12]. Thus, the suite implements several distinct metrics to provide the user with comprehensive information to assess the severity of BEs before and after batch-correction procedures. Available methods include constructing linear models from recorded biological and batch factors to estimate the variability attributed to batch effects before and after the correction procedures. Further approaches implemented are partial redundancy analysis and principal variance components analysis [13, 14]. Finally, the silhouette coefficient quantifies how well samples fit their respective biological groupings [15].
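As an illustration of the last metric, the sketch below computes the silhouette value s(i) = (b - a) / max(a, b) for each sample, where a is the mean distance to the other samples of the same group and b is the smallest mean distance to the samples of any other group. This is a generic Python rendering of the textbook definition, not the code used by MBECS; the function name, the Euclidean distance, and the toy coordinates are assumptions.

```python
import math


def silhouette_values(points, labels):
    """Return the silhouette value of every sample for the given grouping."""
    if len(set(labels)) < 2:
        raise ValueError("need at least two groups")

    def dist(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

    values = []
    for i, p in enumerate(points):
        # Distances from sample i to every other sample, keyed by group.
        by_group = {}
        for j, q in enumerate(points):
            if j != i:
                by_group.setdefault(labels[j], []).append(dist(p, q))
        own = by_group.pop(labels[i], None)
        if own is None:
            values.append(0.0)  # a group of size one gets s = 0 by convention
            continue
        a = sum(own) / len(own)                              # cohesion
        b = min(sum(d) / len(d) for d in by_group.values())  # separation
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values


# Hypothetical samples in a two-dimensional ordination space.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
groups = ["case", "case", "control", "control"]
batches = ["b1", "b2", "b1", "b2"]

print(silhouette_values(points, groups))   # close to 1: samples fit their group
print(silhouette_values(points, batches))  # negative: no clustering by batch
```

In this toy example the samples separate by biological group and not by batch, which is the pattern one would hope to see after a successful correction.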

The package's native workflow, depicted in Fig. 1, creates a preliminary report upon importing the dataset. This report summarizes the covariate information, the distribution of samples across biological groups and known batches, heatmaps and box plots of the features that vary most with the batch factor, and relative log expression plots. The preliminary report also provides the metrics mentioned above to assess variability in the uncorrected data. Based on this account, the user can decide whether batch correction is required. The subsequent processing step allows the application of selected correction methods depending on the experimental design. Methods like RUV-3 specifically require technical replicates in different batches to work; batch mean centering is only applicable to datasets that comprise two-factor biological groupings, i.e., case–control studies [10]. Therefore, it is up to the user which methods to use, and all correction results are stored within the data object.
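The idea behind batch mean centering, one of the correction methods named above, can be sketched in a few lines of Python: for a single feature, the mean of each batch is subtracted from the values of the samples in that batch. This is a simplified illustration under stated assumptions, not the MBECS code; the function name and the example values are hypothetical, and the input is assumed to be normalized abundances of one feature.

```python
def batch_mean_center(values, batches):
    """Subtract the per-batch mean of one feature from each sample's value."""
    sums, counts = {}, {}
    for value, batch in zip(values, batches):
        sums[batch] = sums.get(batch, 0.0) + value
        counts[batch] = counts.get(batch, 0) + 1
    means = {batch: sums[batch] / counts[batch] for batch in sums}
    return [value - means[batch] for value, batch in zip(values, batches)]


feature = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]  # hypothetical values of one feature
batches = ["A", "A", "A", "B", "B", "B"]

print(batch_mean_center(feature, batches))
# [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

After centering, both batches have a mean of zero for this feature, so the offset between batch A and batch B no longer contributes to the variance.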

The third step constructs the post-correction report. This report provides comparative analyses between the uncorrected data and all the employed correction algorithms. The user can use these to evaluate the correction algorithms in terms of reduced unwanted variability while preserving the biological variation that is investigated with the experimental design. An instructive manual for the package and examples of preliminary and post-correction reports are available as supplemental material accompanying the online article (Additional file 1, Additional file 2, Additional file 3).
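The kind of before-and-after comparison described here can be illustrated with a small Python sketch that computes, for one feature, the share of variance explained by a categorical factor (the between-group sum of squares divided by the total sum of squares). This is a simplified stand-in for the linear-model metrics mentioned earlier, not the report code of MBECS; the function name and all values are hypothetical, and the corrected values are simply the raw values after subtracting each batch mean.

```python
def variance_explained(values, factor):
    """Share of the total variance of one feature explained by a factor."""
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0
    groups = {}
    for value, level in zip(values, factor):
        groups.setdefault(level, []).append(value)
    between_ss = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return between_ss / total_ss


batch = ["A", "A", "A", "A", "B", "B", "B", "B"]
group = ["case", "case", "ctrl", "ctrl", "case", "case", "ctrl", "ctrl"]

raw = [5.0, 5.2, 3.0, 3.2, 9.0, 9.2, 7.0, 7.2]            # hypothetical feature
corrected = [0.9, 1.1, -1.1, -0.9, 0.9, 1.1, -1.1, -0.9]  # batch means removed

for name, data in (("raw", raw), ("corrected", corrected)):
    print(name,
          "batch:", round(variance_explained(data, batch), 3),
          "group:", round(variance_explained(data, group), 3))
```

In this constructed example the batch factor dominates the raw data, while after correction nearly all remaining variance is attributable to the biological grouping, which is the outcome a post-correction comparison is meant to reveal.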

Implementation

The Microbiome Batch Effects Correction Suite is available from Bioconductor as a software package for the R programming environment. The latest development version can be obtained from the GitHub repository.

Availability and requirements

Project name: MBECS: Microbiome Batch Effects Correction Suite
Project home page: http://www.bioconductor.org/packages/release/bioc/html/MBECS.html
Operating system(s): Platform independent
Programming language: R (>= 4.1)
Other requirements: CRAN and Bioconductor packages (methods, magrittr, phyloseq, limma, lme4, lmerTest, pheatmap, rmarkdown, cluster, dplyr, ggplot2, gridExtra, ruv, sva, tibble, tidyr, vegan, stats, utils, Matrix)
License: Artistic-2.0
Any restrictions to use by non-academics: None

Availability of data and materials

The source code is freely available under the Artistic-2.0 license at https://github.com/rmolbrich/MBECS and at https://bioconductor.org/packages/release/bioc/html/MBECS.html. The package's vignette and examples use artificial mock-up data to illustrate workflow and execution. The package vignette and two example reports are available as supplementary data.

References

1. Chen C, et al. Removing batch effects in analysis of expression microarray data: an evaluation of six batch adjustment methods. PLoS ONE. 2011;6:e17238.
2. Čuklina J, et al. Review of batch effects prevention, diagnostics, and correction approaches. In: Matthiesen R, editor. Mass spectrometry data analysis in proteomics. Methods in molecular biology. New York: Springer; 2020. p. 373–87.
3. Goh WWB, et al. Why batch effects matter in omics data, and how to avoid them. Trends Biotechnol. 2017;35:498–507.
4. Wang Y, LêCao KA. Managing batch effects in microbiome data. Brief Bioinform. 2020;21:1954–70.
5. Scherer A, editor. Batch effects and noise in microarray experiments: sources and solutions. Chichester: Wiley; 2009.
6. Zhou L, et al. Examining the practical limits of batch effect-correction algorithms: when should you care about batch effects? J Genet Genomics. 2019;46:433–43.
7. McMurdie PJ, Holmes S. phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE. 2013;8:e61217.
8. Kucera M, Malmgren B. Logratio transformation of compositional data—a resolution of the constant sum constraint. Mar Micropaleontol. 1998;34:117–20.
9. Leek JT, et al. The SVA package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28:882–3.
10. Gagnon-Bartsch JA, Speed TP. Using control genes to correct for unwanted variation in microarray data. Biostatistics. 2012;13:539–52.
11. Gibbons SM et al. Correcting for batch effects in case-control microbiome studies. 17
12. Gandolfo LC, Speed TP. RLE plots: Visualizing unwanted variation in high dimensional data. PLoS ONE. 2018;13:e0191629.
13. Li J, et al. Principal variance components analysis: estimating batch effects in microarray gene expression data. In: Scherer A, editor. Batch effects and noise in microarray experiments. Chichester: Wiley; 2009. p. 141–54.
14. Liu Q. Variation partitioning by partial redundancy analysis (RDA). Environmetrics. 1997;8:75–85.
15. Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53–65.

Acknowledgements

None.

Funding

Open Access funding enabled and organized by Projekt DEAL. Hauke Busch acknowledges funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy (EXC 22167-390884018).

Author information

Axel Künstner and Hauke Busch contributed equally to this work.

Authors and Affiliations

Lübeck Institute for Experimental Dermatology, University of Lübeck, Lübeck, Germany
Michael Olbrich, Axel Künstner & Hauke Busch
Institute for Cardiogenetics, University of Lübeck, Lübeck, Germany
Michael Olbrich & Axel Künstner
Center for Biotechnology, Khalifa University, Abu Dhabi, United Arab Emirates
Michael Olbrich

Contributions

Conceptualization, MO, AK, HB; Methodology, MO; Validation, MO; Formal analysis, MO; Investigation, MO; Resources, MO; Data curation, MO; Writing—original draft preparation, MO; Writing—review and editing, MO, AK, HB; Visualization, MO; Supervision, AK, HB; Project administration, AK, HB; Funding acquisition, HB. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Michael Olbrich or Hauke Busch.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1. Vignette.

Additional file 2. Preliminary report.

Additional file 3. Post-correction report.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Olbrich, M., Künstner, A. & Busch, H. MBECS: Microbiome Batch Effects Correction Suite. BMC Bioinformatics 24, 182 (2023). https://doi.org/10.1186/s12859-023-05252-w

Received: 12 October 2022
Accepted: 20 March 2023
Published: 03 May 2023
DOI: https://doi.org/10.1186/s12859-023-05252-w
