MEPHAS was implemented with various R packages (Additional file 1: AF_list.docx) for statistical operations and the package shiny for the web-based GUI. The MEPHAS web server was built on CentOS 7. The R package mephas was developed with R (version 3.6.1) and RStudio (version 1.2.5001).
Availability and usage
Users can either use the MEPHAS webserver or install the R packages locally according to the following instructions.
The MEPHAS webserver can be accessed at https://alain003.phs.osaka-u.ac.jp/mephas/. It is accessible through current versions of most major web browsers on Windows, macOS (iOS), Android, or Linux. Users can learn to use the webserver by referring to the help and tutorials.
The underlying functions of MEPHAS have been wrapped in the R packages mephas and mephas.tools for users to install locally. The R package mephas.tools contains the subsidiary functions for MEPHAS, while the R package mephas includes the services to activate the interfaces. Both packages are published on GitHub (https://mephas.github.io). Installation relies on the R package remotes [11]. After installing remotes, users can download and install the latest version from GitHub by typing the following commands in the R console (Code 1).
Code 1 Installation of the R packages mephas and mephas.tools from GitHub
> install.packages("remotes")
> remotes::install_github(c("mephas/mephas.tools", "mephas/mephas"), upgrade = "never")
After installation, users need to load the functions in the packages mephas.tools and mephas and call the function mephasOpen to activate the web-based interfaces (Code 2).
Code 2 Function to open a MEPHAS interface
> library("mephas")
> mephasOpen(method = "condist")
The list of methods and contents of the functions in R package mephas can be found in the documentation (https://mephas.github.io/reference/index.html).
When users run the code in the R console, the GUI opens in their default browser. When they run it in RStudio's console, the GUI is shown in an RStudio window. Users can then minimize the console and use only the web-based GUI to input data, choose parameters, and generate results.
Design
To date, MEPHAS supports four categories of statistical analysis (probability, hypothesis testing, regression model, and dimensional analysis), implemented in 12 independent web-based interfaces (Additional file 1: AF_table1.docx). Every interface has a similar construction and design: tabs for the different methods, an input panel on the left, and an output panel on the right (Fig. 1). The general functionalities include data input, parameter configuration, and result output.
Data input
MEPHAS comes with example data for illustration. Each interface offers easy-to-use example data for practice: some appear in the input box on the left-side panel, where users can start with, overwrite, or modify them, and some are embedded datasets that users can select. All data to be used can be previewed in the Data Preview panel.
In addition to choosing or overwriting the example data, users can upload their own data from local devices through the “Upload CSV file” tab. A CSV file is uploaded by following “Browse … ➔ Select data ➔ Open”.
Configuration
After the data are input or uploaded, configuration is required to generate the statistical results. Configuration includes setting parameters and preprocessing the data. To make choosing the parameters of a statistical analysis easier, MEPHAS uses various widgets to control the input: a checkbox gives a binary choice; an input box prompts users to enter values; radio buttons and a select box help users choose among different situations; and a slider bar helps users choose values within a range.
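The widgets named above correspond to standard input controls in the shiny package. The following is a minimal sketch of how such a panel could be assembled, not the MEPHAS source code; all input IDs, labels, default values, and the choice of a one-sample t test as the output are hypothetical.

```r
library(shiny)

# Hypothetical layout: input panel on the left, output panel on the right.
ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      fileInput("file", "Upload CSV file", accept = ".csv"),
      checkboxInput("header", "First row contains variable names", value = TRUE),
      selectInput("var", "Variable to test", choices = NULL),
      numericInput("mu", "Hypothesized mean", value = 0),
      radioButtons("alt", "Alternative hypothesis",
                   choices = c("two.sided", "less", "greater")),
      sliderInput("conf", "Confidence level",
                  min = 0.80, max = 0.99, value = 0.95, step = 0.01)
    ),
    mainPanel(
      tableOutput("preview"),
      verbatimTextOutput("result")
    )
  )
)

server <- function(input, output, session) {
  # Read the uploaded file; req() waits until a file has been chosen.
  dat <- reactive({
    req(input$file)
    read.csv(input$file$datapath, header = input$header)
  })

  # Offer the numeric columns of the uploaded data in the select box.
  observe({
    num_vars <- names(dat())[vapply(dat(), is.numeric, logical(1))]
    updateSelectInput(session, "var", choices = num_vars)
  })

  # Data preview: first rows of the uploaded data.
  output$preview <- renderTable(head(dat()))

  # The result is recomputed whenever any of the inputs changes.
  output$result <- renderPrint({
    req(input$var)
    t.test(dat()[[input$var]], mu = input$mu,
           alternative = input$alt, conf.level = input$conf)
  })
}

# Launch the app only in an interactive R session.
if (interactive()) shinyApp(ui, server)
```

Because the outputs are reactive expressions of the inputs, moving the slider or switching a radio button re-runs the test without any explicit "submit" step.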
MEPHAS provides functions that enable users to preprocess the data. For manually input data, users can alter the variable names when necessary. For regression models and dimensional analyses, users can transform the data, change the type of a variable from numeric to categorical and vice versa, and remove possible outliers. For regression models, the factor levels of a categorical variable can also be changed.
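In plain R, these preprocessing steps reduce to a few base functions. The sketch below uses a small made-up data frame (not a MEPHAS dataset), and it assumes that changing factor levels means changing the reference level, which the text does not state explicitly.

```r
# Hypothetical example data; not a MEPHAS dataset.
dat <- data.frame(
  dose  = c(1, 2, 4, 8, 16, 400),
  group = c(1, 2, 3, 1, 2, 3),
  y     = c(2.1, 2.9, 4.2, 5.1, 5.8, 6.0)
)

# Rename a variable.
names(dat)[names(dat) == "y"] <- "response"

# Transform a variable (here, a log transformation).
dat$log_dose <- log(dat$dose)

# Change a numeric variable to a categorical one, and back again.
dat$group  <- factor(dat$group, labels = c("A", "B", "C"))
group_code <- as.integer(dat$group)

# Make "B" the reference (first) level used by regression models.
dat$group <- relevel(dat$group, ref = "B")

# Remove a row judged to be an outlier (here, the row with dose = 400).
dat_clean <- dat[dat$dose != 400, ]
```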
Results output
MEPHAS presents statistical results in reactive tables and plots. Most results are shown in responsive tables that can be downloaded and that let users page through, filter, search, and sort the values. The plots are interactive: they display values in real time on hover and can be configured and downloaded by users. MEPHAS also provides 3D visualization to support dimensional analyses.
Statistics
To help users find the proper statistical method among the interfaces, MEPHAS provides a flowchart (https://alain003.phs.osaka-u.ac.jp/mephas/MEPHAS_flow.pdf).
Probability
Two interfaces, Continuous Probability Distribution and Discrete Probability Distribution, are designed to visualize commonly used probability distributions; complex theory is therefore not presented. Users can see how the probability distribution curves and cumulative probability curves change as they adjust the parameters, which helps them understand the role of each parameter. In particular, MEPHAS draws a histogram of simulated numbers to illustrate how a simulated sample approximates the true distribution. The simulated numbers can be downloaded. Users can also upload new data and generate distribution plots from them. Supplementary results, such as the mean, the standard deviation, and the area to the left of a percentage point, are also presented.
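The quantities described above can be reproduced with base R distribution functions. The following is a sketch for the normal distribution only; the parameter values, the sample size, and the seed are arbitrary choices, not MEPHAS defaults.

```r
set.seed(1)          # make the simulation reproducible
mu    <- 0           # mean of the normal distribution
sigma <- 1           # standard deviation
x <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = 200)

density_curve    <- dnorm(x, mean = mu, sd = sigma)  # probability density
cumulative_curve <- pnorm(x, mean = mu, sd = sigma)  # cumulative probability

# Percentage point with 97.5% of the area to its left.
x0 <- qnorm(0.975, mean = mu, sd = sigma)

# Simulated numbers; their histogram approaches the density curve as n grows.
sim <- rnorm(1000, mean = mu, sd = sigma)
hist(sim, freq = FALSE, breaks = 30, main = "Simulated vs. theoretical")
lines(x, density_curve)

# Supplementary results computed from the simulated sample.
c(mean = mean(sim), sd = sd(sim))
```

Changing `mu` or `sigma` and re-running the block shifts or stretches both curves, which is the effect the interfaces make visible interactively.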
Hypothesis testing
The hypothesis testing interfaces help users quickly find and apply the method they need. This category includes five interfaces: Parametric T Test for Means, Non-parametric Test for Medians, Test for Binomial Proportions, Test for Contingency Tables, and Analysis of Variance. For each of them, MEPHAS presents an introduction to the methodology, case examples, and the necessary explanations. By following the guidance and the embedded example data, users can locate a suitable test and obtain real-time outputs quickly.
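For readers who prefer the console, each of the five families has a counterpart in R's stats package. The sketch below uses built-in R datasets and one made-up contingency table; whether MEPHAS calls exactly these functions internally is an assumption, not something stated here.

```r
# Parametric t test for means (Welch two-sample test on the built-in sleep data).
res_t <- t.test(extra ~ group, data = sleep)

# Non-parametric test for medians (Wilcoxon rank-sum test, normal approximation).
res_w <- wilcox.test(extra ~ group, data = sleep, exact = FALSE)

# Test for a binomial proportion: 15 successes in 50 trials against p = 0.5.
res_p <- prop.test(x = 15, n = 50, p = 0.5)

# Test for a contingency table (hypothetical 2 x 2 counts).
tab <- matrix(c(30, 20, 15, 35), nrow = 2,
              dimnames = list(exposure = c("yes", "no"),
                              outcome  = c("yes", "no")))
res_c <- chisq.test(tab)

# One-way analysis of variance on the built-in PlantGrowth data.
res_a <- aov(weight ~ group, data = PlantGrowth)
summary(res_a)

# Collect the p-values of the first four tests.
p_values <- c(t = res_t$p.value, wilcoxon = res_w$p.value,
              proportion = res_p$p.value, chisq = res_c$p.value)
```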
Regression model
The interfaces for the regression model include Linear Regression, Logistic Regression, and Survival Analysis. These interfaces separate data preparation from model construction and provide a step-by-step guide to building the statistical models. The first (and compulsory) step is to choose the independent and dependent variables from the prepared dataset. The second step is to decide which additional terms to keep or remove: for example, whether to add interaction terms and a constant, or whether to add terms such as cluster, strata, and frailty (random effect) terms in the Cox regression and the accelerated failure time (AFT) model. Once the formula of the statistical model has been constructed, the third step is to check the formula and click the button to generate results. In the outputs, MEPHAS presents detailed results for parameter estimation and model evaluation. Furthermore, MEPHAS enables users to upload new datasets and derive prediction and evaluation results based on the existing model.
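The three steps map naturally onto R's formula interface. The following is a minimal sketch for linear regression on the built-in mtcars data; the variable choices, the train/new-data split, and the use of root mean squared error for evaluation are illustrative assumptions rather than MEPHAS behavior.

```r
# Step 1: choose the dependent and independent variables.
dependent   <- "mpg"
independent <- c("wt", "hp")

# Step 2: decide on additional terms (here an interaction and the intercept).
rhs <- c(independent, "wt:hp")
fml <- reformulate(rhs, response = dependent, intercept = TRUE)

# Step 3: check the formula, then fit the model.
print(fml)
train <- mtcars[1:24, ]
fit   <- lm(fml, data = train)

# Parameter estimation and model evaluation.
estimates <- summary(fit)$coefficients
r_squared <- summary(fit)$r.squared

# Prediction and evaluation on new data using the existing model.
new_data <- mtcars[25:32, ]
pred     <- predict(fit, newdata = new_data)
rmse     <- sqrt(mean((new_data$mpg - pred)^2))
```

For survival models, the survival package offers the analogous special terms `strata()`, `cluster()`, and `frailty()` inside a `coxph()` formula.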
Dimensional analysis
The dimensional analysis includes two interfaces. One contains principal component analysis and exploratory factor analysis; the other contains principal component regression (PCR), partial least squares regression (PLS-R), and sparse partial least squares regression (SPLS-R). These methods are often used to analyze high-dimensional data, such as gene expression and chemical data. PLS-R and related methods are widely used in QSAR analysis [7]. SPLS-R was developed to select variables while deriving components, with applications in omics data [8]. A problem common to all these methods is deciding the optimal number of components. Usually, users need to change the number of components repeatedly and determine the optimal setting from the results. MEPHAS facilitates this optimization through real-time feedback: users adjust the parameters on the left-side panel and immediately see the results on the right-side (main) panel, which informs their decisions about the parameter settings. The available algorithms for model fitting and validation are all listed, to remind users of alternative methods for their analyses. To better visualize the results, MEPHAS provides reactive plots in 2D and 3D. For PCR, PLS-R, and SPLS-R, MEPHAS enables users to upload new data for prediction.
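The search for the number of components that the interfaces support interactively can also be scripted. The following is a base-R sketch for PCR only, using the built-in mtcars data and leave-one-out cross-validation; the data, the maximum of five components, and the validation scheme are assumptions for illustration, and PLS-R and SPLS-R would need dedicated packages that are not shown.

```r
# Predict mpg from the remaining mtcars variables.
X <- as.matrix(mtcars[, -1])
y <- mtcars$mpg

# Principal component analysis on centered and scaled predictors.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Cumulative proportion of variance explained by the first k components.
var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Leave-one-out cross-validated error of PCR with k = 1, ..., 5 components.
cv_rmse <- sapply(seq_len(5), function(k) {
  errs <- sapply(seq_len(nrow(X)), function(i) {
    # Refit the PCA without observation i so that it stays unseen.
    p   <- prcomp(X[-i, ], center = TRUE, scale. = TRUE)
    trn <- data.frame(y = y[-i], p$x[, seq_len(k), drop = FALSE])
    fit <- lm(y ~ ., data = trn)
    # Project the held-out observation onto the first k components.
    tst <- predict(p, newdata = X[i, , drop = FALSE])[, seq_len(k), drop = FALSE]
    y[i] - predict(fit, newdata = data.frame(tst))
  })
  sqrt(mean(errs^2))
})

# Number of components with the smallest cross-validated error.
best_k <- which.min(cv_rmse)
```

Plotting `cv_rmse` against the number of components gives the static counterpart of the real-time feedback described above.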