The prediction of discrete states or categories for any event or for any object requires a classification process. In order to be useful, many real-world applications rely on an optimized classification process. These include some important problems such as diagnosis of diseases, definition of medical treatments, economical and security risk assessment, weather forecast, air traffic regulation and quality control of industrial processes [1].

Classification tasks in bioinformatics are also common and can be found in many different and relevant applications, such as the prediction of genome and protein structure [2, 3], the prediction of the cellular location [4], the prediction of molecular function [5] and the prediction of molecular interactions [6]. In general, a classification process is always involved in the prediction of a pattern that can be related to some response in living systems.

A popular approach for assessing binary classifiers is analysis of their ROC curves on a set of representative data [7, 8]. A ROC curve corresponds to a bidimensional plot of the sensitivity versus 1-specificity for a given classifier with continuous or ordinal output score. Two main factors have to be considered by the user when estimating the ROC curves: **1) The design of the study**. Three types of dataset can be generated when pursuing a classification task: (i) paired data, where all classifiers are applied to each individual, (ii) unpaired data, where only one of the classifiers is applied to each individual, and (iii) partially-paired data [9], where the dataset is composed of both paired and unpaired data. In the case of paired and partially-paired datasets, correlation between ROC curves has to be taken into consideration. Our software is primarily designed for paired data. However, it can also analyze balanced unpaired data where the number of units is the same for each classifier. It cannot be used with partially-paired data. **2) The outcome distribution**. Depending on this factor, three types of approaches can be more or less appropriate for fitting ROC curves and estimating the corresponding AUCs: (i) A parametric approach [10, 11], where we assume a parametric distribution for the outcomes of the positive and negative individuals. (ii) A semiparametric approach, where we assume that discrete ordinal outcomes correspond to classification of an unobserved latent decision variable into ordinal categories defined by unknown cut-points or threshold values, or that continuous outcomes can be expressed as an unknown monotonic transformation of the latent distribution [12], with positive and negative individuals having different latent decision variables. In this case, parametric distributions (e.g. normal, logistic, log-normal) are assumed for the latent decision variables. (iii) A non-parametric approach [13, 14], where no distributional assumptions are made about the outcomes for the positive and negative individuals.

For each type of approach, different methods for estimating the AUC after the ROC curve is generated have been described [11, 15, 16]. The advantages and disadvantages of using one or another approach under different scenarios have been previously assessed [17, 18]. Among the advantages of using a parametric or semiparametric method are that these methods generate a smooth ROC curve, and the assumption of a distribution provides a natural means by which statistical inference such as hypothesis testing and confidence intervals can be achieved. When data deviate from the assumed distribution (e.g. normal or log-normal) or simply the outcome distribution for the positive or negative individuals is uncertain, non-parametric methods for estimating the ROC curve become a useful and robust alternative. Even though the ROC curve generated by non-parametric methods is jagged, that problem has been tackled in a non-parametric manner by means of kernel density estimation of the empirical distributions in a previous study [13].

Although many examples in the existing literature about the development of new classifiers describe the use of ROC curves and their corresponding AUCs to assess their performances, the statistical significance of their differences is often not reported. This is mostly due to the lack of freely available software that is easy to use or to automate for the pairwise comparison of many binary classifiers. Albeit there are several software for performing statistical ROC analysis [19], to the best of our knowledge, the only free and readily available software for statistical ROC analysis that assesses the significance of the difference of the AUC for a pair of classifiers is ROCKIT [20, 21]. This software uses maximum likelihood to fit a binormal ROC curve to the data and the statistical significance of the differences of a variety of indexes are assessed on the basis of a bivariate binormal model. In terms of usability, it has some drawbacks: 1) the input data format is rather cumbersome; 2) the output file contains many relevant data embedded in a human-readable text and thus needs to be parsed for further analysis; 3) the number of classifiers that can be simultaneously assessed is quite limited; 4) additional software is needed for plotting the ROC curves; 5) in case of errors, the program does not provide any feedback to the user about the causes of the abnormal interruption; and 6) it cannot be easily automated when a fast comparison of several classifiers is required.

In this work we describe new software that is freely available as a web server tool and also as a standalone application for the Linux operating system that allows the simultaneous pairwise comparison and statistical assessment of many binary classifiers. The approach chosen is the nonparametric method for comparing AUCs based on the Mann-Whitney U-statistic for comparing distributions of values from two samples [14]. It has been shown that the AUC calculated by the trapezoidal rule is equal to the Mann-Whitney U-statistic applied to the outcomes for the negative and positive individuals. Thus, two or more AUCs for paired data can be statistically compared by estimating the covariance matrix for the AUCs, based on the general theory of U-statistics, and then constructing a large-sample test in the usual way. The implementation of this method, not freely available until now, provides the advantages and robustness of using a nonparametric approach for estimating AUCs in the case of paired datasets, accounting for the inherent correlation of this type of data. One limitation of this method that may be considered, as stated by DeLong *et al*. [14], is that the trapezoidal rule underestimates the true AUC when the variables take a small number of discrete values. Among the main features of our software are: 1) it is based on a non-parametric approach for the analysis of AUCs [14], 2) it uses a simple input format, 3) it can plot multiple ROC curves simultaneously, 4) the output data is compact, simple and can be exported for further analysis with other statistical tools; and 5) it generates a human-readable report in PDF format, which is useful for a fast initial inspection of the results.

It is worth noting that a freely available computer program for Windows, though still in its beta version, is DBM MRMC 2.1 [22]. This software is an extended version of a previous package, LABMRMC [23–28], which allows users to compare AUCs using the jackknife method. DBM MRMC 2.1 provides, among other functionalities, statistical analysis of the AUC computed by the trapezoidal method, which is equivalent to the AUC computed with the Mann-Whitney U-statistic. The program uses ANOVA methods together with jackknifing [23, 25, 26] (instead of the Delong method used by our program) to assess the statistical significance of the observed difference between two classifiers. Even though this software provides a wide range of alternatives in ROC fitting, measurement of ROC indexes and assessment of statistical significance for those indexes, DBM MRMC is still in its beta version at this moment and has the same drawbacks in terms of usability found in ROCKIT (these drawbacks are also present in LABMRMC package), such as the input/output handling, lack of automated options for fast analysis of many classifiers and lack of straightforward plotting of results.