A reverse-engineering approach to dissect post-translational modulators of transcription factor’s activity from transcriptional data

Background Transcription factors (TFs) act downstream of the major signalling pathways functioning as master regulators of cell fate. Their activity is tightly regulated at the transcriptional, post-transcriptional and post-translational level. Proteins modifying TF activity are not easily identified by experimental high-throughput methods. Results We developed a computational strategy, called Differential Multi-Information (DMI), to infer post-translational modulators of a transcription factor from a compendium of gene expression profiles (GEPs). DMI is built on the hypothesis that the modulator of a TF (i.e. kinase/phosphatases), when expressed in the cell, will cause the TF target genes to be co-expressed. On the contrary, when the modulator is not expressed, the TF will be inactive resulting in a loss of co-regulation across its target genes. DMI detects the occurrence of changes in target gene co-regulation for each candidate modulator, using a measure called Multi-Information. We validated the DMI approach on a compendium of 5,372 GEPs showing its predictive ability in correctly identifying kinases regulating the activity of 14 different transcription factors. Conclusions DMI can be used in combination with experimental approaches as high-throughput screening to efficiently improve both pathway and target discovery. An on-line web-tool enabling the user to use DMI to identify post-transcriptional modulators of a transcription factor of interest che be found at http://dmi.tigem.it. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0700-3) contains supplementary material, which is available to authorized users.


Background
Modulation of transcriptional regulation in a cell can be exerted at many different levels, including transcription factor (TF) activation/deactivation by post-translation modifications (PTMs). PTMs involve amino-acid residues in a protein that are covalently modified "on the fly". Through this mechanism, a cell is able to tightly regulate protein function such as its activity, localisation and interaction with other molecules. Capturing this kind of regulatory interactions using only transcriptional data, such as gene expression profiles (GEPs), is considered challenging since GEPs are further downstream of the PTM event and only indirectly linked to it.
Post-translational modulations act as a trigger for many signalling network and thus their alterations are found in many pathologies. Hence, many efforts have been made in the reconstruction of phosphorylation networks from experimental data [1]. These studies have then led to the development of new computational methods to predict the substrate specificities of protein kinases [1][2][3][4][5][6][7]. Initially, computational approaches relied on protein sequences in order to identify the consensus motif recognized by the active site of kinase catalytic domain [2][3][4]. However, such motifs often lack sufficient information to uniquely identify their physiological substrates.
Recently, more sophisticated algorithms have been proposed: Linding et al. [6] developed a analysis pipeline (NetworKIN) to assign experimentally validated phosphorylation sites to specific kinases by combining consensus information from sequence motifs with protein interaction networks. NetworKIN is based on the availability of experimental biochemical data, thus limiting the general applicability of this approach; Wang et al. [5] proposed a reverse-engineering method based on an information-theoretic approach to infer new posttranslational modulators of the MYC transcription factor from gene expression profiles. The authors exploited changes in the transcriptional level of a kinase/phosphatase across a set of GEPs to infer the post-translational activation of MYC. Specifically, they developed a computational method (MINDy) based on the estimation of pair-wise Conditional Mutual Information between a TF and its target genes. MINDy detects whether changes in the expression level of a kinase affect the co-expression between a TF and one of its target genes. This method requires the TF and its target gene(s) to be co-expressed across a set of GEPs, at least when the TF is active. Some TFs, however, are not co-expressed with their target genes, thus limiting MINDy applicability.
Here, we developed and applied a new reverseengineering strategy called Differential Multi-Information (DMI or ΔI method) to infer post-translational modulators of a TF of interest. Our working hypothesis is the scenario depicted in Fig. 1a-b, in which a modulator (i.e. kinase/phosphatases) when expressed activates the TF. The TF, in turn, will induce concurrent expression changes in its target genes, hence these genes will be coexpressed among themselves (Fig. 1a). On the contrary, when the modulator is not expressed (or not functional), the TF will be inactive and thus not able to regulate its target genes; this will result in a loss of co-expression among target genes (Fig. 1b). DMI requires in input a subset of the TF's target genes and returns as output a ranked list of predicted modulators. Crucially, DMI does not take into account the TF expression levels, nor it requires the TF to be co-expressed with its target genes.
We applied DMI to an experimental dataset consisting of 5,372 GEPs [8] to identify kinases regulating 14 different transcription factors for which we were able to collect bona-fide transcriptional targets.
Our results demonstrate that DMI is able to detect post-translation modulators of TFs from GEPs, thus making it an ideal tool for both basic research and drug discovery.

DMI: a Differential Multi-Information approach to the identification of post-translational modula-tors
We developed an algorithm (DMI) to identify posttranslational modulators (i.e. kinases or phosphatases) of a Transcription Factor (TF) of interest from gene expression profiles (GEPs). DMI works by detecting changes in the co-regulation of the TF's target genes in the presence or absence of the modulator (Fig. 1a-b). To this end, we computed the Rényi Multi-Information measure (I) to estimate the target genes' co-regulation (G 1 … G d ) [9] (Material and Methods). Multi-information is a multidimensional extension of pair-wise Mutual Information, which quantifies the extent of statistical dependency across a set of d variables. A null value of multiinformation implies that the d variables are statistically independent, whereas positive values correspond to increasing degrees of dependency, i.e. co-regulation.
In order to compute changes in the Rényi Multi-Information I of a set of TF's target genes G 1 … G d in the presence or absence of a modulator M across a set of GEPs, we followed the procedure depicted in Fig. 1c: we first sorted GEPs according to the modulator M's expression; we then subdivided GEPs into three subsets each containing the same number of profiles. In the first subset ("Low"), the expression level of the modulator M will be lower than in the second subset ("Medium"), which in turn will be lower than in the third subset ("High"). Finally, we computed the Difference in Multi-Information (ΔI) between the High and Low subsets (Fig. 1c). ΔI quantifies how much the modulator M is able to influence the co-regulation of the TF's target genes. Positive values of ΔI imply that when the kinase is present, the TF's target genes are co-regulated and hence the kinase is able to activate the TF. On the contrary, negative values of ΔI indicate that the kinase is a negative modulator of TF activity. Since M is not known a-priori, ΔI is computed for each modulator M to be tested. The modulators are then ranked by ΔI and by pvalue, computed using a permutation test, as detailed in the Methods section. The full pipeline of the method is summarised in Additional file 1: Figure S1.

Validation of DMI "in silico"
We generated two datasets (D1 and D2) consisting of 100 simulated GEPs each. In one half of the GEPs, the TF target genes were co-expressed; in the other half they were assumed to be independent (Material and Methods). Dataset D1 consists of 60 genes: 10 genes were assumed to be the known targets of the TF; 50 genes were assumed to be the potential modulators M of the TF, but only 20 of them were the effective modulators. In addition, we assumed that 10 of the remaining 30 potential modulators were indeed unknown targets of the TF and hence co-regulated with the TF's target genes, thus making it harder for the methods to distinguish them from the effective modulators.
Dataset D2 consists of 760 genes. As for dataset D1, only 10 genes were assumed to be the known targets of the TF, whereas the remaining 750 genes were assumed to be potential modulators M of the TF, with only 50 of them being the effective modulators (Material and Methods).
The output of DMI is a list of all the possible modulators (50 in D1 and 750 in D2) ranked according to their differential multi-information, and associated to a p-value.
In order to estimate the performance of DMI, we computed the percentage of correct predictions at each position in the rank (also known as Positive Predictive Value-PPV) as PPV = TP/(TP + FP), where TP are the true positives and FP are the false positives. We also computed the fraction of the real modulators discovered at each position in the rank, (also known as Sensitivity) equal to TP/(TP + FN), where FN are the false negatives. A perfect performance would be a constant value of PPV equal to 1.
The results for the first dataset D1 are shown Additional file 1: Figure S2, as a PPV-Sensitivity curve, where the method achieves a perfect performance, i.e. PPV = 1 (Material and Methods).
The results for the dataset D2 are instead reported Additional file 1: Figure S3A, when using either two or three subsets when subdividing the GEPs according to the modulator expression level. Also in this case, the DMI method achieves the best performance ranking the 50 modulators in the top 50 positions.
In order to simulate a more "biologically realistic" scenario and to make it harder for the method to distinguish the modulators present in the dataset, we also generated 4 additional datasets with the same parameters as in D2 but with "noisy bins". Specifically, in these 4 datasets the number of GEPs in which the TF's targets are dependent, is either 30, 40, 60 or 70 out of 100 GEPs. Hence, for example, consider the dataset where the targets are dependent in 70 out of 100 GEPs. In this case, when the dataset is discretized in 2 bins with equal number of samples according to the expression of the modulator, the first bin (i.e. low expression of the modulator) should contain only GEPs in which the TF target genes are not co-expressed. However, since this bin will contain 50 GEPs, only in 30 out 50 GEPs the targets will not be co-expressed, thus adding "noise" to the bin.
The PPV-sensitivity curves for these 4 dataset are reported in Additional file 1: Figure S3B-E respectively. In all of the cases tested, DMI performed significantly better than random.
Finally, we also compared Multi-Information measure against other two method used to estimate the dependency among multidimensional variables. As reported in the section "Additional analysis" of supplementary data Multi-Information performed better than those based on pair-wise measures.

Validation of DMI in human gene expression profiles to identify modulators of transcription factors
DMI requires in input a list of target genes G for a TF of interest, a set of Gene Expression Profiles (GEPs) and a list of potential modulators M to test (Additional file 1: Figure S1). Therefore, in order to evaluate the performance of DMI when applied to real experimental data, we first collected bona-fide transcriptional targets from Chromatin ImmunoPrecipitation (ChIP) [10] and binding motifs data [11] for Transcription Factors (TFs) whose activity is regulated by a set of well-characterized kinases. We thus selected 14 TFs for which high quality information was available (Additional file 1: Table S3 and Material and Methods).
We then selected a compendium of 5,372 high quality human GEPs representing 369 different cells and tissue types, disease states and cell lines [8] (Material and Methods). To generate the list of potential modulators to test, we selected all of the 481 genes associated to a Gene Ontology (GO) molecular function term equal to "protein kinase activity" [12]. However, 190 out of 481 kinases had to be filtered out because their expression level was not changing sufficiently in the gene expression compendium, thus leaving a total of 291 kinases as potential modulators (Material and Methods).
We then applied the DMI method to the compendium of 5,372 GEPs for each of the 14 TFs. We thus obtained, for each TF, a list of the 291 kinases ranked according to their differential Multi-Information and with an associated p-value (Material and Methods).
In order to assess the predictive ability of DMI, we collected the known kinases modulating the activity of each of the 14 TFs from PhosphoPOINT [13], Networ-KIN [6] and CEASAR [7]. We thus obtained a "Golden Standard" for each TF consisting of experimentally verified kinases (Material and Methods). We estimated the performance of DMI by computing the both the overall PPV-Sensitivity and receiver operating characteristic (ROC) curves across the 14 TFs ( Fig. 2a-b) and both the individual PPV-Sensitivity and ROC curves for each of the TFs (Fig. 2c-d and Additional file 1: Figure S4). We also reported the expected performance when ranking the 291 modulators randomly (dashed line in Fig. 2). It can be appreciated that the DMI performance is about ten-fold better that the random performance.
These results show that top-ranking kinases according to DMI are those that have a high probability of being the modulators for most of the TFs tested.
We also predicted for each transcription factor the kinase family regulating it, as well as the most likely signalling pathway controlling the TF activity. To this end, we detected whether members of a specific family of kinases or signalling proteins were statistically enriched at the top of the ranked modulators' list as reported in Table 1 and Additional file 1: Table S1 (Material and Methods).

Comparison with MINDy
We compared the performance of DMI with MINDy [5] (Material and Methods), a state-of-the-art computational method for the identification of post-translational modulators from gene expression profiles. MINDy is based on a pair-wise computation of Mutual Information between the TF and each of its target genes, whereas DMI is based on an ensemble estimation of the Multi-Information across all of the target genes, without the need to assume that the TF is co-expressed with its target genes.
We used MINDy to predict from the list of 291 kinases, the modulators of the 14 TFs. The PPV curve computed from MINDy predictions was compared to the one obtained by applying the DMI method (Fig. 3). Both methods performed better than random, but DMI has clearly an improved performance. It has to be taken into account, however, that DMI requires the knowledge of the TF target-genes, whereas MINDy automatically predicts, given the TF, its target genes, as well as, the post-translational modifiers of the TF activity. Hence, MINDy uses much less information than DMI, therefore a lower performance is to be expected. Additional file 1: Figure S5 shows the PPV curve for MINDy when forcing MINDy to use only the collected bona-fide targets for each one of the 14 TFs.
The results of the comparison show that the two approaches are complementary, in that if the targets of the TF are known, DMI offers a better predictive ability than MINDy; on the other hand if the targets are unknown, DMI cannot be applied, whereas MINDy is generally applicable.

Discussion and conclusions
DMI is based on the assumption that when a posttranslational modulator activates a transcription factor, its target genes will be co-expressed, and hence coregulated (the opposite will happen if the modulator deactivates the TF). We further assume that the level of expression of the modulator is a good proxy for its activity in the cell. It is important to underline that our working hypothesis does not rely on changes in the TF expression level, nor of its target genes but rather on changes in their co-regulation. This is relevant, since we have previously shown that changes in the co-regulation of metabolic pathway enzymes are predictive of their tissue activity even when their expression levels are low and do not change significantly across tissues [14].
DMI relies on the estimation of the Renyi Multi-Information of the TF's direct target genes in a subset of GEPs as a measure of the degree of their co-regulation. Unlike other common techniques that measure pair-wise co-regulation between genes, such as correlation and mutual information, Multi-Information can estimate coregulation among all of the target genes at once. This property makes Multi-Information more robust than pair-wise approaches, thus reducing the number of false positives.
Our strategy, differently from the others proposed in the literature, does not require the transcription factor and its target genes to be co-expressed, thus making the approach more generic, albeit requiring the TF's target genes to be known. We also performed additional analyses supporting our working hypothesis showing that in presence of a post-translational modulator of a TF, the TF itself does not necessarily change its expression level nor it correlates with its target genes (Additional file 1: Supplementary Data, sections 1.2 and 1.3 of "Additional analysis" and Additional file 1: Figures S6-S8).
We first showed that DMI is able to correctly identify post-translational modulators of 14 transcription factors including P53, MYC and members of STAT and SMAD families from a compendium of 5,372 GEPs [8].
One of the limitations of our approach is the assumption that the expression level of the modulator (e.g. a kinase or phosphatase) is a good proxy of its enzymatic activity, which may not always be the case. Moreover, we require that expression level of the modulator across the compendium of GEPs changes at least one-fold, otherwise no ; c PPV-sensitivity curve for each one of the 14 transcription factor in which we compared the performance of DMI with and without applying a significance threshold for the p-value (P < 0.05) after Benjamini-Hochberg correction; d ROC curve for each one of the 14 transcription factor in which we compared the performance of DMI applying a significance threshold for the p-value (P < 0.05) after Benjamini-Hochberg correction significant prediction can be made. A further limitation is that DMI needs in input a subset of the TF's target-genes. Despite these limitations, DMI can be effectively used for the identification of post-translational regulatory interactions affecting the activity of a transcription factor in an efficient and cost-effective manner, thus filling the gap between transcriptional networks, identified by classic reverse-engineering approaches, and signalling networks identified by ad-hoc experimental approaches.

Estimation of the Rényi Multi-Information
The Rényi Multi-Information (RMI) can be used to estimate the statistical dependency among d real-valued random variables X = (X 1 , X 2 , …, X d ) with joint probability density function f : ℝ d → ℝ and marginal densities f i : ℝ → ℝ, 1 ≤ i ≤ d [9,15]. For α ≠ 1, RMI is defined for any real parameter α, assuming the underlying integrals exist, as: When α = 1, I a (X) is defined in the limit I 1 = log a → 1 I α . Indeed, the classical multi-information across d variables is just a special case of RMI with α = 1. In what follows, we set α = 0.99.
As reported in [15] the RMI among the d real-valued random variables X = X 1 , X 2 , …, X d from a sample of i.i.d. random variables X 1 : n = X 1 , X 2 , …, X n , we adapted an algorithm based on the generalized nearest-neighbor graph with the copula transformation. First of all, the algorithm estimates the entropy H α (f ) for α ∈ (0, 1) as follows: where L p (⋅) equals to the sum of the p-th power of Euclidian distance of the nodes in the nearest-neighbor graph NN S (⋅) for some finite non-empty S ⊂ ℕ + ; γ is a numeric constant dependent on d, p and S that can be  estimated empirically from a large sample (n ≫ 1) [15]. Finally, the Rényi Multi-Information I α of the d variables X = X 1 , X 2 , …, X d from a sample of i.i.d random variables X 1 : n = (X 1 … X n ) can be computed as [15],: Where Ĥ α is defined as before and the samplê Z 1 ; ;Ẑ 2 ; …; ;Ẑ n À Á ¼F X 1 ð Þ;F X 2 ð Þ; …;F X n ð Þ À Á .F ⋅ ð Þ is called empirical copula transformation [16], where the jth coordinate of Ẑ i equals: where rank (x, A) is the number of elements of A less than or equal to x. The computational complexity T(n) for the estimation of Rényi Multi-Information strongly depends of the complexity of the K nearest-neighbors algorithm, which is linear in the number of points and the number of features for each point, and the complexity of copula transformation, which is quadratic in the number of points. Specifically, the computational complexity for the estimation of Rényi Multi-Information is: where n is the number of i.i.d. samples used for its estimation (in this setting it represents the number of gene expression profiles) and d is the number of features of each i.i.d sample (i.e. number of target genes).

Convergence of Rényi Multi-Information estimator (Î α )
We tested the convergence of the estimation algorithm to the true value of the Rényi Multi-Information numerically by generating simulated dataset of 4000 i.i.d. samples each, sampled from a multivariate Gaussian distribution of dimension d = 3, 10 or 20 with zero mean and an identity covariance matrix, corresponding either to independent variables (i.e. the true value is I = 0), or to a randomly chosen symmetric covariance matrix, corresponding to dependent variables (i.e. with an I > 0). The estimation of Î α = 0.99 among d = 3 variables and its error are shown in Additional file 1: Figure S9 in the case of dependent variables (i.e. I > 0), and in Additional file 1: Figure S10 in the case of independent variables (i.e. I = 0). Additional file 1: Figure S11 reports the estimation of Î α = 0.99 among 10 and 20 variables in both cases of dependent and independent variables. We then repeated the same analysis as above, but this time generating simulated dataset of 4000 i.i.d. values sampled from a multivariate Beta distribution (rather than a Gaussian as before) of dimension 10 and 20 with alpha and beta parameter randomly selected from the standard uniform distribution in the open interval [0,1].
Additional file 1: Figure S12 shows the estimation of Î α = 0.99 among 10 and 20 variables in the case of dependent and independent variables. The Gaussian Copula transformation was used to build these distributions. For more details and the closed-form expression of the true divergence with Beta distribution please refer to [17] (lemma 14).

Differential Multi-Information method (DMI)
The DMI method is based on quantifying the change in co-regulation among a set of downstream targets G 1 … G d of a TF in the presence or absence of a modulator M, by estimating the difference in Renyi Multi-Information between two subsets of GEPs. These subsets are obtained by first sorting GEPs according to the expression of the modulator M being tested and then dividing the ranked list of GEPs into two (or more) subsets. A prefiltering step is applied to remove those modulator genes (M) whose expression does not change significantly between the "high" subset (i.e. where M is highly expressed) and the "low" subset (i.e. M is expressed at low levels). Specifically, we excluded from the analysis those modulators whose average expression in the "high" subset divided but their average expression in the "low" subset (i.e. the fold change) is less than one.

Computation of the Significance of ΔI using permutation tests
We used a permutation test in order to estimate the empirical distribution of ΔI and, from that, the associated p-value. Specifically, given a set of d target genes (i.e. variables), we computed the significance of a modulator M by randomly selecting d genes in L = 10,000 number of trials, and each time computing the ΔI value thus obtaining its empirical distribution. The p-value was finally estimated as the percentage of random trials with a value of ΔI greater than the measured one.

Construction of the "in silico" dataset D1 and D2
In order to construct the in silico datasets D1 (and similarly for D2) we simulated two sets of gene expression profiles. One set (co-regulated set) was obtained by sampling from a multivariate Gaussian distribution with zero mean and a covariance matrix whose elements were equal to ρσ ij 2 , where ρ = 0.6 and σ ij 2 randomly chosen in the interval ]0, 0.5[. The second set (independent set) was obtained by changing the covariance matrix to a diagonal matrix with σ ii 2 randomly chosen in the interval ]0, 0.5[. The expression profiles of the potential modulators (i.e. modulators that do not regulate the TF) were generated using a Gaussian distribution with zero mean and variance σ 2 in the interval ]0, 0.5[. Finally, in order to simulate the expression profiles of the effective modulators we followed this strategy: in the co-regulated subset, we sampled from a Gaussian distribution N (1,0.1) (i.e. with average expression equal to 1), on the contrary in the independent subset, we used a normal distribution N (1,0.1) (i.e. with average expression equal to 0).

Gene expression profile compendium and kinase selection
We applied DMI to a compendium of 5,372 high quality human GEPs representing 369 different cell and tissue types, disease states and cell lines, described in [8]. GEPs were measured using the Affymetrix HG-U133A platform. We normalized this dataset using the Robust Multi-array Average (RMA) normalization as implemented in the R package Bioconductor [18] and using the custom CDF files present on BrainArray [19], thus obtaining a gene-wise normalised dataset.
The list of 481 kinases to test as possible modulators for a given transcription factor was obtained by collecting all the genes with an associated Gene Ontology (GO) molecular function term equal to "protein kinase activity". Only 291 out 481 kinases were used in further analyses, because only 291 out of 481 kinases had a fold change greater than one in the GEP compendium.

Gene Set Enrichment Analysis for the prediction of kinases' family and signalling pathways
Gene Set Enrichment Analysis (GSEA) [20] was applied to the ranked list produced by DMI to identify kinase subfamilies and signalling pathways regulating the TF activity. We downloaded the information regarding the kinase subfamilies from a recent published study collecting a total of 40 distinct subfamily [21] (Additional file 1: Table S2). In order to apply the GSEA, we used only subfamilies with more than one member. We also collected 22 signalling pathways from MSigDb (the curated dataset CP:KEGG) [11] (Additional file 1: Table S3).

Comparison with MINDy
MINDy is computationally intensive and requires a large amount of memory due to large number of samples in our GEP compendium [5]. Thus, before running MINDy we had to reduce the number of samples. To this end, we built two dataset containing the 10 % and the 50 % randomly selected samples from the compendium of 5,372 GEPs. We run MINDy using the default parameters. In the first step MINDy computes Mutual Information (MI) between a modulator and transcription factor (TF) to test the statistical independence between them. Once statistical independence between modulator and TF pair is established, MINDy ranks all samples from low to high expression of that modulator and selects 35 % of samples from each tail (low and high expression samples). In each tail, MINDy computes the mutualinformation (MI) between the TF and all of its candidate target genes (MI low and MI high ) and it assesses the statistical significance of both MI values. If at least one of the two MIs is significant then MINDy calculates ΔMI, defined as ΔMI = MI high −MI low .
We assessed the statistical significance of ΔMI using a null model that is generated by randomising the data [5]. A TF-target pair is considered to be modulated by that modulator if the (corrected) p-value of ΔMI is < =0.05. Finally MINDy summarizes the result for each modulator pair by counting the number of target genes by that pair. Further details can be found in the original publication describing MINDy [5].

Estimation of the cumulative PPV-Sensitivity and ROC curves
For the estimation of the composite PPV-Sensitivity (or Precision-Recall) curve across the 14 transcription factors the tecnique of the 11-point interpolated average precision [22] was used. Basically, for each transcription factor the interpolated PPV is measured at the 11 sensitivity levels of 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0. By definition the interpolated PPV p interp at a certain sensitivity level r is defined as the highest PPV found for any sensitivity level r ' ≥ r: p interp (r) = max r ' ≥ r (r ') [22]. To notice, that with this definition, the interpolated PPV at a sensitivity of 0 is always defined as 1. Finally, the composite PPV-Sensitivity curve among the 14 TFs was estimated as the arithmetic mean across the 11 sensitivity levels of the interpolated PPV of each transcription factor.
For the estimation of the composite Receiver Operator Characteristic (ROC) curve the tecnique of the vertical averaging [23] was instead used. The vertical averaging consits in taking vertical samples of the ROC curves for fixed true negative rates (TNR) and averages the corresponding values of sensitivity. Specifically, 11 TNR levels of 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0 were used for the estimation of the composite ROC curve among the 14 TFs. Obviously one of more of the 11 TNR levels may be absent in some of the ROC cuvers we are vertical averaging, in these cases the corrispondig value of sensitivity to average has been simply estimated by interpolation using its next and prcedent value in the considered ROC curve.