PCTFPeval: a web tool for benchmarking newly developed algorithms for predicting cooperative transcription factor pairs in yeast
© Lai et al. 2015
Published: 9 December 2015
Computational identification of cooperative transcription factor (TF) pairs helps us understand the combinatorial regulation of gene expression in eukaryotic cells. Many advanced algorithms have been proposed to predict cooperative TF pairs in yeast. However, it is still difficult to conduct a comprehensive and objective performance comparison of different algorithms because of a lack of sufficient performance indices and adequate overall performance scores. To solve this problem, in our previous study (published in BMC Systems Biology 2014), we adopted or proposed eight performance indices and designed two overall performance scores to compare the performance of 14 existing algorithms for predicting cooperative TF pairs in yeast. Most importantly, our performance comparison framework can be applied to comprehensively and objectively evaluate the performance of a newly developed algorithm. However, to use our framework, researchers must first invest considerable effort in constructing it. To save researchers time and effort, here we develop a web tool that implements our performance comparison framework, featuring fast data processing, a comprehensive performance comparison and an easy-to-use web interface.
The developed tool, called PCTFPeval (Predicted Cooperative TF Pair evaluator), is written in PHP and Python. Its user-friendly web interface allows users to input a list of predicted cooperative TF pairs from their algorithm and to select (i) the algorithms to compare against from the 15 existing algorithms, (ii) the performance indices from the eight existing indices, and (iii) the overall performance scores from the two available choices. The comprehensive performance comparison results are then generated within tens of seconds and shown as both bar charts and tables. The original comparison results for each compared algorithm and each selected performance index can be downloaded as text files for further analysis.
By allowing users to select from eight existing performance indices and 15 existing algorithms for comparison, our web tool benefits researchers who want to comprehensively and objectively evaluate the performance of their newly developed algorithms. Thus, our tool can greatly expedite research on the computational identification of cooperative TF pairs.
Understanding combinatorial or cooperative transcriptional regulation by two or more transcription factors (TFs) has become an important research topic over the past decade. Researchers have studied and modelled various types of TF-TF interactions that contribute to positive or negative synergy in regulating genes [1–3]. Owing to the availability of various kinds of genome-wide datasets (e.g. gene expression data, ChIP-chip data, TF binding site motifs, protein-protein interaction data and TF knockout data), researchers have continued to develop advanced algorithms to predict cooperative TF pairs. Some algorithms utilized only ChIP-chip data [3–6] or gene expression data [7], while others integrated multiple data sources [8–17].
Table 1 The numbers of the compared algorithms, the performance indices, and the predicted cooperative TF pairs (PCTFPs) for each of the 15 existing algorithms.
Columns: algorithm (journal and year); # of existing algorithms used for performance comparison in the original paper; # of indices used for performance evaluation in the original paper; # of PCTFPs.
Banerjee and Zhang (NAR 2003) [8]
Harbison et al. (Nature 2004) [4]
Nagamine et al. (NAR 2005) [9]
Tsai et al. (PNAS 2005) [10]
Chang et al. (Bioinformatics 2006) [11]
He et al. (IEEE GCCW 2006) [12]
Yu et al. (NAR 2006) [5]
Wang J (JBI 2007) [13]
Elati et al. (Bioinformatics 2007) [7]
Datta and Zhao (Bioinformatics 2008) [6]
Chuang et al. (BMC Bioinformatics 2009) [14]
Wang Y et al. (NAR 2009) [15]
Yang et al. (Cell Research 2010) [16]
Chen et al. (Bioinformatics 2012) [3]
Lai et al. (BMC Systems Biology 2014) [17]
To enable a fair and comprehensive comparison of these algorithms, in our previous study [19] we adopted or proposed eight performance indices and used them to compare the performance of 14 existing algorithms. Our results showed that the performance of an algorithm varies widely across different performance indices, implying that researchers may draw biased conclusions if they rely on only a few performance indices. Therefore, to conduct a comprehensive and objective performance comparison, we designed two overall performance scores to summarize the comparison results of the eight performance indices.
Most importantly, our performance comparison framework can be applied to comprehensively and objectively evaluate the performance of a newly developed algorithm. Researchers who develop a new algorithm would therefore benefit from using our framework to quickly evaluate its prediction performance and to guide improvements when needed. However, to use our framework, researchers must first invest considerable effort in constructing it. Constructing our framework involves collecting and processing multiple genome-wide datasets from the public domain, collecting the lists of predicted cooperative TF pairs of 15 existing algorithms from the literature, and writing a substantial amount of code to implement the eight performance indices. To save researchers time and effort, here we develop a web tool called PCTFPeval (Predicted Cooperative TF Pair evaluator) that implements our performance comparison framework, featuring fast data processing, a comprehensive performance comparison and an easy-to-use web interface. Constructing PCTFPeval was not a daunting task for us, since we have extensive experience in developing databases and web tools [20–26].
Fifteen existing algorithms used for performance comparison
Our tool provides 15 existing algorithms for users to compare against. To the best of our knowledge, this is the most comprehensive collection of existing algorithms whose lists of predicted cooperative TF pairs in yeast are available. The numbers of predicted cooperative TF pairs from different algorithms vary widely, ranging from 13 to 300 (see Table 1).
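Because every algorithm is compared through its list of predicted cooperative TF pairs, the lists must be comparable: a pair (A, B) and a pair (B, A) describe the same cooperation. The sketch below shows one way to normalize a user's list before comparison. The upload format is an assumption (one pair per line, two TF names separated by whitespace or a comma, lines starting with "#" ignored), and the function name parse_pctfps is hypothetical; it is not PCTFPeval's actual parser.

```python
import re


def parse_pctfps(text: str) -> list[tuple[str, str]]:
    """Return unique, unordered TF pairs from a one-pair-per-line list.

    Hypothetical input format (assumption): two TF names per line,
    separated by whitespace or a comma; '#' starts a comment line.
    """
    seen: set[tuple[str, str]] = set()
    pairs: list[tuple[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = [f for f in re.split(r"[\s,]+", line) if f]
        if len(fields) != 2:
            raise ValueError(f"expected two TF names, got: {raw!r}")
        a, b = (f.upper() for f in fields)  # yeast gene names are case-insensitive here
        if a == b:
            continue  # a TF paired with itself is not a cooperative pair
        pair = (a, b) if a < b else (b, a)  # order-independent key
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs
```

For example, "MBP1 SWI6" and "swi6,mbp1" collapse into the single pair ("MBP1", "SWI6"), so a list's size counts distinct predictions rather than formatting variants.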
Eight existing performance indices used for performance evaluation
Table 2 The eight performance indices implemented in our tool
Columns: performance index type; data sources used; description.
Yeast physical PPI data from the BioGRID database [27]: measures the overlap significance of the physical PPI partners of a PCTFP*
Yeast physical PPI data from the BioGRID database [27]: measures the shortest path length of a PCTFP in the physical PPI network
Yang et al.'s functional similarity scores for any two yeast genes [28]: measures the functional similarity of a PCTFP
Yang et al.'s high-quality benchmark set of 27 cooperative TF pairs in yeast [16]: measures the overlap significance between the list of PCTFPs from an algorithm and the benchmark set of 27 cooperative TF pairs
Balaji et al.'s co-regulatory coefficient dataset of 3459 TF pairs in yeast [29]: measures the co-regulatory coefficient of a PCTFP
Measures the expression coherence of a PCTFP's common target genes
Measures the functional coherence of a PCTFP's common target genes
Measures the physical PPI coherence of a PCTFP's common target genes
*PCTFP: predicted cooperative TF pair.
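Two of the indices listed above reduce to standard computations, sketched below under stated assumptions. For the benchmark index, the sketch assumes overlap significance is a one-sided hypergeometric p-value whose background is all unordered pairs among a chosen number of TFs (n_tfs is an assumed parameter). For the shortest path index, it assumes an unweighted, undirected physical PPI network and counts edges with breadth-first search. The function names and the toy network are hypothetical, not PCTFPeval's implementation.

```python
from collections import defaultdict, deque
from math import comb
from typing import Iterable, Optional


def overlap_pvalue(predicted: set, benchmark: set, n_tfs: int) -> float:
    """One-sided hypergeometric P(X >= observed overlap) between two pair sets.

    Assumption: the background is all unordered pairs among n_tfs TFs, and
    pairs are stored as canonical (sorted) tuples so that set overlap works.
    """
    total = n_tfs * (n_tfs - 1) // 2        # size of the pair universe
    k, n = len(benchmark), len(predicted)    # benchmark pairs, predicted pairs
    x = len(predicted & benchmark)           # observed overlap
    tail = sum(comb(k, i) * comb(total - k, n - i) for i in range(x, min(n, k) + 1))
    return tail / comb(total, n)


def build_ppi(edges: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets from physical PPI edges (self-loops dropped)."""
    ppi: dict[str, set[str]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            ppi[u].add(v)
            ppi[v].add(u)
    return dict(ppi)


def shortest_path_length(ppi: dict[str, set[str]], a: str, b: str) -> Optional[int]:
    """Edges on a shortest path between TFs a and b via BFS; None if unreachable."""
    if a == b:
        return 0
    if a not in ppi or b not in ppi:
        return None
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        for nbr in ppi[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                if nbr == b:
                    return dist[nbr]
                queue.append(nbr)
    return None
```

With the toy edges MBP1-SWI6, SWI6-SWI4 and SWI4-FKH1, the pair (MBP1, FKH1) has a shortest path length of 3. A smaller p-value or a shorter path suggests stronger support for a predicted pair, so such scores would have to be oriented (e.g. negated) before being ranked alongside indices where larger is better.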
Two existing overall performance scores used for representing the comprehensive performance comparison results
Our tool implements two existing overall performance scores [19] to summarize the comparison results of the selected performance indices. The first, called the comprehensive ranking score, is defined as the sum of an algorithm's rankings in the selected performance indices. An algorithm's ranking in an index is k if its performance ranks k-th among all the compared algorithms in that index; for example, the best performing algorithm has a ranking of 1. Therefore, the smaller the comprehensive ranking score, the better the overall performance of an algorithm.
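The comprehensive ranking score can be written as $\mathrm{CRS}_i = \sum_{j=1}^{L} R_{ij}$, where $R_{ij}$ is the ranking of algorithm $i$ in index $j$ and $L$ is the number of selected indices (CRS and R are shorthand introduced for this sketch). The minimal sketch below assumes that a higher original score is better in every index and that tied algorithms share the best rank ("competition" ranking); how PCTFPeval breaks ties is not stated here, so that choice is an assumption.

```python
def comprehensive_ranking_scores(scores: dict[str, list[float]]) -> dict[str, int]:
    """Sum of per-index rankings (1 = best) for each algorithm; smaller is better.

    Assumptions: scores[alg][j] is algorithm alg's original score in selected
    index j, higher is better in every index, and tied algorithms share the
    best rank (competition ranking: 1 + number of strictly better algorithms).
    """
    lengths = {len(v) for v in scores.values()}
    if len(lengths) != 1:
        raise ValueError("every algorithm needs one score per selected index")
    n_indices = lengths.pop()
    crs = {alg: 0 for alg in scores}
    for j in range(n_indices):
        for alg in scores:
            better = sum(1 for other in scores if scores[other][j] > scores[alg][j])
            crs[alg] += 1 + better  # ranking of alg in index j
    return crs
```

With scores {"new": [0.9, 0.2], "A": [0.5, 0.8], "B": [0.1, 0.6]}, the rankings are (1, 3), (2, 1) and (3, 2), giving scores of 4, 3 and 5; algorithm "A" has the best (smallest) overall score even though "new" wins the first index.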
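The second one is called the comprehensive normalized score (CNS), defined for algorithm i as

$$\mathrm{CNS}_i = \sum_{j=1}^{L} NS_{ij}, \qquad NS_{ij} = \frac{S_{ij}}{\max_{1 \le k \le n} S_{kj}},$$

where $NS_{ij}$ and $S_{ij}$ are the normalized score and the original score of algorithm $i$ calculated using index $j$, respectively; $n$ is the number of algorithms being compared; and $L$ is the number of selected indices. Note that $NS_{ij} \le 1$, and $NS_{ij} = 1$ if and only if algorithm $i$ is the best performing algorithm in index $j$ (i.e. it has the highest original score calculated using index $j$). The larger the CNS, the better the overall performance of an algorithm.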
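A minimal sketch of the comprehensive normalized score follows. It assumes that each normalized score is the original score divided by the highest original score among the compared algorithms in the same index, which yields the property stated above (at most 1, and exactly 1 only for the best algorithm), and that the CNS is a plain sum over the selected indices; dividing by the number of indices instead would rescale every CNS equally and leave the order unchanged. Non-negative scores with a positive maximum per index are also assumed.

```python
def comprehensive_normalized_scores(scores: dict[str, list[float]]) -> dict[str, float]:
    """CNS_i = sum over selected indices j of S_ij / max_k S_kj; larger is better.

    Assumptions: higher original scores are better, scores are non-negative,
    and each index has a positive maximum, so every normalized score is at
    most 1 and equals 1 only for the best algorithm(s) in that index.
    """
    lengths = {len(v) for v in scores.values()}
    if len(lengths) != 1:
        raise ValueError("every algorithm needs one score per selected index")
    n_indices = lengths.pop()
    cns = {alg: 0.0 for alg in scores}
    for j in range(n_indices):
        best = max(scores[alg][j] for alg in scores)
        if best <= 0:
            raise ValueError(f"index {j} has no positive score to normalize by")
        for alg in scores:
            cns[alg] += scores[alg][j] / best  # normalized score NS_ij
    return cns
```

Unlike the ranking score, this score keeps the size of the performance gaps: with scores {"new": [0.9, 0.2], "A": [0.5, 0.8], "B": [0.1, 0.6]}, "new" receives 1.0 + 0.25 = 1.25, while "A" receives about 0.556 + 1.0 = 1.556.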
Results and discussion
Knowing which TFs cooperate is crucial for understanding the combinatorial regulation of gene expression in eukaryotic cells. This is why the computational identification of cooperative TF pairs has become an active research topic, and researchers will keep developing new algorithms. Using our tool, researchers can quickly conduct a comprehensive and objective performance comparison between their new algorithm and various existing algorithms. If the performance of their new algorithm is unsatisfactory, they can modify the algorithm and use our tool again to check whether the performance has improved. Therefore, with our tool in hand, researchers can focus fully on designing new algorithms without worrying about how to comprehensively and objectively evaluate their performance. In conclusion, our tool can greatly expedite progress in this research area.
Availability and requirements
Project name: PCTFPeval
Project home page: http://cosbi.ee.ncku.edu.tw/PCTFPeval/
Operating system(s): platform independent.
Other requirements: Internet connection.
License: none required.
Any restrictions to use by non-academics: no restriction.
This study was supported by National Cheng Kung University and Ministry of Science and Technology of Taiwan MOST-103-2221-E-006 -174 -MY2.
The publication of this paper was funded by National Cheng Kung University and Ministry of Science and Technology of Taiwan MOST-103-2221-E-006 -174 -MY2.
This article has been published as part of BMC Bioinformatics Volume 16 Supplement 18, 2015: Joint 26th Genome Informatics Workshop and 14th International Conference on Bioinformatics: Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/16/S18.
- Miller JA, Widom J: Collaborative competition mechanism for gene activation in vivo. Mol Cell Biol. 2003, 23 (5): 1623-1632.
- Tanay A: Extensive low-affinity transcriptional interactions in the yeast genome. Genome Res. 2006, 16: 962-972.
- Chen MJ, Chou LC, Hsieh TT, Lee DD, Liu KW, Yu CY, Oyang YJ, Tsai HK, Chen CY: De novo motif discovery facilitates identification of interactions between transcription factors in Saccharomyces cerevisiae. Bioinformatics. 2012, 28 (5): 701-708.
- Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J, Jennings EG, Zeitlinger J, Pokholok DK, Kellis M, Rolfe PA, Takusagawa KT, Lander ES, Gifford DK, Fraenkel E, Young RA: Transcriptional regulatory code of a eukaryotic genome. Nature. 2004, 431 (7004): 99-104.
- Yu X, Lin J, Masuda T, Esumi N, Zack DJ, Qian J: Genome-wide prediction and characterization of interactions between transcription factors in Saccharomyces cerevisiae. Nucleic Acids Res. 2006, 34 (17): 917-927.
- Datta D, Zhao H: Statistical methods to infer cooperative binding among transcription factors in Saccharomyces cerevisiae. Bioinformatics. 2008, 24: 545-552.
- Elati M, Neuvial P, Bolotin-Fukuhara M, Barillot E, Radvanyi F, Rouveirol C: LICORN: learning cooperative regulation networks from gene expression data. Bioinformatics. 2007, 23 (18): 2407-2414.
- Banerjee N, Zhang MQ: Identifying cooperativity among transcription factors controlling the cell cycle in yeast. Nucleic Acids Res. 2003, 31: 7024-7031.
- Nagamine N, Kawada Y, Sakakibara Y: Identifying cooperative transcriptional regulations using protein-protein interactions. Nucleic Acids Res. 2005, 33: 4828-4837.
- Tsai HK, Lu HHS, Li WH: Statistical methods for identifying yeast cell cycle transcription factors. Proc Natl Acad Sci USA. 2005, 102: 13532-13537.
- Chang YH, Wang YC, Chen BS: Identification of transcription factor cooperativity via stochastic system model. Bioinformatics. 2006, 22 (18): 2276-2282.
- He D, Zhou D, Zhou Y: Identifying synergistic transcriptional factors involved in the yeast cell cycle using Microarray and ChIP-chip data. Proceedings of the Fifth International Conference on Grid and Cooperative Computing Workshops: 21-23 October 2006; Hunan. Edited by: Xiao N, Buyya R, Liu Y, Yang G. 2006, Los Alamitos: IEEE Computer Society, 357-360.
- Wang J: A new framework for identifying combinatorial regulation of transcription factors: a case study of the yeast cell cycle. J Biomedical Informatics. 2007, 40 (6): 707-725.
- Chuang CL, Hung K, Chen CM, Shieh GS: Uncovering transcriptional interactions via an adaptive fuzzy logic approach. BMC Bioinformatics. 2009, 10: 400.
- Wang Y, Zhang XS, Xia Y: Predicting eukaryotic transcriptional cooperativity by Bayesian network integration of genome-wide data. Nucleic Acids Res. 2009, 37 (18): 5943-5958.
- Yang Y, Zhang Z, Li Y, Zhu XG, Liu Q: Identifying cooperative transcription factors by combining ChIP-chip data and knockout data. Cell Res. 2010, 20 (11): 1276-1278.
- Lai FJ, Jhu MH, Chiu CC, Huang YM, Wu WS: Identifying cooperative transcription factors in yeast using multiple data sources. BMC Systems Biology. 2014, 8 Suppl 5: S2.
- Norel R, Rice JJ, Stolovitzky G: The self-assessment trap: can we all be better than average?. Mol Syst Biol. 2011, 7: 537.
- Lai FJ, Chang HT, Huang YM, Wu WS: A comprehensive performance evaluation on the prediction results of existing cooperative transcription factors identification algorithms. BMC Systems Biology. 2014, 8 Suppl 4: S9.
- Chang DTH, Huang CY, Wu CY, Wu WS: YPA: an integrated repository of promoter features in Saccharomyces cerevisiae. Nucleic Acids Res. 2011, 39 (1): D647-D652.
- Chang DTH, Li WS, Bai YH, Wu WS: YGA: identifying distinct biological features between yeast gene sets. Gene. 2012, 518 (1): 26-34.
- Chiu CC, Chan SY, Wang CC, Wu WS: Missing value imputation for microarray data: a comprehensive comparison study and a web tool. BMC Syst Biol. 2013, 7 Suppl 6: S12.
- Yang TH, Wang CC, Wang YC, Wu WS: YTRP: a repository for yeast transcriptional regulatory pathways. Database. 2014, bau014.
- Yang TH, Chang HT, Hsiao ESL, Sun JL, Wang CC, Wu HY, Liao PC, Wu WS: iPhos: toolkit to streamline the alkaline phosphatase assisted comprehensive LC-MS phosphoproteome investigation. BMC Bioinformatics. 2014, 15 (Suppl 16): S10.
- Yang TH, Wang CC, Hung PC, Wu WS: cisMEP: an integrated repository of genomic epigenetic profiles and cis-regulatory modules in Drosophila. BMC Syst Biol. 2014, 8 (Suppl 4): S8.
- Hung PC, Yang TH, Liaw HJ, Wu WS: YNA: an integrative gene mining platform for studying chromatin structure and its regulation in Yeast. BMC Genomics. 2014, 15 (Suppl 9): S5.
- Stark C, Breitkreutz BJ, Chatr-Aryamontri A, Boucher L, Oughtred R, Livstone MS, Nixon J, Van Auken K, Wang X, Shi X, Reguly T, Rust JM, Winter A, Dolinski K, Tyers M: The BioGRID Interaction Database: 2011 update. Nucleic Acids Res. 2011, 39 (Database issue): D698-D704.
- Yang H, Nepusz T, Paccanaro A: Improving GO semantic similarity measures using downward random walks. Bioinformatics. 2012, 28 (10): 1383-1389.
- Balaji S, Babu MM, Iyer LM, Luscombe NM, Aravind L: Comprehensive analysis of combinatorial regulation using the transcriptional regulatory network of yeast. J Mol Biol. 2006, 360 (1): 213-227.
- Hibbs MA, Hess DC, Myers CL, Huttenhower C, Li K, Troyanskaya OG: Exploring the functional landscape of gene expression: directed search of large microarray compendia. Bioinformatics. 2007, 23 (20): 2692-2699.
- Abdulrehman D, Monteiro PT, Teixeira MC, Mira NP, Lourenço AB, dos Santos SC, Cabrito TR, Francisco AP, Madeira SC, Aires RS, Oliveira AL, Sá-Correia I, Freitas AT: YEASTRACT: providing a programmatic access to curated transcriptional regulatory associations in Saccharomyces cerevisiae through a web services interface. Nucleic Acids Res. 2011, 39 (Database issue): D136-D140.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.