PuTmiR: A database for extracting neighboring transcription factors of human microRNAs
© Bandyopadhyay and Bhattacharyya. 2010
Received: 20 November 2009
Accepted: 15 April 2010
Published: 15 April 2010
Skip to main content
© Bandyopadhyay and Bhattacharyya. 2010
Received: 20 November 2009
Accepted: 15 April 2010
Published: 15 April 2010
Some of the recent investigations in systems biology have revealed the existence of a complex regulatory network between genes, microRNAs (miRNAs) and transcription factors (TFs). In this paper, we focus on TF to miRNA regulation and provide a novel interface for extracting the list of putative TFs for human miRNAs. A putative TF of an miRNA is considered here as those binding within the close genomic locality of that miRNA with respect to its starting or ending base pair on the chromosome. Recent studies suggest that these putative TFs are possible regulators of those miRNAs.
The interface is built around two datasets that consist of the exhaustive lists of putative TFs binding respectively in the 10 kb upstream region (USR) and downstream region (DSR) of human miRNAs. A web server, named as PuTmiR, is designed. It provides an option for extracting the putative TFs for human miRNAs, as per the requirement of a user, based on genomic locality, i.e., any upstream or downstream region of interest less than 10 kb. The degree distributions of the number of putative TFs and miRNAs against each other for the 10 kb USR and DSR are analyzed from the data and they explore some interesting results. We also report about the finding of a significant regulatory activity of the YY1 protein over a set of oncomiRNAs related to the colon cancer.
The interface provided by the PuTmiR web server provides an important resource for analyzing the direct and indirect regulation of human miRNAs. While it is already an established fact that miRNAs are regulated by TFs binding to their USR, this database might possibly help to study whether an miRNA can also be regulated by the TFs binding to their DSR.
MicroRNAs (miRNAs) are non-coding small RNAs (21-23 nt), which regulate mRNA stability and translation through the action of an RNA-induced silencing complex (RISC) [1, 2]. Various biological processes, e.g., insulin secretion, cell proliferation, brain development, apoptosis etc., are controlled by miRNAs and emerging evidences strengthen the belief about their involvement in various diseases like fragile X syndrome, schizophrenia, cancer and many others [3–5]. Recent studies in systems biology suggest that there exists a complex regulatory network between genes, transcription factors (TFs) and miRNAs . Study of such regulatory network is indeed promising for disease analysis in various organisms. The initiatives for exploring this comprehensive regulatory network primarily focus on pairwise regulation (e.g., gene-TF, gene-miRNA and TF-miRNA regulations) [6, 7]. However, these approaches face a major problem due to the unavailability of proper repositories, specially for the regulating factors for miRNAs. In this paper, we focus on TF to miRNA regulation and provide a novel interface for extracting the list of putative TFs for human miRNAs. In the subsequent discussions, we use the representation TF→miRNA to denote a pair of TF and miRNA in which the TF might be a possible regulator of the miRNA.
There are very few earlier studies focusing on TF→miRNA relationships for a specific organism. Instead of being based on computational approaches, most of these rely on literature survey. In a recently published database, named TransmiR (version 1.0), a sum of 288 TF→miRNA relationships (of these 242 are unique), for 10 organisms, have been accumulated based on a manual survey  of about 5000 reports collected from PubMed Central . Of these 242 distinct pairs, 221 pairs report about the regulation of human miRNAs. The complete list of TF→miRNA pairs includes 82 unique TFs and 100 unique miRNAs. Of these, 88 miRNAs and 69 TFs are related to the human category. Along with the regulation information, it also provides the information on the involvement of the TF with tumor and the miRNA with tumor and disease, respectively. Certainly, this is a very small fraction of the possible comprehensive network. Therefore, instead of depending only on literature survey there is a strong requirement of further in-depth computational analysis.
Here, we provide the first repository of the set of putative TFs for any arbitrary human miRNA binding in the possible regulatory regions (within 10 kb USR and DSR, respectively) of that miRNA. We call it hereafter as the repository of putative transcription factor to miRNA regulation (PuTmiR). It's up-to-date interface processes a region-specific search upstream of a given miRNA to find out the possible regulating TFs. As an additional feature, the web server implementation also provides the putative TFs binding to the downstream region of miRNAs. In spite of the conventional focus on upstream regulatory factors of genes, PuTmiR also provides the data for downstream regions as this might possibly be useful in exploring the fact whether miRNAs could be regulated from downstream regions or not. This kind of interface is a novel one for the system level study of miRNA regulation.
The complete list of human (Homo sapiens) miRNAs, which have stem-loop structure precursor sequence validated with literature evidence, and their corresponding genomic locations have been collected from the up-to-date repository of miRBase . This produces a list of total 721 miRNAs. Some of the miRNAs have been discarded from this list because they are found absent from the assembly of UCSC . Notably, we have used UCSC to extract some additional information on TF binding sites. Specifically, several chromosomal entries like MT (mitochondrial) and HS chromosomes are found to be unusable for this study. The chromosomes HSCHR6_MHC_COX, HSCHR6_MHC_MANN, HSCHR6_MHC_DBB, HSCHR6_MHC_MCF, HSCHR6_MHC_QBL containing hsa-mir-219-1, HSCHR6_MHC_COX, HSCHR6_MHC_DBB, HSCHR6_MHC_MCF, HSCHR6_MHC_MANN, HSCHR6_MHC_QBL, HSCHR6_MHC_SSTO containing hsa-mir-877, HSCHR6_MHC_COX, HSCHR6_MHC_DBB, HSCHR6_MHC_MCF, HSCHR6_MHC_QBL, HSCHR6_MHC_SSTO containing hsa-mir-1236, and MT containing hsa-mir-1974, hsa-mir-1977 and hsa-mir-1978 as mentioned in the miRBase have been discarded. To search for the TFs we retrieved data from the UCSC genome browser . We define the USR and DSR with respect to the 5' and 3' untranscribed end of an miRNA, respectively. Again, hsa-mir-941-4 has been removed from the list because its chromosomal location is empty in the miRBase repository.
assembly: Mar. 2006,
track: TFBS Conserved,
table: tfbsConsSites and tfbsConsFactors.
The list of genomic locations of miRNAs in miRBase that are invalid according to the Mar. 2006 repository of UCSC.
The UCSC uses the public version of the Transfac binding site distribution matrix (TRANSFAC®7.0)  for predicting putative TFs. The Transfac public repository contains position-weight matrices for 398 TFBSs characterized through empirical studies reported in the scientific literature. From this, 258 binding matrices for known TFs conserved in the human, mouse or rat alignment (i.e., the prediction accuracy is higher than a threshold for its binding matrix over all the species considered) were used in the tfbsConsSites track of UCSC. These matrices are used for predicting the TFBSs in a genomic location. Again, the tfbsConsFactors track contains the TFs corresponding to these transcription factor binding sites. By combining these two tracks, we have retrieved the list of putative TFs.
For predicting the TFBSs, the matching scores corresponding to the binding matrices are computed for each genomic location and for all the species. Here 3 species, human, mouse and rat are considered.
where μ log_score and σ log_score in Eqn. (3) denote the mean and standard deviation of the log_scores, respectively, derived for all the cases. This z-score is then used to create the threshold for each binding matrix, namely, 1.64 standard deviations above the mean. The threshold is calculated for each species such that the maximum p-value becomes 0.05. Tfloc is then run with this threshold on each chromosome for the 3-way multiz alignments.
By combining the retrieved information, we have produced the final results in the form of two complete tables for 10 kb USR and DSR, respectively. Notably, the TFs extracted through these processes are taken from a table which contains the location and score of TFBSs conserved in the human/mouse/rat alignment. A conserved binding site across the alignment is defined for which the score is greater than or equal to a threshold for its binding matrix in all these three species. Therefore, the extracted TFs are specific to the organisms human, mouse and rat. As we are focusing on the human TF→miRNA relationships in this study, the TFs specific to mouse and rat are pruned out from the data. In this way, we have finally obtained at least one putative TF binding to the 10 kb USR of 601 miRNAs and in the 10 kb DSR of 613 miRNAs, respectively. In total, 11485 and 12251 relationships are found where a TF binds in the 10 kb USR and DSR of the human miRNAs considered in the study, respectively. In course of the preparation of the final dataset, We have included the Refseq IDs corresponding to the TFs considered in the list. The complete datasets for upstream region and downstream region are provided in Additional files 1 and 2, respectively.
We have studied the significance of the TF→miRNA data through comprehensive literature survey. We have found a couple of TFBSs in chromosome 1, which corresponds to the binding site of the TF YY1, in the upstream of hsa-mir-29b-2 and hsa-mir-29c, and report it in the PuTmiR database. This information has earlier been accounted in a study that reveals the repression of hsa-mir-29b and hsa-mir-29c through YY1 protein . It also validates the association of the regulation with lung cancer, Tcl1 expression, Mcl-1 protein expression and apoptosis. This is a significant support to the putative TFs found for the human miRNAs. Moreover, we have found a total of 114 miRNAs (and reports in PuTmiR) whose common putative TF is the YY1 protein. Based on the reports in a recent survey in , we identified that 13 of these miRNAs (hsa-let-7a-1, hsa-let-7e, hsa-let-7f-2, hsa-mir-107, hsa-mir-135b, hsa-mir-143, hsa-mir-145, hsa-mir-199a, hsa-mir-21, hsa-mir-218-2, hsa-mir-29b-2, hsa-mir-29c, hsa-mir-375) are related to colon cancer. Interestingly, the miRNAs hsa-mir-29b-2 and hsa-mir-29c, which are biologically validated targets of YY1 , also belongs to this list. So, most likely this group of miRNAs is regulated by the YY1 protein acting as the TF.
The PuTmiR web server has been designed for providing a custom interface for extracting TFs of human miRNAs for a specific binding region of interest. At the lower level, we have the complete list of putative TFs binding to the 10 kb USR and DSR of the miRNAs, where the binding confidence is based on the position-weight matrices stored in the TRANSFAC database . We provide an optional interface to define the region for which the TFs are required to be extracted for upstream or downstream. The server side code of the interface is written in PHP and has been made non-specific to browsers. The extracted output in the tabular form provides the genomic locations, strand, locations of interest, location, name and strand of TFs, and z-score of the binding prediction accuracy for the miRNA given as query. The extracted data can also be downloaded in tabular format.
The user-defined region is restricted not to exceed 10 kb because the 10 kb region is expected to be the possible regulating region of any miRNA gene [6, 7]. Finally by clicking on the upstream or downstream optional buttons, the putative TFs are extracted. The results are available in tabular format or in downloadable text format. As an interesting feature of PuTmiR, we have included a link to the PubMed central database, which is regularly updated with published scientific literature, to browse for the regulation information (by TF) on the miRNA given as query.
The data evolving as an output from this study is important in the perspective of further system level analysis. Systems biology involves analysis in the lower molecular levels, which mean in the levels of genes and proteins (including TFs). Recent addition to this list is the miRNAs that actively participate in the negative regulation of genes , and thus proteins. It is conjectured that there exists regulatory interactions between genes, TFs and miRNAs . Additionally, current studies strengthen the fact that while TFs regulate the miRNAs, miRNAs themselves target (directly or indirectly) signaling proteins, enzymes and TFs, and they all are believed to form a complex regulatory network . Thus, the putative TF→miRNA data is useful in a systems biology context. It can be integrated and mapped onto various networks for various studies.
A handful of studies have been carried out in the recent past to integrate regulatory information with different biological networks. In one of these few, Cui et al. have mapped interactions between miRNAs to a human cellular signaling network for further regulatory analysis. They identified that miRNAs preferentially target the positive regulatory motifs, highly connected scaffolds and most downstream network components (e.g., signaling TFs). On the other hand, the negative regulatory motifs, common components of basic cellular machines and most upstream network components (such as ligands) are found as weak candidates to be a target . Similar analysis has also been done with metabolic networks  and protein networks . In , miRNA targets have been mapped onto the enzymes of a human metabolic network and statistically analyzed to discover novel regulatory patterns. They concluded that highly connected enzymes are potential targets of majority of the miRNAs. As the regulation of metabolic enzyme-encoding genes (by the miRNAs) controls the production of metabolites, the regulation of miRNAs is itself an important study to pursue. Similarly, miRNA regulation has also been studied over the protein-protein interaction networks . The TFs which regulate miRNAs is therefore the key factors of indirect regulation of different biological networks.
Pursuing their earlier study in , Cui et al. have integrated cancer mutated and cancer-associated methylated genes onto a human cell signaling network to investigate the oncogene-signaling map . They found that the genes in the downstream of the network are frequent targets for activating mutations. Most importantly, the genes highly connected to mutated genes are mostly cancer associated. Recent studies on the construction of disease regulatory networks, as published on specific diseases like cancer upon well-organized surveys , are already in progress. These studies can thus be further extended from the TF→miRNA regulatory information available in the database. One of our earlier studies shows that co-expressed miRNAs have common TFs, by analyzing a stem cell dataset . Ongoing investigations have already revealed the involvement of miRNAs in stem-cell biology . So, this type of data on miRNAs is indeed promising for exploring various disease regulatory activities of miRNAs, directly or indirectly.
In this work, alongside providing the upstream neighboring TFs of miRNAs in the PuTmiR web server, we also accumulate the TFs binding in the DSR of an miRNA. While analyzing the DSR data, we have identified some potential TF→miRNA pairs, i.e. some corresponding putative TFBSs within the 10 kb downstream region of some miRNAs, which have been biologically validated earlier. It is an established fact that the TF p53 upregulates hsa-mir-192 . In our study, we have found four putative TFBSs of p53 within the 10 kb downstream region of the miRNA 192. On the other hand, for each of the proteins MEF2 and YY1, a single putative TFBS has been identified within the 10 kb DSR of the miRNAs 133a-1 and 29b-2, respectively, which are already known to be validated TFs of these miRNAs. Recent studies have revealed that MEF2 regulate the expression of a bicistronic miRNA cluster encoding hsa-mir-133a-1 in cardiac and skeletal muscle , and also, the derepression of hsa-mir-29b is caused by the downregulation of YY1 protein . Therefore, some of these putative findings are also biologically convincing. It is an important recent concern whether the miRNAs possibly be regulated by the TFs binding to their downstream region. In fact, there is hardly any comprehensive study establishing this fact. However, we summarize the few in literature for reviewing current beliefs on downstream miRNA regulation. Earlier in 2002, in silico cloning was used in a study to identify a putative promoter region located about 215 bp downstream of the miR16 gene . The investigations on conserved regulatory motifs in 3' untranscribed regions (UTRs) over multiples species were originally started by Xie and his colleagues . They have identified 72 highly conserved (over the species human, mouse, rat and dog) 3' UTR 8-mer motifs which are most possibly binding sites of miRNA genes. In a recent biological analysis on the tumor suppressor PRDM5 (binds in 41% of the promoter sequences), it is found to occupy the proximal region (within the 1 kb upstream or downstream of mature miRNA sequence) of 10% miRNAs . They identified the binding sites to be intronic, intergenic or within the exons of protein-coding genes. In another study on the roles of Myeloid transcription factors in the transcription of Pri-miR-223 in mouse, it is figured out that the mutation in the downstream PU.1-binding motif results in a drastic decrease of transcriptional activity . It suggests a dominant role of the corresponding site in transcriptional control of mouse pri-miR-223. These results demonstrate that transcription of mouse pri-miR-223 is optimally induced in the presence of both of two PU.1-binding sites in the conserved promoter, therefore implying a critical role of PU.1 in pri-miR-223 transcription.
In a relatively recent analysis on the distribution of CpG islands, a useful aid for promoter prediction, Saini et al. have identified a good number of candidates at the downstream of intergenic miRNAs . They strongly conjectured the existence of TF-binding sites in the downstream of miRNAs and also the presence of distal regulatory elements like enhancers or silencers regulating therein. The enhancers, a class of regulatory elements, can modify expression of a nearby gene when they are present, on either strand, upstream or downstream, and in either orientation. They can even be located relatively far from the gene regulated. Later, Saini and his group have studied many conserved primary transcripts of intergenic miRNAs as a follow through of their earlier work . These studies indeed suggest that TF binding sites of miRNA genes might act as enhancers. Also, there are current evidences that proposes the initialization of miRNA transcripts in the upstream or downstream of promoter regions of miRNAs . So, downstream regulation of miRNAs is in fact an interesting due study.
This paper describes a novel repository of putative TF→miRNA information. This repository provides promising information for further study. The downstream regulatory activities of miRNAs, which is hitherto an unproved hypothesis, can be statistically studied using the information on putative TFs binding to the DSR of miRNAs provided by this study. It is indeed promising to analyze whether bi-directional regulation really happens for miRNAs or not. In fact, it is really surprising to receive a longer list of putative TFs binding in the DSR of miRNAs as compared to the USR. These observations may lead to further analyzes in this regard. It is an important concern in a study like this that miRNAs are of two types - intragenic and intergenic, according to their genomic locality. So, it may happen that the promoter of an miRNA is in the upstream of the gene from which it is encoded, without being in the exact upstream of the miRNA. Further studies pointing towards these directions are really important. It is a limitation that we are using the miRBase and UCSC genome browser at the intermediate stages of this analysis. The web server suffers from the incompatibility issues arising between these two repositories. This could be improved by retrieving updated TF information from Transfac to predict putative TFs more accurately compatible to genomic locations provided in miRBase. We plan to further improve the database by incorporating the literature-validated support in the regulatory relations obtained.
The complete datasets of putative TFs for both the 10 kb USR and DSR are provided. The web server implementation can be accessed at the link: http://www.isical.ac.in/~bioinfo_miu/TF-miRNA/TF-miRNA.html.
The authors are thankful to Dr. Neil R. Smalheiser, Assistant Professor in Psychiatry, University of Illinois, Chicago, USA for his important highlights during the current study. The authors wish to thank the anonymous reviewers for their invaluable suggestions that greatly facilitated the improvement of contents of the present article. Additionally, SB gratefully acknowledges the financial support from the grant no.- DST/SJF/ET-02/2006-07 under the Swarnajayanti Fellowship scheme of the Department of Science and Technology, Government of India.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.