miTarget: microRNA target gene prediction using a support vector machine
- Sung-Kyu Kim†1, 2,
- Jin-Wu Nam†1, 2,
- Je-Keun Rhee1, 2,
- Wha-Jin Lee1, 2 and
- Byoung-Tak Zhang1, 2, 3Email author
© Kim et al; licensee BioMed Central Ltd. 2006
Received: 21 December 2005
Accepted: 18 September 2006
Published: 18 September 2006
MicroRNAs (miRNAs) are small noncoding RNAs, which play significant roles as posttranscriptional regulators. The functions of animal miRNAs are generally based on complementarity for their 5' components. Although several computational miRNA target-gene prediction methods have been proposed, they still have limitations in revealing actual target genes.
We implemented miTarget, a support vector machine (SVM) classifier for miRNA target gene prediction. It uses a radial basis function kernel as a similarity measure for SVM features, categorized by structural, thermodynamic, and position-based features. The latter features are introduced in this study for the first time and reflect the mechanism of miRNA binding. The SVM classifier produces high performance with a biologically relevant data set obtained from the literature, compared with previous tools. We predicted significant functions for human miR-1, miR-124a, and miR-373 using Gene Ontology (GO) analysis and revealed the importance of pairing at positions 4, 5, and 6 in the 5' region of a miRNA from a feature selection experiment. We also provide a web interface for the program.
miTarget is a reliable miRNA target gene prediction tool and is a successful application of an SVM classifier. Compared with previous tools, its predictions are meaningful by GO analysis and its performance can be improved given more training examples.
MicroRNAs (miRNAs) are endogenous ~22 nucleotide noncoding RNAs, which act as posttranscriptional regulators in animals and plants. MiRNAs use two distinct posttranscriptional mechanisms to downregulate gene expression. They act by binding to the complementary sites on the 3' untranslated region (UTR) of the target gene to induce cleavage with near perfect complementarity or to repress productive translation [1–6]. They also facilitate deadenylation, which leads to rapid mRNA decay [7, 8]. The choice between the translational inhibition and destruction is thought to be governed by the degree of mismatch between a miRNA and its target mRNA. The behaviors of miRNAs differ between animals and plants. Those of plants tend to show near perfect complementarity to their target messenger RNAs (mRNAs), but the miRNAs of animals usually have imperfect characteristics, including mismatches, gaps, G:U wobble pairs and others [9–15]. This makes it hard to find target animal mRNAs using only sequence complementarity. Nevertheless, strong sequence conservation observed in target mRNA sites and in miRNA sequences makes it possible to develop programs for the prediction of potential targets [16–18]. This evolutionarily meaningful evidence shows the importance of sequence preservation as a requirement for function. Particularly, no specific role has been explained for the 3' ends of miRNAs even though they tend to be evolutionarily conserved over their entire lengths.
To date, computational methods have been widely used for the prediction of miRNAs [16, 19–21] and miRNA target genes [12, 22–28]. Different approaches have been used for miRNA target predictions in plants and animals. For plant sequences, similarity-based approaches have shown high performance because complementarity is nearly perfect [22, 25]. However, such approaches are not appropriate for animal genomes because of the imperfect nature of the miRNA:mRNA interaction. Studies for animal sequences have been based on both the complementarity to the 5' part of miRNAs and conserved motifs over species [12, 23, 24, 26]. These can be implemented by a model containing weighted position features and comparative information to detect target mRNA sites and to reduce false positives. Scoring methods using dynamic programming [26, 27, 29] and a complementarity-based strategy. [23, 28] are generally preferred to rank the prediction results. They have been quite successful for a few top-ranked results. However, the results are often limited by the conserved nature of the data set used.
In this article, we present a support vector machine (SVM) classifier to predict miRNA target genes. An SVM is one of the most popular machine learning algorithms and it has good performance in classification problems. Moreover, we collected training data from the literature to make a biologically relevant simulation. Generally, the efficiency and the reliability of a machine learning algorithm depend on choosing relevant data and specific features. Thus, a biologically relevant data set is as important as a good algorithm. An SVM builds a classifier directly from the data by investigating its characteristics. It does not require conservation information for classification, so it is free from the limitations described above. Our SVM classifier gave good results for predicting the targets of miRNAs.
Biologically relevant data set
The training data set configuration.
Stark et al.
Johnston et al.
Nelson et al.
Kiriakidou et al.
Vella et al.
Doench et al.
Yekta et al.
Lai et al.
Brennecke et al.
The training data set gained directly from the literature contained 235 examples including 152 positives and 83 negatives. There were too few negative examples to build an effective classifier. We needed more negative data because these usually contribute to the specificity of a classifier much more significantly than positive data. Specificity is usually more important than sensitivity in genome analysis because slight decreases in specificity values can generate many false predictions because of the large size of genome sequences. However, we did not use randomly generated negative examples because such sequences often interact with miRNAs, as shown in the signal-to-noise ratio experiments of previous studies [23, 30, 31]. Instead, we inferred 163 negative examples as described below. Thus, the final size of the data set was 398 (152 positives, 246 negatives).
For the inferred negative examples, we noted that deletion of target sites on the target mRNA sequence can give a large number of negative examples. Thus, in one report , let-7 miRNA could not repress expression after deleting the target sites of let-7 miRNA on lin-41, and in another , let-7 miRNA was inactivated by knocking out the target sites on the gene for the cold shock protein LIN-28. That is, the remaining region on the lin-41 3' UTR will not now work with let-7 miRNA. This is the same for LIN-28. We conclude that if all the actual binding sites on lin-41 and LIN-28 are masked, then all the other remaining sites with favorable seed pairings are apposite as negative examples. In practice, we collected examples with more than 4-mer matches at their seed part and discarded the rest to improve the quality of the data set. As a result, we gained 163 inferred negative examples: 50 from lin-41 and 113 from LIN-28.
Support vector machine
We used an SVM [32, 33] to build a classifier discriminating the binding sites of a miRNA on the 3' UTR region of a gene. SVMs allow an implicit mapping of the sample vectors into a high-dimensional, non-linear feature space, in which the samples may be separated better using a similarity function between pairs of samples, called a kernel. To implement a kernel method, let us denote S = (x1,...,x n ) as a set of miRNA target data to be trained. We suppose that each datum x i is an element of a set X of all possible target data. To design a data classification method, the data set S is then represented as the set of features, Φ(S) = (Φ(x1),..., Φ(x n )), where Φ(x) can be defined as a real-valued vector. The size of the vector is the number of features. This classification method is designed to process a set of pairwise comparisons of data x i and x j . It is represented by an n × n matrix of pairwise comparisons ki,j= k(x i ,x j ). The n × n matrix is used as input data of our kernel. In our study, a radial basis function (RBF) kernel is used:
k(x i , x j ) = exp(-γ ||x i - x j ||2), (1)
where the parameter γ determines the similarity level of the features so that the classifier becomes optimal.
SVMs are often believed to find an optimal hyperplane separating the training data. In practice, however, a separating hyperplane may not exist when a problem is very noisy or complex. To accommodate this case, slack variables ξ i ≥ 0 for all i = 1,...,n are introduced to loosen the constraints as follows. :
y i (⟨w,x i ⟩ + b) ≥1 - ξ i for all i = 1,...,n. (2)
A classifier that generalizes well is then obtained by adjusting both the classifier capacity ||w|| and the sum of the slacks ∑ i ξ i . The latter can be shown to provide an upper bound on the number of training errors. Such a soft margin classifier can be realized by minimizing the following objective function:
subject to the constraints on ξ i and (2), where the constant C > 0 determines the trade-off between margin maximization and training error minimization. We implemented a modified version of SVMlight  to solve our problem.
Parameter optimization and classifier evaluation
In this section, we describe the training and evaluation of classifiers and optimization of parameters. Before the evaluation of a classifier, a tool needs to train the classifier and optimize two SVM parameters, C and γ. We evaluated the classifier with a completely independent test data set. For this, we repeatedly performed three steps as follows. First we divided the data equally into training and test sets through random sampling (without replacement). Then we performed tenfold cross validation with the training data to train a classifier and to optimize parameters. Finally we evaluated the optimized SVM classifier with the remaining test data (which must be completely independent). We performed 10 repeated evaluations as above and averaged the results. For the adjustment of the two parameters, C and γ, we searched for a parameter set that maximized the accuracy of upper tenfold cross validation using:
where C ranges from 1 to 200 in steps of 1.0 and γ ranges from 0.01 to 2.0 in steps of 0.01.
The discriminative power of our method can be described using receiver operating characteristic (ROC) analysis, which is a plot of the true positive rate against the false positive rate for the different possible cutoffs of a diagnostic test. ROC analysis reveals all possible trade-offs between sensitivity and specificity. For this, we measured the performance of classifiers across 24 cutoff points in the evaluation step (-4, -3, -2, -1.8, -1.6, -1.4, -1.2, -1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1, 1.2, 1.4, 1.6, 1.8, 2, 3). The ROC was plotted with the specificity and the sensitivity averaged from the results of 10 repeated evaluations.
The RNAfold program requires a single linear RNA sequence as input, so the 3' end of the target mRNA sequence and the 5' end of miRNA sequence are connected by a linker sequence, "LLLLLL". The "L" denotes that it is not an RNA nucleotide, thus it does not match with any nucleotide and so prevent mRNA and miRNA nucleotides from binding with sequence-specific linker sequences . Thus, the RNAfold program produces an RNA secondary structure alignment with a linker sequence, exemplified in Figure 1. The positions in the alignment are numbered from the 5'-most position of the seed region. Alignments are extended until the 20th position and the rest positions are discarded.
For structural and thermodynamic features, we divided the secondary alignment into three parts consisting of the 5' part (seed part), the 3' part, and the total alignment as shown in Figures 1 and 2. Each count value of matches, mismatches, G:C matches, A:U matches, G:U matches, and other mismatches from the three parts was considered as a structural feature. The free energy values of the 5' part, the 3' part, and the total miRNA:mRNA alignment structure are thermodynamic features that are also calculated by RNAfold. Here, the sequence "AAAGGGLLLLLLCCCUUU" was used as a linker sequence to ensure that each part of the subsequence was paired. The sequences "AAAGGG" and "CCCUUU" were designed to prevent any unexpected alignment of the short matches. Although such linkers may change the original signal, the thermodynamic effect of the linker sequence will be the same for all short matches.
Position-based features are important because they imitate the shape and mechanism of the seed pairing. Doench et al.  and Brennecke et al.  focused on the sequence-specificity of miRNA:mRNA interaction. They found that a single point mutation could inhibit the miRNA's function depending on its position. In contrast to our earlier belief, their research revealed that examples with favorable thermodynamic free energy might not regulate expression. Therefore, we investigated the binding mechanism. Position-based features corresponded to point mutations in the above two experiments. Each position had one of the four nominal values consisting of a G:C match, an A:U match, a G:U match, and a mismatch. To make these values available for SVMlight, we translated them into decimal values from 1 to 4, respectively, and normalized them.
Performance of the SVM classifier
Supporting evidence from microarray data
Significance of miRNA target predictions based on miRNA microarray perturbation data.
We calculated probability (P)-value relative to hypergeometric distribution to test statistically significant enrichment of the downregulated genes among the target gene candidates using the following equation:
where X denotes the number of downregulated genes (with 3' UTR sequences), N denotes the number of human 3' UTR sequences used for target prediction, n is the number of predicted target genes, and x is the number of downregulated genes matched by the predicted target genes. The P-values are shown in Table 2: the predicted targets were statistically significant.
In addition, we performed a significance test like that above to compare our method with a simple predictor, which searches for targets based on miRNA seed matches of positions 2–7, as a baseline. The results are summarized in Supplementary Table 4 [see Additional file 4]. In miR-124a and miR-373, excluding miR-1, miTarget was more significant than the seed match. Although target genes of miR-1 by miTarget were less enriched in downregulated genes, miTarget showed more robust target prediction of three miRNAs and the seed match showed a high number of false positives. Because several known miRNA:mRNA alignments have one or two mismatches in the seed region and some miRNAs mutated at the seed region are still functional, a simple approach based on the seed match may produce more false negatives.
It is necessary to emphasize that we note that some of the downregulated genes might be indirectly affected via other genes, so they may not be targets of these particular miRNAs. Also, some genuine targets may be repressed only translationally without being affected at mRNA level that is measured in the microarray experiments. Thus, although we cannot precisely measure it, the sensitivity of our classifier may be better than assumed here.
Annotation using gene ontology (GO)
Using GO  to validate the target prediction is one of the most biologically relevant approaches for indicating the functional coherence of target genes . It is achieved readily by searching for statistically significant GO terms.
The top 15 contributing features.
5' part free energy
AU matches at the 5' part
Mismatches at the 5' part
Matches at the 5' part
Total GU matches
GU match at the 5' part
GU match at the 3' part
Total AU matches
Total free energy
Comparison of random negative data sets
As can be seen in Figure 5a, the classifier built with the original data set showed a higher performance (ROC area: 88.7%) than the classifier built with the random data set (ROC area: 84.4%), as evaluated on the original test data set. This means that the classifier created on the original data set is more appropriate for genuine targets than the classifier created on the random data set. However, the random classifier (ROC area; 96.7%) showed a higher performance than the original classifier (ROC area; 93.3%) for random test data alone (Figure 5b).
The two results above indicate that a manually selected original data set is clearly important for the development of an efficient classifier. Although random negative data are widely used for machine learning algorithms, great care should be taken when using this approach. "Random" does not mean "negative", so the random data may contain real cases by chance, leading to relatively low sensitivity. In addition, such random data are often biologically infeasible, so they can be distinguished easily from positive data, which is why specificity is so high. Thus, we did not use random negative examples and used only actual examples from the literature, so that our data set was biologically relevant.
Comparison with previous tools
There are several miRNA target gene prediction tools and each has its own merits. Lewis et al.  developed TargetScan to identify mammalian miRNA targets. It depends on a strong seed pairing mechanism and conservation among species, and gives an acceptable performance in wetlab experiments for validation. Enright et al.  implemented a dynamic-programming-based program, called miRanda, to identify targets for Drosophila melanogaster. They validated their result with wetlab experiments, but found a false positive rate of about 30%. Rehmsmeier et al.  improved existing RNA folding algorithms and presented RNAhybrid for prediction in the Drosophila melanogaster genome. By forcing seed matches on positions 2–7, it detected many previously known targets.
We performed a comparison on our miRNA:mRNA data set, and the ROC result is presented in Figure 3. Because the previous methods did not have cutoffs, we could not draw their ROC curves. Instead, we have indicated their performances as rectangles on the graph and compared specificities based on their sensitivity. Overall, miTarget gave a more stable performance than miRanda and RNAhybrid, but a slightly lower specificity (0.93) at a sensitivity of 0.63 than TargetScan (0.94). The higher specificity of TargetScan seems to arise from its strong constraint on the seed region. As it requires six continuous pairings on positions 2–7 of the seed region, this constraint made it predict many of the examples in our data set as negative. However, the limitation of TargetScan is that its best sensitivity was 0.63. Indeed, TargetScan seems to be rather conservative and is less flexible than our classifier, which can predict with optional accuracy across a broad range of specificity and sensitivity. Unlike other methods, our classifier is trainable and can be improved continuously if we can obtain more biologically relevant data.
Contribution of each feature
We devised a feature selection method to investigate which might play a more dominant role in miRNA target regulation. Such methods are used to improve the performance of a classifier, to make it cost effective, and to help understand the problem. Our intention was to understand the hidden mechanism of miRNA function. We anticipated dominantly functioning features or noninformative features.
We used Weka software , and the features were evaluated using the OneR classifier and Ranker methods. The top 15 contributing features are shown in Table 3. Position-based features were ranked in half of the top 10 features, with position five in the lead. The continuous pairing of positions four, five, and six may be important for miRNA function, because they are ranked at fourth, first, and third, respectively. A G:U wobble pair also plays an important but maybe negative role; such a pair is known as a disturbing factor [4, 13]. Therefore, G:U related features were ranked high. More than three G:U wobble pairs are believed to impair miRNA function , which is consistent with this result.
In this paper, we used an SVM classifier to predict miRNA target sites with biologically relevant data and measured its performance in various ways. We also investigated which features might contribute significantly to miRNA function. Our SVM classifier, called miTarget, performed well on our data set and produced significant results. Structural and thermodynamic features contributed to overall performance enhancement and position-based features increased specificity. miTarget also showed a stable performance compared with existing tools. Features on positions four, five and six seem to have more significant effects on seed pairing, according to the result of feature selection analysis. This is consistent with the general belief that the seed regions of paired structures should be stable. Moreover, out study suggests that individual positions in the seed region may unequally contribute to target recognition.
As mentioned above, high specificity is required in genome research because of the volume of data. For combinations of nucleotide pairs, the rates of match and mismatch were 0.25 and 0.75, respectively, and the expected specificity would be 0.75 with one position-based feature where a match is important: position five for example. One of the main limitations in miRNA target prediction for the human genome is the prevalence of long 3' UTR sequences compared with other species. The longer such sequences: the higher the false positive rate. In order to reduce the false positive rate, our classifier needs to be improved in specificity by introducing more inferred or experimental negative examples.
We verified our miRNA target-gene prediction results by the analysis of GO terms. We have shown the results for miR-1, miR-124a, and miR-373 and these are consistent with the general idea that miRNA targets are diverse in function . To reveal more details of miRNA function, sophisticated wetlab experiments are essential for understanding the mechanisms of miRNA targeting.
According to a previous analysis , there should be three classes of miRNA target sites: canonical 5' dominant, seed dominant, and 3' compensatory. Two of these classes need to have strong complementarity on the seed. However, the 3' compensatory class needs to have only a moderate level of complementarity in the seed, while the 3' part is considerably matched. Our results failed to explain this. We found only one feature ranked at the 12th position, which was a G:U match at the 3' part, and almost all of the other features were about the 5' parts. This may be because the test was biased toward the effect of the miRNA seed. Recent studies have concentrated on the conservation of seed motifs. [48, 49] and wetlab experiments are performed accordingly. Therefore, experiments on the 3' part are rarely done and it is hard to get appropriate data to investigate the effect of this region.
In addition, multiclass classification algorithms may be possible to explain this situation. If there really are three distinct classes in miRNA:mRNA pairing mechanisms, our binary classification approach is not an optimal solution. The lack of data is going to be another main limitation. Our data set is still small for standard machine learning approaches. However, for multiclass problems, the data set size should be much larger than the binary problem. Because machine learning algorithms often depend on the quality and amount of data set, many biologically verified high quality data are required.
We constructed miTarget, an SVM classifier for miRNA target-gene prediction, and have shown its reliability in several ways. We collected a biologically relevant data set from the literature and designed new position-based features implying the manner of miRNA targeting. This predicted significant functions of human miRNA miR-1, miR-124a, and miR-373 by GO analysis. The feature selection experiment revealed that pairings at positions four, five and six are more important than other seed regions.
Nevertheless, there are still limitations in applying computer-based approaches. First, the actual mechanism of miRNA function remains unclear. Second, biologically relevant data are scarce. Third, "real" biological mechanisms can be species-specific. With more biologically relevant and unbiased data sets available, our SVM-based approach will be easily improved and create more reliable features reflecting the real actions of miRNAs.
Availability and requirements
Project name: miTarget (microRNA target prediction)
Project home page: http://cbit.snu.ac.kr/~miTarget
Operating system(s): developed on Linux, Red Hat Enterprise Linux AS4
Programming language: Python, C
Other requirements: Vienna RNA package, SVMlight
Support Vector Machine.
radial basis function.
false discovery rate.
receiver operating characteristic.
directed acyclic graph.
This work was supported by the National Research Laboratory program (M10412000095-04J0000-03610) of the Korean Ministry of Science and Technology and by a Seoul Science Fellowship from Seoul City.
- Lai EC: microRNAs: runts of the genome assert themselves. Curr Biol 2003, 13: R925–36. 10.1016/j.cub.2003.11.017View ArticlePubMedGoogle Scholar
- Brennecke J, Stark A, Russell RB, Cohen SM: Principles of microRNA-target recognition. PLoS Biol 2005, 3: e85. 10.1371/journal.pbio.0030085PubMed CentralView ArticlePubMedGoogle Scholar
- Carrington JC, Ambros V: Role of microRNAs in plant and animal development. Science 2003, 301: 336–338. 10.1126/science.1085242View ArticlePubMedGoogle Scholar
- Nelson P, Kiriakidou M, Sharma A, Maniataki E, Mourelatos Z: The microRNA world: small is mighty. Trends Biochem Sci 2003, 28: 534–540. 10.1016/j.tibs.2003.08.005View ArticlePubMedGoogle Scholar
- Bartel DP: MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 2004, 116: 281–297. 10.1016/S0092-8674(04)00045-5View ArticlePubMedGoogle Scholar
- Ambros V: The functions of animal microRNAs. Nature 2004, 431: 350–355. 10.1038/nature02871View ArticlePubMedGoogle Scholar
- Giraldez AJ, Mishima Y, Rihel J, Grocock RJ, Van Dongen S, Inoue K, Enright AJ, Schier AF: Zebrafish MiR-430 promotes deadenylation and clearance of maternal mRNAs. Science 2006, 312: 75–79. 10.1126/science.1122689View ArticlePubMedGoogle Scholar
- Wu L, Fan J, Belasco JG: MicroRNAs direct rapid deadenylation of mRNA. Proc Natl Acad Sci U S A 2006, 103: 4034–4039. 10.1073/pnas.0510928103PubMed CentralView ArticlePubMedGoogle Scholar
- Llave C, Xie Z, Kasschau KD, Carrington JC: Cleavage of Scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA. Science 2002, 297: 2053–2056. 10.1126/science.1076311View ArticlePubMedGoogle Scholar
- Palatnik JF, Allen E, Wu X, Schommer C, Schwab R, Carrington JC, Weigel D: Control of leaf morphogenesis by microRNAs. Nature 2003, 425: 257–263. 10.1038/nature01958View ArticlePubMedGoogle Scholar
- Tang G, Zamore PD: Biochemical dissection of RNA silencing in plants. Methods Mol Biol 2004, 257: 223–244.PubMedGoogle Scholar
- Stark A, Brennecke J, Russell RB, Cohen SM: Identification of Drosophila MicroRNA targets. PLoS Biol 2003, 1: E60. 10.1371/journal.pbio.0000060PubMed CentralView ArticlePubMedGoogle Scholar
- Vella MC, Reinert K, Slack FJ: Architecture of a validated microRNA::target interaction. Chem Biol 2004, 11: 1619–1623. 10.1016/j.chembiol.2004.09.010View ArticlePubMedGoogle Scholar
- Vella MC, Choi EY, Lin SY, Reinert K, Slack FJ: The C. elegans microRNA let-7 binds to imperfect let-7 complementary sites from the lin-41 3'UTR. Genes Dev 2004, 18: 132–137. 10.1101/gad.1165404PubMed CentralView ArticlePubMedGoogle Scholar
- Robins H, Li Y, Padgett RW: Incorporating structure to predict microRNA targets. Proc Natl Acad Sci U S A 2005, 102: 4006–4009. 10.1073/pnas.0500775102PubMed CentralView ArticlePubMedGoogle Scholar
- Ohler U, Yekta S, Lim LP, Bartel DP, Burge CB: Patterns of flanking sequence conservation and a characteristic upstream motif for microRNA gene identification. Rna 2004, 10: 1309–1322. 10.1261/rna.5206304PubMed CentralView ArticlePubMedGoogle Scholar
- Lewis BP, Burge CB, Bartel DP: Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 2005, 120: 15–20. 10.1016/j.cell.2004.12.035View ArticlePubMedGoogle Scholar
- Altuvia Y, Landgraf P, Lithwick G, Elefant N, Pfeffer S, Aravin A, Brownstein MJ, Tuschl T, Margalit H: Clustering and conservation patterns of human microRNAs. Nucleic Acids Res 2005, 33: 2697–2706. 10.1093/nar/gki567PubMed CentralView ArticlePubMedGoogle Scholar
- Lim LP, Lau NC, Weinstein EG, Abdelhakim A, Yekta S, Rhoades MW, Burge CB, Bartel DP: The microRNAs of Caenorhabditis elegans. Genes Dev 2003, 17(8):991–1008. 10.1101/gad.1074403PubMed CentralView ArticlePubMedGoogle Scholar
- Lai EC, Tomancak P, Williams RW, Rubin GM: Computational identification of Drosophila microRNA genes. Genome Biol 2003, 4: R42. 10.1186/gb-2003-4-7-r42PubMed CentralView ArticlePubMedGoogle Scholar
- Nam JW, Shin KR, Han J, Lee Y, Kim VN, Zhang BT: Human microRNA prediction through a probabilistic co-learning model of sequence and structure. Nucleic Acids Res 2005, 33: 3570–3581. 10.1093/nar/gki668PubMed CentralView ArticlePubMedGoogle Scholar
- Rhoades MW, Reinhart BJ, Lim LP, Burge CB, Bartel B, Bartel DP: Prediction of plant microRNA targets. Cell 2002, 110: 513–520. 10.1016/S0092-8674(02)00863-2View ArticlePubMedGoogle Scholar
- Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB: Prediction of mammalian microRNA targets. Cell 2003, 115: 787–798. 10.1016/S0092-8674(03)01018-3View ArticlePubMedGoogle Scholar
- Enright AJ, John B, Gaul U, Tuschl T, Sander C, Marks DS: MicroRNA targets in Drosophila. Genome Biol 2003, 5: R1. 10.1186/gb-2003-5-1-r1PubMed CentralView ArticlePubMedGoogle Scholar
- Jones-Rhoades MW, Bartel DP: Computational Identification of Plant MicroRNAs and Their Targets, Including a Stress-Induced miRNA. Mol Cell 2004, 14: 787–799. 10.1016/j.molcel.2004.05.027View ArticlePubMedGoogle Scholar
- John B, Enright AJ, Aravin A, Tuschl T, Sander C, Marks DS: Human MicroRNA targets. PLoS Biol 2004, 2: e363. 10.1371/journal.pbio.0020363PubMed CentralView ArticlePubMedGoogle Scholar
- Kiriakidou M, Nelson PT, Kouranov A, Fitziev P, Bouyioukos C, Mourelatos Z, Hatzigeorgiou A: A combined computational-experimental approach predicts human microRNA targets. Genes Dev 2004, 18: 1165–1178. 10.1101/gad.1184704PubMed CentralView ArticlePubMedGoogle Scholar
- Rajewsky N, Socci ND: Computational identification of microRNA targets. Dev Biol 2004, 267: 529–535. 10.1016/j.ydbio.2003.12.003View ArticlePubMedGoogle Scholar
- Rehmsmeier M, Steffen P, Hochsmann M, Giegerich R: Fast and effective prediction of microRNA/target duplexes. Rna 2004, 10: 1507–1517. 10.1261/rna.5248604PubMed CentralView ArticlePubMedGoogle Scholar
- Rodriguez A, Griffiths-Jones S, Ashurst JL, Bradley A: Identification of mammalian microRNA host genes and transcription units. Genome Res 2004, 14: 1902–1910. 10.1101/gr.2722704PubMed CentralView ArticlePubMedGoogle Scholar
- Krek A, Grun D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M, Rajewsky N: Combinatorial microRNA target predictions. Nat Genet 2005, 37: 495–500. 10.1038/ng1536View ArticlePubMedGoogle Scholar
- Boser BE, Guyon IM, Vapnik V: A training algorithm for optimal margin classifiers: ; Pittsburgh. ; 1992.View ArticleGoogle Scholar
- Vapnik V: Statistical Learning Theory., Wiley; 1998.Google Scholar
- Bennett KP, Mangasarian OL: Robust Linear Programming Discrimination Of Two Linearly Inseparable Sets. Optimization Methods adn Software 1992, 1: 23–24.View ArticleGoogle Scholar
- Joachims T: Making large-scale support vector machine learning practical. In Advances in Kernel Methods: Support Vector Machines. Cambridge, MA., MIT Press; 1998:169–184.Google Scholar
- Hofacker IL: Vienna RNA secondary structure server. Nucleic Acids Res 2003, 31: 3429–3431. 10.1093/nar/gkg599PubMed CentralView ArticlePubMedGoogle Scholar
- Doench JG, Sharp PA: Specificity of microRNA target selection in translational repression. Genes Dev 2004, 18: 504–511. 10.1101/gad.1184404PubMed CentralView ArticlePubMedGoogle Scholar
- Babak T, Zhang W, Morris Q, Blencowe BJ, Hughes TR: Probing microRNAs with microarrays: tissue specificity and functional inference. Rna 2004, 10: 1813–1819. 10.1261/rna.7119904PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang BT, Yang J, Chi SW: Self-Organizing Latent Lattice Models for Temporal Gene Expression Profiling. Machine Learn 2003, 52: 67–89. 10.1023/A:1023993325417View ArticleGoogle Scholar
- Lim LP, Lau NC, Garrett-Engele P, Grimson A, Schelter JM, Castle J, Bartel DP, Linsley PS, Johnson JM: Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 2005, 433: 769–773. 10.1038/nature03315View ArticlePubMedGoogle Scholar
- Baskerville S, Bartel DP: Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. Rna 2005, 11: 241–247. 10.1261/rna.7240905PubMed CentralView ArticlePubMedGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25: 25–29. 10.1038/75556PubMed CentralView ArticlePubMedGoogle Scholar
- Yoon S, De Micheli G: Prediction of regulatory modules comprising microRNAs and target genes. Bioinformatics 2005, 21 Suppl 2: ii93-ii100. 10.1093/bioinformatics/bti1116PubMedGoogle Scholar
- Beissbarth T, Speed TP: GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics 2004, 20: 1464–1465. 10.1093/bioinformatics/bth088View ArticlePubMedGoogle Scholar
- Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci U S A 2003, 100: 9440–9445. 10.1073/pnas.1530509100PubMed CentralView ArticlePubMedGoogle Scholar
- Saetrom O, Snove OJ, Saetrom P: Weighted sequence motifs as an improved seeding step in microRNA target prediction algorithms. Rna 2005.Google Scholar
- Witten IH, Frank E: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Edited by: Gray J. San Francisco, Morgan Kaufmann; 1999.Google Scholar
- Lai EC, Tam B, Rubin GM: Pervasive regulation of Drosophila Notch target genes by GY-box-, Brd-box-, and K-box-class microRNAs. Genes Dev 2005, 19: 1067–1080. 10.1101/gad.1291905PubMed CentralView ArticlePubMedGoogle Scholar
- Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES, Kellis M: Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 2005, 434: 338–345. 10.1038/nature03441PubMed CentralView ArticlePubMedGoogle Scholar
- Johnston RJ, Hobert O: A microRNA controlling left/right neuronal asymmetry in Caenorhabditis elegans. Nature 2003, 426: 845–849. 10.1038/nature02255View ArticlePubMedGoogle Scholar
- Nelson PT, Hatzigeorgiou AG, Mourelatos Z: miRNP:mRNA association in polyribosomes in a human neuronal cell line. Rna 2004, 10: 387–394. 10.1261/rna.5181104PubMed CentralView ArticlePubMedGoogle Scholar
- Yekta S, Shih IH, Bartel DP: MicroRNA-directed cleavage of HOXB8 mRNA. Science 2004, 304: 594–596. 10.1126/science.1097434View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.