Demonstration of two novel methods for predicting functional siRNA efficiency
© Jia et al; licensee BioMed Central Ltd. 2006
Received: 10 December 2005
Accepted: 29 May 2006
Published: 29 May 2006
siRNAs are small RNAs that serve as sequence determinants during the gene silencing process called RNA interference (RNAi). It is well know that siRNA efficiency is crucial in the RNAi pathway, and the siRNA efficiency for targeting different sites of a specific gene varies greatly. Therefore, there is high demand for reliable siRNAs prediction tools and for the design methods able to pick up high silencing potential siRNAs.
In this paper, two systems have been established for the prediction of functional siRNAs: (1) a statistical model based on sequence information and (2) a machine learning model based on three features of siRNA sequences, namely binary description, thermodynamic profile and nucleotide composition. Both of the two methods show high performance on the two datasets we have constructed for training the model.
Both of the two methods studied in this paper emphasize the importance of sequence information for the prediction of functional siRNAs. The way of denoting a bio-sequence by binary system in mathematical language might be helpful in other analysis work associated with fixed-length bio-sequence.
RNA interference (RNAi) is a biological mechanism by which double stranded molecules inhibit gene expression by mediating sequence-specific mRNA degradation . The process starts when dsRNA molecules are degraded into short interfering RNA (siRNA) molecules, about 21–23 nucleotides in length, by the RNase enzyme Dicer. These siRNAs are subsequently incorporated into a silencing complex called RISC (RNA-induced silencing complex), which identifies and destroys complementary RNAs . RNAi is an evolutionally conserved mechanism for targeted repression of gene expression that has been developed into an experimental tool for silencing specific genes across systems [1, 2]. Gene silencing efficiency has varied greatly among siRNAs targeting at different positions of a specific gene or at different genes. Therefore, efficient predictive tools are in high demand for siRNA design.
Many efforts have been made trying to develop computational methods which can provide improved prediction of potentially functional siRNA [2–6]. These methods have been based on various parameters including sequence features, energy features, RNA secondary structure features, and so on . Each of these characteristics may affect RNAi efficiency. Scoring algorithms have been widely utilized since statistically significant nucleotide base preferences can easily be applied in the construction of scoring algorithms . Data mining methods have also been employed in siRNA prediction and shown promising performance .
In the field of machine learning, siRNA prediction can be considered as a typical pattern recognition or classification problem. Based on this consideration we developed two kinds of algorithms, both of which achieved high performence. The first one is a sequence based statistical model that has been successfully used in signal peptide prediction . In this paper, we applied this method to siRNA analysis and also obtained satisfactory prediction quality. In addition, we employed Vapnik's Support Vector Machine (SVM) as an alternative solution to this problem . SVM has many attractive characteristics, including over fitting avoidance, large feature spaces handling and key information extracting from a given data set. This approach has provided satisfactory performance for a wide variety of classification problems in bioinformatics areas including microarray data analysis , protein structure classification , signal peptide prediction  and protein subcellular localization identification  and so on. SVM has already been applied to predict the efficacy of short oligonucleotides in antisense and siRNAs by Satron and got good results [8, 15]. Unlike above methods, we use the SVM algorithm in a novel way by introducing a binary system to denote sequences of fixed-length. Besides the binary system, thermodynamic profile and nucleotide composition were also introduced to construct the vector space of SVM.
As we have known, to objectively assess a prediction method, a homogeneous and sufficiently large dataset is of high importance. It should also be very careful to combine datasets from different resources, because the efficiency of a siRNA changes variously under different biological and experimental conditions. Fortunately, the recently published Dieter's dataset from a high-throughput assay makes it possible to break the bottleneck of directly comparing different source datasets . Besides, in order to compare with the former work, we also use the dataset from Satron, which combines published siRNAs from several researches and has been used in Satron's research . To guarantee the unification of the dataset and to make the training processes of our algorithms objectively, we trained those two datasets separately and took the results from Dieter's dataset as the main evaluation measurement. Moreover, the Satron's dataset provides siRNAs with 19 nt in length while the Dieter's dataset with 21 nt in length, which also makes it impossible to train them together.
Three cut-off values have been used to generate positive and negative subsets from the Dieter's and Satron's datasets respectively. Six columns indicating six combinations of positive and negative subsets are listed with the number of siRNAs in each subset.
Number of siRNAs in the positive dataset
Number of siRNAs in the negative dataset
The self-consistency and jackknife results for the sequence-based method trained by the six combinations listed in table 1.
The jackknife results for the method of support vector machine trained by the six combinations listed in table 1. Three attributes have been defined, namely binary system (denoted by "A" in the table), thermodynamic profile ("B" in the table) and composition ("C" in the table). Seven combinations of the attributes are put forward, which are A+B+C (means "binary, thermodynamic and composition"), A+B (means "binary and thermodynamic"), B+C (means "thermodynamic and composition"), A+C (means "thermodynamic and composition"), A (means "binary only"), B (means "thermodynamic only") and C (means "composition"). The self-consistency and jackknife test are executed in all the seven vector space respectively to compare the contribution from each of the three attributes. To save space, here we just listed the results of jackknife test. This table lists results of Dieter's dataset. See table 4 for Satron's dataset. Self-consistency results have been placed in the supplemental file (see additional file 3) Dieter's dataset, jackknife test:
Everything is the same with table 3 except that the dataset is from Satron's work. Satron's Dataset, Jackknife test:
Those results in table 2, 3 and 4 clearly show that both of the methods got high performance on Dieter's dataset. During the process of jackknife test, as for the sequence-based statistical model, the accuracy on Dieter's dataset are 89.35%, 89.47% and 89.43% for cut-off value 0.7, 0.6 and 0.5, while on Satron's dataset are 70.94%, 70.59%, 70.23% for cut-off value 0.7, 0.6 and 0.5 (table 2). Also the process of jackknife test, as for SVM, when all of the three attributes are used, the accuracy on Dieter's dataset achieved 94.78%, 94.65% and 94.65% for cut-off value 0.7, 0.6 and 0.5, respectively (table 3), while on Satron's dataset the accuracy are 78.07%, 71.66% and 72.55% for cut-off value 0.7, 0.6 and 0.5 (table 4). Obviously, the method of SVM performed better than the sequence-based statistical model, and both of the two methods performed better on Dieter's dataset than Satron's dataset. For the three kinds of attributes, namely binary, thermodynamic and composition, the highest accuracy achieved when only the binary attribute is used for SVM training processes against Dieter's dataset during the jackknife test with any of the cut-off value. For Satron's dataset, the highest accuracy appeared when "binary and composition", or "thermodynamic and composition", or "binary, composition and thermodynamic" are used during the jackknife process.
From table 3 and table 4 we can also see that the values of sensitivity and specificity differ from each other greatly during SVM training processes tested on Satron's dataset with cut-off value of 0.5, 0.6 or 0.7 and on Dieter's dataset with cut-off value of 0.5 or 0.7. For example, for Satron's dataset when the cut-off value 0.7 was used, we got a positive subset containing 141 siRNAs and a negative subset containing 420 siRNAs. We use these datasets as the input for jackknife test by SVM and got the sensitivity of 21.99% and the specificity of 96.09%. For the six combinations listed in table 1, only the one generated by Dieter's dataset using cut-off value of 0.6 has no this problem.
Compare with Dieter's results. Using the same training and testing dataset, both of the two methods have been applied to compute the pearson correlation coefficient. Also, cut off value should be specified as 0.5, 0.6 or 0.7. Here we just showed the result when cut off value is 0.6. The results when cut-off value is 0.5 or 0.7 are detailed in supplemental file (see additional file 4), which also shows the accuracy, sensitivity, specificity and ROC for the two methods. For more info about Dieter's work or the explanation about the datasets used by them, please consult . SVM, cut-off 0.6
Random all (1091)
Random all (727)
Random all (545)
Random all (218)
SeqSta, cut-off 0.6
Comparing the three attributes in the vector space for SVM training
During the training processes executed by SVM, we constructed three kinds of attributes which are the binary representation, the thermodynamic profile and the nucleotide composition of the sequence. All the seven combinations of the three attributes have been chosen as the input of SVM training machine to find their contributions and all the results are listed in supplemental file (see additional file 3). From the results in table 3 and table 4, we can definitely come to the conclusion that the binary representation system plays the most important role. For both of the datasets, the accuracy of the prediction will be improved greatly whenever the binary system has been added. Take Dieter's dataset for example, the four attribute combinations, all of which contain the binary system, have their accuracy higher than 90%, with about 10~25% improvement comparing with that of the rest three combinations. Neither the thermodynamic profile nor the nucleotide composition can provide such an obvious enhancement. We refer this phenomenon to the fact that the binary representation, though not indicate any biological or chemistry property of the sequence, might carry the sequence speciality such as sequence order, base preferences at certain sites, etc. Previous studies have proved that effective siRNAs show base preferences at positions 3, 10, 13 and 19 of the sense strand . Other sequence characteristics have been also noted by Kumiko Ui-Tei et al . The high correlation coefficients in our research emphasized the fact that the sequence of a potential siRNA oligo is intimately correlated with its function.
Actually, what we discussed here is ubiquitous in the field of bio-sequence based function prediction – the problem is how to describe a bio-sequence in a suitable mathematical language. The binary system performs well in this problem of siRNA prediction. It also works well in other machine learning areas in bioinformatics, such as prediction work about protein signal sequences and their cleavage sites . This kind of binary system can be used to represent qualitative concepts such as season, blood group, etc. Generally speaking, when a variable has n types, the binary system use n-dimension vector to denote each one of the n type, with the value of the i th dimension equals 1 and all other dimensions equals 0 for the i th type. We suggest this binary model might be used in other sequence based prediction works.
The nucleotide composition, including single nucleotide and di-nucleotide compositions, indicating sequence profile at a certain level thus also has its special role in training process. Also take Dieter's dataset for example, when the input attributes are "Thermodynamic and Composition", "Thermodynamic only" or "Composition only", the accuracy are 85.97%, 78.69% or 81.65% respectively for cut-off value as 0.7, 83.83%, 76.31% or 80.09% for 0.6, and 83.67%, 77.46% or 79.47% for 0.5 (see table 3). Briefly speaking, the attribute of composition profile provides about 6~7% enhancement for the prediction, without considering the possibly weakening or enhancing interaction between two attributes. However, it should be noted that composition profile is not sufficient enough since two oligos having the same nucleotide composition might differ greatly in sequence order. Thus we proposed to use this attribute together with other attributes to provide enough information for mapping a siRNA oligo onto the vector space.
As has been proved by many experimental researches, the thermodynamic profile of a siRNA plays important role in the RNA interference mechanism . That is why we take the thermodynamic profile as the input of SVM machine. The results from our work are consistent with the previous work. When the "thermodynamic only" attribute was provided as the input for SVM training processes, the accuracy during jackknife test achieved 74.87%, 68.27% and 63.99% for Straon's dataset with cut-off value as 0.7, 0.6 and 0.5, while the corresponding value are 78.69%, 76.31% and 77.46% for Dieter's dataset. This sufficiently showed the importance of the thermodynamic character during the RNA interference process. However, considering all the seven combinations of the three attributes, we suggest it is better to put them together as the input of SVM machine.
To avoid redundancy between the three attributes, we calculated the Pearson correlation coefficiency. For Dieter's dataset, the correlation coefficiency between the nucleotide composition and the thermodynamic profile is 1.381E-4, the nucleotide composition and binary system is 1.132E-5, and the thermodynamic profile and binary system is 3.092E-4. The low correlation between these attributes indicates that it is proper to combine them together for prediction.
Balancing the biased dataset in SVM training
Randomly choose a subset from the larger dataset until the subset has the same number of records as the smaller dataset;
Repeat step 1 for ten times to construct ten combinations of this "sub-larger dataset + whole smaller dataset". Make sure that these combinations cover at least 99% of the larger dataset.
Training the ten combinations by SVM in the seven vector spaces one by one.
Take the average result of the ten combinations as the over all result.
Take Satron's dataset for example, when cut-off value of 0.7 is used, the positive dataset has 141 records while the negative has 420 records. We randomly choose 141 records from the negative dataset for ten times to construct ten subsets. Each of these ten subsets will be trained with the whole positive dataset by SVM in the seven vector spaces. The work of randomly chosen is executed by JAVA program. Only the case when the dataset is Dieter's and cut-off value is 0.6 did not need this randomly chosen scheme. The randomly chosen data of the other five subsets has been supplied in the supplemental material (see additional file 1).
By randomly choosing 10 subsets from each of the 5 datasets whose positive subset departs from the negative one greatly, we try to alleviate the bias between the value of sensitivity and specificity. Here we show the average results tested on Satron's dataset with cut-off value as 0.6. All the sub-datasets and results have been supplied in the additional file (see additional file 1 and additional file 3).
68.43 ± 1.45%
67.13 ± 1.95%
65.37 ± 2.19%
68.09 ± 2.14%
68.76 ± 2.59%
61.55 ± 2.56%
63.90 ± 2.52%
69.05 ± 1.20%
69.78 ± 2.98%
64.33 ± 3.41%
67.25 ± 3.19%
69.72 ± 3.63%
63.93 ± 2.96%
58.03 ± 3.63%
67.81 ± 3.07%
64.49 ± 2.21%
66.4 ± 3.36%
68.93 ± 2.53%
67.81 ± 2.68%
59.16 ± 3.63%
69.78 ± 4.32%
0.4683 ± 0.03219
0.4618 ± 0.03205
0.4112 ± 0.03306
0.4447 ± 0.03097
0.4718 ± 0.03558
0.3758 ± 0.02486
0.3196 ± 0.02920
0.7452 ± 0.02383
0.7255 ± 0.02886
0.7093 ± 0.02450
0.7367 ± 0.02454
0.7411 ± 0.02933
0.6768 ± 0.02220
0.6715 ± 0.02659
The methods are robust for different cut-off values
The cut-off value for siRNA inhibitory activity might be various according to the requests from different experimenter or different experimental intention. Thus we applied three kinds of cut-off values to construct the positive and negative dataset for separate training. For the method of sequence-based statistical model, the prediction results various little under different cases of cut-off values. The accuracy of the sequence-based statistical model is 70.94%, 70.59% and 70.23% for Satron's dataset by cut-off value of 0.7, 0.6 and 0.5 while for Dieter's dataset the corresponding accuracy is 89.35%, 89.47% and 89.43% (table 2). For the method of SVM executed in the space of "binary, thermodynamic and composition", the accuracy in jackknife test on Dieter's dataset is 94.78%, 94.65% and 94.65% for cut-off value of 0.7, 0.6 and 0.5, while the corresponding value on Dieter's dataset is 78.07%, 71.66% and 72.55%, respectively. The little differences from these results show that the three cut-off values affect little on the performance of the two methods we presented in this paper.
SVM performs better than the sequence-based statistical model
From the results listed in table 2, 3 and 4, we can also see that the SVM model trained in the vector space of "binary, thermodynamic and composition" performs better than the sequence-based model, without discounting the latter (see additional file 2 and 3). From table 5, the high correlation coefficient serves as a strong demonstration of the utility of the sequence-based model and the ability of providing high accuracy than the artificial neural network constructed by Dieter et al .
We applied the sequence-based statistical model and support vector machine to the identification of functional siRNA. We constructed three kinds of attribute, namely the binary representation, the thermodynamic profile and the nucleotide composition of a sequence to build the vector space of SVM training machine. Both of the two methods achieved high performance and showed their potential ability to predict efficient siRNAs. We also put forward a procedure to reduce high false positive or false negative values in the situation when the number of records differs greatly between the positive dataset and the negative dataset.
We chose two datasets for the training of the prediction system. One is Satron's dataset which has been collected from several published experimental reports . The original dataset contains 581 records related with 40 target genes. 8 of the 581 siRNA oligo targeting at least two different genes are deleted from the training dataset. Another four records containing mismatched nucleotides are also deleted. Therefore, the "Satron's dataset" in this research contains 561 records correlated with 40 target genes. Another dataset originated from a recently published high-throughput screening work conducted by Dieter et al . This dataset contains 2431 siRNAs covering 34 target genes. Since the high-throughput dataset is sufficiently large and homogeneous for training, we took the Dieter's dataset as the main assessing dataset. Three cut-off values, 0.5, 0.6 and 0.7, are used to separate the whole dataset into a positive one and a negative one (see additional file 1).
The sequence-based model 
A siRNA oligo nucleotide with 19 or 21 bases can be present as: R1R2R3...R i ...R19 or R1R2R3...R i ...R19R20R21, where R i is the nucleotide in the i th site. We will take the 19-nt oligos for convenience. All the formulas below will be applicable to the 21-nt oligos. In our research, we suppose that the nucleotide at each site can be treated as an independent element, such that there is no coupling among these sub-sites, then the attribute of the functional and non-functional siRNAs can be formulated, respectively, as:
where (R i ) is the probability of nucleotide R i at the sub-site i (i = 1,2,...,19) for the functional siRNAs, while (R i ) the corresponding probability for the non-functional siRNAs. The both values of (R i ) and (R i ) can be derived from a positive and negative training datasets, respectively. The superscript "+" or "-" of ψ indicates the attribute quality of the dataset as positive or negative, respectively. The subscript 0 of ψ indicates that the attribute function is formed by independent probabilities in which there was no coupling effect among subsites. Based on this approach, prediction can be performed: if ψ+ > ψ-, then the target siRNA is deemed to be a functional siRNA; if ψ+ <ψ-, then a non-functional one. We constructed a discriminant function given by:
Δ(R1...R5R6R7R8R9...R19) = ω+(R1...R5R6R7R8R9...R19) - ω-(R1...R5R6R7R8R9...R19) (eq2)
where ω+ and ω- are the weight factors for the attribute functions derived from the positive and negative training dataset, respectively. Except in special cases, these values are generally set to 1. (ω+ = ω- = 1). Thus, the criterion of a functional siRNA prediction for a given RNA oligo can be formulated as follows: The RNA oligo is functional if Δ > 0, and it is non-functional if Δ ≤ 0.
Generally speaking, if the coupling effects of the μ (μ = 1,2,3...) of the closet neighboring nucleotide need to be considered, then eq. 1 should be modified according to the μ th-order Markov chain theory and the attribute function ψ0 should be replaced by ψ μ . In this research, we do not yet consider the coupling effects.
Support Vector Machine
The support vector machine has been widely used in many areas for its attractive features. It is a kind of machine learning methodology based on statistical learning theory. Briefly speaking, the SVM defines a feature space (often with a higher dimension) and map the input vectors into this space. In the feature space, SVM tries to classify the data points into two classes by seeking an optimized classifier called hyperplane. It has also been applied in the field of multiple classification problems recently. In this paper, we use it to classify functional siRNAs from non-functional siRNAs. To learn more about the mechanism of SVM, readers can consult a series of previous studies for further explanation on the procedure of how to use the support vector machine [10, 12–14, 19, 20].
To construct the vector space, we introduced three kinds of attributes, representing three aspects of a siRNA oligo. The first one is the binary system by which the 5 nucleotides of a s, t s, u s, c s and g s are coded by 4-D vectors composed of only 0 and 1 (adenine = 0001, uracil or thymine = 0010, cytosine = 0100, guanine = 1000). Then a 19-nt siRNA is represented by a 76-dimension vector in the SVM input or a 21-nt siRNA can be represented by an 84-dimension vector. The second attribute describes the thermodynamic profile of the siRNAs. The nearest neighbour model was introduced to calculate the pentamer sub-sequences of the siRNA oligos under examination to reflect its internal stability values . For the terminal four bases of the 3'end of the oligos, the targeting sequences would be extended for calculation. If extension is impossible (for example when the targeting sites on the targeted gene is before the third base of the gene), the thermodynamic value of the terminal four bases are set to be zero. The third attribute defines the composition features of siRNA sequences including 4 single nucleotide and 16 di-nucleotide compositions.
The parameters for the training process by SVM were set to be default values. The software used to implement SVM was SVM_light by Joachims . The training process was carried out on the computer we used (Dell OptiPlex GX270 computer with an Intel Pentium4 2.80 GHz CPU).
To objectively assess our prediction system, we employed the following measurements for sensitivity, specificity, accuracy and receiver operating characteristics (ROC):
TP, true positive; TN: true negative; FP, false positive; FN, false negative.
We would like to express our gratitude to Lin Li for valuable suggestions. We thank Pål Saatrom for supplying the dataset. We also thank Xiaojing Yu, Changzheng Dong, Guohui Ding and Wu Wei for valuable discussions during the study. The State Key Program of Basic Research of China grants 001CB510209, 2002CB713807 and 2003CB715901, and National Natural Science Foundation of China grant 90408010 supported this project.
- Mello CC, Jr DC: Revealing the world of RNA interference. Nature 2004, 431: 338–341. 10.1038/nature02872View ArticlePubMedGoogle Scholar
- Hannon GJ, Rossi JJ: Unlocking the potential of the human genome with RNA interference. Nature 2004, 431: 371–378. 10.1038/nature02870View ArticlePubMedGoogle Scholar
- Wang L, Mu FY: A Web-based design center for vector-based siRNA and siRNA cassette. Bioinformatics 2004, 20(11):1818–1820. 10.1093/bioinformatics/bth164View ArticlePubMedGoogle Scholar
- Reynolds A, Leake D, Boese Q, Scaringe S, Marshall WS, khvorova A: Rational siRNA design for RNA interference. Nature Biotechnology 2004, 22: 326–330. 10.1038/nbt936View ArticlePubMedGoogle Scholar
- Ui-Tei K, Naito Y, Takahashi F, Haraguchi T, Ohki-Hamazaki H, Juni A, Ueda R, Saigo K: Guidelines for the selection of highly effective siRNA sequences for mammalian and chick RNA interference. Nucl Acids Res 2004, 32(3):936–948. 10.1093/nar/gkh247PubMed CentralView ArticlePubMedGoogle Scholar
- Henschel A, Buchholz F, Habermann B: DEQOR: a web-based tool for the design and quality control of siRNAs. Nucl Acids Res 2004, 32(suppl_2):W113–120.PubMed CentralView ArticlePubMedGoogle Scholar
- Satrom P, Snove JO: A comparison of siRNA efficacy predictors. Biochemical and Biophysical Research Communications 2004, 321(1):247–253. 10.1016/j.bbrc.2004.06.116View ArticleGoogle Scholar
- Saetrom P: Predicting the efficacy of short oligonucleotides in antisense and RNAi experiments with boosted genetic programming. Bioinformatics 2004, 20(17):3055–3063. 10.1093/bioinformatics/bth364View ArticlePubMedGoogle Scholar
- Chou KC: Using subsite coupling to predict signal peptides. Protein Engineering 2001, 14(2):75–79. 10.1093/protein/14.2.75View ArticlePubMedGoogle Scholar
- Vapnik: The Nature of Statistical Learning Theory. In Springer. New York ; 1995.Google Scholar
- Brown MPS, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M, Jr., Haussler D: Knowledge-based analysis of microarray gene expression data by using support vector machines. PNAS 2000, 97(1):262–267. 10.1073/pnas.97.1.262PubMed CentralView ArticlePubMedGoogle Scholar
- Cai YD, Liu XJ, Xu X, Zhou GP: Support Vector Machines for predicting protein structural class. BMC Bioinformatics 2001, 2(1):3. 10.1186/1471-2105-2-3PubMed CentralView ArticlePubMedGoogle Scholar
- Cai YD, Lin S, Chou KC: Support vector machines for prediction of protein signal sequences and their cleavage sites. Peptides 2003, 24(1):159–161. 10.1016/S0196-9781(02)00289-9View ArticlePubMedGoogle Scholar
- Cai YD, Liu XJ, Xu XB, Chou KC: Support Vector Machines for Prediction of Protein Subcellular Location. Molecular Cell Biology Research Communication 2000, 4: 230–233. 10.1006/mcbr.2001.0285View ArticleGoogle Scholar
- Huesken D, Lange J, Mickanin C, Weiler J, Asselbergs F, Warner J, Meloon B, Engel S, Rosenberg A, Cohen D, Labow M, Reinhardt M, Natt F, Hall J: Design of a genome-wide siRNA library using an artificial neural network. Nat Biotech 2005, 23(8):995–1001. 10.1038/nbt1118View ArticleGoogle Scholar
- Khvorova A, Reynolds A, Jayasena SD: Functional siRNAs and miRNAs Exhibit Strand Bias. Cell 2003, 115(2):209–216. 10.1016/S0092-8674(03)00801-8View ArticlePubMedGoogle Scholar
- Chou KC: Predicting of Protein Signal Sequences and Their Cleavage Sites. PROTEINS: Structure, Function, and Genetics 2001, 42: 136–139. Publisher Full Text 10.1002/1097-0134(20010101)42:1%3C;136::AID-PROT130%3E;3.0.CO;2-FView ArticleGoogle Scholar
- Cai YD, Liu XJ, Xu X, Chou KC: Support vector machines for predicting HIV protease cleavage sites in protein. Journal of Computational Chemistry 2001, 23(2):267–274. 10.1002/jcc.10017View ArticleGoogle Scholar
- Hua S, Sun Z: Support vector machine approach for protein subcellular localization prediction. Bioinformatics 2001, 17(8):721–728. 10.1093/bioinformatics/17.8.721View ArticlePubMedGoogle Scholar
- M.FREIER SUSAN, KIERZEK RYSZARD, A.JAEGER JOHN, SUGIMOTO NAOKI, H.CARUTHERS MARVIN, NEILSON THOMAS, H.TURNER DOUGLAS: Improved free-eneregy parameters for predictions of RNA duplex stability. PNAS 1986, 83: 9373–9377.View ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.