 Methodology Article
 Open Access
A semi–supervised tensor regression model for siRNA efficacy prediction
 Bui Ngoc Thang^{1, 2},
 Tu Bao Ho^{1, 3} and
 Tatsuo Kanda^{4}
https://doi.org/10.1186/s12859-015-0495-2
© Thang et al.; licensee BioMed Central. 2015
 Received: 21 August 2014
 Accepted: 10 February 2015
 Published: 13 March 2015
Abstract
Background
Short interfering RNAs (siRNAs) can knock down target genes and thus have an immense impact on biological and pharmaceutical research. The key question in siRNA research, namely which siRNAs have high knockdown ability, remains challenging, as current results still fall short of expectations.
Results
This work aims to develop a generic framework to enhance siRNA knockdown efficacy prediction. The key idea is first to enrich siRNA sequences by incorporating rules found for designing effective siRNAs and representing them as enriched matrices, and then to employ bilinear tensor regression to predict the knockdown efficacy of those matrices. Experiments show that the proposed method achieves better results than existing models in most cases.
Conclusions
Our model not only provides a suitable siRNA representation but also predicts siRNA efficacy more accurately and stably than most state–of–the–art models. Source code is freely available on the web at: http://www.jaist.ac.jp/~bao/BiLTR/.
Keywords
 RNAi
 siRNA
 siRNA design rule
 Tensor
 Bilinear tensor regression
 Semi–supervised learning
Background
RNA interference (RNAi) is a cellular process in which RNA molecules inhibit gene expression, typically by causing the destruction of mRNA molecules. Long double-stranded RNA duplexes or hairpin precursors are cleaved into short interfering RNAs (siRNAs) by the ribonuclease III enzyme Dicer. The siRNAs are sequences of 19–23 nucleotides (nt) in length with 2 nt overhangs at the 3′ ends. Guided by the RNA-induced silencing complex (RISC), siRNAs bind to their complementary target mRNAs and induce their degradation.
In 2006, Fire and Mello received the Nobel Prize for their contributions to research on RNA interference (RNAi). Their work and that of others on the discovery of RNAi have had an immense impact on biomedical research and will most likely lead to novel medical applications [1–6]. In RNAi research, highly effective siRNAs can be synthesized to design novel drugs for viral-mediated diseases such as influenza A virus, HIV, hepatitis B virus and RSV infections, as well as for cancer and other diseases. As a result, siRNA silencing is considered one of the most promising techniques for future therapy, and predicting inhibition efficiency is crucial for proper siRNA selection. Finding the most effective siRNAs therefore constitutes a major challenge for researchers [7–14]. Numerous algorithms have been developed to design and predict effective siRNAs. These algorithms can be divided into the following two generations [15–17].
The first generation consists of siRNA design rule–based tools that were developed through the analysis of small datasets. Various siRNA design rules have been found empirically since 1998. The first rational siRNA design rule was reported by Elbashir et al. [18], who suggested that siRNAs of 19–21 nt in length with 2 nt overhangs at the 3′ ends can efficiently silence mRNAs. Scherer et al. [19] reported that the thermodynamic properties needed to target specific mRNAs are important characteristics. Soon after these studies, many rational design rules for effective siRNAs were proposed [20–26]. For example, Reynolds et al. [22] systematically analyzed 180 siRNAs targeting every other position of two 197-base regions of luciferase and human cyclophilin B mRNA (90 siRNAs per gene), and found the following eight criteria for improving siRNA selection: (i) G/C content of 30–52%, (ii) at least 3 As or Us at positions 15–19, (iii) absence of internal repeats, (iv) an A at position 19, (v) an A at position 3, (vi) a U at position 10, (vii) a base other than G or C at position 19, (viii) a base other than G at position 13.
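As a minimal illustration of how such a rule can be applied, the sketch below checks the Reynolds criteria that depend on the sequence alone. It assumes a 19-nt sense strand with 1-based positions as in the text, omits criterion (iii) because detecting internal repeats requires more than a position lookup, and simply counts satisfied criteria; the function names `reynolds_criteria` and `reynolds_count` are hypothetical, and the count is not the scoring scheme of the original rule.

```python
def reynolds_criteria(sense: str) -> dict:
    """Evaluate the sequence-only Reynolds criteria on a 19-nt sense strand.

    Criterion (iii), absence of internal repeats, is deliberately omitted.
    """
    s = sense.upper().replace("T", "U")  # accept DNA-style input as well
    if len(s) != 19 or set(s) - set("ACGU"):
        raise ValueError("expected a 19-nt sequence over A, C, G, U")
    gc_fraction = sum(base in "GC" for base in s) / len(s)
    return {
        "gc_30_to_52_percent": 0.30 <= gc_fraction <= 0.52,      # (i)
        "three_AU_at_15_19": sum(b in "AU" for b in s[14:19]) >= 3,  # (ii)
        "A_at_19": s[18] == "A",                                  # (iv)
        "A_at_3": s[2] == "A",                                    # (v)
        "U_at_10": s[9] == "U",                                   # (vi)
        "not_G_or_C_at_19": s[18] not in "GC",                    # (vii)
        "not_G_at_13": s[12] != "G",                              # (viii)
    }


def reynolds_count(sense: str) -> int:
    """Number of satisfied criteria (an illustrative summary, not a published score)."""
    return sum(reynolds_criteria(sense).values())
```

A sequence can then be screened with `reynolds_count("GCAGCGCAGUCAAGAUUAA")`, which reports how many of the seven checked criteria hold.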
However, the performance of first-generation tools was not satisfactory. About 65% of siRNAs produced by the above-mentioned design rules failed when experimentally tested, that is, they did not reach 90% inhibition, and nearly 20% of them were found to be inactive [27]. One reason is that the previous empirical analyses were based only on small datasets and focused on siRNAs for specific genes. Therefore, each of these rules on its own is insufficient for designing highly effective siRNAs.
The second generation consists of predictive models built with machine learning techniques and trained on larger datasets. Tools based on these models are more accurate and reliable than those of the first generation [28]. In particular, Huesken and colleagues [29] developed a new algorithm, Biopredsi, by applying artificial neural networks to a dataset consisting of 2431 scored siRNAs (i.e., siRNAs whose knockdown efficacy (score) was experimentally observed). This dataset has been widely used to train and test other predictive models such as ThermoComposition21 [28], DSIR [7], i–Score [15] and Scales [30]. These five models are currently regarded as the best predictors [16,30]. Most notably, Qiu et al. [31] used multiple support vector regression with an RNA string kernel for siRNA efficacy prediction, and Sciabola et al. [17] used three-dimensional structural information of siRNAs to increase the predictive power of their regression model. Alternatively, several works [32,33] applied classification methods to siRNAs that were experimentally labeled in terms of knockdown efficacy.
It is worth noting that most of these methods suffer from some drawbacks: their performance is still low and unstable. This can be attributed to the following reasons. (i) siRNA datasets are heterogeneous, being provided by different groups under different protocols in different scenarios [33,34]. Thus the performance of these models decreases considerably and varies when they are tested on independent datasets, as shown by the performance of 18 current models tested on three independent datasets [17]. (ii) The performance of machine learning methods also depends heavily on the choice of data representation (or features) to which they are applied. In previous models, siRNAs were encoded by binary, spectral, tetrahedron, and sequence representations. However, because of the diversity of siRNA distributions and the unsuitable measures based on these representations, they can be inappropriate for representing siRNAs in order to build a good model for predicting siRNA efficacy.
This work aims to:
 1. Construct a suitable representation of siRNAs, the enriched matrix representation, by incorporating available siRNA design rules and employing both labeled and scored siRNAs.
 2. Develop a more accurate and stable method for predicting siRNA efficacy by building a bilinear tensor regression model. The learning of the transformation matrices and of the model parameters is combined to obtain a more accurate and precise siRNA representation. Labeled siRNAs are used to supervise the learning of the parameters.
 3. Quantitatively determine positions on siRNAs where nucleotides can strongly influence the inhibition ability of siRNAs.
 4. Provide guidelines based on positional features for generating highly effective siRNAs.
We developed a bilinear tensor regression predictor, BiLTR, in the C++ programming language using the Xcode environment. BiLTR is experimentally compared with published models on the Huesken dataset and three independent datasets commonly used by the research community. The results show that the performance of the BiLTR predictor is more stable and higher than that of other models.
Results
This section presents an experimental evaluation that compares the proposed bilinear tensor regression model (BiLTR) with the most recently reported methods for siRNA knockdown efficacy prediction on commonly used datasets:

The Huesken dataset of 2431 siRNA sequences targeting 34 human and rodent mRNAs, commonly divided into the training set HU_train of 2182 siRNAs and the testing set HU_test of 249 siRNAs [29].

The Reynolds dataset of 240 siRNAs [22].

The Vicker dataset of 76 siRNA sequences targeting two genes [35].

The Harborth dataset of 44 siRNA sequences targeting one gene [36].
To construct the siRNA representation and learn the BiLTR model, we employed labeled and scored siRNA datasets as well as seven siRNA design rules. The seven design rules used to enrich the representation of siRNAs are the Reynolds, Ui–Tei, Amarzguioui, Jagla, Hsieh, Takasaki and Huesken rules [20–23,29,37,38]. To capture the natural clustering and diversity of siRNAs, and also to supervise the parameter learning process, labeled siRNAs were collected from the siRecords database [27], in which siRNAs are classified into four classes of knockdown efficacy: ‘very high’, ‘high’, ‘medium’, and ‘low’. siRecords is an extensive database of mammalian RNAi experiments with consistent efficacy ratings; it contains records of all kinds of siRNA experiments conducted with various laboratory techniques and experimental settings. In our work, sense siRNAs of 19 nucleotides in length were collected. After removing duplicate siRNAs, the ‘very high’, ‘medium’ and ‘low’ siRNAs were used; to improve the balance between classes while keeping them well separated, the ‘medium’ and ‘low’ siRNAs were merged into one class, denoted ‘low’. As a result, there are 2470 labeled siRNAs in the ‘very high’ class and 2514 labeled siRNAs in the ‘low’ class. Scored siRNAs in the Huesken dataset were also used to learn the BiLTR model.
The fitted tuning parameters of objective function (10) in 10 runs of 10–fold cross validation
λ_{1}  λ_{2}  λ_{3}

0.00995033  0.000119984  1.03
0.00995033  0.000119984  1.02
0.00995033  0.000119993  1.03
0.00995033  0.000119993  1.03
0.0198026  0.000119993  1.03
0.0198026  9.9995e-05  1.03
0.00995033  0.00013999  1.03
0.00995033  0.000179984  1.03
0.00995033  0.000179984  1.03
0.00995033  0.000179984  0.92
After the tuning parameters have been found, the final model, BiLTR, is learned using all of the labeled siRNAs, the siRNA design rules, and the scored siRNA training set.
 1. Comparison of BiLTR with the multiple kernel support vector machine proposed by Qiu and Lane [31]. The authors reported a Pearson correlation coefficient (R) of 0.62, obtained by 10–fold cross validation on the whole Huesken dataset. For BiLTR, R was evaluated by 10 runs of 10–fold cross validation, giving an average value of 0.64 (Table 2). The standard deviation (SD) of the errors between predicted and target values is 0.23 for our model; Qiu and Lane [31] did not report an SD.
Table 2
The R values and standard deviations of models on the whole Huesken dataset and the HU_test dataset
Algorithm  Huesken dataset (2431 siRNAs)  HU_test (249 siRNAs)

Qiu’s method  0.62 (–)  –
BIOPREDsi  –  0.66 (0.216)
Thermocomposition21  –  0.66 (0.216)
DSIR  –  0.67 (0.161)
SVM  –  0.80 (–)
BiLTR  0.64 (0.23)  0.67 (0.164)
 2. Comparison of BiLTR with BIOPREDsi [29], Thermocomposition21 [28], DSIR [7], and SVM [17] when trained on the same scored siRNA dataset, HU_train, and tested on the HU_test dataset. The R values of those four models are 0.66, 0.66, 0.67 and 0.80, respectively. The SD values of the first three models are 0.216, 0.216, and 0.161, respectively; the SD value of the SVM model was not reported. The R value of BiLTR on the HU_test set is 0.67, which equals that of the DSIR model, is slightly higher than that of the first two models, and is lower than that of the SVM model (Table 2). The SD value of the BiLTR model is 0.164, which is similar to that of the DSIR model and lower than that of the first two models. The SVM model thus performs considerably better than BiLTR in Table 2. One reason is a current limitation of BiLTR: it employs only the positional features of available design rules and not other characteristics such as GC content, thermodynamic properties, GC stretch, and 3D information, whereas SVM employs both positional features and 3D information. The 3D information captures the flexibility and strain of siRNAs, which can be important characteristics for the siRNAs of the HU_test set, extracted from human NCI–H1299 and HeLa genes and rodent genes [29]. Therefore, at present the performance of the BiLTR model is similar to that of the BIOPREDsi, Thermocomposition21 and DSIR models but does not reach that of the SVM model [17] on the HU_test set.
 3. Comparison of BiLTR with 18 models, including BIOPREDsi, DSIR and SVM, when all models were trained on the HU_train set and tested on the three independent datasets of Reynolds, Vicker and Harborth, as reported in a recent article [17]. We also computed the SD values of the errors between predicted and experimental values. However, the standard deviations of some models, notably the SVM model, are unavailable because their predicted values were not published. BiLTR achieved R values at least as high as those of all 18 methods on all three independent test datasets, as shown in Table 3 (taken from [17], with the last row added for the BiLTR result). The lower performance of SVM relative to BiLTR in Table 3 suggests that the 3D information added in SVM does not always make it better than BiLTR, especially when the test data are more independent of the Huesken dataset. Moreover, unlike most other models, BiLTR produces stable results across the independent siRNA datasets.
Table 3
The R values and standard deviations of 18 models and BiLTR on three independent datasets
Algorithm  R^{Reynolds} (244 siRNAs/7 genes)  R^{Vicker} (76 siRNAs/2 genes)  R^{Harborth} (44 siRNAs/1 gene)

GPboot [39]  0.55 (–)  0.35 (–)  0.43 (–)
Ui–Tei [23]  0.47 (–)  0.58 (–)  0.31 (–)
Amarzguioui [20]  0.45 (0.30)  0.47 (0.23)  0.34 (0.12)
Hsieh [37]  0.03 (0.31)  0.15 (0.23)  0.17 (0.12)
Takasaki [40]  0.03 (0.3)  0.25 (0.23)  0.01 (0.14)
Reynolds 1 [22]  0.35 (0.3)  0.47 (0.224)  0.23 (0.12)
Reynolds 2 [22]  0.37 (0.291)  0.44 (0.232)  0.23 (0.12)
Schwarz [24]  0.29 (–)  0.35 (–)  0.01 (–)
Khvorova [41]  0.15 (–)  0.19 (–)  0.11 (–)
Stockholm 1 [42]  0.05 (–)  0.18 (–)  0.28 (–)
Stockholm 2 [42]  0.00 (–)  0.15 (–)  0.41 (–)
Tree [42]  0.11 (–)  0.43 (–)  0.06 (–)
Luo [43]  0.33 (–)  0.27 (–)  0.40 (–)
i–Score [15]  0.54 (0.262)  0.58 (0.19)  0.43 (0.12)
BIOPREDsi [29]  0.53 (0.31)  0.57 (0.23)  0.51 (0.12)
DSIR [7]  0.54 (0.26)  0.49 (0.21)  0.51 (0.11)
Katoh [44]  0.40 (0.34)  0.43 (0.23)  0.44 (0.15)
SVM [17]  0.54 (–)  0.52 (–)  0.54 (–)
BiLTR  0.57 (0.25)  0.58 (0.19)  0.57 (0.10)
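The two quantities reported in Tables 2 and 3 can be computed as in the following sketch. The Pearson correlation coefficient R follows its standard definition; for the SD of the errors, the sketch assumes the sample standard deviation (divisor n − 1) of the differences between predicted and observed values, which is an assumption because the exact SD formula is not stated here. The function names are hypothetical.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def error_sd(predicted, observed):
    """Sample standard deviation of the prediction errors (assumed definition)."""
    errors = [p - o for p, o in zip(predicted, observed)]
    mean_e = sum(errors) / len(errors)
    return math.sqrt(sum((e - mean_e) ** 2 for e in errors) / (len(errors) - 1))
```

Both functions take plain lists of predicted and experimentally observed efficacy scores for the same siRNAs, in the same order.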
In these comparative studies, the performance of BiLTR was found to be more stable and higher than that of other models. The first reason is that previous siRNA representations can be unsuitable for representing siRNAs provided by different groups under different protocols. In our method, the representation is enriched by incorporating background knowledge from siRNA design rules and is learned from heterogeneous labeled siRNAs. Because the representation and parameter learning processes are combined, the model can capture the distribution diversity of siRNA data. The second reason is that, by using labeled siRNAs from different distributions to learn the model, BiLTR can predict the knockdown efficacy of siRNAs more accurately.
Discussion
In this section, we discuss three main issues in more detail: the performance of the BiLTR model, the importance of the learned transformation matrices, and the effect of nucleotide design at particular positions on siRNAs.
Concerning the first issue, as presented in the experimental comparative evaluation, BiLTR achieved better results than most other methods in predicting siRNA knockdown efficacy. There are several reasons for this. First, it is expensive to analyze the knockdown efficacy of siRNAs experimentally, so most available datasets are relatively small, which limits the results obtained from them. Second, BiLTR benefits from incorporating domain knowledge (siRNA design rules) found experimentally on different datasets. Third, BiLTR is generic and can easily be extended when new design rules are discovered or when more scored or labeled siRNAs are obtained. As a result, when tested on the three independent datasets generated by different empirical experiments, BiLTR performs better than the four models above. Additionally, some models match the best result of BiLTR on the Vicker dataset (e.g., the i–Score and Ui–Tei models), but none of them reaches the highest result on all three independent datasets simultaneously, as BiLTR does (Table 3).
Characteristics of Reynolds rule
Position  3  10  13  19

Effective  A  U  A/C/U  A/U
Conclusion
In this paper, we have proposed a novel method to predict the knockdown efficacy of siRNA sequences. It uses both labeled and scored datasets as well as available design rules to transform siRNAs into enriched matrices, and then learns a bilinear tensor regression model for prediction. In addition, the model develops an appropriate representation for siRNAs belonging to different distributions, as provided by research groups working under different protocols.
The experimental comparative evaluation on commonly used datasets, with standard evaluation procedures in different contexts, shows that the proposed method achieves better results than most existing methods on the same task. One significant feature of the proposed method is that it can easily be extended when new design rules are discovered or when more siRNAs are analyzed empirically. By analyzing the BiLTR model, we provide guidelines for generating effective siRNAs and detect positions on siRNAs where nucleotides can strongly affect inhibition ability.
Methods

Given: Two sets of labeled and scored siRNAs of length n, and a set of K siRNA design rules.

Find: A function that predicts the knockdown efficacy of given siRNAs.
Method for siRNA knockdown efficacy prediction
1  To encode each siRNA sequence as an encoding matrix X representing the nucleotides A, C, G, and U at n positions in the sequence. Thus, siRNAs are represented as n×4 encoding matrices.
2  To transform the encoding matrices into enriched matrices by K transformation matrices T_{k}, k=1,…,K. Each transformation matrix T_{k} characterizes the knockdown ability of nucleotides A, C, G, and U at n positions in the siRNA sequence with respect to the kth design rule, and thus captures the background knowledge of that rule. The enriched matrices of size K×n are considered as second-order tensors of the siRNA sequences.
3  To build and learn a bilinear tensor regression model. In this step, the K transformation matrices as well as the parameters of the model are learned together from the labeled and scored siRNAs and the available siRNA design rules. The final model is used to predict the efficacy of new siRNAs.
In Step 1, each siRNA sequence of n nucleotides is encoded as a binary encoding matrix of size n×4. The four nucleotides A, C, G, and U are encoded by the vectors (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0) and (0, 0, 0, 1), respectively. If a nucleotide appears at the jth position of a siRNA sequence, j = 1,…,n, its encoding vector forms the jth row of the encoding matrix.
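The encoding of Step 1 can be sketched in a few lines. This is an illustrative Python version rather than the C++ code of BiLTR; the function name `encode_sirna` is hypothetical, and the matrix is held as a plain nested list with one row per position and columns ordered A, C, G, U.

```python
NUCLEOTIDES = "ACGU"  # column order of the encoding matrix


def encode_sirna(sequence: str) -> list:
    """Return the n x 4 binary encoding matrix of an siRNA sequence."""
    seq = sequence.upper()
    unknown = set(seq) - set(NUCLEOTIDES)
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    # Row j holds the encoding vector of the nucleotide at position j + 1.
    return [[1 if base == column else 0 for column in NUCLEOTIDES] for base in seq]
```

For instance, `encode_sirna("AUGCU")` yields five rows, the first being `[1, 0, 0, 0]` for A and the second `[0, 0, 0, 1]` for U.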
An example of incorporating the condition of a design rule at position 19 into a transformation matrix T by designing constraints
Position  Knockdown ability  Nucleotide  Mapping to T  Constraints on T

19  Effective  A, U  T[1,19], T[4,19]  T[3,19]−T[1,19]<0; T[3,19]−T[4,19]<0
19  Ineffective  C  T[2,19]  T[2,19]−T[1,19]<0; T[2,19]−T[3,19]<0; T[2,19]−T[4,19]<0
where g_{m}(T_{k})<0 is a strict inequality constraint on the transformation matrix T_{k} that is generated by the kth siRNA design rule.
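The constraints in the position-19 example follow a simple pattern: every nucleotide that the rule does not mention must score below each effective nucleotide, and every ineffective nucleotide must score below all other nucleotides. The sketch below generates and checks constraints in that pattern. Generalizing the single example to arbitrary rules is an assumption, and the names `rule_constraints` and `satisfies` are hypothetical.

```python
ROW = {"A": 0, "C": 1, "G": 2, "U": 3}  # row index of each nucleotide in T


def rule_constraints(position, effective, ineffective=""):
    """List (position, lower, upper) triples meaning T[lower, pos] - T[upper, pos] < 0."""
    neutral = [b for b in "ACGU" if b not in effective and b not in ineffective]
    pairs = [(u, e) for u in neutral for e in effective]
    pairs += [(c, o) for c in ineffective for o in "ACGU" if o not in ineffective]
    return [(position, lower, upper) for lower, upper in pairs]


def satisfies(T, constraints):
    """Check strict inequalities on a 4 x n matrix T (positions are 1-based)."""
    return all(
        T[ROW[lower]][pos - 1] - T[ROW[upper]][pos - 1] < 0
        for pos, lower, upper in constraints
    )
```

Calling `rule_constraints(19, "AU", "C")` reproduces the five constraints of the example, in the same order as they are listed there.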
where X_{l}[j,.] is the jth row vector of the matrix X_{l}, T[.,j] is the jth column vector of the matrix T, and xy is the inner product of vectors x and y.
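An example of an encoding matrix, a transformation matrix, and the transformed vector (the entries of T selected by X, namely 0.5, 0.1, 0.08, 0.6 and 0.1, form the vector)
Sequence: AUGCU
Encoding matrix X (rows are positions 1 to 5; columns are A, C, G, U): \(X = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\)
Transformation matrix T (rows are A, C, G, U; columns are positions 1 to 5): \(T = \begin{pmatrix} 0.5 & 0.7 & 0.32 & 0.2 & 0.5 \\ 0.3 & 0.1 & 0.6 & 0.6 & 0.3 \\ 0.1 & 0.1 & 0.08 & 0.1 & 0.1 \\ 0.1 & 0.1 & 0 & 0.1 & 0.1 \end{pmatrix}\)
Transformed data vector: x = T∘X = (0.5, 0.1, 0.08, 0.6, 0.1)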
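The AUGCU example can be reproduced directly from the definition that the jth entry of the transformed vector is the inner product of the jth row of X with the jth column of T. The following is a small sketch with the matrices of the example typed in as nested lists; the function name `transform` is hypothetical.

```python
def transform(X, T):
    """Transformed vector: entry j is the inner product of X[j, .] and T[., j]."""
    return [sum(X[j][i] * T[i][j] for i in range(4)) for j in range(len(X))]


# Encoding matrix of AUGCU (rows: positions, columns: A, C, G, U).
X = [
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]

# Transformation matrix of the example (rows: A, C, G, U, columns: positions).
T = [
    [0.5, 0.7, 0.32, 0.2, 0.5],
    [0.3, 0.1, 0.6, 0.6, 0.3],
    [0.1, 0.1, 0.08, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.1, 0.1],
]

x = transform(X, T)  # equals the transformed vector (0.5, 0.1, 0.08, 0.6, 0.1)
```

Because each row of X contains a single 1, the operation simply picks, at every position, the entry of T that belongs to the nucleotide found there.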
In this objective function, the first two components are the sums of similarities of sequence pairs belonging to the same class, and the last one is the sum of similarities of sequence pairs belonging to two different classes; d(x,y) is the similarity measure between x and y (in this work we use the Euclidean distance, i.e., the L_{2} norm); N_{1} and N_{2} are the index sets of the ‘very high’ and ‘low’ labeled siRNAs, respectively.
When labeled siRNAs are collected from heterogeneous sources, these constraints also preserve the stability of the model when the siRNAs to be predicted are generated by different protocols.
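The three pairwise components described above can be computed as in the sketch below, which uses the Euclidean distance between transformed vectors. How the components are signed and weighted in the objective function is not restated in this passage, so the sketch returns them separately instead of combining them; the function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pairwise_terms(vectors, very_high_idx, low_idx):
    """Sums of pairwise distances within each class and between the classes.

    vectors: list of transformed siRNA vectors
    very_high_idx, low_idx: index sets N1 and N2 of the two labeled classes
    """
    within_high = sum(
        euclidean(vectors[p], vectors[q]) for p, q in combinations(very_high_idx, 2)
    )
    within_low = sum(
        euclidean(vectors[p], vectors[q]) for p, q in combinations(low_idx, 2)
    )
    between = sum(
        euclidean(vectors[p], vectors[q]) for p in very_high_idx for q in low_idx
    )
    return within_high, within_low, between
```

Note that the number of pairs grows quadratically with the class sizes, so with several thousand labeled siRNAs per class a practical implementation would likely vectorize or subsample these sums.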
Subject to T_{k}[i,j]≥0, g_{m}(T_{k})<0, i=1,…,4; j=1,…,n; k=1,…,K; m=1,…,M_{k}.

Stationarity: \(\frac{\partial L}{\partial T_{k}[.,j]}=0,\ \frac{\partial L}{\partial \alpha}=0,\ \frac{\partial L}{\partial \beta}=0,\ k=1,\ldots,K;\ j=1,\ldots,n\).

Primal feasibility: T_{k}[i,j]≥0, g_{m}(T_{k})<0, i=1,…,4; j=1,…,n; m=1,…,M_{k}; k=1,…,K.

Dual feasibility: \(\mu _{m}^{(k)}\geq 0, \lambda _{j}\geq 0,\ m=1, \ldots,M_{k}; \ k=1,\ldots,K; \ j=1,\ldots,3\).

Complementary slackness: \(\mu _{m}^{(k)}g_{m}(T_{k})=0,\ m=1, \ldots,M_{k};\ k=1,\ldots,K\).
The learning phase of the proposed bilinear tensor regression model is summarized in Algorithm 1. In this algorithm, the transformation matrices T_{k}, k=1,…,K, and the coefficient vectors α and β are learned together. In particular, siRNA sequences are first represented as encoding matrices. Each transformation matrix T_{k} is initialized so that it satisfies the strict inequality constraints generated by the kth siRNA design rule. The vectors α and β are also initialized. To learn the transformation matrices T_{k}, the elements in each column of these matrices are calculated by equation (13); if they satisfy the strict inequality constraints, that column is updated to the new solution. To learn the coefficients of the proposed model, the vectors α and β are updated by equations (14) and (15). The transformation matrices and the vectors α and β are updated until the convergence criteria are met, where t_{Max} denotes the maximum number of iterations for updating α and β, and ε, ε_{1} and ε_{2} are thresholds for the transformation matrices and the vectors α and β, respectively.
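Once the transformation matrices and coefficient vectors have been learned, prediction amounts to building the K×n enriched matrix and evaluating a bilinear form. The sketch below assumes the generic bilinear form \(\alpha^{\top} R\, \beta\) with α of length K and β of length n and no intercept; this exact form is an assumption made for illustration, and the function names `enrich` and `bilinear_predict` are hypothetical.

```python
def enrich(X, transformations):
    """Enriched K x n matrix: row k is the transformed vector under T_k.

    X: n x 4 encoding matrix; transformations: list of K matrices of size 4 x n.
    """
    n = len(X)
    return [
        [sum(X[j][i] * T[i][j] for i in range(4)) for j in range(n)]
        for T in transformations
    ]


def bilinear_predict(R, alpha, beta):
    """Assumed bilinear form: sum over k and j of alpha[k] * R[k][j] * beta[j]."""
    return sum(
        alpha[k] * R[k][j] * beta[j]
        for k in range(len(alpha))
        for j in range(len(beta))
    )
```

In this form α weights the design rules and β weights the sequence positions, which is one way to read off which positions most strongly influence the predicted efficacy.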
Declarations
Acknowledgements
The authors thank Dr. Le Si Vinh for stimulating discussion, Dr. Phuong Nguyen for critical review of the manuscript.
References
 1. Elbashir SM, Harborth J, Lendeckel W, Yalcin A, Klaus W, Tuschl T. Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells. Nature. 2001; 411:494–8.
 2. Hannon GJ, Rossi JJ. Unlocking the potential of the human genome with RNA interference. Nature. 2004; 431:371–8.
 3. Hutvagner G, McLachlan J, Balint E, Tuschl T, Zamore PD. A cellular function for the RNA interference enzyme Dicer in small temporal RNA maturation. Science. 2001; 293:834–8.
 4. Meister G, Tuschl T. Mechanisms of gene silencing by double-stranded RNA. Nature. 2004; 431:343–9.
 5. Sudarsana LR, Sarojamma V, Ramakrishna V. Future of RNAi in medicine: a review. World J Med Sci. 2007; 2:1–14.
 6. Tuschl T, Zamore PD, Lehmann R, Bartel DP, Sharp PA. Targeted mRNA degradation by double-stranded RNA in vitro. Genes Dev. 1999; 13:3191–7.
 7. Vert JP, Foveau N, Lajaunie C, Vandenbrouck Y. An accurate and interpretable model for siRNA efficacy prediction. BMC Bioinf. 2006; 7:520.
 8. Ui–Tei K. Optimal choice of functional and off–target effect–reduced siRNAs for RNAi therapeutics. Front Genet. 2013; 4:107.
 9. Angart P, Vocelle D, Chan C, Walton SP. Design of siRNA therapeutics from the molecular scale. Pharmaceuticals. 2013; 6:440–68.
 10. Gavrilov K, Saltzman WM. Therapeutic siRNA: principles, challenges, and strategies. Yale J Biol Med. 2012; 85:187–200.
 11. Mutisya D, Selvam C, Lunstad BD, Pallan PS, Haas A, Leake D, et al. Amides are excellent mimics of phosphate internucleoside linkages and are well tolerated in short interfering RNAs. Nucleic Acids Res. 2014; 42(10):6542–51.
 12. Deng Y, Wang CC, Choy KW, Du Q, Chen J, Wang Q, et al. Therapeutic potentials of gene silencing by RNA interference: principles, challenges, and new strategies. Gene. 2014; 538(2):217–27.
 13. Schramm G, Ramey R. siRNA design including secondary structure target site prediction. Nat Methods. 2005; 2(8):1–2. doi:10.1038/nmeth780. (Application Notes).
 14. Hannon GJ, Rossi JJ. Unlocking the potential of the human genome with RNA interference. Nature. 2004; 431:371–8.
 15. Ichihara M, Murakumo Y, Masuda A, Matsuura T, Asai N, Jijiwa M, et al. Thermodynamic instability of siRNA duplex is a prerequisite for dependable prediction of siRNA activities. Nucleic Acids Res. 2007; 35:e123.
 16. Mysara M, Elhefnawi M, Garibaldi JM. MysiRNA: improving siRNA efficacy prediction using a machine-learning model combining multi-tools and whole stacking energy. J Biomed Inform. 2012; 45:528–34.
 17. Sciabola S, Cao Q, Orozco M, Faustino I, Stanton RV. Improved nucleic acid descriptors for siRNA efficacy prediction. Nucleic Acids Res. 2013; 41:1383–94.
 18. Elbashir SM, Lendeckel W, Tuschl T. RNA interference is mediated by 21– and 22–nucleotide RNAs. Genes Dev. 2001; 15:188–200.
 19. Scherer LJ, Rossi JJ. Approaches for the sequence-specific knockdown of mRNA. Nat Biotechnol. 2003; 21:1457–65.
 20. Amarzguioui M, Prydz H. An algorithm for selection of functional siRNA sequences. Biochem Biophys Res Commun. 2004; 316:1050–8.
 21. Jagla B, Aulner N, Kelly PD, Song D, Volchuk A, Zatorski A, et al. Sequence characteristics of functional siRNAs. RNA. 2005; 11:864–72.
 22. Reynolds A, Leake D, Boese Q, Scaringe S, Marshall WS, Khvorova A. Rational siRNA design for RNA interference. Nat Biotechnol. 2004; 22:326–30.
 23. Ui–Tei K, Naito Y, Takahashi F, Haraguchi T, Ohki–Hamazaki H, Juni A, et al. Guidelines for the selection of highly effective siRNA sequences for mammalian and chick RNA interference. Nucleic Acids Res. 2004; 32:936–48.
 24. Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD. Asymmetry in the assembly of the RNAi enzyme complex. Cell. 2003; 115(2):199–208.
 25. Khvorova A, Reynolds A, Jayasena SD. Functional siRNAs and miRNAs exhibit strand bias. Cell. 2003; 115(2):209–16.
 26. Gong W, Ren Y, Xu Q, Wang Y, Lin D, Zhou H, et al. Integrated siRNA design based on surveying of features associated with high RNAi effectiveness. BMC Bioinf. 2006; 7:516.
 27. Ren Y, Gong W, Xu Q, Zheng X, Lin D, Wang Y, et al. siRecords: an extensive database of mammalian siRNAs with efficacy ratings. Bioinformatics. 2006; 22:1027–8.
 28. Shabalina SA, Spiridonov AN, Ogurtsov AY. Computational models with thermodynamic and composition features improve siRNA design. BMC Bioinf. 2006; 7:65.
 29. Huesken D, Lange J, Mickanin C, Weiler J, Asselbergs F, Warner J, et al. Design of a Genome–Wide siRNA Library Using an Artificial Neural Network. Nat Biotechnol. 2005; 23:995–1001.
 30. Matveeva O, Nechipurenko Y, Rossi L, Moore B, Ogurtsov AY, Atkins JF, et al. Comparison of approaches for rational siRNA design leading to a new efficient and transparent method. Access. 2007; 35:1–10.
 31. Qiu S, Lane T. A framework for multiple kernel support vector regression and its applications to siRNA efficacy prediction. IEEE/ACM Trans Comput Biol Bioinform. 2009; 6:190–9.
 32. Chang PC, Pan WJ, Chen CW, Chen YT, Chu YW. A design engine of siRNA that integrates SVMs prediction and feature filters. Biocatal Agric Biotechnol. 2012; 1:129–34.
 33. Klingelhoefer JW, Moutsianas L, Holmes CC. Approximate Bayesian feature selection on a large meta-dataset offers novel insights on factors that effect siRNA potency. Bioinformatics. 2009; 25:1594–601.
 34. Qi L, Han Z, Ruixin Z, Ying X, Zhiwei C. Reconsideration of in silico siRNA design from a perspective of heterogeneous data integration: problems and solutions. Brief Bioinform. 2014; 15:292–305.
 35. Vickers TA, Koo S, Bennett CF, Crooke ST, Dean NM, Baker BF. Efficient reduction of target RNAs by small interfering RNA and RNase H-dependent antisense agents. A comparative analysis. J Biol Chem. 2003; 278:7108–18.
 36. Harborth J, Elbashir SM, Vandenburgh K, Manninga H, Scaringe SA, Weber K, et al. Sequence, chemical, and structural variation of small interfering RNAs and short hairpin RNAs and the effect on mammalian gene silencing. Antisense Nucleic Acid Drug Dev. 2003; 13:83–105.
 37. Hsieh AC, Bo R, Manola J, Vazquez F, Bare O, Khvorova A, et al. A library of siRNA duplexes targeting the phosphoinositide 3-kinase pathway: determinants of gene silencing for use in cell-based screens. Nucleic Acids Res. 2004; 32:893–901.
 38. Takasaki S. Methods for selecting effective siRNA target sequences using a variety of statistical and analytical techniques. Methods Mol Biol. 2013; 942:17–55.
 39. Saetrom P. Predicting the efficacy of short oligonucleotides in antisense and RNAi experiments with boosted genetic programming. Bioinformatics. 2004; 20(17):3055–63.
 40. Takasaki S, Kotani S, Konagaya A. An effective method for selecting siRNA target sequences in mammalian cells. Cell Cycle. 2004; 3(6):790–5.
 41. Khvorova A, Reynolds A, Jayasena SD. Functional siRNAs and miRNAs exhibit strand bias. Cell. 2003; 115:209–16.
 42. Chalk A, Wahlestedt C, Sonnhammer E. Improved and automated prediction of effective siRNA. Biochem Biophys Res Commun. 2004; 319(1):264–74.
 43. Luo K, Chang D. The gene–silencing efficiency of siRNA is strongly dependent on the local structure of mRNA at the targeted region. Biochem Biophys Res Commun. 2004; 318(1):303–10.
 44. Katoh T, Suzuki T. Specific residues at every third position of siRNA shape its efficient RNAi activity. Nucleic Acids Res. 2007; 35:e27.
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.