Integrated siRNA design based on surveying of features associated with high RNAi effectiveness

Background Short interfering RNAs have allowed the development of clean and easily regulated methods for disruption of gene expression. However, while these methods continue to grow in popularity, designing effective siRNA experiments can be challenging. The various existing siRNA design guidelines suffer from two problems: they differ considerably from each other, and they produce high levels of false-positive predictions when tested on data of independent origins. Results Using a distinctly large set of siRNA efficacy data assembled from a vast diversity of origins (the siRecords data, containing records of 3,277 siRNA experiments targeting 1,518 genes, derived from 1,417 independent studies), we conducted extensive analyses of all known features that have been implicated in increasing RNAi effectiveness. A number of features having positive impacts on siRNA efficacy were identified. By performing quantitative analyses on cooperative effects among these features, then applying a disjunctive rule merging (DRM) algorithm, we developed a bundle of siRNA design rule sets with the false positive problem well curbed. A comparison with 15 online siRNA design tools indicated that some of the rule sets we developed surpassed all of these design tools commonly used in siRNA design practice in positive predictive values (PPVs). Conclusion The availability of the large and diverse siRNA dataset from siRecords and the approach we describe in this report have allowed the development of highly effective and generally applicable siRNA design rule sets. Together with ever improving RNAi lab techniques, these design rule sets are expected to make siRNAs a more useful tool for molecular genetics, functional genomics, and drug discovery studies.


Background
Short interfering RNAs (siRNAs) are double-stranded RNAs, typically 19 to 25 nucleotides in length with 2-nucleotide overhangs on the 3' ends, and they are capable of inducing sequence-specific, post-transcriptional deletion of gene products, leading to the silencing of the gene activity. Naturally occurring siRNAs are cleavage products from long double-stranded RNAs (dsRNAs) by Dicer, a ribonuclease III enzyme [1,2]. The siRNA-induced mRNA degradation is a complicated process involving multiple steps, initiated by the binding of siRNA with RISC (RNA induced silencing complex), followed by RISC's activation, resulting in the recognition of the target mRNA and the degradation of the latter [1,3,4]. As a gene knockdown tool used in labs, siRNAs can also be chemically synthesized and introduced into the cells by direct transfection [5,6] or delivered into the cells in the form of hairpin precursors through plasmid or viral vectors [7,8]. The siRNA-based gene knock-down techniques are preferred by many because of their ability to disrupt an individual gene's function without affecting related genes [9]. These techniques are particularly attractive for gene silencing studies in mammalian cells, because, unlike longer double-stranded RNAs, siRNAs are not likely to trigger interferon responses which lead to non-specific mRNA degradation [5].
The efficacy issue represents a major challenge in siRNA design. This issue concerns the question of how to choose, from the large number of candidate siRNAs, the ones that give rise to the highest levels of knock-down activity. It is well known that only a fraction of these candidate siRNAs are highly effective in silencing the target genes. Two siRNAs targeting mRNA sites that are separated by only a few nucleotides could exhibit very different knock-down efficacies [10,11]. What are the properties some siRNAs possess that render them more effective in knocking down the target genes than others? This is an issue of heated debate. Several sets of rules for designing high-efficacy siRNAs have been proposed (e.g., [11][12][13][14]). In addition, a long list of factors has been claimed to influence siRNA knock-down efficacy and thus should be considered in siRNA design [15][16][17][18][19][20][21][22][23][24][25][26].
There are significant disagreements among these design rules and considerable controversies over these claims. This situation has been discussed extensively in several recent review articles [27,28], therefore we only list some examples of these disagreements here: [20] suggested that the sequence information alone was sufficient in determining the efficacy of a siRNA; however, [15,22,24] advocated the need to incorporate thermodynamic properties (calculated using tools such as Mfold [29]) in assisting siRNA design; while [17,25] emphasized the importance of the accessibility to the mRNA sites by the siRNAs, and endorsed methods of filtering candidate siRNAs based on mRNA secondary structure properties. On factors determined by siRNA sequences, [12,30] recommended choosing of sequences of intermediate G/C contents (around 50%) for effective siRNAs, while [11,18,24,31,32] endorsed the choosing of sequences of lower G/C contents (< 60%) to increase the chance of making high-efficacy siRNAs. On position-specific properties, [11] suggested that the nucleotides on positions 3, 10, 13 and 19 on the sense strand played a critical role in determining the knock-down efficacy; while [14] claimed that positions 19 and 11, and perhaps 6, 13 and 16 on the sense strand were important in determining the knock-down efficacy of the siRNAs.
The debates over siRNA efficacy go beyond the disagreements among these design rules. In fact, the effectiveness of these rules per se is in question. [17] showed that most published siRNA design tools output large numbers of ineffective siRNAs, and had a similar performance to (or even worse than) a random selector when tested on data of an independent origin. [20] made similar observations, and alleged that several published efficacy predicting algorithms gave close to random classification on unseen data.
At least two groups of researchers pointed out that many existing studies on siRNA design criteria suffered from the "overfitting" problem [20,24]. This term describes scenarios where rules are extracted from datasets that have small sample sizes, low signal-to-noise ratios, and unique experimental settings. Rules obtained under these conditions are prone to spurious effects caused by noise in the data samples or specific aspects of the experimental settings or both; rules obtained in this manner are likely to perform unsatisfactorily when used on data obtained under different experimental settings.
The key to countering the overfitting problem and developing truly effective and generally applicable siRNA design rules is the availability of a large collection of siRNA efficacy data from diverse origins. We recently undertook the effort to document all siRNA experiments in published studies and provide sensible efficacy ratings of these experiments. This effort resulted in siRecords, the largest known curated database of mammalian siRNA experiments with consistent efficacy ratings [33]. The availability of the siRecords data makes it possible to better analyze factors responsible for achieving effective RNAi experiments.
In this study, we first conducted a survey on the siRecords data of all known "features" previously implicated to influence siRNA knock-down efficacy. This survey resulted in a list of features that significantly boosted the chance of achieving higher siRNA efficacies. Then, we examined quantitatively how these significant features interact with one another in their joint effects on achieving higher efficacies. The combinations of features that give rise to the highest levels of boosting to siRNA efficacies were picked and reorganized using a disjunctive rule merging (DRM) procedure, which led to a bundle of nonredundant rule sets with controlled stringency level. The performance of these rule sets (termed the DRM rule sets) was then assessed using a reserved dataset and compared with existing design tools commonly used in current siRNA design practice.
An implementation of the DRM rule sets developed in this study is available for testing as an online siRNA design server [34].

Results
Overview of siRecords data
siRecords is a continuing effort that aims to document all mammalian siRNA experiments reported in the literature and to provide systematically rated efficacies for these experiments [33]. Currently, about 9000 records of siRNA experiments targeting more than 3000 genes are hosted in the siRecords database. For each siRNA experiment, we document the siRNA sequence, the target gene, key information about experimental conditions (cell line used; the method of producing the siRNA, either chemically synthesized or vector-based; the method of testing the siRNA efficacy, such as western blot, real-time PCR or others), and an efficacy rating (elaborated below).
For this investigation, we picked all complete records of 19-mer siRNA experiments (21-mers if the two overhanging nucleotides on the 3' ends are counted) from the siRecords collection (dated 12/12/2005). The distribution of the number of records per study is highly skewed: about 17.5% of the records (657 siRNA experiments) originated from 0.4% of the studies (6 studies, each reporting > 30 siRNA experiments, Figure 1). To prevent our analyses from being biased by this small number of studies, we limited the number of siRNA experiments originating from a single study to ≤ 30. For the studies where more than 30 siRNA experiments were reported, we randomly picked 30 to include in our analyses and discarded the rest. The resulting dataset includes the records of 3277 siRNA experiments targeting 1518 genes, originating from 1417 independent studies. We randomly divided the dataset into two subsets at a 2:1 ratio. The larger subset, termed Set A, included 2184 records, and was used to survey features significantly associated with high efficacies and analyze the combinatorial effects of these features. The other subset (termed Set T, 1093 records) was reserved to test the conclusions obtained through the analyses of Set A.
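The capping and splitting steps described above can be illustrated with a minimal sketch. The record layout (a list of `(study_id, record_id)` tuples), the function name `cap_and_split` and the fixed random seed are assumptions made for illustration only; the procedure actually used to draw the random samples is not specified in the text.

```python
import random
from collections import defaultdict


def cap_and_split(records, cap=30, seed=0):
    """Cap the number of records per study, then split 2:1 at random.

    `records` is a list of (study_id, record_id) tuples (hypothetical layout).
    Returns (set_a, set_t), with set_a holding about two thirds of the records.
    """
    rng = random.Random(seed)

    # Group records by the study they came from.
    by_study = defaultdict(list)
    for study_id, record_id in records:
        by_study[study_id].append((study_id, record_id))

    # Keep at most `cap` randomly chosen records from each study.
    kept = []
    for study_id in sorted(by_study):
        group = by_study[study_id]
        if len(group) > cap:
            group = rng.sample(group, cap)
        kept.extend(group)

    # Shuffle and split at a 2:1 ratio (integer division is an assumption).
    rng.shuffle(kept)
    n_a = len(kept) * 2 // 3
    return kept[:n_a], kept[n_a:]


# Toy data: one large study with 40 records and 20 single-record studies.
toy = [("big", i) for i in range(40)] + [(f"s{i}", 0) for i in range(20)]
set_a, set_t = cap_and_split(toy)
print(len(set_a), len(set_t))
```

With 3277 records, the same integer division yields 2184 and 1093, which agrees with the sizes reported for Set A and Set T.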

Survey of features significantly boosting siRNA efficacy
We set out to determine, using the Set A data, what "features" of the siRNA experiments are associated with elevated RNAi efficacies. A feature is a binary property of a siRNA experiment concerning a factor potentially relevant to siRNA efficacy, for example, the 6th nucleotide of the siRNA sequence = A. Each feature has a "complementary feature". A feature and its complementary feature constitute a "feature pair". More discussions about the definition of feature and related terms can be found in Methods.
Figure 1. The distribution of the number of siRNA experiments per study is highly skewed in the siRecords collection. A. Studies were categorized based on the number of siRNA experiments reported. Only 6 out of the 1,417 studies (0.4%) reported > 30 siRNA experiments per study. B. The distribution of the total number of records in each category. Six hundred and fifty-seven records (representing 17.5% of the entire dataset) originated from the 6 studies with > 30 records per study.
In siRecords, the effectiveness of any siRNA experiment is rated on a four-level scale: very high (if the gene product was reduced by ≥ 90%), high (if the gene product was reduced by 70-90%), medium (if 50-70% knock-down was achieved), and low (if < 50% knock-down was observed). In Set A, the percentages of records receiving very high, high, medium and low efficacy ratings are 34.1%, 34.6%, 16.3% and 14.9%, respectively (Figure 2). The decision to use this four-level rating scheme was based on balanced considerations about the usefulness and the reliability of the ratings [33]. One consequence of this decision is that the conventional t-test type of analysis [11] cannot be performed on this dataset, because the dependent variable (efficacy rating) is not a continuous variable, but rather a categorical, ordinal variable. Proper categorical analysis techniques need to be adopted to analyze this type of data [35].
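As an illustration of the four-level scale, the sketch below maps a knock-down percentage to a rating and checks whether a rating meets the > 70% or > 90% level used later in the analyses. The function names are hypothetical, and treating the lower bound of each band as inclusive (e.g., exactly 70% rated "high") is an assumption, since the text gives the bands only as ranges.

```python
# Ordered from lowest to highest efficacy.
RATING_ORDER = ["low", "medium", "high", "very high"]


def efficacy_rating(knockdown_percent):
    """Map percent reduction of the gene product to the four-level scale."""
    if knockdown_percent >= 90:
        return "very high"
    if knockdown_percent >= 70:  # lower bound assumed inclusive
        return "high"
    if knockdown_percent >= 50:  # lower bound assumed inclusive
        return "medium"
    return "low"


def meets_level(rating, level):
    """Level 70 requires 'high' or 'very high'; level 90 requires 'very high'."""
    required = {70: "high", 90: "very high"}[level]
    return RATING_ORDER.index(rating) >= RATING_ORDER.index(required)


print(efficacy_rating(75), meets_level(efficacy_rating(75), 70))
```

Because the ratings are ordered categories rather than measurements, downstream tests operate on these labels (or on the two dichotomies derived from them), not on the raw percentages.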
We chose to use the Wald test of monotone trend to assess the evidence that the presence of a feature is associated with a significant up-shift (or down-shift) of the efficacy distribution. In addition, we conducted odds ratio permutation tests for two efficacy levels: > 90% and > 70% efficacies, because in siRNA design practice, we are interested in assessing whether a feature leads to increased chances of achieving higher efficacies (see Methods). For instance, a Wald test of monotone trend indicated that the presence of the feature the 6th nucleotide of the siRNA sequence = A is associated with significant up-shift of the efficacy distribution (P = 0.0058); odds ratio permutation tests showed that the presence of this feature led to significant increase in the probabilities of achieving both > 90% (P = 0.043) and > 70% (P = 0.0024) efficacies (see Supplementary Figure 1 in Additional file 1).
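The idea behind an odds ratio permutation test can be sketched as follows. This is a generic one-sided permutation test written for illustration, not the procedure defined in Methods: the 0.5 continuity correction, the number of permutations, the fixed seed and the function names are all assumptions.

```python
import random


def odds_ratio(has_feature, is_effective):
    """Odds ratio of being effective, feature present versus absent."""
    a = b = c = d = 0
    for f, e in zip(has_feature, is_effective):
        if f and e:
            a += 1
        elif f:
            b += 1
        elif e:
            c += 1
        else:
            d += 1
    # Adding 0.5 to each cell avoids division by zero (an assumption here).
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def permutation_p_value(has_feature, is_effective, n_perm=2000, seed=0):
    """One-sided P value: how often shuffled labels give an odds ratio
    at least as large as the observed one."""
    rng = random.Random(seed)
    observed = odds_ratio(has_feature, is_effective)
    labels = list(is_effective)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if odds_ratio(has_feature, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: 50 experiments with the feature (45 effective) and
# 50 without it (10 effective).
feature = [True] * 50 + [False] * 50
effective = [True] * 45 + [False] * 5 + [True] * 10 + [False] * 40
print(permutation_p_value(feature, effective) < 0.01)
```

Here `is_effective` would be the dichotomized rating at either the > 90% or the > 70% level, so the test is run once per level for each feature.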
Figure 2. Survey of features associated with the achievement of higher efficacies. The efficacy of a siRNA experiment is rated on a four-level scale. In Set A, the percentages of records achieving these ratings are 34.1%, 34.6%, 16.3% and 14.9%, respectively. The distribution of the efficacy ratings across the four levels changes when a certain feature is present in the siRNA experiments. For 14 selected features (they constitute 7 pairs of "complementary features"), the efficacy rating distributions of the subpopulations of siRNA experiments carrying these features are presented. Dotted vertical lines extend from the distribution of the general population.
We examined 276 features (they constitute 138 "feature pairs") for their association with higher RNAi efficacies, using the Wald test of monotone trend and the odds ratio permutation tests. The features we examined include, to our knowledge, all that have been implicated in previous studies to improve siRNA effectiveness. Each of these features can be placed into one of five categories. The first category is based on nucleotide identities at specific positions on the 19-mer siRNA sequence, e.g. the 6th nucleotide = A; there are 76 feature pairs in this category. The second category includes 19 feature pairs that are either composite sequence features, e.g. there are at least three (A/U)'s in the seven nucleotides at the 3' end of the siRNA, or features that are defined based on the G/C content of the siRNA. The third category consists of 13 feature pairs that are based on the thermodynamics of the siRNAs as measured by the melting temperature or binding energy. The fourth category, consisting of 16 feature pairs, includes features based on target mRNA sites, such as the relative positions of the target sites on the mRNA, and the local secondary structures of the target regions. Finally, the fifth category includes 14 feature pairs that are based on experimental settings, such as the cell lines used in the experiments (HeLa cells, HEK293 cells, and others), the methods used for making and delivering the siRNAs, and the methods used to evaluate the efficacy of the siRNA (Western blot, PCR-based, and others). The complete list of these tested features, and references to the studies that implicated them in enhancing siRNA efficacies, are provided in Supplementary Tables 1-5 in Additional file 1.
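To make the notion of a binary feature and its complementary feature concrete, the sketch below encodes three sequence-based features as predicates. It assumes the input is the 19-mer written 5' to 3' in the RNA alphabet and that "the 3' end" refers to the last seven characters of that string; the function names and the example sequence are hypothetical.

```python
def nucleotide_is(position, base):
    """Feature: the nucleotide at a 1-based position equals `base`."""
    return lambda seq: seq[position - 1] == base


def complement(feature):
    """Complementary feature: true exactly when `feature` is false."""
    return lambda seq: not feature(seq)


def at_least_three_au_at_3prime(seq):
    """Feature: at least three A/U among the last seven nucleotides."""
    return sum(nt in "AU" for nt in seq[-7:]) >= 3


def gc_content_between(low, high):
    """Feature: G/C content (in percent) lies within [low, high]."""
    def feature(seq):
        gc = 100.0 * sum(nt in "GC" for nt in seq) / len(seq)
        return low <= gc <= high
    return feature


seq = "GCAUCAGGCUAGCUAAUUA"  # made-up 19-mer for illustration
sixth_is_a = nucleotide_is(6, "A")
print(sixth_is_a(seq), complement(sixth_is_a)(seq))
print(at_least_three_au_at_3prime(seq), gc_content_between(35, 60)(seq))
```

Each feature and its complement together form one feature pair, which is why 276 features correspond to 138 pairs.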
Of the features examined, we found 34 that were associated with a significant improvement in the efficacy distribution (P < 0.01, Wald test of monotone trend; FDR controlled at 0.056 by the q-value technique [36]); among these, 26 significantly elevated the chance of achieving > 90% efficacies (P < 0.01, odds ratio permutation test, FDR controlled at 0.038), and 27 significantly enhanced the probability of achieving > 70% efficacies (P < 0.01, odds ratio permutation tests, FDR controlled at 0.044; see Supplementary Tables 1-5 in Additional file 1). There are several cases of sub-feature/super-feature relationships among these significant features. For example, the features the 6th nucleotide = A and the 6th nucleotide ≠ C were both significant; however, the former is a sub-feature of the latter, since when the former feature is present, the latter must also be present. In each occurrence of a sub-feature/super-feature relationship, we eliminated all but the one feature determined to be the most significant by the Wald test. The feature the 6th nucleotide = A was thus eliminated because the Wald test P value of this feature was higher than that of the feature the 6th nucleotide ≠ C. G/C content related features were treated as a special case. Several different G/C content ranges were suggested in previous studies as being possibly associated with high RNAi effectiveness (32-79%, 30-70%, 30-52%, 35-60%, 20-50% and 31.6-57.9%) [11,12,18,24,30,31,32]. All these features were tested. Although they do not constitute sub-feature/super-feature relationships, we treated these features as redundant features, and retained only one of them (G/C content is between 35 and 60%) because it yielded the lowest P value (0.00018) in the Wald test. The resulting list of non-redundant significant features is shown in Table 1. Detailed discussions about these significant features, and comparisons of our analyses with previous findings, can be found in Additional file 1.

Combined effects of multiple significant features
The presence of any single significant feature was not sufficient to improve the efficacy distribution substantially. When present alone, the significant features listed in Table 1 increased the probability of achieving > 90% efficacies by an average of only 2.5% (from 34.1% to 36.6%), and they increased the chance of achieving > 70% efficacies by [...]
When multiple features are co-present, we cannot assume that their contributions to the effectiveness of the RNAi experiments are additive, since features are not always independent of one another. For instance, the presence of the feature the 19th nucleotide = (A/U) clearly increases the probability that the feature there are at least three (A/U)'s in the seven nucleotides on the 3' end of the siRNA is also present. Indeed, these two features exhibited negative cooperativity: when present alone, they increased the chances of achieving > 90% efficacies by 2.6% and 2.4%, respectively; when co-present, these two features resulted in merely a 2.7% increase in the chance of achieving > 90% efficacy, much smaller than the sum of the effects of the two features (see Additional file 1 for discussions about cooperativity and additive effects of multiple features).
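The arithmetic behind this example can be written out in a few lines. The sketch compares the joint gain with the simple sum of the individual gains; using that sum as the additive baseline follows the wording above, while the formal treatment of cooperativity is given in Additional file 1 and is not reproduced here. The function name is hypothetical.

```python
def additive_shortfall(gain_a, gain_b, joint_gain):
    """Joint gain minus the sum of the individual gains (percentage points).

    A negative value indicates negative cooperativity under the simple
    additive baseline; a positive value indicates positive cooperativity.
    """
    return joint_gain - (gain_a + gain_b)


# Gains in the chance of > 90% efficacy quoted in the text.
shortfall = additive_shortfall(2.6, 2.4, 2.7)
print(round(shortfall, 1))  # -2.3
```

The two features together fall 2.3 percentage points short of the 5.0 points an additive model would predict.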
In seeking effective siRNA design rules, we should try to identify combinations of features that exhibit positive cooperativity. The large size and diverse origins of the records in the siRecords dataset allowed us to systematically analyze how features jointly influence siRNA efficacies. Three significant features (Cell line = HeLa, Test method = Western blot and Test object ≠ mRNA) were excluded from joint effect analyses because they are based on experimental settings, which are typically chosen independently of siRNA design. For the remaining 17 significant features, we looked at all possible combinations of a fixed number (l = 2, 3, 4, 5 and 6) of features. For each combination of l features, we examined the number of records in Set A that concurrently carry all l features, and the percentages of these records that achieved > 90% and > 70% efficacies. For every given l, we focused on the top-10 feature combinations, i.e., the 10 combinations that exhibited the highest percentage of records achieving > 90% or > 70% efficacies. When there was a tie of more than 10 feature combinations, all tied combinations were considered. As we expected, as l (the number of features in the combinations) increased, the number of records concurrently carrying all l features declined sharply (Figure 3C). Meanwhile, the percentages of experiments achieving > 90% and > 70% efficacies increased steadily as l increased (Figure 3A and 3B).
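The enumeration of l-feature combinations and the selection of the top combinations can be sketched as below. The data layout (each record as a pair of a feature-name set and an effectiveness flag) and the function name are assumptions, and the tie handling shown (keeping every combination that ties with the tenth-ranked one) is one reading of the tie rule stated above.

```python
from itertools import combinations


def top_feature_combinations(records, feature_names, l, top_n=10):
    """Rank all l-feature combinations by the fraction of matching records
    that are effective.

    `records` is a list of (features_present, is_effective) pairs, where
    `features_present` is a set of feature names (hypothetical layout).
    Returns a list of (fraction_effective, n_records, combination).
    """
    scored = []
    for combo in combinations(feature_names, l):
        needed = set(combo)
        # Records that concurrently carry all l features.
        hits = [eff for feats, eff in records if needed <= feats]
        if hits:
            scored.append((sum(hits) / len(hits), len(hits), combo))

    scored.sort(key=lambda item: item[0], reverse=True)
    if len(scored) <= top_n:
        return scored
    # Keep everything tied with the combination ranked top_n.
    cutoff = scored[top_n - 1][0]
    return [item for item in scored if item[0] >= cutoff]


toy = [
    ({"A", "B"}, True), ({"A", "B"}, True),
    ({"A", "C"}, True), ({"A", "C"}, False),
    ({"B", "C"}, False), ({"A", "B", "C"}, True),
]
print(top_feature_combinations(toy, ["A", "B", "C"], l=2, top_n=1))
```

The second element of each result makes the trade-off visible: as l grows, the fraction of effective records tends to rise while the number of supporting records shrinks.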
Figure 3. Highly effective siRNA design rules were obtained by selecting the top l-feature combinations, i.e., the combinations of l non-redundant significant features that exhibited the highest percentages of records achieving > 70% or > 90% efficacies on Set A. A. For l = 2 through 6, the subpopulations of Set A records that carry all combinations of l features were examined, and the 10 feature combinations (FCs) that resulted in the highest percentages of records achieving > 70% efficacies were selected. When there was a tie of more than 10 FCs, all of them were considered (marked in the graph). The mean percentages of the top FCs are presented in black filled circles. These FCs were used to select siRNA experiments in Set T, and the results are shown in grey filled circles. Error bars indicate standard errors. The first two data points in the graphs represent the baseline levels (the percentage of records achieving > 70% efficacies for the entire Set A or Set T), and the mean levels for top-10 individual features (the 10 individual features that led to the highest percentages of records achieving > 70% efficacies), respectively. B. Similarly to A, the top FCs selected with > 90% efficacies are plotted, together with the baseline levels and the mean levels for top individual features. C. The numbers of records selected in the top l-feature combinations dropped sharply as l increased. The mean numbers of selected records for Set A (with error bars indicating standard errors) are presented in black filled circles and black open circles for > 70% and > 90% efficacies, respectively. The numbers of selected records for Set T are presented in corresponding grey symbols. Again, the first two data points represent the baseline levels (numbers of records in entire Set A and Set T), and the numbers of records selected with the top-10 individual features, respectively.
The sigmoid shape of the two ascending curves is an indication of positive cooperativity (see discussion in Additional file 1). This suggests that by simply retaining the feature combinations that led to the highest percentages of records achieving efficacies of > 90% or > 70%, we were, in effect, exploiting the positive cooperativity, or favorable interaction, among these features. At l = 5, 24 feature combinations had a 100% chance of having efficacies > 70%, that is, every experiment in which the siRNA used had all features contained in any one of the 24 feature combinations exhibited efficacies of > 70%. Similarly, 14 feature combinations had 100% probabilities of having efficacies > 90% at l = 5, meaning that all siRNA experiments having these feature combinations demonstrated efficacies > 90%. At l = 6, 188 feature combinations had 100% probabilities of having efficacies of > 70%, and 94 feature combinations had 100% probabilities of achieving efficacies of > 90%.

Integrated rule sets for effective siRNA design
A disjunction of the top feature combinations described above (across l = 2 through 6; a feature combination is also called a rule hereafter) defines a rule set for designing effective siRNA experiments. Rule sets defined in this way are likely to contain redundancies, because if a rule consisting of features {f_1, f_2, ..., f_l} is one of the best l-feature combinations, then a rule consisting of (l+1) features {f_1, f_2, ..., f_l, f_0}, where f_0 is any other feature, is likely to be one of the best (l+1)-feature combinations and thus is also selected into the rule set. A disjunctive rule merging (DRM) algorithm can be applied to remove redundancies from the rule sets, while also allowing control over the stringency of the resulting rule sets (see Methods). This algorithm takes a user-provided stringency parameter α (in the range [0, 1]) and produces a non-redundant set of disjunctive rules such that, for each rule in the set, a proportion ≥ α of the Set A records satisfying the rule reached efficacies > 90%. The rule set rendered for the highest α level (α = 0.951, denoted as RS 0.951) contains seven rules (Table 2). Generally speaking, the lower the α level, the larger the number of rules included in the rule set (see Supplementary Table 6 in Additional file 1).
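The redundancy argument above can be illustrated with a small sketch. It is not the DRM algorithm itself, which is described in Methods and also handles the stringency parameter α; it only shows the absorption step implied by the text: within a disjunction, a rule whose features are a strict superset of another rule's features adds nothing, because any siRNA satisfying the larger rule already satisfies the smaller one. Function names and feature labels are hypothetical.

```python
def remove_redundant_rules(rules):
    """Drop every rule that is a strict superset of another rule.

    Each rule is a collection of feature names that must all be present.
    """
    unique = {frozenset(rule) for rule in rules}
    return [rule for rule in unique
            if not any(other < rule for other in unique)]


def predicted_effective(features_present, rule_set):
    """A siRNA passes the rule set if it satisfies at least one rule."""
    return any(rule <= features_present for rule in rule_set)


rules = [{"F1", "F2"}, {"F1", "F2", "F3"}, {"F4", "F5"}]
merged = remove_redundant_rules(rules)
print(len(merged))
print(predicted_effective({"F1", "F2", "F9"}, merged))
```

Removing absorbed rules does not change which siRNAs the disjunction selects; it only shortens the rule set.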

Performance comparison between DRM rule sets and existing design tools
We assessed the performance of the DRM rule sets, and compared it with that of 15 existing online design tools commonly used in siRNA design practice, using the Set T data reserved for this purpose (Table 3 and Figure 4). Set T includes the records of 1,093 siRNA experiments, representing 1,014 unique target sites on 744 genes. How do we assess the performance of a siRNA design program? A good siRNA design program should (a) provide a sufficient number of candidate siRNAs for a given gene; and (b) offer a high PPV (positive predictive value), or a low false positive rate (see Methods).
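For reference, the standard definitions of PPV, sensitivity and specificity can be computed as in the sketch below, where `predicted` marks whether a tool or rule set selects a tested siRNA and `actual` marks whether the experiment was effective. The function name and the toy data are illustrative assumptions; the exact evaluation protocol is given in Methods.

```python
def classification_metrics(predicted, actual):
    """Return PPV, sensitivity and specificity from paired boolean lists."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)

    def ratio(num, den):
        # Undefined when the denominator is zero.
        return num / den if den else None

    return {
        "ppv": ratio(tp, tp + fp),           # effective among selected
        "sensitivity": ratio(tp, tp + fn),   # selected among effective
        "specificity": ratio(tn, tn + fp),   # rejected among ineffective
    }


predicted = [True, True, True, False, False, False]
actual = [True, True, False, True, False, False]
print(classification_metrics(predicted, actual))
```

A stringent rule set can reach a high PPV while selecting few siRNAs, which is why the number of candidates per gene is assessed alongside it.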
On the number of candidate siRNAs predicted, the DRM rule set with the highest stringency level (RS 0.951) produced on average 18.9 predicted effective siRNAs per gene. This indicates that this rule set offers sufficient candidate siRNAs in an ordinary siRNA design task for a gene of an average length. However, the smallest number of predicted effective siRNAs for a gene is 1. This suggests that for genes of the shortest lengths, the number of candidate siRNAs offered by this rule set may not be enough. There are considerations other than achieving high efficacy (e.g., avoiding cross-reactivity with other genes) in the design of siRNA experiments, thus it is desirable to have multiple candidate siRNAs designed for every gene. For genes of the shortest lengths, we resort to DRM rule sets of lower stringency levels. For example, RS 0.845 produced at least 3 potentially effective siRNAs for each gene, and an average of 38.1 potentially effective siRNAs per gene (see Supplementary Figure 3 in Additional file 1). The online design tools varied greatly in the numbers of candidate siRNAs they provided. The highest number of predicted effective siRNAs was offered by EMBOSS sirna by Institut Pasteur (639.4 siRNAs per gene). IDT RNAi Design by IDT, Inc. produced the lowest number of predicted effective siRNAs (5.8 siRNAs per gene). Among the 15 online design tools, 10 offered larger numbers of candidate siRNAs than DRM RS 0.951, and 4 provided larger numbers of candidate siRNAs than DRM RS 0.845.
Given that a sufficient number of candidate siRNAs are provided, the most important parameter that measures the performance of a design tool is the PPV. Only a small proportion of possible siRNA sites have been experimentally tested for effectiveness [...]
Set T is a fair dataset for comparing the performance of the DRM rule sets with that of the online design tools, because it shares no records with Set A, from which the DRM rule sets were derived. However, Set T might not be considered a completely independent dataset, because (a) there are records in Set T that originated from the same studies as some records in Set A; and (b) there are records of siRNA experiments in Set T that target the same genes as some experiments in Set A. To rule out the possibility that these two factors might contribute to better performance of the DRM rule sets for unforeseen reasons and unfairly favor the DRM rule sets in the performance comparison, we compiled an "independent subset" of Set T, eliminating all records that share origins with any records in Set A, and all records that target genes also targeted by any records in Set A. We compared the performance of the DRM rule sets with that of the 15 online design tools using this independent subset (including 224 siRNAs targeting 197 different genes, see Table 4). Because of the reduced size of the dataset (by nearly 80%), the sensitivity, specificity and PPVs for all tools and rule sets showed higher levels of variability. The three DRM rule sets with the highest α levels (RS 0.951, RS 0.895 and RS 0.845) achieved 100% PPV. Two online design tools, BLOCK-iT by Invitrogen Corp. and WI siRNA Selection Program by Whitehead Institute, also achieved 100% PPV, but the other online design tools achieved lower PPVs, ranging between 50.0% and 86.4%.
Although the small size of the independent subset prevented this analysis from being completely conclusive, it is fair to state that the comparison made based on the independent subset is generally in agreement with the comparison made based on the entire Set T.
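The construction of the independent subset amounts to a simple filter, sketched below. Records are represented as dictionaries with hypothetical keys `study` and `gene`; the identifiers in the toy data are made up for illustration.

```python
def independent_subset(set_t, set_a):
    """Keep Set T records sharing neither a study nor a target gene
    with any Set A record."""
    studies_a = {rec["study"] for rec in set_a}
    genes_a = {rec["gene"] for rec in set_a}
    return [rec for rec in set_t
            if rec["study"] not in studies_a and rec["gene"] not in genes_a]


set_a = [{"study": "S1", "gene": "G1"}, {"study": "S2", "gene": "G2"}]
set_t = [
    {"study": "S1", "gene": "G9"},  # same study as a Set A record
    {"study": "S7", "gene": "G2"},  # same target gene as a Set A record
    {"study": "S8", "gene": "G8"},  # independent on both counts
]
print(independent_subset(set_t, set_a))
```

Applying both criteria at once is what reduces Set T from 1,093 records to the 224 records used in this comparison.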

Discussion
It has been recognized that many existing siRNA design criteria (and the design tools in which they are implemented) failed to provide the promised levels of performance when tested with unseen data, largely due to the "overfitting" problem in their development [20,24]. Practically, the key to countering this problem is to make use of a large siRNA efficacy dataset from diverse origins when developing siRNA design rules. In this study, we took advantage of the recent siRecords collection in our development of the DRM rule sets. First, we conducted a survey on the siRecords dataset of all known "features" previously implicated to influence siRNA knock-down efficacy. This survey resulted in a list of features that significantly boosted the chance of achieving higher siRNA efficacies. Then, we examined quantitatively how these significant features interact with one another in their joint effects on achieving higher efficacies. The combinations of features that give rise to the highest levels of boosting to siRNA efficacies were picked and reorganized using the DRM algorithm, producing the rule sets. Finally, the performance of these rule sets was verified on a reserved dataset (Set T, also from siRecords) and was compared with that of 15 online siRNA design tools commonly used in current siRNA design practice.
List of features:
Feature index    Feature name
[...]    At least three (A/U)s in the seven nucleotides at the 3' end
F10    No occurrences of four or more identical nucleotides in a row
F11    No occurrences of G/C stretches of length 7 or longer
F12    G/C content is between 35 and 60%
F13    Tm is between 20 and 60°C
F14    Binding energy of N16-N19 > -9 kcal/mol
F15    Binding energy of N16-N19 minus binding energy of N1-N4 is between 0 and 1 kcal/mol
F16    Local folding potential (mean) ≥ -22.72 kcal/mol
F17    Target site is on CDS
The survey of features influencing RNAi effectiveness conducted in this study is the largest survey of this type reported to date (276 features were examined on a siRNA efficacy dataset consisting of 2,184 records of experiments originating from 1,141 independent studies). Among the significant features identified in the survey [...], and one feature related to the target location (Target site is on CDS). However, there are also a small number of features that were not reported to be significant in any previous studies, e.g., the 4th nucleotide = C and the 9th nucleotide = C. It appears that there are higher levels of disagreement between our survey results and previous findings for sequence related features (Categories 1 and 2) than for features defined based on thermodynamics of the siRNAs and on target mRNA sites (Categories 3 and 4), with the exception of the 3-nucleotide segment on the 3' end (N17-N19; the lower G/C content in this segment is correlated with lower binding energy on the 3' end). Notably, three Category 5 features (defined based on experimental settings), Cell line = HeLa, Test method = Western blot and Test object ≠ mRNA, were among those found to be most significant. Although there have been reports about siRNA efficacy being influenced by cell lines and test methods [37][38][39][40], this is the first quantitative analysis of how strong these influences are. More details about the significant features found in the survey, and comparisons of our analyses with previous findings, are presented in Additional file 1.
In a recently published review article, several considerations for selecting effective siRNAs were proposed, resulting from summarization and integration of major recent findings in the field of siRNA design [41]. Comparison of these considerations with the survey results obtained in this study indicates that they generally agree with each other (see Supplementary Table 8 in Additional file 1). Of the 34 features pertinent to the considerations proposed by Pei and Tuschl, 29 were found to be significant in boosting the siRNA efficacy. Among the remaining 5 features, the feature G/C content is between 30 and 52% was found to be associated with a commensurate, though not significant, improvement in the efficacy distribution (P_70 = 0.082 and P_wald = 0.056). Two related features, G/C content is between 35 and 60% and G/C content is between 31.6 and 57.9%, however, were found to be highly significant in boosting the siRNA efficacy, agreeing with the common understanding that effective siRNAs prefer a low-to-medium G/C content. Two features pertinent to the considerations proposed by Pei

Comparison made based on the independent subset of Set T (224 siRNA experiments targeting 197 genes). Default settings were used for the 15 online predicting tools. A siRNA experiment was considered effective if it achieved > 70% efficacy (was rated "high" or "very high" efficacy).

Since the siRecords collection is compiled from published siRNA studies, there is the concern that it may be biased towards higher efficacy siRNAs, because researchers are probably less inclined to report lower efficacy experiments in their research articles. We can assess the magnitude of this bias by comparing the efficacy distribution of the siRecords collection with that of published randomly designed siRNAs. In two published studies [11,22], moderately large numbers (180) of randomly designed siRNAs were tested for knock-down efficacies. The percentages of siRNAs resulting in < 50% efficacies in these two studies were 22.2% and 23.3%, respectively. In the siRecords data used in this study, the percentage of records receiving a "low" efficacy rating (i.e., produced < 50% knock-down efficacies) is 14.3%. In one of these previous studies [22], the percentage of siRNAs resulting in > 90% efficacies was reported to be 29.4%. In the siRecords collection, the percentage of records receiving a "very high" efficacy rating (i.e., produced > 90% efficacies) is 34.3%.
Therefore, the siRecords collection is indeed biased towards the higher efficacy experiments, likely because researchers are less ready to report lower efficacy experiments. However, this bias is not severe, because nearly 2/3 of the low efficacy siRNA experiments are still included in siRecords. Furthermore, the analyses conducted in this study, in particular the results of the survey of features influencing the siRNA efficacy, are not influenced by the reduced number of low efficacy siRNAs in the dataset. These analyses are reliable as long as the dataset includes a sufficiently large number of low efficacy records (the number of records bearing "low" efficacy used in this study is 467).
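The "nearly 2/3" figure can be checked with a short calculation. The sketch below is only one possible reading of the comparison: it assumes the figure is the ratio of the "low" efficacy share in siRecords to the average share in the two random-design studies. The variable names are illustrative.

```python
# Shares of siRNAs with < 50% efficacy, as quoted in the text (percent).
random_design_low = [22.2, 23.3]   # two studies of randomly designed siRNAs
sirecords_low = 14.3               # siRecords data used in this study

# Assumption: the expected share without reporting bias is the mean of the two studies.
expected_low = sum(random_design_low) / len(random_design_low)

# Fraction of the expected low-efficacy share that is still present in siRecords.
retained_fraction = sirecords_low / expected_low
print(f"expected {expected_low:.2f}%, retained fraction {retained_fraction:.2f}")
```

Under this reading, the retained fraction comes out slightly below 2/3.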
Another concern over the use of siRNA data compiled from published siRNA studies is that the design of siRNA experiments in these published studies might be dominated by one or two of the design tools used in the performance comparison (Table 3), compromising the objectivity of this comparison. An analysis of the relative utility of the 15 online siRNA design tools (see Supplementary Table 7 in Additional file 1) suggested that these design tools had varied levels of utility, yet none of them dominated current siRNA design practice (see discussion in Additional file 1).
It is desirable to validate the DRM rule sets obtained in this study using a dataset independent of siRecords. However, it is difficult to find a separate siRNA efficacy dataset that is as large and diverse as the siRecords collection. In a recent report by Huesken et al., a genome-wide human siRNA library was constructed, in which 2,431 randomly selected siRNAs targeting 34 fusion mRNAs were tested for efficacy [42]. There were concerns when this library of siRNAs was considered as a validation dataset for the DRM rule sets because, firstly, this dataset is of a single origin and, secondly, the siRNA efficacies were tested against fusion mRNAs. This is considered a somewhat questionable practice because the native secondary structures may not be well preserved in the fusion mRNAs. Huesken et al. performed control experiments which suggested that fusion mRNAs and endogenous mRNAs produced similar efficacy estimates in the setting they adopted, and argued that sequence features, rather than secondary structure related features, were the main determinants of the siRNA efficacy. However, there have been multiple recent reports about secondary structures playing important roles in determining the siRNA efficacy [17,25], which are backed up by the finding in our survey that at least one secondary structure related feature (Local folding potential (mean) ≥ -22.72 KCal/Mol) significantly boosts the chance of achieving higher siRNA efficacy. Nevertheless, we examined the per- When tested using the 249-siRNA test dataset specified in that study, the same three DRM rule sets identified 3, 4 and 6 effective siRNAs, respectively, and the average "normalized inhibitory activity" values of these siRNAs were 0.96, 0.80 and 0.78, respectively. In Huesken et al., the average "normalized inhibitory activity" of the entire dataset was 0.69, and they recommended using 0.75 or 0.80 as cut-offs for selecting effective siRNAs.
These results suggest that generally speaking, the DRM rule sets were capable of identifying effective siRNAs in this completely independent siRNA efficacy dataset.
As more data becomes available in siRecords, we will perform updated analyses on this data collection with the aim of obtaining more accurate and more reliable siRNA design rules. In addition, as there is indication that the DRM rule sets behave differently for subpopulations of siRNAs tested under different experimental settings (e.g., for those validated with Western blot technique and those validated with PCR and other techniques, see Supplementary Figure 4 in Additional file 1), we will refine our analyses and develop separate rule sets for these different subpopulations of siRNAs.

Conclusion
In this study, we identified a bundle of highly effective and generally applicable rule sets for siRNA design. This was accomplished by applying a simple strategy in which we analyzed a large number of candidate features for association with increased siRNA efficacies, then used quantitative analyses of the joint effects of these significant features to identify positive cooperativity among these features. The key to our approach was the use of the large set of siRNA efficacy data available in siRecords. The availability of this dataset not only made the execution of this strategy possible, but also curbed the overfitting problem that many rules generated by other design protocols suffer from. We expect that the design rules revealed in this study, together with improving RNAi lab techniques, will make siRNAs a more useful tool for molecular genetics, functional genomics, and drug discovery studies.

Data preparation
All records of 19-mer siRNAs (not counting the overhanging nucleotides on the 3' end) were retrieved from the siRecords database. The records that failed to meet the following criteria were excluded from further analyses: (1) had complete annotations of cell line types, test methods, transfection methods and efficacy classification; (2) had target mRNA lengths ≤ 16,000 nucleotides (this is a limit set by the Mfold program for calculation of thermodynamics features, see below); (3) the siRNA sequence had no mismatches with the targeted site by pair-wise Blast (NCBI bl2seq v.2.2.9, parameters "-p blastn -W 7 -q -1 -F F"). For studies where more than 30 siRNA experiments were reported, we randomly chose 30 to include in our analyses. The cell line types and test methods were grouped based on ATCC (American Type Culture Collection) [43] and Protocol Online [44], respectively.
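The exclusion criteria and the per-study cap described above can be sketched as follows. This is a minimal illustration, not the procedure actually run on siRecords: the record keys (`study`, `annotated`, `mrna_len`, `mismatches`) are hypothetical, and applying the cap after the filters is an assumption, since the text does not state the order.

```python
import random
from collections import defaultdict

MAX_MRNA_LEN = 16000   # Mfold limit quoted in the text
MAX_PER_STUDY = 30     # per-study cap quoted in the text

def prepare(records, seed=0):
    """Keep records passing the three criteria, then cap each study at 30.

    Each record is a dict with hypothetical keys: 'study', 'annotated' (bool),
    'mrna_len' (int) and 'mismatches' (int, from the pairwise alignment).
    """
    rng = random.Random(seed)
    by_study = defaultdict(list)
    for rec in records:
        if rec["annotated"] and rec["mrna_len"] <= MAX_MRNA_LEN and rec["mismatches"] == 0:
            by_study[rec["study"]].append(rec)
    kept = []
    for recs in by_study.values():
        if len(recs) > MAX_PER_STUDY:
            recs = rng.sample(recs, MAX_PER_STUDY)  # random choice of 30 records
        kept.extend(recs)
    return kept

# Toy data: study A has 40 valid records, B has one mismatch, C exceeds the length limit.
records = (
    [{"study": "A", "annotated": True, "mrna_len": 2000, "mismatches": 0} for _ in range(40)]
    + [{"study": "B", "annotated": True, "mrna_len": 2000, "mismatches": int(k == 0)} for k in range(5)]
    + [{"study": "C", "annotated": True, "mrna_len": 20000, "mismatches": 0}]
)
kept = prepare(records)
```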

Features
We define a feature as a binary property of a siRNA experiment concerning a factor potentially influencing the efficacy of the experiment. For a given siRNA experiment, any defined feature is either present or absent. Some example features are listed below:
(1) The 6th nucleotide of the siRNA sequence (counting from the 5' end on the sense strand) is an adenine (A).
(2) The 17th nucleotide of the siRNA sequence is not a guanine (G).
(3) There are at least three (A/U)'s in the seven nucleotides on the 3' end.
(4) The G/C content of the siRNA sequence is between 30 and 52%.
For Features (1) and (2), the factors potentially influencing the siRNA efficacy are the identities of the 6th and the 17th nucleotides of the siRNA sequence, respectively. For Feature (3), the factor is the seven nucleotides as a whole on the 3' end of the siRNA sequence. For Feature (4), the factor is the G/C content of the siRNA sequence.
Each feature has a complementary feature, that is, the alternative property concerning the same factor. For instance, the complementary feature of Feature (1) is that the 6th nucleotide of the siRNA sequence is not an adenine (A).
For a given factor, there are multiple ways of formulating features. In some cases, so-called sub-feature/super-feature relationships can result. For example, the following four features are all concerned with the same factor, the identity of the 6th nucleotide of the siRNA sequence: Wherever Feature (5) is present, Feature (8) must also be present. Thus, Feature (5) is a sub-feature of Feature (8), and Feature (8) is a super-feature of Feature (5). Similarly, Features (7) and (6) also constitute a sub-feature/super-feature pair.
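As an illustration of the definitions above, features can be written as functions that map a 19-mer sense-strand sequence to present (True) or absent (False), with the complementary feature obtained by negation. This is a minimal sketch: the function names and the example sequence are hypothetical, and treating the G/C range as inclusive at both ends is an assumption.

```python
def nucleotide_is(pos, base):
    """Feature: the nucleotide at 1-based position `pos` (from the 5' end) is `base`."""
    return lambda seq: seq[pos - 1] == base

def au_rich_3prime(seq, window=7, minimum=3):
    """Feature (3): at least `minimum` A/U among the last `window` nucleotides."""
    return sum(b in "AU" for b in seq[-window:]) >= minimum

def gc_between(low, high):
    """Feature: G/C content (percent) lies in [low, high] (inclusive bounds assumed)."""
    def feature(seq):
        gc = 100.0 * sum(b in "GC" for b in seq) / len(seq)
        return low <= gc <= high
    return feature

def complement(feature):
    """Complementary feature: the alternative property for the same factor."""
    return lambda seq: not feature(seq)

seq = "GCGUAAUCGGAUCCGAUUA"                      # made-up 19-mer for illustration
feature_1 = nucleotide_is(6, "A")                # Feature (1)
feature_2 = complement(nucleotide_is(17, "G"))   # Feature (2)
feature_4 = gc_between(30, 52)                   # Feature (4)
```

For any sequence, exactly one member of a feature pair is present, which is why the survey can compare the efficacy distributions of the two groups.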

Feature definitions
We surveyed 276 features (constituting 138 feature pairs) in this study. These features can be classified into the following five categories:

Category 1: Direct sequence features
We defined 152 direct sequence features (76 pairs) based on the position-specific nucleotide identity in the siRNA sequence (on the sense strand). For each of the 19 positions in the 19-mer siRNA sequence, 8 features were defined based on whether or not the nucleotide at the position is an adenine (A), a cytosine (C), a guanine (G), or a uracil (U), respectively. Among these features, 24 were previously claimed to favorably influence the siRNA efficacy (see Supplementary Table 1 in Additional file 1).

Category 3: Features defined based on thermodynamics of the siRNA
Features on Tm, folding energy of the sense strand and total hairpin energy. Ten features (5 pairs) were defined that are related to the melting temperature (Tm) of the siRNA, the folding energy of the sense strand, or the total hairpin energy of the siRNA. Among them, 6 features were defined based on whether or not the Tm falls into the following three ranges: < 60°C, < 20°C, and between 20 and 60°C [11]. Two features were defined based on whether or not the folding energy of the sense strand is equal to or greater than -5 KCal/Mol [18]. Two features were defined based on whether or not the absolute value of the total hairpin energy is less than 1 KCal/Mol [24]. The DINAMelt server [48] was used in the calculation of Tm and hairpin energy [29,49]. The total hairpin energy was calculated as the absolute value of the sum of the hairpin energies of the siRNA sense and antisense strands in units of KCal/Mol [24] (Chalk, A., personal communication).
Features on binding energy. Sixteen features (8 pairs) related to the binding energy of siRNA sequences were defined. On the 5' end binding energy, we defined the feature 5' binding energy is between -9 and -5 KCal/Mol and its complementary feature [24]. On mid-sequence binding energy, we defined 6 features associated with three nucleotide ranges: N6-N11 [22], N7-N11 [15] and N7-N12 [24]. For the nucleotide range N7-N12, we used the reported threshold -13KCal/Mol in the feature definition [24]. For the nucleotide range N7-N11, we defined the feature based on whether or not the average free energy profiles fall into the reported optimized range between -1.97 and -1.65 KCal/Mol [15]. For the binding energy of the range N6-N11 for which no threshold was explicitly reported, we took the median value (-13 KCal/Mol) of all siRNAs in the dataset as the threshold. On 3' end binding energy, we defined a feature binding energy of N16-N19 > -9 KCal/Mol and its complementary feature [24]. In addition, 6 features (3 pairs) were defined that are associated with the difference between the 5' binding energy and 3' binding energy. They are defined based on: (a) whether or not the difference between the binding energy of N1-N4 and N16-N19 is greater than 0 [22,24], (b) whether or not the difference between the binding energy of N1-N4 and N16-N19 is between 0 and 1 KCal/Mol [24], and (c) whether or not the difference between the binding energy of N1-N5 and N15-N19 is greater than 0 [15], respectively (see Supplementary Table 3 in Additional file 1).
The nearest neighbor model parameters described in Xia, T. et al. [50] were used for binding energy calculation [29]. The binding energies of N1-N4 and N16-N19 were computed as the sum of free energies for 4 base-pair stacks starting from position 1 in the sense strand and one single base stacking energy [21,51] (Chalk, A., personal communication). Calculations of binding energies for N1-N5 and N15-N19 were performed similarly to those done for N1-N4 and N16-N19, except that 5 base-pair stacks were used. Binding energies for N6-N11 and N7-N12 were computed as the sum of free energies for 6 base-pair stacks within positions 6-11 and positions 7-12 in the sense strand. The average free energy profile of N7-N11 was computed as the average base pair energy of five consecutive pentamer subsequences starting from positions 7 to 11 in the sense strand (Poliseno, L., personal communication).

Category 4: Features defined based on target mRNA sites
Features on the mRNA target location. Sixteen features (8 pairs) related to the siRNA target location on mRNA were defined, based on whether or not the target region is within (a) 5' UTR [12,16], (b) 3' UTR [16], (c) CDS [18], (d) the first 100 nucleotides of CDS [12,16], (e) the first quartile of CDS, (f) the second quartile of CDS, (g) the third quartile of CDS [14], and (h) the fourth quartile of CDS, respectively. The mRNA sequences were obtained from NCBI GenBank. The target region was determined by using a BLAST search (NCBI bl2seq v.2.2.9 with parameters "-W 7 -q -1 -F F"). The targeted site was assigned to a sub-region if the entire target site lay within that sub-region.
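The assignment rule (a site belongs to a sub-region only if it lies entirely within it) can be sketched as below. The coordinates are 1-based and inclusive, and the way the CDS is cut into quartiles (integer division of the CDS length) is an assumption of this sketch, since the text does not spell it out. The function and key names are illustrative.

```python
def within(site, region):
    """True if the whole site (start, end) lies inside the region (start, end)."""
    return region[0] <= site[0] and site[1] <= region[1]

def location_features(site, cds, mrna_len):
    """Return the eight target-location features for a site on an mRNA."""
    cds_start, cds_end = cds
    cds_len = cds_end - cds_start + 1
    regions = {
        "5' UTR": (1, cds_start - 1),
        "3' UTR": (cds_end + 1, mrna_len),
        "CDS": (cds_start, cds_end),
        "first 100 nt of CDS": (cds_start, min(cds_start + 99, cds_end)),
    }
    for k in range(4):  # assumed quartile boundaries
        lo = cds_start + (k * cds_len) // 4
        hi = cds_start + ((k + 1) * cds_len) // 4 - 1
        regions[f"CDS quartile {k + 1}"] = (lo, hi)
    return {name: within(site, region) for name, region in regions.items()}

# A 19-nt site at 120-138 on a 1,000-nt mRNA whose CDS spans 101-500.
inside = location_features((120, 138), (101, 500), 1000)
# A site straddling the quartile 1/2 boundary is in the CDS but in no quartile.
straddling = location_features((195, 213), (101, 500), 1000)
```

A site that spans a boundary, such as the 5' UTR/CDS junction, carries none of the positive location features under this rule.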
Features on the secondary structures of the target mRNA. Fourteen features (7 pairs) that are associated with the secondary structures of the target mRNA were defined, based on (a) whether or not the calculated hydrogen bond (H-b) index is less than 28.8 [25], (b) whether or not the siRNA target region is filtered by the repelling loop filter [52], (c) whether or not the local free energy of the most stable structure (LFE_mss) is equal to or greater than -20.9 KCal/Mol [53], (d) whether or not the average local free energy of the ten most stable structures (LFE_average) is equal to or greater than -20.85 KCal/Mol [53], (e) whether or not the mean local folding potential (LFP) is equal to or greater than -22.72 KCal/Mol, (f) whether or not a non-zero accessibility score was obtained for the siRNA target site [54], and (g) whether or not the anti-sense siRNA binding energy is equal to or less than -10 KCal/Mol [47], respectively (see Supplementary Table 4 in Additional file 1).
The hydrogen bond (H-b) index measures the average number of hydrogen bonds formed between nucleotides in the target region and the rest of the mRNA, and it was calculated according to Luo et al. [25]. We used the median value of all siRNAs in the dataset (28.8) as the threshold since no threshold was explicitly given in the original report. The repelling loop filter was proposed by Yiu et al. for determining the accessibility of the mRNA target region [52]. If, in at least three of the five most stable structures of the whole-length mRNA (calculated with Mfold), the 19-mer target site was contained by at least one "big repelling loop", or by at least two "repelling loops", the target region was identified as invalid by the repelling loop filter. The LFE (local free energy) was calculated according to Schubert et al. [53], with predicted mRNA secondary structures calculated using Mfold 3.2 [29,55]. The free energy contribution of each local sequence element in a structure was extracted from the .det output files of Mfold; local elements include helices, bulges, and loops, among others. The LFE of the targeted site was computed as the sum of the free energy contributions of all local sequence elements containing one or more nucleotides in the siRNA target site (Schubert, S., personal communication). The ten most stable secondary structures of the mRNA sequence were also used in our calculations. For each siRNA target site, we calculated the LFE for the lowest free energy structure of the site (LFE_mss) and the average LFE of the ten most stable secondary structures (LFE_average). Since no thresholds were explicitly provided in the original report, the medians of all LFE values in the dataset (-20.9 KCal/Mol for LFE_mss and -20.85 KCal/Mol for LFE_average) were used as thresholds in the feature definitions.
The local folding potential (LFP) is a measurement of the RNA local thermodynamic stability [56][57][58]. We postulated that the thermodynamic stability of the siRNA target site may influence the RNAi effectiveness. We calculated the structure with the lowest free energy for the 100 nucleotide region on the mRNA centering around each of the 19 nucleotides in the siRNA target site. The LFP was calculated as the mean of the 19 free energy values obtained. In cases when the target site was close to either end of the mRNA, so that the 100-nucleotide regions could not be obtained for certain nucleotides in the 19mer target site, a shorter mRNA segment was used that was truncated at the end of mRNA. The median value calculated for the entire dataset (-22.72 KCal/Mol) was used as the threshold in feature definition.
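The windowing behind the mean LFP can be sketched as follows. The folding step itself is not reproduced here: `fold_energy` is a hypothetical callable standing in for a minimum free energy calculation on a sequence segment. Coordinates are 0-based, and placing 50 nucleotides upstream of each centre and 50 from the centre onwards is an assumption, since a 100-nucleotide window has no exact middle.

```python
def lfp_windows(site_start, mrna_len, site_len=19, window=100):
    """One (start, end) half-open window per target nucleotide, truncated at mRNA ends."""
    half = window // 2
    windows = []
    for center in range(site_start, site_start + site_len):
        start = max(0, center - half)
        end = min(mrna_len, center + half)
        windows.append((start, end))
    return windows

def mean_lfp(mrna, site_start, fold_energy, site_len=19, window=100):
    """Mean of the lowest free energies of the windows around each target nucleotide."""
    spans = lfp_windows(site_start, len(mrna), site_len, window)
    energies = [fold_energy(mrna[s:e]) for s, e in spans]
    return sum(energies) / len(energies)

# Stand-in energy model for demonstration only (NOT a real folding calculation).
toy_energy = lambda segment: -0.2 * len(segment)
lfp = mean_lfp("A" * 1000, 200, toy_energy)
feature_present = lfp >= -22.72   # threshold quoted in the text
```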
The accessibility of the siRNA target region was recently raised as an important factor influencing the siRNA efficacy [59]. We conducted the iterative computational analysis (ICA) using a window size of 800 nucleotides and a step size of 100 nucleotides [59,60]. To generate the largest number of windows that overlap the siRNA target region, the central base of the siRNA target region was used as the central point of the first window; subsequent windows were extended in both directions to cover the entire mRNA sequence. For each window, the five most stable structures predicted by Mfold were used. It turned out, however, that the ICA routine produced a filter that is too stringent for practical use. Of the 2,600 siRNA target regions in our dataset, only 6 were determined to be accessible by this routine. We then took an alternative approach and conducted the accessibility score analysis [54], which produced a similar but less stringent filter. In calculating the accessibility score, a region receives a non-zero score as long as the most stable structure in each window covering the siRNA target region contains a single-stranded segment of length ≥ 10 nucleotides. Of the 2,600 siRNA target regions in our dataset, 456 received non-zero accessibility scores. Two features were defined based on this accessibility score filter.
The anti-sense siRNA binding energy was proposed as a measurement of mRNA accessibility [47]. We used the Sirna module of the Sfold server to calculate the anti-sense siRNA binding energy [47]. For each siRNA target sequence, a 200-nucleotide mRNA segment centering around the 19-nucleotide target site was extracted. In cases when the target site was close to either end of the mRNA sequence, so that a 200-nucleotide region centering around the target site could not be obtained, a shorter mRNA segment (truncated at the nearer end of the mRNA) was used. These segments were sent to the Sirna server for calculation [61]. The results were parsed and the anti-sense siRNA binding energies were extracted.  Table 5 in Additional file 1).

Statistical tests of features influencing siRNA efficacy
Because the efficacy of siRNA experiments is rated on a four-level scheme, categorical analysis techniques were needed to analyze these data. For any given feature, we calculated the efficacy distribution (among the four levels: very high, high, medium and low) of all siRNA experiments carrying this feature, and compared it with the efficacy distribution of all siRNA experiments carrying the complementary feature. The chi-square (χ²) test of independence is a commonly used test that finds evidence of difference between two discrete distributions. However, this test treats the dependent variable (efficacy rating) as a nominal variable rather than an ordinal variable, and thus it is not able to tell us whether the presence of a feature results in higher or lower efficacy. A more appropriate test will find evidence of a monotone trend, that is, whether the presence of a feature is associated with a significant up-shift or down-shift of the efficacy distribution among the four levels.
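Consider the joint probability distribution {π_{i,j}} between the presence/absence of a particular feature (which defines i: i = 1 if the feature is present, and i = 0 if the feature is absent), and the four-level efficacy ratings (which defines j: j = 3 if efficacy rating = "very high", j = 2 if efficacy rating = "high", j = 1 if efficacy rating = "medium", and j = 0 if efficacy rating = "low"). We calculate the probabilities of concordance and discordance, given here in their standard form [35]:

\[ \Pi_c = 2 \sum_i \sum_j \pi_{i,j} \Big( \sum_{h>i} \sum_{k>j} \pi_{h,k} \Big), \qquad \Pi_d = 2 \sum_i \sum_j \pi_{i,j} \Big( \sum_{h>i} \sum_{k<j} \pi_{h,k} \Big) \]

Then we calculate γ, the normalized difference between these two probabilities:

\[ \gamma = \frac{\Pi_c - \Pi_d}{\Pi_c + \Pi_d} \]

The sample γ has approximately a normal distribution, with standard error calculated using the Delta method:

\[ \mathrm{SE}(\hat{\gamma}) = \sqrt{ \frac{16}{n\,(\Pi_c + \Pi_d)^4} \sum_i \sum_j \pi_{i,j} \left( \Pi_d\, \pi^{(c)}_{i,j} - \Pi_c\, \pi^{(d)}_{i,j} \right)^2 } \]

where n is the total number of records and

\[ \pi^{(c)}_{i,j} = \sum_{a<i} \sum_{b<j} \pi_{a,b} + \sum_{a>i} \sum_{b>j} \pi_{a,b}, \qquad \pi^{(d)}_{i,j} = \sum_{a<i} \sum_{b>j} \pi_{a,b} + \sum_{a>i} \sum_{b<j} \pi_{a,b} \]

Let

\[ z = \frac{\hat{\gamma}}{\mathrm{SE}(\hat{\gamma})} \]

then z² is a Wald statistic that has a chi-squared null distribution with 1 degree of freedom, based on which a Wald test can be conducted to find a significant monotone trend [35].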
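The monotone trend test can be sketched in a few lines of Python. The version below uses the textbook Goodman-Kruskal γ with its Delta-method standard error, which is assumed here to correspond to the procedure cited from [35]. The function name and the toy counts are illustrative only.

```python
import math

def gamma_trend_test(counts):
    """Return (gamma, standard error, Wald z^2) for a table of counts.

    counts[i][j]: i = 0 (feature absent) or 1 (present);
    j = 0..3 for "low", "medium", "high", "very high".
    """
    n = sum(sum(row) for row in counts)
    p = [[c / n for c in row] for row in counts]
    cells = [(i, j) for i in range(len(p)) for j in range(len(p[0]))]

    def concordant(i, j):
        # probability mass of cells ordered the same way on both variables
        return sum(p[a][b] for a, b in cells
                   if (a < i and b < j) or (a > i and b > j))

    def discordant(i, j):
        # probability mass of cells ordered in opposite ways
        return sum(p[a][b] for a, b in cells
                   if (a < i and b > j) or (a > i and b < j))

    pc = sum(p[i][j] * concordant(i, j) for i, j in cells)
    pd = sum(p[i][j] * discordant(i, j) for i, j in cells)
    gamma = (pc - pd) / (pc + pd)
    var = 16.0 / (pc + pd) ** 4 * sum(
        p[i][j] * (pd * concordant(i, j) - pc * discordant(i, j)) ** 2
        for i, j in cells)
    se = math.sqrt(var / n)
    wald = (gamma / se) ** 2 if se > 0 else math.inf
    return gamma, se, wald

# Toy counts: the feature shifts records from "low" towards "medium".
gamma, se, wald = gamma_trend_test([[30, 10, 0, 0], [10, 30, 0, 0]])
```

A positive γ indicates an up-shift of the efficacy distribution when the feature is present. The Wald statistic is then compared against a chi-squared distribution with 1 degree of freedom.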
The monotone trend test finds evidence about whether the presence of a particular feature is associated with a significant up-shift of the four-level efficacy distribution. If evidence of such an association is found, however, this test alone is not able to tell us where the up-shift takes place. In RNAi experiments, we are most concerned with the chances of achieving higher efficacies. Thus, we also conducted permutation tests of odds ratios for achieving > 90% and > 70% efficacies. In the siRecords data, the chance of achieving > 90% (or > 70%) efficacies can be approximated by the proportion of records bearing "very high" (or "high"/"very high") efficacy ratings. For a given feature, the odds ratio for > 90% efficacies, θ_90, is defined as

\[ \theta_{90} = \frac{\pi_{1,>90} \,/\, (1 - \pi_{1,>90})}{\pi_{0,>90} \,/\, (1 - \pi_{0,>90})} \]

where π_{1,>90} is the proportion of records bearing the "very high" efficacy rating (i.e., with > 90% efficacies) in the subset of the experiments carrying the feature, and π_{0,>90} is the proportion of records bearing the "very high" efficacy rating in the subset of the experiments carrying the complementary feature. To generate a null distribution of the odds ratio, Set A was randomly split into two subsets, one of which was arbitrarily marked with "feature present", the other marked with "complementary feature present", and an odds ratio was calculated accordingly. This process was repeated 100,000 times, and the 100,000 resampled odds ratios constituted the null distribution. Given any feature to be tested, the P value was calculated as

\[ P = \frac{1}{100000} \sum_{i=1}^{100000} I\left( \theta^{*}_{i} \geq \theta_{90} \right) \]

where θ*_i is the ith resampled odds ratio, I(·) is the indicator function, and θ_90 is the true odds ratio of the feature. The odds ratio permutation test for > 70% efficacies was conducted similarly, with the proportion of records bearing "very high" or "high" efficacy ratings substituted for that of records bearing "very high" ratings in the above description.
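The odds ratio permutation test can be illustrated with the sketch below. It makes several assumptions that the text does not confirm: each random split keeps the original group sizes, the test is one-sided (counting resampled odds ratios at least as large as the observed one), and an undefined 0/0 odds ratio is arbitrarily set to 1. The number of permutations is reduced from 100,000 to keep the example fast, and all names are illustrative.

```python
import random

def odds_ratio(high, labels):
    """Odds ratio of a high rating for records labelled 1 versus records labelled 0."""
    a = sum(1 for h, l in zip(high, labels) if h and l)          # feature, high
    b = sum(1 for h, l in zip(high, labels) if not h and l)      # feature, not high
    c = sum(1 for h, l in zip(high, labels) if h and not l)      # complement, high
    d = sum(1 for h, l in zip(high, labels) if not h and not l)  # complement, not high
    if b * c == 0:
        return float("inf") if a * d > 0 else 1.0  # convention of this sketch
    return (a * d) / (b * c)

def permutation_p_value(high, has_feature, n_perm=2000, seed=0):
    """One-sided permutation P value for the observed odds ratio."""
    rng = random.Random(seed)
    observed = odds_ratio(high, has_feature)
    labels = list(has_feature)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # random split with the original group sizes
        if odds_ratio(high, labels) >= observed:
            hits += 1
    return observed, hits / n_perm

# Toy data: 40 of 50 records with the feature are rated high, versus 10 of 50 without.
high = [1] * 40 + [0] * 10 + [1] * 10 + [0] * 40
has_feature = [1] * 50 + [0] * 50
theta, p_value = permutation_p_value(high, has_feature)
```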
Meaningful statistical tests require the use of sufficiently large datasets. All features were subject to a "dataset size filter" using an arbitrarily set threshold of 30 records: if a given feature was carried by fewer than 30 records in Set A, then this feature and its complementary feature were excluded from the statistical tests and the subsequent analyses. Four features (GC stretches of length ≥ 9, G/C content is not between 30 and 79%, Cell line = T24 and Test method = Flow cytometry), as well as their complementary features, were excluded for this reason.

Control of false discovery rate (FDR)
The simultaneous testing of the large number of hypotheses requires the curbing of the type I error rate with the consideration of the "multiple testing" problem. We chose to control the FDR by taking the q-value approach [36], because of its ability to adapt to the true distribution of the input p-values. We used the "bootstrap" method, rather than the default "smoother" method (which is equivalent to Benjamini and Hochberg's FDR controlling method [62]) in estimating the FDR, because U-shape distributions were observed for the input p-values for both the Wald test and the odds ratio permutation tests, likely introduced by the fact that one-sided tests were conducted when two-sided signals were present [63].

Rules, rule sets and the disjunctive rule merging (DRM) algorithm
We define a rule as a conjunction of (l) features. An l-feature rule is also called an l-feature combination. A rule set is defined as a disjunction of (m) rules. Generally speaking, the larger m is, the higher the sensitivity the rule set achieves and, at the same time, the lower the specificity it has to offer.
The disjunctive rule merging (DRM) algorithm was developed to remove the redundancy in the rule sets resulting from the combined effect analysis of multiple features, while at the same time exerting control over the stringency of the rule sets. The listing of the DRM algorithm is as follows.

Input: the candidate rules r_i ∈ Θ, each with an associated value P_i, and a threshold α (in the listing, m_i denotes the number of features in rule r_i).
Initialization: Create rule set RS = ∅.
Step 1: For every r_i ∈ Θ satisfying P_i ≥ α, add r_i into RS.
Step 2:
For j = 2,3,...,5
    For any rule r_p ∈ RS where m_p = j
        For any rule r_q ∈ RS where m_q > j
            if r_p ⊂ r_q, then remove r_q from RS.
        End For
    End For
End For

It is easy to see that given any α, the rule set resulting from the DRM algorithm (thus called a DRM rule set) is fixed. The reverse, however, is not true. A DRM rule set does not correspond to a single α value, but rather to a range of different α's. For example, the DRM rule sets for any α between 0.901 and 1 are exactly the same (containing 7 rules). We denote this rule set as RS_0.951, where 0.951 is the mid-point of the range of α for which the rule set is produced. Naturally, the higher the α level, the higher the specificity the DRM rule set possesses and the lower the sensitivity it has to offer. Therefore, the DRM algorithm with variable α values allows us to choose the combination of sensitivity and specificity that suits our needs. In a typical siRNA design setting, we are most concerned with achieving high specificity, and can often tolerate lower sensitivity, since there is a large pool of possible target sites to choose from: for an mRNA of length w, in theory there are (w-19+1) target sites to pick from. Therefore, we are most concerned with the behavior of the rule sets with high (close to 1) α values.
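The DRM steps can be sketched in Python as below. This is an illustrative reading of the listing, not the authors' code: rules are represented as sets of feature labels, the value attached to each rule is simply called `p`, and the feature labels and values in the toy input are made up. Processing rules from shortest to longest and dropping any rule that contains an already retained rule gives the same result as the nested loops in Step 2.

```python
def drm(candidates, alpha):
    """Disjunctive rule merging on (features, p) pairs; returns the merged rule set."""
    # Step 1: keep the rules whose associated value reaches the threshold.
    selected = [frozenset(features) for features, p in candidates if p >= alpha]
    # Step 2: a longer rule containing a retained shorter rule is redundant in a
    # disjunction, because the shorter rule already covers every case it matches.
    merged = []
    for rule in sorted(selected, key=len):
        if not any(kept <= rule for kept in merged):
            merged.append(rule)
    return merged

def predicted_effective(rule_set, present_features):
    """A rule set fires if at least one rule has all of its features present."""
    present = set(present_features)
    return any(rule <= present for rule in rule_set)

# Hypothetical candidate rules with made-up values.
candidates = [
    ({"F1", "F2"}, 0.95),
    ({"F1", "F2", "F3"}, 0.97),
    ({"F4", "F5", "F6"}, 0.92),
    ({"F7", "F8"}, 0.80),
]
rule_set = drm(candidates, alpha=0.90)
strict_rule_set = drm(candidates, alpha=0.96)
```

With α = 0.90, the three-feature rule containing F1 and F2 is merged into the two-feature rule. Raising α to 0.96 leaves only the longer, more specific rule, which illustrates how α controls stringency.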

Performance comparison between DRM rule sets and existing online design tools
Design tasks were performed for the 744 genes in Set T using the following 15 online siRNA design tools with the default settings.
Ambion siRNA Target Finder (Ambion, Inc.) [64]. We used the mRNA sequence as the input. By default, no restriction of the ending dinucleotides was specified, and no restriction of the G/C content was specified. Occurrences of 4 or more identical nucleotides in a row were allowed.
Jack Lin's siRNA Sequence Finder (Cold Spring Harbor Laboratory) [65]. We used the full-length mRNA sequence as the input. The spacer length was set to be 19.
siDESIGN Center (Dharmacon, Inc.) [66]. We used the mRNA sequence as the input. No restriction of the leading sequences was specified. The target region was limited to the ORF (open reading frame), the G/C content range was set as 30-52%, and the patterns "GGG" and "CCC" were excluded. The BLAST filtering option was turned on by default.
siRNA Target Finder (GenScript Corp.) [67]. We provided the GenBank accession of the mRNA as the input. The length of siRNA was set to be 19. By default, the G/C content range was set to be between 30% and 60%, and the sequence selection region was restricted to the ORF.
Imgenex sirna Designer (Imgenex Corp.) [68]. The target mRNA was specified using the GenBank accession. The siRNA length was set to be 19. The parameter "nucleotide target" was set to be 50 by default. The parameter "first nucleotide target for siRNA" was set as "AA". The G/C content range was set to be between 45% and 51%. Occurrences of 4 identical A's or T's in a row, or 3 identical (C/ G)'s in a row were not allowed. By default, the BLAST search was not performed.
EMBOSS siRNA (Institute Pasteur) [69]. We used the full-length mRNA sequence as the input. By default, no restriction of the leading or ending dinucleotides was specified. Occurrences of 4 identical nucleotides in a row were allowed.
IDT RNAi Design (SciTools) (Integrated DNA Technologies, Inc.) [70]. The mRNA sequence was provided as the input, and the "21mer" option was selected. The "Unified RNAi Rule Set" was used in the design. The G/C content range was set to be between 30% and 70%. The asymmetrical end stability base pair length was set to be 5. The 5' antisense asymmetrical end stability weight was set to be 0.5, and the 3' overhang was set to be "TT" by default. The default setting was also used for all motifs preferences.

BLOCK-iT RNAi Designer (Invitrogen Corp.) [71]. We provided the mRNA sequence as the input. By default, the search in the target region was limited to the ORF. The minimum/maximum allowed G/C contents were set to be 35% and 55%, respectively. The BLAST search option was turned on by default.
siSearch (Karolinska Institutet) [72]. We provided the mRNA sequence as the input. By default, the G/C content range was set to be between 30% and 60%. The candidate sites with scores of 6 or above were obtained. The minimum energy difference between two ends of the siRNA was set to be 0. Occurrences of 4 (A/U)'s in a row were not allowed, and the siRNAs containing immunostimulatory motifs were removed. The repeat masking was turned on by default.
SiMAX (MWG-Biotech, Inc.) [73]. We used the Genbank accession to specify the target. By default, occurrences of > 3 identical nucleotides in a row in the siRNA sequences, or U's at the 3' end were not allowed. The G/C content range was set to be between 30% and 53%. The search range was restricted to the region between the 100th nucleotide downstream of the start codon and the 100th nucleotide upstream of the end codon. By default, BLAST filtering or secondary structure analysis was not performed.
BIOPREDsi (Novartis Institutes for BioMedical Research) [74]. We used the mRNA sequence as the input. The number of predicted siRNAs was set to be 10.
Promega siRNA Target Designer (Promega Corp.) [75]. We used the mRNA sequence as the input. The RNAi system was set to be the "T7 RiboMAX Express RNAi system". By default, the target length was set to be 19, and the search region was set to be the whole input sequence.
QIAGEN siRNA Design Tool (QIAGEN, Inc.) [76]. We specified the mRNA sequence as the input. The option "Start siRNA sequence with AA" was turned on by default. The BLAST search was not performed.
SDS/MPI (University of Hong Kong) [77]. We used the full-length mRNA sequence as the input. The option "MPI Principles" was selected. The filtering of ineffective siRNAs based on secondary structures was not performed. By default, the G/C content range was set to be between 30% and 70%, and the search region was restricted to ≥ 100 nucleotides downstream of the CDS.
Whitehead WI siRNA Selection Program (Whitehead Institute for Biomedical Research) [78]. We used the mRNA sequence as the input. By default, the sequence pattern "AAN19TT" was searched for. The G/C content range was set to be between 30% and 70%. Occurrences of 4 or more identical T's, A's or G's in a row were not allowed. Occurrences of 7 or more (G/C)'s in a row were also not allowed. By default, the checking with BLAST was not performed.
The performance of a siRNA design rule set, or an online siRNA design tool, can be assessed by several parameters. Two of the most often used ones are specificity and sensitivity, as illustrated in Table 5. Specificity is defined as N_D / (N_B + N_D), and sensitivity is defined as N_A / (N_A + N_C). An ROC (Receiver Operating Characteristic) curve can be used to visually depict the overall performance of a rule set. The ROC curve is the plot of sensitivity vs. (1-specificity). Another parameter is the positive predictive value (PPV), defined as N_A / (N_A + N_B). The PPV is a very important parameter in siRNA design practice, because it describes what proportion of the siRNAs predicted to be effective turn out to be truly effective. The value (1-PPV) is sometimes called the "false positive rate".
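These three parameters follow directly from the four counts. In the sketch below, the meaning of each count is inferred from the formulas above rather than taken from Table 5: N_A and N_B are siRNAs predicted to be effective that are truly effective and ineffective, respectively, and N_C and N_D are siRNAs predicted to be ineffective that are truly effective and ineffective, respectively. The function name and example counts are illustrative.

```python
def performance(n_a, n_b, n_c, n_d):
    """Sensitivity, specificity and PPV from the four prediction counts.

    n_a: predicted effective, truly effective
    n_b: predicted effective, truly ineffective
    n_c: predicted ineffective, truly effective
    n_d: predicted ineffective, truly ineffective
    """
    return {
        "sensitivity": n_a / (n_a + n_c),
        "specificity": n_d / (n_b + n_d),
        "ppv": n_a / (n_a + n_b),
    }

# Made-up counts: a stringent rule set that selects few siRNAs, most of them effective.
stats = performance(n_a=30, n_b=10, n_c=70, n_d=90)
```

In this toy case the sensitivity is low (0.30) while the specificity (0.90) and PPV (0.75) are high, which is the trade-off preferred when many candidate target sites are available. The sketch does not guard against empty groups (for example, no siRNAs predicted to be effective), which would lead to a division by zero.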