Volume 12 Supplement 1
Gene-gene interaction filtering with ensemble of filters
© Yang et al; licensee BioMed Central Ltd. 2011
Published: 15 February 2011
Complex diseases are commonly caused by multiple genes and their interactions with each other. Genome-wide association (GWA) studies provide us with the opportunity to capture those disease-associated genes and gene-gene interactions through panels of SNP markers. However, a proper filtering procedure is critical to reduce the search space prior to the computationally intensive gene-gene interaction identification step. In this study, we show that two commonly used SNP-SNP interaction filtering algorithms, ReliefF and tuned ReliefF (TuRF), are sensitive to the order of the samples in the dataset, giving rise to unstable and suboptimal results. However, we observe that the ‘unstable’ results from multiple runs of these algorithms can provide valuable information about the dataset. We therefore hypothesize that aggregating results from multiple runs of the algorithm may improve the filtering performance.
We propose a simple and effective ensemble approach in which the results from multiple runs of an unstable filter are aggregated based on the general theory of ensemble learning. The ensemble versions of the ReliefF and TuRF algorithms, referred to as ReliefF-E and TuRF-E, are robust to sample order dependency and enable a more informative investigation of data characteristics. Using simulated and real datasets, we demonstrate that both the ensemble of ReliefF and the ensemble of TuRF can generate a much more stable SNP ranking than the original algorithms. Furthermore, the ensemble of TuRF achieved the highest success rate in comparison to many state-of-the-art algorithms as well as traditional χ2-test and odds ratio methods in terms of retaining gene-gene interactions.
The advancement of high-throughput genome-wide association (GWA) studies has tremendously improved our understanding of the genetic basis of many common complex diseases. Under the assumption that common diseases are associated with common variants, GWA studies often aim to identify a set of single nucleotide polymorphisms (SNPs) that are statistically associated with a target disease. Typically, this is achieved by adopting a case-control study design that identifies genotypes (SNP combinations) that distinguish individuals who have a certain disease (cases) from a control population of individuals (controls).
Several recent studies indicate that many complex traits cannot be explained by any single SNP variant, and that the characterization of gene-gene interactions and gene-environment interactions may be the key to understanding the underlying pathogenesis of these complex diseases [3–5]. For this reason, several methods have been developed to jointly evaluate SNP and environmental factors with the aim of identifying gene-gene and gene-environment interactions that make major contributions to complex diseases. These methods analyze genetic factors in a combinatorial manner when applied to SNP datasets with case and control samples. Therefore, we shall refer to them as combinatorial methods. Popular combinatorial methods include random forest based algorithms [7, 8], multifactor dimensionality reduction (MDR) [9, 10], Bayesian based algorithms, and evolutionary approaches [12, 13].
Combinatorial methods are computationally intensive and the computation time increases exponentially with the number of SNPs considered. Therefore, it is of great interest to perform a filtering step prior to the combinatorial evaluation to remove as many irrelevant SNPs as possible. This is commonly known as the two-step analysis approach. As discussed in a number of recent reviews [3, 4, 15], a good filtering algorithm is of critical importance since if the functional SNPs are removed by the filter, the subsequent combinatorial analysis will be in vain.
For categorical data such as genotypes of SNPs, univariate filtering algorithms including the χ2-test and odds ratio are commonly used. However, these methods consider the association between each SNP and the class label independently of other SNPs in the dataset. Therefore, they may filter out SNP pairs that have strong interaction effects but weak individual associations with the phenotype. Recently, new multivariate approaches known as “ReliefF-based” filtering algorithms were proposed. This series of new methods, including ReliefF, tuned ReliefF (TuRF), and Spatially Uniform ReliefF (SURF), takes into account dependencies between attributes. This is critical for preserving and prioritizing potential gene-gene interactions in SNP filtering.
Although ReliefF-based methods have gained much attention and have been applied to several association studies, we found that filtering results produced by ReliefF and TuRF are sensitive to the order of samples presented in the dataset. By investigating the ReliefF algorithm, we identify that such a sample order dependency is related to a tie-breaking procedure inherent in the k-nearest neighbors (kNN) routine. It causes a partial utilization of neighbor information, leading ReliefF and TuRF to generate unstable results. While such unstable behavior appears to be undesirable, it is an important characteristic for ensemble learning.
In this study, we propose an ensemble approach to obtain a more faithful survey of the set of nearest neighbors to each target sample. This is accomplished by aggregating the ranking scores generated from multiple filters on datasets with permuted sample order. The proposed ensemble approach extends the idea of a classification-oriented ensemble feature selection method which uses a bootstrap sampling procedure with multiple filters to produce different rankings. However, the proposed ensemble approach is more powerful because the entire dataset (in contrast to a bootstrap subset) is used for ensemble learning.
Using simulated and real SNP datasets, we demonstrate that the proposed approach not only generates much more stable SNP ranking results, but that the ensemble of TuRF also vastly improves the success rate of retaining functional SNP pairs compared to many other traditional as well as state-of-the-art SNP filtering methods.
Consider a GWA study consisting of N SNPs and M samples. Let us define each SNP in the study as $g_j$ and each sample as $s_i$, where $j = 1 \ldots N$ and $i = 1 \ldots M$. The aim of the filtering procedure is to produce a ranking score, defined as $W(g_j)$ and commonly referred to as the weight. This score represents the ability of each SNP $g_j$ to separate samples between the case and control groups, and the filtering is done by removing those SNPs with low ranking scores according to a pre-defined threshold.
Existing ReliefF-based algorithms
In the ReliefF algorithm, the weight score of each SNP, $W(g_j)$, is updated at each iteration as follows:
$$W(g_j) = W(g_j) - \frac{D(g_j, s_i, h_k)}{M} + \frac{D(g_j, s_i, m_k)}{M} \qquad (1)$$
Using pseudocode, we can outline the ReliefF algorithm in Algorithm 1.
The ReliefF algorithm calculates the distance between different samples using the genotype information of all SNPs. However, such a procedure is sensitive to noise in the dataset.
Algorithm 1 ReliefF
1: for j = 1 to N do
2:   initiate(W(g_j));
3: end for
4: for i = 1 to M do
5:   s_i = randomSelect(sampleSize);
6:   H = findHitNeighbours(s_i, K); (h_1 ... h_K ∈ H)
7:   M = findMissNeighbours(s_i, K); (m_1 ... m_K ∈ M)
8:   for j = 1 to N do
9:     for k = 1 to K do
10:       W(g_j) = W(g_j) - D(g_j, s_i, h_k)/M + D(g_j, s_i, m_k)/M
11:     end for
12:   end for
13: end for
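Algorithm 1 can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that the difference function D returns 0 when two samples share a genotype at a SNP and 1 otherwise, that the distance between samples is the number of differing genotypes, and that exhaustive sample selection is used. The names `diff`, `distance`, `nearest`, and `relieff` are hypothetical. The miss set is named `misses` to avoid reusing the symbol M, which in Eq. 1 denotes the number of samples.

```python
def diff(a, b):
    """Assumed difference function D for one SNP: 0 if genotypes match, else 1."""
    return 0 if a == b else 1


def distance(x, y):
    """Assumed sample distance: number of SNPs whose genotypes differ."""
    return sum(diff(a, b) for a, b in zip(x, y))


def nearest(samples, labels, i, k, same_class):
    """Indices of the k nearest hits (same_class=True) or misses of sample i.

    list.sort is stable, so equally distant candidates keep their dataset
    order. Ties at the k-th position are therefore broken by sample order.
    """
    cands = [t for t in range(len(samples))
             if t != i and (labels[t] == labels[i]) == same_class]
    cands.sort(key=lambda t: distance(samples[i], samples[t]))
    return cands[:k]


def relieff(samples, labels, k=10):
    """Return one weight per SNP, visiting every sample once (exhaustive)."""
    m = len(samples)          # number of samples (M in Eq. 1)
    n = len(samples[0])       # number of SNPs (N)
    w = [0.0] * n
    for i in range(m):
        hits = nearest(samples, labels, i, k, True)
        misses = nearest(samples, labels, i, k, False)
        for j in range(n):
            for h in hits:
                w[j] -= diff(samples[i][j], samples[h][j]) / m
            for ms in misses:
                w[j] += diff(samples[i][j], samples[ms][j]) / m
    return w


# Toy data: SNP 0 separates the two classes, SNP 1 is constant.
toy_samples = [[0, 1], [0, 1], [2, 1], [2, 1]]
toy_labels = [0, 0, 1, 1]
toy_weights = relieff(toy_samples, toy_labels, k=1)
```

On the toy data, the SNP that separates the classes receives the larger weight, while the constant SNP receives a weight of zero.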
TuRF aims to improve the performance of the ReliefF algorithm in SNP filtering by adding an iterative component. The signal-to-noise ratio is enhanced significantly by recursively removing the low-ranked SNPs in each iteration. Specifically, if the number of iterations of this algorithm is set to R, it removes the N/R lowest ranking (i.e., least discriminative) SNPs in each iteration, where N is the total number of SNPs. The pseudocode for TuRF is shown in Algorithm 2.
Algorithm 2 TuRF
1: for i = 1 to R do
2:   apply ReliefF(M, K);
4:   removeLowSNP(N / R);
5: end for
6: return last ReliefF estimate for each SNP
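The iterative removal in Algorithm 2 might be sketched as below. This is an illustrative outline under stated assumptions rather than the reference implementation: `turf` and `score_fn` are hypothetical names, `score_fn(samples, labels)` stands in for one ReliefF run and must return one weight per SNP column, and N/R is rounded down with a minimum of one SNP removed per iteration.

```python
def turf(samples, labels, score_fn, n_iter=10):
    """Re-score the remaining SNPs n_iter times, dropping the worst N/R each time.

    Returns the last estimate obtained for each SNP, i.e. the weight from the
    final iteration in which that SNP was still present.
    """
    n = len(samples[0])
    per_round = max(1, n // n_iter)    # N/R SNPs removed per iteration (assumed rounding)
    remaining = list(range(n))         # original SNP indices still in play
    last = [None] * n
    for _ in range(n_iter):
        if not remaining:
            break
        # Restrict the dataset to the SNPs that have not been removed yet.
        reduced = [[row[j] for j in remaining] for row in samples]
        weights = score_fn(reduced, labels)
        for j, w in zip(remaining, weights):
            last[j] = w
        # Sort ascending so the least discriminative SNPs come first.
        order = sorted(range(len(remaining)), key=lambda p: weights[p])
        drop = {remaining[p] for p in order[:per_round]}
        remaining = [j for j in remaining if j not in drop]
    return last
```

Passing the scoring function as an argument keeps the sketch independent of any particular ReliefF implementation; any callable with the same signature can be substituted.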
We follow the same configuration as in previous studies [18, 19, 25]: exhaustive sample selection is adopted (i.e., M is set to the number of training instances, and samples are evaluated in the order in which they appear in the dataset), K = 10 nearest neighbors are used, and 10 iterations (R = 10) are applied for TuRF.
Ensemble of ReliefF and TuRF
We find that the ReliefF algorithm is sensitive to the order of samples used to calculate the SNP ranking score (Eq. 1). That is, running the algorithm on the same dataset with the order of the samples permuted (while maintaining the sample-class label association) leads to different SNP ranking results.
Such a sample order dependency is related to the assignment of “hit” and “miss” nearest neighbors of each sample (lines 6 and 7 of Algorithm 1). Since the K nearest neighbors are calculated by comparing the distance between each sample in the dataset (using all the SNP attributes) and the target sample ($s_i$ in Algorithm 1), a tie occurs when more than K samples have a distance equal to or less than that of the Kth nearest neighbor of $s_i$. We can show that the sample order dependency can be caused by any tie-breaking procedure that forces exactly K samples out of all possible candidates to be the nearest neighbors of $s_i$, which causes a different assignment of “hit” and “miss” nearest neighbors when the sample order is permuted.
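The tie situation described above can be reproduced with a small example. The sketch below is illustrative only: it ignores class labels (hits versus misses), assumes a mismatch-count distance, and uses hypothetical names (`hamming`, `k_nearest`). Three samples are equally distant from the target, so selecting exactly K = 2 of them depends on the order in which they appear.

```python
def hamming(x, y):
    """Number of positions at which two genotype tuples differ."""
    return sum(a != b for a, b in zip(x, y))


def k_nearest(samples, target, k):
    """IDs of the k samples nearest to `target`, ties broken by dataset order."""
    others = [s for s in samples if s is not target]
    # list.sort is stable: equally distant samples keep their input order.
    others.sort(key=lambda s: hamming(s["geno"], target["geno"]))
    return [s["id"] for s in others[:k]]


target = {"id": "t", "geno": (0, 0, 0)}
a = {"id": "a", "geno": (1, 0, 0)}
b = {"id": "b", "geno": (0, 1, 0)}
c = {"id": "c", "geno": (0, 0, 1)}

# Same samples, two different orders: a, b and c are all at distance 1.
first = k_nearest([target, a, b, c], target, k=2)
second = k_nearest([target, c, b, a], target, k=2)
```

Here `first` contains samples a and b while `second` contains c and b, so a weight update based on these neighbors would differ between the two orderings even though the data are identical.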
Aiming to increase the stability and the power of SNP-SNP interaction filtering, here we propose an algorithm that (1) preserves the general algorithmic principle of ReliefF, and (2) makes use of all the information embedded in all the tied samples when a tie-breaking situation occurs. To achieve this, we use a rank score aggregation approach that adheres to the general principle of ensemble learning. From our analysis of the tie-breaking problem above, it is clear that a different set of samples may be assigned to be a sample’s nearest neighbors in each run. Therefore, the result of a single run of ReliefF utilizes only partial information embedded in the full set of the nearest neighbors. In other words, the results from multiple runs of ReliefF using the dataset with permuted sample order should contain complementary information about how well each set of SNPs can discriminate between the two classes (case vs. control). In this sense, we can potentially harness the “diversity” of ranking results from multiple executions with permuted sample order using an ensemble-based method to produce more stable and accurate SNP ranking results.
In the aggregation, $W_{ensemble}(\cdot)$ denotes the ensemble weight and $h_l(\cdot)$ denotes the hypothesis of a filter algorithm obtained from the permuted dataset $D_l$.
Similarly, the ensemble of TuRF (called TuRF-E) performs multiple runs of TuRF, and aggregates the ranking scores of each SNP produced in each iteration of TuRF using Eq. 4.
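The aggregation over permuted datasets could look roughly like the following sketch. It assumes that the ensemble weight of a SNP is the plain average of the weights produced by the individual runs; this averaging rule, the function name `ensemble_filter`, and the parameters `score_fn`, `n_runs`, and `seed` are assumptions made for illustration. `score_fn(samples, labels)` stands in for a single run of ReliefF or TuRF.

```python
import random


def ensemble_filter(samples, labels, score_fn, n_runs=50, seed=0):
    """Average the per-SNP weights from score_fn over permuted sample orders."""
    rng = random.Random(seed)
    n = len(samples[0])
    total = [0.0] * n
    idx = list(range(len(samples)))
    for _ in range(n_runs):
        rng.shuffle(idx)
        # Permute samples and labels together so each sample keeps its class.
        perm_samples = [samples[i] for i in idx]
        perm_labels = [labels[i] for i in idx]
        weights = score_fn(perm_samples, perm_labels)
        for j in range(n):
            total[j] += weights[j]
    return [t / n_runs for t in total]
```

Because every run sees the full dataset and only the order changes, an order-invariant filter returns the same weights from the ensemble as from a single run; only order-sensitive filters gain from the aggregation.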
Analysis of simulated and real-world datasets
Table 1. Summary of simulation datasets. Each model contains 100 datasets.
Models with 400 samples (four models): case: 200; control: 200
Models with 800 samples (four models): case: 400; control: 400
A GWA study dataset generated from a case-control design of age-related macular degeneration (AMD) samples is also used to illustrate the sample order dependency of ReliefF and TuRF when applied to real SNP datasets. The AMD dataset contains 96 cases and 50 controls, with the genotype of 116,212 SNPs for each sample.
To demonstrate the sample order dependency, a dataset is analyzed twice by each of the four filtering algorithms (ReliefF, TuRF, ReliefF-E, and TuRF-E), with a different permutation of sample order used for each run (we note that similar results are consistently obtained in all repeated experiments with different permutations of sample order). A Pearson correlation coefficient, r, is calculated using the ranks of all SNPs generated from the two runs. A SNP filtering algorithm that obeys sample order invariance should produce the same SNP ranking regardless of the sample order (i.e., r = 1).
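The stability measure can be illustrated with a short sketch that converts the weights from two runs into ranks and computes the Pearson correlation of those ranks. The function names and the toy weights are hypothetical, and ties in the weights are ranked by SNP index, which is an assumption of this sketch.

```python
def ranks(scores):
    """Rank SNPs by descending score (1 = best); ties keep index order."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    r = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        r[j] = position
    return r


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


# Made-up weights from two runs on differently ordered copies of a dataset.
run1 = [0.9, 0.1, 0.5, 0.3]
run2 = [0.8, 0.2, 0.3, 0.6]
r = pearson(ranks(run1), ranks(run2))
```

For these toy weights the two rankings swap the second and third positions, giving r = 0.8; identical rankings give r = 1.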
These simulated datasets (Table 1) are also used to investigate how well each algorithm can retain functional SNP pairs while performing SNP filtering. Methods included in this comparison are traditional filters: χ2-test and odds ratio; ReliefF-based filters: ReliefF, TuRF, and SURFTuRF; and two ensemble filters: ReliefF-E and TuRF-E. Specifically, we apply these seven filtering algorithms to the eight simulated models and compare the success rate of each method in filtering the 100 datasets of each model. The success rate is defined as the number of times, out of 100 datasets, that a given filtering algorithm is able to retain the interaction SNP pair in the dimension-reduced subset. The dimension of the dataset is divided into percentiles. For a dataset with 1000 SNPs, the percentile of 1 includes the top 10 SNPs while the percentile of 10 includes the top 100 SNPs. Therefore, if we reduce the dimension of the dataset to 100 SNPs (that is, the percentile of 10), and the interaction SNP pair is within these 100 SNPs, we say the filter successfully retained the interaction SNP pair at the percentile of 10.
We use graphs to present the success rate of each method from percentile 1 to 50, and we quantify the overall success rate of each method by an average cumulative success rate, which is computed as the sum of the success rates from percentile 1 to 50 divided by 50.
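The success rate and the average cumulative success rate defined above can be computed along the following lines. This is a sketch with hypothetical names; it assumes each dataset yields a ranking given as a list of SNP indices from best to worst, and that the same interaction pair is used for every dataset.

```python
def retained(ranking, pair, percentile):
    """True if both SNPs of `pair` are within the top `percentile`% of `ranking`."""
    cutoff = len(ranking) * percentile // 100   # e.g. 1000 SNPs, percentile 10 -> 100
    top = set(ranking[:cutoff])
    return pair[0] in top and pair[1] in top


def success_rate(rankings, pair, percentile):
    """Number of datasets (one ranking each) in which the pair is retained."""
    return sum(retained(r, pair, percentile) for r in rankings)


def average_cumulative_success_rate(rankings, pair, max_percentile=50):
    """Sum of success rates from percentile 1 to max_percentile, divided by max_percentile."""
    total = sum(success_rate(rankings, pair, p)
                for p in range(1, max_percentile + 1))
    return total / max_percentile
```

With 100 rankings per model, `success_rate` ranges from 0 to 100 at each percentile, matching the definition of the success rate as a count over 100 datasets.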
The effect of the sample order dependency
By using the ensemble approach (an ensemble size of 50; see Section “Determining ensemble size” for details), we are able to stabilize the ranking results of both ReliefF and TuRF. In particular, TuRF-E significantly increases the stability of the SNP ranking results of TuRF, with r = 0.97 for the dataset with 400 samples and r = 0.95 for the dataset with 800 samples.
Similar results were obtained when the AMD dataset was analyzed (Figure 1(c)). The results illustrate that the sample order instability is indeed a problem in analyzing real biological datasets with ReliefF and TuRF. The use of our ensemble approach increases stability, as is evident from the increase of the ranking correlation to r = 0.99 for ReliefF and r = 0.98 for TuRF.
The origin of the sample order dependency
One tempting way to solve such a sample order dependency is to use a randomized procedure that selects a sample at random when a tie occurs. However, our experiments indicate that such a procedure does not increase the correlation (data not shown). In fact, any tie-breaking procedure that chooses one sample out of all valid candidate samples will necessarily produce instability in its resulting ranking score.
Another way to solve such a sample order dependency is to define the nearest neighbors of a sample as those that are within a certain distance threshold of the target sample. A recently developed variant of ReliefF called SURF (Spatially Uniform ReliefF) employs this idea. However, by doing so, the algorithm relies directly on a predefined threshold for nearest neighbor selection, which may negatively affect the result given the sample sparsity in high-dimensional space. Therefore, such an approach lacks the robustness of the rank-based kNN criterion. Indeed, our evaluation (which is presented in a later section) confirmed that SURF does not fully recover the SNP filtering capability. As discussed later in this paper, our ensemble approach, which relies on sample ranking instead of direct thresholding, gives consistently better results.
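The contrast between threshold-based and rank-based neighbor selection can be seen in a small sketch. It is illustrative only: the names `hamming` and `neighbours_within` are hypothetical, the distance is assumed to be a mismatch count, and the threshold values are arbitrary choices for the example rather than the way SURF determines its threshold.

```python
def hamming(x, y):
    """Number of positions at which two genotype tuples differ."""
    return sum(a != b for a, b in zip(x, y))


def neighbours_within(samples, target_id, threshold):
    """IDs of all samples whose distance to the target is at most `threshold`."""
    target = samples[target_id]
    return {sid for sid, geno in samples.items()
            if sid != target_id and hamming(geno, target) <= threshold}


data = {"t": (0, 0, 0), "a": (1, 0, 0), "b": (0, 1, 0),
        "c": (0, 0, 1), "d": (1, 1, 1)}
# The same samples presented in a different order.
shuffled = {sid: data[sid] for sid in ["d", "c", "t", "b", "a"]}

near = neighbours_within(data, "t", threshold=1)
near_shuffled = neighbours_within(shuffled, "t", threshold=1)
none_found = neighbours_within(data, "t", threshold=0)
```

All tied samples are included, so the neighbor set does not depend on sample order. The last call shows the cost: with a threshold that is too tight for the data, no neighbors are found at all, which is the sensitivity to the predefined threshold discussed above.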
Determining ensemble size
Ensemble approach to improve retention rate of functional SNP pairs in SNP filtering
Average cumulative success rate from percentile 1 to 50 using the simulated datasets (400 and 800 samples). The best algorithm with the highest average cumulative success rate in each dataset is shown in bold.
Table columns: heritability = 0.05, 0.1, 0.2, and 0.3. Table row groups: simulated dataset with 400 samples; simulated dataset with 800 samples.
ReliefF-E shows only a marginal improvement over ReliefF, whereas TuRF-E achieves a significant improvement over TuRF. This is likely due to the fact that the TuRF algorithm executes ReliefF multiple times while removing low-ranking SNPs in each iteration. Therefore, the ensemble approach could accumulate more information in each iteration. It is also observed that SURFTuRF does not improve on TuRF in analyzing datasets of 400 samples. This is consistent with our hypothesis that a predefined distance threshold in SURFTuRF may be sensitive to a high SNP-to-sample ratio (thus, high dimensionality). Moreover, the standard deviations of ReliefF and TuRF are generally much larger than those of their ensemble versions, indicating that the sample order dependency also affects the stability of the success rate of SNP-SNP filtering.
The field of gene-gene and gene-environment interaction identification from GWAS data is still young and rapidly developing. One of the biggest challenges in identifying such interaction relationships is computational efficiency, since in the worst case an exponentially large number of SNP combinations need to be evaluated. As discussed by a number of authors [3, 4, 15], effective SNP filtering can greatly reduce the computational burden of the subsequent combinatorial evaluation by removing a large portion of noise. The main advantage of using ReliefF-based algorithms for SNP filtering is that they can detect conditional dependencies between attributes. Furthermore, they are computationally efficient. A good implementation of TuRF can analyze GWA study data with up to a few hundred samples in the order of minutes. Such computational efficiency, coupled with its intrinsic ability to detect SNP dependencies, has led to its increasingly widespread application.
Through analyzing the ReliefF-based algorithms, we discovered a previously unknown anomaly in both ReliefF and TuRF. We show that these two popular filtering algorithms are sensitive to sample ordering and therefore give unstable and suboptimal SNP rankings in different runs when the sample order is permuted. However, we found that such unstable behavior can be effectively utilized in an ensemble learning framework. Using a simple aggregation procedure based on the general theory of ensemble learning, we can vastly improve the stability and reliability of the SNP ranking generated by these algorithms.
ReliefF-based algorithms are also used to perform feature selection tasks for a range of machine learning problems, including gene selection in microarray analysis. This implies that our findings are not limited to the field of gene-gene interaction identification in GWA studies, and may have relevance to the broader machine learning community. Although we recognize that the sample order sensitivity problem is less relevant to continuous datasets, since ties are less likely to occur, the potential problem caused by tie-breaking in a kNN procedure is still noteworthy in the development of new algorithms.
Our work indicates that new algorithms should be validated against a range of criteria. Many bioinformatics algorithms have been developed to perform such filtering tasks. These algorithms are mostly assessed and compared based on their objective; in our situation, how well a filtering algorithm can retain functional SNP pairs. However, much less focus has been placed on analyzing whether the results generated by a SNP filtering algorithm satisfy a set of desirable properties. The sample order dependency property in this paper is one such example, as it is not natural to expect the SNP ranking to change due to reordering the samples in a dataset. In fact, the importance of validating a bioinformatics algorithm and its software implementation is increasingly being recognized, and we believe that systematically validating an algorithm against a range of desirable properties of its behavior is becoming increasingly important as biological interpretations are increasingly drawn from results produced by bioinformatics programs.
We proposed an ensemble approach for gene-gene interaction filtering of GWA study datasets. Our approach aggregates the ranking scores of each SNP generated from multiple runs of ReliefF or TuRF with sample-order permuted datasets. Such an ensemble method is robust to the sample order dependency observed in the commonly used ReliefF and TuRF algorithms. Based on the analysis using a number of real and simulated datasets, we demonstrated that the proposed approach can produce much more stable SNP rankings. In addition, the ensemble of TuRF performed the best in retaining interaction SNP pairs, surpassing the performance of other traditional methods as well as state-of-the-art ReliefF-based algorithms.
The software of ReliefF-E and TuRF-E are available from:
P Yang is supported by a NICTA International Postgraduate Award (NIPA) and a NICTA Research Project Award (NRPA). JWK Ho is supported by an Australian Postgraduate Award (APA) and a NRPA. This work is supported in part by the Australian Research Council (ARC) through grant DP0770395 (YH Yang).
This article has been published as part of BMC Bioinformatics Volume 12 Supplement 1, 2011: Selected articles from the Ninth Asia Pacific Bioinformatics Conference (APBC 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S1.
- The Wellcome Trust Case Control Consortium: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007, 447: 661–678. 10.1038/nature05911
- Iles M: What can genome-wide association studies tell us about the genetics of common disease? PLoS Genet 2008, 4(2):e33. 10.1371/journal.pgen.0040033
- Thomas D: Gene-environment-wide association studies: Emerging approaches. Nat. Rev. Genet 2010, 11: 259–272. 10.1038/nrg2764
- Cordell H: Detecting gene-gene interactions that underlie human diseases. Nat. Rev. Genet 2009, 10(6):392–404. 10.1038/nrg2579
- Phillips P: Epistasis - The essential role of gene interactions in the structure and evolution of genetic systems. Nat. Rev. Genet 2008, 9(11):855–867. 10.1038/nrg2452
- Musani S, Shriner D, Liu N, Feng R, Coffey C, Yi N, Tiwari H, Allison D: Detection of gene × gene interactions in genome-wide association studies of human population data. Hum. Hered 2007, 63(2):67–84. 10.1159/000099179
- Bureau A, Dupuis J, Falls K, Lunetta K, Hayward B, Keith T, Van Eerdewegh P: Identifying SNPs predictive of phenotype using random forests. Genet. Epidemiol 2005, 28(2):171–182. 10.1002/gepi.20041
- Chen X, Liu C, Zhang M, Zhang H: A forest-based approach to identifying gene and gene-gene interactions. Proc. Natl. Acad. Sci. U.S.A 2007, 104(49):19199–19203. 10.1073/pnas.0709868104
- Hahn L, Ritchie M, Moore J: Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions. Bioinformatics 2003, 19(3):376–382. 10.1093/bioinformatics/btf869
- Chung Y, Lee S, Elston R, Park T: Odds ratio based multifactor-dimensionality reduction method for detecting gene-gene interactions. Bioinformatics 2007, 23: 71–76. 10.1093/bioinformatics/btl557
- Zhang Y, Liu J: Bayesian inference of epistatic interactions in case-control studies. Nat. Genet 2007, 39(9):1167–1173. 10.1038/ng2110
- Yang P, Ho J, Zomaya A, Zhou B: A genetic ensemble approach for gene-gene interaction identification. BMC Bioinformatics 2010, 11: 524. 10.1186/1471-2105-11-524
- Ritchie M, White B, Parker J, Hahn L, Moore J: Optimization of neural network architecture using genetic programming improves detection and modeling of gene-gene interactions in studies of human diseases. BMC Bioinformatics 2003, 4: 28. 10.1186/1471-2105-4-28
- McKinney B, Reif D, White B, Crowe J Jr, Moore J: Evaporative cooling feature selection for genotypic data involving interactions. Bioinformatics 2007, 23(16):2113–2120. 10.1093/bioinformatics/btm317
- Moore J, Asselbergs F, Williams S: Bioinformatics challenges for genome-wide association studies. Bioinformatics 2010, 26(4):445–455. 10.1093/bioinformatics/btp713
- Hoh J, Wille A, Ott J: Trimming, weighting, and grouping SNPs in human case-control association studies. Genome Res 2001, 11(12):2115–2119. 10.1101/gr.204001
- Robnik-Šikonja M, Kononenko I: Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn 2003, 53: 23–69. 10.1023/A:1025667309714
- Moore J, White B: Tuning ReliefF for genome-wide genetic analysis. Proceedings of the 5th European Conference on EvoBIO 2007, 166–175.
- Greene C, Penrod N, Kiralis J, Moore J: Spatially uniform ReliefF (SURF) for computationally-efficient filtering of gene-gene interactions. BioData Mining 2009, 2: 5. 10.1186/1756-0381-2-5
- Moore J, Williams S: Epistasis and its implications for personal genetics. Am. J. Hum. Genet 2009, 85(3):309–320. 10.1016/j.ajhg.2009.08.006
- Andrew A, Gui J, Sanderson A, Mason R, Morlock E, Schned A, Kelsey K, Marsit C, Moore J, Karagas M: Bladder cancer SNP panel predicts susceptibility and survival. Hum. Genet 2009, 125(5):527–539. 10.1007/s00439-009-0645-6
- Qi Y, Niu W, Zhu T, Zhou W, Qiu C: Synergistic effect of the genetic polymorphisms of the renin-angiotensin-aldosterone system on high-altitude pulmonary edema: a study from Qinghai-Tibet altitude. Eur. J. Epidemiol 2008, 23(2):143–152. 10.1007/s10654-007-9208-0
- Dietterich T: Ensemble methods in machine learning. In Proceedings of the First International Workshop on Multiple Classifier Systems. Springer-Verlag London, UK; 2000:1–15.
- Abeel T, Helleputte T, Van de Peer Y, Dupont P, Saeys Y: Robust biomarker identification for cancer diagnosis with ensemble feature selection methods. Bioinformatics 2010, 26(3):392–398. 10.1093/bioinformatics/btp630
- McKinney B, Crowe J Jr, Guo J, Tian D: Capturing the spectrum of interaction effects in genetic association studies by simulated evaporative cooling network analysis. PLoS Genet 2009, 5(3):e1000432. 10.1371/journal.pgen.1000432
- Kuncheva L, Whitaker C: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Mach. Learn 2003, 51(2):181–207. 10.1023/A:1022859003006
- Moore J, Hahn L, Ritchie M, Thornton T, White B: Application of genetic algorithms to the discovery of complex models for simulation studies in human genetics. Proceedings of the Genetic and Evolutionary Computation Conference 2002, 1150–1155.
- Klein R, Zeiss C, Chew E, Tsai J, Sackler R, Haynes C, Henning A, SanGiovanni J, Mane S, Mayne S, et al.: Complement factor H polymorphism in age-related macular degeneration. Science 2005, 308(5720):385–389. 10.1126/science.1109557
- Chen TY, Ho JWK, Liu H, Xie X: An innovative approach for testing bioinformatics programs using metamorphic testing. BMC Bioinformatics 2009, 10: 24. 10.1186/1471-2105-10-24
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.