
Challenging popular tools for the annotation of genetic variations with a real case, pathogenic mutations of lysosomal alpha-galactosidase



Severity gradation of missense mutations is a major challenge for exome annotation. The predictors of deleteriousness most frequently used to filter variants found by next generation sequencing produce not only qualitative predictions but also numerical scores. Whether these scores correlate with disease severity has never been tested.


wANNOVAR, a popular tool that can generate several different types of deleteriousness-prediction scores, was tested on Fabry disease. This pathology, which is caused by a deficit of lysosomal alpha-galactosidase, has a very large genotypic and phenotypic spectrum and makes it possible to associate each mutation with a quantitative measure of the damage it causes to the functioning of the enzyme in cells. Some predictors, in particular VEST3 and PolyPhen-2, provide scores that correlate with the severity of lysosomal alpha-galactosidase mutations in a statistically significant way.


Sorting disease mutations by severity is possible and offers advantages over binary classification. Datasets for testing and training in silico predictors can be obtained by transient transfection and evaluation of the residual activity of mutants in cell extracts. This approach yields quantitative data for severe, mild and non-pathological variants.


Exome sequencing has become very popular for the diagnosis of genetic diseases. This is certainly due to high-throughput platforms that have greatly reduced the cost of sequencing and to the freely available tools for data analysis. Pipelines for the processing [1] and the annotation of data have been proposed with the intent of “democratizing the ability to compile information on large amounts of genetic variations in individual laboratories” [2]. A critical step in the annotation process is the evaluation of missense mutations. A popular annotation tool, wANNOVAR [3], can generate several different types of deleteriousness-prediction scores by running SIFT [4], LRT [5], MutationAssessor [6], FATHMM [7], PROVEAN [8], VEST3 [9], metaSVM [10], metaLR [10], M-CAP [11], PolyPhen-2 [12], MutationTaster [13], CADD [14], DANN [15], fathmm-MKL coding [16], GenoCanyon [17], GERP++ [18, 19], phyloP7way vertebrate, phyloP20way mammalian [20], phastCons7way vertebrate, phastCons20way mammalian [21] and SiPhy 29way logOdds [22]. In the real world, however, the situation is not simple. A continuum is observed, ranging from very severe to mild cases. The border between “disease mutation” and “non disease mutation” is artificial, and dichotomizing continuous variables is problematic. We decided to address this point and challenge wANNOVAR [3] with a real example, Fabry disease (FD). The mutations responsible for this pathology affect the functioning or the stability of lysosomal alpha-galactosidase (AGAL; UniProt: AGAL_HUMAN, P06280), which is encoded by the gene GLA on the X chromosome.

AGAL is a dimer and its structure has been determined by X-ray crystallography [23,24,25]. More than 400 missense mutations have been described so far. This is a surprisingly high number for a protein of 429 amino acids, and almost every amino acid has been found to be mutated. The large genotypic spectrum corresponds to the large phenotypic spectrum of FD with respect to age at onset, rate of disease progression and severity of clinical manifestations. Patients with the late onset form of FD retain some AGAL activity and are asymptomatic until adult age, when they develop cardiac and/or kidney problems [26,27,28,29]. When a severe mutation is diagnosed, enzyme replacement must be started even before symptoms appear [30,31,32]. For cases retaining some residual activity, a therapy with small molecules may be possible [33,34,35]. Indeed for FD, as well as for other diseases due to deficits in lysosomal glycosidases, it is possible to employ iminosugars that either stabilize the endogenous protein of the patient, acting as pharmacological chaperones, or reduce substrate accumulation [36,37,38]. Iminosugars represent a lucky case of drug repositioning because they were first developed against HIV and subsequently used to treat lysosomal storage disorders [24, 39,40,41].

The classification of FD genotypes is generally carried out on the basis of clinical evaluation of patients [42]. Specialized databases [43, 44] annotate mutations with qualitative phenotypes. However, a more precise classification of FD mutations is possible. In order to test the effects of drugs on different mutations, a cell-based assay has been developed [45, 46]. Expression vectors encoding mutant AGAL are transiently transfected into COS or HEK293 cells, and the residual activity of the enzyme is measured in extracts of cells that had or had not been treated with the drug. Residual activity is normalized by the total amount of protein in the cells (HEK293 or COS) and depends on the stability of the mutant as well as on its specific activity. The ratio between the normalized residual activity of a given mutant and that of wild type AGAL is measured under the same conditions. Part of these data, i.e. those obtained in the absence of the drug, can be “repositioned”, so to speak. They offer the unique possibility of associating hundreds of mutations with a numerical value that correlates with the severity of the damage, and they allow the performance of the most popular predictors of deleterious variants to be evaluated in a realistic scenario of gradual disease severity.


Missense GLA mutations with phenotypic annotation derived from clinical observation of patients were obtained from a disease-specific database of clinical phenotypes and genotypes [43, 44] (dataset 1). The mutations (genomic Reference Sequence and protein Reference Sequence) and the phenotypes are reported in the 1st, 10th and last columns of Additional file 1, respectively.

Missense GLA mutations with residual activity annotation were obtained from Fabry_CEP [47, 48]. Relative residual activity is the ratio between the activity measured in cell extracts for a given mutant and the activity of wild type AGAL transfected in suitable eukaryotic vectors, multiplied by 100. When the residual activity of a given mutation had been measured by more than one lab, the average value was used (dataset 2). The mutations (genomic Reference Sequence and protein Reference Sequence) and the residual activities are reported in the 1st, 10th and last columns of Additional file 2, respectively.
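The definition above can be illustrated with a minimal Python sketch. The function names and the activity values are hypothetical and are not taken from Fabry_CEP or from the Additional files; the sketch only shows the ratio to wild type and the averaging across laboratories.

```python
from statistics import mean


def relative_residual_activity(mutant_activity, wild_type_activity):
    """Return the mutant activity as a percentage of wild type activity."""
    if wild_type_activity <= 0:
        raise ValueError("wild type activity must be positive")
    return 100.0 * mutant_activity / wild_type_activity


def pooled_activity(measurements):
    """Average the relative residual activities reported by different labs."""
    return mean(measurements)


# Hypothetical numbers: two labs measured the same mutant, each against
# its own wild type control run under the same conditions.
lab_a = relative_residual_activity(mutant_activity=12.0, wild_type_activity=80.0)
lab_b = relative_residual_activity(mutant_activity=30.0, wild_type_activity=120.0)
pooled = pooled_activity([lab_a, lab_b])
print(lab_a, lab_b, pooled)
```

Each measurement is normalized against the wild type control of the same experiment before averaging, so that differences in assay conditions between laboratories do not enter the pooled value directly.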

The nucleotide numbering on the coding DNA Reference Sequence was obtained for each mutation from the appropriate reference link in the disease-specific database [43, 44] or in Fabry_CEP [47, 48].

Nucleotide mutations were mapped onto the reference genome Ensembl GRCh37 release 91 [49].

We performed statistical analysis and data visualisation using the R environment for statistical computing [50].

We calculated descriptive statistics and drew box-and-whiskers plots of residual activity for the severe and mild mutation subpopulations using the graphics::boxplot() function on the intersection of the two datasets.

We manually created a confusion matrix using data from the first dataset (175 mutations, Additional file 1) and measured the performance of the wANNOVAR qualitative predictors using the following indexes:

Raw accuracy: \( \frac{TP+ TN}{P+N} \)

Balanced Accuracy: \( 0.5\ \left(\frac{TP}{P}+\frac{TN}{N}\right) \)

F1 score: \( \frac{2\ TP}{2\ TP+ FP+ FN} \)

Matthews correlation coefficient: \( \frac{TP\ TN- FP\ FN}{\sqrt{\left( TP+ FP\right)\left( TP+ FN\right)\left( TN+ FP\right)\left( TN+ FN\right)}} \)

We coded an R function for the simultaneous calculation of these indexes.
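The four indexes can be computed together from the cells of a confusion matrix. The following is a minimal Python sketch, not the R function used in this study; the function name, the convention of returning 0 when the Matthews denominator vanishes, and the example counts are all assumptions made for illustration.

```python
from math import sqrt


def accuracy_indexes(tp, fp, tn, fn):
    """Compute raw accuracy, balanced accuracy, F1 and MCC from counts."""
    p = tp + fn  # observed positives
    n = tn + fp  # observed negatives
    if p == 0 or n == 0:
        raise ValueError("both observed classes must be non-empty")
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "raw_accuracy": (tp + tn) / (p + n),
        "balanced_accuracy": 0.5 * (tp / p + tn / n),
        "f1": 2 * tp / (2 * tp + fp + fn),
        # Assumed convention: MCC is set to 0 when a predictor emits a
        # single class and the denominator is therefore 0.
        "mcc": (tp * tn - fp * fn) / denominator if denominator else 0.0,
    }


# Hypothetical counts with the class sizes of the first dataset (P=152, N=23).
indexes = accuracy_indexes(tp=120, fp=8, tn=15, fn=32)
for name, value in indexes.items():
    print(f"{name}: {value:.3f}")
```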

Using the second dataset (280 mutations, manually built), we expressed the correlation between the rank score of each tool and residual activity as Pearson’s r, and then tested the null hypothesis of no correlation using the stats::cor.test() function with the ‘less’ alternative (i.e. negative correlation) and the ‘pearson’ method. On the same dataset, we drew box-and-whiskers plots of residual activity for every wANNOVAR prediction and conservation tool using the graphics::boxplot() function.
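As an illustration of this correlation analysis, the sketch below computes Pearson’s r and the associated t statistic with n − 2 degrees of freedom in plain Python. It is not the R code used in this study, and the rank scores and residual activities are hypothetical. The one-sided p-value, which requires the Student t distribution, is not computed here.

```python
from math import sqrt, copysign, inf


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        return copysign(inf, r)
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical data: a higher rank score should go with lower residual activity.
rank_scores = [0.95, 0.80, 0.60, 0.40, 0.20]
residual_activity = [0.0, 5.0, 20.0, 45.0, 90.0]
r = pearson_r(rank_scores, residual_activity)
t = t_statistic(r, len(rank_scores))
print(f"r = {r:.3f}, t = {t:.3f}, df = {len(rank_scores) - 2}")
```

With the ‘less’ alternative, only a strongly negative t statistic counts as evidence against the null hypothesis, which matches the expectation that a more damaging score goes with a lower residual activity.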

On the same dataset, we used the graphics::barplot() function to draw, for every wANNOVAR prediction and conservation tool, the rank scores of the mutations whose activity is equal to or higher than that of wild type.


In some cases manifestations of FD occur at an early age with general, neurological, cardiovascular and renal signs; in other cases they occur in adulthood and with a limited subset of symptoms. For this reason a qualitative phenotypic classification of mutations based on the symptoms observed in patients has been attempted, and classic or severe mutations have been distinguished from mild, late onset or variant forms [42]. The disease-specific database [43, 44] provides a list of mutations and their qualitative phenotypic classification. Since FD is X-linked and the association between genotype and phenotype is clearer in males [51], only the 175 hemizygous cases were gathered from it, and they form the first dataset analysed in this paper. The variants were annotated with wANNOVAR [3] and the output is provided in Additional file 1, with the original qualitative phenotypic description in the last column. It can be noticed first of all that only 51 cases are also present in ClinVar, a public archive of reports of the relationships among human variations and phenotypes [52].

To test whether the FD mutations collected from the disease-specific database [43, 44] can be broadly distinguished by the qualitative predictions provided by wANNOVAR annotation, the observed phenotypes were reduced to two classes: a severe group POS of 152 cases, which clusters mutations originally defined as “severe” or “classic”, and a mild group NEG of 23 cases, which clusters mutations originally defined as “mild”, “late onset”, “variant” or “atypical variant”. For the predicted phenotypes, if the tool provides a binary classification, as in the case of SIFT [4], the more deleterious class (D for SIFT) is considered as predicted POS and the other one (T for SIFT) is considered as predicted NEG. If the tool provides multiple classes, as in the case of PolyPhen-2 [12], the most deleterious one (D for PolyPhen-2) is considered as predicted POS and the other ones (P and B for PolyPhen-2) are considered as predicted NEG. The results are summarized in Table 1. Since the two classes have different sizes, the Matthews correlation coefficient should be preferred for the evaluation of predictors [53].
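The reduction to two classes can be sketched in Python as follows. The dictionary, the function names and the example calls are hypothetical and only mirror the rule stated above for SIFT and PolyPhen-2. The last lines show, for a trivial predictor that calls every mutation POS on classes of 152 and 23 cases, why raw accuracy alone is misleading on an unbalanced dataset.

```python
from collections import Counter

# Only the most deleterious class of each tool counts as predicted POS.
POSITIVE_LABEL = {"SIFT": "D", "PolyPhen-2": "D"}


def binarize(tool, label):
    """Map a tool-specific qualitative call to predicted POS or NEG."""
    return "POS" if label == POSITIVE_LABEL[tool] else "NEG"


def confusion_counts(observed, predicted):
    """Tally TP, FN, FP and TN from paired observed and predicted classes."""
    counts = Counter(zip(observed, predicted))
    return {
        "TP": counts[("POS", "POS")],
        "FN": counts[("POS", "NEG")],
        "FP": counts[("NEG", "POS")],
        "TN": counts[("NEG", "NEG")],
    }


# Hypothetical PolyPhen-2 calls for five mutations.
observed = ["POS", "POS", "POS", "NEG", "NEG"]
polyphen_calls = ["D", "P", "D", "B", "D"]
predicted = [binarize("PolyPhen-2", call) for call in polyphen_calls]
matrix = confusion_counts(observed, predicted)
print(matrix)

# A predictor that always answers POS, on 152 severe and 23 mild cases.
always_pos = confusion_counts(["POS"] * 152 + ["NEG"] * 23, ["POS"] * 175)
raw_accuracy = (always_pos["TP"] + always_pos["TN"]) / 175
balanced_accuracy = 0.5 * (always_pos["TP"] / 152 + always_pos["TN"] / 23)
print(f"raw accuracy = {raw_accuracy:.3f}, balanced accuracy = {balanced_accuracy:.3f}")
```

The always-POS predictor reaches a raw accuracy of 152/175 although it cannot discriminate at all, which is what the balanced accuracy of 0.5 reveals.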

Table 1 Accuracy Indexes

For most tools the values are quite low and in some cases no discrimination is possible.

A different way of ordering mutations by severity relies on the residual activity of AGAL mutants measured in vitro in HEK293 or COS cells transiently transfected with expression plasmids. Values for 280 mutations have been collected by gathering the results of several laboratories [33, 45, 54,55,56,57,58,59,60,61,62]. They form the second dataset analyzed in this paper. The wANNOVAR annotation for these mutants can be found in Additional file 2, with the relative residual activity in the last column.

The intersection between the two datasets is formed by 67 mutations of the severe group POS and 12 of the mild group NEG, for which relative residual activity is available. The median residual activity of the severe mutations POS is 0.1 (Fig. 1). This finding suggests that severe cases have null, or very close to null, activity when tested in transfected cells. The box plot in Fig. 1 shows 20% outliers with high residual activity in the POS population; their severity might have been overestimated in the original literature.

Fig. 1

Distribution of residual activities for phenotypically annotated GLA mutations. The boxplot shows the distribution of residual activity in the subpopulations of mutations with severe and mild effects. The red bars represent outliers

In contrast to the first dataset, whose phenotypic annotation is derived from the clinical literature (Additional file 1), the second dataset, whose annotation is based on residual activity (Additional file 2), is balanced: half of the mutations have values above 0.

The box plot in Fig. 2 shows the distribution of rank scores for mutations showing 0 residual activity. Rank scores were created by wANNOVAR to make the functional prediction scores and conservation scores more comparable to each other and monotonic (a higher score indicating “more likely to be damaging”) [63]. As can be observed, FATHMM [7], metaSVM [10], metaLR [10] and M-CAP [11] correctly assign high scores to very severe cases. On the other hand, the histograms in Fig. 3 show the rank scores assigned by the predictors to 6 non-pathological mutations whose residual activity is comparable to or higher than that of wild type. The same predictors, FATHMM [7], metaSVM [10], metaLR [10] and M-CAP [11], give a consistently high score and tend to overestimate the damage caused by a mutation. Table 2 shows the correlation between the rank scores of the predicting tools and the residual activity of all the mutations in the second dataset (Additional file 2). The results obtained by some predictors used by wANNOVAR, for example VEST3 (Pearson correlation coefficient − 0.71; p < 0.0001) and PolyPhen-2 (Pearson correlation coefficient − 0.62; p < 0.0001), demonstrate that the rank scores can correlate with severity in a statistically significant manner. Methods based on evolutionary and phylogenetic analysis perform very poorly.

Fig. 2

Distribution of rank scores for mutations with null residual activity. The boxplot shows the distribution of the rank scores for all the predictors used by wANNOVAR. The red bars represent outliers. The predictor category label is B for “biologically based prediction method”, ML for “Machine Learning based prediction method”, Meta for “Meta prediction method” and Cons for “Conservation scoring tool”

Fig. 3

Rank scores for mutations with residual activity equal to or greater than that of wild type alpha-galactosidase. The histograms show, for each of the wANNOVAR predictors, the rank scores of the six mutations whose residual activity is greater than or equal to that of wild type alpha-galactosidase. Mutations are color coded and are detailed in the inset

Table 2 Correlations


The gene GLA offers an example of the critical points encountered when missense mutations are annotated. In the first place, obtaining the phenotype associated with a mutation is difficult: most information is still missing in databases such as ClinVar [52] and is present only in specialized databases. Mild and severe mutations can be misclassified in the literature. An example is provided by the mutation D313Y, which is reported as “classic” in the disease-specific database [43, 44] but is regarded as “likely benign/uncertain significance” in ClinVar [52] and is relatively frequent in the population according to ExAC [64] and 1000Genomes [65] (Additional file 1). The residual activity of D313Y is as high as 75% of wild type (Additional file 2), which suggests that the severity reported in the database, which is derived from the original source [66], is overestimated. Other examples are provided by the outliers in Fig. 1. Given these premises, it is not surprising that the tools provided by wANNOVAR cannot distinguish mild from severe mutations as they are defined in the literature.

Hence, to train or test algorithms that can grade disease severity, datasets of quantitative measures of the damage caused by mutations to proteins must be available. In this paper we used data produced by a cell-based assay that measures relative residual activity in cells. We showed that some of the popular tools used for exome analysis are able to grade disease severity, even though they had not been trained or tested for this specific purpose. A summary of all the tools employed in this study is provided in Additional file 3. The best result was obtained with VEST3 [9], which uses a supervised machine learning algorithm, Random Forest, based on 86 sequence features and trained with a positive class of missense variants from the Human Gene Mutation Database and a negative class of common missense variants detected in the Exome Sequencing Project population. In a recent paper Plon and co-workers [67] compared the performance of several algorithms using benign or pathogenic missense variants from the ClinVar database [52]. They found “poor concordance among algorithms, particularly for variants classified as benign by clinical laboratories”. Nevertheless they observed that VEST3 has the lowest rate of false positive calls, i.e. benign variants in ClinVar that are erroneously predicted as pathogenic. This finding suggests that the training protocol employed by VEST3 reduces over-prediction of deleterious variants. The second best result was obtained with PolyPhen-2 [12], which calculates Bayesian probabilities and uses eight sequence-based and three structure-based predictive features. Since the AGAL structure is known [23,24,25], it is possible that the incorporation of structure-based predictive features contributed to the good results obtained with PolyPhen-2. Two versions of the same program exist. PolyPhen-2 HumDiv is trained with a positive class of mutations causing Mendelian diseases from UniProt and a negative class of variants found in closely related mammalian homologs, whereas PolyPhen-2 HumVar is trained with a positive class consisting of all human disease-causing mutations from UniProt and a negative class consisting of nsSNPs without annotated involvement in disease. HumDiv performed slightly better than HumVar. Among the tools that are not limited to exonic missense mutations, CADD is the best performing one. MutationAssessor, which uses a combinatorial entropy formalism, is the best performing method based on biological principles. In a previous paper we had shown that the flexibility of the residue where the mutation occurs is the best structural property for predicting the residual activity of AGAL mutants [68]. The results obtained by VEST3, PolyPhen-2, CADD and MutationAssessor are better than those obtained with molecular dynamics (Pearson correlation coefficient 0.50; p < 0.0001). Although the majority of disease mutations in GLA affect protein stability, methods based on a single structural property perform worse than those relying on several properties.

Admittedly our analysis has two major limitations. In the first place, only the programs run by wANNOVAR [3] were considered, leaving out software that uses three-dimensional structures, for example SDM [69], PoPMuSiC [70] and mCSM [71]. In the second place, only one gene was considered. Yet GLA represents a unique case since, to the best of our knowledge, few data exist about the residual activity of other mutant proteins. We hope that the effort put in place for GLA will be extended to other genes.


Our paper aims at soliciting a combined effort to produce a large database that gathers the residual activity measured in a cell-based test for diverse proteins. This is feasible if cDNAs encoding mutants are expressed by transient transfection in suitable mammalian cells. In the case of FD, it has been shown that this in vitro test recapitulates what can be observed ex vivo in cells derived from patients. This approach is not limited to the variants already observed in patients or in the healthy population, and it also provides data for negative controls, i.e. mutations that do not affect residual activity. One obvious limitation of the method is that the effect of exonic mutations affecting splicing cannot be evaluated. Once a large dataset from diverse genes is gathered, it could be used to train linear classifiers. We also suggest that programs relying on several features, including structure-based ones, be included in the tools used for the high throughput annotation of data deriving from exome sequencing.



AGAL: lysosomal alpha-galactosidase

FD: Fabry disease


  1. 1.

    Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics. 2013;43(11.10):11–33.

  2. 2.

    Yang H, Wang K. Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR. Nat Protoc. 2015;10(10):1556–66.

  3. 3.

    Chang X, Wang K. wANNOVAR: annotating genetic variants for personal genomes via the web. J Med Genet. 2012;49(7):433–6.

  4. 4.

    Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003;31(13):3812–4.

  5. 5.

    Chun S, Fay JC. Identification of deleterious mutations within three human genomes. Genome Res. 2009;19(9):1553–61.

  6. 6.

    Reva B, Antipin Y, Sander C. Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res. 2011;39(17):e118.

  7. 7.

    Shihab HA, Gough J, Cooper DN, Stenson PD, Barker GL, Edwards KJ, Day IN, Gaunt TR. Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Hum Mutat. 2013;34(1):57–65.

  8. 8.

    Choi Y, Sims GE, Murphy S, Miller JR, Chan AP. Predicting the functional effect of amino acid substitutions and indels. PLoS One. 2012;7(10):e46688.

  9. 9.

    Carter H, Douville C, Stenson PD, Cooper DN, Karchin R. Identifying Mendelian disease genes with the variant effect scoring tool. BMC Genomics. 2013;14(Suppl 3):S3.

  10. 10.

    Dong C, Wei P, Jian X, Gibbs R, Boerwinkle E, Wang K, Liu X. Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies. Hum Mol Genet. 2015;24(8):2125–37.

  11. 11.

    Jagadeesh KA, Wenger AM, Berger MJ, Guturu H, Stenson PD, Cooper DN, Bernstein JA, Bejerano G. M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity. Nat Genet. 2016;48(12):1581–6.

  12. 12.

    Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7(4):248–9.

  13. 13.

    Schwarz JM, Rodelsperger C, Schuelke M, Seelow D. MutationTaster evaluates disease-causing potential of sequence alterations. Nat Methods. 2010;7(8):575–6.

  14. 14.

    Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46(3):310–5.

  15. 15.

    Quang D, Chen Y, Xie X. DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics. 2015;31(5):761–3.

  16. 16.

    Shihab HA, Rogers MF, Gough J, Mort M, Cooper DN, Day IN, Gaunt TR, Campbell C. An integrative approach to predicting the functional effects of non-coding and coding sequence variation. Bioinformatics. 2015;31(10):1536–43.

  17. 17.

    Lu Q, Hu Y, Sun J, Cheng Y, Cheung KH, Zhao H. A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data. Sci Rep. 2015;5:10576.

  18. 18.

    Cooper GM, Goode DL, Ng SB, Sidow A, Bamshad MJ, Shendure J, Nickerson DA. Single-nucleotide evolutionary constraint scores highlight disease-causing mutations. Nat Methods. 2010;7(4):250–1.

  19. 19.

    Davydov EV, Goode DL, Sirota M, Cooper GM, Sidow A, Batzoglou S. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput Biol. 2010;6(12):e1001025.

  20. 20.

    Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 2010;20(1):110–21.

  21. 21.

    Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005;15(8):1034–50.

  22. 22.

    Garber M, Guttman M, Clamp M, Zody MC, Friedman N, Xie X. Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics. 2009;25(12):i54–62.

  23. 23.

    Garman SC, Garboczi DN. The molecular defect leading to Fabry disease: structure of human alpha-galactosidase. J Mol Biol. 2004;337(2):319–35.

  24. 24.

    Lieberman RL, D’Aquino JA, Ringe D, Petsko GA. Effects of pH and iminosugar pharmacological chaperones on lysosomal glycosidase structure and stability. Biochemistry. 2009;48(22):4816–27.

  25. 25.

    Guce AI, Clark NE, Salgado EN, Ivanen DR, Kulminskaya AA, Brumer H 3rd, Garman SC. Catalytic mechanism of human alpha-galactosidase. J Biol Chem. 2010;285(6):3625–32.

  26. 26.

    Mehta A, Hughes DA: Fabry disease. 1993.

  27. 27.

    Germain DP. Fabry disease. Orphanet J Rare Dis. 2010;5:30.

  28. 28.

    Thomas AS, Mehta AB. Difficulties and barriers in diagnosing Fabry disease: what can be learnt from the literature? Expert Opin Med Diagn. 2013;7(6):589–99.

  29. 29.

    Citro V, Cammisa M, Liguori L, Cimmaruta C, Lukas J, Cubellis MV, Andreotti G. The Large Phenotypic Spectrum of Fabry Disease Requires Graduated Diagnosis and Personalized Therapy: A Meta-Analysis Can Help to Differentiate Missense Mutations. Int J Mol Sci. 2016;17(12).

  30. 30.

    Germain DP, Waldek S, Banikazemi M, Bushinsky DA, Charrow J, Desnick RJ, Lee P, Loew T, Vedder AC, Abichandani R, et al. Sustained, long-term renal stabilization after 54 months of agalsidase beta therapy in patients with Fabry disease. J Am Soc Nephrol. 2007;18(5):1547–57.

  31. 31.

    Tondel C, Bostad L, Larsen KK, Hirth A, Vikse BE, Houge G, Svarstad E. Agalsidase benefits renal histology in young patients with Fabry disease. J Am Soc Nephrol. 2013;24(1):137–48.

  32. 32.

    Rombach SM, Smid BE, Bouwman MG, Linthorst GE, Dijkgraaf MG, Hollak CE. Long term enzyme replacement therapy for Fabry disease: effectiveness on kidney, heart and brain. Orphanet J Rare Dis. 2013;8:47.

  33. 33.

    Giugliani R, Waldek S, Germain DP, Nicholls K, Bichet DG, Simosky JK, Bragat AC, Castelli JP, Benjamin ER, Boudes PF. A phase 2 study of migalastat hydrochloride in females with Fabry disease: selection of population, safety and pharmacodynamic effects. Mol Genet Metab. 2013;109(1):86–92.

  34. 34.

    Germain DP, Hughes DA, Nicholls K, Bichet DG, Giugliani R, Wilcox WR, Feliciani C, Shankar SP, Ezgu F, Amartino H, et al. Treatment of Fabry's disease with the pharmacologic chaperone Migalastat. N Engl J Med. 2016;375(6):545–55.

  35. 35.

    Benjamin ER, Della Valle MC, Wu X, Katz E, Pruthi F, Bond S, Bronfin B, Williams H, Yu J, Bichet DG, et al. The validation of pharmacogenetics for the identification of Fabry patients to be treated with migalastat. Genet Med. 2017;19(4):430–8.

  36. 36.

    Guce AI, Clark NE, Rogich JJ, Garman SC. The molecular basis of pharmacological chaperoning in human alpha-galactosidase. Chem Biol. 2011;18(12):1521–6.

  37. 37.

    Haneef SA, Doss CG. Personalized Pharmacoperones for Lysosomal storage disorder: approach for next-generation treatment. Adv Protein Chem Struct Biol. 2016;102:225–65.

  38. 38.

    Cox TM. Substrate reduction therapy for lysosomal storage diseases. Acta Paediatr Suppl. 2005;94(447):69–75 discussion 57.

  39. 39.

    Platt FM, Neises GR, Dwek RA, Butters TD. N-butyldeoxynojirimycin is a novel inhibitor of glycolipid biosynthesis. J Biol Chem. 1994;269(11):8362–5.

  40. 40.

    Tierney M, Pottage J, Kessler H, Fischl M, Richman D, Merigan T, Powderly W, Smith S, Karim A, Sherman J, et al. The tolerability and pharmacokinetics of N-butyl-deoxynojirimycin in patients with advanced HIV disease (ACTG 100). The AIDS Clinical Trials Group (ACTG) of the National Institute of Allergy and Infectious Diseases. J Acquir Immune Defic Syndr Hum Retrovirol. 1995;10(5):549–53.

  41. 41.

    Hay Mele B, Citro V, Andreotti G, Cubellis MV. Drug repositioning can accelerate discovery of pharmacological chaperones. Orphanet J Rare Dis. 2015;10:55.

  42. 42.

    Whybra C, Bahner F, Baron K: Measurement of disease severity and progression in Fabry disease. 2006.

  43. 43.

    Saito S, Ohno K, Sese J, Sugawara K, Sakuraba H. Prediction of the clinical phenotype of Fabry disease based on protein sequential and structural information. J Hum Genet. 2010;55(3):175–8.

  44. 44.

    Saito S, Ohno K, Sakuraba H. database of the clinical phenotypes, genotypes and mutant alpha-galactosidase a structures in Fabry disease. J Hum Genet. 2011;56(6):467–8.

  45. 45.

    Wu X, Katz E, Della Valle MC, Mascioli K, Flanagan JJ, Castelli JP, Schiffmann R, Boudes P, Lockhart DJ, Valenzano KJ, et al. A pharmacogenetic approach to identify mutant forms of alpha-galactosidase a that respond to a pharmacological chaperone for Fabry disease. Hum Mutat. 2011;32(8):965–77.

  46. 46.

    Lukas J, Knospe AM, Seemann S, Citro V, Cubellis MV, Rolfs A. In vitro enzyme measurement to test pharmacological chaperone responsiveness in Fabry and Pompe disease. J Vis Exp. 2017;130.

  47. 47.

    Andreotti G, Guarracino MR, Cammisa M, Correra A, Cubellis MV. Prediction of the responsiveness to pharmacological chaperones: lysosomal human alpha-galactosidase, a case of study. Orphanet J Rare Dis. 2010;5:36.

  48. 48.

    Cammisa M, Correra A, Andreotti G, Cubellis MV. Fabry_CEP: a tool to identify Fabry mutations responsive to pharmacological chaperones. Orphanet J Rare Dis. 2013;8:111.

  49. 49.

    Zerbino DR, Achuthan P, Akanni W, Amode MR, Barrell D, Bhai J, Billis K, Cummins C, Gall A, Giron CG, et al. Ensembl 2018. Nucleic Acids Res. 2018;46(D1):D754–61.

  50. 50.

    Team RC: R: a language and environment for statistical computing. 2013.

  51. 51.

    Echevarria L, Benistan K, Toussaint A, Dubourg O, Hagege AA, Eladari D, Jabbour F, Beldjord C, De Mazancourt P, Germain DP. X-chromosome inactivation in female patients with Fabry disease. Clin Genet. 2016;89(1):44–54.

  52. 52.

    Landrum MJ, Lee JM, Benson M, Brown G, Chao C, Chitipiralla S, Gu B, Hart J, Hoffman D, Hoover J, et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 2016;44(D1):D862–8.

  53. 53.

    Chicco D. Ten quick tips for machine learning in computational biology. BioData Min. 2017;10:35.

  54. 54.

    Ishii S, Suzuki Y, Fan JQ. Role of Ser-65 in the activity of alpha-galactosidase a: characterization of a point mutation (S65T) detected in a patient with Fabry disease. Arch Biochem Biophys. 2000;377(2):228–33.

  55. 55.

    Spada M, Pagliardini S, Yasuda M, Tukel T, Thiagarajan G, Sakuraba H, Ponzone A, Desnick RJ. High incidence of later-onset fabry disease revealed by newborn screening. Am J Hum Genet. 2006;79(1):31–40.

  56. 56.

    Park JY, Kim GH, Kim SS, Ko JM, Lee JJ, Yoo HW. Effects of a chemical chaperone on genetic mutations in alpha-galactosidase a in Korean patients with Fabry disease. Exp Mol Med. 2009;41(1):1–7.

  57. 57.

    Filoni C, Caciotti A, Carraresi L, Cavicchi C, Parini R, Antuzzi D, Zampetti A, Feriozzi S, Poisetti P, Garman SC, et al. Functional studies of new GLA gene mutations leading to conformational Fabry disease. Biochim Biophys Acta. 2010;1802(2):247–52.

  58. 58.

    Andreotti G, Citro V, De Crescenzo A, Orlando P, Cammisa M, Correra A, Cubellis MV. Therapy of Fabry disease with pharmacological chaperones: from in silico predictions to in vitro tests. Orphanet J Rare Dis. 2011;6:66.

  59. 59.

    Lukas J, Giese AK, Markoff A, Grittner U, Kolodny E, Mascher H, Lackner KJ, Meyer W, Wree P, Saviouk V, et al. Functional characterisation of alpha-galactosidase a mutations as a basis for a new classification system in fabry disease. PLoS Genet. 2013;9(8):e1003632.

  60.

    Andreotti G, Citro V, Correra A, Cubellis MV. A thermodynamic assay to test pharmacological chaperones for Fabry disease. Biochim Biophys Acta. 2014;1840(3):1214–24.

  61.

    Citro V, Pena-Garcia J, den-Haan H, Perez-Sanchez H, Del Prete R, Liguori L, Cimmaruta C, Lukas J, Cubellis MV, Andreotti G. Identification of an allosteric binding site on human lysosomal alpha-galactosidase opens the way to new pharmacological chaperones for Fabry disease. PLoS One. 2016;11(10):e0165463.

  62.

    Lukas J, Scalia S, Eichler S, Pockrandt AM, Dehn N, Cozma C, Giese AK, Rolfs A. Functional and clinical consequences of novel alpha-galactosidase A mutations in Fabry disease. Hum Mutat. 2016;37(1):43–51.

  63.

    Liu X, Wu C, Li C, Boerwinkle E. dbNSFP v3.0: a one-stop database of functional predictions and annotations for human nonsynonymous and splice-site SNVs. Hum Mutat. 2016;37(3):235–41.

  64.

    Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536(7616):285–91.

  65.

    Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491(7422):56–65.

  66.

    Eng CM, Resnick-Silverman LA, Niehaus DJ, Astrin KH, Desnick RJ. Nature and frequency of mutations in the alpha-galactosidase A gene that cause Fabry disease. Am J Hum Genet. 1993;53(6):1186–97.

  67.

    Ghosh R, Oak N, Plon SE. Evaluation of in silico algorithms for use with ACMG/AMP clinical variant interpretation guidelines. Genome Biol. 2017;18(1):225.

  68.

    Cubellis MV, Baaden M, Andreotti G. Taming molecular flexibility to tackle rare diseases. Biochimie. 2015;113:54–8.

  69.

    Pandurangan AP, Ochoa-Montano B, Ascher DB, Blundell TL. SDM: a server for predicting effects of mutations on protein stability. Nucleic Acids Res. 2017;45(W1):W229–35.

  70.

    Dehouck Y, Kwasigroch JM, Gilis D, Rooman M. PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC Bioinform. 2011;12:151.

  71.

    Pires DE, Ascher DB, Blundell TL. mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics. 2014;30(3):335–42.



This paper is dedicated to our friend and colleague M. Malanga.


Publication costs for this manuscript were sponsored by a grant from MIUR PRIN 2015 2015JHLY35 (to MVC). The funding body had no role in the design of the study; in the collection, analysis, and interpretation of data; or in writing the manuscript.

Availability of data and materials

All data are available.

About this supplement

This article has been published as part of BMC Bioinformatics Volume 19 Supplement 15, 2018: Proceedings of the 12th International BBCC conference. The full contents of the supplement are available online at

Author information




CC, VC and LL collected the data; GA conceived the experiments; MVC wrote the paper; BH-M carried out the statistical analysis and conceived the paper. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Maria Vittoria Cubellis.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

wANNOVAR-annotated GLA mutations with qualitative phenotypes. (XLSX 121 kb)

Additional file 2:

wANNOVAR-annotated GLA mutations with relative residual activities. (XLSX 212 kb)

Additional file 3:

Summary of the predictors evaluated in this study and their main characteristics. (XLSX 13 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Cimmaruta, C., Citro, V., Andreotti, G. et al. Challenging popular tools for the annotation of genetic variations with a real case, pathogenic mutations of lysosomal alpha-galactosidase. BMC Bioinformatics 19, 433 (2018).



  • Rare disease
  • Clinical informatics
  • Variant analysis
  • Bioinformatics
  • Fabry disease