Application of Wavelet Packet Transform to detect genetic polymorphisms by the analysis of inter-Alu PCR patterns
© Cardelli et al; licensee BioMed Central Ltd. 2010
Received: 9 December 2009
Accepted: 9 December 2010
Published: 9 December 2010
The analysis of inter-Alu PCR patterns obtained from human genomic DNA samples is a promising technique for the simultaneous analysis of many genomic loci flanked by Alu repetitive sequences, with the aim of detecting genetic polymorphisms. Inter-Alu PCR products may be separated and analyzed by capillary electrophoresis using an automatic sequencer, which generates a complex pattern of peaks. We propose an algorithmic method based on the Haar-Walsh Wavelet Packet Transformation (WPT) for the efficient detection of fingerprint-type patterns generated by PCR-based methodologies. We have tested our algorithmic approach on inter-Alu patterns obtained from the genomic DNA of three pairs of monozygotic twins, expecting that the inter-Alu patterns within each twin pair will differ only because of unavoidable experimental variability. In contrast, the differences between samples from different twin pairs are assumed to originate from genetic variability. Our goal is to automatically detect regions in the inter-Alu pattern that are likely to be associated with genetic polymorphisms.
We show that the WPT algorithm provides a reliable tool for identifying sample-to-sample differences in complex peak patterns, reducing the errors and limitations associated with subjective evaluation. The redundant decomposition of the WPT algorithm allows a best basis selection procedure that maximizes the pattern differences at the lowest possible scale. Our analysis identifies a few classifying signal regions that could indicate the presence of genetic polymorphisms.
The WPT algorithm based on the Haar-Walsh wavelet is an efficient tool for the unsupervised classification of inter-Alu signal patterns provided by a genetic analyzer, although it was not possible to estimate the power and false positive rate owing to the lack of a suitable database. The identification of non-reproducible peaks is usually accomplished by comparing different experimental replicates of each sample. Moreover, although we developed and optimized an algorithm to analyze patterns obtained through inter-Alu PCR, the method is in principle applicable to any fingerprint-type pattern obtained by analyzing anonymous DNA fragments through capillary electrophoresis, and it could be usefully applied to a wide range of fingerprint-type methodologies.
Many analytical methodologies in modern genetics and biochemistry are based on the analysis of complex mixtures of oligonucleotides or oligopeptides, which are resolved as complex patterns of peaks or bands often referred to as "fingerprint-type" patterns. When the analysis is performed at the DNA or RNA level, fingerprint-type patterns can be generated by gel or capillary electrophoresis of nucleic acid sequences produced by PCR (Polymerase Chain Reaction)-based techniques, such as Random Amplified Polymorphic DNA (RAPD), Arbitrarily Primed PCR (AP-PCR), Simple Sequence Repeat anchored Polymerase Chain Reaction amplification (SSR-PCR), Differential Display Reverse Transcription (DDRT) PCR, AFLP and inter-Alu PCR. All these methodologies allow the screening of several (up to some hundreds) nucleic acid fragments corresponding to different loci, without any a priori assumption about their exact sequence and genomic localization. The comparative analysis of patterns obtained from different samples has proved useful in very diverse fields of biological research: examples include the identification of genes overexpressed in tumors, the identification of genetic variability at different levels (individuals, populations, species) [7–9] and the discovery of genomic loci associated with human longevity. Among DNA fingerprinting techniques, inter-Alu PCR [6, 11, 12] is of particular interest, being characterized by the highest information level. Alu repeat sequences are ubiquitously distributed in the human genome, with more than one million elements. A genomic DNA fragment can be amplified with a single Alu-specific primer when it is flanked by two Alu elements that have opposite orientations and lie within a few kilobases of each other. A PCR reaction conducted with one or more primers complementary to Alu sequences produces a multitude of anonymous DNA amplification products that can be revealed by electrophoretic separation.
A typical inter-Alu pattern often shows inter-individual variability due to genetic polymorphisms of different types: length variation of intervening sequences, de novo insertion of flanking Alu elements, deletions, translocations, and mutation of priming sites [13, 15, 16]. In general, this approach can be used for the initial detection of polymorphic loci involved in quantitative, multigenic traits [10, 17], of germline and somatic mutations [18, 19], or of genetic alterations in cancer cells [20–23]. In a previous study, we developed a variant of inter-Alu PCR that uses two different Alu-specific primers labeled with different fluorochromes in the same PCR reaction. The resulting PCR products can be analyzed by capillary electrophoresis and fluorescent detection on a PE/ABI Genetic Analyzer, which reports them as distinct fluorescence peaks. Many of the peaks generated by this method correspond to fragments smaller than 1 kb and, given that the frequency peaks of Alu elements in the human genome are centered at 0.1 Alu/kb and 1 Alu/kb, are likely to originate from the regions with the highest density of Alu sequences [10, 17]. In inter-Alu PCR analysis, as in other fingerprint-type genomic analyses, the comparative evaluation of the samples is usually done "by eye" by the operator, which is time-consuming and prone to the errors associated with subjective evaluation. These limitations prevent the application of these techniques to large data sets, so there is a need for computer-based analytical approaches that automate the comparative analysis of different samples and provide better reliability and operative efficiency. In the present work we have developed and tested an algorithm based on the Wavelet Packet Transformation (WPT) aimed at detecting fingerprint-type patterns generated by inter-Alu PCR. The WPT is an overcomplete multiscale analysis of the initial signal based on wavelet functions.
Starting from a signal of length 2^N, the information is distributed over N × 2^N coefficients, so that an optimization procedure can be applied to classification and pattern recognition problems. In recent years wavelet analysis has been widely applied to biological data sets for very different purposes, such as microarray data mining [26, 27] and the analysis of genomic sequences [28–30]. In this paper we use the Best Basis algorithm to define different classes of signals. This method was developed by Coifman and Wickerhauser for the classification of seismic signals and was subsequently applied to feature extraction problems by Saito, who proposed the Local Discriminant Basis algorithm. The classification is based on the hypothesis that the relevant signal information is well reproduced by a limited number of wavelet coefficients. To perform the WPT we have chosen the Haar basis, which generates the Walsh packets. We have tested the ability of the wavelet analysis to detect sample-to-sample differences in a fingerprint-type pattern produced by the electrophoretic analysis of inter-Alu PCR products. The positions of the electrophoretic peaks detected by the genetic analyzer were used to reconstruct the inter-Alu pattern using a standard Gaussian for each peak. We have applied the WPT algorithm to identify regions of the electrophoretic patterns where a significant difference is detected among the signals obtained from three pairs of monozygotic twins. Comparing the patterns of the two members of the same twin pair allowed us to filter out the intrinsic variability of the experimental methodology, whereas the signals that varied only between different twin pairs were possibly correlated with polymorphic loci. The characterization of the detected polymorphic loci requires further specific experiments.
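To make the coefficient count concrete, the following minimal sketch computes a full Haar-Walsh wavelet packet table for a signal whose length is a power of two. It is an illustration only, not the program used in this work: the function name `haar_walsh_packets`, the orthonormal 1/√2 scaling and the natural (non-sequency) ordering of the blocks are all assumptions made for the example.

```python
import math


def haar_walsh_packets(signal):
    """Return the full Haar-Walsh wavelet packet table of a length-2**N signal.

    table[j] is the list of 2**j blocks at level j, each holding
    2**(N - j) coefficients; table[0] contains the signal itself.
    Scaling and block ordering are illustrative assumptions.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    levels = n.bit_length() - 1  # this is N
    root2 = math.sqrt(2.0)
    table = [[[float(x) for x in signal]]]
    for _ in range(levels):
        next_level = []
        for block in table[-1]:
            pairs = list(zip(block[0::2], block[1::2]))
            # Haar low-pass step: scaled sums of neighbouring samples.
            next_level.append([(a + b) / root2 for a, b in pairs])
            # Haar high-pass step: scaled differences of neighbouring samples.
            next_level.append([(a - b) / root2 for a, b in pairs])
        table.append(next_level)
    return table
```

Each level below the original signal holds 2^N coefficients, so the N levels together contain N × 2^N coefficients. Because both the sum and the difference branches are split again at every level, the table is redundant, and this redundancy is what a best basis search can exploit.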
We have developed a program that performs the signal reconstruction using a mapping from data points (the unit of the instrument) to base pairs.
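The reconstruction step can be sketched as follows. This is a hedged illustration rather than the actual program: the piecewise-linear calibration against size-standard points, the unit peak height, and the names `data_point_to_bp`, `reconstruct_pattern`, `step_bp` and `sigma_bp` are assumptions introduced for the example, since the text does not specify the form of the mapping or the width of the Gaussian.

```python
import math


def data_point_to_bp(x, calibration):
    """Map an instrument data point to base pairs.

    `calibration` is a hypothetical list of (data_point, base_pairs)
    pairs with distinct data points; linear interpolation between
    neighbouring pairs is an assumption of this sketch.
    """
    pts = sorted(calibration)
    for (x0, b0), (x1, b1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return b0 + (b1 - b0) * (x - x0) / (x1 - x0)
    raise ValueError("data point outside the calibrated range")


def reconstruct_pattern(peaks_bp, start_bp, n_samples, step_bp=1.0, sigma_bp=1.0):
    """Sum one unit-height Gaussian per peak on a regular base-pair grid."""
    signal = [0.0] * n_samples
    for mu in peaks_bp:
        for k in range(n_samples):
            x = start_bp + k * step_bp
            signal[k] += math.exp(-0.5 * ((x - mu) / sigma_bp) ** 2)
    return signal
```

Choosing `n_samples` as a power of two gives a signal that can be passed directly to a wavelet packet decomposition.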
Results and Discussion
We have applied the WPT to the 6 union signals obtained from the three pairs of twins and we have looked for the coefficients that discriminate between a given twin pair and the others. The analysis of sample replicates reduces the experimental variability, which is mainly due to unpredictable errors in the PCR reaction and in the electrophoretic separation. This reproduces the conditions encountered in the routine biological use of inter-Alu PCR and other similar methodologies. In this case the variability between the twins of a given pair, who share the same genomic DNA sequence, can be explained by differences in DNA quality, purity, presence of contaminants and other unpredictable differences generated during the extraction and preparation of the DNA samples (which could in principle partially depend on pre-existing biochemical or biological differences between the blood samples). The variability may appear as slightly different peak positions or as a different degree of amplification of an inter-Alu sequence, which could produce non-detectable signals (absence of a peak in one twin).
In order to relate the δ value to the actual differences in the inter-Alu patterns, we have to normalize the signals to the area of the support region of the wavelet function associated with the c_ji coefficient. If the union signals have a single peak in the considered region, the criterion is satisfied when the peak positions of different twin pairs are shifted by at least 2 bp with respect to the measured difference between the peak positions within the same twin pair. In contrast, if we are analyzing regions where several peaks are present, the criterion (2) takes into account the correlation among the peak positions in the signal, and it is satisfied when the global difference between the patterns of different twin pairs is more than 1/3 of the total signal area plus the experimental variability of the twin signals.
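The idea of comparing between-pair differences with within-pair variability can be illustrated with the simplified sketch below. It is not the δ criterion used in this work, whose exact definition is not reproduced here; the scoring rule, the `margin` parameter and the function name `classifying_coefficients` are assumptions chosen only to show the logic on wavelet packet coefficients.

```python
def classifying_coefficients(couples, margin=0.0):
    """Flag coefficients that separate one twin pair (couple) from all others.

    `couples` is a list of (twin_a, twin_b) tuples of equally long
    coefficient lists. A coefficient is assigned to a couple when its
    smallest distance from every twin of the other couples exceeds the
    largest within-couple difference by more than `margin`.
    This rule is an illustrative stand-in, not the paper's criterion.
    """
    n_coeff = len(couples[0][0])
    selected = {k: [] for k in range(len(couples))}
    for i in range(n_coeff):
        # Largest twin-to-twin difference: a proxy for experimental variability.
        within = max(abs(a[i] - b[i]) for a, b in couples)
        for k, (a, b) in enumerate(couples):
            others = [t[i] for m, pair in enumerate(couples) if m != k for t in pair]
            # Smallest distance between this couple and any other twin.
            between = min(abs(v - o) for v in (a[i], b[i]) for o in others)
            if between - within > margin:
                selected[k].append(i)
    return selected
```

With three twin pairs, a coefficient is kept for a pair only if both of its members differ from all four remaining twins by more than the twins of any pair differ from each other, which mirrors the requirement that genetic differences exceed the experimental variability.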
Global classifying regions obtained using the first marker (Tet fluorochrome)
Global classifying regions obtained using the second marker (Fam fluorochrome)
a rapid, computer-assisted detection of variable peaks;
an automated comparison of different replicates of the same sample, and an automatic "extraction" of reproducible signals;
a better sensitivity, with the ability to detect a higher number of polymorphic regions.
Moreover, although we developed an algorithm specifically optimized to analyze inter-Alu PCR patterns, the method is in principle applicable to any fingerprint-type pattern obtained by analyzing anonymous DNA fragments through capillary electrophoresis, and could be usefully applied to a wide range of fingerprint-type methodologies. It is important to note that new high-throughput methods based on DNA sequencing and on TIP-chip microarray analysis [36, 37] have recently been presented, which aim to perform a locus-by-locus detection of Alu mutations and polymorphisms across the whole genome. The first results obtained with these methodologies [35, 37] have begun to clarify the importance of the mutagenesis mediated by Alu sequences and other retrotransposons in human genome variation and in various disease conditions. However, because of their inherent complexity and high cost, these high-throughput methodologies are not likely to replace inter-Alu PCR (at least in the next few years) in all those situations in which limited time or budget is a constraint (for example, the diagnostic examination of disease states in which Alu-associated genetic variation has been found to be important). The availability of a computational method capable of speeding up, simplifying and standardizing the analysis of inter-Alu PCR patterns will be a valuable aid for the routine use of inter-Alu analysis.
This work was partially supported by the European Union Grants GEHA (LSHM-CT-2004-503270).
1. Williams JG, Kubelik AR, Livak KJ, Rafalski JA, Tingey SV: DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Res 1990, 18: 6531–6535. doi:10.1093/nar/18.22.6531
2. Welsh J, McClelland M: Fingerprinting genomes using PCR with arbitrary primers. Nucleic Acids Res 1990, 18: 7213–7218. doi:10.1093/nar/18.24.7213
3. Zietkiewicz E, Rafalski A, Labuda D: Genome fingerprinting by simple sequence repeat (SSR)-anchored polymerase chain reaction amplification. Genomics 1994, 20: 176–183. doi:10.1006/geno.1994.1151
4. Liang P, Pardee AB: Differential display of eukaryotic messenger RNA by means of the polymerase chain reaction. Science 1992, 257: 967–971. doi:10.1126/science.1354393
5. Vos P, Hogers R, Bleeker M, Reijans M, van de Lee T, Hornes M, Frijters A, Pot J, Peleman J, Kuiper M, et al.: AFLP: a new technique for DNA fingerprinting. Nucleic Acids Res 1995, 23: 4407–4414.
6. Nelson DL, Ledbetter SA, Corbo L, Victoria MF, Ramirez-Solis R, Webster TD, Ledbetter DH, Caskey CT: Alu polymerase chain reaction: a method for rapid isolation of human-specific sequences from complex DNA sources. Proc Natl Acad Sci USA 1989, 86: 6686–6690. doi:10.1073/pnas.86.17.6686
7. Lavanya GR, Srivastava J, Ranade SA: Molecular assessment of genetic diversity in mung bean germplasm. J Genet 2008, 87: 65–74. doi:10.1007/s12041-008-0009-3
8. Shifat R, Begum A, Khan H: Use of RAPD fingerprinting for discriminating two populations of Hilsa shad (Tenualosa ilisha Ham.) from inland rivers of Bangladesh. J Biochem Mol Biol 2003, 36: 462–467.
9. Johnson EL, Zhang D, Emche SD: Inter- and Intra-specific Variation among Five Erythroxylum Taxa Assessed by AFLP. Ann Bot (Lond) 2005, 95: 601–608. doi:10.1093/aob/mci062
10. Bonafè M, Cardelli M, Marchegiani F, Cavallone L, Giovagnetti S, Olivieri F, Lisa R, Pieri C, Franceschi C: Increase of homozygosity in centenarians revealed by a new inter-Alu PCR technique. Experimental Gerontology 2001, 36: 1063–1073.
11. Sinnett D, Deragon JM, Simard LR, Labuda D: Alumorphs human DNA polymorphisms detected by polymerase chain reaction using Alu-specific primers. Genomics 1990, 7: 331–334. doi:10.1016/0888-7543(90)90166-R
12. Cardelli M: Alu PCR. Methods Mol Biol 2011, 687: 221–229.
13. Jarnik M, Tang JQ, Korab-Laskowska M, Zietkiewicz E, Cardinal G, Gorska-Flipot I, Sinnett D, Labuda D: Overall informativity, OI, in DNA polymorphisms revealed by inter-Alu PCR: detection of genomic rearrangements. Genomics 1996, 36: 388–398. doi:10.1006/geno.1996.0483
14. Lander ES, Linton L, Birren B, Nusbaum C, Zody M, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W: Initial sequencing and analysis of the human genome. Nature 2001, 409: 860–921. doi:10.1038/35057062
15. Zietkiewicz E, Labuda M, Sinnett D, Glorieux FH, Labuda D: Linkage mapping by simultaneous screening of multiple polymorphic loci using Alu oligonucleotide-directed PCR. Proc Natl Acad Sci USA 1992, 89: 8448–8451. doi:10.1073/pnas.89.18.8448
16. Mighell AJ, Markham AF, Robinson PA: Alu sequences. FEBS Lett 1997, 417: 1–5. doi:10.1016/S0014-5793(97)01259-3
17. Cardelli M, Marchegiani F, Cavallone L, Olivieri F, Giovagnetti S, Mugianesi E, Moresi R, Lisa R, Franceschi C: A polymorphism of the YTHDF2 gene (1p35) located in an Alu-rich genomic domain is associated with human longevity. J Gerontol A Biol Sci Med Sci 2006, 61: 547–556.
18. Krajinovic M, Richer C, Labuda D, Sinnett D: Detection of a mutator phenotype in cancer cells by inter-Alu polymerase chain reaction. Cancer Res 1996, 56: 2733–2737.
19. Furmaga WB, Cole SR, Tsongalis GJ: The use of Alu-PCR to distinguish between typical pulmonary carcinoids versus classic midgut carcinoids. Int J Oncol 2004, 24: 223–226.
20. McKie AB, Iwamura T, Leung HY, Hollingsworth MA, Lemoine NR: Alu-polymerase chain reaction genomic fingerprinting technique identifies multiple genetic loci associated with pancreatic tumourigenesis. Genes Chromosomes Cancer 1997, 18: 30–41. doi:10.1002/(SICI)1098-2264(199701)18:1<30::AID-GCC4>3.0.CO;2-2
21. Furmaga WB, Ryan JL, Coleman SR, Tsongalis GJ: Alu profiling of primary and metastatic non-small cell lung cancer. Exp Mol Pathol 2003, 74: 224–229. doi:10.1016/S0014-4800(03)00016-9
22. Srivastava T, Seth A, Datta K, Chosdol K, Chattopadhyay P, Sinha S: PCR detects high frequency of genetic alterations in glioma cells exposed to sub-lethal cisplatin. Int J Cancer 2005, 117: 683–689. doi:10.1002/ijc.21057
23. Pal A, Srivastava T, Sharma MK, Mehndiratta M, Das P, Sinha S, Chattopadhyay P: Aberrant methylation and associated transcriptional mobilization of Alu elements contributes to genomic instability in hypoxia. J Cell Mol Med, published online Jun 2009.
24. Moyzis RK, Torney DC, Meyne J, Buckingham JM, Wu JR, Burks C, Sirotkin KM, Goad WB: The distribution of interspersed repetitive DNA sequences in the human genome. Genomics 1989, 4: 273–289. doi:10.1016/0888-7543(89)90331-5
25. Jensen A, la Cour-Harbo A: Ripples in Mathematics: The Discrete Wavelet Transform. New York: Springer-Verlag; 2001.
26. Klevecz RR: Dynamic architecture of the yeast cell cycle uncovered by wavelet decomposition of expression microarray data. Funct Integr Genomics 2000, 1: 186–192. doi:10.1007/s101420000027
27. Wang J, Ma JZ, Li MD: Normalization of cDNA Microarray Data Using Wavelet Regressions. Combinatorial Chemistry & High Throughput Screening 2004, 7: 783–791.
28. Wen SY, Zhang CT: Identification of isochore boundaries in the human genome using the technique of wavelet multiresolution analysis. Biochemical and Biophysical Research Communications 2003, 311: 215–222. doi:10.1016/j.bbrc.2003.09.198
29. Lio P, Vannucci M: Finding pathogenicity islands and gene transfer events in genome data. Bioinformatics 2000, 16: 932–940. doi:10.1093/bioinformatics/16.10.932
30. Lio P: Wavelets in bioinformatics and computational biology: state of art and perspectives. Bioinformatics 2003, 19: 2–9. doi:10.1093/bioinformatics/19.1.2
31. Coifman R, Wickerhauser MV: Entropy-Based Algorithms for Best Basis Selection. IEEE Transactions on Information Theory 1992, 38: 713–718. doi:10.1109/18.119732
32. Saito N, Coifman R: Improved local discriminant bases using empirical probability density estimation. Proceedings of Statistical Computing 1996.
33. Daubechies I: Ten lectures on wavelets. Philadelphia: Society for Industrial and Applied Mathematics; 1992.
34. Stenger JE, Lobachev KS, Gordenin D, Darden TA, Jurka J, Resnick MA: Biased distribution of inverted and direct Alus in the human genome: implications for insertion, exclusion, and genome stability. Genome Res 2001, 11: 12–27. doi:10.1101/gr.158801
35. Iskow RC, McCabe MT, Mills RE, Torene S, Pittard WS, Neuwald AF, Van Meir EG, Vertino PM, Devine SE: Natural mutagenesis of human genomes by endogenous retrotransposons. Cell 2010, 141: 1253–1261. doi:10.1016/j.cell.2010.05.020
36. Cardelli M, Marchegiani F, Franceschi C, Lattanzio F, Provinciali M: Alu insertion site profiling in the human genome (abstract). New Biotechnology 2010, 27: S38. doi:10.1016/j.nbt.2010.01.050
37. Huang CR, Schneider AM, Lu Y, Niranjan T, Shen P, A M, P SJ, Valle D, Civin CI, Wang T, Wheelan SJ, Ji H, Boeke JD, Burns KH: Mobile interspersed repeats are major structural variants in the human genome. Cell 2010, 141: 1171–1182. doi:10.1016/j.cell.2010.05.026
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.