methBLAST and methPrimerDB: web-tools for PCR based methylation analysis
BMC Bioinformatics volume 7, Article number: 496 (2006)
DNA methylation plays an important role in development and tumorigenesis by epigenetic modification and silencing of critical genes. The development of PCR-based methylation assays on bisulphite modified DNA heralded a breakthrough in speed and sensitivity for gene methylation analysis. Despite this technological advancement, these approaches require a cumbersome gene by gene primer design and experimental validation. Bisulphite DNA modification results in sequence alterations (all unmethylated cytosines are converted into uracils) and a general sequence complexity reduction as cytosines become underrepresented. Consequently, standard BLAST sequence homology searches cannot be applied to search for specific methylation primers.
To address this problem we developed methBLAST, a sequence similarity search program, based on the original BLAST algorithm but querying in silico bisulphite modified genome sequences to evaluate oligonucleotide sequence similarities. Apart from the primer specificity analysis tool, we have also developed a public database termed methPrimerDB for the storage and retrieval of validated PCR based methylation assays. The web interface allows free public access to perform methBLAST searches or database queries and to submit user based information. Database records can be searched by gene symbol, nucleotide sequence, analytical method used, Entrez Gene or methPrimerDB identifier, and submitter's name. Each record contains a link to Entrez Gene and PubMed to retrieve additional information on the gene, its genomic context and the article in which the methylation assay was described. To assure and maintain data integrity and accuracy, the database is linked to other reference databases. Currently, the database contains primer records for the most popular PCR-based methylation analysis methods to study human, mouse and rat epigenetic modifications. methPrimerDB and methBLAST are available at http://medgen.ugent.be/methprimerdb and http://medgen.ugent.be/methblast.
We have developed two integrated and freely available web-tools for PCR based methylation analysis. methBLAST allows in silico assessment of primer specificity in PCR based methylation assays that can be stored in the methPrimerDB database, which provides a search portal for validated methylation assays.
Alterations in the patterns of DNA methylation are among the earliest and most common events in tumorigenesis [1, 2]. In the mammalian genome, methylation takes place mostly at cytosine bases that are located 5' to a guanosine in a CpG dinucleotide. While this dinucleotide is generally underrepresented in the genome, short regions are found that are rich in CpG content. Such CpG-rich regions are part of gene promoters and are termed CpG islands. Both global hypomethylation and regional promoter hypermethylation have been described in a wide spectrum of cancers. Hypomethylation (or absence of methylation) of CpG islands increases potential gene activity, whereas hypermethylation of these promoter-containing CpG islands is associated with decreased gene activity or silencing. The development of efficient and accurate methods to study cytosine methylation is therefore of critical importance in understanding the role of DNA methylation in the development and progression of cancer. Furthermore, methylation markers open perspectives for earlier detection of malignancies and possibly better prognostic assessment of patients.
Several methods have been described for evaluation of cytosine methylation, including digestion of DNA with methylation-sensitive restriction enzymes followed by Southern blotting or polymerase chain reaction (PCR). Southern blotting requires large amounts of high molecular weight DNA, which limits the use of this technique. This limitation is counteracted by performing PCR, but both methods still rely on complete enzymatic digestion of the DNA in order to prevent false-positive results. Instead of using methylation-sensitive restriction enzymes, other methods are based on sodium bisulphite treatment of the DNA to introduce methylation-dependent sequence differences into the genomic DNA. Sodium bisulphite converts unmethylated cytosine to uracil while leaving 5-methylcytosine unchanged. Nowadays, the most frequently used DNA methylation analysis methods employ a combination of bisulphite treatment and PCR. The methylation-sensitive single-nucleotide primer extension (Ms-SNuPE) method incorporates amplification of bisulphite-treated DNA, followed by a quantification of the ratio of methylated versus unmethylated cytosines at CpG sites. An alternative method, called combined bisulphite restriction analysis (COBRA), uses standard sodium bisulphite treatment and PCR, followed by restriction digestion and a quantitation step. A third procedure combines a bisulphite treatment and PCR-single-strand conformation polymorphism analysis (Bisulphite-PCR-SSCP or BiPS). In a first step, the converted DNA is amplified with primers that have no CpG sites in the corresponding region of the original DNA, and as such amplify both unmethylated and methylated DNA. Sequence differences between amplified products from unmethylated and methylated DNA are visualised on an SSCP gel. The fourth and one of the most popular methods is methylation-specific PCR (MSP). It heralded a breakthrough in speed and sensitivity for gene methylation analysis.
After bisulphite conversion, PCR is performed using primers that distinguish methylated from unmethylated DNA. Unlike the procedures using restriction enzymes, MSP can be used to analyse any specific CpG site by appropriate primer design and it is not prone to false-positive results. MSP is very sensitive, permitting the analysis of small and heterogeneous samples, including paraffin-embedded material. A fifth method uses a sequencing strategy to analyse the methylation status of a target sequence (bisulphite sequencing or BiSeq). Bisulphite converted DNA is amplified by PCR and subsequently sequenced to assess the methylation status of individual CpG sites by sequence comparison with a reference sequence. A cloning step is introduced before the sequencing if the starting material contains a mixture of cells with different methylation levels. Although the above described PCR-based DNA methylation analysis methods are easy to use, sensitive and specific, the design and experimental validation/optimisation of the primers is often difficult and labour intensive, and precludes a certain level of standardization and uniformity. To reduce the number of difficult or even unsuccessful experimental PCR optimizations, we developed methBLAST to quickly assess the specificity of a primer pair prior to the experimental evaluation step, very much like the widely accepted (or even obligatory) conventional PCR primer specificity analysis using default BLAST. Another important problem encountered during methylation analysis is the difficulty of retrieving methylation assay information for a given gene of interest with normal literature search tools. Therefore, we developed a public repository holding essential assay information (including primer sequences) for the five major PCR-based methods for DNA methylation analysis of human, mouse and rat genomes.
Results and discussion
Performing a methBLAST search is similar to and as fast as regular BLAST. The input page is divided into three parts. The first component contains a query box and two input fields for primer sequences. The query box is suited to paste a sequence in FASTA format. Primer sequence alignment can be performed by entering the forward and reverse primer sequence of an assay into the appropriate input fields. The primer sequences will be concatenated with three N's when processed by the methBLAST server. This will guarantee a correct separation of the forward and reverse sequence during the alignment step. The middle part lists the query processing options where the target species and alignment options should be selected. Only alignments against human, mouse and rat sequences from four different databases are available. The databases contain human, mouse or rat sequences from GenBank for which complete CpG methylation and bisulphite modification are simulated. Because of this modification, the two daughter strands of any given sequence are no longer complementary after treatment. As either strand can serve as template for subsequent PCR amplification, we perform in silico bisulphite modification on both strands, assuming either an unmethylated or methylated CpG status. In sequences assumed to be completely unmethylated, all cytosines (C) are replaced by thymines (T), the DNA counterpart of uracil (U). In sequences assumed to be completely methylated, only the C's not followed by a G are replaced. This results in four different sequences (methylated and unmethylated for each strand) per GenBank sequence (see Figure 1). The output format is adjustable by the options provided in the bottom section. An output window renders all relevant hits of the test sequence starting with the best alignments (see Figure 2).
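The conversion rules described above can be illustrated with a minimal Python sketch. It is not the Perl script used to build the methBLAST databases; the function names are hypothetical, the input is assumed to contain only A, C, G, T and N, and the reverse strand is assumed to be converted after taking the reverse complement of the GenBank sequence. The four dictionary keys reuse the methBLAST database names.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = str.maketrans("ACGTN", "TGCAN")
    return seq.upper().translate(complement)[::-1]


def bisulphite_convert(seq: str, methylated: bool) -> str:
    """Simulate bisulphite treatment of one strand, read 5' to 3'.

    Unmethylated: every C becomes T.
    Methylated: a C directly followed by a G (CpG) is kept,
    every other C becomes T.
    """
    seq = seq.upper()
    if not methylated:
        return seq.replace("C", "T")
    converted = []
    for i, base in enumerate(seq):
        # seq[i + 1:i + 2] is "" at the last position, so a trailing C is converted.
        if base == "C" and seq[i + 1:i + 2] != "G":
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def four_converted_sequences(seq: str) -> dict[str, str]:
    """Return the four in silico converted versions of one sequence."""
    rc = reverse_complement(seq)  # assumption: RC is taken before conversion
    return {
        "BISUL_METH_FW": bisulphite_convert(seq, methylated=True),
        "BISUL_METH_RC": bisulphite_convert(rc, methylated=True),
        "BISUL_UNMETH_FW": bisulphite_convert(seq, methylated=False),
        "BISUL_UNMETH_RC": bisulphite_convert(rc, methylated=False),
    }


def primer_query(forward: str, reverse: str) -> str:
    """Join a primer pair with three N's, as done for primer queries."""
    return forward.upper() + "NNN" + reverse.upper()
```

For the toy sequence `ACGTCCG`, the methylated forward version keeps both CpG cytosines and converts only the remaining C, whereas the unmethylated version converts all three.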
Depending on the database used, the sequence similarity search will be performed on either forward and reverse complement methylated (BISUL_METH_FW, BISUL_METH_RC), or forward and reverse complement unmethylated sequences (BISUL_UNMETH_FW, BISUL_UNMETH_RC). The user has to interpret the output in the same way as the BLAST output of a primer pair for normal PCR applications. A hit is only relevant if it reveals alignment of both primers at a distance close enough to generate exponential amplification. A well designed primer pair aligns exclusively with the target region, ranked high in the BLAST output. Partial alignment of the primers within a short distance at a different genomic location indicates that an assay using these primers could be non-specific and thus less reliable. Partial alignment of the 3' end of the primers in particular increases the chance of non-specific amplification. The results of 14 different methBLAST searches, shown in Table 1, display the differences in 'Score' and 'E value' of correct alignments, which are mostly influenced by the primer length and constitution. It is impossible to use thresholds for the 'Score' and 'E value' to analyse a methBLAST output because correct alignments and misalignments show overlapping values between different primer pairs.
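The relevance criterion (both primers aligning to the same sequence, facing each other, at a distance short enough for exponential amplification) can be sketched in Python as follows. This is only an illustration of the reasoning a user applies when reading the output, not part of methBLAST. The `Hit` record, the function name and the 1000 bp default are assumptions, and the sketch ignores alignment quality, including partial 3' end matches.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hit:
    """One alignment of a primer on a database sequence (hypothetical record)."""
    primer: str        # "forward" or "reverse"
    subject: str       # identifier of the database sequence that was hit
    start: int         # 1-based alignment start on the subject
    end: int           # 1-based alignment end on the subject
    plus_strand: bool  # True if the primer aligns to the subject as stored


def candidate_amplicons(
    hits: List[Hit], max_size: int = 1000
) -> List[Tuple[Hit, Hit, int]]:
    """Pair forward and reverse primer hits that could support amplification.

    A pair is kept when both hits lie on the same subject, on opposite
    strands, with the plus-strand hit upstream of the minus-strand hit,
    and the implied product is at most max_size bases long.
    """
    forward = [h for h in hits if h.primer == "forward"]
    reverse = [h for h in hits if h.primer == "reverse"]
    pairs = []
    for f in forward:
        for r in reverse:
            if f.subject != r.subject or f.plus_strand == r.plus_strand:
                continue
            plus, minus = (f, r) if f.plus_strand else (r, f)
            left = min(plus.start, plus.end)
            right = max(minus.start, minus.end)
            size = right - left + 1
            # Primers must converge: the plus-strand hit ends before
            # the minus-strand hit begins.
            converging = min(minus.start, minus.end) > max(plus.start, plus.end)
            if converging and size <= max_size:
                pairs.append((f, r, size))
    # Shortest candidate products first.
    return sorted(pairs, key=lambda pair: pair[2])
```

A pair reported on a sequence other than the intended target would be a reason to inspect the corresponding alignments more closely, not proof of non-specific amplification.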
Errors in primer sequences leading to incorrect alignments can be quickly identified after a methBLAST search. To demonstrate the usefulness of methBLAST we performed an MSP analysis of the CDKN2A gene using the primers and procedures published in . However, we never succeeded in obtaining a PCR product (data not shown) and therefore evaluated the primers from  (submitted in methPrimerDB (see further) with ID 17). This assay was successful upon first attempt (data not shown) and the methBLAST outputs of both primer sets show correct alignment with the target sequence (see #11 and #12 in Table 1). On the other hand, the primer sets published in  show only incomplete or even unsuccessful alignment (see #13 and #14 in Table 1). The forward primers of both assays are identical but the reverse primers from Ueki et al. appear to contain sequence errors that caused alignment problems in methBLAST and subsequent experimental failure (see Table 2).
If a custom designed PCR methylation assay passes the in silico specificity requirements (determined by methBLAST) and further experimental evaluation, submission of the assay information to methPrimerDB is encouraged. In addition, authors of publications in which methylation-specific PCR, Bisulphite-PCR-SSCP, Ms-SNuPE, COBRA or BiSeq assays are developed are kindly invited to submit their validated primer sequences. On-line data submissions are possible after free registration. During registration, personal submitter details are provided, after which an email is sent with the login name and a temporary password. By changing this password to a more convenient one, the registration is complete and the user can log in to the system and submit primer sets. For submission of large datasets, a compressed file is available in the download section of the website; it contains guidelines and an empty table to be completed with the required information.
New primer records should contain the official gene name, the species name, the application in which the primers are used, the nucleotide sequences of the primers, and other assay specific fields. In addition, each record provides the possibility to add submitter's remarks. Data submissions for DNA methylation analysis on human, rat and mouse are allowed, as for these organisms proper controls with respect to accuracy of the gene name fields are available via Entrez Gene and the nomenclature databases for these organisms: HGNC (HUGO Gene Nomenclature Committee) for human, MGD (Mouse Genome Database) for mouse, and RatMap and RGD (Rat Genome Database) for rat. This eliminates the presence of aliases or synonyms for official gene symbols in the database. Finally, the possibility to link the PubMed ID of an article in which the use of a PCR methylation assay is reported makes the record more trustworthy. The web based search engine makes it possible to query the database in different ways: by type of application, organism, gene name/symbol, primer sequence, Entrez Gene ID, PubMed ID, or submitter's name. Search results are listed as a summary of links to individual assay reports (see Figure 3). Each primer set has a unique methPrimerDB identifier that can be used to access it directly or to refer to it in a publication (see Figure 4). Data integrity checks are performed during the data submission procedure. To guarantee data accuracy, the sequences in the database will be analysed at regular intervals by methBLAST search. Upon detection of possible sequence or other errors, the responsible submitter will be contacted by email.
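As an illustration of the kind of integrity check that can be applied to a submitted record, the following Python sketch validates the required fields listed above. It does not reflect the actual methPrimerDB schema or its PHP/Oracle implementation. The class, field names and error messages are hypothetical, accepting IUPAC ambiguity codes in primers is an assumption, and the check of gene symbols against Entrez Gene and the nomenclature databases is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional

# Species and applications named in the text.
ALLOWED_SPECIES = {"human", "mouse", "rat"}
ALLOWED_APPLICATIONS = {"MSP", "BiPS", "Ms-SNuPE", "COBRA", "BiSeq"}
# Assumption: primers may contain IUPAC ambiguity codes such as Y or R.
IUPAC_NUCLEOTIDES = set("ACGTRYSWKMBDHVN")


@dataclass
class PrimerRecord:
    """Hypothetical in-memory form of one submitted assay."""
    gene_symbol: str
    species: str
    application: str
    forward_primer: str
    reverse_primer: str
    pubmed_id: Optional[int] = None
    remarks: str = ""


def validation_errors(record: PrimerRecord) -> List[str]:
    """Return a list of problems found in a record (empty if none)."""
    errors = []
    if not record.gene_symbol.strip():
        errors.append("gene symbol is missing")
    if record.species.lower() not in ALLOWED_SPECIES:
        errors.append(f"unsupported species: {record.species}")
    if record.application not in ALLOWED_APPLICATIONS:
        errors.append(f"unsupported application: {record.application}")
    primers = (("forward", record.forward_primer),
               ("reverse", record.reverse_primer))
    for label, seq in primers:
        invalid = sorted(set(seq.upper()) - IUPAC_NUCLEOTIDES)
        if not seq:
            errors.append(f"{label} primer is empty")
        elif invalid:
            errors.append(
                f"{label} primer has invalid characters: {''.join(invalid)}"
            )
    return errors
```

The primer sequences used to exercise such a check should be treated as placeholders; for example, `PrimerRecord("CDKN2A", "human", "MSP", "TTAGGCGTTTCG", "AACCRAACTACG")` uses made-up sequences and passes only because its fields are well formed.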
We are planning to implement an additional feature in methPrimerDB to store the valuable feedback on assay performance from users who tested an assay from the database. The extension of the submitter's feedback section with the experimental evaluation details provided by the submitter as well as user's feedback will allow a better assessment of the quality of an individual assay. Although methPrimerDB is developed to let authors submit their own validated assays, we will populate the database in the near future with manually reviewed assays from recent literature.
methBLAST and methPrimerDB are web-tools to improve the design and use of PCR-based methylation assays. A sequence homology search for methylation primers with methBLAST enables specificity assessment before experimental evaluation of a new assay. To reduce the labour-intensive design of new assays, validated methylation assays can now be stored in and retrieved from methPrimerDB, a publicly accessible database. The database is intended to be a search portal for validated methylation assays and aims to establish a certain level of standardization and uniformity in the use of PCR based methylation assays.
Both systems run on an Apache web server in a Linux environment. methBLAST is based on NCBI's BLAST server. The databases are generated by an in-house developed Perl script (available upon request) that converts a subset of NCBI's nt database, which contains all non-redundant GenBank+EMBL+DDBJ+PDB nucleotide sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences). methPrimerDB data is stored and managed by an Oracle 9i relational database management system. The web interface to query the database is based on PHP scripts using the Oracle Call Interface (OCI). The database information and passwords are protected by the Oracle database management system, which controls the access rights to the different tables.
Availability and requirements
Bird A: DNA methylation patterns and epigenetic memory. Genes Dev 2002, 16(1):6–21. 10.1101/gad.947102
Feinberg AP, Ohlsson R, Henikoff S: The epigenetic progenitor origin of human cancer. Nat Rev Genet 2006, 7(1):21–33. 10.1038/nrg1748
Liu ZJ, Maekawa M: Polymerase chain reaction-based methods of DNA methylation analysis. Anal Biochem 2003, 317(2):259–265. 10.1016/S0003-2697(03)00169-6
Esteller M, Sanchez-Cespedes M, Rosell R, Sidransky D, Baylin SB, Herman JG: Detection of aberrant promoter hypermethylation of tumor suppressor genes in serum DNA from non-small cell lung cancer patients. Cancer Res 1999, 59(1):67–70.
Jones PA, Baylin SB: The fundamental role of epigenetic events in cancer. Nat Rev Genet 2002, 3(6):415–428.
Laird PW: The power and the promise of DNA methylation markers. Nat Rev Cancer 2003, 3(4):253–266. 10.1038/nrc1045
Fraga MF, Esteller M: DNA methylation: a profile of methods and applications. Biotechniques 2002, 33(3):632, 634, 636–49.
Gonzalgo ML, Jones PA: Rapid quantitation of methylation differences at specific sites using methylation-sensitive single nucleotide primer extension (Ms-SNuPE). Nucleic Acids Res 1997, 25(12):2529–2531. 10.1093/nar/25.12.2529
Xiong Z, Laird PW: COBRA: a sensitive and quantitative DNA methylation assay. Nucleic Acids Res 1997, 25(12):2532–2534. 10.1093/nar/25.12.2532
Maekawa M, Sugano K, Kashiwabara H, Ushiama M, Fujita S, Yoshimori M, Kakizoe T: DNA methylation analysis using bisulfite treatment and PCR-single-strand conformation polymorphism in colorectal cancer showing microsatellite instability. Biochem Biophys Res Commun 1999, 262(3):671–676. 10.1006/bbrc.1999.1230
Herman JG, Graff JR, Myohanen S, Nelkin BD, Baylin SB: Methylation-specific PCR: a novel PCR assay for methylation status of CpG islands. Proc Natl Acad Sci U S A 1996, 93(18):9821–9826. 10.1073/pnas.93.18.9821
Frommer M, McDonald LE, Millar DS, Collis CM, Watt F, Grigg GW, Molloy PL, Paul CL: A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc Natl Acad Sci U S A 1992, 89(5):1827–1831. 10.1073/pnas.89.5.1827
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215(3):403–410. 10.1006/jmbi.1990.9999
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL: GenBank. Nucleic Acids Res 2006, 34(Database issue):D16–20. 10.1093/nar/gkj157
Ueki T, Toyota M, Sohn T, Yeo CJ, Issa JP, Hruban RH, Goggins M: Hypermethylation of multiple genes in pancreatic adenocarcinoma. Cancer Res 2000, 60(7):1835–1839.
Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res 2005, 33(Database issue):D54–8. 10.1093/nar/gki031
Eyre TA, Ducluzeau F, Sneddon TP, Povey S, Bruford EA, Lush MJ: The HUGO Gene Nomenclature Database, 2006 updates. Nucleic Acids Res 2006, 34(Database issue):D319–21. 10.1093/nar/gkj147
Blake JA, Eppig JT, Bult CJ, Kadin JA, Richardson JE: The Mouse Genome Database (MGD): updates and enhancements. Nucleic Acids Res 2006, 34(Database issue):D562–7. 10.1093/nar/gkj085
Petersen G, Johnson P, Andersson L, Klinga-Levan K, Gomez-Fabre PM, Stahl F: RatMap--rat genome tools and data. Nucleic Acids Res 2005, 33(Database issue):D492–4. 10.1093/nar/gki125
de la Cruz N, Bromberg S, Pasko D, Shimoyama M, Twigger S, Chen J, Chen CF, Fan C, Foote C, Gopinath GR, Harris G, Hughes A, Ji Y, Jin W, Li D, Mathis J, Nenasheva N, Nie J, Nigam R, Petri V, Reilly D, Wang W, Wu W, Zuniga-Meyer A, Zhao L, Kwitek A, Tonellato P, Jacob H: The Rat Genome Database (RGD): developments towards a phenome database. Nucleic Acids Res 2005, 33(Database issue):D485–91. 10.1093/nar/gki050
Pattyn F, Robbrecht P, De Paepe A, Speleman F, Vandesompele J: RTPrimerDB: the real-time PCR primer and probe database, major update 2006. Nucleic Acids Res 2006, 34(Database issue):D684–8. 10.1093/nar/gkj155
We greatly acknowledge the help and insightful suggestions from Christoph Grunau and Stephen Altschul in the development and evaluation of methBLAST. The authors are most grateful for support from BioScope-IT, a Bioinformatics Service Project within the context of the Flemish Innovation Network funded by the 'Instituut voor de aanmoediging van innovatie door Wetenschap en Technologie in Vlaanderen' (IWT 040571 BIO-IT service project). Filip Pattyn is a Research Assistant and Jo Vandesompele a Postdoctoral Researcher of the Research Foundation – Flanders (FWO – Vlaanderen). Jasmien Hoebeeck is supported by the Vlaamse Liga tegen Kanker through a grant of the Stichting Emmanuel van der Schueren. This study is supported by GOA-grant 12051203, FWO-grant G.0185.04, G.1.5.243.05 and G.0106.05, and a research grant from the Childhood Cancer Fund 'Kinderkankerfonds' (a non-profit childhood cancer foundation under Belgian law). The Belgian EMBnet Node is funded by the Belgian Science Policy. This text presents research results of the Belgian program of Interuniversity Poles of Attraction initiated by the Belgian State, Prime Minister's Office, Science Policy Programming (IUAP).
FP conceived the study, carried out the development of methPrimerDB, manages and populated the database, participated in the development of methBLAST and drafted the manuscript. JH tested the usefulness of methBLAST by experimental evaluation of MSP primers and participated in methPrimerDB data submission. PR developed and manages methBLAST and participated in the deployment of the BLAST software. EM participated in the experimental evaluation of methBLAST and participated in the methPrimerDB data submission. ADP and FS coordinated the study and provided critical input for the manuscript. GB, DC and RH carried out the deployment of the BLAST software. JV conceived the study, participated in its design and coordination and was the final editor of the manuscript.
Electronic supplementary material
Additional file 1: Perl bisulphite conversion script. Perl script for creation of BLAST database files containing in silico methylated and bisulphite treated sequences. (PL 9 KB)
Cite this article
Pattyn, F., Hoebeeck, J., Robbrecht, P. et al. methBLAST and methPrimerDB: web-tools for PCR based methylation analysis. BMC Bioinformatics 7, 496 (2006). https://doi.org/10.1186/1471-2105-7-496