- Research article
- Open Access
Identification of a small optimal subset of CpG sites as bio-markers from high-throughput DNA methylation profiles
© Meng et al; licensee BioMed Central Ltd. 2008
- Received: 18 June 2008
- Accepted: 27 October 2008
- Published: 27 October 2008
DNA methylation patterns have been shown to significantly correlate with different tissue types and disease states. High-throughput methylation arrays enable large-scale DNA methylation analysis to identify informative DNA methylation biomarkers. The identification of disease-specific methylation signatures is of fundamental and practical interest for risk assessment, diagnosis, and prognosis of diseases.
Using published high-throughput DNA methylation data, a two-stage feature selection method was developed to select a small optimal subset of DNA methylation features to precisely classify two sample groups. With this approach, a small number of CpG sites were highly sensitive and specific in distinguishing lung cancer tissue samples from normal lung tissue samples.
This study shows that it is feasible to identify DNA methylation biomarkers from high-throughput DNA methylation profiles and that a small number of signature CpG sites can suffice to classify two groups of samples. The computational method we developed in the study efficiently identifies signature CpG sites from disease samples with complex methylation patterns.
- Support Vector Machine
- Feature Selection Method
- Recursive Feature Elimination
- Cell Line Dataset
- Normal Lung Tissue Sample
DNA methylation, which occurs when a methyl (CH3) group is added at the carbon 5 position of the cytosine ring of a CpG dinucleotide, is one of the epigenetic events that can affect gene expression without changing genomic sequence. For example, hypermethylation of CpG sites in the promoter region has been implicated in the inactivation of tumor suppressor genes [2, 3]. DNA methylation patterns have been shown to significantly correlate with clinical phenotypes [4–6]. DNA methylation signatures are excellent biomarker candidates because: 1) distinct DNA methylation profiles correspond to different tissue types and disease states, and each type or subtype of tumor has its own DNA methylation signature [5, 7]; 2) DNA methylation patterns change at early stages of disease progression, allowing earlier detection of diseases; 3) DNA methylation can be detected with high sensitivity; 4) DNA methylation biomarkers can be detected in peripheral biofluids [10, 11], such as blood, when it is not possible to obtain disease-tissue samples from patients. The identification of disease-specific methylation signatures is therefore of fundamental and practical interest for risk assessment, diagnosis, and prognosis of diseases.
High-throughput methylation arrays are now available to determine DNA methylation levels of thousands of CpG sites simultaneously [4, 5, 12–14]. This technology enables large-scale DNA methylation analysis to identify informative DNA methylation biomarkers. For example, experiments using high-throughput methylation arrays have demonstrated that colon, breast, lung, and prostate cancer cell lines each have their own methylation signature. It has also been shown that DNA methylation profiles could clearly distinguish human embryonic stem cells from cancer cells, adult stem cells, lymphoblastoid cells, and normal cells. Additionally, Bibikova et al. identified 55 CpG sites as the DNA methylation signature to distinguish normal lung tissue samples from lung cancer tissue samples.
Although the profiles from high-throughput methylation arrays contain a large number of CpG sites, many of them are irrelevant or redundant and provide little discriminatory information to classify samples. For clinical diagnosis, significant savings in cost can be achieved by measuring and verifying methylation levels of only a small number of CpG sites. Recent studies showed that a small discriminative set of features was sufficient to better classify samples in high-throughput gene expression analysis [15, 16].
The Support Vector Machine (SVM) is a state-of-the-art classification method (classifier or predictor) that has been widely used in microarray data analysis [18–21]. Although the SVM was designed to deal with datasets in high-dimensional space, it still suffers from the "curse of dimensionality", that is, the difficulty of learning from a small number of samples in a high-dimensional feature space. Including redundant and non-informative features in the analysis may cause the influence of discriminatory features to be lost in the noise, thus degrading the accuracy of the classifier. A large feature set may achieve low training error, but the ability to generalize to new data will decrease, resulting in overfitting.
Classification methods can be improved by feature selection, a process designed to select a small, optimal subset of features from the original redundant feature set. In general, feature selection methods fall into two categories: filter methods and wrapper methods. Filter methods select features independently of the classification method. One typical filter method is individual feature ranking, which is straightforward, computationally efficient, and widely used for gene selection in gene expression data analysis [24–26]. However, this method has several limitations. First, feature redundancy is common in the selected feature set and many features carry essentially the same discriminatory information. In addition, because individual feature ranking evaluates each feature independently, it does not detect dependencies among features and cannot determine which combination of features achieves the best classification. In contrast to filter methods, wrapper methods work with classifiers to determine feature selection based on the predictive accuracy of the classifiers [18, 21]. Although wrapper methods generally outperform filter methods, they are typically computationally intensive and may become intractable in practice for large feature sets. SVM_RFE (Recursive Feature Elimination) is a typical wrapper method that has displayed excellent prediction ability in microarray data analysis [18, 21]. Genetic algorithms (GAs) have been employed as feature selection methods in high-throughput biological data analysis [27–29], but are very time-consuming.
In this study, we investigated whether a small number of signature CpG sites are sufficient to predict phenotypic classes of two sample groups. A biomarker discovery algorithm was developed. This algorithm, here referred to as FW_SVM, uses a two-stage feature selection method by combining a Filter method and a Wrapper method and employs SVM as the classifier.
We used three published datasets generated by the Illumina GoldenGate® assay for DNA methylation (Illumina, San Diego, CA), where the reported β value indicates the methylation level of each CpG site [4, 5]. The first dataset included the DNA methylation profiles from 19 male and 25 female cell lines. The second dataset contained the DNA methylation profiles of 37 human embryonic stem cell (hES) and 24 cancer cell lines. The third dataset contained 23 lung adenocarcinoma and 23 normal lung tissue samples, 11 each from Philipps University of Marburg (Germany) and 12 each from the Pennsylvania State University College of Medicine Tumor Bank.
The data in each dataset were split into training and testing sets. The training set was used for feature selection and classifier training, and the testing set was used to evaluate algorithm performance.
Feature selection methods
Two filter methods for stage 1, namely Principal Component Analysis (PCA) and the Wilcoxon rank sum test, were tested separately in this study. PCA is a multivariate method that has been widely used for visualization of high-dimensional data, including high-throughput biological data, in low-dimensional space. PCA is seldom used for feature selection since each principal component is a linear combination of all original features and does not isolate or prioritize features. However, since the first several principal components typically capture most of the variability in the data, features that have large projections on those principal components account for the major sources of data variance. Accordingly, those features are likely good candidates as signature features for classification purposes. In the first filter method, when PCA was applied at the first feature selection stage, CpG sites with an absolute loading value greater than 0.1 for the first 10 principal components were selected as signature feature candidates and passed to SVM_RFE at the second stage. In the second filter method, we adopted the Wilcoxon rank sum test. In contrast to the PCA approach of selecting features with large variances across the entire dataset, individual feature ranking directly targeted the classification goal and selected a list of the most differentially methylated CpG sites as promising feature candidates. The CpG sites from the Wilcoxon rank sum test were sorted by their p-values in ascending order. The top 50 most differentially methylated CpG sites were selected as signature feature candidates, with the restriction that the difference in mean methylation level (β value) between the two groups was greater than 0.15.
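The PCA-based filter described above can be sketched as follows. This is a minimal illustration in Python (the study itself used MATLAB), not the authors' script. It assumes that a "loading" is the coefficient of a CpG site in a unit-length principal axis and that a site is kept when its absolute loading exceeds the threshold on any of the leading components; both readings are assumptions. The function names are hypothetical, and the pure-Python power iteration is only practical for small inputs with at least two samples.

```python
import math


def leading_axes(data, n_components, n_iter=500):
    """Leading principal axes (unit vectors) of `data`; rows are samples."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    # Sample covariance matrix between features (p x p).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    tol = 1e-10 * sum(cov[j][j] for j in range(p))
    axes = []
    for _ in range(min(n_components, p)):
        v = [1.0 / (j + 1) for j in range(p)]  # deterministic start vector
        for _ in range(n_iter):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm <= tol:
                return axes  # no variance left to explain
            v = [x / norm for x in w]
        # Rayleigh quotient gives the variance along the axis just found.
        eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
        axes.append(v)
        # Deflation: subtract the component so the next pass finds the next axis.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(p)]
               for a in range(p)]
    return axes


def pca_loading_filter(data, n_components=10, threshold=0.1):
    """Indices of features whose |loading| exceeds `threshold` on any axis."""
    axes = leading_axes(data, n_components)
    return [j for j in range(len(data[0]))
            if any(abs(v[j]) > threshold for v in axes)]
```

Because later components can be dominated by low-variance sites, the number of components and the threshold both influence how many candidates reach the wrapper stage.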
The feature selection method of FW_SVM was compared with two popular feature selection methods: individual feature ranking and SVM_RFE.
Individual feature ranking
Individual feature ranking selects features according to their individual relevance. Its implementation is simple and requires minimal run time. In this experiment, all CpG sites were ranked in ascending order based on their p-value from the Wilcoxon rank sum test. The Wilcoxon rank sum test can be applied to data from any distribution and is robust to outliers. An additional filter was applied to remove CpG sites whose mean differences of methylation level (β value) between two groups were less than 0.15. The top-ranked 1, 2, 3, 5 or 10 of the most differentially methylated CpG sites were selected as signature CpG sites.
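As an illustration of this ranking step, the sketch below is written in Python rather than the MATLAB used in the study. It computes a two-sided rank sum p-value with the large-sample normal approximation (average ranks for ties, no tie or continuity correction), which is an assumption: the source does not say how the p-values were computed. The names `rank_sum_p` and `rank_cpg_sites` are hypothetical.

```python
import math


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank sum p-value via the normal approximation."""
    values = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    rank_sum_x = 0.0
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j][0] == values[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j for tied values
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if values[k][1] == 0)
        i = j
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def rank_cpg_sites(group_a, group_b, min_mean_diff=0.15, top_k=10):
    """Return indices of the top_k sites by ascending p-value.

    group_a, group_b: lists of samples, each a list of beta values per site.
    Sites whose group means differ by less than min_mean_diff are dropped.
    """
    scored = []
    for j in range(len(group_a[0])):
        a = [s[j] for s in group_a]
        b = [s[j] for s in group_b]
        if abs(sum(a) / len(a) - sum(b) / len(b)) < min_mean_diff:
            continue
        scored.append((rank_sum_p(a, b), j))
    scored.sort()
    return [j for _, j in scored[:top_k]]
```

With small groups, an exact test would be the more careful choice; the approximation is used here only to keep the sketch self-contained.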
Recursive Feature Elimination (RFE) is a backward feature selection method designed to find the best combination of features for classification. Less important features, in terms of the predictive accuracy of SVM, are successively eliminated, allowing for the selection of only the best subset of features. The RFE algorithm is outlined below:
F = [1, 2, ..., n] is the subset of remaining features.
R = [ ] is the subset of ranked features.
1. For k = 1, 2, ..., n, remove the k-th feature and evaluate the cross-validation error on the reduced feature set using the training dataset.
2. Remove the feature with maximum cross-validation error and add it to the top of R.
3. Repeat 1 and 2 for the remaining features in F, until R contains all ranked features.
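The outline above can be sketched in Python (the study itself used MATLAB). This is a minimal illustration, not the authors' script: `rfe_rank` and the `cv_error` callback are hypothetical names. The elimination step of the outline can be read in more than one way; this sketch assumes that the feature dropped in each round is the one whose removal leaves the lowest cross-validation error, in line with the statement that less important features are eliminated first.

```python
def rfe_rank(features, cv_error):
    """Rank features by backward elimination.

    cv_error(subset) must return the cross-validation error of a classifier
    trained on the feature list `subset`. Returns the features ordered from
    the last one eliminated (top of R) to the first one eliminated.
    """
    remaining = list(features)
    ranked = []
    while len(remaining) > 1:
        # Error obtained after dropping each candidate feature in turn.
        errors = [cv_error([f for f in remaining if f != k]) for k in remaining]
        # Assumption: eliminate the feature whose removal costs the least.
        drop = remaining[errors.index(min(errors))]
        remaining.remove(drop)
        ranked.insert(0, drop)  # each eliminated feature goes to the top of R
    ranked.insert(0, remaining[0])
    return ranked
```

Each round calls `cv_error` once per remaining feature, so the number of classifier evaluations grows quadratically with the number of features, which is consistent with the long run times reported for this wrapper.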
SVM_RFE is an application of RFE using SVM as the classifier in the feature selection process. In this study, leave-one-out cross-validation was employed to evaluate the classification performance of each feature set. Each sample was excluded from the training set, one at a time, and then classified based on the SVM trained from the remaining samples. This procedure was repeated, in turn, for all samples, and the cross-validation error was defined as the sum of misclassifications. In the process, cross-validation error vs. the size of the feature set was recorded, and the smallest subset of features with the least cross-validation error was chosen as the final methylation signature.
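A minimal Python sketch of the leave-one-out error count and of the final subset choice is given below. To stay self-contained it uses a nearest-centroid rule as a stand-in classifier, which is not the SVM used in the study; any function with the same `predict(train_x, train_y, x)` shape could be substituted. All function names are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (not an SVM): pick the class with the closer mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_cv_error(samples, labels, subset, predict=nearest_centroid_predict):
    """Number of misclassified samples under leave-one-out cross-validation."""
    errors = 0
    for i in range(len(samples)):
        # Train on every sample except i, restricted to the chosen features.
        train_x = [[s[j] for j in subset] for k, s in enumerate(samples) if k != i]
        train_y = [y for k, y in enumerate(labels) if k != i]
        held_out = [samples[i][j] for j in subset]
        if predict(train_x, train_y, held_out) != labels[i]:
            errors += 1
    return errors


def smallest_best_subset(error_by_subset):
    """Among subsets with the lowest error, return the smallest one.

    error_by_subset maps a tuple of feature indices to its recorded error.
    """
    return min(error_by_subset, key=lambda s: (error_by_subset[s], len(s)))
```

Sorting on the pair (error, size) encodes the rule in the text directly: error is minimized first, and subset size only breaks ties.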
We selected SVM as the classification method to evaluate signature features selected from different feature selection approaches. Note that both SVM_RFE and FW_SVM also took SVM as classifiers in their feature selection process.
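The SVM decision function has the standard form

$$y(x) = \operatorname{sign}\left(\sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b\right),$$

where $x_i$ and $y_i \in \{-1, +1\}$ are the profile and class label of the $i$-th of $N$ training samples, and the parameters $\alpha_i$ and $b$ are optimized in the training procedure such that the number of misclassifications on the training set is minimized. $K(x_i, x)$ is a kernel function.
The LS_SVMlab toolbox (http://www.esat.kuleuven.ac.be/sista/lssvmlab/) was used in the implementation of LS_SVM, and the RBF kernel function with default parameters (γ = 10 and σ² = 0.2) was adopted.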
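For readers without access to the toolbox, the following Python sketch shows how a least squares SVM classifier of this kind can be trained by solving one linear system, following the formulation of Suykens and Vandewalle. It is not the LS_SVMlab code. The kernel form exp(-||x - z||² / σ²) is an assumption about how σ² enters, labels are assumed to be coded as +1 and -1, and the function names are hypothetical.

```python
import math


def rbf(x, z, sigma2):
    # Assumed kernel form: exp(-||x - z||^2 / sigma2).
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / sigma2)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def lssvm_train(xs, ys, gamma=10.0, sigma2=0.2):
    """Train on samples xs with labels ys in {+1, -1}; returns (alpha, b)."""
    n = len(xs)
    # Linear system: [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1],
    # with Omega[i][j] = y_i * y_j * K(x_i, x_j).
    a = [[0.0] + [float(y) for y in ys]]
    for i in range(n):
        row = [float(ys[i])]
        for j in range(n):
            omega = ys[i] * ys[j] * rbf(xs[i], xs[j], sigma2)
            row.append(omega + (1.0 / gamma if i == j else 0.0))
        a.append(row)
    sol = solve(a, [0.0] + [1.0] * n)
    return sol[1:], sol[0]


def lssvm_predict(xs, ys, alpha, b, x, sigma2=0.2):
    """Sign of sum_i alpha_i * y_i * K(x_i, x) + b."""
    score = sum(al * y * rbf(xi, x, sigma2)
                for al, y, xi in zip(alpha, ys, xs)) + b
    return 1 if score >= 0 else -1
```

The defaults mirror the γ and σ² values quoted above; with a different kernel convention the same σ² would give a different effective kernel width.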
Performance testing and evaluation
Each of the three DNA methylation datasets generated by Illumina high-throughput DNA methylation arrays [4, 5] was split into training and testing sets. The training set contained approximately 2/3 of the samples and the testing set included the remaining 1/3. The feature selection methods were performed on training datasets. To validate the features selected by each method, raw SVMs learned from methylation profiles of the signature CpG sites in the training set, and the trained SVMs were used to predict the phenotypic classes of the samples in the testing set.
In order to minimize bias introduced by data partitioning and to accurately assess performance of the feature selection methods, each dataset was randomly partitioned into training and testing sets multiple times. For individual feature ranking and FW_SVM, the sensitivity, specificity, accuracy, number of signature features, and running time reported for each dataset represent the average across 100 independent runs. SVM_RFE was very time-consuming with each run requiring several days to complete. Therefore, its reported performance results are from only 5 random partitions of training and testing datasets.
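The repeated-partition protocol can be sketched in Python as below. This is an illustration under stated assumptions rather than the authors' code: the split is a plain random shuffle (the source does not say whether partitions were stratified by class), and `split_indices`, `average_over_runs`, and the `evaluate` callback are hypothetical names.

```python
import random


def split_indices(n_samples, train_fraction=2 / 3, rng=random):
    """Randomly split sample indices into training and testing lists."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_train = round(n_samples * train_fraction)
    return idx[:n_train], idx[n_train:]


def average_over_runs(n_samples, evaluate, n_runs=100, seed=0):
    """Average evaluate(train_idx, test_idx) over n_runs random partitions."""
    rng = random.Random(seed)  # fixed seed so the runs are reproducible
    scores = []
    for _ in range(n_runs):
        train_idx, test_idx = split_indices(n_samples, rng=rng)
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)
```

In use, `evaluate` would run feature selection and classifier training on the training indices only and return a metric computed on the testing indices, so that no information from the testing set leaks into feature selection.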
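Sensitivity, specificity, and accuracy follow their standard definitions:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}, \qquad \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

where TP, FP, TN and FN represent true positives, false positives, true negatives and false negatives, respectively.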
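These three measures can be computed from predicted and true labels as in the short Python sketch below. The function name and the convention that label 1 marks the positive class (for example, the cancer group) are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, and accuracy from true and predicted labels.

    Assumes both classes occur in y_true; otherwise a ratio is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```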
Pathway Studio™ with database Resnet 5.0 was used to build gene interaction pathways from a list of genes whose upstream CpG sites were differentially methylated.
All computational methods (except Pathway Studio) in this study were implemented in MATLAB (The MathWorks, Inc., Natick, MA) and run on a PC with a 3.8 GHz CPU and 3.0 GB RAM.
Comparison and discussion of feature selection methods
[Table: Performance results of individual feature ranking. Rows: male and female cell lines; cancer and hES cell lines; lung cancer and normal tissues. Columns include run time (seconds).]
In contrast, the lung cancer and normal tissue data (Figure 2c) show different results. Perhaps due to the intrinsic complexity of disease mechanisms, the lung cancer tissue samples exhibited highly variable methylation patterns. In the present case, the methylation profile of a single CpG site is not sufficient to achieve accurate separation between normal and lung cancer samples (Table 1). An ideal DNA methylation signature, therefore, would consist of a small subset of CpG sites to provide non-redundant and complementary discriminative information.
[Table: Performance results of SVM_RFE and FW_SVM for the lung cancer and normal tissue dataset. Rows include FW_SVM with PCA and FW_SVM with individual feature ranking.]
Because no other DNA methylation datasets were available, FW_SVM was also tested on a benchmark microarray gene expression dataset. The profiles of two genes identified by FW_SVM can classify Acute Myeloid Leukemia (AML) and Acute Lymphoblastic Leukemia (ALL) sample groups with an average accuracy of 98.8% (data not shown).
An application of FW_SVM: signature CpG sites identification to classify lung cancer and normal tissue samples
The DNA methylation profiles in this study displayed excellent biomarker characteristics. Accurate discrimination between two sample groups was achieved on the basis of only a few CpG sites. In order to compare our results with signature CpG sites obtained by Bibikova et al., we applied FW_SVM (the individual feature ranking version in this experiment) to identify signature CpG sites for normal and lung cancer tissue samples. We used 11 normal samples and 11 adenocarcinoma samples from the Philipps University of Marburg (Germany) as our training set and 12 normal samples and 12 adenocarcinoma samples from the Pennsylvania State University College of Medicine Tumor Bank as the testing set. From the training set, FW_SVM selected two CpG sites, TNF-1371 and TWIST1-524, as signature features. Based on those two signature CpG sites, the predictor correctly classified all of the normal and lung cancer tissue samples in the testing set and achieved better sensitivity and specificity than the 55 CpG site markers identified by Bibikova et al.
To further verify the reliability of these two signature CpG sites, we mixed the samples from these two datasets together and randomly split them 100 times into a training set (containing 2/3 of the samples) and a testing set (containing 1/3 of the samples). Raw SVMs were trained on the profiles of these two CpG sites in the training sets, and trained SVMs were used to predict the phenotype of samples in the testing sets. The average sensitivity achieved was 96%, and the average specificity was 100%.
In this study, we identified the smallest subset of CpG sites required for precise classification of lung cancer and normal tissue samples, with every signature CpG site containing necessary, non-redundant and mutual information in the context of others. All the signature CpG sites identified are important biologically, but it is not necessary to include all important CpG sites for classification purposes.
While these two signature CpG sites (TNF-1371 and TWIST1-524) are promising leads for potential diagnostic purposes, they were detected from a relatively small dataset of 46 samples. Accordingly, the reliability of the TNF and TWIST CpG sites as biomarkers for lung cancer requires further validation in larger datasets and through targeted biological experiments.
Patterns vs. profile distances
This study shows that it is feasible to identify DNA methylation biomarkers from high-throughput DNA methylation profiles and that a small number of signature CpG sites can suffice to classify two groups of samples. Signature CpG sites can easily be detected from datasets with clear methylation patterns, such as male and female datasets, using traditional feature selection methods like individual feature ranking. However, the traditional feature selection methods were not effective at identifying signature CpG sites from disease samples with complex DNA methylation patterns, such as the lung cancer tissue examined in this study. We investigated two filter methods for SVM_RFE in the study and built FW_SVM, a predictor with an efficient feature selection method. FW_SVM was able to detect a small, optimal subset of CpG sites with non-redundant and complementary discriminative information and achieved high predictive accuracy in classifying disease samples with complex DNA methylation patterns. Since each CpG site represents a feature, and the methylation level of each CpG site simply corresponds to the value of the feature, the FW_SVM algorithm, in principle, could be extended to analyze other post-genomic datasets, such as high-throughput gene expression, microRNA expression, single nucleotide polymorphisms, and proteomic data, individually or even across platforms, to identify combinatorial signature features. Therefore, FW_SVM represents a highly flexible tool that can be adopted in classification situations in which appropriate high-throughput data are available to potentially aid in diagnosis and gain fundamental insight into disease processes.
Project name: FW_SVM
Project home page: None. Matlab scripts for FW_SVM were submitted to BMC Bioinformatics as additional file 1.
Operating system: platform independent
Programming language: Matlab
Other requirements: The LS-SVMlab toolbox, which can be downloaded from: http://www.esat.kuleuven.ac.be/sista/lssvmlab/
Any restrictions to use by non-academics: None
We thank Dr. Suykens for allowing us to use the LS_SVMlab toolbox in the implementation of FW_SVM. We are also grateful to George Patskan, Barbara Zedler, Andrew Joyce, Madhukar Dasika, Tapas Sengupta, Jonathan Stephenson, Gaurav Rana, Priyadashi Basu, Edwin van den Oord, Eileen Ivasauskas and Janis Worth for reviewing this manuscript.
1. Singal R, Ginder GD: DNA methylation. Blood 1999, 93(12):4059–4070.
2. Esteller M: CpG island hypermethylation and tumor suppressor genes: a booming present, a brighter future. Oncogene 2002, 21(35):5427–5440. 10.1038/sj.onc.1205600
3. Herman JG, Baylin SB: Gene silencing in cancer in association with promoter hypermethylation. N Engl J Med 2003, 349(21):2042–2054. 10.1056/NEJMra023075
4. Bibikova M, Chudin E, Wu B, Zhou L, Garcia EW, Liu Y, Shin S, Plaia TW, Auerbach JM, Arking DE, et al.: Human embryonic stem cells have a unique epigenetic signature. Genome Res 2006, 16(9):1075–1083. 10.1101/gr.5319906
5. Bibikova M, Lin Z, Zhou L, Chudin E, Garcia EW, Wu B, Doucet D, Thomas NJ, Wang Y, Vollmer E, et al.: High-throughput DNA methylation profiling using universal bead arrays. Genome Res 2006, 16(3):383–393. 10.1101/gr.4410706
6. Ehrich M, Nelson MR, Stanssens P, Zabeau M, Liloglou T, Xinarianos G, Cantor CR, Field JK, Boom D: Quantitative high-throughput analysis of DNA methylation patterns by base-specific cleavage and mass spectrometry. Proc Natl Acad Sci USA 2005, 102(44):15785–15790. 10.1073/pnas.0507816102
7. Li LC, Carroll PR, Dahiya R: Epigenetic changes in prostate cancer: implication for diagnosis and treatment. J Natl Cancer Inst 2005, 97(2):103–115.
8. Das PM, Singal R: DNA methylation and cancer. J Clin Oncol 2004, 22(22):4632–4642. 10.1200/JCO.2004.07.151
9. Eads CA, Danenberg KD, Kawakami K, Saltz LB, Blake C, Shibata D, Danenberg PV, Laird PW: MethyLight: a high-throughput assay to measure DNA methylation. Nucleic Acids Res 2000, 28(8):E32. 10.1093/nar/28.8.e32
10. Lofton-Day C, Model F, Devos T, Tetzner R, Distler J, Schuster M, Song X, Lesche R, Liebenberg V, Ebert M, et al.: DNA methylation biomarkers for blood-based colorectal cancer screening. Clin Chem 2008, 54(2):414. 10.1373/clinchem.2007.095992
11. Fiegl H, Millinger S, Mueller-Holzner E, Marth C, Ensinger C, Berger A, Klocker H, Goebel G, Widschwendter M: Circulating tumor-specific DNA: a marker for monitoring efficacy of adjuvant therapy in cancer patients. Cancer Res 2005, 65(4):1141–1145. 10.1158/0008-5472.CAN-04-2438
12. Model F, Osborn N, Ahlquist D, Gruetzmann R, Molnar B, Sipos F, Galamb O, Pilarsky C, Saeger HD, Tulassay Z, et al.: Identification and validation of colorectal neoplasia-specific methylation markers for accurate classification of disease. Mol Cancer Res 2007, 5(2):153–163. 10.1158/1541-7786.MCR-06-0034
13. Scholz C, Nimmrich I, Burger M, Becker E, Dorken B, Ludwig WD, Maier S: Distinction of acute lymphoblastic leukemia from acute myeloid leukemia through microarray-based DNA methylation analysis. Ann Hematol 2005, 84(4):236–244. 10.1007/s00277-004-0969-1
14. Cottrell S, Jung K, Kristiansen G, Eltze E, Semjonow A, Ittmann M, Hartmann A, Stamey T, Haefliger C, Weiss G: Discovery and validation of 3 novel DNA methylation markers of prostate cancer prognosis. J Urol 2007, 177(5):1753–1758. 10.1016/j.juro.2007.01.010
15. Grate LR: Many accurate small-discriminatory feature subsets exist in microarray transcript data: biomarker discovery. BMC Bioinformatics 2005, 6: 97. 10.1186/1471-2105-6-97
16. Zhang HH, Ahn J, Lin X, Park C: Gene selection using support vector machines with non-convex penalty. Bioinformatics 2006, 22(1):88–95. 10.1093/bioinformatics/bti736
17. Vapnik VN: Statistical Learning Theory. New York: John Wiley and Sons; 1998.
18. Zhang X, Lu X, Shi Q, Xu XQ, Leung HC, Harris LN, Iglehart JD, Miron A, Liu JS, Wong WH: Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data. BMC Bioinformatics 2006, 7: 197. 10.1186/1471-2105-7-197
19. Simon R: Diagnostic and prognostic prediction using gene expression profiles in high-dimensional microarray data. Br J Cancer 2003, 89(9):1599–1604. 10.1038/sj.bjc.6601326
20. Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D: Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 2000, 16(10):906–914. 10.1093/bioinformatics/16.10.906
21. Guyon I, Weston J, Barnhill S, Vapnik V: Gene selection for cancer classification using support vector machine. Machine Learning 2002, 46: 389–422. 10.1023/A:1012487302797
22. Tang EK, Suganthan PN, Yao X: Gene selection algorithms for microarray data based on least squares support vector machine. BMC Bioinformatics 2006, 7: 95. 10.1186/1471-2105-7-95
23. Li T, Zhang C, Ogihara M: A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 2004, 20(15):2429–2437. 10.1093/bioinformatics/bth267
24. Ben-Dor A, Bruhn L, Friedman N, Nachman I, Schummer M, Yakhini Z: Tissue classification with gene expression profiles. J Comput Biol 2000, 7(3–4):559–583. 10.1089/106652700750050943
25. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, et al.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286(5439):531–537. 10.1126/science.286.5439.531
26. Thomas JG, Olson JM, Tapscott SJ, Zhao LP: An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles. Genome Res 2001, 11(7):1227–1236. 10.1101/gr.165101
27. Cho SJ, Hermsmeier MA: Genetic Algorithm guided Selection: variable selection and subset selection. J Chem Inf Comput Sci 2002, 42(4):927–936.
28. Jirapech-Umpai T, Aitken S: Feature selection and classification for microarray data analysis: evolutionary methods for identifying predictive genes. BMC Bioinformatics 2005, 6: 148. 10.1186/1471-2105-6-148
29. Raymer ML, Punch WF, Goodman ED, Kuhn LA, Jain AK: Dimensionality reduction using genetic algorithms. IEEE Transactions on Evolutionary Computation 2000, 4: 164–171. 10.1109/4235.850656
30. Jolliffe IT: Principal Component Analysis. New York: Springer-Verlag; 1986.
31. Hibbs MA, Dirksen NC, Li K, Troyanskaya OG: Visualization methods for statistical analysis of microarray clusters. BMC Bioinformatics 2005, 6: 115. 10.1186/1471-2105-6-115
32. Suykens J, Vandewalle J: Least squares support vector machine classifiers. Neural Processing Letters 1999, 9: 293–300. 10.1023/A:1018628609742
33. Suykens J, Van Gestel T, De Brabanter J, De Moor B, Vandewalle J: Least Squares Support Vector Machines. Singapore: World Scientific; 2002.
34. Nikitin A, Egorov S, Daraselia N, Mazo I: Pathway studio – the analysis and navigation of molecular networks. Bioinformatics 2003, 19(16):2155–2157. 10.1093/bioinformatics/btg290
35. Rajaraman R, Rajaraman MM, Rajaraman SR, Guernsey DL: Neosis – a paradigm of self-renewal in cancer. Cell Biol Int 2005, 29(12):1084–1097. 10.1016/j.cellbi.2005.10.003
36. Flagiello D, Poupon MF, Cillo C, Dutrillaux B, Malfoy B: Relationship between DNA methylation and gene expression of the HOXB gene cluster in small cell lung cancers. FEBS Lett 1996, 380(1–2):103–107. 10.1016/0014-5793(96)00017-8
37. Kaneko KJ, Rein T, Guo ZS, Latham K, DePamphilis ML: DNA methylation may restrict but does not determine differential gene expression at the Sgy/Tead2 locus during mouse development. Mol Cell Biol 2004, 24: 1968–1982. 10.1128/MCB.24.5.1968-1982.2004
38. Jones EY, Stuart DI, Walker NP: The structure of tumour necrosis factor – implications for biological function. J Cell Sci Suppl 1990, 13: 11–18.
39. Yuen HF, Chua CW, Chan YP, Wong YC, Wang X, Chan KW: Significance of TWIST and E-cadherin expression in the metastatic progression of prostatic cancer. Histopathology 2007, 50(5):648–658. 10.1111/j.1365-2559.2007.02665.x
40. Cheng GZ, Chan J, Wang Q, Zhang W, Sun CD, Wang LH: Twist transcriptionally up-regulates AKT2 in breast cancer cells leading to increased migration, invasion, and resistance to paclitaxel. Cancer Res 2007, 67(5):1979–1987. 10.1158/0008-5472.CAN-06-1479
41. Horikawa T, Yang J, Kondo S, Yoshizaki T, Joab I, Furukawa M, Pagano JS: Twist and epithelial-mesenchymal transition are induced by the EBV oncoprotein latent membrane protein 1 and are associated with metastatic nasopharyngeal carcinoma. Cancer Res 2007, 67(5):1970–1978. 10.1158/0008-5472.CAN-06-3933
42. Ohuchida K, Mizumoto K, Ohhashi S, Yamaguchi H, Konomi H, Nagai E, Yamaguchi K, Tsuneyoshi M, Tanaka M: Twist, a novel oncogene, is upregulated in pancreatic cancer: clinical implication of Twist expression in pancreatic juice. Int J Cancer 2007, 120(8):1634–1640. 10.1002/ijc.22295
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.