# Identifying differentially methylated genes using mixed effect and generalized least square models

- Shuying Sun
^{1, 2}Email author, - Pearlly S Yan
^{3}, - Tim HM Huang
^{3}and - Shili Lin
^{4}

**10**:404

https://doi.org/10.1186/1471-2105-10-404

© Sun et al; licensee BioMed Central Ltd. 2009

**Received: **23 April 2009

**Accepted: **9 December 2009

**Published: **9 December 2009

## Abstract

### Background

DNA methylation plays an important role in the process of tumorigenesis. Identifying differentially methylated genes or CpG islands (CGIs) associated with genes between two tumor subtypes is thus an important biological question. The methylation status of all CGIs in the whole genome can be assayed with differential methylation hybridization (DMH) microarrays. However, patient samples or cell lines are heterogeneous, so their methylation pattern may be very different. In addition, neighboring probes at each CGI are correlated. How these factors affect the analysis of DMH data is unknown.

### Results

We propose a new method for identifying differentially methylated (DM) genes by identifying the associated DM CGI(s). At each CGI, we implement four different mixed effect and generalized least square models to identify DM genes between two groups. We compare four models with a simple least square regression model to study the impact of incorporating random effects and correlations.

### Conclusions

We demonstrate that the inclusion (or exclusion) of random effects and the choice of correlation structures can significantly affect the results of the data analysis. We also assess the false discovery rate of different models using CGIs associated with housekeeping genes.

## Background

DNA methylation is the addition of a methyl group (CH_{3}) to the 5's cytocine (C) at a CG site. It can be inherited without changing the original DNA sequences. This epigenetic modification plays an important role in regulating gene expression, and it may cause tumor suppressor gene silencing [1]. Over the last two decades, many biological and computational studies have been done to investigate the methylation patterns in different tissues. These studies either focus on candidate genes such as p16 and RASSF1A [2] or on different chromosomes [3–7]. Most of these studies focus on cancer since methylation patterns are changed in neoplasia. These changes may include regional or genome-wide gain or loss of methylation [8]. In order to gain a genome-wide understanding of how changes in methylation affect tumor growth, the DMH protocol [9–11] has been used to simultaneously assay the methylation status of all known CGIs, genomic regions rich in CG sites [12].

Previous DMH microarray studies mainly focus on identifying genes that are differentially methylated between normal individuals and cancer patients (or cell lines). They identify the genes that are hypermethylated (more methylation in cancer than normal) or hypomethylated (less methylation in cancer than normal). The data analysis of these studies mainly focuses on identifying DM genes by identifying DM probes. For example, an arbitrary log ratio cut off of 1.5 has been used [13], and a Gamma-Normal-Gamma model has been applied to identify differentially methylated probes [14]. However, a single high or low log ratio probe may not represent true biological signals due to the large impact of probe affinity. This is because the intensity of each probe is related to its sequence, and different microarray probes may have similar sequences. Therefore, both specific and non-specific binding could occur. With non-specific binding, two probes against the same region (e.g., a short CGI) may have very different intensity values. This issue has been well known and has been addressed in the context of gene expression microarrays [15–18]. In addition to probe affinity, other factors such as the polymerase chain reaction (PCR) application effect, sample preparations, and the sensitivity of scanners will also affect probe intensities [17]. Furthermore, it has been shown that neighboring probes are highly correlated over hundreds of bases [3]. As a result, we can not assume that all probes are independent. In addition, because cancer patients or cell lines may have different levels of methylation signals, it is important to consider random effects in the model too.

Unlike previous DMH studies, this paper focuses on identifying genes that are differentially methylated between two tumor subtypes (or two racial groups) rather than between normal and cancerous cells. We propose a novel method for identifying a DM gene by pooling all probes in its associated CGI together and incorporating the correlation structures for probes in the same CGI. To implement this method, we apply two mixed effect models and two generalized least square models to incorporate the heterogeneity of different arrays (cell lines) and study the correlation structures between probes. We compare the results of these four models with the ones obtained from a simple least square regression model and find that it is important to incorporate the random effect and choose a correlation structure properly.

## Methods

### DMH microarray protocol, data preprocessing and description

Microarray technology has brought about a revolution in our understanding of normal and abnormal molecular processes. With the aid of this technology, it is now possible to identify DNA methylation patterns in specific regions of chromosomes or even in the entire genome. The DMH protocol [9–11] utilizes restriction enzymes to reduce the complexity of the genome while preserving GC-rich regions (many of which fall in and around CGIs) for methylation profiling. A brief outline of the DMH protocol is described below:

#### Step 1

Genomic DNA samples are sonicated into 400-500 bp fragments, and linkers are ligated to these fragments.

#### Step 2

The methylation status of the genome of interest can be investigated by methylation sensitive restriction enzymes. In this particular study, the investigation is achieved with two methylation sensitive enzymes, HpaII and HinPI, which have the recognition sites of CCGG and CGCG, respectively. These restriction enzymes will cut all of these CG-containing sites if the Cs are not methylated. These enzymes will therefore remove all linker-ligated fragments containing unmethylated CG dinucleotides, leaving behind fragments that are 100% methylated or those that have no HpaII and HinPI recognition sites.

#### Step 3

The DNA fragments that are not restricted are then enriched with the polymerase chain reaction (PCR) to provide sufficient materials for fluorescent labeling and microarray hybridization. The DMH protocol is applied to both test samples (e.g., cancer patient material or cancer cell lines) and control samples (common normal reference or paired normal tissue). Both types of samples are labeled with their respective fluorescent dyes: Cy5 dye pseudo-colored as red for test samples and Cy3 dye pseudo-colored as green for control samples. They are then combined in equal quantity and competitively hybridized to a prepared microarray slide, in particular, an Agilent 244 K human CGI array.

_{2}(Cy5/Cy3), as the observed methylation signal. Large positive log ratios show that there are more methylations in test samples (e.g., cancer patients) than in control samples (e.g. normal individuals). This means that there are strong hypermethylation signals in test samples. In contrast, large negative log ratios indicate that there are less methylations in test than in control samples, implying that there are strong hypomethylation signals in test samples. Several consecutive probes are expected to have similar positive (or negative) log ratios because the DNA fragments that are hybridized to the array are about 400-500 bp.

Summary of probe numbers and CGIs.

# of probes | 1~2 | 3~10 | 11~15 | 16~20 | 21~30 | > 30 |
---|---|---|---|---|---|---|

count | 2753 | 16580 | 3323 | 1190 | 721 | 358 |

Proportion | 0.11 | 0.67 | 0.13 | 0.05 | 0.03 | 0.01 |

In our example data, 38 breast cancer cell line microarrays are divided into two tumor subtypes, basal-like and luminal types [19]. Each group has 19 arrays. For each array, the background correction is done using the "edwards method" [20]. Dye effects are corrected using the standard within array loess normalization in the Bioconductor package "limma" [21]. The between array normalization is done using the quantile method [22, 23].

### Statistical models

In order to identify differentially methylated genes between two tumor subtypes, we propose the idea of identifying differentially methylated CGIs associated with them. That is, we build models to identify differentially methylated CGIs, then determine whether a gene is methylated (either at its promoter region or inside of the gene) by examining if there is any significant methylation difference at the CGI(s) associated with this gene. This method is different from the others that identify differentially methylated genes by focusing on identifying differentially methylated probes. Instead, we put the information of all probes at each CGI together to conduct analysis using different mixed effect and generalized least square models.

A mixed effect model includes fixed effect(s) and random effect(s). A fixed effect is usually a well-defined population factor with finite levels. We are interested in estimating different levels of a fixed effect and testing to see if they are significantly different. For example, in our DMH data the tumor subtype factor, which has two levels, basal-like and luminal types, is a fixed effect and we want to test to see if the methylation levels of these two subtypes are significantly different from each other. A random effect is related to some individual experiment units that are selected from a larger (potentially infinite) underlying population and represent a certain amount of random variation. For example, the cell lines (or arrays) within each subgroup can be thought of as random samples from their representative population. However, we are not particularly interested in the differences between the cell lines within the same population. As such, a random effect component is included to model the variability. The inclusion of a random effect can help us generalize the analysis result to the whole population.

Furthermore, each cell line can be considered as a "block" of probes, which will allow us to investigate the correlation structure between probes within the same array. Alternatively, a generalized least square model can also be used to study correlations among probes of a CGI within the same cell line/array. Various models may be used to study such correlations. In this paper, we apply two commonly used structures. One is "compound-symmetry" correlation; that is, all within-group errors are assumed to have equal correlations. Another one is based on the autoregressive model in which we assume data points are observed at integer time (or location) points, and the current observation is a linear function of previous ones and an error term [24].

Although we know neighboring probes are correlated, it is still unclear what specific correlation structure is the best fit. In addition, it is not clear how the random variation among different cell lines (in the same tumor subtype) will affect the analysis results. Therefore, we explore the following four models: (1) the mixed effect model with the array effect as a random effect, M1 (mix); (2) the mixed effect model with the array effect as a random effect and the probe correlation structure modeled using an autoregressive model of order one (AR1), M2 (mix.AR1); (3) the generalized least square model with a compound symmetry correlation structure, M3 (gls.comSym); and (4) the generalized least square model with an AR1 model of correlation structure, M4 (gls.AR1).

In the following four models, we use *g* to represent the group index with *g* = 1 and 2 corresponding to tumor subtypes: basal-like and luminal types. Likewise, *p* is the probe index (*p* = 1, ..., P, where P is the total number of probes in one CGI). *a* is the array (or cell line) index (*a* = 1, 2, ..., N, where N is the total number of arrays). *α*_{
g
}and *β*_{
p
}represent the group and probe effects, and they are fixed effects. *γ*_{
a
}is the random effect. *ε*_{
gpa
}is the residual error. *γ*_{
a
}and *ε*_{
gpa
}are independent in all models.

(1). M1 (mix): it is the mixed effect model,

*y*

_{ gpa }=

*α*

_{ g }+

*β*

_{ p }+

*γ*

_{ a }+

*ε*

_{ gpa }, for any

*a*, and for any

*g*,

*p*and

*a*. The correlation structure is

Therefore, the correlation between any two different probes (*p*_{1} and *p*) at each CGI is
.

(2). M2 (mix.AR1): it is the mixed effect model with an AR1 correlation structure, *y*_{
gpa
}= *α*_{
g
}+ *β*_{
p
}+ *γ*_{
a
}+ *ε*_{
gpa
},
for any *a*, and
for any *g*, *p* and *a*.

*p*

_{1}and

*p*) at each CGI is , where

*ρ*is the correlation between two consecutive residual errors, that is,

*y*

_{ gpa }=

*α*

_{ g }+

*β*

_{ p }+

*ε*

_{ gpa }, for any

*g*,

*p*and

*a*. The correlation structure is

Therefore, the correlation between any two different probes (*p*_{1} and *p*) at each CGI is *ρ*_{3}.

*y*

_{ gpa }=

*α*

_{ g }+

*β*

_{ p }+

*ε*

_{ gpa }, for any

*g*,

*p*and

*a*. The correlation structure is

Therefore, the correlation between any two different probes (*p*_{1} and *p*) at each CGI is
.

For the above models, we are interested in testing the null hypothesis (H_{0}) that there is no methylation difference between the two tumor subtypes, i.e., *α*_{1} = *α*_{2}, at each CGI. If the test p-value of one CGI is less than a certain threshold, then it is indicative of strong evidence that this CGI and its associated gene(s) are differentially methylated between two tumor subtypes. All the above mixed effect models and generalized least square models are implemented for each CGI using the R package "nlme" [24].

## Results

### Comparisons of different models

*y*

_{ gpa }=

*α*

_{ g }+

*β*

_{ p }+

*ε*

_{ gpa }, for any

*g*,

*p*and

*a*. The test results of all five models, that is, the numbers of CGIs with p-values < 0.05, are summarized in Table 2. The overlaps of these models are shown in Figure 2. In addition, we also plot the p-values of all CGIs from four models (M1, M2, M3 and M4) against the p-values of the model M0, which does not incorporate correlations between probes (see Figure 3). The two top plots of this figure are the models with random effects included (models M1 and M2). The model M1 has larger p-values than M0 at

*all*CGIs, while the model M2 has larger p-values than M0 at

*most*CGIs. The two bottom plots of Figure 3 are the models without any random effect included. These plots show that, compared with the linear model (M0), models M3 and M4 have some CGIs with larger p-values and some with smaller ones than M0.

The number of CGIs with p-values < 0.05 in five models.

Models | M1 | M2 | M3 | M4 | M0 |
---|---|---|---|---|---|

# of CGI with p < 0.05 | 463 | 503 | 839 | 862 | 1048 |

The mixed effect models (M1 and M2) identify roughly half as many DM CGIs as identified by the non-random effect models (M3, M4 and M0). This could be due to the fact that these mixed effect models include the random effect to incorporate the variation of different cell line samples. When the random effect is included, a positive correlation structure is introduced to the model. Therefore, we tend to get larger p-values. This is useful in reducing bias due to the noise, which could be caused by many unmeasured factors, and may introduce improper negative correlations that may not be consistent with the underlying methylation pattern and the DMH microarray protocol.

### Relationship between correlation estimates and the number of probes

In Figure 5, "A" is the correlation estimates of the model M1, that is, the estimates of the intra-class correlation,
, for all CGIs. These estimates are only related to the variances of the random effect and the residual error, so they are all positive. "B" is the correlation estimates (*ρ*_{2}) of the model M2 for all CGIs, some of which have positive correlations, while the others have negative ones. "C" is the correlation estimates of the model M3 for all CGIs, the majority of which are positive. "D" is the correlation estimates of the model M4 for all CGIs. About half of these estimates are positive and half are negative.

### Assessing false discovery rate

The number of housekeeping genes with p < 0.05 in five models.

Models | M1 | M2 | M3 | M4 | M0 |
---|---|---|---|---|---|

P < 0.05 | 6 | 8 | 9 | 12 | 11 |

## Discussion and Conclusions

We propose a new method for identifying differentially methylated genes by identifying differentially methylated CGIs. With this method, we can consider the information of all probes in each CGI as a whole. One advantage of this method is that we can add a "probe" factor to account for the noise and variation due to many unmeasured factors (e.g., effects due to PCR and human error). Another advantage is that we can model the correlation between probes in the same array at each CGI. Our comparison results do show that this is necessary as the correlations between probes can affect the test results significantly.

Because cancer cell lines or patient samples are in general heterogeneous, they may have very different levels of methylation. Hence, the inclusion (or exclusion) of random effects is crucial in the data analysis. Our assessment of the false discovery rate using HK genes also shows that the mixed effect models (M1 and M2) have relatively smaller false discover rates. Therefore, we recommend using a mixed effect model instead of a generalized least square model. As for whether to use M1 or M2, we recommend using the model M1 for the following two reasons. First, according to the DMH protocol, the consecutive probes are supposed to be positively correlated. Second, the AR1 model could be too specific for this type of noisy and complex data; that is, 'forcing' a specific model like AR1 to identify the local correlation may not be proper. As for the selected candidate CGIs (genes), we recommend ignoring short CGIs that have only three or four probes and have zero correlation estimates or investigating them more carefully. We suggest this since methylation signals should be very similar within 300-400 bases (covering about three or four probes) and such a zero correlation estimate is more than likely due to noise.

As for the sensitivity, it is very important to access this with genes that are known to be differentially methylated between two tumor subtypes. Currently, we do not have such a list of genes. Instead, we make use of some known biological information about methylation patterns [3] and the DMH protocol [9–11] to make the above suggestions. One key assumption of linear models is normality. In order to validate the assumption, we used the Kolmogrov-Smironov (KS) test on the residuals of the M1 (mix) model. After the "BH" multiple test correction [26], no CGIs fail the KS test. Therefore, it is reasonable to assume the normal distribution.

After the within-array and between-array normalization, outlying probes are selected and deleted if there is at least one observation falling out of this range [Q1 - k*(Q3-Q1), Q3 + k*(Q3-Q1)], where k = 5, Q1 and Q3 are the 25% and 75% quantiles of the log ratios of all probes in one CGI. Note that we choose k = 5, which is larger than the usual inter quartile range (k = 1.5). With k = 5, we mean to delete those probes that have outlying observations due to noise but retain those with certain variations for further modeling.

The main focus of this paper is to address the importance of incorporating random effects and correlations in our statistical models and to see how this affects the analysis results. Therefore, the numbers of DM CGIs presented in Table 2 are the summary based on the raw p-values, and we have not done any multiple test correction. If we were to conduct multiple test corrections, especially using a stringent method, we expected that not many CGIs would be selected any more. This could be due to two reasons. First, the underlying biological truth is that the effect size is too small to survive any multiple test corrections. Second, the sample size may not be large enough to separate two tumor subtypes. For example, it has been shown that among 194 cancer patients, only about 1/6 ~1/5 of them have stronger methylation signals than normal individuals at the CST6 gene transcription start site (see Figure 7 (A) of [27]). Generally speaking, two types of cancerous cells are much more similar than cancerous and normal cells, so many more samples might be needed to really separate the methylation patterns of two tumor subtypes.

In this paper, we implement four models (M1, M2, M3 and M4) in all 22,000 CGIs, which have at least three probes. For each CGI, a p-value is obtained at each model to indicate whether there is a significant methylation difference between two tumor subtypes. The p-values for long CGIs (e.g., the ones with more than 20 probes) may not have the same meaning as the ones for the shorter CGIs (e.g., with less than 10 probes). We also discover that the long CGIs easily fail the KS normality test (i.e., raw p-value < 0.05) and have relatively smaller R^{2}. Therefore, we recommend utilizing the following two methods to deal with long CGIs. First, break the long CGIs into a few parts with each part consisting of several probes (e.g., around 10 probes). Second, fit the linear model to probes at each CGI using only the probes that are located in the promoter or first exon region of certain genes, and if necessary, divide them into small parts. There are only a small percentage of CGIs (about 4%) with more than 20 probes, so this will not affect our conclusion about the relationship between correlation estimates and the number of probes.

There are other HK gene lists provided by some recent papers [28, 29]. After comparing the lists from these papers with the one we used [25], we found our HK gene list is still a better choice for our purpose because it has the following two advantages. First, it is obtained using relatively a larger number of tissues than the ones used in [28, 29]. Second, the microarray data set used in [25] has replicates which helps reduce some measurement noise, making results more trustworthy.

Gene set enrichment analysis

Gene sets | p-value | Genes in pathways |
---|---|---|

Neuroactive ligand receptor interaction | 2.13 e-5 | EDNRB, GABBR1, GRIK2 |

Cytokine cytokine receptor interaction | 3.1 e-4 | KDR, EPOR, MPL, TGFB2, BMP2 |

Regulation of actin cytoskeleton | 9.21 e-4 | PIK3R2, PPP1CB, LIMK2, ARPC1B |

Focal adhesion | 4.81 e-3 | KDR, PIK3R2, PPP1CB, COL2A1, COL5A1 |

Natural killer cell mediated cytotoxicity | 4.97 e-3 | PIK3R2, NFATC3 |

Jak-STAT signaling pathway | 5.63 e-3 | EPOR, MPL, PIK3R2 |

Oxidative phosphorylation | 6.17 e-3 | ATP5A1, ATP12A |

Smooth muscle contraction | 6.53 e-3 | PLCB3, CALM1, PRKCE |

Purine metabolism | 8.35 e-3 | POLA1, ADA, IMPGH2 |

MAPK signaling pathway | 1.14 e-2 | MAP3K7, CACNA1C, CACNA1H, MKNK1, NTRK1, MAPKIP2, CACNG8, RASGRF2, TGFB2 |

CGIs could cover different parts of genes, and CGIs from promoter regions may differ from those found in the gene body with respect to their methylation levels and neighboring probe correlation levels. However, this does not affect the performance of our models since they are completely general and analyze each CGI *separately*. Therefore, they can accommodate the methylation level differences and probe correlation strength for each CGI. After the analysis is done, we can do further bioinformatic studies for the CGIs we identified. For example, we can investigate the relationship between CGI locations and probe correlation estimates. Studying the genes identified by the model M1, we found that 45% of promoter CGIs have a positive correlation estimate, and 55% of non-promoter CGIs have a positive correlation. This may indicate that the probe correlation estimates are location dependent.

Although we have only presented the implementation and comparisons of the models in one data set in this paper to focus our discussion, in fact, we have done the same analysis and comparison on an endometrial cancer data set of two racial groups (African Americans and Caucasians) and got the same conclusions. In addition, although the results of this paper are mainly for the DMH microarray data, the methods could be applied to data generated by the other methylation microarray protocols such as Methylated DNA Immunoprecipitation (MeDIP) assay.

## Declarations

### Acknowledgements

This work is supported by the National Science Foundation [0112050]. The authors thank Greg Singers, Dustin Potter, Benjamin Rodriguez, Andrew Oster and Dan Siegal-Gaskins for valuable suggestions and discussions. We appreciate two anonymous reviewers' comments and suggestions.

## Authors’ Affiliations

## References

- Cortez CC, Jones PA: Chromatin, cancer and drug therapies.
*Mutation Research*2008, 647: 44–51.PubMed CentralView ArticlePubMedGoogle Scholar - Zhu W, Qin W, Hewett JE, Sauter ER: Quantitative evaluation of DNA hypermethylation in malignant and benign breast tissue and fluids.
*International Journal of Cancer*2009, 126(2):474–482. 10.1002/ijc.24728View ArticleGoogle Scholar - Eckhardt F, Lewin J, Cortese R, Rakyan VK, Attwood J, Burger M, Burton J, Cox TV, Davies R, Down TA, Haefliger C, Horton R, Howe K, Jackson DK, Kunde J, Koenig C, Liddle J, Niblett D, Otto T, Pettett R, Seemann S, Thompson C, West T, Rogers J, Olek A, Berlin K, Beck S: DNA methylation profiling of human chromosomes 6, 20 and 22.
*Nature Genetics*2006, 38: N1378–1385. 10.1038/ng1909View ArticleGoogle Scholar - Fan S, Zhang X: CpG island methylation pattern in different human tissues and its correlation with gene expression.
*Biochemical and Biophysical Research Communications*2009, 383(4):421–425. 10.1016/j.bbrc.2009.04.023View ArticlePubMedGoogle Scholar - Irizarry RA, Ladd-Acosta C, Wen B, Wu Z, Montano C, Onyango P, Cui H, Gabo K, Rongione M, Webster M, Ji H, Potash JB, Sabunciyan S, Feinberg AP: The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores.
*Nature Genetics*2009, 41(2):178–86. 10.1038/ng.298PubMed CentralView ArticlePubMedGoogle Scholar - De Bustos C, Ramos E, Young JM, Tran RK, Menzel U, Langford CF, Eichler EE, Hsu L, Henikoff S, Dumanski JP, Trask BJ: Tissue-specific variation in DNA methylation levels along human chromosome 1.
*Epigenetics Chromatin*2009, 2(1):7. 10.1186/1756-8935-2-7PubMed CentralView ArticlePubMedGoogle Scholar - Previti C, Harari O, Zwir I, Val CD: Profile analysis and prediction of tissue-specific CpG island methylation classes.
*BMC Bioinformatics*2009, 10(1):116. 10.1186/1471-2105-10-116PubMed CentralView ArticlePubMedGoogle Scholar - Jones PA, Baylin SB: The epigenomics of cancer.
*Cell*2007, 128(4):683–92. 10.1016/j.cell.2007.01.029PubMed CentralView ArticlePubMedGoogle Scholar - Huang T, Perry M, Laux D: Methylation profiling of CpG islands in human breast cancer cells.
*Human Molecular Genetics*1999, 8: 459–470. 10.1093/hmg/8.3.459View ArticlePubMedGoogle Scholar - Yan PS, Chen CM, Shi H, Rahmatpanah F, Wei SH, Huang TH: Applications of CpG island microarrays for high-throughput analysis of DNA methylation.
*The Journal of Nutrition*2002, 132(8 Suppl):2430S-2434S.PubMedGoogle Scholar - Yan PS, Potter D, Deatherage DE, Huang TH, Lin S: Differential methylation hybridization: profiling DNA methylation with a high-density CpG island microarray.
*Methods in Molecular Biology*2009, 507: 89–106. full_textView ArticlePubMedGoogle Scholar - Gardiner-Garden M, Frommer M: CpG islands in vertebrate genomes.
*Journal of Molecular Biology*1987, 196(2):261–282. 10.1016/0022-2836(87)90689-9View ArticlePubMedGoogle Scholar - Zighelboim I, Goodfellow P, Schmidt A, Walls K, Mallon M, Mutch D, Yan P, Huang T, Powell M: Differential Methylation Hybridization Array of Endometrial Cancers Reveals Two Novel Cancer-Specific Methylation Markers.
*Clinical Cancer Research*2007, 13(10):2882–2889. 10.1158/1078-0432.CCR-06-2367View ArticlePubMedGoogle Scholar - Khalili A, Potter D, Yan P, Li L, Gray J, Huang T, Lin S: Gamma-Normal-Gamma Mixture Model for Detecting Differentially Methylated Loci in Three Breast Cancer Cell Lines.
*Cancer Informatics*2007, 2: 43–54.Google Scholar - Li C, Wong WH: Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection.
*Proceedings of the National Academy of Sciences USA*2001, 98(1):31–36. 10.1073/pnas.011404098View ArticleGoogle Scholar - Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data.
*Biostatistics*2003, 4(2):249–264. 10.1093/biostatistics/4.2.249View ArticlePubMedGoogle Scholar - Binder H, Preibisch S: Specific and Nonspecific Hybridization of Oligonucleotide Probes on Microarrays.
*Biophysical Society*2005, 89(1):337–352. 10.1529/biophysj.104.055343View ArticleGoogle Scholar - Bruun GM, Wernersson R, Juncker AS, Willenbrock H, Nielsen HB: Improving comparability between microarray probe signals by thermodynamic intensity correction.
*Nucleic Acids Research*2007, 35(7):e48. 10.1093/nar/gkl1098PubMed CentralView ArticlePubMedGoogle Scholar - Neve R, Chin K, Fridlyand J, Yeh J, Baehner F, Fevr T, Clark L, Bayani N, Coppe J, Tong F, Speed T, Spellman P, DeVries S, Lapuk A, Wang N, Kuo W, Stilwell J, Pinkel D, Albertson D, Waldman F, McCormick F, Dickson R, Johnson M, Lippman M, Ethier S, Gazdar A, Gray J: A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes.
*Cancer Cell*2006, 10: 515–527. 10.1016/j.ccr.2006.10.008PubMed CentralView ArticlePubMedGoogle Scholar - Edwards DE: Non-linear normalization and background correction in one-channel cDNA microarray studies.
*Bioinformatics*2003, 19: 825–833. 10.1093/bioinformatics/btg083View ArticlePubMedGoogle Scholar - Smyth GK, Speed TP: Normalization of cDNA microarray data.
*Methods*2003, 265–273. 10.1016/S1046-2023(03)00155-5Google Scholar - Bolstad BM, Irizarry RA, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on bias and variance.
*Bioinformatics*2003, 19: 185–193. 10.1093/bioinformatics/19.2.185View ArticlePubMedGoogle Scholar - Yang YH, Thorne NP: Normalization for two-color cDNA microarray data. In
*Science and Statistics: A Festschrift for Terry Speed, IMS Lecture Notes - Monograph Series*Edited by: Goldstein DR. 2003, 40: 403–418. full_textView ArticleGoogle Scholar - Pinheiro J, Bates D:
*Mixed effect models in S and S-PLUS.*Springer; 2004.Google Scholar - Eisenberg E, Levanon E: Human Housekeeping genes are compact.
*Trends in Genetics*2003, 19: 362–365. 10.1016/S0168-9525(03)00140-9View ArticlePubMedGoogle Scholar - Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing.
*Journal of the Royal Statistical Society Series B*1995, 57: 289–300.Google Scholar - Lin HL, Zuo T, Lin C, Kuo CT, Liyanarachchi S, Sun S, Shen R, Deatherage DE, Potter D, Asamoto L, Lin S, Yan P, Cheng A, Ostrowski M, Huang TH: Breast cancer-associated fibroblasts confer AKT1-mediated epigenetic silencing of cystatin M in epithelial cells.
*Cancer Research*2008, 68: 10257–10266. 10.1158/0008-5472.CAN-08-0288PubMed CentralView ArticlePubMedGoogle Scholar - de Jonge HJ, Fehrmann RS, de Bont ES, Hofstra RM, Gerbens F, Kamps WA, de Vries EG, Zee AG, te Meerman GJ, ter Elst A: Evidence based selection of housekeeping genes.
*PLoS One*2007, 2(9):e898. 10.1371/journal.pone.0000898PubMed CentralView ArticlePubMedGoogle Scholar - She X, Rohl CA, Castle JC, Kulkarni AV, Johnson JM, Chen R: Definition, conservation and epigenetics of housekeeping and tissue-enriched genes.
*BMC genomics*2009, 10(1):269. 10.1186/1471-2164-10-269PubMed CentralView ArticlePubMedGoogle Scholar - Wei Z, Li H: A Markov random field model for network-based analysis of genomic data.
*Bioinformatics*2007, 12: 1537–1544. 10.1093/bioinformatics/btm129View ArticleGoogle Scholar - Tai F, Pan W: Incorporating prior knowledge of gene functional groups into regularized discriminant analysis of microarray data.
*Bioinformatics*2007, 23: 3170–2177. 10.1093/bioinformatics/btm488View ArticlePubMedGoogle Scholar - Zhang L, Huang J, Yang N, Greshock J, Liang S, Hasegawa K, Giannakakis A, Poulos N, O'Brien-Jenkins A, Katsaros D, Butzow R, Weber BL, Coukos G: Integrative genomic analysis of phosphatidylinositol 3'-kinase family identifies PIK3R3 as a potential therapeutic target in epithelial ovarian cancer.
*Clinical Cancer Research*2007, 13(18 Pt 1):5314–5321. 10.1158/1078-0432.CCR-06-2660View ArticlePubMedGoogle Scholar - Eckstein N, Servan K, Hildebrandt B, Pölitz A, von Jonquières G, Wolf-Kümmeth S, Napierski I, Hamacher A, Kassack MU, Budczies J, Beier M, Dietel M, Royer-Pokora B, Denkert C, Royer HD: Hyperactivation of the insulin-like growth factor receptor I signaling pathway is an essential event for cisplatin resistance of ovarian cancer cells.
*Cancer Research*2009, 69(7):2996–3003. 10.1158/0008-5472.CAN-08-3153View ArticlePubMedGoogle Scholar - Xie K, Bai RZ, Wu Y, Liu Q, Liu K, Wei YQ: Anti-tumor effects of a human VEGFR-2-based DNA vaccine in mouse models.
*Genetic Vaccines and Therapy*2009, 7(1):10. 10.1186/1479-0556-7-10PubMed CentralView ArticlePubMedGoogle Scholar - Li YJ, Deng YJ, Wen G, Zhang XL: Evaluation of angiogenesis in the tumorigenesis and progression of breast cancer.
*Zhonghua Wai Ke Za Zhi*2009, 47(7):519–522.PubMedGoogle Scholar - An SJ, Chen ZH, Lin QX, Su J, Chen HJ, Lin JY, Wu YL: The 271 G>A polymorphism of kinase insert domain-containing receptor gene regulates its transcription level in patients with non-small cell lung cancer.
*BMC Cancer*2009, 9: 144. 10.1186/1471-2407-9-144PubMed CentralView ArticlePubMedGoogle Scholar - Mirmohammadsadegh A, Marini A, Gustrau A, Delia D, Nambiar S, Hassan M, Hengge UR: Role of Erythropoietin Receptor Expression in Malignant Melanoma.
*Journal of Investigative Dermatology*2009, in press.Google Scholar - Hussein K, Bock O, Theophile K, Schulz-Bischof K, Porwit A, Schlue J, Jonigk D, Kreipe H: MPLW515L mutation in acute megakaryoblastic leukaemia. Leukemia 2009, (5):852–855. 10.1038/leu.2008.371Google Scholar
- Wen XZ, Akiyama Y, Baylin SB, Yuasa Y: Frequent epigenetic silencing of the bone morphogenetic protein 2 gene through methylation in gastric carcinomas.
*Oncogene*2006, 25(18):2666–2673. 10.1038/sj.onc.1209297View ArticlePubMedGoogle Scholar

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.