Identification of aberrant gene expression associated with aberrant promoter methylation in primordial germ cells between E13 and E16 rat F3 generation vinclozolin lineage

Taguchi, Y-h

doi:10.1186/1471-2105-16-S18-S16

Volume 16 Supplement 18

Joint 26th Genome Informatics Workshop and 14th International Conference on Bioinformatics: Bioinformatics

Research
Open access
Published: 09 December 2015

Identification of aberrant gene expression associated with aberrant promoter methylation in primordial germ cells between E13 and E16 rat F3 generation vinclozolin lineage

Y-h Taguchi¹

BMC Bioinformatics volume 16, Article number: S16 (2015) Cite this article

2091 Accesses
20 Citations
5 Altmetric
Metrics details

Abstract

Background

Transgenerational epigenetics (TGE) are currently considered important in disease, but the mechanisms involved are not yet fully understood. TGE abnormalities expected to cause disease are likely to be initiated during development and to be mediated by aberrant gene expression associated with aberrant promoter methylation that is heritable between generations. However, because methylation is removed and then re-established during development, it is not easy to identify promoter methylation abnormalities by comparing normal lineages with those expected to exhibit TGE abnormalities.

Methods

This study applied the recently proposed principal component analysis (PCA)-based unsupervised feature extraction to previously reported and publically available gene expression/promoter methylation profiles of rat primordial germ cells, between E13 and E16 of the F3 generation vinclozolin lineage that are expected to exhibit TGE abnormalities, to identify multiple genes that exhibited aberrant gene expression/promoter methylation during development.

Results

The biological feasibility of the identified genes were tested via enrichment analyses of various biological concepts including pathway analysis, gene ontology terms and protein-protein interactions. All validations suggested superiority of the proposed method over three conventional and popular supervised methods that employed t test, limma and significance analysis of microarrays, respectively. The identified genes were globally related to tumors, the prostate, kidney, testis and the immune system and were previously reported to be related to various diseases caused by TGE.

Conclusions

Among the genes reported by PCA-based unsupervised feature extraction, we propose that chemokine signaling pathways and leucine rich repeat proteins are key factors that initiate transgenerational epigenetic-mediated diseases, because multiple genes included in these two categories were identified in this study.

Background

Transgenerational epigenetics (TGE) [1] describes the transfer of phenotypes between generations without the modification of genome sequences. Because the plant germline arises from somatic cells, TGE is often observed in plants. However, TGE was also reported in the offspring of mammals, when pregnant females are exposed to endocrine disruptions. Many factors are affected by TGE including male infertility [2], anxious behavior [3], mate preference [4], various diseases [5], reprogramming of primordial germ cells [6], and stress responses [7].

In contrast to reports studying the relationship of TGE to various abnormalities, few studies have investigated how TGE occurs. The main difficulty of studying TGE mechanisms is that epigenetic markers such as promoter methylation are not only heritable, but also vary over time during development in the generation associated with TGE. For example, for promoter methylation to affect development, it must be switched on/off during various stages of development [1]. Thus, TGE that affects development is expected to follow a similar time course. Therefore, abnormalities caused by TGE must be related to the aberrant timing of promoter methylation/demethylation when compared with normal organisms. Detecting small irregularities of promoter methylation timing based on comparisons with normal organisms is not easy. For example, Skinner et al [6] recently tried to identify aberrant gene expression associated with aberrant promoter methylation between E13 and E16 germ line in F3 generation vinclozolin lineages, where vinclozolin functions as an endocrine disruptor. Endocrine disruption is thought to cause various diseases especially in reproductive organs, because it is often misrecognized as a hormone effect on the development of reproductive organs. Thus, usage of endocrine disruptors is usually forbidden for public health. Furthermore, vinclozolin was recently observed to cause TGE abnormalities. However, Skinner et al failed to identify strict pairs of aberrant gene expression and promoter methylation for specific genes. They concluded "A comparison between the germ cell DMR (differential DNA methylated regions) and the differentially expressed genes indicated no significant overlap". Thus, our understanding of the mechanisms by which TGE occurs remains poor.

In the present study we applied the recently proposed principal component analysis (PCA)-based unsupervised feature extraction (FE) [8–17] to the data set obtained by Skinner et al [6] and successfully identified a significant overlap between DMR and differentially expressed genes. Various methods for enrichment analyses supported the biological feasibility of the 48 identified RefSeq mRNAs. We also confirmed the superiority of the proposed methodology over three other methods. The relatively poorer performances achieved by these three methodologies compared with PCA-based unsupervised FE indicated that the proposed methodology outperformed these three frequently employed methods.

Furthermore, 22 genes among those derived from the 48 RefSeq mRNAs identified by PCA-based unsupervised FE were previously reported to be related to diseases caused by TGE [5]. This suggests that aberrant gene expression associated with aberrant promoter methylation during this stage of development is a key factor in the generation of TGE-mediated diseases. Because multiple genes involved in chemokine signaling pathways or containing leucine-rich repeat (LRR) proteins were identified in the current study, we hypothesized that chemokine signaling pathways and/or LRR proteins were involved in mediating TGE-related diseases.

Previous usage of PCA-based unsupervised FE

Here, we briefly review previous studies [8–17] that used PCA-based unsupervised FE. In Refs. [8–11], we applied PCA-based unsupervised FE to microRNA expression for biomarker identification between patients (of various diseases including various cancers, chronic obstructive pulmonary disease, and Alzheimer's disease, etc) and healthy controls; microRNA extracted in an unsupervised manner was combined with linear discriminant analysis. We found a combination of 10-20 microRNAs generally achieved about 80 % accuracy. It was also confirmed that the identified set of microRNAs were stable. Thus, this method is robust for the selection of samples. In Ref. [12], we applied PCA-based unsupervised FE to the proteome in a bacterial culture and identified critical proteins in an unsupervised manner. In Ref. [13], we applied PCA-based unsupervised FE to mRNA and miRNA expression of stressed mouse heart. After identifying potential disease causing genes, we performed in silico drug discovery of the identified genes. In Ref. [14], we performed integrated analysis of promoter methylation profiles of three distinct autoimmune diseases using PCA-based unsupervised FE and identified many genes commonly associated with aberrant promoter methylation. In Ref. [15], we applied PCA-based unsupervised FE to genotyping/DNA methylation profiles of cancer and identified genotype specific DNA methylation profiles that occurred in cancer genetics. In Refs [16, 17], PCA-based unsupervised FE of mRNA expression and promoter methylation profiles of normal/treated cancer cell lines was investigated. Based upon the integrated analysis of mRNA expression and promoter methylation profiles, we identified potential disease causing genes.

In summary, PCA-based unsupervised FE has mainly been used to compare between patients (or cancer cell lines) and healthy controls excluding one exception [12]. Because it is likely that healthy controls and patients (or control and treated cancer cell lines) exhibit distinct expressions, it is not surprising that PCA-based unsupervised FE detected significant differences, even if most of the biomarker/disease causing genes were identified only by PCA-based unsupervised FE, but not by other methodologies. In this study, we applied PCA-based unsupervised FE to a different factor, the difference between two time points (E13 and E16). These time points represent different developmental stages and thus some differences are expected; however, the time points are separated by only 3 days, and therefore the differences should be much smaller than between healthy controls and patients (or control and treated cancer cell lines). Of note, although Skinner et al [6] reported no aberrant gene expression associated with aberrant promoter methylation between E13 and E16 germ lines in F3 generation vinclozolin lineages, the study was still published. Thus, from a methodological point of view, the purpose of this study was to investigate whether PCA-based unsupervised FE could identify slight differences; thus it is a new challenge for this methodology.

Methods

Gene expression and promoter methylation profiles

Gene expression/promoter methylation profiles were retrieved from the gene expression omnibus (GEO) using GEO ID GSE59511. This super series consists of two subseries, GSE43559 and GSE59510, each of which includes gene expression (using Affymetrix Rat Gene 1.0 ST Array) and promoter methylation (using NimbleGen Rat CpG Island Plus RefSeq Promoter 720k array) information, respectively. Gene expression profiles were directly loaded from GEO to R [18] by getGEO function while six files whose names ended with ratio_peaks_mapToFeatures_All_Peaks.txt.gz were downloaded and loaded into R using read.csv for promoter methylation. Table 1 shows a list of the samples analyzed. GSE43559 (gene expression) consists of eight samples classified into four categories, E13 control, E13 treated, E16 control, and E16 treated. GSE59510 (promoter methylation) consists of six samples classified into two categories, E13 and E16 (all from F3 generation primordial germ lines). Using the ratio between treated and control groups, eight gene expression profiles were converted to alternative eight profiles as follows:

Table 1 Gene expression and promoter methylation profiles.

Full size table

(\begin{gathered} \frac{E 13 Control rep 1}{E 13 treated rep 1} \\ \frac{E 13 Control rep 2}{E 13 treated rep 2} \\ \frac{E 13 Control rep 2}{E 13 treated rep 1} \\ \frac{E 13 Control rep 1}{E 13 treated rep 2} \\ \frac{E 16 Control rep 1}{E 16 treated rep 1} \\ \frac{E 16 Control rep 2}{E 16 treated rep 2} \\ \frac{E 16 Control rep 2}{E 16 treated rep 1} \\ \frac{E 16 Control rep 1}{E 16 treated rep 2} \end{gathered}) .

The reason we used a ratio of control to treated instead of the usual ratio of treated to control is explained in additional file 1. These were further normalized to have a mean of zero and a variance of one within each sample. Because six samples in GSE59510 were already transformed to a ratio between treated/control samples, these were not normalized. In total, 14 (8+6) samples that exhibited a ratio between control/treated samples were pooled and prepared for further analyses. The only difference between control and treated samples was whether oil or vinclozolin was injected to F1 pregnant rats between E8 and E14. Any other treatments were identical between E13 and E16.

Principal component analysis-based unsupervised feature extraction

Although this method was described in detail in a recently published review article [19], this methodology is briefly introduced here. Example: x_ij is the gene expression/promoter methylation of the ith gene (i = 1, ..., N) in the jth sample (j = 1, ..., M). For simplicity, it is assumed that the mean of x_ij over i within each j is zero. Then, in contrast to the ordinary usage of PCA where samples are embedded into the low dimensional space, genes are embedded into the low dimensional space by applying PCA. Thus, principal component (PC) scores of the ℓth component, x_iℓ, (ℓ = 1, ..., M) are attributed to each gene while each sample has contributed c_ℓj to the ℓth component. By this definition, x_iℓ is expressed as

x_{i ℓ} = \sum_{j} c_{ℓ j} x_{i j}

PCA-based unsupervised FE attempts to extract features (in this specific application, genes) with larger absolute PC scores along the specified ℓth PC.

In the specific application described in the present study, $N_{expression}^{'}$ probes using gene expression and $N_{methylation}^{'}$ probes using promoter methylation were selected, respectively. For the computation of P-values of coincident analysis with binomial distribution, $N_{expression}^{'} = N_{methylation}^{'} = N^{'}$ for simplicity.

Although there are several ways to determine which PC is employed for FE, the most straightforward and intuitive strategy is to identify PCs that are mostly coincident with categories by employing categorical regression:

c_{ℓ j} = a_{ℓ} + \sum_{k} a_{k ℓ} δ_{k j}

where a_ℓ and a_kℓ are numerical (regression) coefficients. Then, the ℓth PC associated with the (most) significant regression is employed as the PC for FE. Because this study only contained two categories (E13 and E16), we used the t test instead of categorical regression to measure the significance of coincidence between c_ℓj and categories.

ttest-based FE

The t test was applied to gene expression/promoter methylation in each probe. For gene expression, $(\begin{gathered} \frac{E 13 Control rep 1}{E 13 treated rep 1} \\ \frac{E 13 Control rep 2}{E 13 treated rep 2} \\ \frac{E 13 Control rep 2}{E 13 treated rep 1} \\ \frac{E 13 Control rep 1}{E 13 treated rep 2} \end{gathered})$ and $(\begin{gathered} \frac{E 16 Control rep 1}{E 16 treated rep 1} \\ \frac{E 16 Control rep 2}{E 16 treated rep 2} \\ \frac{E 16 Control rep 2}{E 16 treated rep 1} \\ \frac{E 16 Control rep 1}{E 16 treated rep 2} \end{gathered})$ were compared. For promoter methylation, $(\begin{gathered} \frac{E 13 - Vip 2}{E 13 - Cip 1} \\ \frac{E 13 - Vip 1}{E 13 - Cip 1} \\ \frac{E 13 - Vip 2}{E 13 - Cip 2} \end{gathered})$ and $(\begin{gathered} \frac{E 16 - Vip 2}{E 16 - Cip 1} \\ \frac{E 16 - Vip 1}{E 16 - Cip 1} \\ \frac{E 16 - Vip 2}{E 16 - Cip 2} \end{gathered})$ were compared. Then the most significant (i.e., associated with smaller P-values) $N_{expression}^{'}$ and $N_{methylation}^{'}$ probes were selected, respectively. For the computation of P-values of coincident analysis with binomial distribution, $N_{expression}^{'} = N_{methylation}^{'} = N^{'}$ for simplicity.

limma-based FE

limma [20] was applied to gene expression and promoter methylation as follows. For gene expression, after converting raw gene expression to logarithmic values, the model Diff = (E16.VIN-E16.CNTL)-(E13.VIN-E16.CNTL) was applied, where VIN and CNTL correspond to treated and control samples, respectively. For promoter methylation, only the ratio between control and treated samples was provided (see Table 1), and the two class model was applied for E13 and E16 samples (R source codes are shown in additional file 1). Then the obtained P-values were employed for FE. The remaining procedures were the same as for the previous two FEs.

SAM-based FE

SAM [21] was applied to gene expression and promoter methylation separately, as shown in Table 1, i.e., as two class problems of E13 and E16 (siggenes packages [22] in Bioconductor [23] was used). Then, the obtained P-values were used for FE. The remaining procedures were the same as for the previous three FEs.

Protein-protein interaction enrichment analysis

The obtained RefSeq mRNA IDs were converted to gene names ("official gene symbol") via a gene ID conversion tool implemented in DAVID [24], and the obtained gene names were uploaded to STRING [25] server. Then, "protein-protein interactions" was selected among the pull-down menu of "enrichment", where the expected number of PPIs for the set of genes uploaded and the P-value attributed to identified PPIs are available.

Gene ID identification for literature searches

Literature searches were performed using gene symbols that were converted from RefSeq mRNAs using DAVID as explained above.

Results and Discussion

Gene selection using PCA-based unsupervised FE

Figure 1 illustrates the strategy to identify aberrant gene expression associated with aberrant promoter methylation between controls and vinclozolin treated samples during development from E13 to E16. Gene expression and promoter methylation of vinclozolin treated F3 samples were normalized relative to controls. Then, by separately applying PCA-based unsupervised FE to each sample group, the top N' (≪ N) genes were independently selected. The number of commonly selected genes N'' was counted. If N'' was much larger than expected, the selection of aberrant gene expression associated with aberrant promoter methylation was determined to be successful.

At first, the PCs used for FE shown in Figure 1 were specified and a boxplot (PC2 for mRNA and PC1 for methylation) is shown in Figure 2. These two PCs exhibited a significant distinction between the two categories, E13 and E16. Using the specified PCs, PCA-based unsupervised FE was performed. Then, the most significant N' genes were extracted for gene expression and promoter methylation, respectively. P-values to determine whether the coincidence and the number of commonly selected genes among N' genes occurred accidentally was computed by binomial distribution. How the P-values varied dependent upon N' was determined. Figure 3 shows the dependence of P-values upon N' when N = 13324, the number of genes commonly included in gene expression and promoter methylation profiles. P-values were smaller for larger N'. However, the minimum N' with P-values less than 0.05 were selected (i.e., N' = 1000) to minimize the number of genes selected to reduce the time spent performing literature searches in the later part of this study. Among the 1000 genes selected in either gene expression or promoter methylation, 48 RefSeq mRNAs were commonly selected (a list of gene names and boxplots of individual genes are shown in additional files 2 and 3). The P-value for N' = 1000 was 0.04 (see Figure 3, and this value was confirmed by the shuffle test, additional file 1). Thus, we successfully selected genes that were significantly associated with simultaneous aberrant gene expression/promoter methylation.

To biologically validate these 48 RefSeq mRNAs, we uploaded them to three enrichment analyses servers, DAVID [24], TargetMine [26] and g:Profiler [27]. We observed some biological terms were enriched among the selected genes (Table 2). Almost 50% of the genes selected belonged to G-protein coupled receptors (GPCR) or cell surface receptor pathways, which was expected because an endocrine disruptor such as vinclozolin targets cell surface receptors. We also estimated PPI enrichment (see methods). Because it is rare for proteins to function in the absence of collaboration with other proteins, enriched PPIs among the selected genes (proteins) can provide supporting evidence for the biological significance of selected genes. There were seven PPIs although the expected number of PPIs was three. This resulted in P = 0.05; thus there was significant PPI enrichment among the genes selected by PCA-based unsupervised FE.

Table 2 Enrichment analysis of 48 RefSeq mRNAs commonly selected in the top most 1000 genes by applying PCA-based unsupervised FE to gene expression and promoter methylation.

Full size table

P-values shown in Figure 3 remained significant even when N' increased from 1000 to 2000. Thus, we tried to obtain more genes by setting N' = 2000, because the greater number of genes uploaded would have a tendency to enhance enrichment. There were 179 mRNAs commonly selected between gene expression and promoter methylation (see additional file 3 for the full list). Uploading these genes to three enrichment analyses servers resulted in greater enrichment for these 179 genes as expected (Tables 3, 4, and 5).

Table 3 Enrichment analysis of 179 genes commonly selected in the top most 2000 genes by applying PCA-based unsupervised FE to gene expression and promoter methylation.

Full size table

Table 4 Enrichment analysis of 179 genes commonly selected in the top most 2000 genes by applying PCA-based unsupervised FE to gene expression and promoter methylation.

Full size table

Table 5 Enrichment analysis of 179 genes commonly selected in the top most 2000 genes by applying PCA-based unsupervised FE to gene expression and promoter methylation.

Full size table

GPCR and cell surface receptors were enhanced and olfactory transduction related biological terms were vastly enriched. Careful investigation of the selected genes indicated that many olfactory receptor proteins were newly identified when N' was increased from 1000 to 2000. Olfactory receptor proteins were also recognized by Skinner et al [6]. Thus, the identification of many olfactory receptor proteins suggested the correctness and superiority of our methodology, because Skinner et al [6] did not identify reciprocal relationships between gene expression and promoter methylation, probably owing to a lack of suitable statistical methods, although they noted their importance.

PPI enrichment significance was also enhanced when N' increased from 1000 to 2000. There were 360 PPIs among 179 genes while the expected number of PPIs was 191. This resulted in P = 0 (within the numerical accuracy adopted); thus the significance of PPI enrichment was enhanced. The increase of PPIs was mostly due to the newly identified olfactory receptor proteins.

These data suggest the biological suitability of our methodology.

Comparisons with other supervised FEs

Although PCA-based unsupervised FE was already demonstrated to perform better than various conventional methods reported for various applications [8–17], other simpler methods might achieve a comparative performance in this specific example, although the study by Skinner et al [6] was unsuccessful. To demonstrate the superiority of PCA-based unsupervised FE compared with a simpler method, genes were selected by t test between E13 and E16. The t test was applied to the F3 generation vinclozolin lineage gene expression/promoter methylation and normalized to control samples (see methods) and the most N' significant genes were selected for both gene expression and promoter methylation. Then the significance of overlap between genes selected by the t test related to gene expression and promoter methylation was determined. Results demonstrated that in the range of N' ≤ 2000, the minimum achieved P was 0.38, which was not significant (additional file 4). P-values increased as N' approached 2000 in contrast to the tendency seen in Figure 3 and there were no overlaps when N' ≤ 300; thus, the P-value could not be computed. Therefore, in this specific example, PCA-based unsupervised FE achieved a good performance compared with a simpler method. The second method we compared was limma [20] (see Methods), a popular method used for gene expression analyses, especially when genes exhibit differential expression between multiple conditions. In the first page of the manual, limma was identified as aiming to analyze a small number of samples; "Empirical Bayesian methods are used to provide stable results even when the number of arrays is small". Thus, limma is a very suitable method whose performance should be compared with PCA based unsupervised FE that also aims to treat small sample cases. The results were disappointing. Within the range of N' ≤ 2000, there were only two N' associated with P-values less than 0.05, when N's tested were taken to be equivalent to those when PCA-based unsupervised FE as well as t test-based FE were employed (additional file 4). The number of genes selected commonly between gene expression and promoter methylation was also small. When N' = 800 such that there were as many common genes as possible, the number of genes commonly selected between gene expression and promoter methylation was 33 Refseq mRNAs, among which there were no overlaps with the 48 RefSeq mRNAs selected by PCA-based unsupervised FE when N' = 1000 (the list of 33 RefSeq mRNAs identified by limma-based FE is shown in additional file 3). Furthermore, biological validation was also disappointing. Even uploading these 33 RefSeq mRNAs to three enrichment servers, the identified enrichments were zero. DAVID, g:Profiler or TargetMine identified no enriched GO BP, CC, MF terms or KEGG pathways. There were also no PPIs detected by STRING among the RefSeq mRNAs genes. Moreover, because there are no larger N' associated with P-values less than 0.05, we could not increase the number of common genes such that more enrichments were detected. The third method we compared was SAM [21] (see Methods), another popular method used for gene expression analyses, also designed for multiclass problems. The results were again disappointing. Within the range of N' ≤ 2000, there were only four N' associated with P-values less than 0.05, when N′s tested were taken to be equivalent to those when the other three methods were employed (additional file 4). The number of genes selected commonly between gene expression and promoter methylation was also small. When N' = 800 such that there were as many common genes as possible, the number of genes commonly selected between gene expression and promoter methylation was 30 RefSeq mRNAs (the list of 30 RefSeq mRNAs identified by SAM-based FE is shown in additional file 3), among which there were 11 RefSeq mRNAs that were also included in the 48 RefSeq mRNAs identified by PCA-based unsupervised FE when N' = 1000. Furthermore, biological validation was also disappointing. No GO BP, CC, MF terms or KEGG pathway enrichments were identified when these 30 RefSeq mRNAs were uploaded to the DAVID, g:Profiler or TargetMine enrichment servers. There were also only two PPIs (P = 0.37, thus not significant) detected by STRING among the RefSeq mRNAs genes. Moreover as for limma, because there are no larger N' associated with P-values less than 0.05, we could not increase the number of common genes such that more enrichments were detected.

Thus, we conclude that PCA-based unsupervised FE outperformed simple t test-based FE, the sophisticated Bayesian methodology (limma)-based FE, and the popular SAM-based FE. Therefore, the superiority of PCA-based unsupervised FE to these methods was demonstrated.

Biological significance of selected genes: literature searches

To estimate the biological significance of the selected 48 RefSeq mRNAs selected by PCA-based unsupervised FE when N' = 1000 in more detail, we performed a literature search for each gene selected. During the searches, we focused on the relationship between selected genes and diseases; Anway et al [5] reported that TGE induced by vinclozolin caused tumor, prostate, kidney, testis and immune diseases. Table 6 summarizes the association of selected genes previously reported to be related to tumor, prostate, kidney, testis and immune disease. Of the 48 RefSeq mRNAs identified, 22 were associated with targeted properties (the list of references identified by literature searches as well as detailed discussions are available in additional file 1). This indicates the success of our methodology to identify genes potentially associated with causing TGE mediated diseases in the F3 generation vinclozolin lineage.

Table 6 Summary of literature searches for genes selected by PCA-based unsupervised FE when N' = 1000.

Full size table

To further compare genes selected by PCA-based unsupervised FE with those selected by limma and SAM from a biological point of view, we also performed literature searches of the 33 RefSeq mRNAs selected by limma (Table 7) and 19 RefSeq mRNAs selected by SAM but not included in the 48 Refseq mRNAs identified by PCA-based unsupervised FE when N' = 1000 (Table 8). The list of references identified by literature searches is shown in additional file 1. The limma-based FE was inferior to PCA-based unsupervised FE because only 13 genes were identified by the literature search and were reported to be related to lower numbers of terms including "tumors", "prostate", "kidney", "testis" and "immune". On the other hand, the SAM-based FE showed more promising results, which was expected because it identified 11 RefSeq mRNAs that overlapped with 48 RefSeq mRNAs identified by PCA-based unsupervised FE when N' = 1000. In Table 8, 11 out of 19 genes were associated with disease. Therefore, it might be thought that SAM-based FE was superior to PCA-based unsupervised FE, which only identified 22 disease associated genes among 48 genes. However, PCA-based unsupervised FE also identified 179 genes by increasing N' from 1000 to 2000, which is impossible for SAM-based FE. We found that only three genes (IL15, PGAM2, and ZFP36L1) that were not included in the 179 RefSeq mRNAs identified by PCA-based unsupervised FE when N' = 2000. This suggested that PCA-based unsupervised FE identified almost the same genes as cover those identified by SAM-based FE. Although the disease associations of genes identified by PCA-based unsupervised FE and previously reported in the literature might not have been confirmed or remain just an observation or hypothesis, the strict difference in disease associated genes identified by PCA-based unsupervised FE and those by other methods suggested the superiority of PCAbased unsupervised FE.

Table 7 Summary of literature searches for genes selected by limma-based FE.

Full size table

Table 8 Summary of literature searches for genes selected by SAM-based FE, but not by PCA-based unsupervised FE when N' = 1000.

Full size table

Thus, PCA-based unsupervised FE appeared superior to limma and SAM-based FE.

Two groups of selected genes: chemokine signaling pathway genes and LRR proteins

To provide further insights of the genes selected by PCA-based unsupervised FE when N' = 1000, we focused on two categories: chemokine signaling pathway genes and LRR proteins, both of which have been extensively reported to be related to vinclozolin mediated diseases. This also supports the effectiveness of our methodology and the importance of the selected genes.

Chemokine signaling pathway

Four chemokine/chemokine receptors were selected: CCR2, PF4 (CXCL4), CCL3, and CMKLR1. The first three belong to the chemokine signaling pathway (rno04062) in KEGG, as either ligands or receptors that activate chemokine signaling pathways. In addition, CMKLR1 is a chemokine receptor-like protein although it is not included in the KEGG pathway. They are all localized at cell surfaces, and therefore are expected to function together to activate/inactivate chemokine signaling pathways. It is reasonable that they are detected together in the present analysis. Some studies also suggested a relationship between chemokines and vinclozolin. Cowin et al [28] reported that prostatic inflammation was associated with the postpubertal activation of proinflammatory NFκB-dependent genes, including the chemokine IL-8 when embryos were exposed to vinclozolin in rats. Chemokine signaling pathways were associated with genes whose expression was altered in rat F3 generation vinclozolin lineage Sertoli cells [2]. The expression of Cxcr4 and Cxcl2 was altered in vinclozolin F3 generation rat prostate epithelial cells [28]. Interestingly, together with Cxcr4 and Cxcl2, BMP6 was reported to have an altered expression [28]. BMP6 shares binding specificity with BMP3, which was also identified in this study (see Table 6). BMP proteins belong to the TGFβ pathway (rno04350) that is grouped with the chemokine signaling pathway as part of the cytokine-cytokine receptor interaction system (rno04060). Furthermore, numerous studies have suggested a relationship between chemokine signaling pathways and various diseases. For example, the inhibition of chemokine-induced biological activity is a promising therapeutic strategy for proteinuric disorders [29]. Chemokines and chemokine receptors play critical roles in prostate cancer development and progression [30] as well as in testicular inflammation [31]. Thus, TGE abnormalities caused by vinclozolin might develop through the regulation of inherited promoter methylation during the stages of development, by affecting chemokine signaling pathways.

LRR proteins

LRRN3, PRAMEL1, and LRRTM1 are LRR proteins shown in Table 6 or in additional files 2 and 3. LRR proteins were frequently reported to be associated with F3 generation vinclozolin lineages, e.g., LRRc48, LRRc56, and LRRc8B [2], LR-RTM3 and ELFN2 [3], Lrfn3 [28], Lrrc46, Lrrc48, Lgi2, Fbxl7, Lgi4, Lrig1, Fbxl12, Lrfn1, Lingo1, Lrrc8b, Lrrn2, and LRRTM1. Lrrtm4 [4], Lrrc61 [32], and Lrrc56 [6] were reported to be related to F3 generation vinclozolin lineages. Although the frequent observation of aberrant LRR protein expression in F3 generation vinclozolin lineages does not always indicate that LRR proteins are potential causative factors of vinclozolin mediated transgenerational epigenetic-induced diseases, there are multiple reports that suggest a relationship between LRR proteins and nervous system disorders. LRRTM1, LRRTM3, LRRN1 and LRRN3 were reported to be related to autism spectrum disorder [33] and polymorphisms in LRR genes are associated with autism spectrum disorder susceptibility in populations of European ancestry [33]. Deletions in the LRRTM binding partner neurexin 1 (NRXN1) have also been linked to schizophrenia [34]. Moreover, LRR protein dysfunction may disrupt neuronal excitation/inhibition balance and contribute to neuropsychiatric disorders [35, 36]. LRRTM3 was also identified as a candidate gene for late-onset Alzheimer's disease [37]. Toll-like receptors, transmembrane LRR proteins that bind a wide molecular variety of pathogen-associated ligands and are involved in immune responses and have been implicated in neurodegenerative diseases such as multiple sclerosis, stroke, and Alzheimer's disease [38, 39]. Moreover, LRR proteins are generally believed to play critical roles in the development of neural circuits [35, 36]; e.g., LRRTMs and neuroligins bind neurexins differentially to cooperate in glutamate synapse development [40]. Although neuroligins and neurexins mediate connections between pre/post-synapses [41], LRR proteins co-function with neuroligins and neurexins [36]. However, there are multiple reports that suggest vinclozolin mediates nervous system disorders. For example, the exposure of rats to vinclozolin increased risk for autism [7]. Furthermore, perinatal exposure to endocrine disruptors was generally associated with autism spectrum disorder [42] and exposure to vinclozolin significantly increased vulnerability to anxiety [43]. Thus, this association was suggested to be heritable [44]. Therefore, it is not surprising that LRR proteins cause vinclozolin mediated neuropsychiatric disorders including autism spectrum disorder.

Conclusions

This study re-analyzed the gene expression/promoter methylation profiles of primordial germ cells between E13 and E16 rat F3 generation vinclozolin lineage [6]. In contrast to analyses performed previously [6], we successfully identified various genes associated with aberrant promoter methylation/gene expression using treated and control samples. Identified genes were related to previously reported diseases in F3 generation vinclozolin lineage. We focused on two categories, chemokine signaling pathway molecules and LRR proteins, that might be disease causing factors. The success of the study methodology suggests the possibility that abnormalities in F3 generation vinclozolin lineage are mediated by heritable aberrant promoter methylation during development between generations.

References

Heard E, Martienssen RA: Transgenerational epigenetic inheritance: myths and mechanisms. Cell. 2014, 157 (1): 95-109.
Article CAS PubMed PubMed Central Google Scholar
Guerrero-Bosagna C, Savenkova M, Haque MM, Nilsson E, Skinner MK: Environmentally induced epigenetic transgenerational inheritance of altered Sertoli cell transcriptome and epigenome: molecular etiology of male infertility. PLoS ONE. 2013, 8 (3): 59922-
Article CAS Google Scholar
Skinner MK, Anway MD, Savenkova MI, Gore AC, Crews D: Transgenerational epigenetic programming of the brain transcriptome and anxiety behavior. PLoS ONE. 2008, 3 (11): 3745-
Article CAS Google Scholar
Skinner MK, Savenkova MI, Zhang B, Gore AC, Crews D: Gene bionetworks involved in the epigenetic transgenerational inheritance of altered mate preference: environmental epigenetics and evolutionary biology. BMC Genomics. 2014, 15: 377-
Article PubMed PubMed Central Google Scholar
Anway MD, Leathers C, Skinner MK: Endocrine disruptor vinclozolin induced epigenetic transgenerational adult-onset disease. Endocrinology. 2006, 147 (12): 5515-5523.
Article CAS PubMed Google Scholar
Skinner MK, Guerrero-Bosagna C, Haque M, Nilsson E, Bhandari R, McCarrey JR: Environmentally induced transgenerational epigenetic reprogramming of primordial germ cells and the subsequent germ line. PLoS ONE. 2013, 8 (7): 66318-
Article CAS Google Scholar
Crews D, Gillette R, Scarpino SV, Manikkam M, Savenkova MI, Skinner MK: Epigenetic transgenerational inheritance of altered stress responses. Proc Natl Acad Sci USA. 2012, 109 (23): 9143-9148.
Article CAS PubMed PubMed Central Google Scholar
Murakami Y, Toyoda H, Tanahashi T, Tanaka J, Kumada T, Yoshioka Y, Kosaka N, Ochiya T, Taguchi YH: Comprehensive miRNA expression analysis in peripheral blood can diagnose liver disease. PLoS ONE. 2012, 7 (10): 48366-
Article CAS Google Scholar
Murakami Y, Tanahashi T, Okada R, Toyoda H, Kumada T, Enomoto M, Tamori A, Kawada N, Taguchi YH, Azuma T: Comparison of Hepatocellular Carcinoma miRNA Expression Profiling as Evaluated by Next Generation Sequencing and Microarray. PLoS ONE. 2014, 9 (9): 106314-
Article CAS Google Scholar
Taguchi YH, Murakami Y: Principal component analysis based feature extraction approach to identify circulating microRNA biomarkers. PLoS ONE. 2013, 8 (6): 66714-
Article CAS Google Scholar
Taguchi YH, Murakami Y: Universal disease biomarker: can a fixed set of blood microRNAs diagnose multiple diseases?. BMC Res Notes. 2014, 7: 581-
Article CAS PubMed PubMed Central Google Scholar
Taguchi Yh, Okamoto A: Principal component analysis for bacterial proteomic analysis. Pattern Recognition in Bioinformatics LNCS. Edited by: Shibuya, T., Kashima, H., Sese, J., Ahmad, S. 2012, Springer, Heidelberg, 7632: 141-152.
Chapter Google Scholar
Taguchi YH, Iwadate M, Umeyama H: Principal component analysis-based unsupervised feature extraction applied to in silico drug discovery for posttraumatic stress disorder-mediated heart disease. BMC Bioinformatics. 2015, 16 (1): 139-
Article PubMed PubMed Central Google Scholar
Ishida S, Umeyama H, Iwadate M, Taguchi YH: Bioinformatic Screening of Autoimmune Disease Genes and Protein Structure Prediction with FAMS for Drug Discovery. Protein Pept Lett. 2014, 21 (8): 828-39.
Article CAS PubMed PubMed Central Google Scholar
Kinoshita R, Iwadate M, Umeyama H, Taguchi YH: Genes associated with genotype-specific DNA methylation in squamous cell carcinoma as candidate drug targets. BMC Syst Biol. 2014, 8 (Suppl 1): 4-
Article Google Scholar
Umeyama H, Iwadate M, Taguchi YH: TINAGL1 and B3GALNT1 are potential therapy target genes to suppress metastasis in non-small cell lung cancer. BMC Genomics. 2014, 15 (Suppl 9): 2-
Article Google Scholar
Taguchi Yh: Integrative analysis of gene expression and promoter methylation during reprogramming of a non-small-cell lung cancer cell line using principal component analysis-based unsupervised feature extraction. Intelligent Computing in Bioinformatics LNCS. Edited by: Huang, D.-S., Han, K., Gromiha, M. 2014, Springer, Heidelberg, 8590: 445-455.
Google Scholar
R Core Team: R A Language and Environment for Statistical Computing. 2014, R Foundation for Statistical Computing, Vienna, Austria, R Foundation for Statistical Computing, [http://www.R-project.org/]
Taguchi Yh, Iwadate M, Umeyama H, Murakami Y, Okamoto A: Heuristic principal component analysis-aased unsupervised feature extraction and its application to bioinformatics. Big Data Analytics in Bioinformatics and Healthcare. Edited by: Wang, B., Li, R., Perrizo, W. 2015, 138-162.
Google Scholar
Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK: limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research. 2015, 43: 10-1093007.
Article CAS Google Scholar
Tusher V, Tibshirani R, Chu C: Significance analysis of microarrays applied to ionizing radiation response. Proceedings of the National Academy of Sciences. 2001, 98: 5116-5121.
Article CAS Google Scholar
Schwender H: Siggenes: Multiple Testing Using SAM and Efron's Empirical Bayes Approaches. R package version 1.40.0. 2012
Google Scholar
Gentleman RC, Carey VJ, Bates DM, et al: Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology. 2004, 5: 80-
Article Google Scholar
Huang daW, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009, 4 (1): 44-57.
Article CAS Google Scholar
Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, Kuhn M, Bork P, Jensen LJ, von Mering C: STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015, 43 (Database): 447-452.
Article Google Scholar
Chen YA, Tripathi LP, Mizuguchi K: TargetMine, an integrated data warehouse for candidate gene prioritisation and target discovery. PLoS ONE. 2011, 6 (3): 17844-
Article CAS Google Scholar
Reimand J, Arak T, Vilo J: g:Profiler-a web server for functional interpretation of gene lists (2011 update). Nucleic Acids Res. 2011, 39 (Web Server): 307-315.
Article CAS Google Scholar
Cowin PA, Foster P, Pedersen J, Hedwards S, McPherson SJ, Risbridger GP: Early-onset endocrine disruptor-induced prostatitis in the rat. Environ Health Perspect. 2008, 116 (7): 923-929.
Article CAS PubMed PubMed Central Google Scholar
Moreno JA, Moreno S, Rubio-Navarro A, Sastre C, Blanco-Colio LM, Gomez-Guerrero C, Ortiz A, Egido J: Targeting chemokines in proteinuria-induced renal disease. Expert Opin Ther Targets. 2012, 16 (8): 833-845.
Article CAS PubMed Google Scholar
Singh RK, Sudhakar A, Lokeshwar BL: Role of Chemokines and Chemokine Receptors in Prostate Cancer Development and Progression. J Cancer Sci Ther. 2010, 2 (4): 89-94.
Article CAS PubMed PubMed Central Google Scholar
Guazzone VA, Jacobo P, Theas MS, Lustig L: Cytokines and chemokines in testicular inflammation: A brief review. Microsc Res Tech. 2009, 72 (8): 620-628.
Article CAS PubMed Google Scholar
Guerrero-Bosagna C, Covert TR, Haque MM, Settles M, Nilsson EE, Anway MD, Skinner MK: Epigenetic transgenerational inheritance of vinclozolin induced mouse adult onset disease and associated sperm epigenome biomarkers. Reprod Toxicol. 2012, 34 (4): 694-707.
Article CAS PubMed PubMed Central Google Scholar
Sousa I, Clark TG, Holt R, Pagnamenta AT, Mulder EJ, Minderaa RB, Bailey AJ, Battaglia A, Klauck SM, Poustka F, Monaco AP: Polymorphisms in leucine-rich repeat genes are associated with autism spectrum disorder susceptibility in populations of European ancestry. Mol Autism. 2010, 1 (1): 7-
Article CAS PubMed PubMed Central Google Scholar
Todarello G, Feng N, Kolachana BS, Li C, Vakkalanka R, Bertolino A, Weinberger DR, Straub RE: Incomplete penetrance of NRXN1 deletions in families with schizophrenia. Schizophr Res. 2014, 155 (1-3): 1-7.
Article PubMed Google Scholar
de Wit J, Ghosh A: Control of neural circuit formation by leucine-rich repeat proteins. Trends Neurosci. 2014, 37 (10): 539-550.
Article CAS PubMed PubMed Central Google Scholar
de Wit J, Hong W, Luo L, Ghosh A: Role of leucine-rich repeat proteins in the development and function of neural circuits. Annu Rev Cell Dev Biol. 2011, 27: 697-729.
Article CAS PubMed Google Scholar
Majercak J, Ray WJ, Espeseth A, Simon A, Shi XP, Wolffe C, Getty K, Marine S, Stec E, Ferrer M, Strulovici B, Bartz S, Gates A, Xu M, Huang Q, Ma L, Shughrue P, Burchard J, Colussi D, Pietrak B, Kahana J, Beher D, Rosahl T, Shearman M, Hazuda D, Sachs AB, Koblan KS, Seabrook GR, Stone DJ: LRRTM3 promotes processing of amyloid-precursor protein by BACE1 and is a positional candidate gene for late-onset Alzheimer's disease. Proc Natl Acad Sci USA. 2006, 103 (47): 17967-17972.
Article CAS PubMed PubMed Central Google Scholar
Okun E, Griffioen KJ, Lathia JD, Tang SC, Mattson MP, Arumugam TV: Toll-like receptors in neurodegeneration. Brain Res Rev. 2009, 59 (2): 278-292.
Article CAS PubMed PubMed Central Google Scholar
Kielian T: Overview of toll-like receptors in the CNS. Curr Top Microbiol Immunol. 2009, 1-14. 336
Siddiqui TJ, Pancaroglu R, Kang Y, Rooyakkers A, Craig AM: LRRTMs and neuroligins bind neurexins with a differential code to cooperate in glutamate synapse development. J Neurosci. 2010, 30 (22): 7495-7506.
Article CAS PubMed PubMed Central Google Scholar
Sudhof TC: Neuroligins and neurexins link synaptic function to cognitive disease. Nature. 2008, 455 (7215): 903-911.
Article CAS PubMed PubMed Central Google Scholar
de Cock M, Maas YG, van de Bor M: Does perinatal exposure to endocrine disruptors induce autism spectrum and attention deficit hyperactivity disorders?. Review Acta Paediatr. 2012, 101 (8): 811-818.
Article CAS PubMed Google Scholar
Gillette R, Miller-Crews I, Nilsson EE, Skinner MK, Gore AC, Crews D: Sexually dimorphic effects of ancestral exposure to vinclozolin on stress reactivity in rats. Endocrinology. 2014, 155 (10): 3853-3866.
Article CAS PubMed PubMed Central Google Scholar
Bale TL: Lifetime stress experience: transgenerational epigenetics and germ cell programming. Dialogues Clin Neurosci. 2014, 16 (3): 297-305.
PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This study was supported by KAKENHI 23300357 and 26120528 and a Chuo University Joint Research Grant.

Declarations

Publication charges for this article have been funded by Chuo University Joint Research Grant.

This article has been published as part of BMC Bioinformatics Volume 16 Supplement 18, 2015: Joint 26th Genome Informatics Workshop and 14th International Conference on Bioinformatics: Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/16/S18.

Author information

Authors and Affiliations

Department of Physics, Chuo University, Kasuga, Bunkyo-ku, 112-8551, Tokyo, Japan
Y-h Taguchi

Authors

Y-h Taguchi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Y-h Taguchi.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

YHT planned and performed all analyses and wrote the paper.

Electronic supplementary material

Additional file 1: Supplementary discussions. (PDF 74 KB)

Additional file 2: List of genes selected by PCA-based unsupervised FE, limma-based FE, and SAM-based FE. (PDF 43 KB)

Additional file 3: Boxplots for the 48 genes identified by PCA-based unsupervised FE when N' = 1000. (XLSX 22 KB)

12859_2015_7207_MOESM4_ESM.pdf

Additional file 4: Dependence of P values upon N' when genes are selected by the t test, limma and SAM instead of PCA-based unsupervised FE. (PDF 15 KB)

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article

Taguchi, Yh. Identification of aberrant gene expression associated with aberrant promoter methylation in primordial germ cells between E13 and E16 rat F3 generation vinclozolin lineage. BMC Bioinformatics 16 (Suppl 18), S16 (2015). https://doi.org/10.1186/1471-2105-16-S18-S16

Download citation

Published: 09 December 2015
DOI: https://doi.org/10.1186/1471-2105-16-S18-S16

Joint 26th Genome Informatics Workshop and 14th International Conference on Bioinformatics: Bioinformatics

Identification of aberrant gene expression associated with aberrant promoter methylation in primordial germ cells between E13 and E16 rat F3 generation vinclozolin lineage

Abstract

Background

Methods

Results

Conclusions

Background

Previous usage of PCA-based unsupervised FE

Methods

Gene expression and promoter methylation profiles

Principal component analysis-based unsupervised feature extraction

ttest-based FE

limma-based FE

SAM-based FE

Protein-protein interaction enrichment analysis

Gene ID identification for literature searches

Results and Discussion

Gene selection using PCA-based unsupervised FE

Comparisons with other supervised FEs

Biological significance of selected genes: literature searches

Two groups of selected genes: chemokine signaling pathway genes and LRR proteins

Chemokine signaling pathway

LRR proteins

Conclusions

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Competing interests

Authors' contributions

Electronic supplementary material

Additional file 1: Supplementary discussions. (PDF 74 KB)

Additional file 2: List of genes selected by PCA-based unsupervised FE, limma-based FE, and SAM-based FE. (PDF 43 KB)

Additional file 3: Boxplots for the 48 genes identified by PCA-based unsupervised FE when N' = 1000. (XLSX 22 KB)

12859_2015_7207_MOESM4_ESM.pdf

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Bioinformatics

Contact us