Cancer diagnosis marker extraction for soft tissue sarcomas based on gene expression profiling data by using projective adaptive resonance theory (PART) filtering method

Takahashi, Hiro; Nemoto, Takeshi; Yoshida, Teruhiko; Honda, Hiroyuki; Hasegawa, Tadashi

doi:10.1186/1471-2105-7-399

Research article
Open access
Published: 04 September 2006


BMC Bioinformatics volume 7, Article number: 399 (2006)


Abstract

Background

Recent advances in genome technologies have provided an excellent opportunity to characterize neoplastic tissues comprehensively, which could improve diagnosis and the selection of treatment. To accomplish this objective, it is important to establish sophisticated algorithms that can handle large quantities of data, such as the gene expression profiles obtained by DNA microarray analysis.

Results

Previously, we developed the projective adaptive resonance theory (PART) filtering method as a gene filtering method. This is one of the clustering methods that can select specific genes for each subtype. In this study, we applied the PART filtering method to analyze microarray data that were obtained from soft tissue sarcoma (STS) patients for the extraction of subtype-specific genes. The performance of the filtering method was evaluated by comparison with other widely used methods, such as signal-to-noise, significance analysis of microarrays, and nearest shrunken centroids. In addition, various combinations of filtering and modeling methods were used to extract essential subtype-specific genes. The combination of the PART filtering method and boosting – the PART-BFCS method – showed the highest accuracy. Seven genes among the 15 genes that are frequently selected by this method – MIF, CYFIP2, HSPCB, TIMP3, LDHA, ABR, and RGS3 – are known prognostic marker genes for other tumors. These genes are candidate marker genes for the diagnosis of STS. Correlation analysis was performed to extract marker genes that were not selected by PART-BFCS. Sixteen genes among those extracted are also known prognostic marker genes for other tumors, and they could be candidate marker genes for the diagnosis of STS.

Conclusion

We proposed a two-step procedure consisting of PART-BFCS followed by correlation analysis. The results suggest that novel diagnostic and therapeutic targets for STS can be extracted by a procedure that includes the PART filtering method.

Background

Soft tissue sarcomas (STS) are a highly heterogeneous group of tumors that exhibit a diverse spectrum of mesenchymal differentiation. However, the molecular dissection of tumor heterogeneity has been hampered by the relatively low incidence of these tumors; approximately 3,800 cases are reported annually in Japan. Five-year survival rates differ markedly among the STS subtypes, e.g., 100% for well-differentiated liposarcoma (WLS), 71% for synovial sarcoma (SS), 46% for pleomorphic malignant fibrous histiocytoma (MFH), and 92% for myxofibrosarcoma (MFS). The primary objective of this study was to identify a set of marker genes that facilitates accurate differential diagnosis of the sarcoma subtypes. Discrimination between MFH and MFS, for example, is particularly difficult because the two overlap histologically. Information on such subtype-specific genes may also help in understanding the molecular pathways that are activated in each subtype, since the subtypes differ in biological malignancy.

Recent advances in DNA microarray analysis have enabled the simultaneous evaluation of the expression levels of several tens of thousands of genes, thereby offering a rich source of information that is potentially useful in the diagnosis and prognosis of diseases [1]. There are two main methods of expression data analysis: unsupervised learning methods and supervised learning methods. Unsupervised learning methods, e.g., hierarchical clustering [2] and fuzzy adaptive resonance theory (Fuzzy ART) [3], are designed to identify previously unrecognized classes of disease based on their expression patterns; the biological significance of such disease subtypes, such as prognosis, is then assessed. In contrast, supervised learning methods use training sets to specify the genes that should be clustered together [4]. For either unsupervised or supervised analysis, however, it is necessary to select genes that are strongly correlated with the target phenotype, such as diagnosis or prognosis, because classification performance can decline when a large number of genes are incorporated into the model as predictor variables.

Gene selection is performed to screen candidate genes for modeling. There are two types of approach: the wrapper approach and the filtering approach. In the former, genes are selected as part of a mining algorithm, such as k-nearest neighbor (kNN), multiple regression analysis (MRA), weighted voting (WV) [5], support vector machines (SVM) [6], the fuzzy neural network (FNN) combined with the SWEEP operator (FNN-SWEEP) method [7], and the boosted fuzzy classifier with SWEEP operator (BFCS) method [8, 9]. In the latter, genes are selected by a filtering method before the mining algorithm is applied; examples include the Mann-Whitney U test, Student's t-test (Sttest), Welch's t-test (Wttest), the signal-to-noise statistic (S2N) [5], significance analysis of microarrays (SAM) [10], nearest shrunken centroids (NSC) [11], and the projective adaptive resonance theory (PART) filtering method [12].

In a previous study, we developed the PART filtering method by modifying PART [13, 14] and reported that it outperformed conventional methods such as S2N and NSC [12]. The combination of PART and BFCS (PART-BFCS) was then developed and applied to gene expression data from lymphoma [15] and esophageal cancer [16]. In the present study, we applied various filtering methods to gene expression profile data for the STS subtypes and constructed SVM models using the filtered genes; the model based on the genes filtered by PART was the most accurate. In addition, various wrapper methods were applied to the genes filtered by the different filtering methods to extract essential genes for diagnosis. Among the combinations of filtering and wrapper methods, the PART-BFCS models showed the highest accuracy, and 28 distinct probes were extracted by this method. Seven of the 15 probes frequently selected by this method correspond to known prognostic marker genes for other tumors; these genes are candidate marker genes for STS. Correlation analysis was then performed for these 15 genes to extract subtype-specific genes that PART-BFCS had not selected. Sixteen of the genes thus extracted are also known prognostic marker genes for other tumors and could be candidate marker genes for STS.

Results and discussion

Clustering analysis for unfiltered data

Hierarchical clustering was applied to 35 patients and 12,241 unfiltered probes, as shown in Fig. 1. The patients were separated into three clusters: two MFH clusters and a single MFS cluster. However, the clustering misclassified seven MFS patients into the MFH clusters and three MFH patients into the MFS cluster. On the basis of these results, various filtering and wrapper methods were applied to separate these patients more accurately.

Construction of SVM models by using filtered genes

To eliminate genes that do not help discriminate between MFH and MFS, various filtering methods (the U test, Sttest, Wttest, S2N, SAM, NSC, and PART) were applied to the modeling data set comprising 26 patients and 12,241 probes, and their performance was evaluated by the prediction accuracy on the blind data. The top 1,000 genes selected by each filtering method were used to construct SVM models. The blind accuracies of the models for each method are shown in Table 1. The SVM model using genes filtered by PART had the highest accuracy in this study, 88.8%, followed by the models using S2N or SAM (77.7%). In contrast, the SVM model without filtering had the lowest accuracy, 55.6%, the same as the average accuracy of models built on randomly selected genes. These results suggest that a filtering step is necessary when constructing diagnostic models and that, in this study, the PART filtering method gave the most accurate predictions.
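To make the filtering step concrete, the following is a minimal Python sketch of one of the compared filters, Welch's t-test, used to rank probes and keep the top k (k = 1,000 in this study). It is not PART, SAM, or NSC, and it is not the authors' code; the function names `welch_t` and `top_k_probes` are hypothetical, probes are assumed to be columns of a samples-by-probes matrix, and classes are assumed to be coded 0 (MFH) and 1 (MFS). Giving probes with zero variance in both classes t = 0 is an illustrative choice.

```python
import numpy as np


def welch_t(X, y):
    """Welch's t statistic for every probe (column) of X.

    X: (n_samples, n_probes) expression matrix.
    y: NumPy array of 0/1 class labels (assumed coding: 0 = MFH, 1 = MFS);
       each class needs at least two samples.
    """
    a, b = X[y == 0], X[y == 1]
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    # Unpooled standard error: each class keeps its own sample variance.
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    # Probes with no spread in either class get t = 0 instead of dividing by zero.
    se = np.where(se > 0, se, np.inf)
    return mean_diff / se


def top_k_probes(X, y, k=1000):
    """Column indices of the k probes with the largest |t|, best first."""
    t = welch_t(X, y)
    k = min(k, X.shape[1])
    return np.argsort(-np.abs(t), kind="stable")[:k]
```

Ranking by |t| keeps probes that are either up- or down-regulated in MFS relative to MFH; the selected columns would then be passed to the SVM.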

Table 1 Blind accuracies for the SVM models using different filtering methods


Application of various combinations of filtering and wrapper methods

To extract essential subtype-specific genes for differentiating between MFH and MFS, various wrapper methods (kNN, MRA, WV, SVM, FNN-SWEEP, and BFCS) were applied to the modeling data set comprising 26 patients and the 1,000 probes filtered by each filtering method, and their performance was evaluated by the prediction accuracy on the blind data. The genes selected by each wrapper method were used in the models, and the number of inputs was optimized by cross-validation on the modeling data set. The blind accuracies were calculated from ten combination models constructed by the parameter increasing method (PIM), as shown in Table 2. The average accuracy of PART-BFCS, 81.1%, was the highest. The ten 8-input combination models contained 80 probes in total; because some probes were selected several times, only 28 of them were distinct. The SAM-kNN and PART-SVM methods had the second and third highest average accuracies, 74.4% and 73.3%, respectively. These results imply that, of the methods tested, the combination of PART and BFCS is the most accurate for extracting essential subtype-specific genes for STS.

Table 2 Blind accuracies for various combinations of filtering and modeling methods


Clustering analysis using genes extracted by PART-BFCS

Hierarchical clustering was applied to 35 patients and 28 probes selected by PART-BFCS, as shown in Fig. 2. Figure 2 shows that patients were separated into two clusters – an MFH cluster and an MFS cluster. The results show that there was a single MFS patient in the MFH cluster and three MFH patients in the MFS cluster. These observations suggest that misclassification of samples was reduced using the genes that were extracted by the PART-BFCS method and that essential genes could be extracted for the diagnosis of STS subtypes.

Extraction of marker gene candidates by the correlation analysis

To extract marker gene candidates that PART-BFCS did not extract, correlation analysis was applied to the STS data. Of the 28 probes extracted by PART-BFCS, 15 were selected two or more times. For each of these 15 probes, the 10 probes most highly correlated with it were extracted as marker gene candidates, giving a total of 150 probes, as shown in Table 3. Because some probes were selected several times, these comprised 145 distinct probes, corresponding to 126 distinct genes. The performance of the 145 probes was confirmed by hierarchical clustering, as shown in Fig. 3: the patients were separated into an MFH cluster and an MFS cluster, with two MFS patients in the MFH cluster and four MFH patients in the MFS cluster. This result is almost the same as that in Fig. 2, indicating that correlation analysis extracted genes with comparably high discriminative performance.

Table 3 The genes selected by PART-BFCS and the genes having high correlation with them


Characteristics of the genes selected for the classification models and the genes highly correlated with them

Because the five-year survival rates of MFH and MFS differ markedly, we expected that prognostic marker genes would be extracted for discriminating between the two. We therefore checked whether previously reported prognostic marker genes were among the 15 probes (genes) that PART-BFCS selected frequently out of its 28 probes. We also examined the 145 probes, corresponding to 126 distinct genes, extracted by correlation analysis.

With regard to the genes selected directly by PART-BFCS, seven of the 15 genes have been reported as prognostic markers for other tumors. Sun et al. reported that MIF promotes tumor invasion and metastasis via the Rho-dependent pathway [17]. Ceballos et al. reported CYFIP2 (PIR121) to be one of the genes downregulated by p53 [18], a well-known tumor suppressor gene. Man et al. reported that HSPCB plays an important role in the assembly and disassembly of tubulin by inhibiting tubulin polymerization [19], and Urano et al. reported that tubulin is a simple and useful predictive marker of the clinical response to chemotherapy in gastric cancer [20]. Darnton et al. reported that reduced expression of TIMP3 in esophageal adenocarcinoma is associated with increased tumour invasiveness and reduced patient survival [21]. Chi et al. reported that LDHA is a hypoxia-inducible gene associated with considerably poorer overall survival [22]. Chuang et al. identified ABR as a regulator of the Rho GTP-binding protein family [23], and the Rho pathway is associated with tumor invasion and metastasis [17]. Tatenhorst et al. reported that RGS3 is associated with tumor metastasis [24]. These findings suggest that the genes extracted by the PART-BFCS method are candidate new marker genes for the STS subtypes.

With regard to the genes selected by correlation analysis, 16 of the 126 genes have been reported as prognostic markers for other tumors. Van den Boom et al. reported that the ADD3 protein (adducin-γ) belongs to a family of ubiquitously expressed membrane-skeletal proteins that are localized at spectrin-actin junctions [25]. In renal carcinomas, changes in adducin expression, phosphorylation state, and localization were found to be associated with increased malignancy. In addition, the down-regulation of adducin-γ expression is correlated with increased migratory activity of human glioma cells in vitro. Fischer et al. reported that the expression of COL11A1 in colorectal tumors could be associated with the APC/β-catenin pathway in familial adenomatous polyposis (FAP) and sporadic colorectal cancer [26]. Nuclear accumulation of the β-catenin protein is associated with activation of the Wnt/Wg signaling pathway, and Ellison et al. reported that β-catenin status predicts a favorable outcome in childhood medulloblastoma [27]. Dubrovska et al. reported that SMAD3 is a component of the transforming growth factor-beta (TGFβ) signaling pathway, and TGFβ is a potent regulator of the growth, apoptosis, and invasiveness of tumor cells such as breast cancer cells [28]; TGFβ1/SMAD3 suppresses BRCA1-dependent DNA repair in response to DNA-damaging agents. GAS7, a growth arrest-specific gene, is the partner gene of MLL in treatment-related acute myeloid leukemia, and Megonigal et al. reported that MLL gene translocations can be present early during anticancer treatment at low cumulative doses of DNA topoisomerase II inhibitors [29]. Barille et al. reported that CD130 (IL6ST) expression is associated with disease activity in multiple myeloma [30]. Gouyer et al. reported that MMP1 expression correlates significantly with the evolution of lymph node status and tumor-node-metastasis (TNM) stage [31]. Gu et al. reported that the expression of MMP9 and MMP13 is positively associated with poor tumor cell differentiation, vessel permeation, and lymph node metastasis [32], and Soni et al. reported that MMP11 (ST3) is associated with lymph node involvement and tumor progression [33]. Muller et al. reported TSSC3 to be one of the genes related to apoptosis [34]. Ciocca and Calderwood reported that HSPB2 (HSP27) is implicated in resistance to chemotherapy in breast cancer and also predicts a poor response to chemotherapy in leukemia patients [35]. Yamagishi et al. reported that HSP105B is an alternatively spliced form of HSP105A [36]; HSP105A prevents stress-induced apoptosis in neuronal PC12 cells and is a novel anti-apoptotic neuroprotective factor in the mammalian brain. Melero et al. reported that an anti-ICAM2 monoclonal antibody induces immune-mediated regressions of ICAM2-negative colon carcinomas [37]. Lee et al. reported that HSPD1 is downregulated during early apoptosis of hepatoma cells [38]. Kirikoshi and Katoh described WNT10B as a member of the WNT family of signaling molecules, which are potential targets for the diagnosis of cancer (susceptibility, metastasis, and prognosis) as well as for its prevention and treatment [39]. Dales et al. reported that TEK is correlated with a higher risk of metastases in node-negative patients [40]. Thus, correlation analysis extracted subtype-specific candidate genes that PART-BFCS had not selected. These findings suggest that genes highly correlated with those extracted by the PART-BFCS method could also be new marker genes for the STS subtypes, and they lend further support to the candidate marker genes selected directly by PART-BFCS.

Conclusion

In this study, we applied the PART filtering method to STS gene expression profiling data to construct subtype predictors for diagnosis. The genes selected by PART gave higher prediction accuracy for STS than those selected by the other methods assessed. The genes selected by PART-BFCS, such as MIF, CYFIP2, HSPCB, TIMP3, LDHA, ABR, and RGS3, are candidate targets for molecular diagnosis and treatment. In addition, correlation analysis extracted new candidate marker genes that PART-BFCS did not select directly. We believe that this procedure, built on the PART filtering method, merits consideration as a candidate analytical approach for class prediction problems in clinical and basic oncology that use transcriptome data.

Methods

Microarray analysis

The gene expression profile data were obtained from 35 surgical specimens of STS: 20 pleomorphic malignant fibrous histiocytomas (MFH) and 15 myxofibrosarcomas (MFS). For RNA extraction, trained pathologists carefully excised tissue samples from the main tumor, leaving a margin clear of the surrounding non-tumorous tissue. Microscopically, the samples may still contain some non-tumor cells, such as infiltrating lymphocytes, tissue macrophages, and vascular and lymphatic endothelial cells. However, unlike in carcinomas, non-tumor stroma is difficult to eliminate in soft tissue sarcomas; therefore, laser microdissection was not performed in this study. Total RNA extracted from the bulk tissue samples was biotin-labeled and hybridized, according to the manufacturer's instructions, to high-density oligonucleotide microarrays (Affymetrix Human Genome U133A 2.0 Array) comprising 22,283 probe sets representing 18,400 transcripts. The scanned array data were processed with Affymetrix Microarray Suite v.5.1, which scaled the average intensity of all genes on each array to a target signal of 1,000.

Data processing

In this experiment, the data set was randomly partitioned into two groups – 26 samples (15 MFH and 11 MFS) as a modeling data set for constructing the subtype prediction model (predictor) and nine samples (5 MFH and 4 MFS) as a blind data set for evaluating the constructed predictor. Validations were performed by comparing the accuracies in the blind data set, instead of cross-validation accuracies, as reported by Bhasin and Raghava [41]. In the present study, cross-validation was used to optimize various parameters of the models for the modeling data.
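A random partition that reproduces the class counts above (15 MFH and 11 MFS for modeling, 5 MFH and 4 MFS for the blind set) might look like the following Python sketch. The text says only that the partition was random; drawing a fixed number of blind samples per class is our assumption, and `stratified_split` and its seed are hypothetical.

```python
import random


def stratified_split(labels, n_blind_per_class, seed=0):
    """Randomly hold out a fixed number of samples per class as the blind set.

    labels: list of class labels, one per sample (e.g. "MFH" / "MFS").
    n_blind_per_class: dict such as {"MFH": 5, "MFS": 4}.
    Returns (modeling_indices, blind_indices), both sorted.
    """
    rng = random.Random(seed)
    blind = []
    for cls, n in n_blind_per_class.items():
        members = [i for i, lab in enumerate(labels) if lab == cls]
        blind.extend(rng.sample(members, n))  # sampling without replacement
    blind_set = set(blind)
    modeling = [i for i in range(len(labels)) if i not in blind_set]
    return modeling, sorted(blind)


labels = ["MFH"] * 20 + ["MFS"] * 15
modeling_idx, blind_idx = stratified_split(labels, {"MFH": 5, "MFS": 4})
```

Only the modeling indices would be used for filtering, wrapper selection, and cross-validation; the blind indices are touched once, for the final accuracy.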

Before the various combinations of filtering and modeling methods were applied, probes with a signal intensity of less than 1,000 in the 35 specimens were excluded as a preprocessing step. Empirically, when expression values are scaled to a target signal of 1,000, genes with a signal intensity below 1,000 are difficult to reproduce by RT-PCR. Accordingly, 12,241 probes were retained. In the gene filtering step, 1,000 probes were selected by each filtering method. For each filtering method, SVM models were constructed to differentiate between MFH and MFS by using the filtered genes. In addition, various wrapper methods were used to extract essential genes for diagnosis; these are described in the following sections.
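The global scaling and intensity filter described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Affymetrix implementation: each array is scaled so that its plain mean signal equals 1,000 (Microarray Suite is generally described as scaling a trimmed mean), and a probe is kept if its signal reaches 1,000 in at least one specimen, which is one reading of "less than 1,000 in the 35 specimens". The function names are hypothetical.

```python
import numpy as np

TARGET = 1000.0  # target signal used in the text


def scale_to_target(raw, target=TARGET):
    """Scale each array (row) so that its mean signal equals `target`.

    Simplification: a plain mean is used here instead of a trimmed mean.
    """
    raw = np.asarray(raw, dtype=float)
    return raw * (target / raw.mean(axis=1, keepdims=True))


def intensity_filter(scaled, threshold=TARGET):
    """Indices of probes (columns) whose signal reaches `threshold` in any sample.

    Assumption: a probe is excluded only if it is below the threshold in every specimen.
    """
    keep = (np.asarray(scaled) >= threshold).any(axis=0)
    return np.flatnonzero(keep)
```

Under this reading a probe survives if it is clearly expressed in at least one tumor; a stricter rule (for example, requiring the threshold in every specimen) would retain far fewer probes.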

With regard to the wrapper methods, the parameter increasing method (PIM) [42] was used to select input combinations for model construction in the modeling methods. To validate the performance of the models, 10 independent combination models were constructed. The accuracy of the subtype prediction of the blind data was also calculated as the average of 10 combination predictors.

Model construction with parameter selection

The PIM was used to select input combinations for the construction of kNN, MRA, WV, SVM, and FNN-SWEEP models. This was conducted as follows:

First, we predicted the subtype of each sample by using a prediction model with a single input. A prediction model was constructed for each probe in turn, and all probes were ranked by the accuracy of their models. The probe with the highest accuracy was then used to construct a combination model.

Second, we selected a partner probe for the probe chosen in the first step in order to increase the prediction accuracy. To accomplish this, we constructed a 2-input model in which the ranked probe was designated as input 1, and input 2 (the partner probe) was selected to give the highest training accuracy, applying PIM with the chosen modeling method (FNN-SWEEP, kNN, MRA, WV, or SVM) to the modeling data. By repeating this step, a combination of N_opt candidate probes (with N_opt optimized by leave-one-out cross-validation on the modeling data) was identified for use as input probes in model construction.

Finally, an N_opt-input model was constructed. The probes with the 1st to 10th highest single-input accuracies were used as the first inputs for constructing the 10 combination models by PIM. The performance of the prediction models was evaluated by applying them to the blind data set.
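The three PIM steps amount to a greedy forward selection seeded by the best single-input probes. The Python sketch below illustrates that structure only. The scorer is a hypothetical stand-in (training accuracy of a nearest-centroid rule) rather than kNN, MRA, WV, SVM, or FNN-SWEEP; N_opt is passed in as `n_inputs` instead of being tuned by leave-one-out cross-validation; and ties are broken by the lowest probe index. All names are illustrative.

```python
import numpy as np


def nearest_centroid_train_accuracy(X, y):
    """Hypothetical stand-in scorer: training accuracy of a nearest-centroid rule.

    X: (n_samples, n_selected_probes); y: NumPy array of 0/1 labels.
    """
    c0 = X[y == 0].mean(axis=0)
    c1 = X[y == 1].mean(axis=0)
    d0 = ((X - c0) ** 2).sum(axis=1)
    d1 = ((X - c1) ** 2).sum(axis=1)
    pred = (d1 < d0).astype(int)  # ties go to class 0
    return float((pred == y).mean())


def rank_single_probes(X, y, score=nearest_centroid_train_accuracy):
    """Step 1: order probes by the accuracy of their single-input models."""
    scores = [score(X[:, [j]], y) for j in range(X.shape[1])]
    # sorted() is stable, so equally scored probes keep their index order
    return sorted(range(X.shape[1]), key=lambda j: -scores[j])


def pim(X, y, first_probe, n_inputs, score=nearest_centroid_train_accuracy):
    """Step 2 (repeated): greedily add the partner probe that maximizes `score`."""
    selected = [first_probe]
    n_inputs = min(n_inputs, X.shape[1])
    while len(selected) < n_inputs:
        best_probe, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            s = score(X[:, selected + [j]], y)
            if s > best_score:  # strict '>' keeps the lowest index on ties
                best_probe, best_score = j, s
        selected.append(best_probe)
    return selected


def combination_models(X, y, n_inputs, n_models=10, score=nearest_centroid_train_accuracy):
    """Step 3: seed one PIM run with each of the n_models best single-input probes."""
    ranking = rank_single_probes(X, y, score)
    return [pim(X, y, first, n_inputs, score) for first in ranking[:n_models]]
```

Because each run starts from a different first probe, the ten resulting input sets can overlap, which is consistent with the 80 selected probes reducing to 28 distinct ones in the Results.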

Fuzzy neural network (FNN) combined with the SWEEP operator method (FNN-SWEEP)

The FNN-SWEEP method was also applied for model construction. It was originally proposed by Noguchi et al. [43] and modified by Ando et al. [7] to handle microarray data. The FNN has three types of weight parameters (w_c, w_g, and w_f) [44]. In the FNN-SWEEP method, only the parameter w_f was optimized by the SWEEP operator method at the gene selection step. After the input combinations were determined, FNN models with the selected input combinations were optimized by a backpropagation algorithm at the model construction step. For backpropagation, the number of epochs was set to 5,000 and the learning rate to 0.1, the same values as those reported by Ando et al. [7].

Support vector machine (SVM)

The SVM was originally proposed by Vapnik and Chervonenkis [45] and is designed to cope with the "curse of dimensionality." SVMs often outperform conventional methods and are frequently used in bioinformatics. In the present study, the SVM-LIGHT software package [46] was used; it was modified to add a PIM function for selecting combinations of inputs. The regularization parameter C was set to the SVM-LIGHT default, the inverse of the average squared norm of the input vectors, [avg(x·x)]^-1. A linear kernel was used because various kernels gave similar cross-validation accuracies on the modeling data set.
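The SVM-LIGHT default for C quoted above is easy to compute directly. The following Python sketch shows the formula only; `svmlight_default_c` is a hypothetical helper, not part of SVM-LIGHT, and its rows are assumed to be the input vectors of the modeling samples.

```python
import numpy as np


def svmlight_default_c(X):
    """C = 1 / mean(x . x) over the input vectors (rows of X)."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.einsum("ij,ij->i", X, X)  # x . x for every sample
    return 1.0 / sq_norms.mean()
```

Because this default depends on the magnitude of the inputs, it would be very small if raw signals scaled to a target of 1,000 were fed in directly, so it is sensitive to any input normalization applied before training.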

Boosted fuzzy classifier with SWEEP operator (BFCS)

BFCS is a variant of the AdaBoost algorithm [47] and has been described previously [8]. Briefly, multiple single-input predictors were first constructed by the FNN-SWEEP method. BFCS then calculated appropriate weights for these weak predictors and assembled the weighted weak predictors efficiently. As a result, the integrated predictor classifies as many samples correctly as possible by minimizing and smoothing out the probability of error for each individual sample.
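The boosting idea behind BFCS can be illustrated with standard discrete AdaBoost over single-input weak classifiers. In the Python sketch below, decision stumps (one probe, one threshold) are swapped in for the FNN-SWEEP single-input predictors, so this is generic AdaBoost, not BFCS itself; labels are assumed to be coded +1/-1, and all names are hypothetical.

```python
import numpy as np


def fit_stump(x, y, w):
    """Best weighted rule on one probe: predict s if x > t, else -s."""
    best = (np.inf, 0.0, 1)  # (weighted error, threshold, sign)
    for t in np.unique(x):
        for s in (1, -1):
            pred = np.where(x > t, s, -s)
            err = w[pred != y].sum()
            if err < best[0]:
                best = (err, t, s)
    return best


def adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost over single-input stumps; y is a NumPy array of +1/-1."""
    n, p = X.shape
    w = np.full(n, 1.0 / n)  # start with uniform sample weights
    model = []
    for _ in range(n_rounds):
        # pick the probe whose stump has the smallest weighted error
        cand = [(fit_stump(X[:, j], y, w), j) for j in range(p)]
        (err, t, s), j = min(cand, key=lambda c: c[0][0])
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid log(0) for perfect stumps
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this weak predictor
        pred = np.where(X[:, j] > t, s, -s)
        w = w * np.exp(-alpha * y * pred)  # up-weight misclassified samples
        w /= w.sum()
        model.append((alpha, j, t, s))
    return model


def predict(model, X):
    """Sign of the alpha-weighted sum of the weak votes."""
    score = np.zeros(X.shape[0])
    for alpha, j, t, s in model:
        score += alpha * np.where(X[:, j] > t, s, -s)
    return np.where(score >= 0, 1, -1)
```

Each round shifts weight toward the samples that the current ensemble gets wrong, which is the "smoothing out" of per-sample error that the paragraph describes.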

k-nearest neighbor (kNN)

kNN methods are based on a distance function, such as the Euclidean distance, for pairs of tumor samples. The kNN proceeds as follows to classify blind data set observations on the basis of the modeling data set. For each patient in the blind data set, (a) it finds the k closest patients in the modeling data set and (b) it predicts the class by majority vote; that is, it chooses the class that is most common among those k neighbors. The number of neighbors k was chosen as three because a similar cross-validation accuracy of the model was obtained in the modeling data set for various values of k.
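A minimal Python sketch of the kNN rule described above, with Euclidean distance and a majority vote among k = 3 neighbors. `knn_predict` is a hypothetical helper; distance ties are broken by the order of the modeling samples.

```python
from collections import Counter

import numpy as np


def knn_predict(X_model, y_model, X_blind, k=3):
    """Predict a class for each blind sample by majority vote of its k nearest neighbors.

    X_model: (n_model, n_probes) array; y_model: list of class labels;
    X_blind: (n_blind, n_probes) array.
    """
    preds = []
    for x in X_blind:
        dist = np.sqrt(((X_model - x) ** 2).sum(axis=1))  # Euclidean distances
        nearest = np.argsort(dist, kind="stable")[:k]
        votes = Counter(y_model[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds
```

With two classes and an odd k such as 3, the majority vote cannot tie.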

Multiple regression analysis (MRA)

MRA is a conventional statistical method that can be used to describe and evaluate the relationship between tumor subtype and gene expression. Here, MRA models were used to predict the cancer subtype from gene expression data.
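One common way to use MRA as a two-class predictor, sketched below in Python, is to regress a 0/1 subtype code on the selected expression values by least squares and threshold the fitted value at 0.5. The text specifies neither the coding nor the threshold, so both are assumptions, and the helper names are hypothetical.

```python
import numpy as np


def fit_mra(X, y01):
    """Least-squares coefficients [intercept, b_1, ..., b_p] for a 0/1 subtype code."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, np.asarray(y01, dtype=float), rcond=None)
    return coef


def predict_mra(coef, X, threshold=0.5):
    """Assign class 1 when the fitted value reaches the threshold."""
    A = np.column_stack([np.ones(len(X)), X])
    return (A @ coef >= threshold).astype(int)
```
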

Weighted voting (WV)

The WV method was originally proposed by Golub et al. [5] for microarray data. The weight of each gene was calculated as its signal-to-noise ratio, and the single-gene linear models were combined using these weights.
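Following the weighted-voting scheme of Golub et al. [5], each gene casts a vote weight × (expression − midpoint), where the weight is the signal-to-noise ratio (μ_A − μ_B)/(σ_A + σ_B) and the midpoint is (μ_A + μ_B)/2; the sign of the summed votes gives the class. The Python sketch below is our illustration of that scheme, not the code used in this study. It uses sample standard deviations (ddof=1), assumes that no gene has zero spread in both classes, codes class A as 1, and the helper names are hypothetical.

```python
import numpy as np


def fit_wv(X, y):
    """Per-gene weights and midpoints; y is a NumPy array with 1 = class A, 0 = class B."""
    a, b = X[y == 1], X[y == 0]
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    sd_a, sd_b = a.std(axis=0, ddof=1), b.std(axis=0, ddof=1)
    weight = (mu_a - mu_b) / (sd_a + sd_b)  # signal-to-noise ratio per gene
    midpoint = (mu_a + mu_b) / 2.0          # per-gene decision boundary
    return weight, midpoint


def predict_wv(weight, midpoint, X):
    """Sum the single-gene linear votes; a positive total means class A."""
    votes = weight * (X - midpoint)
    return (votes.sum(axis=1) > 0).astype(int)
```

Genes with a large signal-to-noise ratio dominate the vote, while genes whose class means coincide get a weight of zero.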

Hierarchical clustering analysis

Hierarchical clustering is widely used as an unsupervised learning method. It was applied to the STS subtype analysis with CLUSTER software [2], using either the 12,241 unfiltered probes or the 28 probes selected by PART-BFCS. In this study, hierarchical clustering was performed using centroid linkage.
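Centroid-linkage agglomeration can be sketched in a few lines of Python. The version below clusters samples (rows) only, represents each cluster by the mean profile of its members, and merges the pair of clusters whose centroids have the smallest distance 1 − Pearson r. Using that correlation distance is our assumption about the CLUSTER settings; the helper names are hypothetical, and constant profiles (for which correlation is undefined) are not handled.

```python
import numpy as np


def pearson_distance(u, v):
    """1 - Pearson correlation between two expression profiles."""
    return 1.0 - np.corrcoef(u, v)[0, 1]


def centroid_linkage(X, n_clusters=2):
    """Merge samples (rows of X) by centroid linkage until n_clusters remain."""
    clusters = [[i] for i in range(len(X))]
    centroids = [X[i].astype(float) for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = pearson_distance(centroids[a], centroids[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters[a] = merged
        centroids[a] = X[merged].mean(axis=0)  # centroid of all merged members
        del clusters[b], centroids[b]  # b > a, so index a stays valid
    return [sorted(c) for c in clusters]
```

Stopping at two clusters mirrors the MFH/MFS split examined in the Results; the full dendrogram would instead record every merge.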

Correlation analysis

Correlation analysis was performed to extract the subtype-specific genes of the STS that were not selected by PART-BFCS. Correlation coefficients for the 15 genes that were selected two times or more by PART-BFCS were calculated by Pearson's correlation coefficient.
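The correlation step can be sketched in Python as below: for each seed probe (here, the probes selected two or more times by PART-BFCS), take the 10 other probes with the largest Pearson correlation, then collect the distinct probes. Ranking by signed rather than absolute correlation, and the helper name `top_correlated`, are our assumptions.

```python
import numpy as np


def top_correlated(X, seeds, top_n=10):
    """Partners of each seed probe by Pearson correlation.

    X: (n_samples, n_probes). Returns (dict seed -> partner indices, sorted distinct partners).
    """
    R = np.corrcoef(X, rowvar=False)  # probe-by-probe Pearson matrix
    partners = {}
    for s in seeds:
        r = R[s].copy()
        r[s] = -np.inf  # a probe is not its own partner
        partners[s] = [int(j) for j in np.argsort(-r, kind="stable")[:top_n]]
    distinct = sorted({p for ps in partners.values() for p in ps})
    return partners, distinct
```

With 15 seeds and top_n = 10 this yields 150 partner slots, and the distinct set is smaller whenever seeds share partners, as with the 145 distinct probes reported above.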

References

Kebriaei P, Anastasi J, Larson RA: Acute lymphoblastic leukaemia: diagnosis and classification. Best Pract Res Clin Haematol 2002, 15(4):597–621. 10.1053/beha.2002.0224
Article CAS PubMed Google Scholar
Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 1998, 95(25):14863–14868. 10.1073/pnas.95.25.14863
Article PubMed Central CAS PubMed Google Scholar
Tomida S, Hanai T, Honda H, Kobayashi T: Analysis of expression profile using fuzzy adaptive resonance theory. Bioinformatics 2002, 18(8):1073–1083. 10.1093/bioinformatics/18.8.1073
Article CAS PubMed Google Scholar
Brown MPS, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares MJ, Haussler D: Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci USA 2000, 97(1):262–267. 10.1073/pnas.97.1.262
Article PubMed Central CAS PubMed Google Scholar
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286(5439):531–537. 10.1126/science.286.5439.531
Article CAS PubMed Google Scholar
Guyon I, Weston J, Barnhill S, Vapnik V: Gene selection for cancer classification using support vector machines. Mach Learning 2002, 46: 389–422. 10.1023/A:1012487302797
Article Google Scholar
Ando T, Suguro M, Hanai T, Kobayashi T, Honda H, Seto M: Fuzzy neural network applied to gene expression profiling for predicting the prognosis of diffuse large B-cell lymphoma. Jpn J Cancer Res 2002, 93(11):1207–1212.
Article CAS PubMed Google Scholar
Takahashi H, Honda H: A new reliable cancer diagnosis method using boosted fuzzy classifier with a SWEEP operator method. J Chem Eng Jpn 2005, 38(9):763–773. 10.1252/jcej.38.763
Article CAS Google Scholar
Takahashi H, Honda H: Prediction of peptide binding to major histocompatibility complex class II molecules through use of boosted fuzzy classifier with SWEEP operator method. J Biosci Bioeng 2006, 101(2):137–141. 10.1263/jbb.101.137
Article CAS PubMed Google Scholar
Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 2001, 98(9):5116–5121. 10.1073/pnas.091062498
Article PubMed Central CAS PubMed Google Scholar
Tibshirani R, Hastie T, Narasimhan B, Chu G: Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci USA 2002, 99(10):6567–6572. 10.1073/pnas.082099299
Article PubMed Central CAS PubMed Google Scholar
Takahashi H, Kobayashi T, Honda H: Construction of robust prognostic predictors by using projective adaptive resonance theory as a gene filtering method. Bioinformatics 2005, 21(2):179–186. 10.1093/bioinformatics/bth473
Article CAS PubMed Google Scholar
Cao Y, Wu J: Projective ART for clustering data sets in high dimensional spaces. Neural Netw 2002, 15(1):105–120. 10.1016/S0893-6080(01)00108-3
Article PubMed Google Scholar
Cao Y, Wu J: Dynamics of projective adaptive resonance theory model: the foundation of PART algorithm. IEEE Trans Neural Netw 2004, 15(2):245–260. 10.1109/TNN.2004.824261
Article PubMed Google Scholar
Takahashi H, Honda H: Lymphoma prognostication from expression profiling using a combination method of boosting and projective adaptive resonance theory. J Chem Eng Jpn 2006, 39(7):767–771. 10.1252/jcej.39.767
Article CAS Google Scholar
Takahashi H, Aoyagi K, Nakanishi Y, Sasaki H, Yoshida T, Honda H: Classification of intramural metastases and lymph node metastases of esophageal cancer from gene expression based on boosting and projective adaptive resonance theory. JBiosci Bioeng 2006, 102(1):46–52. 10.1263/jbb.102.46
Article CAS Google Scholar
Sun B, Nishihira J, Yoshiki T, Kondo M, Sato Y, Sasaki F, Todo S: Macrophage migration inhibitory factor promotes tumor invasion and metastasis via the Rho-dependent pathway. Clin Cancer Res 2005, 11(3):1050–1058.
CAS PubMed Google Scholar
Ceballos E, Munoz-Alonso MJ, Berwanger B, Acosta JC, Hernandez R, Krause M, Hartmann O, Eilers M, Leon J: Inhibitory effect of c-Myc on p53-induced apoptosis in leukemia cells. Microarray analysis reveals defective induction of p53 target genes and upregulation of chaperone genes. Oncogene 2005, 24(28):4559–4571. 10.1038/sj.onc.1208652
Article CAS PubMed Google Scholar
Man TK, Lu XY, Jaeweon K, Perlaky L, Harris CP, Shah S, Ladanyi M, Gorlick R, Lau CC, Rao PH: Genome-wide array comparative genomic hybridization analysis reveals distinct amplifications in osteosarcoma. BMC Cancer 2004, 4: 45. 10.1186/1471-2407-4-45
Article PubMed Central PubMed Google Scholar
Urano N, Fujiwara Y, Doki Y, Kim SJ, Miyoshi Y, Noguchi S, Miyata H, Takiguchi S, Yasuda T, Yano M, Monden M: Clinical significance of class III beta-tubulin expression and its predictive value for resistance to docetaxel-based chemotherapy in gastric cancer. Int J Oncol 2006, 28(2):375–381.
CAS PubMed Google Scholar
Darnton SJ, Hardie LJ, Muc RS, Wild CP, Casson AG: Tissue inhibitor of metalloproteinase-3 (TIMP-3) gene is methylated in the development of esophageal adenocarcinoma: loss of expression correlates with poor prognosis. Int J Cancer 2005, 115(3):351–358. 10.1002/ijc.20830
Chi JT, Wang Z, Nuyten DS, Rodriguez EH, Schaner ME, Salim A, Wang Y, Kristensen GB, Helland A, Borresen-Dale AL, Giaccia A, Longaker MT, Hastie T, Yang GP, Vijver MJ, Brown PO: Gene expression programs in response to hypoxia: cell type specificity and prognostic significance in human cancers. PLoS Med 2006, 3(3):e47. 10.1371/journal.pmed.0030047
Chuang TH, Xu X, Kaartinen V, Heisterkamp N, Groffen J, Bokoch GM: Abr and Bcr are multifunctional regulators of the Rho GTP-binding protein family. Proc Natl Acad Sci USA 1995, 92(22):10282–10286. 10.1073/pnas.92.22.10282
Tatenhorst L, Senner V, Puttmann S, Paulus W: Regulators of G-protein signaling 3 and 4 (RGS3, RGS4) are associated with glioma cell motility. J Neuropathol Exp Neurol 2004, 63(3):210–222.
van den Boom J, Wolter M, Kuick R, Misek DE, Youkilis AS, Wechsler DS, Sommer C, Reifenberger G, Hanash SM: Characterization of gene expression profiles associated with glioma progression using oligonucleotide-based microarray analysis and real-time reverse transcription-polymerase chain reaction. Am J Pathol 2003, 163(3):1033–1043.
Fischer H, Salahshor S, Stenling R, Bjork J, Lindmark G, Iselius L, Rubio C, Lindblom A: COL11A1 in FAP polyps and in sporadic colorectal tumors. BMC Cancer 2001, 1: 17. 10.1186/1471-2407-1-17
Ellison DW, Onilude OE, Lindsey JC, Lusher ME, Weston CL, Taylor RE, Pearson AD, Clifford SC: beta-Catenin status predicts a favorable outcome in childhood medulloblastoma: the United Kingdom Children's Cancer Study Group Brain Tumour Committee. J Clin Oncol 2005, 23(31):7951–7957. 10.1200/JCO.2005.01.5479
Dubrovska A, Kanamoto T, Lomnytska M, Heldin CH, Volodko N, Souchelnytskyi S: TGFbeta1/Smad3 counteracts BRCA1-dependent repair of DNA damage. Oncogene 2005, 24(14):2289–2297. 10.1038/sj.onc.1208443
Megonigal MD, Cheung NK, Rappaport EF, Nowell PC, Wilson RB, Jones DH, Addya K, Leonard DG, Kushner BH, Williams TM, Lange BJ, Felix CA: Detection of leukemia-associated MLL-GAS7 translocation early during chemotherapy with DNA topoisomerase II inhibitors. Proc Natl Acad Sci USA 2000, 97(6):2814–2819. 10.1073/pnas.050397097
Barille S, Thabard W, Robillard N, Moreau P, Pineau D, Harousseau JL, Bataille R, Amiot M: CD130 rather than CD126 expression is associated with disease activity in multiple myeloma. Br J Haematol 1999, 106(2):532–535. 10.1046/j.1365-2141.1999.01563.x
Gouyer V, Conti M, Devos P, Zerimech F, Copin MC, Creme E, Wurtz A, Porte H, Huet G: Tissue inhibitor of metalloproteinase 1 is an independent predictor of prognosis in patients with nonsmall cell lung carcinoma who undergo resection with curative intent. Cancer 2005, 103(8):1676–1684. 10.1002/cncr.20965
Gu ZD, Li JY, Li M, Gu J, Shi XT, Ke Y, Chen KN: Matrix metalloproteinases expression correlates with survival in patients with esophageal squamous cell carcinoma. Am J Gastroenterol 2005, 100(8):1835–1843. 10.1111/j.1572-0241.2005.50018.x
Soni S, Mathur M, Shukla NK, Deo SV, Ralhan R: Stromelysin-3 expression is an early event in human oral tumorigenesis. Int J Cancer 2003, 107(2):309–316. 10.1002/ijc.11366
Muller S, van den Boom D, Zirkel D, Koster H, Berthold F, Schwab M, Westphal M, Zumkeller W: Retention of imprinting of the human apoptosis-related gene TSSC3 in human brain tumors. Hum Mol Genet 2000, 9(5):757–763. 10.1093/hmg/9.5.757
Ciocca DR, Calderwood SK: Heat shock proteins in cancer: diagnostic, prognostic, predictive, and treatment implications. Cell Stress Chaperones 2005, 10(2):86–103. 10.1379/CSC-99r.1
Yamagishi N, Saito Y, Ishihara K, Hatayama T: Enhancement of oxidative stress-induced apoptosis by Hsp105alpha in mouse embryonal F9 cells. Eur J Biochem 2002, 269(16):4143–4151. 10.1046/j.1432-1033.2002.03109.x
Melero I, Gabari I, Corbi AL, Relloso M, Mazzolini G, Schmitz V, Rodriguez-Calvillo M, Tirapu I, Camafeita E, Albar JP, Prieto J: An anti-ICAM-2 (CD102) monoclonal antibody induces immune-mediated regressions of transplanted ICAM-2-negative colon carcinomas. Cancer Res 2002, 62(11):3167–3174.
Lee SM, Li ML, Tse YC, Leung SC, Lee MM, Tsui SK, Fung KP, Lee CY, Waye MM: Paeoniae Radix, a Chinese herbal extract, inhibit hepatoma cells growth by inducing apoptosis in a p53 independent pathway. Life Sci 2002, 71(19):2267–2277. 10.1016/S0024-3205(02)01962-8
Kirikoshi H, Katoh M: Expression of WNT7A in human normal tissues and cancer, and regulation of WNT7A and WNT7B in human cancer. Int J Oncol 2002, 21(4):895–900.
Dales JP, Garcia S, Carpentier S, Andrac L, Ramuz O, Lavaut MN, Allasia C, Bonnier P, Charpin C: Long-term prognostic significance of neoangiogenesis in breast carcinomas: comparison of Tie-2/Tek, CD105, and CD31 immunocytochemical expression. Hum Pathol 2004, 35(2):176–183. 10.1016/j.humpath.2003.10.008
Bhasin M, Raghava GP: SVM based method for predicting HLA-DRB1*0401 binding peptides in an antigen sequence. Bioinformatics 2004, 20(3):421–423. 10.1093/bioinformatics/btg424
Noguchi H, Hanai T, Takahashi W, Ichii T, Tanikawa M, Masuoka S, Honda H, Kobayashi T: Model construction for quality of beer and brewing process using FNN (in Japanese). Kagaku Kogaku Ronbunshu 1999, 25: 695–701.
Noguchi H, Hanai T, Honda H, Harrison LC, Kobayashi T: Fuzzy neural network-based prediction of the motif for MHC class II binding peptides. J Biosci Bioeng 2001, 92(3):227–231. 10.1263/jbb.92.227
Horikawa S, Furuhashi T, Uchikawa Y: On fuzzy modeling using fuzzy neural networks with the back-propagation algorithm. IEEE Trans Neural Netw 1992, 3(5):801–806. 10.1109/72.159069
Vapnik VN, Chervonenkis A: A note on one class of perceptrons. Automat Rem Control 1964, 25: 821–837.
Joachims T: Making large-scale SVM learning practical. In Advances in Kernel Methods - Support Vector Learning. Edited by: Scholkopf B, Burges C, Smola A. Cambridge, MIT Press; 1999.
Freund Y, Schapire RE: A decision-theoretic generalization of on-line learning and an application to boosting. J Comput System Sci 1997, 55: 119–139. 10.1006/jcss.1997.1504

Acknowledgements

This work was supported in part by the Hori Information Science Promotion Foundation; by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for JSPS Fellows, 18 6550, 2006; and by the program for promotion of Fundamental Studies in Health Sciences of the National Institute of Biomedical Innovation (NiBio).

Author information

Authors and Affiliations

Department of Biotechnology, School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8603, Japan
Hiro Takahashi & Hiroyuki Honda
Research Fellow of the Japanese Society for the Promotion of Science (JSPS), Japan
Hiro Takahashi
Genetics Division, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
Hiro Takahashi & Teruhiko Yoshida
Pathology Division, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan
Takeshi Nemoto & Tadashi Hasegawa
Department of Surgical Pathology, Sapporo Medical University School of Medicine, South 1 West 16, Chuo-ku, Sapporo, 060-8543, Japan
Tadashi Hasegawa


Corresponding author

Correspondence to Hiroyuki Honda.

Additional information

Authors' contributions

HT developed the software, analyzed the microarray data, and wrote the manuscript. TN carried out the microarray experiments. TY, HH, and TH conceived of the study and participated in its design and coordination. All authors read and approved the final manuscript.

Hiro Takahashi and Takeshi Nemoto contributed equally to this work.


Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Takahashi, H., Nemoto, T., Yoshida, T. et al. Cancer diagnosis marker extraction for soft tissue sarcomas based on gene expression profiling data by using projective adaptive resonance theory (PART) filtering method. BMC Bioinformatics 7, 399 (2006). https://doi.org/10.1186/1471-2105-7-399


Received: 13 April 2006
Accepted: 04 September 2006
Published: 04 September 2006
DOI: https://doi.org/10.1186/1471-2105-7-399
