Expression profiling of kidney tumors using the Affymetrix GeneChip distinctly separated four different tumor types from each other, as well as from normal kidney cortex. This finding is consistent with the morphologic, karyotypic and clinical outcome differences between these tumor types [6, 7]. Many sample-clustering methods can be applied to expression microarray data, and none of them can be conclusively called "correct", since each algorithm makes different assumptions about the nature of the data. To eliminate, or at least reduce, the bias introduced by any single method, we applied ten clustering methods, each combined with four ways of pre-processing the data, to a pilot data set. We used the smaller pilot data set to simplify interpretation of the results. A common cluster dendrogram was produced by 18 of the 40 methods; 16 of these were among the 20 that employed logarithmic transformation of the data. This dendrogram was consistent with the biology of the samples: normal kidney, CC and Chr samples each grouped together (Figure 2A). That logarithmic transformation gave the most meaningful cluster dendrograms is consistent with the distribution of the untransformed expression data being right-skewed: the majority of genes have low expression levels, with a long tail of highly expressed genes. Standardization of the data assigns equal weight to each gene and hence increases the contribution of unreliable low-expression genes. Logarithmic transformation, on the other hand, spreads out the low values and compresses the high ones, bringing the distribution closer to normal. It also re-adjusts the weight of each gene: for example, genes with high expression levels, which may be unreliable or biased due to saturation, receive lower weights in the distance calculation. Therefore, logarithmic transformation improves the distance calculation for the subsequent clustering algorithms and helps uncover the biologically meaningful pattern within the data.
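The effect of pre-processing on how much each gene contributes to sample distances can be illustrated with a small sketch. It assumes a genes × samples matrix of signal values, a log2 transform with a floor of 1 (an assumption, so that zero or negative background-corrected signals stay defined), and per-gene z-score standardization; the three-gene matrix and its labels are hypothetical. For Euclidean distance, a gene's share of the squared distances summed over all sample pairs is proportional to its variance across samples, which is what `gene_weights` reports.

```python
import math
import statistics
from itertools import combinations


def log2_transform(matrix, floor=1.0):
    # Floor small or negative signals (assumed floor of 1.0) before taking log2.
    return [[math.log2(max(v, floor)) for v in row] for row in matrix]


def standardize(matrix):
    # Z-score each gene (row) across samples: mean 0, population variance 1.
    out = []
    for row in matrix:
        mu = statistics.mean(row)
        sd = statistics.pstdev(row)
        out.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return out


def gene_weights(matrix):
    # Each gene's share of the squared Euclidean distances summed over all sample pairs.
    totals = [sum((a - b) ** 2 for a, b in combinations(row, 2)) for row in matrix]
    grand = sum(totals)
    return [t / grand for t in totals]


# Hypothetical genes x samples matrix: two normal samples, then two tumor samples.
genes = ["HIGH (possibly saturated)", "MODERATE", "LOW (noisy)"]
raw = [
    [20000, 22000, 30000, 31000],
    [1000, 1100, 2000, 2100],
    [40, 55, 45, 50],
]

for label, data in [("raw", raw), ("log2", log2_transform(raw)), ("z-score", standardize(raw))]:
    weights = gene_weights(data)
    print(label, {g: round(w, 3) for g, w in zip(genes, weights)})
```

With these toy values the high-expression gene accounts for roughly 99% of the raw distance and about a fifth after log2, whereas z-scoring gives every gene, including the noisy low-expression one, an equal one-third share.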
A comparison of the dendrograms from the pilot data set and the complete data set reveals some surprising changes. In general, the major structure of the dendrogram remained the same: CC, Chr and normal kidney samples each grouped separately. However, in the pilot data set CC were more similar to normal kidney than Chr were, while in the complete data set Chr were more similar to normal kidney. It is unclear why the larger data set changed the dendrogram, but the change suggests that the finer structure of the dendrogram was not as robust as it appeared. Because there were fewer Chr than CC samples, it is not possible to draw strong conclusions about the relatedness of the Chr samples.
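One way to probe whether finer dendrogram structure, such as which tumor type lies closer to normal kidney, is robust is to resample genes with replacement and repeat the comparison. The sketch below is not the method used in this study: it swaps in a gene-level bootstrap, uses average linkage (mean pairwise Euclidean distance between groups) on log-scale values, and its sample vectors are hypothetical.

```python
import math
import random
import statistics


def average_linkage(group_a, group_b, genes):
    # Mean Euclidean distance over all cross-group sample pairs, restricted to `genes`.
    return statistics.mean(
        math.dist([a[g] for g in genes], [b[g] for g in genes])
        for a in group_a
        for b in group_b
    )


def bootstrap_support(reference, group1, group2, n_boot=200, seed=0):
    # Fraction of gene resamples in which group1 is closer to `reference` than group2 is.
    rng = random.Random(seed)
    n_genes = len(reference[0])
    hits = 0
    for _ in range(n_boot):
        genes = [rng.randrange(n_genes) for _ in range(n_genes)]  # resample with replacement
        if average_linkage(group1, reference, genes) < average_linkage(group2, reference, genes):
            hits += 1
    return hits / n_boot


# Hypothetical log-scale profiles: each inner list is one sample over three genes.
normal = [[5.0, 5.0, 5.0], [5.2, 5.1, 4.9]]
cc = [[6.0, 6.0, 6.0], [6.1, 5.9, 6.0]]
chr_ = [[9.0, 9.0, 9.0], [9.2, 8.8, 9.1]]

print(bootstrap_support(normal, cc, chr_))  # 1.0 for these toy values
```

A support value near 1 or 0 indicates an ordering that survives resampling of genes, while a value near 0.5 suggests the ordering rests on a small number of genes and may shift as samples are added.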
To visualize the functional patterns associated with a particular set of selected genes, we used a simple, semi-hierarchical system, which we call Functional Taxonomy, to categorize genes according to the function of the proteins they encode. The partly subjective nature of categorizing gene function poses challenges, such as where to place a single gene product that is involved in several cellular tasks. Ideally, the categorization should consider multiple attributes of a protein. To this end, we propose three complementary classification schemes: (1) biochemical function, which categorizes according to molecular activity; (2) cellular function, which categorizes according to biological role at the cellular level; and (3) tissue function, which categorizes according to anatomic or organ-system location. In this paper we have visualized profiling results using the second of these schemes (cellular function) at three levels: primary categories, secondary categories, and individual genes (see Figures 3 and 4 and Tables 4 and 5). We have found Functional Taxonomy to be a useful visualization tool for understanding the differences in gene expression patterns between CC and Chr tumors. This system is similar in concept to what is currently being developed by the Gene Ontology Consortium.
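A taxonomy with three parallel schemes, in which one gene may occupy several categories, maps naturally onto a small nested structure. The sketch below is an assumption about how such a structure could be represented, not the authors' implementation: gene identifiers, secondary categories, the biochemical category and the changed-gene list are hypothetical, and only the cellular-function primary category names come from the text. It also counts changed genes per primary category, the kind of summary behind a cellular function signature.

```python
from collections import Counter, defaultdict

# Hypothetical annotations: gene -> scheme -> list of (primary, secondary) placements.
# A gene involved in several cellular tasks simply has several placements.
TAXONOMY = {
    "GENE_A": {"cellular": [("Signal Transduction", "Receptor")]},
    "GENE_B": {"cellular": [("Signal Transduction", "Kinase"),
                            ("Cellular and Matrix Organization and Adhesion", "Adhesion")]},
    "GENE_C": {"cellular": [("Metabolism", "Carbohydrate")]},
    "GENE_D": {"biochemical": [("Enzyme", "Hydrolase")]},
}


def view(taxonomy, scheme, level):
    # Group genes by category within one scheme; level 0 = primary, 1 = secondary.
    groups = defaultdict(set)
    for gene, schemes in taxonomy.items():
        for placement in schemes.get(scheme, []):
            groups[placement[level]].add(gene)
    return {category: sorted(members) for category, members in groups.items()}


def signature(taxonomy, changed_genes, scheme="cellular"):
    # Count changed genes per primary category; a multi-category gene counts in each.
    counts = Counter()
    for gene in changed_genes:
        primaries = {primary for primary, _ in taxonomy.get(gene, {}).get(scheme, [])}
        counts.update(primaries)
    return counts


print(view(TAXONOMY, "cellular", 0))
print(signature(TAXONOMY, ["GENE_A", "GENE_B", "GENE_C"]).most_common())
```

Keeping each scheme as a separate key lets the same gene be viewed by biochemical, cellular or tissue function without duplicating its record.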
The cellular function signatures of the nine CC and two Chr tumors revealed that, for both tumor types, the greatest number of gene expression changes occurred in the categories Signal Transduction, Cellular and Matrix Organization and Adhesion, and Metabolism. This is consistent with current theories of neoplasia, which hypothesize that tumor cells modify their signaling pathways, establish new contacts with an altered extracellular matrix, and refashion their metabolic machinery.
There is a considerable literature on genes and gene products whose expression is associated with RCCa. Using the selected gene sets from Table 3 and the p-values and fold-change values calculated from the eight normal kidneys and nine CC, we looked for concordance between our results and published reports. The genes CA9 (carbonic anhydrase IX), CCND1 (cyclin D1), CDH2 (N-cadherin), EGFR (epidermal growth factor receptor) and TGFA (transforming growth factor alpha) all showed increased expression in CC that matched the literature, with p-values ≤ 0.0061 [11–15]. The observed decrease in CDH1 (E-cadherin) in CC (p-value = 0.0045) also matched previously published reports, as did the decrease in VIM (vimentin) expression in Chr RCCa [13, 16]. VIM was also found to be increased in CC (p-value = 0.0045), consistent with the literature. We detected a small increase in expression of ICAM1 (intercellular adhesion molecule 1) in CC (fold-change = 1.9, p-value = 0.0081), which was also consistent with the literature.
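The fold-change values quoted here are signed, with decreases written as negative numbers. A minimal sketch of one common convention, assumed here rather than taken from the Methods, reports ratios of at least 1 as-is and ratios below 1 as -1/ratio. The concordance helper and its threshold `alpha = 0.01` are hypothetical, chosen only so that the p-values quoted above qualify.

```python
def signed_fold_change(tumor_mean, normal_mean):
    # Assumed convention: increases reported as the ratio, decreases as -1/ratio,
    # so a 5.1-fold decrease appears as -5.1.
    if tumor_mean <= 0 or normal_mean <= 0:
        raise ValueError("means must be positive")
    ratio = tumor_mean / normal_mean
    return ratio if ratio >= 1 else -1.0 / ratio


def concordant(fold_change, p_value, literature_direction, alpha=0.01):
    # literature_direction: +1 if published reports describe an increase, -1 a decrease.
    # Hypothetical rule: significant at `alpha` and changed in the reported direction.
    return p_value <= alpha and (fold_change > 0) == (literature_direction > 0)


# ICAM1 in CC, using the values reported above.
print(concordant(1.9, 0.0081, +1))  # True
print(signed_fold_change(100.0, 510.0))  # approximately -5.1
```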
The expression results for the genes JUN (c-jun) and VHL (von Hippel-Lindau) did not match the literature [18, 19]. Nor did the result for KRT7 (cytokeratin 7), which has been shown to be overexpressed in Chr. Instead, we found KRT7 to be strongly repressed in CC (fold-change = -5.1, p-value = 0.0009). However, expression levels measured with nucleic acid microarrays do not necessarily correlate with other forms of analysis for all genes [21, 22]. This may be especially true when altered expression of a gene has been reported in only a subset of tumors, since a small sample (such as the Chr samples in this study) may not include any tumors with the alteration.
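The point about small samples can be made quantitative with a simple binomial argument. Assuming, hypothetically, that an alteration is present in a fraction f of tumors of a given type and that tumors are sampled independently, the probability that n samples contain no altered tumor is (1 - f)^n. The prevalence of 0.3 below is illustrative, not a published figure.

```python
def prob_all_missed(prevalence, n_samples):
    # Probability that none of n independently sampled tumors carries the alteration.
    return (1.0 - prevalence) ** n_samples


def min_samples(prevalence, confidence=0.95):
    # Smallest n giving at least `confidence` probability of including one altered tumor.
    if not 0.0 < prevalence <= 1.0:
        raise ValueError("prevalence must be in (0, 1]")
    n = 0
    while prob_all_missed(prevalence, n) > 1.0 - confidence:
        n += 1
    return n


# Hypothetical: the alteration occurs in 30% of Chr tumors.
print(prob_all_missed(0.3, 2))  # two Chr samples, as in this study
print(min_samples(0.3))
```

Under this assumed prevalence, two samples would miss the alteration about 49% of the time, and nine samples would be needed for a 95% chance of including at least one altered tumor.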
For CD31 and the T cell receptor beta chain, expression profiling results were concordant with immunohistochemical analysis of the tumors. The prevalence of scattered T cells within the CC tumors was somewhat surprising, but it is consistent with the biology of the response to interleukin 2 (IL-2), a treatment for RCCa, since IL-2 activates lymphocytes against the tumor.
During preparation of this manuscript, an expression profiling study of seven renal neoplasms (four CC, two oncocytomas, and one Chr) was reported. This study employed a different platform (Incyte glass-slide cDNA microarray) and hybridization method (competitive tumor/normal binding), and used related but not identical gene selection criteria (a two-fold change in expression versus normal kidney in at least two of the seven tumors). The study identified 189 genes that were differentially expressed in at least two tumors, and this gene set was also able to distinguish between the CC and Chr tumor types. We suspect that more Chr-associated genes would have been selected in that study had there been at least two Chr samples, since a gene whose expression was altered only in the single Chr, and not in any of the oncocytomas or CC, could not meet the selection criteria.
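The selection criterion described for that study, a two-fold change versus normal kidney in at least two of the seven tumors, can be sketched as follows to show why a gene altered only in the single Chr cannot pass. The ratios are hypothetical tumor/normal values, and treating a ratio of 0.5 or below as a two-fold decrease is an assumption.

```python
def passes_selection(ratios, min_fold=2.0, min_tumors=2):
    # ratios: tumor/normal expression ratio for each tumor.
    # A tumor counts if the gene is up or down by at least `min_fold`.
    changed = sum(1 for r in ratios if r >= min_fold or r <= 1.0 / min_fold)
    return changed >= min_tumors


# Hypothetical gene changed only in chromophobe tumors.
# Order: four CC, two oncocytomas, one Chr (the design of the study described above).
seven_tumors = [1.1, 0.9, 1.2, 1.0, 1.1, 0.95, 4.0]
print(passes_selection(seven_tumors))  # False: only the single Chr is changed

# The same gene if a second (hypothetical) Chr sample had been included.
print(passes_selection(seven_tumors + [3.5]))  # True
```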