- Research article
- Open Access
Spectral embedding finds meaningful (relevant) structure in image and microarray data
- Brandon W Higgs†^{1}Email author,
- Jennifer Weller†^{1} and
- Jeffrey L Solka†^{1, 2}
https://doi.org/10.1186/1471-2105-7-74
© Higgs et al; licensee BioMed Central Ltd. 2006
Received: 07 September 2005
Accepted: 16 February 2006
Published: 16 February 2006
Abstract
Background
Accurate methods for extraction of meaningful patterns in high dimensional data have become increasingly important with the recent generation of data types containing measurements across thousands of variables. Principal components analysis (PCA) is a linear dimensionality reduction (DR) method that is unsupervised in that it relies only on the data; projections are calculated in Euclidean or a similar linear space and do not use tuning parameters for optimizing the fit to the data. However, relationships within sets of nonlinear data types, such as biological networks or images, are frequently mis-rendered into a low dimensional space by linear methods. Nonlinear methods, in contrast, attempt to model important aspects of the underlying data structure, often requiring parameter(s) fitting to the data type of interest. In many cases, the optimal parameter values vary when different classification algorithms are applied on the same rendered subspace, making the results of such methods highly dependent upon the type of classifier implemented.
Results
We present the results of applying the spectral method of Lafon, a nonlinear DR method based on the weighted graph Laplacian, that minimizes the requirements for such parameter optimization for two biological data types. We demonstrate that it is successful in determining implicit ordering of brain slice image data and in classifying separate species in microarray data, as compared to two conventional linear methods and three nonlinear methods (one of which is an alternative spectral method). This spectral implementation is shown to provide more meaningful information, by preserving important relationships, than the methods of DR presented for comparison.
Tuning parameter fitting is simple and is a general, rather than data type or experiment specific approach, for the two datasets analyzed here. Tuning parameter optimization is minimized in the DR step to each subsequent classification method, enabling the possibility of valid cross-experiment comparisons.
Conclusion
Results from the spectral method presented here exhibit the desirable properties of preserving meaningful nonlinear relationships in lower dimensional space and requiring minimal parameter fitting, providing a useful algorithm for purposes of visualization and classification across diverse datasets, a common challenge in systems biology.
Keywords
- Linear Discriminant Analysis
- Spectral Method
- Weighted Graph
- Laplacian Matrix
- Kernel Principal Component Analysis
Background
Dimensionality reduction (DR) is the process of rendering high dimensional complex data in a low dimensional space. Provided the process is calculated accurately, this low dimensional representation is preferred for use in inference and summarization for multiple reasons, among which are ease of visualization in a reduced variable space and clarity (i.e. ready interpretation) of clustering or classification. Other benefits include the insights into the data structure that can be obtained from the projected axes and the obvious denoising effect attained in some types of DR. Reduction strategies often rely on linear approaches defined by a method that represents x_{1}, ..., x_{ n }∈ ℝ^{ q } as ${\widehat{x}}_{1},\dots ,{\widehat{x}}_{n}$ in such a way that each ${\widehat{x}}_{i}$ is obtained by projecting x_{ i }into a common linear subspace of ℝ^{ q }. Commonly used methods on data types relevant to bioinformatics include principal components analysis (PCA) [1] and classical multidimensional scaling (MDS) [2], which calculate linear projections of the data; clearly such projections are unsuitable for nonlinear or curved surfaces.
These methods generally are based on minimization of a global cost function, wherein large distances can drive the embedding, minimizing the effect of local distance structures [3–5]. Where local data structures are not best summarized linearly (yet important to the interpretation of the experimental results), nonlinear methods that are kernel-based (e.g. kernel PCA) [6] and graph theoretic like spectral embedding [3–17] can be more appropriate. These methods attempt to model the underlying manifold by fitting a kernel parameter to optimize performance (e.g. as assessed by some performance accuracy metric) [6]. Unfortunately it is usually necessary to re-fit one or more tuning parameter(s) to each different data type or experiment set, making it difficult to propose a more generalized method across multiple data types. It is also difficult to avoid over-fitting the model to the data in this scenario. In addition, when attempting to determine class structure in the low dimensional space calculated from these nonlinear approaches, different classifiers may require separate spatial representations in order to appropriately partition the classes (e.g. quadratic discriminant analysis (QDA) compared to linear discriminant analysis (LDA)). Such parameter(s) modifications are optimized with a specific range of values that can be different for each classifier.
Two examples of high dimensional data types that fall into this nonlinear domain include DNA microarrays and image data. Microarrays contain the simultaneous measurement for thousands of mRNA transcripts [18–20], which can be viewed as n arrays with q dimensions (where n<<q). Many of the biological processes (feedback loops, oscillators, and repressilators) represented by measurements generated with microarrays are nonlinear, providing a great challenge in expressing associations between biological entities in a linear domain. Nilsson et al. demonstrated the importance of this concept in their comparison of MDS with a nonlinear algorithm, isometric feature mapping (ISOMAP) [21, 22]. ISOMAP uses nonlinear distances as estimated in the ambient space along with a linear MDS to a Euclidean target projection space [22]. This nonlinear method was shown to render more robust partitioning of disease class structure on the low dimensional manifold, when class membership predictions were evaluated against those obtained from linear projections from MDS.
Images are another data type that can be represented in q dimensions as well, where each image n is a vector. This data type can exhibit comparable complexities to the microarray example, particularly when imaging tissues and organs. The Euclidean distance between two similar images is seldom the optimal comparison criterion. Simple variations on the main image features, such as those caused by registration issues (rotation and shifting), can alter the pixel alignment, thereby modifying the definition of distance between the original image and the rotated one, and distorting the apparent relationship. The ideal method for DR should be capable of extracting meaningful patterns in multiple data types (such as these mentioned), should not be confined to a linear domain, and should exhibit tuning parameter-fitting independence to minimize parameter optimization specific to each example and classification method.
Given this goal, we examined the performance of a spectral method presented by Lafon [3, 4] and have shown that it is successful in extracting meaningful structure in these two disparate data types, both having high dimensionality paired with low replication, with a method for calculating the tuning parameter that does not have to be varied across classifiers to achieve correct results. Previous work by Lafon has demonstrated how ordered structure from both helix and trefoils curves in ℝ^{3} can be accurately preserved in the embedded space (ℝ^{2}) with a spectral method [3]. We extend this work to address biological examples of higher dimensionality, where accuracy in embedded results is evaluated using a known ordering and classification structure. In a more global sense, we demonstrate that the spectral method is able to preserve the implicit ordering within biological image data and can accurately classify different taxonomic species within microarray data. These results are compared to two linear approaches (PCA with either correlation or covariance distance metrics), one nonlinear counterpart to classical MDS (i.e. nonmetric MDS), and two similar nonlinear approaches (kernel PCA with a Gaussian radial basis function kernel; weighted graph Laplacian as presented by Ng et al. [11]), for the latter two of which numerous variations are often promoted in the mathematical/statistical literature for their successful application to a number of nonlinear data types [11–13, 15–17]. We demonstrate that for our datasets, the spectral approach presented here is not dependent upon tuning parameter(s) optimization to allow success across any of three separate classifiers chosen. This is a considerable advantage to an investigator who needs to make cross-experiment or multi-data type comparisons that benefit from a tuning parameter-independent nonlinear DR approach.
Results and discussion
Image dataset
The image dataset was used to test the ability of each of the projection methods to predict the correct image ordering, based on the size increment of the brain. Since the largest source of variability separating each image in the series is the increase in feature surface area, as a result of the head size, only the first eigenfunction for each method was used in the comparison. This calculation reduces the dimensionality from ℝ^{16,384} to ℝ^{1}. To assess the accuracy of each method, a non-parametric measure of association (Spearman's rho coefficient) was used, by which the scores from the primary eigenfunction were ranked and correlated against the correct ordering. A straight-line fit is indicative of perfect image ordering.
The remaining methods: PCA-correlation, PCA-covariance, nonmetric MDS, and the spectral method from Lafon [3, 4] do not require parameter fitting that is dependent upon performance (as is necessary for the previous two nonlinear methods), so the images can be directly rendered into a low dimensional representation.
Spearman rho values used for evaluation of CATSCAN image ordering
Method | rho |
---|---|
PCA (cor) | 0.902 |
PCA (cov) | 0.966 |
Nonmetric MDS | 0.966 |
Kernel PCA (ε = 1,040) | 0.989 |
Spectral-Ng et al. (ε = 70) | 0.980 |
Spectral-Lafon | 1.000 |
Microarray dataset
The microarray dataset was used to evaluate the ability of the projection methods to accurately classify samples derived from three taxonomically separate species into their respective groups, without conducting any preliminary feature selection (a procedure usually conducted in order to better partition the groups). In each case, the results of a method were calculated such that dimensionality was reduced from ℝ^{12,625} to ℝ^{2}. Thereafter three classifiers using leave-one-out cross-validation (LOO-CV) were calculated on this projected space, both to assess the classification accuracy for each method and to compare the differences in value of the optimal parameters for kernel PCA and the spectral method from Ng et al. A nonlinear classifier, k-nearest neighbors (KNN), was calculated, setting k = 2 and k = 3; two settings of k were used since each DR method renders the groupings differently, thereby favouring two nearest neighbors for some methods and three nearest neighbors for others. The average error rates were computed across 1,000 trials, to account for the variation arising from ties broken at random in the assignment of the nearest winning class which results in slightly different classification results for subsequent trials. In the event that the mean error rate is calculated to be greater than zero, even when one or more of the trials provide a smaller error rate, a range is denoted in the results (e.g. 0%–4.66%) to indicate that the occurrence of a lower error rate is possible. Another commonly utilized classifier, QDA, was calculated, as well as the linear classifier counterpart, LDA, to allow comparison of the dependence of the results of different classification methods on the tuning parameter fitting in the first DR step.
The parameter optimization for kernel PCA shows similar trends to the optimal values from the spectral method from Ng et al. However, the differences in optimal parameter solutions between the KNN classifier and both QDA and LDA is much more pronounced with this DR method. For the QDA classifier, the epsilon value is optimized to provide a minimal error rate (0%) in the window of ε = 67,000 – 68,000, while for the LDA classifier, the epsilon value is optimized to provide a minimal error rate (2.22%) in the window of ε = 73,000 – 92,000. In addition, the epsilon values that provide the minimal KNN error rates for k = 2 (2.22%–6.79%) and k = 3 (8.56%) are at ε = 174,000 (indicated by a dashed vertical line in Figure 4) and ε = 57,000 (indicated by a solid vertical line in Figure 4), respectively. This result from kernel PCA is consistent with those obtained from the spectral methods of Ng et al., and demonstrates that each of these two nonlinear approaches have a dependence between the outcome of the classification algorithm and an appropriately optimized parameter. In addition, when comparing training set classification results to LOO-CV, the optimal parameter values are drastically different for both kernel PCA and the spectral method of Ng et al. More importantly, the parameter selection is completely dataset-dependent. Note that here the scale is vastly different from that seen in the image example, thus the optimal parameter is as well. For these examples we examined two very disparate data types, but the same conclusion of dataset-dependence would almost certainly occur if two microarray datasets were compared, since there would still be differences in distance distributions (the possibility of two microarray datasets having identical distributions of Euclidean distances is highly unlikely). In other work we have examined additional microarray datasets and examined functional sub categories as well as disease state, cited here for those who are interested [5].
LOO-CV classification error rates for six DR methods and three classifiers (fibroblast data)
PCA (cor) | PCA (cov) | Nonmetric MDS | kernel PCA^{†} | Spectral Ng et al.^{‡} | Spectral Lafon | ||
---|---|---|---|---|---|---|---|
*KNN ( k = 2) | Pan paniscus | 27.27%–41.02% | 0%–13.71% | 0%–13.69% | 0%–9.13% | 0%–9.02% | 0% |
Gorilla gorilla | 8.33%–29.39% | 8.33%–12.61% | 8.33%–12.42% | 8.33%–12.49% | 8.33%–29.33% | 0% | |
Homo sapiens | 4.55% | 0%–2.28% | 0%–2.22% | 0%–2.26% | 0%–4.66% | 0% | |
Total | 20.09% | 2.22%–7.83% | 2.22%–7.74% | 2.22%–6.79% | 2.22%–13.93% | 0% | |
*KNN ( k = 3) | Pan paniscus | 27.27% | 18.18% | 18.18% | 9.09% | 9.09% | 0% |
Gorilla gorilla | 8.33% | 16.67% | 16.67% | 16.67% | 25% | 0% | |
Homo sapiens | 4.55% | 4.55% | 4.55% | 0% | 0%–3.04% | 0% | |
Total | 11.11% | 11.11% | 11.11% | 8.56% | 8.88%–13.79% | 0% | |
QDA | Pan paniscus | 72.73% | 9.09% | 9.09% | 0% | 0% | 0% |
Gorilla gorilla | 16.67% | 8.33% | 8.33% | 0% | 8.33% | 0% | |
Homo sapiens | 4.55% | 0% | 0% | 0% | 0% | 0% | |
Total | 24.44% | 4.44% | 4.44% | 0% | 2.22% | 0% | |
LDA | Pan paniscus | 18.18% | 9.09% | 9.09% | 0% | 0% | **9.09% |
Gorilla gorilla | 25% | 8.33% | 8.33% | 8.33% | 41.67% | 0% | |
Homo sapiens | 0% | 0% | 0% | 0% | 9.09% | 0% | |
Total | 11.11% | 4.44% | 4.44% | 2.22% | 15.56% | **2.22% |
Conclusion
Within these examples, the spectral method from Lafon is demonstrated to extract more meaningful structure, relative to two linear and three nonlinear methods, for calculating low dimensional representations of high dimensional data types, such as image and microarray data, for purposes of determining ordered patterns or classification. As a nonlinear method it is shown to be a reasonable choice for biological and image data types, where it is important to preserve nonlinear relationships and local geometries in a low dimensional embedding. Though the nonlinear methods of kernel PCA and Ng's spectral method also may be well suited for these data types, they suffer the primary drawback of requiring dataset- and classifier-specific tuning parameter optimization, making the validity of cross-experiment comparisons problematic. Other nonlinear manifold methods, such as ISOMAP [22] and Local Linear Embedding (LLE) [23], have similar optimization requirements as drawbacks, although tuning parameter optimization and classification accuracy for these two methods was not assessed here. This data fitting step can be not only time consuming, but also, as we have shown, varies according to which classification algorithm is used as well as which dataset is examined. In this work, the spectral method from Lafon is shown to outperform competing methods and exhibit independence to tuning parameter fitting across three separate classifiers and two unrelated high dimensional data types. Much like any method of DR, this method is not proposed to always elucidate the most meaningful structure across all high dimensional data types. Methods such as boosting and bagging [24] and the relative distance plane (RDP) [25] may be better suited for certain high dimensional datasets. Rather, the results presented here demonstrate success in two disparate datasets of high dimensionality and the authors' hope is that this presentation will encourage others to extend applications of this method in research within the computational biology community.
Methods
Data types
The image data was obtained from the Computer Vision Laboratory at the University of Massachusetts at Amherst [26]. A total of 54 slice-by-slice CATSCAN images were obtained for the human head, where each image has dimensions of 128 × 128 pixels. The average for each image was calculated and the median of these averages was determined to be 11.82. Each image was then scaled to a target mean of this value.
The microarray data are from genomic primary fibroblast cell lines [27, 28] and were generated with Affymetrix oligonucleotide hgu95v2 arrays for 46 samples: 23 human (Homo sapien), 11 bonobo (Pan paniscus), and 12 gorilla (Gorilla gorilla) donors. This is a publicly available dataset within the 'fibroEset' package in R [29]. It should be noted that two identical human donor arrays are in this dataset, so one was removed, reducing the dataset to 45 total samples.
The data was provided in R already normalized by the Affymetrix GeneChip MAS 5.0 algorithm. "Normalization was done by calculating multiplicative scaling factors on the basis of the median intensity of the 60th to 95th percentile of gene-expression scores" and intensities were floored to 100 fluorescent units [29]. No further filtering or scaling was conducted on this dataset, which consists of 12,625 expression points for each of 45 arrays.
Spectral methods
The spectral implementation of the weighted graph Laplacian from Lafon [3, 4] is calculated as follows: Given a set of points X = {x_{1}, x_{2}, ..., x_{ n }} ∈ ℝ^{ q }, let G = (E, V) be a graph with edge weights or lines E, between pairs of vertices V. Consistent with standard terminology from graph theory, we can construct a graph, where each pair of vertices (x_{ i }, x_{ j }∈ V(G)) is assigned a weight specific to the distance between them [30]. The matrix, K_{1} is calculated from these edge weights by a Gaussian kernel estimate
The epsilon term is chosen at the minimum squared Euclidean distance, as opposed to the average minimum distance (specified by Lafon) to induce maximum convergence of distance (see section on Comparison of transformed distances for an example). The vector, v is calculated from the square root of the product between matrix components from K_{1} and vector components from e. The matrix P is then formed by the product of v and v_{ T }. Then, the weighted graph Laplacian matrix, K is calculated by component division of the matrix K_{1} elements
(K_{1ij}where i = 1, ..., q and j = 1, ..., n) by the matrix P elements
(P_{ ij }where i = 1, ..., q and j = 1, ..., n). The calculations are given as the following:
P = vv^{ T } and
K = K_{1ij}/P_{ ij }where i = 1, ..., q and j = 1, ..., n.
The K matrix is decomposed by singular value decomposition (svd)
svd(K) = XHV^{ T } (Note that for this symmetric positive semidefinite matrix K, the svd is the spectral decomposition, however, to remain consistent with the nomenclature specified by Lafon, the calculations with svd are used.)
and the n columns of the X matrix which define the left singular vectors of K are scaled by the first column of X, given by
This provides n - 1 characteristic roots of the matrix K given by Φ_{ j }. It should be noted that the first column of X is scaled by itself, creating a vector of values equal to 1. By convention, this vector is designated by Φ_{0}. As such, the second column in the matrix X will be considered the primary Φ vector, and designated by Φ_{1}. Utilizing these primary Φ vectors, the data can be embedded as points in ℝ^{Φ}.
The spectral implementation of the weighted graph Laplacian from Ng et al. is similar up to the calculation of the K_{1} matrix, however, the kernel is defined by a denominator term of 2ε^{2} as opposed to simply ε in Lafon's method. In addition, the epsilon smoothing term (ε) (as will be demonstrated in the Results section) is not optimized at ε = min _{i≠j}||x_{ i }- x_{ j }||^{2} > 0, as it is in our modification to Lafon's method, and thus requires fitting for each example analyzed and classifier utilized. Following the calculation of the K_{1} matrix, the matrix D is a diagonal matrix calculated from the row sums of K_{1},
L = D^{-1/2}K_{1}D^{-1/2}
The Laplacian matrix, L is decomposed by svd, and the n columns of the X matrix which define the left singular vectors of L have rows scaled to unit length into the matrix Y
svd(L) = XHV^{ T } (Note the point specified in the decomposition of the weighted graph Laplacian as provided by Lafon.)
Utilizing the primary Φ columns of this Y matrix, the data can be embedded as points in ℝ^{Φ}.
Additional DR (projection) methods
Two implementations of PCA were calculated for comparison: correlation and covariance. These calculations were conducted with the prcomp function in the 'stats' package of R [29]. Since classical MDS is synonymous to PCA calculated on a Euclidean distance matrix, nonmetric MDS was performed instead (based on ranking of dissimilarities), to avoid redundant information and for an additional comparison with another nonlinear approach, using the isoMDS [31] function in the 'MASS' package of R [29]. Kernel PCA was calculated with a Gaussian radial basis function kernel using the k.pca function in the 'kmethods' package of R [29]. This kernel function was chosen to maintain consistency with the kernel used in both spectral methods. The weighted graphs Laplacian for the two spectral methods are anchored on some form of a Gaussian kernel (see Spectral methods section for the difference between the kernels).
Comparison of transformed distances
Euclidean distance matrix for simulated example
1 | 2 | 3 | 4 | 5 | |
---|---|---|---|---|---|
1 | 0.0 | 5.4 | 26.8 | 31.4 | 56.4 |
2 | 5.4 | 0.0 | 30.2 | 35.6 | 59.5 |
3 | 26.8 | 30.2 | 0.0 | 12.2 | 39.1 |
4 | 31.4 | 35.6 | 12.2 | 0.0 | 40.0 |
5 | 56.4 | 59.5 | 39.1 | 40.0 | 0.0 |
The Euclidean matrix in Table 3 contains 10 unique distance values. Two values are less than the 25% of the distribution, three values are greater than the 75% of the distribution, and the remaining values are within the interquartile range. This range of distances is utilized to convey both the subtle and apparent differences within the transformed space between methods.
From the kernel PCA plot in Figure 6a, a Gaussian radial basis function demonstrates convergence of local Euclidean distances in the transformation to weight values, while remote distances exhibit more linearity with these weight values. In addition, as the epsilon smoothing term is increased at very small values, local distances converge to similar weight values much faster than remote distances. This example illustrates how local distance structure is better preserved at levels specific to the epsilon smoothing term utilized, than remote distances. Small distances are collapsed to a similar weighted value, while large distances maintain relatively stronger linearity with Euclidean distances, though this is reduced as the epsilon smoothing term is increased to values greater than the 10% of the distribution of squared Euclidean distances (denoted by the change in line shading from red to orange). The two smallest Euclidean distances converge to similar values rather quickly as the epsilon smoothing term is increased, while the points in the distance distribution converge more slowly. At epsilon values greater than the 20% (line shading of yellow, green, and blue) of the distribution of squared Euclidean distances, the weight values all converge to the same value of 1.
The Laplacian matrix, L from Ng et al., in Figure 6b shows that at minimal values of the epsilon smoothing term, there is near linearity between Euclidean distances and values in the matrix L. However, as the smoothing epsilon term is increased, the convergence occurs with both local and remote distances, while the points in the middle of this distribution maintain similar values (represented by the approximate slope of 0 from the values within the interquartile range of the plot). Both tails of the trend line quickly approach a weight value of ~0.25 with epsilon values at less than 20% of the distribution of squared Euclidean distances (denoted by line shading of only red and orange). This example illustrates that values within the middle region of the distribution (as compared the right and left tails of the distribution) are transformed to similar weight values in matrix L, with rather small values of the smoothing epsilon term, and then as this epsilon value is increased to only slightly larger values, all points in the function converge to a similar weight value. In addition, the Laplacian matrix, L does not preserve the same ordering of Euclidean distances.
The Laplacian matrix, K from Lafon (Figure 6c), in contrast to the matrix L, demonstrates convergence of weight values on both extremes of the Euclidean distance vector at very small values of the epsilon smoothing term, as illustrated by the small variance in red lines (i.e. epsilon vales at less than the 10% of the distribution of squared Euclidean distance). Additionally, opposite to the matrix L, these weight values become more linear (less converged) on both right and left tails of the distribution as the epsilon smoothing term is increased to values greater than the 20% of the distribution of squared Euclidean distances (line shading corresponding to colors of yellow, green, and blue). It is interesting to note that at maximal epsilon values (indicated by yellow, green, and blue lines in Figure 6c), the function generated by the weight values in the K matrix resemble the function generated by the weight values in the L matrix at the minimum epsilon value (indicated by the black line in Figure 6b). However, similar to the L matrix values, the ordering of Euclidean distances is not preserved. This example illustrates how the Lafon method differs from the other two, in that maximal convergence of both local and remote distances is optimized at minimal epsilon values (i.e. values at less than 10% of the distribution of squared Euclidean distance for this example). Based on the partitioned structure in the data, this reduces the dynamic range of Euclidean distances, particularly along the middle of the distribution, and transforms the distance structure to weight values (in matrix K) at both extremes of the distribution. Such a transformation, that acts to create a large gap between local and remote distances (at a partitioning threshold driven by the data) is shown to provide an optimal distance transformation for subsequent spectral decomposition for purposes of elucidating meaningful structure in image seriation and microarray species classification examples presented. Since the convergence of local and remote distances in the K matrix is highly dependent upon the primary partitioning point (for this example, between the Euclidean distances of 12.2 and 26.8), it is assumed that the most meaningful structures are defined by the difference between the smallest two Euclidean distances and the remaining eight Euclidean distances.
Classifiers
The k-nearest neighbor (KNN) algorithm (for k = 2 and 3) was calculated for 1,000 independent trials for each method (to address the problem of random assignment when ties occur in nearest neighbor voting) and the mean error was computed using the knn function in the 'class' package of R [29]. The discriminant analysis classifiers both for separate variances (QDA) and pooled variances (LDA) were computed using the qda and lda functions, respectively, in the 'MASS' package of R [29]. All classification models were trained and tested with leave-one-out cross-validation (LOO-CV).
Notes
Declarations
Acknowledgements
We are thankful to Alan Berger, David Johannsen, David Marchette, the reviewers of this manuscript, and the BMC Editorial Staff for their useful criticism.
Authors’ Affiliations
References
- Kittler JV, Young PC: A new approach to feature selection based on the Karhunen-Loeve expansion. Pattern Recognition 1973, 5: 335–352. 10.1016/0031-3203(73)90025-3View ArticleGoogle Scholar
- Cox TF, Cox MAA: Multidimensional Scaling. Second edition. London: Chapman and Hall; 1994.Google Scholar
- Lafon S: Diffusion Maps and Geometric Harmonics. PhD thesis. Yale University, Mathematics Department; 2004.Google Scholar
- Coifman RR, Lafon S, Lee AB, Maggioni M, Nadler B, Warner F, Zucker SW: Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion Maps. PNAS 2005, 102(21):7426–7431. 10.1073/pnas.0500334102PubMed CentralView ArticlePubMedGoogle Scholar
- Higgs B: Deriving Meaningful Structure from Spectral Embedding. PhD thesis. George Mason University, School of Computational Sciences; 2005.Google Scholar
- Burges CJC: Geometric Method for Feature Extraction and Dimensional Reduction: A Guided Tour. Microsoft Research Technical Report No. MSR-TR-2004–55 2004.Google Scholar
- Chung FRK: Spectral Graph Theory (CBMS Regional Conference Series in Mathematics, No. 92). Providence: American Mathematical Society; 1997.Google Scholar
- Belkin M, Niyogi P: Semi-Supervised Learning on Riemannian Manifolds. Machine Learning 2004, 56: 209–239. 10.1023/B:MACH.0000033120.25363.1eView ArticleGoogle Scholar
- Belkin M, Niyogi P: Laplacian Eigenmaps for dimensionality reduction and data representation. Neural Computation 2003, 15(6):1373–1396. 10.1162/089976603321780317View ArticleGoogle Scholar
- Coifman RR, Lafon S, Lee AB, Maggioni M, Nadler B, Warner F, Zucker SW: Geometric diffusions as a tool for harmonic analysis and structure definition of data: Multiscale Methods. PNAS 2005, 102(21):7432–7437. 10.1073/pnas.0500896102PubMed CentralView ArticlePubMedGoogle Scholar
- Ng AY, Jordan MI, Weiss Y: On Spectral Clustering: Analysis and an Algorithm. NIPS 2001, 14.Google Scholar
- Ham J, Lee DD, Mika M, Scholkopf B: A kernel view of the dimensionality reduction of manifolds. Max Planck Technical Report No. TR-110 2003.Google Scholar
- Weiss Y: Segmentation using eigenvectors: A unifying view. IEEE International Conference on Computer Vision 1999, 975–982.Google Scholar
- Cristianini N, Shawe-Taylor J, Kandola J: Spectral Kernel Methods for Clustering. NIPS 2002, 14.Google Scholar
- Mika S, Scholkopf B, Smola AJ, Muller KR, Scholz M, Ratsch G: Kernel PCA and de-noising in feature spaces. NIPS 1999, 11.Google Scholar
- Shi J, Malik J: Normalized cuts and image segmentation. Proc IEEE Transactions on Pattern Analysis and Machine Intelligence 2000, 22(8):731–737.Google Scholar
- Perona P, Freeman WT: A factorization approach to grouping. In Proc 5th ECCV Edited by: Burkardt H, Neumann B. 1998, 655–670.Google Scholar
- Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. PNAS 1998, 95: 14863–14868. 10.1073/pnas.95.25.14863PubMed CentralView ArticlePubMedGoogle Scholar
- Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, Davis RW: A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 1998, 2: 65–73. 10.1016/S1097-2765(00)80114-8View ArticlePubMedGoogle Scholar
- Spellman P, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive Identification of Cell Cycle-regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization. Molecular Biology of the Cell 1998, 9(12):3273–3297.PubMed CentralView ArticlePubMedGoogle Scholar
- Nilsson J, Fioretos T, Hoglund M, Fontes M: Approximate geodesic distances reveal biologically relevant structures in microarray data. Bioinformatics 2004, 20: 874–880. 10.1093/bioinformatics/btg496View ArticlePubMedGoogle Scholar
- Tenenbaum JB, Silva V, Langford JC: A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science 2000, 290: 2319–2322. 10.1126/science.290.5500.2319View ArticlePubMedGoogle Scholar
- Roweis ST, Saul LK: Nonlinear Dimensionality Reduction by Local Linear Embedding. Science 2000, 290: 2323–2326. 10.1126/science.290.5500.2323View ArticlePubMedGoogle Scholar
- Bauer E, Kohavi R: An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning 1999, 36(1–2):105–139. 10.1023/A:1007515423169View ArticleGoogle Scholar
- Somorjai RL, Dolenko B, Demko A, Mandelzweig M, Nikulin AE, Baumgartner R, Pizzi NJ: Mapping high dimensional data onto a relative distance plane-an exact method for visualizing and characterizing high-dimensional patterns. Journal of Biomedical Informatics 2004, 37: 366–379. 10.1016/j.jbi.2004.07.005View ArticlePubMedGoogle Scholar
- University of Massachusetts at Amherst CATSCAN images; [http://vis-www.cs.umass.edu/files.html]
- Karaman MW, Houck ML, Chemnick LG, Nagpal S, Chawannakul D, Sudano D, Pike BL, Ho VV, Ryder OA, Hacia JG: Comparative Analysis of Gene-Expression Patterns in Human and African Great Ape Cultured Fibroblasts. Genome Research 2003, 13: 1619–1630. 10.1101/gr.1289803PubMed CentralView ArticlePubMedGoogle Scholar
- Hacia JG, Makalowski W, Edgemon K, Erdos MR, Robbins CM, Fodor SP, Brody LC, Collins FS: Evolutionary sequence comparisons using high-density oligonucleotide arrays. Nature Genetics 1998, 18: 155–158. 10.1038/ng0298-155View ArticlePubMedGoogle Scholar
- R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria; 2004.Google Scholar
- Mohar B: The Laplacian spectrum of graphs. In Graph Theory, Combinatorics, and Applications. Volume 2. Edited by: Alavi Y, Schwenk A. Wiley; 1991:871–898.Google Scholar
- Kruskal JB: Nonmetric multidimensional scaling: a numerical method. Psychometrika 1964, 29: 115–129. 10.1007/BF02289694View ArticleGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.