CFMDS: CUDA-based fast multidimensional scaling for genome-scale data
© Park et al.; licensee BioMed Central Ltd. 2012
Published: 13 December 2012
Multidimensional scaling (MDS) is a widely used approach to dimensionality reduction. It has been applied to feature selection and visualization in various areas. Among diverse MDS methods, the classical MDS is a simple and theoretically sound solution for projecting data objects onto a low dimensional space while preserving the original distances among them as much as possible. However, it is not trivial to apply it to genome-scale data (e.g., microarray gene expression profiles) on regular desktop computers, because of its high computational complexity.
We implemented a highly-efficient software application, called CFMDS (CUDA-based Fast MultiDimensional Scaling), which produces an approximate solution of the classical MDS based on CUDA (compute unified device architecture) and the divide-and-conquer principle. CUDA is a parallel computing architecture exploiting the power of the GPU (graphics processing unit). The principle of divide-and-conquer was adopted for circumventing the small memory problem of usual graphics cards. Our application software has been tested on various benchmark datasets including microarrays and compared with the classical MDS algorithms implemented using C# and MATLAB. In our experiments, CFMDS was more than a hundred times faster for large data than such general solutions. Regarding the quality of dimensionality reduction, our approximate solutions were as good as those from the general solutions, as the Pearson's correlation coefficients between them were larger than 0.9.
CFMDS is an expeditious solution for the data dimensionality reduction problem. It is especially useful for efficient processing of genome-scale data consisting of several thousands of objects in several minutes.
Multidimensional scaling (MDS) is a technique for representing objects (or data points) in a low-dimensional space based on their similarity. Main purposes of MDS include exploratory data analysis by visualization and feature selection for subsequent analysis such as classification. In bioinformatics and related areas, MDS has been applied to diverse problems such as gene expression pattern visualization [1, 2], drug responses profiling , and p53 transactivation prediction .
Among various MDS methods, the classical MDS is based on the idea of finding coordinates appropriate for describing dissimilarities as distances . The classical MDS finds coordinates by a set of matrix operations. Roughly speaking, it decomposes the squared distance matrix by solving the eigenpair problem, of which complexity is proportional to the cube of the number of data points . This heavy computational burden is a bottleneck for quick processing of large-scale datasets having thousands of objects. Meanwhile, massive parallel processing based on graphics processing units (GPUs) for general computing applications, a.k.a. GPGPU (general purpose computation on graphics processing units) has risen as a reasonable option for expediting computationally-intensive jobs on normal desktop computers equipped with a graphics card . CUDA (compute unified device architecture) is one of the most pervasively-used frameworks for GPGPU developed by NVIDIA, Inc. . In the CUDA environment, linear algebra packages such as CUBLAS  and CULA  are provided. In bioinformatics, CUDA has been deployed for diverse applications such as sequence alignment [10–12], protein substructure search , RNA microarray analysis , and a non-classical MDS .
One problem with CUDA is the relatively small memory size of most graphics cards (usually less than 1 gigabyte). General graphics cards do not have sufficient memory for storing and processing large-scale datasets containing tens of thousands data points. For circumventing this problem, we exploit a famous engineering principle, i.e., divide-and-conquer. Divide-and-conquer approach to the classical multidimensional scaling has drawn much attention for reducing its computational complexity and has been applied in serial computing environments [6, 16].
We implemented CFDMS by extending our previous work . Our software application has two operating modes. If a graphics card allows sufficient memory for reading and processing all data points, it runs in "one-shot" mode. When available memory is not enough, it operates in "divide-and-conquer" mode and produces an approximate solution. The available memory size is automatically detected and the two operating modes are accordingly toggled on and off.
In the one-shot mode, the classical MDS on a dissimilarity matrix D, of which size is n × n, proceeds as follows.
1. D(2) = [d ij 2], where d ij denotes the element of D on the i th row and the j th column, i.e., the dissimilarity between the i th and j th points.
2. J = I - n-11, where I is the identity matrix and 1 denotes the n × n matrix of which elements are all one.
4. Calculate the first m eigenvectors e1, e2, ..., e m and the corresponding eigenvalues λ1, λ2, ..., λ m from B.
5. Calculate the m-dimensional coordinates of the n data points by
1. Randomly decompose an n × n dissimilarity matrix D all along the diagonal into p submatrices, i.e., D1, D2, ..., D p .
2. Sample s objects from each of the submatrices.
3. Merge the sampled objects and construct a new dissimilarity submatrix M align of which size is (sp) × (sp).
4. Apply the one-shot MDS method to D1, D2, ..., D p as well as M align . Denote the resulting coordinates by dMDS1, dMDS2, ..., dMDS p as well as mMDS, respectively.
5. Extract the objects sampled at step 2 from the above results, obtaining subdMDS1, subdMDS2, ..., subdMDS p as well as mMDS1, mMDS2, ..., mMDS p .
6. For each pair subdMDS i and mMDS i (i = 1, 2, ..., p), solve the following linear least squares problem, argmin A i ||A i subdMDS i - mMDS i ||, where || · || denotes L2 norm.
7. Linearly transform the objects of D i as follows. A i dMDS i = new dMDS i .
8. Combine new dMDS1, new dMDS2, ..., new dMDS p into an approximate MDS solution to the entire objects.
Since the size of submatrix is determined by the available memory size of a graphics card, the number of submatrices p and the number of sampled objects from each submatrix s are determined automatically by our software application. Two ways of sampling from the submatrices (Step 2 of the algorithm above) are "Random" and "MaxMin". Random denotes usual random sampling without replacement. In the MaxMin approach, data points are chosen one at a time, and each new point maximizes, over all unused data points, the minimum distance to any of the previously-sampled points . As in the one-shot mode, all the matrix operations have been implemented using CUBLAS and CULA.
Number of Attributes
Number of Instances
Pearson's Median Skewness Coefficient
Coefficient of Variation
UCI ML Repository
UCI ML Repository
M. musculus Microarray
S. cerevisiae Microarray
Size of Dissimilarity Matrix
No. of Submatrices
No. of Samples
in Each Submatrix (s)
150 × 150
366 × 366
M. musculus Microarray
2,000 × 2,000
S. cerevisiae Microarray
9,300 × 9,300
10,000 × 10,000
Execution time of CFMDS
Accuracy of CFMDS
However, CFMDS with Random and MaxMin sampling modes showed different results for M. musculus Microarray and MNIST datasets. For M. musculus Microarray dataset, Random sampling mode showed the worst result among all benchmark datasets with the largest standard deviation, although MaxMin sampling method produced almost identical results compared to the result from the classical MDS (Pearson's correlation coefficient: about 0.97). On the contrary, MaxMin mode showed a relatively low performance with high variance for MNIST dataset. For the same dataset, Random sampling mode achieved relatively accurate results (Pearson's correlation coefficient: about 0.93). The difference in performance of Random and MaxMin sampling methods of CFMDS could be due to the skewness or dispersion of data. The MaxMin sampling mode is suitable for datasets with high skewness or dispersion, because it could sample data points which are far apart from each other . We checked the skewness and dispersion of our experimental datasets using Pearson's median skewness coefficient and coefficient of variation of distances between data points. The Pearson's median skewness coefficient (PMSC) is defined as 3(mean - median)/standard deviation and measures asymmetry of a distribution. Coefficient of variation (CV) is defined as standard deviation /mean and is a normalized measure of dispersion. Among the five datasets, M. musculus Microarray showed the highest skewness and dispersion (PMSC = 0.94, CV = 1.08). For this dataset, MaxMin sampling mode of CFMDS generated relatively accurate results. MNIST dataset showed the lowest skewness and dispersion (PMSC = -0.13, CV = 0.14). For this dataset, Random sampling mode showed relatively accurate results. As a conclusion, we suggest the use of MaxMin sampling for highly skewed or dispersed data and Random sampling for symmetric and lowly dispersed data.
We implemented a software application, CFMDS (CUDA-based Fast MultiDimensional Scaling) for efficient dimensionality reduction of large-scale genomic data. CFMDS adopted CUDA programming library and divide-and-conquer strategy to handle several thousands of features in less than several minutes on a commodity PC equipped with a graphics card. CUDA was applied as a parallel computing method and divide-and-conquer principle was used to circumvent the small memory size problem of usual graphics cards. By combining these two techniques, CFMDS enables that a regular PC with a CUDA-support graphics card handles the large-scale genomic data dimensionality reduction problem which can be efficiently executed only on high performance computers. The simulation results confirmed that our approach can perform MDS more than a hundred times faster with a comparable accuracy for genome-scale data. Therefore, CFMDS is especially useful to visualize and analyze data consisting of several thousands of objects in less than several minutes. We implemented two sampling options for the divide-and-conquer mode of CFMDS such as Random and MaxMin samplings. As shown in Results section, CFMDS with Random sampling approach usually works quite well in practice. MaxMin sampling method is especially useful in some contexts where data distribution is highly skewed or dispersed. Further work includes optimizing our application with respect to data transfer between graphics cards and host computers.
Availability and requirements
Project name: CFMDS
Project home page: http://ml.ssu.ac.kr/CFMDS/CFMDS.html
Operating system(s): Windows XP or higher (32-bit and 64-bit), Linux (tested on Ubuntu Linux 9.04, Red Hat Enterprise Linux 5.3/4.7, Fedora 11)
Programming language: CUDA
Other requirements: NVIDIA's GPU with CUDA, CUDA toolkit 2.3 (not support CUDA 3.0 toolkit yet), The latest version of CULA basic libraries
License: GNU GPL v2
Any restrictions to use by non-academics: none
K.-B.H. was supported by the Soongsil University Research Fund and by the Proteogenomic Research Program and Basic Science Research Program (2012R1A1A2039822) through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology. S.-Y.S. was supported by Basic Science Research Program (2012R1A1A2002804) through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology.
This article has been published as part of BMC Bioinformatics Volume 13 Supplement 17, 2012: Eleventh International Conference on Bioinformatics (InCoB2012): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S17.
- Khaitovich P, Muetzel B, She X, Lachmann M, Hellmann I, Dietzsch J, Steigele S, Do HH, Weiss G, Enard W, Heissig F, Arendt T, Nieselt-Struwe K, Eichler EE, Pāābo S: Regional patterns of gene expression in human and chimpanzee brains. Genome Res. 2004, 14: 1462-1473. 10.1101/gr.2538704.PubMed CentralView ArticlePubMedGoogle Scholar
- Taguchi YH, Oono Y: Relational patterns of gene expression via non-metric multidimensional scaling analysis. Bioinformatics. 2005, 21 (6): 730-740. 10.1093/bioinformatics/bti067.View ArticlePubMedGoogle Scholar
- Loo LH, Wu LF, Altschuler SJ: Image-based multivariate profiling of drug responses from single cells. Nat Methods. 2007, 4 (5): 445-453.PubMedGoogle Scholar
- Gowrisankar S, Jegga AG: Regression based predictor for p53 transactivation. BMC Bioinformatics. 2009, 10: 215-10.1186/1471-2105-10-215.PubMed CentralView ArticlePubMedGoogle Scholar
- Borg I, Groenen PJF: Modern Multidimensional Scaling: Theory and Applications. 2005, New York, Springer, 2Google Scholar
- Yang T, Lui J, McMillan L, Wang W: A fast approximation to multidimensional scaling. Proceedings of the ECCV 2006 Workshop on Computational Intensive Methods for Computer Vision. 2006Google Scholar
- Harris M: Mapping computational concepts to GPUs. Proceedings of SIGGRAPH '05 ACM SIGGRAPH 2005 Courses. 2005Google Scholar
- NVIDIA CUDA Zone. [http://www.nvidia.com/object/cuda%20home%20new.html]
- CULA tools, EM Photonics. [http://www.culatools.com]
- Manavski SA, Valle G: CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment. BMC Bioinformatics. 2008, 9 (Suppl 2): S10-10.1186/1471-2105-9-S2-S10.PubMed CentralView ArticlePubMedGoogle Scholar
- Wirawan A, Kwoh CK, Hieu NT, Schmidt B: CBESW: Sequence alignment on the Playstation 3. BMC Bioinformatics. 2008, 9: 377-10.1186/1471-2105-9-377.PubMed CentralView ArticlePubMedGoogle Scholar
- Lui Y, Maskell DL, Schmidt B: CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units. BMC Research Notes. 2009, 2: 73-10.1186/1756-0500-2-73.View ArticleGoogle Scholar
- Stivala AD, Stuckey PJS, Wirth AI: Fast and accurate protein substructure searching with simulated annealing and GPUs. BMC Bioinformatics. 2010, 11: 446-10.1186/1471-2105-11-446.PubMed CentralView ArticlePubMedGoogle Scholar
- Shterev ID, Jung SH, George SL, Owzar K: permGPU: Using graphics processing units in RNA microarray association studies. BMC Bioinformatics. 2010, 11: 329-10.1186/1471-2105-11-329.PubMed CentralView ArticlePubMedGoogle Scholar
- Fester T, Schreiber F, Strickert M: CUDA-based multi-core implementation of MDS-based bioinformatics algorithms. Proceedings of German Conference on Bioinformatics (GCB 2009). 67-79.Google Scholar
- Tzeng J, Lu HHS, Li WH: Multidimensional scaling for large genomic datasets. BMC Bioinformatics. 2008, 9: 179-10.1186/1471-2105-9-179.PubMed CentralView ArticlePubMedGoogle Scholar
- Park S, Hwang KB: An efficient multidimensional scaling method based on CUDA and divide-and-conquer. Journal of the Korean Institute of Information Scientists and Engineers: Computing Practices and Letters. 2010, 16 (4): 427-431.Google Scholar
- De Silva V, Tenenbaum JB: Sparse multidimensional scaling using landmark points. 2004, Technical Report, Stanford UniversityGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.