We implemented CFDMS by extending our previous work [17]. Our software application has two operating modes. If a graphics card allows sufficient memory for reading and processing all data points, it runs in "one-shot" mode. When available memory is not enough, it operates in "divide-and-conquer" mode and produces an approximate solution. The available memory size is automatically detected and the two operating modes are accordingly toggled on and off.

### One-shot MDS

In the one-shot mode, the classical MDS on a dissimilarity matrix **D**, of which size is *n* × *n*, proceeds as follows.

1. **D**^{(2)} = [*d*_{
ij
} ^{2}], where *d*_{
ij
} denotes the element of **D** on the *i* th row and the *j* th column, i.e., the dissimilarity between the *i* th and *j* th points.

2. **J** = **I** - *n*^{-1}**1**, where **I** is the identity matrix and **1** denotes the *n* × *n* matrix of which elements are all one.

3.\mathsf{\text{B}}=\phantom{\rule{0.3em}{0ex}}-\phantom{\rule{0.3em}{0ex}}\frac{1}{2}\mathsf{\text{J}}{\mathsf{\text{D}}}^{\left(2\right)}\mathsf{\text{J}}.

4. Calculate the first *m* eigenvectors **e**_{1}, **e**_{2}, ..., **e**_{
m
} and the corresponding eigenvalues λ_{1}, λ_{2}, ..., λ_{
m
} from **B**.

5. Calculate the *m*-dimensional coordinates of the *n* data points by {\mathbf{X}}^{\mathbf{T}}=\left[{\mathbf{e}}_{1},\phantom{\rule{0.3em}{0ex}}{\mathbf{e}}_{2},\phantom{\rule{0.3em}{0ex}}...,\phantom{\rule{0.3em}{0ex}}{\mathbf{e}}_{m}\right]{\mathbf{\Lambda}}_{m}^{\frac{1}{2}},\phantom{\rule{0.3em}{0ex}}\mathsf{\text{where}}{\Lambda}_{m}^{1/2}=\mathsf{\text{diag}}\left({\lambda}_{1}^{1/2},\phantom{\rule{0.3em}{0ex}}{\lambda}_{2}^{1/2},\phantom{\rule{0.3em}{0ex}}...,\phantom{\rule{0.3em}{0ex}}{\lambda}_{m}^{1/2}\right).

Each column of **X** corresponds to the coordinate of each data point in the reduced (*m*-dimensional) space. The above procedure has been implemented using CUBLAS [8] and CULA [9].

### Divide-and-conquer MDS

The divide-and-conquer MDS based on [6] divides a given set of objects into several subsets of manageable size. Then, another subset of manageable size is made by sampling from each of the previous subsets. The same MDS routine of the one-shot mode is applied to each of the submatrices. Finally, each result is merged into an approximate MDS solution for the entire objects. More precise steps are as follows (see Figure 1).

1. Randomly decompose an *n* × *n* dissimilarity matrix **D**_{
all
} along the diagonal into *p* submatrices, i.e., **D**_{1}, **D**_{2}, ..., **D**_{
p
}.

2. Sample *s* objects from each of the submatrices.

3. Merge the sampled objects and construct a new dissimilarity submatrix **M**_{
align
} of which size is (*sp*) × (*sp*).

4. Apply the one-shot MDS method to **D**_{1}, **D**_{2}, ..., **D**_{
p
} as well as **M**_{
align
}. Denote the resulting coordinates by **dMDS**_{1}, **dMDS**_{2}, ..., **dMDS**_{
p
} as well as **mMDS**, respectively.

5. Extract the objects sampled at step 2 from the above results, obtaining sub**dMDS**_{1}, sub**dMDS**_{2}, ..., sub**dMDS**_{
p
} as well as **mMDS**_{1}, **mMDS**_{2}, ..., **mMDS**_{
p
}.

6. For each pair sub**dMDS**_{
i
} and **mMDS**_{
i
} (*i* = 1, 2, ..., *p*), solve the following linear least squares problem, argmin_{
A
i
}||**A**_{
i
}sub**dMDS**_{
i
} - **mMDS**_{
i
}||, where || · || denotes *L*^{2} norm.

7. Linearly transform the objects of **D**_{
i
} as follows. **A**_{
i
}**dMDS**_{
i
} = *new* **dMDS**_{
i
}.

8. Combine *new* **dMDS**_{1}, *new* **dMDS**_{2}, ..., *new* **dMDS**_{
p
} into an approximate MDS solution to the entire objects.

Since the size of submatrix is determined by the available memory size of a graphics card, the number of submatrices *p* and the number of sampled objects from each submatrix *s* are determined automatically by our software application. Two ways of sampling from the submatrices (Step 2 of the algorithm above) are "Random" and "MaxMin". *Random* denotes usual random sampling without replacement. In the *MaxMin* approach, data points are chosen one at a time, and each new point maximizes, over all unused data points, the minimum distance to any of the previously-sampled points [18]. As in the one-shot mode, all the matrix operations have been implemented using CUBLAS and CULA.