The prediction error rate decreases as the training sample size increases. A. DLDA, B. RF, C. SVM. First, each sample was labeled as good (disease-free or overall survival of more than five years) or poor (recurrence or death within five years). Then, m training samples and 100 testing samples were randomly selected from the pooled data set, a prognostic gene set was constructed from the m training samples, and its prediction error rate was calculated by applying the gene set to the 100 testing samples. The training sample size m was varied from 100 to 500 in increments of 100, and the entire process was repeated 100 times. Three machine learning algorithms were used: diagonal linear discriminant analysis (DLDA), random forest (RF), and support vector machine (SVM). Boxplots show the distribution of error rates across the 100 random sampling runs.
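
The resampling procedure described in this caption can be sketched roughly as follows, for the DLDA case only. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic data generator, the t-like gene score used to pick the gene set, the parameter `n_top`, and all function names are hypothetical, and the training and testing samples are assumed to be disjoint.

```python
import random
import statistics


def make_synthetic_data(n_samples=700, n_genes=30, n_informative=5, seed=0):
    """Hypothetical stand-in for the pooled data set (1 = poor, 0 = good)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 1.0 * label  # shift the informative genes in the poor class
        X.append(row)
        y.append(label)
    return X, y


def fit_dlda(X, y, n_top):
    """Pick n_top genes by a t-like score (an assumption), then keep the
    per-class means and pooled variance of each selected gene.
    Assumes both classes occur in the training samples."""
    scored = []
    for g in range(len(X[0])):
        v0 = [row[g] for row, lab in zip(X, y) if lab == 0]
        v1 = [row[g] for row, lab in zip(X, y) if lab == 1]
        m0, m1 = statistics.fmean(v0), statistics.fmean(v1)
        ss = sum((v - m0) ** 2 for v in v0) + sum((v - m1) ** 2 for v in v1)
        var = ss / (len(v0) + len(v1) - 2)
        scored.append((abs(m1 - m0) / var ** 0.5, g, m0, m1, var))
    scored.sort(reverse=True)
    return [(g, m0, m1, var) for _, g, m0, m1, var in scored[:n_top]]


def predict_dlda(model, row):
    """Assign the class whose mean is closer in variance-scaled distance."""
    d0 = sum((row[g] - m0) ** 2 / var for g, m0, _, var in model)
    d1 = sum((row[g] - m1) ** 2 / var for g, _, m1, var in model)
    return 1 if d1 < d0 else 0


def learning_curve(X, y, train_sizes=(100, 200, 300, 400, 500),
                   n_test=100, n_repeats=100, n_top=10, seed=1):
    """Return {m: [error rate of each random train/test draw]}."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    errors = {m: [] for m in train_sizes}
    for m in train_sizes:
        for _ in range(n_repeats):
            # Draw m + n_test distinct samples, then split them.
            chosen = rng.sample(indices, m + n_test)
            train, test = chosen[:m], chosen[m:]
            model = fit_dlda([X[i] for i in train], [y[i] for i in train], n_top)
            wrong = sum(predict_dlda(model, X[i]) != y[i] for i in test)
            errors[m].append(wrong / n_test)
    return errors


if __name__ == "__main__":
    X, y = make_synthetic_data()
    # Fewer repeats than the 100 in the caption, to keep the demo quick.
    for m, errs in learning_curve(X, y, n_repeats=5).items():
        print(m, round(statistics.median(errs), 3))
```

Each list in the returned dictionary corresponds to one box in a panel of the figure. Swapping `fit_dlda` and `predict_dlda` for another classifier would give the analogous curves for RF or SVM.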