Neural network analysis of lymphoma microarray data: prognosis and diagnosis near-perfect

O'Neill, Michael C; Song, Li

doi:10.1186/1471-2105-4-13

Possible bias in error estimates

Sudhir Varma, National Cancer Institute, NIH

11 January 2005

The authors use 40 samples to find genes that differentiate between samples with different survival times. However, the selected genes are then tested on the same 40 samples. Training the network on only 20 samples and validating it on the remaining 20, as shown at the end of the flow schematic, does not amount to a correct validation, because the genes were selected using all 40 samples. The resulting error estimate can be very optimistic compared with the true error, which one would obtain if both gene selection and classifier training were done on a training set and testing were done on a completely independent test set. Please see:

1) Simon R, Radmacher MD, Dobbin K, McShane LM. Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst. 2003; 95:14-18.

2) Ambroise C, McLachlan GJ. Selection bias in gene extraction on the basis of microarray gene-expression data. PNAS. 2002 May 14; 99(10):6562-6566.

3) Reunanen J. Overfitting in making comparisons between variable selection methods. Journal of Machine Learning Research. 2003; 3:1371-1382.
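
The selection bias described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' pipeline: the data are pure Gaussian noise (so the true accuracy of any classifier is 50%), genes are ranked by absolute difference in class means, and a nearest-centroid classifier stands in for the neural network. The names `select_genes`, `accuracy` and `simulate`, and the choices of 500 genes and 10 selected genes, are hypothetical and chosen only for illustration.

```python
import random

def select_genes(data, labels, samples, k):
    """Return the k genes with the largest absolute difference in class means,
    computed only over the given sample indices."""
    s0 = [s for s in samples if labels[s] == 0]
    s1 = [s for s in samples if labels[s] == 1]
    scores = []
    for g in range(len(data[0])):
        m0 = sum(data[s][g] for s in s0) / len(s0)
        m1 = sum(data[s][g] for s in s1) / len(s1)
        scores.append((abs(m0 - m1), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]

def accuracy(data, labels, train, test, genes):
    """Nearest-centroid classifier (a stand-in for the neural network):
    centroids come from the training samples, accuracy from the test samples."""
    centroids = {}
    for cls in (0, 1):
        members = [s for s in train if labels[s] == cls]
        centroids[cls] = [sum(data[s][g] for s in members) / len(members)
                          for g in genes]
    correct = 0
    for s in test:
        dist = {cls: sum((data[s][g] - centroids[cls][i]) ** 2
                         for i, g in enumerate(genes))
                for cls in (0, 1)}
        pred = 0 if dist[0] <= dist[1] else 1
        correct += (pred == labels[s])
    return correct / len(test)

def simulate(n_reps=10, n_genes=500, k=10, seed=0):
    """Compare test accuracy when genes are selected on all 40 samples
    versus on the 20 training samples only. Labels carry no real signal."""
    rng = random.Random(seed)
    labels = [0] * 20 + [1] * 20
    biased, independent = [], []
    for _ in range(n_reps):
        data = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(40)]
        # Stratified split: 10 samples of each class for training, 10 for testing.
        c0, c1 = list(range(20)), list(range(20, 40))
        rng.shuffle(c0)
        rng.shuffle(c1)
        train = c0[:10] + c1[:10]
        test = c0[10:] + c1[10:]
        genes_all = select_genes(data, labels, train + test, k)  # sees test samples
        genes_train = select_genes(data, labels, train, k)       # training samples only
        biased.append(accuracy(data, labels, train, test, genes_all))
        independent.append(accuracy(data, labels, train, test, genes_train))
    return sum(biased) / n_reps, sum(independent) / n_reps

biased_acc, independent_acc = simulate()
print("genes selected on all 40 samples:  ", biased_acc)
print("genes selected on training set only:", independent_acc)
```

Because the labels are unrelated to the data, any accuracy clearly above 50% in the first line reflects only the leakage of test samples into gene selection. The second line, where selection is repeated inside the training set, should stay close to chance.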

Sudhir

___________________

Sudhir Varma, Ph.D.

Biometric Research Branch,

National Cancer Institute, NIH.

6130 Executive Blvd. EPN/8142

Rockville, MD-20852, USA

(301)443-1723

varmas@mail.nih.gov

Competing interests

None declared

BMC Bioinformatics
