Archived Comments for: Neural network analysis of lymphoma microarray data: prognosis and diagnosis near-perfect

  1. Possible bias in error estimates

    Sudhir Varma, National Cancer Institute, NIH

    11 January 2005

    The authors use 40 samples to find genes that differentiate between samples with different survival times. However, the selected genes are then tested on the same 40 samples. Although the flow schematic shows the network being trained on only 20 samples and validated on the remaining 20, this does not amount to a correct validation, because the genes were selected using all 40 samples. The resulting error estimate can be highly optimistic compared with the true error, which one would obtain only if both gene selection and classifier training were done on a training set and testing were done on a completely independent test set. Please see:

    1) Simon R, Radmacher MD, Dobbin K, McShane LM. Pitfalls in the Use of DNA Microarray Data for Diagnostic and Prognostic Classification. J Natl Cancer Inst. 2003; 95:14-18.

    2) Ambroise C, McLachlan GJ. Selection bias in gene extraction on the basis of microarray gene-expression data. PNAS. 2002 May 14; 99(10):6562-6566.

    3) Reunanen J. Overfitting in making comparisons between variable selection methods. Journal of Machine Learning Research. 2003; 3:1371-1382.
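
    The selection bias described above can be illustrated with a small simulation. The following is only a sketch under assumptions that are not taken from the article: the data are pure Gaussian noise (so the true error of any classifier is 50%), genes are ranked by the absolute difference of class means, and a nearest-centroid rule stands in for the neural network. The function names and the parameters (500 genes, 10 selected genes, 50 trials) are hypothetical choices for illustration only.

```python
import random

def class_means(rows, labels, genes):
    """Per-class mean expression for each gene index in `genes`."""
    means = {}
    for c in (0, 1):
        members = [r for r, y in zip(rows, labels) if y == c]
        means[c] = [sum(r[g] for r in members) / len(members) for g in genes]
    return means

def select_genes(rows, labels, k):
    """Rank genes by absolute difference of class means; keep the top k."""
    all_genes = range(len(rows[0]))
    m = class_means(rows, labels, all_genes)
    scores = [abs(a - b) for a, b in zip(m[0], m[1])]
    return sorted(all_genes, key=lambda g: scores[g], reverse=True)[:k]

def nearest_centroid_error(train, y_train, test, y_test, genes):
    """Fit class centroids on the training set; return the test error rate."""
    m = class_means(train, y_train, genes)
    wrong = 0
    for r, y in zip(test, y_test):
        dist = {c: sum((r[g] - mu) ** 2 for g, mu in zip(genes, m[c]))
                for c in (0, 1)}
        pred = 0 if dist[0] <= dist[1] else 1
        wrong += pred != y
    return wrong / len(test)

def simulate(n_trials=50, n_samples=40, n_genes=500, k=10, seed=0):
    """Return (mean biased error, mean unbiased error) on pure-noise data."""
    rng = random.Random(seed)
    half = n_samples // 2
    biased, unbiased = [], []
    for _ in range(n_trials):
        # Assumption: expression values are noise, unrelated to the labels.
        rows = [[rng.gauss(0, 1) for _ in range(n_genes)]
                for _ in range(n_samples)]
        labels = [i % 2 for i in range(n_samples)]  # balanced in both halves
        train, test = rows[:half], rows[half:]
        y_train, y_test = labels[:half], labels[half:]

        # Biased protocol: genes chosen using all samples, test set included.
        genes_all = select_genes(rows, labels, k)
        biased.append(
            nearest_centroid_error(train, y_train, test, y_test, genes_all))

        # Unbiased protocol: genes chosen using the training samples only.
        genes_train = select_genes(train, y_train, k)
        unbiased.append(
            nearest_centroid_error(train, y_train, test, y_test, genes_train))
    return sum(biased) / n_trials, sum(unbiased) / n_trials

if __name__ == "__main__":
    b, u = simulate()
    print(f"genes selected on all samples:   mean test error {b:.2f}")
    print(f"genes selected on training only: mean test error {u:.2f}")
```

    Because the simulated labels carry no information, any estimated error well below 50% in the first protocol can only come from the test samples having taken part in gene selection. The second protocol keeps them out of every step and should stay close to chance.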



    Sudhir Varma, Ph.D.

    Biometric Research Branch,

    National Cancer Institute, NIH.

    6130 Executive Blvd. EPN/8142

    Rockville, MD-20852, USA


    Competing interests

    None declared