Fast automated cell phenotype image classification
© Hamilton et al. 2007
Received: 20 December 2006
Accepted: 30 March 2007
Published: 30 March 2007
The genomic revolution has led to rapid growth in sequencing of genes and proteins, and attention is now turning to the function of the encoded proteins. In this respect, microscope imaging of a protein's sub-cellular localisation is proving invaluable, and recent advances in automated fluorescent microscopy allow protein localisations to be imaged in high throughput. Hence there is a need for large scale automated computational techniques to efficiently quantify, distinguish and classify sub-cellular images. While image statistics have proved highly successful in distinguishing localisation, commonly used measures suffer from being relatively slow to compute, and often require cells to be individually selected from experimental images, thus limiting both throughput and the range of potential applications. Here we introduce threshold adjacency statistics, the essence of which is to threshold the image and to count the number of above-threshold pixels that have a given number of adjacent above-threshold pixels. These novel measures are shown to distinguish and classify images of distinct sub-cellular localization with high speed and accuracy without image cropping.
Threshold adjacency statistics are applied to classification of protein sub-cellular localization images. They are tested on two image sets (available for download), one for which fluorescently tagged proteins are endogenously expressed in 10 sub-cellular locations, and another for which proteins are transfected into 11 locations. For each image set, a support vector machine was trained and tested. Classification accuracies of 94.4% and 86.6% are obtained on the endogenous and transfected sets, respectively. Threshold adjacency statistics are found to provide comparable or higher accuracy than other commonly used statistics while being an order of magnitude faster to calculate. Further, threshold adjacency statistics in combination with Haralick measures give accuracies of 98.2% and 93.2% on the endogenous and transfected sets, respectively.
Threshold adjacency statistics have the potential to greatly extend the scale and range of applications of image statistics in computational image analysis. They remove the need for cropping of individual cells from images, and are an order of magnitude faster to calculate than other commonly used statistics while providing comparable or better classification accuracy, both essential requirements for application to large-scale approaches.
Obtaining the sequence of numerous genomes and subsequent identification of the encoded proteome has created the need for large-scale systematic approaches to understand the functions of the tens of thousands of proteins at the cellular level [1, 2]. High-throughput automated fluorescent microscope imaging technologies enable the experimental determination of a protein's sub-cellular localization and its dynamic trafficking within a range of cellular contexts. These approaches generate vast numbers of images including multiple fluorophores for cells under a variety of experimental conditions [3, 4]. Furthermore, cells may now be imaged in 3D, or indeed 4D with 3D stacks captured over time to observe protein trafficking in live cells. The desire and the ability to carry out high-throughput screenings of protein localization and trafficking for applications such as drug discovery is leading to a rapid growth in cell images in need of analysis on a scale comparable to that of the genomic revolution. It has been estimated that to take a single image for each combination of protein, cell type and timescale would require of the order of 100 billion images. Currently, image databases such as the Yeast GFP Fusion Localization Database, the LOCATE mouse protein sub-cellular localization database and the LIFEdb database for the integration and dissemination of functional data offer the possibility to present, integrate and search the vast amounts of data being created by high throughput cell imaging. However, to a large degree the analysis and comparison of localizations are still performed by the slow, coarse-grained and possibly biased process of manual inspection. To deal with the scale of the data becoming available, automated annotation, analysis, comparison, classification and storage of cellular images are essential.
Image statistics have proven to be of great utility in the automated analysis of cellular images. Haralick texture measures define a variety of statistics based on the spatial dependence of individual pixel intensities across an image. Zernike moments [10, 11] calculate the decomposition of an image onto an orthogonal set of polynomials in much the same way that Fourier coefficients may be used to decompose a time series. These and other measures may be used to generate a vector of numbers for a given cell image, and have a wide range of applications. For a given set of images, a representative image may be chosen by selecting the image with vector closest to the mean vector of the entire image set [13, 14]. Images may be clustered or ranked by distance from a given image to find similar images. And given two sets of sub-cellular localization images under differing experimental conditions, image statistics can be used to assess whether there is a statistically significant difference, even to the extent that visually indistinguishable images of distinct localizations may be differentiated. One important application is in automated sub-cellular localization classification. Here, a machine learning technique such as a neural network or support vector machine (SVM) is trained on image vectors of known localization, and subsequently used to predict those of unknown localization. With accuracies of well over 90% [19–21] such predictors have proved very successful, and have exceeded human classification accuracy [13, 19]. Once statistical techniques such as the above are fully integrated into cell image databases, a much greater degree of refinement, content searching, unbiased clustering and hypothesis testing will be enabled.
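The representative-image selection described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `representative_index` is hypothetical, and plain Euclidean distance on unscaled feature vectors is an assumption (in practice, features would probably be normalised first).

```python
def representative_index(vectors):
    """Return the index of the feature vector closest to the mean vector.

    `vectors` is a non-empty list of equal-length lists of numbers, one per
    image. Distance is unscaled squared Euclidean distance (an assumption).
    """
    n = len(vectors)
    dim = len(vectors[0])
    # Component-wise mean of all feature vectors.
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]

    def squared_distance(v):
        return sum((a - b) ** 2 for a, b in zip(v, mean))

    # The image whose vector lies nearest the mean is taken as representative.
    return min(range(n), key=lambda i: squared_distance(vectors[i]))
```

Ranking images by similarity to a query image follows the same pattern, with the query vector in place of the mean.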
While image statistics have performed well in sub-cellular localization classification, they often suffer from high computational cost and require individual cells to be cropped from an image, hence limiting the extent to which they may be applied. Here, we introduce threshold adjacency statistics (TAS), a simple and fast morphological measure for distinguishing sub-cellular localization.
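The core counting step, as summarised in the abstract (threshold the image, then count above-threshold pixels by their number of above-threshold neighbours), might look like the following Python sketch. The function name, the single fixed threshold, the use of 8-connectivity, and the normalisation by the number of above-threshold pixels are all assumptions made for illustration; the exact threshold ranges and the full 27-statistic vector of the published method are not reproduced here.

```python
def threshold_adjacency_stats(image, threshold):
    """Return 9 fractions: the share of above-threshold pixels that have
    0, 1, ..., 8 above-threshold neighbours (8-connectivity assumed).

    `image` is a list of rows of pixel intensities.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    # Binary mask: True where the pixel is at or above the threshold.
    mask = [[pixel >= threshold for pixel in row] for row in image]
    counts = [0] * 9
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            neighbours = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    # Pixels outside the image count as below threshold.
                    if 0 <= rr < rows and 0 <= cc < cols and mask[rr][cc]:
                        neighbours += 1
            counts[neighbours] += 1
    total = sum(counts)
    # Normalising makes the vector independent of how many pixels are selected.
    return [n / total for n in counts] if total else [0.0] * 9
```

Because the counts are accumulated over the whole image, no per-cell cropping is needed in this sketch: every above-threshold pixel contributes regardless of which cell it belongs to.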
Table 1. Average classification accuracies using TAS statistics on endogenous and transfected test sets, and the sub-cellular markers used. Marker entries: myc-nbp-45 (txreg I920050F21); myc-Lysophosphatidic acid receptor; myc-chloride channel 7 (5330412O18).
Both the Haralick texture measures and the magnitudes of the Zernike moments have previously been shown to be useful in distinguishing sub-cellular localization [12, 13]. To compare performance with that of threshold adjacency statistics, a set of 20 Haralick measures and 49 Zernike measures were selected. The Haralick measures were chosen from a list of those shown to be good for distinguishing sub-cellular localization in Conrad et al., and have previously been described and tested in the Automated Sub-Cellular Phenotype Classification (ASPiC) system. The Zernike measures chosen were the magnitudes associated with the Zernike polynomials up to order 12, and have also previously been applied to sub-cellular localization.
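As a quick consistency check on the figure of 49 Zernike measures: counting the Zernike index pairs (n, m) with n up to 12, 0 ≤ m ≤ n and n − m even gives exactly 49. The sketch below assumes this standard indexing and assumes that only the magnitudes for non-negative m are used; the text itself does not spell out the indexing, and the function name is hypothetical.

```python
def zernike_indices(max_order):
    """List the (n, m) index pairs of Zernike polynomials with
    0 <= m <= n <= max_order and n - m even (standard indexing, assumed)."""
    return [
        (n, m)
        for n in range(max_order + 1)
        for m in range(n + 1)
        if (n - m) % 2 == 0
    ]

print(len(zernike_indices(12)))
```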
Comparison of TAS, Haralick and Zernike statistics classification accuracies by 5-fold cross validation
To test if the information contained within the threshold adjacency statistics and the Haralick texture measures was complementary, SVMs were trained that combined both types of statistics for each cropped image. In this case, 5-fold cross validated accuracies of 98.2% and 93.2% were obtained on the endogenous and transfected sets, respectively, showing a significant improvement over either individual class of statistic.
The high predictive accuracy when applying Haralick and Zernike statistics comes at the expense of relatively high computational complexity. To compare the computational cost of Haralick, Zernike and threshold adjacency statistics, the time taken to calculate each was recorded for the endogenous data set of 503 images. Since the Haralick measures are usually applied to single cell images and Zernike measures require them, the time to crop the 503 images to create 1420 single cell images was first benchmarked and found to be 4 minutes 16 seconds. Generation of 20 Haralick measures then took 11 minutes 50 seconds, and 49 Zernike moments took 17 minutes 22 seconds. This compares to 62 seconds to generate threshold adjacency statistics directly, with no cropping, for the 503 images. More detailed timing of just the function call that calculates the 27 threshold statistics, once the image was loaded into memory, showed an average time of 20 ms per image. Threshold adjacency statistics are hence an order of magnitude faster to calculate than either the Haralick or Zernike measures.
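The order-of-magnitude claim can be checked directly from the timings quoted above. The short calculation below counts the cropping time towards the Haralick and Zernike totals, which is an assumption about how the comparison is meant to be read; the helper name `seconds` is hypothetical.

```python
def seconds(minutes, secs):
    """Convert a minutes-and-seconds timing to seconds."""
    return 60 * minutes + secs

crop = seconds(4, 16)        # cropping 503 images into 1420 single-cell images
haralick = seconds(11, 50)   # 20 Haralick measures on the cropped images
zernike = seconds(17, 22)    # 49 Zernike moments on the cropped images
tas = 62                     # threshold adjacency statistics, no cropping

# Speed-up of TAS relative to each pipeline, cropping included (assumption).
haralick_ratio = (crop + haralick) / tas
zernike_ratio = (crop + zernike) / tas
print(round(haralick_ratio, 1), round(zernike_ratio, 1))
```

With cropping included, the ratios come to roughly 15.6 and 20.9. If cropping is left out, they are still about 11 and 17, so the conclusion does not depend on that choice.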
In general, segmentation of cell images into cellular and non-cellular regions is a difficult problem. When calculating the average intensity μ of pixels whose intensity is at least 30, the lower bound 30 was chosen as intensities below this value are in general background for the endogenous and transfected image sets. The sensitivity of the predictive accuracy using TAS for different choices of lower bound was tested as follows. As described above, a lower bound of 30 gave a 5-fold cross validation accuracy of 95.2% on the endogenous set. Further tests with a lower bound of 40 gave an accuracy of 94.2%, and a lower bound of 20 gave an accuracy of 96.6%. Tests with the transfected image set yielded similarly small variation (data not shown). Hence, while there is some variation in accuracy, threshold adjacency statistics appear relatively insensitive to the choice of threshold.
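The background-excluding average μ used above can be sketched as follows. The function name and the flat list of pixel intensities are assumptions for illustration; the default of 30 is the lower bound quoted in the text.

```python
def mean_foreground_intensity(pixels, lower_bound=30):
    """Average intensity of pixels at or above `lower_bound`.

    Returns None when no pixel reaches the lower bound, i.e. the image is
    treated as pure background.
    """
    foreground = [p for p in pixels if p >= lower_bound]
    if not foreground:
        return None
    return sum(foreground) / len(foreground)
```

Re-running the classification with `lower_bound` set to 20 or 40 corresponds to the sensitivity test described in the paragraph.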
A commonly used auto-thresholding scheme is to find a threshold intensity t, such that t is (approximately) the average of the average intensity of those pixels with intensity less than or equal to t, and the average intensity of those pixels with intensity greater than t (see ImageJ FAQ). Using such a scheme on the endogenous image set with threshold adjacency statistics gave a 5-fold classification accuracy of 91.6%. Visually examining the selections showed auto-thresholding had had variable success in highlighting the cellular regions of the images. One particular problem was that there was a general trend to under-select cells, that is, to miss regions. To compensate for this, a lower bound of the auto threshold value minus 15 was tested and gave an accuracy of 93.2%. Hence auto-thresholding, while performing reasonably well, is computationally more expensive and is not as effective as choosing a fixed threshold when applying threshold adjacency statistics. It is possible that another variable thresholding scheme or segmentation algorithm might give better cell region selection results and hence better predictive accuracy with threshold adjacency statistics, but the computational complexity of such a scheme is likely to be high.
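The auto-thresholding scheme described above is commonly implemented as a fixed-point iteration. The sketch below is one such implementation, not the code used in the study: the function name, the starting value (the global mean), the convergence tolerance and the iteration cap are all assumptions.

```python
def intermeans_threshold(pixels, tolerance=0.5, max_iterations=100):
    """Iteratively find t such that t is approximately the midpoint of the
    mean intensity of pixels <= t and the mean intensity of pixels > t."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("at least one pixel is required")
    t = sum(pixels) / len(pixels)  # start from the global mean (assumption)
    for _ in range(max_iterations):
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:
            # All pixels fall on one side, so no further refinement is possible.
            return t
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < tolerance:
            return new_t
        t = new_t
    return t
```

The adjusted variant mentioned in the text would then use `intermeans_threshold(pixels) - 15` as the lower bound. Each image needs several passes over its pixels, which is consistent with the observation that auto-thresholding costs more than a fixed lower bound.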
Another point to consider is that cell populations may not be homogeneous due to variations in cell cycle. In preparing the image sets, DAPI images were examined to exclude those cells that were not in interphase. For non-homogeneous populations, an interesting and useful addition could be to apply an automated cell phase predictor, such as that described by Pham et al., prior to classification. Cells that are not in interphase could then either be excluded or treated separately.
Threshold adjacency statistics have been shown to be well suited to sub-cellular localization classification, and offer a number of advantages over other image statistics. With a classification accuracy of up to 95% they offer comparable or better accuracy than the Haralick texture measures, while being an order of magnitude faster to calculate. While comparison with previous literature is problematic in that each group has distinct image sets with different sub-cellular classes and varying degrees of automation, threshold adjacency statistics appear at least on par with a reported 92% accuracy previously obtained. Automated region selection and cropping of cells for classification can be exceptionally difficult and computationally expensive, especially when cells are highly confluent. Threshold adjacency statistics require no cropping and are additive, hence giving better statistics the more cells there are in an image. Another advantage is that every image presented is classified. Automated cropping/selection systems must deal with a wide range of images, and so it is not uncommon for them to fail to locate a cell within an image because it is relatively faint, or for some other reason. Further, for applications in which speed of calculation is not critical, the use of threshold adjacency statistics in combination with Haralick texture measures gives an accuracy of up to 98%. Finally, with 3D and 4D cell imaging becoming more widespread, new methods are required to distinguish and classify protein localization. While automated classification of 3D sub-cellular localization using image statistics has proved very successful, the addition of an extra dimension greatly increases the computational expense, and hence application of threshold adjacency statistics to 3D has the potential to significantly increase classification throughput.
An image collection was created for sub-cellular organelles, consisting of either or both of two types of sets: one set for which an endogenous protein or feature of the specific organelle was detected with a fluorescent antibody or other probe (10 organelles), and another set for which an epitope- or fluorescence-tagged protein was transiently expressed in the specific organelle and subsequently detected (11 organelles). Each image was accompanied by an additional image of the cells counterstained with the DNA specific dye 4',6-diamidino-2-phenylindole (DAPI), which highlights the location of the nucleus of every cell in the image. In addition, the DAPI image was reviewed to exclude images that contained one or more cells not in interphase. Each organelle set consists of 50 localisation images and 50 DAPI counterstained images, with the exception of the endogenous nuclear set, which contains only DAPI images. In total, 502 endogenous and 553 transfected localization images were obtained. All images were of fixed HeLa cells, taken at 60× magnification under oil immersion. The images are 8 bit greyscale, 768 by 512 pixels, each containing up to 13 cells. Cropped sample images of each organelle are given in Figures 3 and 4, and the antibodies or probes used are given in Table 1. The complete image set is available for download from the LOCATE website.
Image statistics were implemented in C++ within the ASPiC software. The time tests were conducted on a Pentium 4 2.4 GHz machine running Red Hat Enterprise 3.
SVMs were created using the libsvm software with a radial basis function (RBF) kernel. Two parameters are required to train the RBF kernel: γ, the coefficient of the exponent, and C, the penalty term of the error. A grid search was performed to choose those values of γ and C that gave the best 5-fold cross validated performance on each data set. For testing, 5-fold cross validation was utilized. The data are split into 5 equal parts; each part in turn is tested on an SVM trained on the remainder, and the average test set accuracy is returned. By splitting data as described above, cross-validation may be used to avoid over-fitting the training data and give an estimate of the prediction error for unseen data, though care does need to be taken. In certain cases, it can be proved that the cross-validation error estimate is an almost unbiased estimate of the true error on unseen data. The 1000 repeated tests of splitting the data sets into 4/5ths for training and 1/5th for testing (Table 1) gave comparable classification accuracies to that of 5-fold cross validation, hence suggesting that over-fitting has not occurred to a significant degree.
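The 5-fold procedure described above can be sketched in plain Python as follows. This is an illustration of the splitting and averaging only. It does not call libsvm: a simple nearest-centroid classifier stands in for the SVM so that the example is self-contained, and all function names here are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and deal them into k near-equal folds."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validated_accuracy(features, labels, train, predict, k=5, seed=0):
    """Test each fold on a model trained on the other folds and return the
    average test-fold accuracy."""
    accuracies = []
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = train([features[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)

# Stand-in classifier (NOT an SVM): assigns the class with the nearest centroid.
def train_nearest_centroid(features, labels):
    sums, counts = {}, {}
    for vector, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vector))
        for j, value in enumerate(vector):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_nearest_centroid(model, vector):
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))
```

A grid search over γ and C would wrap `cross_validated_accuracy` in two nested loops, passing a training function configured with each parameter pair and keeping the pair with the highest average accuracy.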
NAH gratefully acknowledges support from the Australian Research Council's award of a Federation Fellowship to Prof. Kevin Burrage. This work was supported by funds from the Australian Research Council of Australia and the Australian National Health and Medical Research Council of Australia; RDT is supported by an NHMRC R. Douglas Wright Career Development Award.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.