HCS-Neurons: identifying phenotypic changes in multi-neuron images upon drug treatments of high-content screening

Background High-content screening (HCS) has become a powerful tool for drug discovery. However, the discovery of drugs targeting neurons is still hampered by the inability to accurately identify and quantify the phenotypic changes of multiple neurons in a single image (named multi-neuron image) of a high-content screen. Therefore, it is desirable to develop an automated image analysis method for analyzing multi-neuron images. Results We propose an automated analysis method with novel descriptors of neuromorphology features for analyzing HCS-based multi-neuron images, called HCS-neurons. To observe multiple phenotypic changes of neurons, we propose two kinds of descriptors which are neuron feature descriptor (NFD) of 13 neuromorphology features, e.g., neurite length, and generic feature descriptors (GFDs), e.g., Haralick texture. HCS-neurons can 1) automatically extract all quantitative phenotype features in both NFD and GFDs, 2) identify statistically significant phenotypic changes upon drug treatments using ANOVA and regression analysis, and 3) generate an accurate classifier to group neurons treated by different drug concentrations using support vector machine and an intelligent feature selection method. To evaluate HCS-neurons, we treated P19 neurons with nocodazole (a microtubule depolymerizing drug which has been shown to impair neurite development) at six concentrations ranging from 0 to 1000 ng/mL. The experimental results show that all the 13 features of NFD have statistically significant difference with respect to changes in various levels of nocodazole drug concentrations (NDC) and the phenotypic changes of neurites were consistent to the known effect of nocodazole in promoting neurite retraction. Three identified features, total neurite length, average neurite length, and average neurite area were able to achieve an independent test accuracy of 90.28% for the six-dosage classification problem. This NFD module and neuron image datasets are provided as a freely downloadable MatLab project at http://iclab.life.nctu.edu.tw/HCS-Neurons. Conclusions Few automatic methods focus on analyzing multi-neuron images collected from HCS used in drug discovery. We provided an automatic HCS-based method for generating accurate classifiers to classify neurons based on their phenotypic changes upon drug treatments. The proposed HCS-neurons method is helpful in identifying and classifying chemical or biological molecules that alter the morphology of a group of neurons in HCS.


Background
To investigate the organization of neurons in various brain tissues including their activity and function, scientists typically examine neural images to classify distinct neuron morphologies [1]. In high-content screening (HCS), automated image analysis has become necessary to identify interesting samples and extract quantitative information by microscopy [2]. For rare phenotypes that are nonetheless recognizable by eyes, a researcher can generate a classifier to recognize cells with the phenotype of interest [2]. Recently, HCS-based methods have been used to quantify neuronal phenotypic changes which correlate to multiple treatments or drugs as illustrated in Table 1. Previously, the single-neuron neuromorphology was considered difficult because of tightly packed positioning and huge spanning arbors of neurons [1], [3]. However, the variation of neuronal morphology to a treatment effect should be considered as a global phenotypic change affecting a large number of neurons rather than only one specific neuron. The image containing multiple neurons is named a multi-neuron image. Thus, the multi-neuron based HCS plays a crucial role for drug treatment analysis [3][4][5][6][7][8][9][10]. In this study [8], the appropriate medication for Huntington's disease was identified. Table 1 lists the functions of major methodologies published since 2010. The neurite-related features such as neurite length are most frequently used for quantifying the neuromorphology changes in specific cell culture. The soma-related features such as soma area are at rank 2, and the branch-related features such as branch complexity are at rank 3. The quantification analysis for single-neuron phenotypic changes is successfully demonstrated in the studies [11][12][13][14][15][16][17]. In additional, classification analysis was implemented in the studies [14,16,17] and regression analysis is also proposed in the work [17]. For analyzing HCSbased multi-neuron images [3][4][5][6][7][8][9][10], automatic feature extraction is considered as an essential technique. The classification analysis was only applied in [10] apart from neuron feature descriptor (NFD), the generic feature descriptor (GFD) was verified to provide a promissing result [10]. Surprisingly, the regression analysis is out of attention in multi-neuron-image-based HCS.
In this study, we develop an automated analysis method with novel descriptors of neuromorphology features for analyzing HCS-based multi-neuron images, called HCSneurons. At first, we extend our previous work [3] to propose a neuron feature descriptor which consists of 13 features and is able to effectively quantify neuronal morphology changes. To make a comprehensive study on the collective phenotypic changes, we propose a generic feature descriptor consisting of several promising image features by utilizing pixel intensity, image moment, and texture information. The HCS-neurons method achieves the automatic feature extraction using an extended version of NeurphologyJ [3], feature analysis using ANOVA analysis and regression analysis, feature selection using an optimization approach based on an inheritable bi-objective combinatorial genetic algorithm (IBCGA) [18], classifier design based on support vector machine (SVM) [19] with the selected features.
To evaluate HCS-neurons, we treated P19 neurons with nocodazole (a known microtubule depolymerizing drug) at six concentrations ranging from 0 to 1000 ng/mL. The multi-neuron images treated using 6 different nocodazole drug concentrations (NDC) were selected as our benchmark because nocodazole has a well-known ability to directly affect neurite morphology [20][21][22][23]. The identified phenotypic changes of neurites were consistent with the known effect of nocodazole in promoting neurite retraction. Three identified features, total neurite length, average neurite length, and average neurite area can achieve an independent accuracy as high as 90.28% for the six-dosage classification problem.

Methods
The proposed HCS-neurons method using the quantification and classification strategies for HCS of multiple neuron phenotypic changes response to 6 dosages of NDC is described. The multi-neuron images are preprocessed in the same way as reported in the previous work [3]. The binary images are generated for establishing the NFD and GFDs datasets. Additional gray-scaled images are necessary for some GFD features. We perform standard statistical analyses using ANOVA and regression for the NFD features which can directly show easily interpretable changes in neuronal morphology. To identify the descriptors which correlated to phenotypic changes upon NDC variations, the SVM-based classification analysis was used to evaluate both datasets. Finally, the IBCGA method was applied to select a small set of features by optimizing SVM prediction performance.

Dataset
Nocodazole is a known microtubule depolymerizing drug which can lead to impaired neurite development. The image acquisition procedure used here is the same as described in [3]. For self-completeness, the procedure is concisely described below. Embryonic carcinoma P19 cells were maintained at 37°C in 5% CO 2 in minimum essential medium supplemented with 2 mM glutamine, 1 mM sodium pyruvate, and 10% (v/v) fetal bovine serum. A total of 216 images called Noco216 were analyzed to examine the morphological adaptation of neuron cells to the amount of drug. Images were divided into 6 classes (each class had 36 images) based on the dosage of nocodazole applied. These 216 images were all multi-neuron images since they contained hundreds of neurons. Both ANOVA analysis and regression analysis are based on NFD extracted from images in this dataset. To define the classification problem, we implemented a stratified random sampling to separate 2/3 of the image dataset as a training set called Noco144 and 1/3 of image dataset as a testing set called Noco72. The summary of these datasets is listed in Table 2. Typical neuron images from each class (i.e. each different nocodazole concentration) are shown in Figure 1. It is readily observable that neuronal morphology exhibited dramatic difference at different dosage of nocodazole.

Neuron Feature Descriptor (NFD)
The standard multi-neuron image descriptors from Neur-phologyJ [3] were used which include somaCount, somaArea, neuriteLength, neuriteArea, attachmentPoint# and endingPoint#. The features somaCount and somaArea are the numbers of soma and summation of all soma area in a multi-neuron image, respectively. The features neuri-teLength and neuriteArea represent the total length of all neurites combined together and summation of the entire area covered by all somata in an image, respectively. The feature attachmentPoint# describes the total number of all neurite attachment point where neurites connect to a soma appeared in an image. The feature endPoint# describes the total number of all neurite end point appeared in an image. In addition, we developed an additional further called branchPoint# which is determined by using a 3x3 mask consisting of all possible 3x3 branch patterns. Each 1-pixel-wide neurite was automatically matched to one of these patterns to characterize the branch points.
Features measuring the cumulative value of all cells (e.g. total length of all neurites), are sensitive to cell counts, e.g. a reduced cell density will change the phenotypic readout even when cells are not treated with drugs. Therefore, the second part of NFD includes Avg_somaArea, Avg_neurite-Length, Avg_neuriteArea, Avg_attachmentPoint#, Avg_-endingPoint#, and Avg_branchPoint# which were computed by dividing all of the standard features (except somaCount) by somaCount. The standard descriptors describe global changes of multiple-neuron neuromorphology patterns whereas the normalized values approximate a local descriptor for each single neuron in the multi-neuron image. Summaries of the analyzed descriptors are listed in Table 3.

Generic Feature Descriptor (GFD)
The Generic feature descriptor (GFD) consists of several well-known image descriptors which have been used to classify images in many applications and is included in HCS bioimage tools [24][25][26][27]. A summary of these features including references and parameters is given in Table 4. However, these descriptors are not popular in the research of neural image analysis (see Table 1) and have never been proposed as tools to investigate neuronal phenotypic changes. Part of the reason may be due to difficulty of implementation. For example, contour-based descriptors such as the fourier transform and bending energy [28] require the development of neuronal contours which is a difficult task due to the intersection of spanning arbors of neurons. These contours then require additional singleneuron segmentation to process the multi-neuron images. Thus, in this study, we focused on methods that could easily be implemented without the prerequisite of the segmentation process and readily describe characteristic of multiple-neuron neuromorphology. Effective shape descriptors based on orthogonal polynomial moments include Zernike Moment [29], Legendre Moment [30] and Tchebichef Moment [31] which are extracted with respect to 2 nd , 4 th , 8 th and 16 th moment orders. In additional to moment based descriptors, the Generic Fourier method [32] which is another shape descriptor based on the Fourier transform of polar coordinates has also been implemented. All shape-based GFDs are calculated using binary images generated from the preprocessing procedure of NeurphologyJ.
Additional outstanding texture-based descriptors consisting of Haralick, Gabor and Daubechies are also designed. The Haralick descriptor is based on a spatial relationship evaluated from a gray level co-occurrence matrix (GLCM) of gray-scaled images [33]. In [34] Gabor and Daubechies successfully used this feature to predict protein location. The Gabor descriptor uses a mean and standard deviation of gray-scaled image convoluted by the Gabor filter [34]. Daubechies is an averaged energy value calculated from the 10 levels of wavelet decomposition of gray-scaled image using the Daubechies4 wavelet function [34].

Statistical analysis -ANOVA and regression
The six sets of 36 multi-neuron images generated using different nocodazole drug concentrations were statistically analyzed using 1-way ANOVA and regression analyses to determine whether there were any statistically significant differences amongst different drug levels for each of the thirteen tested features. The statistical analysis is performed using the SAS/STAT software, Version 4.3 (4.3.0.12251) of the SAS System for Windows.
Each of the 13 features was tested for normality and equal variances (homoscedasticity) using the Bartlett's test in SAS/STAT. Those features which were found to satisfy the assumptions of the standard 1-way ANOVA were then analyzed with a post-hoc test using the Tukey honestly significant difference (HSD) test. Those features that were found to be heteroscedastic with significant differences between the within group variances were tested in SAS with the Welch's variance-weighted ANOVA before then being further analyzed with the Tukey HSD post-hoc test.
Those features which were found to have significant variation between groups were then further tested with regression analyses. For each of these features, linear and quadratic regression models were tested to determine the best fit for the changing Nocodazole concentration relationship. The linear model was chosen if there was no significant improvement with the addition of the quadratic term [35] otherwise the quadratic regression model was chosen. Briefly, the amount of between group variations not explained by the linear model is compared to the increase in explained variation found using the quadratic model. If this increase is more than 5% of the total not explained remaining variations  then the quadratic model provides a significantly improved fit to the data. In all cases the total between group variation explained by the optimal regression model was then compared to the total between group variation from the ANOVA. If the regression model explained greater than 90% of the total variation, we considered it a successful model.

Inheritable bi-objective genetic algorithm
The bi-objective 0/1 combinatorial optimization problem for feature selection has two objectives: minimizing the number of selected informative features and maximizing classification accuracy. The inheritable bi-objective combinatorial genetic algorithm (IBCGA) is an efficient feature selection method based an intelligent genetic algorithm (IGA) [36]. To efficiently solve the combinatorial optimization problem C(n,m) where n is number of candidate features, IGA uses an intelligent crossover operation based on orthogonal experiment designed to efficiently explore the possible solution of the combinatorial problem. IBCGA can efficiently explore the possible solutions to C(n, r±1) by inheriting a good solution to C(n, r) [18]. This inheritable mechanism also allows to economically choose the feature set for improving predication accuracy. The IBCGA encodes features in the descriptors as binary genes for feature selection and encodes parameters of support vector machine (SVM) in using the IGA chromosome. The IGA chromosome consists of n features bits (n is feature number) and two 4-bit IGAgenes to tune the parameters C and γ of SVM. One feature is included in the SVM classifier if the encoded value for the gene is equal to 1. For tuning the SVM parameters, the 16 encoded values of γ and C are belonging to {2 -7 , 2 -6 , ..., 2 7 , 2 8 }.
The IBCGA procedure can be briefly summarized as follows [18]: Step1) Randomly generate an initial population of individuals using r = R start . In this study, the size of the candidate feature set selected by the IBCGA was ranged from R start = 13 to R end = 1 corresponding to number of features in NFD. Eventually, the best solution in terms of 10-CV classification accuracy is selected as a final solution.

Performance evaluation of multi-neuron image descriptor
We hypothesize that the phenotypic changes in multiple neuron systems can be observed in two general ways consisting of neural dependent and neural independent mechanisms via NFD and GFDs, respectively. The classification approach is familiarly applied as an important tool for characterizing each phenotypic change of a single neuron [37][38][39]. Thus, the suitable multi-neuron image descriptor that is highly correlated to the variation of 6 dosages of NDC must be efficient to use as a training set for constructing the classifier for multi-neuron image input. The performance comparison between NFD and GFDs is evaluated by the prediction accuracy of independent test. Moreover, the discrimination power of utilizing single morphology and multiple morphologies as the classifiers are also examined. We used SVM which is a wellknown efficient classification method for our multi-neuron image classification because of its successful use in numerous image classification studies [40][41][42][43][44]. The wide spread usage of the LibSVM tool also encouraged us to utilize it in all of our experiments [19]. An SVM with a radial basis function kernel was used to create the classifier using Noco144 as a training set. A grid search technique was then used to select for the proper values of the SVM parameters while maximizing the classification accuracy of 10-CV. Then, the Noco72 data set was used as an independent test set to assess the generated classifier performance.

Images preprocessing
Our original raw images were gray-scaled taken using fluorescence microscopy. We applied a preprocessing technique in the same way as reported previously in NeurphologyJ [3]. The four input parameters consisting of contrast, soma intensity, neurite width and particle cleanup value were set to 13, 288, 5 and 15, respectively. Examples of the original images from the six classes are shown in Figure 1.
After background removal and isolated object elimination, the resulting gray-scaled images of class 1 representing the wild type are displayed in full size (Figure 2a Figure 4. This result shows that most of the NFD especially neurite-related features have an inverse relation to increasing NDC. Remarkably, the soma area related features tend to have a correlated relationship with NDC. These results provide some insight into the relationships between neuronal morphology and nocodazole concentration. However, to obtain the quantitative relationship between them, stronger statistical tools such as ANOVA and regression analysis are required and described below. For GFDs, only classification analysis is conducted.

ANOVA analysis of NDC affect to NFD
Each of the 13 features was tested for normality and equal variances (homoscedasticity) using the Bartlett's test in SAS/STAT. Of the 13 features, only somaCount and somaArea were found that they do not have significant difference within group variances or depart from normality. These two features were therefore tested with the standard homoscedastic ANOVA analysis to detect the presence of significant differences between the group means and were then post-hoc tested using the Tukey honestly significant difference (HSD) test with SAS/STAT.
The other 11 features were found to have significant differences between the within group variances and therefore were tested in SAS with the Welch's variance-weighted ANOVA before being tested with a post-hoc test using the Tukey HSD test. The mean values of each nocodazole concentration for each of the 13 features are shown in Table 5 with the differences between the group means shown by way of the standard superscript letters designating means which do not differ at the 95% significance level. The second column of Table 5 displays the ANOVA significance R 2 values which for 215 total degrees of freedom is significant if R 2 > 11.2%. In particular, the soma count which was used to normalize six of the other features exhibited a significant increase in the group means for the 50 ng/mL concentration over the untreated soma count. This feature then showed significant decreases in the soma count for further increasing nocodazole concentrations up to 1000 ng/mL. The soma area values had a similar appearing quadratic relationship for increasing concentrations of nocodazole with an initial increase in area values for concentrations up to 50 ng/mL and then a significant decrease for the 1000 ng/mL concentration back to untreated levels. Only these two soma features and the attachment point number feature exhibited obvious quadratic relationships, whereas the other 10 features generally seemed to uniformly increase or decrease with increasing nocodazole concentrations.
For nine of these features, the means significantly decreased with increasing nocodazole concentrations. Eight of these features were pairs of the normal and average features: neurite length, neurite area, ending point number and branch point number. The last one feature with monotonically decreasing average values was the average attachment point number. The reduction in neurite length correlates nicely with the known effect of nocodazole in causing neurite retraction [23]. The decreasing amount of branch points and end points with increasing nocodazole concentration also indicate a nocodazole-dependent degradation in neurite complexity. The reduction in average neurite area with increasing nocodazole concentration also shows the further effect of neurite retraction on the morphological properties of neurites.
The last feature showed an increase in average soma area while the nocodazole concentration was increasing. After several images were manually inspected, we found that HCS-Neurons presumed that these somata are regarded as a single soma. The scenario reveals that the somata from several neurons tend to cluster together at high nocodazole concentration.

Regression analysis of NDC affect to NFD
In the ANOVA analyses described above, significant differences were found for each of the 13 features with the observed patterns showing generally linear changes with respect to NDC with a few obvious cases showing a quadratic relationship. Therefore we chose the linear and quadratic regression models to find the optimal relationships between increasing nocodazole concentrations and each feature.
To assess the quality of each model we determined the R 2 values for the amount of between group variations explained by the two regression models. As is well known, the R 2 values for quadratic regression will always be higher than those for linear regression models, so we tested whether the improvement in the quadratic model R 2 value was significantly above that of the linear model [35]. Table 6 summarizes these quantities for each of the 13 features with the first column showing the total between group variations for each feature (these R 2 values are also shown in Table 5 but they are duplicated here for ease of interpretation). The next two columns display the R 2 values for the linear and quadratic regression models respectively. The fourth column in Table 6 shows how much of the overall between group variation is accounted for by the optimal regression model. The optimal regression models explain over 95% up to 99.7% for every feature except Soma area which only had 72.7% of the between group variation explained by the quadratic model. This column shows that the optimal regression models for increasing NDC values are very well described by simple linear and quadratic models. For the soma area we did try fitting the data to a third order polynomial which yielded an R 2 value of 61.5% and accounted for 98.7 of the between group variation. The shape of the third order polynomial looked very quadratic with a local minimum near zero and a sharp maximum near NDC values around 170 ng/mL.   The final column in Table 6 shows the regression equation for the optimal linear or quadratic model which is also shown in bold in the appropriate regression model R 2 column.

Regression analysis of nocodazole drug concentrations on soma count
Since a very strong effect was found for nocodazole drug concentration on somaCount, and that the value of somaCount showed a strong quadratic relationship with increasing NDC up to 50 ng/mL and then decreased with further NDC increases the fact that the optimal quadratic regression model explained 95.1% of the between group variation was not unexpected. The results of this regression analysis are shown in Figure 5 where the best fit line is shown in black and the 95% confidence intervals of the mean values are shown as dotted lines. This regression analysis found that R 2 = 62.7% of the variation for this feature was described by the line Y = (-35.3 ± 2.2)X 2 + (80.4 ± 6.9)X + (219.3 ± 5.0) where Y is the somaCount value and × is the Log (NDC) value. In terms of the total variation described by this best fit line, (62.7/65.9) = 95.1% of all between  The R 2 ANOVA column shows the total amount of between group variation for each feature. The next two columns show how much of the total variation is explained by a linear model or a quadratic model. If the quadratic model explains more than 5% of the remaining between group variation over the linear model that model is chosen otherwise the quadratic model doesn't improve the fit significantly enough to justify the additional complexity. Essentially, if using the quadratic model increases the R 2 fit by more than 0.5% over the linear model we choose that one [35] The variance explained column shows the total percentage of the between group variation explained by the chosen model. The final column shows the equation for the best fit model for each feature.
group variation is accounted for by this quadratic regression line. The somaCount residuals were also found to not vary significantly from a normal distribution (p = 0.085). The feature somaCount satisfied all of the assumptions of both 1-way ANOVA and regression and was very well described by the quadratic dependence model on Log(NDC). These results are summarized in Table 6 which also shows the quadratic model R 2 value, the quadratic equation and the percent of between group variation explained by the optimal quadratic model.

Classification analysis Performance comparison of each individual NFD feature
We separately applied each of the 13 specified features to evaluate the SVM prediction results from Noco72 by using Noco144 as a training data set. The optimal values of C and γ were then determined using a grid search. From the summary results listed in Table 7 Every soma-related feature showed a lack of discriminating power in this research. These results can be interpreted as showing that the nocodazole drug concentration affected both of the length and area of the neurites thereby achieving the highest accuracies. Figure 4 shows the Pearson's correlation value between each feature and the nocodazole drug concentrations. Most of the neuritebased features showed strong inverse relationship with nocodazole concentration, indicating that nocodazole exhibited strong negative effect on neurite development. According to the phenotypic changes of soma clustering, the increasing of nocodazole concentration also affected Figure 5 The quadratic relationship between somaCount and the logarithm of nocodazole concentration. The variation described by this regression accounts for 95.1% of all of the between group variation for somaCount. the average soma size since the feature Avg_somaArea had a correlation value of 0.86. We noted that this feature is also referred to as the averaged total area of adjacent somata described in previous section.

Performance comparison between NFD and GFD
The SVM analysis was used to run the descriptor assessment in the same way as described in the previous section. As shown in Table 8 Although our descriptor, the moment-based descriptors and the Generic Fourier descriptor utilize the same binary information to compute image features, the moment based descriptors and Generic Fourier descriptor cannot obtain high prediction accuracy because they were designed to handle shape information of single objects. In contrast, our dataset does not contain explicit shape information due to the many neurons distributed in each of the single images. Thus, all texture-based descriptors outperform the more restrictive explicit shape descriptors.
The Haralick descriptor is able to capture global information from each image including a measure of randomness, contrast and variance. This global approach is quite different from our multiple neuron based approach, but it does characterize global properties of the entire system which is a similar approach to our method and which enables it to achieve a high classification performance similar to our results. Furthermore, these efficient results of GFDs using gray-scaled information demonstrated the presence of intensity changes multi-neuron images according to NDC variations. This result states that Haralick descriptor and NFD provide the same classification performance corresponding to phenotypic changes of neuron by varies NDC. However, the numbers of features in these two descriptors are hugely different, 180 and 13, respectively. Therefore, in conclusion, NFD is the best descriptor for phenotypic change of neuron due to its high classification performance and minimal number of features.

Optimal NFD features selected by IBCGA
According to the best choice in this HCS is NFD, IBCGA is applied to find the optimal solution to classify NDC using this descriptor. The 10-CV accuracy was evaluated using the IBCGA procedure to find suitable features and SVM parameters. This IBCGA procedure was executed 30 times to cope with the robustness problem of GA and produce 30 solutions. Figure 6 shows the number of times for each feature selected in the 30 independent runs. The features neur-iteArea, Avg_neuriteArea and Avg_neuriteLength were selected with the largest number of times. The feature neuriteLength was not selected possibly due to redundant information with neuriteArea as mentioned in the previous section. This means either NeuriteArea or NeuriteLength can be substituted by each other. The solution with the highest prediction accuracy also contains these 3 features. Thus, the best SVM classifier is constructed by these 3 features with the SVM parameters, C = 2 and gamma = 16.
This feature set was able to achieve accuracies of 90.28% and 91.67% for the test and training sets, respectively. The averaged performances for all of the solutions chosen by the IBCGA algorithm were 89.31% and 91.30% for the test and training sets, respectively. This result shows that the performance of the proposed multi-neuron image descriptor is stable with respect to the choice of feature set. To further evaluate the quality of our results, we constructed a confusion matrix from our prediction results which is shown in Figure 7. Interestingly, all of the locations where the prediction results were incorrect were located in the classes that were immediately adjacent to the correct class. This makes sense since our class definitions are based on increasing drug concentrations.

Multiple neuron phenotypic changes response to NDC
The phenotypic transition of multi-neuron images detected by HCS-neurons is categorized into neuronal morphology alteration and pixel intensity alteration. In this section, the NFD describing neuronal morphology is in focus according to its optimal performance. We used the IBCGA method to separate out neuron-related features having the highest NDC discrimination power. The neuron morphology features identified by IBCGA were neuriteArea, Avg_neuriteLength and Avg_neuriteArea. The individual classification performances of these features were found to be more than 70%. From our regression analysis, we found significant decreases with increasing NDC levels for these 3 features. In particular, each of these features showed strongly decreasing values as NDC increased with reduced effect near the lower and upper NDC values. Although all three features showed essentially linear decreases with increasing log(NDC)  values the strongest affects were for NDC values in the 50-200 ng/mL range. In the previous section, the high correlation between Avg_somaArea and NDC was mentioned to have a simple linear relationship. The average soma area linearly increased with increased log(NDC) values while the average neurite area strongly decreased. While the inverse correlation between neurite length (or neurite area) and NDC is consistent with nocodazole's effect on inducing neurite retraction, the linear correlation between soma area and NDC has not been reported before. These data indicated that in addition to inducing neurite retraction, nocodazole might promote soma clustering at high concentration. It will be of great interest to examine this additional effect of nocodazole using other neurons.
For GFD morphology alteration, we found the strong evidence for intense differentiation according to NDC. However, detailed analysis to extract all of the interpretable information from these features can still be expanded. We plan to further explore the relationships between GFD and neuron development. In conclusion, there are many strong results from the statistical analysis and classification analysis for the interpretation of multi-neuron images. Therefore, the combination of these two approaches provides a powerful set of tools to generate useful information for neuroscientists to understand neuron modification response to drug treatment.

Conclusions
In this paper we propose a complete high-content screening analysis method HCS-neuron for multiple neuron phynotypic modification response to different nocodazole concentration. Our extended version of multi-neuron image descriptor achieved prediction accuracies of 86.11%. We then used the IBCGA method to find the optimal feature set which resulted in an increase in prediction accuracy to 90.28%. The optimal set of features for this problem was found to be neuriteArea (Neurite Area), Avg_neuriteArea (Average Neurite Area) and Avg_neuri-teLength (Average Neurite Length). Our quantification analysis also found that there were statistically significant changes in these descriptors which vary in exactly the way nocodazole is known to affect neurite growth. The intensity alteration also demonstrated by the high discrimination power of texture based generic descriptor i.e Haralick (86.11%), Gabor Filter (81.94%) and Daubechies4 wavelet decomposition (75%). However, the detail analysis is still hard to interpret at present.
The proposed HCS-neuron can extend HCS with singleneuron images to that with multi-neuron images and help improve the statistical significance of such results and leverage the strengths of high-throughput analysis to the understanding of neuron research. The proposed HCS-neurons is helpful to identify substances such as small compounds or RNAi molecules that can alter the morphological phenotype of an entire population of neurons using HCS. In addition to accomplishing this, our program and datasets are all available for download. Interestingly, we discovered a previously unknown effect of nocodazole on soma clustering. These results demonstrate effectiveness of the proposed quantitative analysis on the morpgological features from images containing multiple neurons.