Novel image markers for non-small cell lung cancer classification and survival prediction
© Wang et al.; licensee BioMed Central Ltd. 2014
Received: 3 February 2014
Accepted: 14 August 2014
Published: 19 September 2014
Non-small cell lung cancer (NSCLC), the most common type of lung cancer, is one of the most serious diseases causing death in both men and women. Computer-aided diagnosis and survival prediction of NSCLC are of great importance in assisting diagnosis and personalized therapy planning for lung cancer patients.
In this paper we propose an integrated framework for NSCLC computer-aided diagnosis and survival analysis using novel image markers. The entire biomedical imaging informatics framework consists of cell detection, segmentation, classification, discovery of image markers, and survival analysis. A robust seed detection-guided cell segmentation algorithm is proposed to accurately segment each individual cell in digital images. Based on the cell segmentation results, an extensive set of cellular morphological features is extracted using efficient feature descriptors. Next, eight different classification techniques that can handle high-dimensional data are evaluated and compared for computer-aided diagnosis. The results show that random forest and adaboost offer the best classification performance for NSCLC. Finally, a Cox proportional hazards model is fitted by component-wise likelihood-based boosting. Significant image markers are discovered using bootstrap analysis, and the survival prediction performance of the model is also evaluated.
The proposed model has been applied to a lung cancer dataset that contains 122 cases with complete clinical information. The classification performance exhibits high correlations between the discovered image markers and the subtypes of NSCLC. The survival analysis demonstrates the strong predictive power of the statistical model built from the discovered image markers.
Lung cancer is one of the most frequent cancers worldwide. Just as breast cancer is the leading cancer in females, lung cancer is the leading cancer in males, accounting for 17% of total new cancer cases and 23% of total cancer deaths. The prognosis of lung cancer is still poor, with a five-year survival rate of approximately 10% in most countries. Lung cancer can be classified as small cell lung cancer (SCLC) or non-small cell lung cancer (NSCLC). NSCLC accounts for the majority (84%) of lung cancer cases. The two major types of NSCLC are adenocarcinoma (including bronchioloalveolar carcinoma), representing about 40%, and squamous cell carcinoma, representing about 25–30%. Accurate classification and survival analysis can assist personalized treatment planning and prognosis.
Recently, there has been much active research in imaging informatics [4–12]. Accurate image segmentation [13, 14] is usually a prerequisite for computer-aided lung cancer diagnosis and survival analysis. Explicit segmentation may not be required in applications where the tumor microenvironment is critical for tumor classification; however, in our study, we find that explicit cell localization and cellular features are important for NSCLC classification and survival analysis.
Crowding and overlapping cancer cells often present significant challenges for most traditional automatic segmentation methods. A wide variety of algorithms based on watershed and its variants [15–17], graph cut [18, 19], and active contour models [20–22] have been proposed. However, none of these methods can robustly handle the touching-cell segmentation challenges exhibited in lung cancer images. Lu et al. have proposed a supervised learning-based segmentation algorithm to support new image feature extraction and polyp detection on CT images, and a flexible, hierarchical feature learning framework integrating different levels of discriminative and descriptive information has also been presented. Supervised learning is a potential approach to tackle these challenges, but it requires a large amount of labeled training data provided by experienced pathologists. For computer-aided classification, genetic algorithms (GAs) and support vector machines (SVMs) have been combined for multi-class cancer identification based on microarray datasets. Partial least squares regression (PLSR) and support vector machine with recursive feature elimination (SVM-RFE) have been applied to lung cancer subtype classification. Zhu et al. model lung cancer image classification as a multi-class multi-instance learning problem and use an adaboost algorithm to perform classification with a bag-of-features model. None of these studies correlated image features with patient survival information.
Survival analysis deals with the time to an event, such as death in biological organisms or failure in mechanical systems. Several commonly used survival analysis methods are the Kaplan-Meier method for estimating the survival function, the log-rank test for comparing the equality of two or more survival distributions, and the Cox proportional hazards (PH) model for examining the covariate effects on the hazard function. One important issue in survival analysis is the censoring problem (subjects are censored if they are lost to follow-up or if the study ends before they die or have an outcome of interest). The Cox proportional hazards model is one of the most commonly used multivariate approaches for analyzing survival time data in medical research. It is a semi-parametric method that does not need a specific baseline hazard function and can effectively handle the censoring problem. In other words, it is not necessary to specify a survival distribution to model the effect of the explanatory variables on the duration variable.
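As an illustration of how censoring enters the Kaplan-Meier estimate mentioned above, the following is a minimal sketch using NumPy only. The function name `kaplan_meier` and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np

def kaplan_meier(times, events):
    """Return (distinct event times, survival probabilities) for right-censored data.

    times  : follow-up time of each subject
    events : 1 if the event (death) was observed, 0 if the subject was censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)                    # subjects still followed at time t
        deaths = np.sum((times == t) & (events == 1))   # events observed exactly at t
        s *= 1.0 - deaths / at_risk                     # product-limit update
        surv.append(s)
    return event_times, np.array(surv)

# Toy data: the subjects censored at t=2 and t=4 leave the risk set without counting as deaths.
km_t, km_s = kaplan_meier([1, 2, 2, 3, 4], [1, 0, 1, 1, 0])
```

Censored subjects contribute to the risk set up to their last follow-up time but never to the death count, which is why the estimate remains usable when follow-up is incomplete.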
In survival analysis, researchers also consider clinical factors or other environmental information [29, 31–33]. For example, a standard Cox proportional hazards survival model with a spatial random effect extension has been applied to show that long-term exposure to combustion-related fine particulate air pollution is an important environmental risk factor for cardiopulmonary and lung cancer mortality. Gene expression signatures have also been used as covariates in survival analysis [34–39] to search for pairs of genes (biomarkers) that are significantly related to patient death.
In this paper, we present an integrated framework that investigates the prognostic effects of image markers. First, a novel seed detection-based repulsive deformable model is proposed to separate touching cells; second, a set of geometry, pixel intensity, and image texture features is extracted to describe cellular morphological properties; third, eight different classification techniques are comparatively analyzed for computer-aided diagnosis of NSCLC; finally, survival analysis is conducted based on a Cox model fitted by component-wise likelihood-based boosting. The entire system is designed to assist doctors in making more objective and accurate diagnoses and prognoses of NSCLC. Unlike gene sequencing, histopathological slides are always available for each lung cancer patient in routine clinical diagnosis, and therefore adopting the developed prediction model does not require any changes to current clinical practice.
The experiments in this paper are conducted using the adenocarcinoma and squamous cell carcinoma lung cancer images downloaded from the TCGA Data Portal. TCGA (The Cancer Genome Atlas) is a collection of cancer specimens, with additional clinical information about participants, metadata about the samples, histopathology slide images from sample portions, and molecular information derived from the samples. It is supervised by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) and is freely available to researchers.
Cell detection and segmentation
where $C_1$ is the normalization constant, $S$ represents the set of all voting points, and $A_c(m,n)$ denotes the cone-shaped voting area with vertex $(m,n)$ and scale $c$. The voting area at each scale is defined by the radial range $(r_{min}, r_{max})$ and the angular range $\Delta$. The mean of the Gaussian kernel is defined in terms of $\theta$, the angle of the gradient direction with respect to the x axis, and $\Sigma = \sigma^2 I_2$ ($I_2$ is the identity matrix) is the covariance matrix. The indicator function is 1 for $(x,y) \in A_c(m,n)$ and 0 otherwise. g D(x,y,σ) represents the Euclidean distance map. After the confidence map $V(x,y)$ is calculated, we remove those points with relatively low voting values, which are located near the cell boundaries. To achieve robust seed detection, we apply mean shift to locate the final positions of the cell seeds.
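The last two steps (discarding low-vote points, then running mean shift) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `detect_seeds`, the relative threshold `keep_frac`, the Gaussian kernel and the `bandwidth` value are all hypothetical choices, and the confidence map `V` is assumed to have been computed already.

```python
import numpy as np

def detect_seeds(V, keep_frac=0.5, bandwidth=2.0, n_iter=50):
    """Locate seeds as mean-shift modes of the high-vote pixels of a confidence map V."""
    # Keep only pixels whose vote is at least keep_frac of the maximum (assumed rule).
    ys, xs = np.nonzero(V >= keep_frac * V.max())
    points = np.column_stack([xs, ys]).astype(float)

    # Mean shift with a Gaussian kernel: every point climbs to a density mode.
    modes = points.copy()
    for _ in range(n_iter):
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        modes = (w @ points) / w.sum(axis=1, keepdims=True)

    # Merge modes that converged to (almost) the same location into one seed.
    seeds = []
    for m in modes:
        if all(np.linalg.norm(m - s) > bandwidth / 2.0 for s in seeds):
            seeds.append(m)
    return np.array(seeds)  # one (x, y) row per detected seed
```

In this sketch the bandwidth plays the role of the expected seed-region size: too small a value splits one cell into several seeds, while too large a value merges neighboring cells.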
Using the boundaries of the detected cell seeds as initializations, a novel repulsive balloon snake (RBS) algorithm based on a deformable model is used to seek the cell boundaries. RBS is a parametric model that naturally preserves cell topologies and prevents contours from splitting or merging with one another.
where $n(s)$ represents the normal vector (pressure force) to the curve at the specific point on $v(s)$, and $\nabla E_{ext}(v(s))$ is defined as the image force, where $E_{ext}(v(s)) = -\|\nabla I(v(s))\|^2$ ($I(v(s))$ is the original image). $\gamma$ and $\lambda$ are the weighting parameters controlling the pressure force and the image force, respectively.
The balloon snake (BS) model cannot be directly used for touching object segmentation. If all balloon snakes move independently, they will cross one another. Based on these observations, we introduce an interactive scheme to form an RBS model for touching cell segmentation. The underlying idea of RBS is the following: each cell contour should be driven by its own forces as well as by extrinsic forces from other deformable contours, and both the amplitudes and the directions of these extrinsic forces should vary with the distance between snakes. When two snakes are far apart, their movements should be predominantly controlled by their own driving forces (internal and external forces); when they get closer, each snake should receive repulsive forces from all other adjacent snakes. As a result, the extrinsic force will prevent snakes from crossing or merging.
where $d_{ij}(s,t) = \|v_i(s) - v_j(t)\|_2$ is the Euclidean distance between $v_i(s)$ and $v_j(t)$. $f(x) > 0$, $x \in (0, +\infty)$, represents a monotonically decreasing function ($f(x) = x^{-2}$ in our case), and $\omega$ weights the repulsive force. For a specific point $v_i(s)$, the closer it moves to other snakes, the more repulsive force it receives. Unlike the original balloon snake, RBS moves contours under the influence of their own driving forces and extrinsic repulsive forces from other snakes. When these two types of forces reach a balance, the snakes stop evolving.
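The distance-dependent repulsion can be illustrated with a short sketch. The exact force expression is not reproduced in the text above, so the form used here is an assumption: each point of snake i is pushed directly away from every point of every other snake, with magnitude $\omega f(d_{ij})$ and $f(x) = x^{-2}$. The function name `repulsive_force` and the discretization of each contour as an array of points are hypothetical.

```python
import numpy as np

def repulsive_force(contours, i, omega=1.0):
    """Repulsive force on every point of contour i exerted by all other contours.

    contours : list of (n_k, 2) arrays of contour points
    Assumed form: omega * sum_j sum_t f(d_ij(s, t)) * unit vector from v_j(t) to v_i(s),
    with f(x) = x**-2 as stated in the text.
    """
    vi = np.asarray(contours[i], dtype=float)
    force = np.zeros_like(vi)
    for j, other in enumerate(contours):
        if j == i:
            continue                                   # a snake does not repel itself
        vj = np.asarray(other, dtype=float)
        diff = vi[:, None, :] - vj[None, :, :]         # vectors pointing away from snake j
        d = np.maximum(np.linalg.norm(diff, axis=2), 1e-9)  # d_ij(s, t), guarded against 0
        force += (diff / d[..., None] * (d ** -2.0)[..., None]).sum(axis=1)
    return omega * force
```

With this choice of f the repulsion is negligible for well-separated contours and grows rapidly as two contours approach, which matches the qualitative behavior described above.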
The image features and their descriptions
Cell area feature
Major-minor axis ratio
Cell circularity feature
Contour perimeter feature
Contour solidity feature
Cell intensity mean feature
Cell intensity standard deviation feature
Cell intensity Kurtosis
Cell intensity entropy
Cell intensity energy
Cell intensity contrast
Cell intensity correlation
Cell intensity energy from co-occurrence matrix
Cell intensity homogeneity
Cell intensity skewness
Texture feature coding method (TFCM)
Center symmetric auto correlation (CSAC) feature
Local binary pattern (LBP) feature
Texton histogram feature
Group 1: Geometry Features. Five geometry features are calculated for each segmented lung cancer cell: area $A_{cell}$, contour perimeter $P_{cell}$, circularity, major-minor axis ratio, and contour solidity, which is defined as the ratio of the cell area to the area of the convex hull defined by the cell boundary.
Group 2: Pixel Intensity Statistics. This group of features is calculated from the pixels in the segmented cells and includes intensity mean, standard deviation, skewness, kurtosis, entropy, and energy. We use the Lab color space for better color representation.
Group 3: Texture Features. This group of features contains the co-occurrence matrix, local binary pattern (LBP), texture feature coding method (TFCM), center symmetric auto-correlation (CSAC), and texton features. The co-occurrence matrix is an estimate of the joint probability distribution of the intensities of two neighboring pixels. LBP is a measure of local textures. Each pixel in the input image patch is assigned a binary code by comparing the intensity of this pixel to those of its neighbors. Similar to LBP, in TFCM each pixel is assigned a texture feature number (TFN). The TFN of one pixel is generated by comparing this pixel with its neighbors in four directions: 0°, 45°, 90°, and 135°. A histogram is calculated based on the TFNs of one image patch. CSAC is a measure of local patterns with symmetrical forms. We calculated a series of local auto-correlation and covariance measures, including symmetric texture covariance (SCOV) and variance (SVR), within-pair variance (WVAR), and between-pair variance (BVAR). A 3×3 pixel unit of each channel is considered for the CSAC feature. Texton is a discriminative texture representation. The calculation of the texton feature is based on unsupervised learning. We randomly picked some cells in each image as training samples. These cell patches are filtered by a texton filter bank. K-means clustering is then applied, and the centers of the clusters are defined as textons. To generate the texton histogram for a new image, the image is first filtered by the same texton filter bank, and then each pixel is assigned to one texton to build the final texton histogram.
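A few descriptors from the three groups above can be sketched as follows, using NumPy only. Several details are assumptions rather than the authors' definitions: circularity is taken as $4\pi A / P^2$, entropy and energy are computed from a normalized intensity histogram, and LBP uses the basic 8-neighbor code without rotation invariance. All function names are hypothetical, and the major-minor axis ratio, co-occurrence, TFCM, CSAC and texton features are omitted.

```python
import numpy as np

def _area(pts):
    # Shoelace formula for the area of a closed polygon.
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def _convex_hull(pts):
    # Andrew's monotone chain; returns the hull vertices in order.
    p = sorted(map(tuple, pts))
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    def chain(seq):
        h = []
        for q in seq:
            while len(h) >= 2 and cross(h[-2], h[-1], q) <= 0:
                h.pop()
            h.append(q)
        return h[:-1]
    return np.array(chain(p) + chain(p[::-1]), dtype=float)

def geometry_features(contour):
    """Group 1 (partial): area, perimeter, circularity, solidity of a closed contour."""
    pts = np.asarray(contour, dtype=float)
    area = _area(pts)
    perimeter = np.linalg.norm(pts - np.roll(pts, -1, axis=0), axis=1).sum()
    return {"area": area, "perimeter": perimeter,
            "circularity": 4.0 * np.pi * area / perimeter ** 2,   # assumed definition
            "solidity": area / _area(_convex_hull(pts))}

def intensity_statistics(pixels, bins=32):
    """Group 2: first-order statistics of the pixels inside one segmented cell."""
    x = np.asarray(pixels, dtype=float).ravel()
    mean, std = x.mean(), x.std()
    z = (x - mean) / std if std > 0 else np.zeros_like(x)
    hist, _ = np.histogram(x, bins=bins)
    p = hist[hist > 0] / x.size                 # non-empty bins of the normalized histogram
    return {"mean": mean, "std": std,
            "skewness": np.mean(z ** 3), "kurtosis": np.mean(z ** 4),
            "entropy": -np.sum(p * np.log2(p)), "energy": np.sum(p ** 2)}

def lbp_histogram(img):
    """Group 3 (partial): basic 8-neighbor LBP codes and their normalized histogram."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbor >= center).astype(np.int64) << bit   # one bit per neighbor
    hist = np.bincount(codes.ravel(), minlength=256)
    return codes, hist / hist.sum()
```

For a concave contour the convex hull is larger than the cell itself, so solidity drops below 1; this is what lets it capture irregular or indented cell shapes.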
After calculating the aforementioned image features, we first perform the NSCLC subtype classification. In this stage, several conventional machine learning methods and recently published state-of-the-art algorithms that can handle high dimensional data are compared. These include the multiple support vector machine recursive feature elimination (MSVM-RFE) algorithm, L1 penalized logistic regression [48, 49], random forest [50, 51], naive Bayesian [52, 53], adaboost [54, 55], the sparse coding spatial pyramid matching (ScSPM) algorithm, locality-constrained linear coding (LLC), and the nearest class mean (NCM) classifier. MSVM-RFE is an iterative feature selection method that uses a backward elimination procedure. A resampling scheme is introduced to stabilize the feature rankings. At each iteration, the feature ranking score is computed based on the weight vectors of multiple linear SVMs trained on subsamples of the original training data, and the feature with the smallest ranking score is removed from the model. L1 penalized logistic regression provides an efficient lasso regularization path for logistic regression, which enables feature shrinkage and selection for high dimensional data. Random forest is an ensemble learning method for classification, which can generate a permutation-based score to rank the importance of variables in a classification problem. The naive Bayesian classifier is a simple probabilistic classifier based on Bayes' theorem with a naive assumption of conditional independence between features. The adaboost algorithm sequentially applies a classification algorithm to reweighted versions of the training data and then takes a weighted majority vote of an ensemble of weak classifiers. Adaboost can provide an importance score for each weak classifier that corresponds to one selected feature. ScSPM is an extension of spatial pyramid matching; the algorithm uses selective sparse coding followed by multi-scale spatial max pooling and an SVM. LLC is another feature representation method that applies a locality constraint to project each feature into a sparse code. NCM is a distance-based classifier that projects the features into a low-dimensional space for classification. These three recent algorithms have already achieved remarkable success on a range of natural image classification benchmarks.
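To make the distance-based idea behind NCM concrete, here is a minimal sketch of a plain nearest class mean classifier. It is a simplification: the low-dimensional projection described above is omitted, Euclidean distance in the original feature space is assumed, and the class name and toy data are hypothetical.

```python
import numpy as np

class NearestClassMean:
    """Assign each sample to the class whose mean feature vector is closest."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # One mean vector per class, computed from the training samples.
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distance from every sample to every class mean.
        d2 = ((X[:, None, :] - self.means_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d2.argmin(axis=1)]

# Toy two-class example with labels named after the two NSCLC subtypes.
ncm = NearestClassMean().fit([[0, 0], [0, 1], [5, 5], [6, 5]], ["AC", "AC", "SCC", "SCC"])
ncm_pred = ncm.predict([[0.2, 0.4], [5.5, 4.0]])
```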
Before survival analysis, dimension reduction is widely used to avoid the “curse of dimensionality”. Linear dimension reduction methods such as principal component analysis (PCA) project the data onto a small number of directions that retain as much of the variance as possible. The least absolute shrinkage and selection operator (LASSO) is another classic method of feature shrinkage and selection for regression that can handle high dimensional data. Least angle regression (LARS) was proposed for variable selection in the linear regression setting for high dimensional data. LARS selects predictors by their current correlation, or angle, with the response, where the correlation is computed between the predictor and the current residuals. Moreover, the elastic net has been proposed as a regularization and variable selection method. Boosting is another widely used feature selection approach. It fits an ensemble of weak learners to the data. Furthermore, component-wise boosting has been proposed to estimate the model with intrinsic variable selection. The term component-wise means that each base learner is a linear function of only one component (variable). A base learner is specified for each covariate, and only the best base learner is updated in each boosting step. When the optimal boosting iteration is reached, only a subset of the base learners has been chosen to form the strong classifier. The algorithm therefore yields a strong classifier and a sparse set of selected features.
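The "update only the best base learner" rule can be illustrated with a minimal component-wise boosting sketch. Note the assumption: squared-error loss for ordinary linear regression is used here for simplicity, not the likelihood-based Cox variant used in this paper. The function name, step length `nu` and number of steps are hypothetical.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_steps=100, nu=0.1):
    """Component-wise boosting for linear regression with squared-error loss.

    Each base learner is a least-squares fit of the current residual on ONE covariate;
    at every step only the covariate that reduces the residual most is updated.
    Returns (intercept, coef) such that predictions are intercept + X @ coef.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)                 # centered covariates
    resid = y - y.mean()
    coef = np.zeros(X.shape[1])
    ss = (Xc ** 2).sum(axis=0)
    for _ in range(n_steps):
        b = Xc.T @ resid / ss               # single-covariate least-squares slopes
        j = int(np.argmax(b ** 2 * ss))     # covariate with the largest drop in residual SS
        coef[j] += nu * b[j]                # shrunken update of the selected component only
        resid -= nu * b[j] * Xc[:, j]
    intercept = y.mean() - X.mean(axis=0) @ coef
    return intercept, coef
```

Covariates that are never selected keep a coefficient of exactly zero, which is the source of the sparsity mentioned above; stopping earlier yields a sparser model.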
For high dimensional data, penalized regression methods such as LASSO and ridge regression add a penalization term to the partial log-likelihood function, and the penalized partial likelihood is maximized by techniques such as quadratic programming.
where $I_p$ is a diagonal matrix that penalizes each covariate separately: its diagonal elements are 1 for each candidate (penalized) and 0 for the rest (not penalized). The candidate covariate that best improves the overall fit is selected for updating. As the number of boosting steps increases, more feature variables are selected according to their relevance in predicting survival. The result is expected to be sparse, with many coefficients equal to zero. The coefficient paths of component-wise boosting are expected to be more stable than those of LASSO-based approaches. In addition, it has two major advantages over LASSO: 1) it allows for unpenalized mandatory covariates; 2) it can handle correlated covariates by including pathway information.
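For reference, the penalized partial log-likelihood that such methods maximize can be written down in a few lines. This is a sketch under stated assumptions: an L1 (LASSO-style) penalty with weight `lam`, no special treatment of tied event times beyond a Breslow-style risk set, and hypothetical names throughout. It evaluates the objective only and is not the component-wise boosting update itself.

```python
import numpy as np

def cox_partial_loglik(beta, X, times, events, lam=0.0):
    """Cox partial log-likelihood minus an L1 penalty lam * sum(|beta|).

    X      : (n, p) covariate matrix
    times  : follow-up times
    events : 1 if death observed, 0 if censored
    """
    X, beta = np.asarray(X, dtype=float), np.asarray(beta, dtype=float)
    times, events = np.asarray(times, dtype=float), np.asarray(events, dtype=int)
    eta = X @ beta                                   # linear predictor (log relative risk)
    ll = 0.0
    for i in np.flatnonzero(events == 1):            # only observed deaths contribute terms
        risk_set = times >= times[i]                 # subjects still at risk at this death
        ll += eta[i] - np.log(np.exp(eta[risk_set]).sum())
    return ll - lam * np.abs(beta).sum()
```

Censored subjects appear only in the risk sets of earlier deaths, which is how the partial likelihood uses their follow-up time without requiring a baseline hazard.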
Cell detection and segmentation
The pixel-wise seed detection accuracy compared with ground truth
The performance of segmentation measured by precision and recall
The average recall, precision and accuracy of NSCLC classification
The top 10 features selected by adaboost are 4 lbp features, 3 solidity features, and the area3, kurt3 and peri3 features. Among the top 30 features, lbp, area, solidity, axis, tfcm, energy, correlation and contrast are the most commonly selected image features. Peri, kurt, std and circularity each have one feature selected. The ranking suggests that the image texture features and geometric features are representative markers for distinguishing between the two subtypes of NSCLC: adenocarcinoma (AC) and squamous cell carcinoma (SCC). The results also indicate that 1) there are more elongated cells in SCC than in AC; 2) AC usually has a relatively larger intensity variation inside cells than SCC; 3) SCC cells are often over-stained and exhibit clearer boundaries; 4) AC cells usually exhibit more inhomogeneous texture than SCC cells.
After the prediction model training procedure, we employed time-dependent ROC curves for uncensored data and the AUC as criteria to select the best thresholds for risk scores and to assess how well the model predicts patients’ survival outcome. At time t, a larger AUC indicates better predictability of time to event as measured by sensitivity and specificity. After classifying patients into low- and high-risk groups, we can estimate and compare their Kaplan-Meier survival curves. The performance of such a binary classifier is generally evaluated in terms of the overall predictive accuracy.
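The evaluation at a fixed time t can be sketched as follows for the uncensored case, where every patient's survival time is known. The function names are hypothetical, and the definitions used are assumptions: patients who died by time t are treated as cases, the rest as controls, and the AUC is the probability that a case has a higher risk score than a control.

```python
import numpy as np

def auc_at_time(risk, times, t):
    """AUC of a risk score for predicting death by time t (all event times observed)."""
    risk, times = np.asarray(risk, dtype=float), np.asarray(times, dtype=float)
    cases = risk[times <= t]          # patients who died by time t
    controls = risk[times > t]        # patients still alive at time t
    if len(cases) == 0 or len(controls) == 0:
        return float("nan")           # AUC is undefined without both groups
    diff = cases[:, None] - controls[None, :]
    # Fraction of (case, control) pairs ranked correctly; ties count as one half.
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def split_by_threshold(risk, threshold):
    """Boolean mask: True for the high-risk group, False for the low-risk group."""
    return np.asarray(risk, dtype=float) > threshold
```

The two resulting groups can then be compared with Kaplan-Meier curves, as described above.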
(TCGA NSCLC testing data n = 57) Multivariate Cox proportional hazards analysis of all clinical covariates without the image feature related risk score
Tumor stage II
Tumor stage III & IV
(TCGA NSCLC testing data n = 57) Multivariate Cox proportional hazards analysis of all clinical covariates and the image feature related risk score
Tumor stage II
Tumor stage III & IV
To measure the robustness of the feature selection, we conducted a bootstrap analysis. We resampled the whole dataset 5000 times with replacement, performed the boosting feature selection procedure on each sample, and counted how often each feature variable was selected. The top 10 most frequently selected image markers are: peri6, homo3, homo4, homo5, skew6, lbp5, lbp16, csac6, csac15 and tfcm18. Among the top 10 image features that are most highly associated with survival, 4 are pixel intensity features, 5 are image texture features and only 1 is a geometric feature. Moreover, 4 out of the 7 significant features previously selected in the training set are among the top 11 features in the bootstrap analysis on the whole set, which shows good consistency of the proposed algorithm.
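The resample-select-count loop described above can be sketched generically. In this illustration the selection step is a stand-in: `top_correlated` simply picks the covariates most correlated with the response, whereas the study uses the boosting feature selection procedure. All names, the seed and the default of 200 resamples are hypothetical.

```python
import numpy as np

def bootstrap_selection_frequency(X, y, select, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each feature is selected.

    select : callable (X_sample, y_sample) -> indices of the selected features
    """
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)             # draw n rows with replacement
        chosen = np.asarray(select(X[idx], y[idx]), dtype=int)
        counts[chosen] += 1
    return counts / n_boot

def top_correlated(X, y, k=1):
    """Stand-in selector (an assumption): the k features most correlated with y."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    score = np.abs(Xc.T @ yc) / (np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12)
    return np.argsort(score)[::-1][:k]
```

Features with a selection frequency close to 1 are stable across resamples, while features selected only occasionally are more likely to reflect sampling noise.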
Discussion and conclusions
In this paper, we have investigated novel image markers for both computer-aided diagnosis and prognosis of non-small cell lung cancer. We propose an integrated framework that consists of cell detection, segmentation, feature extraction, classification, discovery of image markers, and survival analysis for NSCLC. A multi-scale distance map-based voting algorithm is first introduced to localize individual cells, and a repulsive deformable model is proposed to segment the cells using the detection results as initializations. A comprehensive set of cellular features is extracted, and several advanced classification techniques are compared using the image markers calculated in the previous steps. Finally, a Cox model fitted with component-wise likelihood-based boosting is applied, and several survival analysis approaches are used to evaluate the discovered image features. Through extensive experiments, we have found a set of diagnostic image markers that are highly correlated with NSCLC subtype classification. In addition, we have discovered a set of prognostic image markers (mainly representing image staining characteristics and inhomogeneity inside the nuclei of cancer cells) to predict NSCLC patients’ survival. We show statistically that the developed comprehensive image marker related risk score is a stronger predictor of patients’ survival than traditional clinical factors. Together with clinical information, it provides significant clinical value for patients’ prognosis.
This research is funded by an Institutional Development Award (IDeA) from the National Institute of General Medical Sciences of the National Institutes of Health under grant number 2 P20 GM103436-14. The project is also partially supported by the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant UL1TR000117 (or TL1 TR000115 or KL2 TR000116). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
- Detterbeck FC, Boffa DJ, Tanoue LT: The new lung cancer staging system. CHEST J. 2009, 136 (1): 260-271.View ArticleGoogle Scholar
- Anagnostou VK, Dimou AT, Botsis T, Killiam EJ, Gustavson MD, Homer RJ, Boffa D, Zolota V, Dougenis D, Tanoue L, Gettinger SN, Detterbeck FC, Syrigos KN, Bepler G, Rimm DL: Molecular classification of nonsmall cell lung cancer using a 4-protein quantitative assay. Cancer. 2012, 118 (6): 1607-1618.View ArticlePubMedGoogle Scholar
- Gurcan MN, Boucheron LE, Can A, Madabhushi A, Rajpoot NM, Yener B: Histopathological image analysis: a review. IEEE Rev Biomed Eng. 2009, 2: 147-171.View ArticlePubMed CentralPubMedGoogle Scholar
- Caicedo JC, González FA, Romero E: Content-based histopathology image retrieval using a kernel-based semantic annotation framework. J Biomed Inform. 2011, 44 (4): 519-528.View ArticlePubMedGoogle Scholar
- Díaz G, González FA, Romero E: A semi-automatic method for quantification and classification of erythrocytes infected with malaria parasites in microscopic images. J Biomed Informat. 2009, 42 (2): 296-307.View ArticleGoogle Scholar
- Mazurowski MA, Lo JY, Harrawood BP, Tourassi GD: Mutual information-based template matching scheme for detection of breast masses: from mammography to digital breast tomosynthesis. J Biomed Inform. 2011, 44 (5): 815-823.View ArticlePubMed CentralPubMedGoogle Scholar
- Wei C-H, Li Y, Huang PJ: Mammogram retrieval through machine learning within bi-rads standards. J Biomed Inform. 2011, 44 (4): 607-614.View ArticlePubMedGoogle Scholar
- Kim D, Ramesh BP, Yu H: Automatic figure classification in bioscience literature. J Biomed Inform. 2011, 44 (5): 848-858.View ArticlePubMed CentralPubMedGoogle Scholar
- Wang X, Zheng B, Li S, Mulvihill JJ, Wood MC, Liu H: Automated classification of metaphase chromosomes: optimization of an adaptive computerized scheme. J Biomed Inform. 2009, 42 (1): 22-31.View ArticlePubMed CentralPubMedGoogle Scholar
- Wang J, Zhou X, Li F, Bradley PL, Chang S-F, Perrimon N, Wong ST: An image score inference system for rnai genome-wide screening based on fuzzy mixture regression modeling. J Biomed Inform. 2009, 42 (1): 32-40.View ArticlePubMed CentralPubMedGoogle Scholar
- Kothari S, Phan JH, Stokes TH, Wang MD: Pathology imaging informatics for quantitative analysis of whole-slide images. J Am Med Inform Assoc. 2013, 20: 1099-1108.View ArticlePubMed CentralPubMedGoogle Scholar
- Peng H, Roysam B, Ascoli G: Automated image computing reshapes computational neuroscience. BMC Bioinformatics. 2013, 14: 293-View ArticlePubMed CentralPubMedGoogle Scholar
- Song Y, Cai W, Huang H, Wang Y, Feng D, Chen M: Region-based progressive localization of cell nuclei in microscopic images with data adaptive modeling. BMC Bioinformatics. 2013, 14: 173-View ArticlePubMed CentralPubMedGoogle Scholar
- Zhang W, Feng D, Li R, Chernikov A, Chrisochoides N, Osgood C, Konikoff C, Newfeld S, Kumar S, Ji S: A mesh generation and machine learning framework for drosophila gene expression pattern image analysis. BMC Bioinformatics. 2013, 14: 372-View ArticlePubMed CentralPubMedGoogle Scholar
- Zhou X, Liu K-Y, Bradley P, Perrimon N, Wong ST: Towards automated cellular image segmentation for rnai genome-wide screening. Medical Image Computing and Computer-Assisted Intervention–MICCAI 2005, vol. 3749. 2005, Springer Berlin Heidelberg, 885-892.View ArticleGoogle Scholar
- Cheng J, Rajapakse JC: Segmentation of clustered nuclei with shape markers and marking function. IEEE Trans Biomed Eng. 2009, 56 (3): 741-748.View ArticlePubMedGoogle Scholar
- Yang X, Li H, Zhou X: Nuclei segmentation using marker-controlled watershed, tracking using mean-shift, and kalman filter in time-lapse microscopy. IEEE Trans Circ Syst. 2006, 53 (11): 2405-2414.View ArticleGoogle Scholar
- Bernardis E, Yu S: Finding dots: Segmentation as popping out regions from boundaries. Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference On. 2010, San Francisco, CA: IEEE, 199-206.View ArticleGoogle Scholar
- Al-Kofahi Y, Lassoued W, Lee W, Roysam B: Improved automatic detection and segmentation of cell nuclei in histopathology images. IEEE Trans Biomed Eng. 2010, 57 (4): 841-852.View ArticlePubMedGoogle Scholar
- Lankton S, Tannenbaum A: Localizing region-based active contours. IEEE Trans Image Process. 2008, 17 (11): 2029-2039.View ArticlePubMed CentralPubMedGoogle Scholar
- Bergeest J-P, Rohr K: Fast globally optimal segmentation of cells in fluorescence microscopy images. Medical Image Computing and Computer-Assisted Intervention–MICCAI 2011, vol. 6891. 2011, Springer Berlin Heidelberg, 645-652.View ArticleGoogle Scholar
- Qi X, Xing F, Foran DJ, Yang L: Robust segmentation of overlapping cells in histopathology specimens using parallel seed detection and repulsive level set. IEEE Trans Biomed Eng. 2012, 59 (3): 754-765.View ArticlePubMedGoogle Scholar
- Lu L, Bi J, Wolf M, Salganicoff M: Effective 3D object detection and regression using probabilistic segmentation features in CT images. Computer Vision and Pattern Recognition (CVPR), IEEE Conference On. 2011, Providence, RI: IEEE, 1049-1056.Google Scholar
- Lu L, Devarakota P, Vikal S, Wu D, Zheng Y, Wolf M: Computer aided diagnosis using multilevel image features on large-scale evaluation. Medical Computer Vision. Large Data in Medical Imaging. 2014, Springer International Publishing Switzerland, 161-174.View ArticleGoogle Scholar
- Peng S, Xu Q, Ling XB, Peng X, Du W, Chen L: Molecular classification of cancer types from microarray data using the combination of genetic algorithms and support vector machines. FEBS Lett. 2003, 555 (2): 358-362.View ArticlePubMedGoogle Scholar
- Gao L, Li F, Thrall MJ, Yang Y, Xing J, Hammoudi AA, Zhao H, Massoud Y, Cagle PT, Fan Y, Wong KK, Wang Z, Wong ST: On-the-spot lung cancer differential diagnosis by label-free, molecular vibrational imaging and knowledge-based classification. J Biomed Opt. 2011, 16 (9): 096004-096004.View ArticlePubMedGoogle Scholar
- Zhu L, Zhao B, Gao Y: Multi-class multi-instance learning for lung cancer image classification based on bag feature selection. Fuzzy Systems and Knowledge Discovery (FSKD), 2008 IEEE Fifth International Conference On. 2008, 487-492.View ArticleGoogle Scholar
- Kaplan EL, Meier P: Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958, 53 (282): 457-481.View ArticleGoogle Scholar
- Fleming TR, Lin D: Survival analysis in clinical trials: past developments and future directions. Biometrics. 2000, 56 (4): 971-983.View ArticlePubMedGoogle Scholar
- Cox DR: Regression models and life-tables. J Roy Stat Soc B. 1972, 34: 187-220.
- Pope CA, Burnett RT, Thun MJ, Calle EE, Krewski D, Ito K, Thurston GD: Lung cancer, cardiopulmonary mortality, and long-term exposure to fine particulate air pollution. JAMA. 2002, 287 (9): 1132-1141.
- Dockery DW, Pope CA, Xu X, Spengler JD, Ware JH, Fay ME, Ferris BG, Speizer FE: An association between air pollution and mortality in six US cities. N Engl J Med. 1993, 329 (24): 1753-1759.
- Bennett S: Analysis of survival data by the proportional odds model. Stat Med. 1983, 2 (2): 273-277.
- Miecznikowski J, Wang D, Liu S, Sucheston L, Gold D: Comparative survival analysis of breast cancer microarray studies identifies important prognostic genetic pathways. BMC Cancer. 2010, 10 (1): 573.
- Horak E, Klenk N, Leek R, LeJeune S, Smith K, Stuart N, Harris A, Greenall M, Stepniewska K: Angiogenesis, assessed by platelet/endothelial cell adhesion molecule antibodies, as indicator of node metastases and survival in breast cancer. Lancet. 1992, 340 (8828): 1120-1124.
- Guo NL, Wan Y-W, Tosun K, Lin H, Msiska Z, Flynn DC, Remick SC, Vallyathan V, Dowlati A, Shi X, Castranova V, Beer DG, Qian Y: Confirmation of gene expression–based prediction of survival in non–small cell lung cancer. Clin Cancer Res. 2008, 14 (24): 8213-8220.
- Shedden K, Taylor JM, Enkemann SA, Tsao M-S, Yeatman TJ, Gerald WL, Eschrich S, Jurisica I, Giordano TJ, Misek DE, Chang AC, Zhu CQ, Strumpf D, Hanash S, Shepherd FA, Ding K, Seymour L, Naoki K, Pennell N, Weir B, Verhaak R, Ladd-Acosta C, Golub T, Gruidl M, Sharma A, Szoke J, Zakowski M, Rusch V, Kris M, Viale A, et al: Gene expression–based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study. Nat Med. 2008, 14 (8): 822-827.
- Wan Y-W, Beer DG, Guo NL: Signaling pathway-based identification of extensive prognostic gene signatures for lung adenocarcinoma. Lung Cancer. 2012, 76 (1): 98-105.
- Beer DG, Kardia SL, Huang C-C, Giordano TJ, Levin AM, Misek DE, Lin L, Chen G, Gharib TG, Thomas DG, Lizyness ML, Kuick R, Hayasaka S, Taylor JM, Iannettoni MD, Orringer MB, Hanash S: Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med. 2002, 8 (8): 816-824.
- Comaniciu D, Meer P: Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell. 2002, 24 (5): 603-619.
- Cohen LD: On active contour models and balloons. CVGIP: Image Understanding. 1991, 53 (2): 211-218.
- Haralick RM, Shanmugam K, Dinstein IH: Textural features for image classification. IEEE Trans Syst Man Cybern. 1973, SMC-3 (6): 610-621.
- Ojala T, Pietikainen M, Maenpaa T: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell. 2002, 24 (7): 971-987.
- Horng M-H, Sun Y-N, Lin X-Z: Texture feature coding method for classification of liver sonography. Comput Med Imaging Graph. 2002, 26 (1): 33-42.
- Laws KI: Rapid texture identification. Proc. SPIE 0238, Image Processing for Missile Guidance. 1980, 376-381. doi:10.1117/12.959169
- Leung T, Malik J: Representing and recognizing the visual appearance of materials using three-dimensional textons. Int J Comput Vis. 2001, 43 (1): 29-44.
- Duan K-B, Rajapakse JC, Wang H, Azuaje F: Multiple SVM-RFE for gene selection in cancer classification with expression data. IEEE Trans Nanobioscience. 2005, 4 (3): 228-234.
- Friedman J, Hastie T, Tibshirani R: Regularization paths for generalized linear models via coordinate descent. J Stat Software. 2010, 33 (1): 1-22.
- Friedman J, Hastie T, Tibshirani R: glmnet: Lasso and elastic-net regularized generalized linear models. R Package Version. 2009, [http://cran.r-project.org/web/packages/glmnet/index.html]
- Breiman L: Random forests. Mach Learn. 2001, 45 (1): 5-32.
- Liaw A, Wiener M, Breiman L, Cutler A: Package 'randomForest'. Retrieved December 12, 2009.
- Domingos P, Pazzani M: On the optimality of the simple Bayesian classifier under zero-one loss. Mach Learn. 1997, 29 (2–3): 103-130.
- Dimitriadou E, Hornik K, Leisch F, Meyer D, Weingessel A, Leisch MF: The e1071 package. Misc Functions of Department of Statistics (e1071), TU Wien. 2006, [http://cran.r-project.org/web/packages/e1071/index.html]
- Freund Y, Schapire RE: Experiments with a new boosting algorithm. Machine Learning, Proceedings of the Thirteenth International Conference (ICML). 1996, Bari: Morgan Kaufmann, 148-156.
- Culp M, Johnson K, Michailidis G: ada: An R package for stochastic boosting. J Stat Software. 2006, 17 (2): 9.
- Yang J, Yu K, Gong Y, Huang T: Linear spatial pyramid matching using sparse coding for image classification. Computer Vision and Pattern Recognition (CVPR), 2009 IEEE Conference On. 2009, Miami, FL: IEEE, 1794-1801.
- Wang J, Yang J, Yu K, Lv F, Huang T, Gong Y: Locality-constrained linear coding for image classification. Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference On. 2010, San Francisco, CA: IEEE, 3360-3367.
- Mensink T, Verbeek J, Perronnin F, Csurka G: Distance-based image classification: Generalizing to new classes at near zero cost. IEEE Trans Pattern Anal Mach Intell. 2013, 35 (11): 2624-2637.
- Lazebnik S, Schmid C, Ponce J: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. Computer Vision and Pattern Recognition (CVPR), 2006 IEEE Conference On, vol. 2. 2006, New York, 2169-2178.
- Tibshirani R: Regression shrinkage and selection via the lasso. J Roy Stat Soc B. 1996, 58 (1): 267-288.
- Efron B, Hastie T, Johnstone I, Tibshirani R: Least angle regression. Ann Stat. 2004, 32 (2): 407-499.
- Zou H, Hastie T: Regularization and variable selection via the elastic net. J Roy Stat Soc B Stat Meth. 2005, 67 (2): 301-320.
- Bühlmann P, Yu B: Boosting with the L2 loss: regression and classification. J Am Stat Assoc. 2003, 98 (462): 324-339.
- Hoerl AE, Kennard RW: Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970, 12 (1): 55-67.
- Binder H, Schumacher M: Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models. BMC Bioinformatics. 2008, 9: 14.
- Tutz G, Binder H: Boosting ridge regression. Comput Stat Data Anal. 2007, 51 (12): 6044-6059.
- Binder H, Schumacher M: Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinformatics. 2009, 10: 18.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.