We designed a new prognostic model based on a neuroblastoma classifier, NB-MuSE, that predicts patients' outcome by merging the biological and prognostic information of published gene expression signatures, assessed by a panel of machine learning algorithms, into a single outcome predictor. We examined every neuroblastoma-related signature described in the literature since 2002, regardless of the purpose for which it was generated or the gene expression platform used. We took this blind screening approach to avoid biases and to include biology-driven signatures, not previously tested for patient stratification, in addition to risk-based signatures. We identified 33 signatures, complete with gene lists, suitable for our study. Because patients' outcome was the final readout of the classifier, we had to develop a strategy to filter out poorly informative signatures that contribute to background noise. We developed a multi-algorithm screening and an 80% accuracy filter for signature selection. This essential step was based on the overproduce-and-select approach, in which a pool of classifiers is spawned and then optimally selected on the fly by monitoring the accuracy of prediction on an external dataset. We evaluated 22 machine learning algorithms for outcome prediction on the 33 signatures, generating 726 predictions to be evaluated for accuracy on an independent dataset. We selected the signatures for which we identified at least one algorithm performing with an accuracy > 80%. Exclusion of a signature from this analysis indicated either that we did not identify an algorithm capable of translating that signature into a predictor in our cohorts or that the signature was not related to patients' outcome, but it does not affect the relevance of those genes in the context of the original publication. Thirteen of the thirty-three signatures were discarded.
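The signature filter described above can be illustrated with a minimal sketch. It assumes the accuracies measured on the external dataset are already available as a nested table; the signature names, algorithm names, accuracy values, and the function `select_signatures` are hypothetical and serve only to show the selection rule (keep a signature if at least one algorithm exceeds 80% accuracy, and pair it with its best algorithm).

```python
def select_signatures(accuracy, threshold=0.80):
    """Keep signatures with at least one algorithm above the threshold.

    accuracy: dict mapping signature name -> {algorithm name: accuracy}.
    Returns a dict mapping each retained signature to a tuple
    (best algorithm, accuracy of that algorithm).
    """
    selected = {}
    for signature, per_algorithm in accuracy.items():
        # Best-performing algorithm for this signature
        best_algorithm = max(per_algorithm, key=per_algorithm.get)
        best_accuracy = per_algorithm[best_algorithm]
        # Strict inequality, matching the "accuracy > 80%" criterion
        if best_accuracy > threshold:
            selected[signature] = (best_algorithm, best_accuracy)
    return selected


# Hypothetical accuracies; not data from the study
accuracy = {
    "sig_A": {"algo_1": 0.85, "algo_2": 0.78},
    "sig_B": {"algo_1": 0.70, "algo_2": 0.80},  # 0.80 is not > 0.80, so discarded
    "sig_C": {"algo_1": 0.62, "algo_2": 0.91},
}
selected = select_signatures(accuracy)
```

In this toy table, `sig_A` and `sig_C` are retained and matched with `algo_1` and `algo_2` respectively, while `sig_B` is discarded because none of its algorithms exceeds the threshold.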
We then matched each of the remaining 20 signatures with the best-performing algorithm among those with > 80% accuracy to generate a signature-specific outcome prediction classifier. In essence, we transformed 20 datasets, each with 60 instances (patients) and numeric attributes (probeset expression values), into one dataset with 60 instances and 20 nominal "alive" or "dead" attributes (one per selected signature). The latter dataset could then be used as input to train the new NB-MuSE-classifier, which merges all the signature information. We tested 22 algorithms to select the best performer, which was the Decision Table. This algorithm builds a simple decision table majority classifier, evaluates feature subsets using best-first search, and can use cross-validation for evaluation (for review see ). Performance can be evaluated by many parameters, and the algorithms tested performed heterogeneously, as shown in Additional file 8. The Decision Table algorithm was chosen because it showed maximal accuracy, but other parameters could have been selected to emphasize other features such as sensitivity or specificity. Ensemble learning approaches have been shown to exceed average classifier performance. Our strategy uses such an approach to produce a flexible tool that merges gene expression signatures and overcomes the limitations imposed by the specific environments in which they were generated. We observed that, in the absence of signature/algorithm filtering, the accuracy of our classifier fell below 82%, a level lower than that achieved by individual classifiers. The importance of including these steps in model generation procedures to obtain a more robust and better-performing classifier was recently reported. Optimization and filtering are quite labor-intensive and were not considered, for example, in breast cancer studies that merged hundreds of gene expression signatures to build classifiers.
The high number of signatures available in breast cancer may compensate for the lack of filtering of poorly informative signatures. An automated implementation of this process can be envisioned if this approach were extended to larger lists of signatures.
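The transformation of per-signature predictions into a single nominal dataset, followed by training of a meta-classifier, can be sketched as follows. This is a simplified illustration and not the implementation used in the study: the class `MajorityDecisionTable` looks up the majority outcome for each combination of attribute values on a fixed, user-supplied attribute subset and omits the best-first search over feature subsets and the cross-validation mentioned above. All names and data are hypothetical.

```python
from collections import Counter, defaultdict


def build_meta_dataset(predictions, signatures):
    """Turn per-signature outcome calls into one row per patient.

    predictions[signature] is a list of "alive"/"dead" calls, one per patient.
    Each returned row holds one nominal attribute per signature.
    """
    n_patients = len(predictions[signatures[0]])
    return [tuple(predictions[s][i] for s in signatures) for i in range(n_patients)]


class MajorityDecisionTable:
    """Simplified decision table on a fixed subset of attributes.

    Unseen attribute combinations fall back to the overall majority class.
    """

    def __init__(self, feature_indices):
        self.feature_indices = feature_indices
        self.table = {}
        self.default = None

    def _key(self, row):
        return tuple(row[i] for i in self.feature_indices)

    def fit(self, rows, outcomes):
        # Overall majority class, used when a combination was never observed
        self.default = Counter(outcomes).most_common(1)[0][0]
        votes = defaultdict(Counter)
        for row, outcome in zip(rows, outcomes):
            votes[self._key(row)][outcome] += 1
        # Majority outcome for each observed combination of attribute values
        self.table = {key: c.most_common(1)[0][0] for key, c in votes.items()}
        return self

    def predict(self, row):
        return self.table.get(self._key(row), self.default)


# Hypothetical calls from two retained signatures on five patients
signatures = ["sig_A", "sig_C"]
predictions = {
    "sig_A": ["alive", "alive", "dead", "dead", "alive"],
    "sig_C": ["alive", "dead", "dead", "dead", "alive"],
}
outcomes = ["alive", "alive", "dead", "dead", "alive"]

rows = build_meta_dataset(predictions, signatures)
model = MajorityDecisionTable(feature_indices=[0, 1]).fit(rows, outcomes)
```

With the study's dimensions, `rows` would contain 60 tuples of 20 nominal values each. Choosing which attributes enter the table is the part that a feature subset search would automate.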
The accuracy of the NB-MuSE-classifier on external validation was 94%, a value that is very high from a biological standpoint. Although there is no logical reason why it cannot be higher, it is difficult to envision much better precision considering the variability of the experimental and clinical data. On the other hand, there is no limit to the number of signatures that can be derived with biological questions in mind. Our model offers a reliable way to keep merging this information into an outcome classifier that will be more robust even if not much more accurate. It is noteworthy that the misclassified patients are grouped in the stage 4 category, in agreement with the fact that the prognosis of this stage is traditionally difficult. We can speculate that combining the information of stage and NB-MuSE-classifier could be particularly effective in predicting outcome in patients with localized tumors (stage 1-3) or stage 4s. Survival analysis of this group of patients supports this claim, showing excellent outcome separation, superior to that observed in the whole cohort. However, more patients will have to be tested to substantiate this claim. A similar analysis performed on patients with MYCN-amplified tumors showed a significant outcome stratification, although not as good as that observed with the whole cohort. We are working on strategies for comparing neuroblastoma gene expression datasets obtained with different platforms in order to build a larger dataset to address questions on smaller groups of patients. We are among the few focusing on the question of merging heterogeneous gene expression signatures to predict outcome. To limit the variability, we considered only gene expression data generated by microarray analysis of primary neuroblastomas using the Affymetrix U133plus2 platform, and we assembled 182 primary neuroblastomas, a cohort that is large for this kind of tumor.
On the other hand, there was no restriction on the technology used to generate the signatures, which turned out to be quite heterogeneous, demonstrating that our multistep approach is suited to work across experimental platforms. This aspect is particularly important in the field of rare tumors, such as pediatric tumors, where it is extremely difficult to build large homogeneous gene expression datasets and where we may expect newly developed signatures to be based on new experimental platforms.
The Affymetrix platform differs substantially from those used in the studies reporting the single classifiers (e.g. two-color gene expression data from different technological platforms, QPCR analyses, etc.). In addition, some of the machine learning algorithms used in the original reports of the classifiers were not part of the panel used in the present study. This may explain discrepancies between the previously published performances of individual signatures and those calculated in this work. The problem of underestimating the performance of some signatures is partially offset by the discovery of the prognostic ability of other signatures, a feature not shown in the original publications. However, the possible advantage of the MuSE-classifier over presently existing classifiers cannot be easily quantified because we took individual signatures out of their original context. Table 3 shows that merging signatures into a single classifier results in a predictor with very high accuracy, but it does not imply that this value is maximal, and considerations on the relative performance of MuSE versus other signatures are valid only in the context of this work.
The discovery of the outcome prediction ability of biology-driven signatures, never tested before for patient stratification, is a spinoff of the process of NB-MuSE-classifier generation. This was true for most of the biology-driven signatures, which comprise about half of those in the NB-MuSE-classifier [1, 22–24, 26–29, 31], with the exception of those addressing the prognostic significance of hypoxia and the MYC pathway, which had already been validated in patient stratification. Our data lend direct support to the suggestion that the biology-driven features measured by the gene expression signatures, such as neuroblast transformation, apoptosis, histone deacetylase, etc. (Table 2), are strongly interconnected with the progression of the human disease, and they support the need for further research in this direction.