An integrative model of multi-organ drug-induced toxicity prediction using gene-expression data

Background In practice, some drugs produce a number of negative biological effects that can mitigate their effectiveness as a remedy. To address this issue, several studies have been performed for the prediction of drug-induced toxicity from gene-expression data, and a significant amount of work has been done on predicting limited drug-induced symptoms or single-organ toxicity. Since drugs often lead to some injuries in several organs like liver or kidney, however, it would be very useful to forecast the drug-induced injuries for multiple organs. Therefore, in this work, our aim was to develop a multi-organ toxicity prediction model using an integrative model of gene-expression data. Results To train our integrative model, we used 3708 in-vivo samples of gene-expression profiles exposed to one of 41 drugs related to 21 distinct physiological changes divided between liver and kidney (liver 11, kidney 10). Specifically, we used the gene-expression profiles to learn an ensemble classifier for each of 21 pathology prediction models. Subsequently, these classifiers were combined with weights to generate an integrative model for each pathological finding. The integrative model outputs the likeliness of presenting the trained pathology in a given test sample of gene-expression profile, called an integrative prediction score (IPS). For the evaluation of an integrative model, we estimated the prediction performance with the k-fold cross-validation. Our results demonstrate that the proposed integrative model is superior to individual pathology prediction models in predicting multi-organ drug-induced toxicities over all the targeted pathological findings. On average, the AUC of the integrative models was 88% while the AUC of individual pathology prediction models was 68%. Conclusions Not only does this integrative model produce comparable prediction performance to existing approaches, but also it produces very stable performance overall. In addition, our approach is easily expandable to a variety of other multi-organ toxicology applications.


Background
With the advent of high-throughput technologies (such as microarrays), an explosive growth of gene-expression data in toxicology applications has given us new insights into the biological processes of hepatotoxicity or nephrotoxicity [1]. Thus, many studies have worked on using geneexpression profiles for the prediction of the potential negative effects of new drugs under development [1,2]. Because new drugs often lead to some injuries in the liver or kidney, which present various pathological findings, it would be very useful to predict the drug-induced toxicity for multiple organs. However, thus far, most of the earlier works focused on predicting single-organ toxicity with only limited pathological symptoms [3][4][5][6][7][8][9].
For example, Cui et al. [3] used gene-expression profiles obtained from kidney RNA samples with rat cDNA microarrays to develop an SVM-based prediction model that groups gene-expression profiles into four classes according to the severity and type of pathology. Using this model, they could predict the pathologies of 28 test profiles with 100% specificity and 82% sensitivity. Lingkang et al. [4] grouped liver samples by the level of necrosis exhibited in the tissue and used them to develop a random forest classifier with 21 genes. They achieved 90%, 80%, and 60% prediction accuracies against test data derived from the livers of rats exposed to three different hepatotoxins, respectively. Kedar et al. [5] proposed a network model to assess chronic hepatotoxicity based on sub-chronic hepatic geneexpression data in rats, where the directed graph accounts for the interactions among drugs, differentially expressed genes and chronic hepatotoxicity. They estimated phenotypical exposure risk such as toxic hepatopathy, diffuse fatty change and hepatocellular adenoma from gene-expression profile of rats. Low et al. [6] proposed a hybrid model of chemical descriptors and gene-expression data to predict hepatotoxicity. Although the hybrid models did not show better accuracy than the model based on only gene-expression data, the use of chemical descriptors could enrich the interpretation of the model. Minowa et al. [7] developed the prediction model for the onset of drug-induced proximal tubular injury from gene-expression data, and achieved 93% sensitivity and 90% specificity. Bowles et al. [8] quantified several pathological findings and DILI severity and then constructed statistical prediction models from geneexpression profiles for liver pathology in rats and for druginduced liver injury. For this purpose, they used the Lasso regression and "glmnet" algorithm to extract models for rat liver pathology prediction and used stochastic gradient boosting to extract models for drug-induced liver injury. Zhang et al. [9] employed gene-expression data, cell-based assays, and pathological data to obtain a network of earlyresponse genes as a consensus signature of drug-induced in-vitro and in-vivo toxicity, and used the network to predict liver or kidney toxicity from gene-expression data. The accuracy of prediction model was between 80% and 97% in both liver and kidney.
In this work, we attempt to develop an integrative model of drug-induced toxicity prediction applicable to the prediction of multiple organ pathologies. To achieve this end, we investigated the relationships between distinct pathologies by exploring co-occurrences of pathologies within training samples, and used them to combine individual pathology prediction models for drug-induced toxicity. Consequently, for a test sample, our integrative model can predict the occurrence of the trained pathologies presented in multiple organs.
In fact, according to TG-GATEs, the researchers extracted the slides of the liver or kidney from all in-vivo samples, and pathologists examined these slides to detect pathological symptoms and their severity in various parts of the liver or kidney. (See Additional file 1, 2) Due to the limitation of available gene-expression data in both the liver and kidney, which cannot cover all possible pathological symptoms, we targeted only the 21 findings, as mentioned above.
For 37 liver and 26 kidney drug-induced pathological findings observed in in-vivo rat samples exposed to one of 131 drugs in TG-GATEs, we simplified them into general terms (e.g., grouping the subtypes of liver degeneration into 'liver degeneration', or grouping the necrosis of bile duct or hepatocyte into liver necrosis.) and then selected only the pathology terms that were induced by at least 5 drugs or more, as the targets of our prediction model.

Training data for toxicity prediction model
For the learning of the toxicity prediction model, we used confirmed case and control samples, with geneexpression profiles for cases with targeted pathological findings and expression profiles for controls without the targeted pathological finding, respectively. All the geneexpression profiles in the training data were normalized by using a robust multi-array average (RMA) method with the RefPlus R package [11], where the reference model was built by selecting 5% of the training samples in each organ and used to derive reference quartiles and probe effects for normalization. To construct the toxicity prediction model, we need to identify gene signatures particular to each of the 21 pathological findings. For this purpose, we applied the t-test to the training data of confirmed case/control samples in each pathological finding and selected the top 5% ranked genes in the increasing order of p-values.

Construction of individual pathology prediction models
For each of the 21 pathological findings, we trained an ensemble model of n k-nearest neighbor (KNN) classifiers (i.e., an ensemble of n sub-prediction models) to construct an individual pathology prediction model. The main reason of developing n sub-prediction models for each pathology prediction is to handle the data imbalance problem (i.e., handling the situation in which the number of case samples is much smaller than the number of control samples), a problem that commonly occurs in training data for adverse drugs reaction [12,13]. Each sub-prediction model was developed with each of n jackknife samples generated from training data, in which each jackknife sample includes all the confirmed case samples and a randomly selected subset of control samples. The final output of any individual pathology prediction model is determined by aggregating all the outputs of n sub-prediction models for each pathology finding. That is, for a test sample s of geneexpression profile with exposure to a certain drug, our pathology prediction model produces the pathology prediction score (PPS)M F i (s) as an output for a pathology finding F i by averaging all the outputs of n sub-prediction models as follows: The score of M F (s) is in the range of 0 [1], where values approaching 0 indicate that a pathological finding F will not occur in the sample s and values closer to 1 indicate that F will occur in the sample s. If M F (s) is higher than a certain threshold, we predict that the pathological finding F would occur in the sample s. Figure 1 illustrates the procedure of producing the PPS as an output of individual pathology prediction.

Pathology relationships extraction
To investigate the relationships between pathological findings for drug-induced toxicity prediction, we constructed a pathology occurrence matrix that describes the presentation states of training samples for the targeted pathology findings (i.e., 11 liver and 10 kidney). From this matrix, we can derive the pathology occurrence vector for each pathology finding and use them to calculate the Jaccard similarities between confirmed case samples displaying two different pathological symptoms. Specifically, assuming that there are n training samples and k pathological findings of interest, the pathology occurrence matrix Φ is defined by: Where ij = if a pathology F j occurred in the i th training sample and ij otherwise. Also, the similarity between F i and F j is defined as the Jaccard coefficient between their pathology occurrence vectors (i.e., Φ i and Φ j ) which measures the commonness of sample occurrences of two pathology findings. Formally, the pathology similarity S between F i and F j is calculated by: where N 01 is the total number of training samples in which F i does not occur and F j does occur, N 10 is the total number of training samples in which F i occur and F j does not occur, and N 11 is the total number of training samples in which both F i and F j does occur. The value of pathology similarity is in the range of 0 [1], and some examples of pathology similarities for 4 pathological findings (i.e., liver eosinophilic change, liver cellular infiltration, kidney cytoplasmic vacuolization, and kidney regeneration) are shown in Table 1.
The rationale of pathology relationships can be explained as follows: A drug-induced toxicity can present several pathological findings in liver or kidney, and the occurrence of certain pathological finding may be associated with the occurrence of some other pathological finding(s). This is because a drug-induced toxicity can perturb certain gene-expressions or protein activities followed by the perturbation of several pathway activities, leading to various toxicity symptoms [14]. For example, the PPAR-alpha gene is related to liver hypertrophy [15], which is also related to kidney tubular necrosis [16,17]. ARG1 gene is related to both biliary injury and single cell necrosis [18]. Hence, some symptom(s) of toxicity might be the indirect evidence of other symptom(s) of toxicity. To implement this rationale in our experiments, we estimated the degree of the association between two pathological findings by calculating the Jaccard similarity between their pathology occurrence vectors.

The integrative model for drug-induced toxicity prediction
The construction of the integrative model for drug-induced toxicity prediction is done by the weighted combination of all the 21 individual pathology prediction models. For this purpose, we first obtain the normalized pathology similarities N(F i , F j ) from the pairwise pathology relationships S(F i , F j ) defined as above by exploring co-occurrence samples of two pathological findings. That is, when having k different Figure 1 The procedure of producing the pathology prediction score (PPS) in an individual pathology prediction model. pathological findings, the normalized pathology similarity between F i and F j is defined by Then, our integrative model produces the integrative prediction score IPS F i (s)of a test sample s for each pathological finding F i , which is calculated by If the score of IPS F i (s) is higher than a certain threshold, it is predicted that the pathology finding F i would occur in the test sample s. Thus, for k distinct pathological findings, our integrative model produces a k-dimensional vector of the IPSs of a test sample s as follows: . . .  Figure 2(c). From this example, we can notice that the IPS for a certain pathological finding is affected by both the degree of the association with other pathology findings and the results of other pathology prediction models. That is, in the case of a pathological finding F 1 of s 1 , the score of IPS F 1 (s 1 ) = 0.521 gets higher than the score M F 1 (s 1 ) = 0.4 of an individual pathology prediction model because its highly associated finding F 2 produces a relatively high score of M F 2 (s 1 ) = 0.4. On the other hand, in the sample of s 2 , the IPS F 1 (s 2 ) = 0.479 gets lower than the score M F 1 (s 2 ) = 0.5 of an individual pathology prediction model because its highly associated finding F 2 produces a low score of M F 2 (s 2 ) = 0.0. Thus, if the threshold of IPS is set to 0.5, our model predicts that the pathological finding F 1 will be presented in the sample s 1 and not in the sample s 2 .

Evaluation of individual pathology prediction models for multi-organ drug-induced toxicity
To evaluate the performance of individual pathology prediction models in forecasting multi-organ drug-induced toxicity over the targeted pathological findings, we used 3708 rat in-vivo samples exposed to one of our chosen 41 drugs. (See Additional file 3) Here, 41 drugs were chosen from TG-GATEs of which the gene expression profiles are provided in both the liver and kidney samples in all cases. As discussed above, we developed 21 individual pathology prediction models for our targeted pathological findings (i.e., 11 in liver and 10 in kidney) using the KNN-based ensemble classifiers, which produced the outputs of the pathology prediction scores (PPSs). If the PPS output of a pathology prediction model for a test sample is greater than a given threshold, then we predict that the targeted pathological finding will occur in the test sample. We estimated the prediction performance of each individual pathology prediction model in terms of sensitivity and specificity using the k-fold cross-validation method. That is, original training data of the in-vivo samples mentioned in Methods were divided into k groups of equal size in the unit of drugs. To avoid an over-fitting problem, we used the samples of k-1 groups for model development (including gene signature selection for the targeted pathology, pathology relationship extraction, and prediction model learning), and used the samples of 1 remaining group to test model performance. This process was iterated k times for different selection of a test group (See Additional file 4). Here, the threshold value of PPS in each pathology prediction model was set to maximize the geometric mean of sensitivity and specificity. The geometric mean of sensitivity and specificity is often used to evaluate a prediction model with imbalanced data set.   all the individual pathology prediction models tend to show relatively lower sensitivities (approximately 70% on average) than specificities (approximately 87% on average) overall in predicting the targeted pathological findings. In addition, the standard deviations of sensitivity over all the models seems to be considerably high, ±18% in liver pathology and ±17% in kidney pathology, compared to those of specificity, i.e., ±8% in liver pathology and ±9% in kidney pathology. Even the sensitivities of several pathology prediction models (e.g., models for liver single cell necrosis and kidney hypertrophy) are extremely low as <50%. Since the performance of individual pathology prediction models depends on the choice of the PPS thresholds, it is worthwhile to draw ROC curves for model evaluation. The ROC curves of 6 pathology prediction models, which show the most distinction between PPS- based models and IPS-based models, are given in Figure 4. All the ROC curves of 21 pathology prediction models are also given in Additional file 5.
Evaluation of the proposed integrative models for multiorgan drug-induced toxicity We evaluated the prediction performance of the proposed integrative models in the same manner as for the individual pathology prediction models, using the k-fold cross-validation method. Particularly, we extracted pathology relationships only based on the samples of k-1 groups chosen as training set. (See Additional file 4) Here, the threshold of IPS was also set to maximize the geometric mean of sensitivity and specificity in forecasting the targeted pathological changes. Figure 5 displays the evaluation results of our integrative models for the prediction of 21 pathological findings. We observe that overall prediction performance is quite impressive, increasing sensitivity considerably (approximately 84% on average) without much sacrifice of specificity (approximately 83% on average). In addition, the variability of the prediction performance for all pathologies in the integrative model is much lesser than the individual pathology models it is comprised of, indicating that an integrative approach can provide more stable performance. Figure 4 shows the ROC curves of the integrative models for the 6 pathological findings (3 in liver and 3 in kidney). It is noted that the ROC curves of the IPS-based integrative models are located completely above ROC curves of the PPS-based individual models in several cases. We also measured the area under the ROC curves (AUC) of the IPS-based integrative models and the PPS-based individual models (see Figure 6). As seen in Figure 6, the IPS-based integrative models showed much better performance than the PPS-based individual models overall. On average, approximately 20% of AUC in the proposed integrative models increased over all the pathology prediction, compared to individual pathology models. Particularly, for some pathological findings like liver single cell necrosis and kidney cytoplasmic vacuolization, the IPS-based integrative models achieved distinct improvement in prediction performance. For example, regarding liver single cell necrosis, the IPS-based integrative model can make high quality predictions within 93% of its AUC, but the PPS-based individual models only display this performance within 26% of AUC.
Furthermore, we compared our integrative model performance with the prediction performance given by the recent work of Bowles et al. (2013) [8] which also employs TG-GATEs to train and evaluate the prediction model. Table 2 shows the AUC comparison of the both models in predicting liver toxicity only, since the work of Bowles et al. only concerned itself with liver toxicity. Even if it shows limited comparison in predicting only three liver pathological findings (liver necrosis, hypertrophy, cellular infiltration), our integrative model performance seems to be comparable to their work.

Conclusions
We introduced an integrative model approach for multiorgan drug-induced toxicity prediction. For the development of integrative models, we investigated pathology relationships by considering the co-occurrence of targeted pathological states in training samples, and used them to unite individual pathology prediction models. Consequently, our integrative models produced better prediction performance over all of the 21 targeted pathological findings, than individual pathology prediction models.
Even when the performance of individual pathology prediction models is extremely low, our integrative models could improve the prediction performance considerably by referring to the prediction results of other highly associated pathological findings. In this work, our prediction models were developed using limited training set, so it is expected that more training samples would make the better estimation of the real associations among toxicity pathological findings.

Additional material
Additional file 1: An example of slide image extracted from the liver and kidney of an in-vivo sample.
Additional file 2: A table of detected pathological symptoms in each part of the liver and kidney. This table is formatted for Excel. The table contains type of pathology, part of liver of kidney in which the pathology occurred, and number of drugs which induced the pathology within experiments of TG-GATEs.
Additional file 3: A table of in-vivo samples which were used to evaluate the prediction models. This table is formatted for Excel. The table contains sample ID, type of medicated drug, and occurrences of drug-induced targeted pathologies for each sample.
Additional file 4: An image to describe the cross-validation method for evaluating toxicity prediction models. The image shows the concept of cross-validation methods used to evaluate individual pathology prediction models or integrative models.