Literature based discovery of alternative TCM medicine for adverse reactions to depression drugs

Background: In recent years, Traditional Chinese Medicine (TCM) and alternative medicine have been widely used along with western drugs as a complementary form of treatment. In this study, we first use the scientific literature to identify western drugs with obvious side effects. Then, we find TCM alternatives for these western drugs to ameliorate their side effects.
Results: We used depression as a case study. To evaluate our method, we showed the herb-ingredient-target-disease relations for representative alternative herbs of western drugs. Further, we produced a protein-protein interaction network of western drugs and alternative herbs and performed enrichment analysis of the targets of the active ingredients of the herbs. We examined the enrichment of Gene Ontology terms (Biological Process, Cellular Component, and Molecular Function) and KEGG pathways to show how these targets affect different levels of gene expression.
Conclusion: Our proposed method can select herbs that are highly relevant to the target indication (depression) and can treat the side effects caused by the target drug. The compounds from our selected alternative herbal medicines can therefore complement the western drugs and ameliorate their side effects, which may help in the development of new drugs.

especially those based on network analysis theory, provide opportunities for complementary and alternative application of medication to traditional Western medicine methodology. Many studies have integrated multiple data sources with traditional Eastern medicine as a form of meta-database and used them to discover latent relations among biological entities. A TCM database built by Chen [9] shows more than 20,000 compounds from TCM ingredients as 2D and 3D molecular structures. Ye et al. [22] constructed a curated database for Herb Ingredients' Targets (HIT) from PubMed abstracts. Xue et al. [10] also built a database of traditional prescriptions, herbs, and compounds, including text-mined drug and gene information from resources such as DrugBank, PubChem, and OMIM. Subsequently, the TCMSP was implemented [11].
These network analysis projects have a common purpose: to find prospective drug candidates or to facilitate the repositioning or repurposing of existing drugs by identifying previously undiscovered interactions. In Korea, the Integrated Bio-Pharmacological Network Database for Traditional Korean Medicine, which proposed an established network of traditional Korean medicines, drugs, proteins, indications and side effects for drug discovery, was published [23]. Jeong et al. [24] conducted literature-based research into the clustering of anti-cancer drugs and network analysis of those drugs and target proteins, focusing on pancreatic cancer. Fu et al. [25] studied a link analysis of compound-target proteins from a semantic network constructed using text-mining data. Zhang et al. [26] proposed a network-topological similarity-based classification method for predicting associations between drugs and diseases. Specifically regarding the prediction of side effects and ADEs, Cheng et al. [27] carried out text mining and constructed a meta-database including known compound-ADE associations, and reported a network model for the prediction of potential ADEs.
As the number of databases containing rich information about chemical compounds, genes, proteins and diseases has increased, many computational methods have been developed for predicting the side effects of drugs, based on this information, before they are released to the market. These prediction and identification studies into ADRs primarily use machine learning techniques, ranging from naive Bayesian models for rapid assessment [17] to support vector machines (SVMs), with or without other techniques [13,14], to multiple techniques including ensemble or hybrid learning [16,19,20,28], and more complicated algorithms are continuously being developed [29,30].
In the present study, we used Western medicine databases and traditional literature regarding Chinese medicine for text mining in order to find complementary or alternative medicines for known drugs with significant side effects. We also tracked the associations of drugs with conditions and side effects, which previous research has not examined. In the selection of alternative traditional prescriptions, we aimed to maximize the side-effect-mitigating efficacy of the herbs. Figure 1 illustrates the overall research design used to explore alternative herbs for drugs with side effects identified in cross-lingual scientific databases. In the initial step, scientific papers were obtained from PubMed, and 169,766 records were collected from 2010 to 2014 (http://informatics.yonsei.ac.kr/download/pubmed2010-2014.txt). We preprocessed the PubMed records and extracted entities using the Named Entity Recognition (NER) technique of PKDE4J [31]. Then we linked drug and disease entities according to their co-occurrence in a single abstract. We then filtered these links against curated databases to identify drug/side effect and drug/indication relations. For this, we used SIDER (http://sideeffects.embl.de/download/) to identify drug/side effect relations, and the Therapeutic Target Database (TTD) (http://db.idrblab.net/ttd/) to identify drug/indication relations. We developed an algorithm to select drugs in PubMed which have obvious side effects. Further, we translated the indications of these drugs into Chinese and searched for them in the Chinese science database (CNKI) in the domain of traditional Chinese pharmacology. In the same manner, we extracted the entities (herb and disease) and linked them by co-occurrence in one abstract. We calculated the substitutability of herbs and drugs indicated for the same condition using the Chinese Traditional Prescriptions Database (CTPD). In the results, we show the herb-ingredient-target-disease relations for representative alternative herbs. We also produced a protein-protein interaction network of Western drugs and alternative herbs and performed enrichment analysis of the targets of the active ingredients of the herbs.

Fig. 1 The overall research design
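The co-occurrence linking and database filtering steps can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the input layout (one pair of drug and disease lists per abstract, as an NER step might produce), and the toy data are all assumptions, and the two curated pair sets merely stand in for SIDER and TTD.

```python
from collections import Counter
from itertools import product

def cooccurrence_links(abstract_entities):
    """Count drug-disease co-occurrences, at most once per abstract."""
    weights = Counter()
    for drugs, diseases in abstract_entities:
        # Sets ensure a pair mentioned twice in one abstract counts once.
        for pair in product(set(drugs), set(diseases)):
            weights[pair] += 1
    return weights

def split_by_relation(weights, side_effect_pairs, indication_pairs):
    """Keep only links confirmed by the curated pair sets (hypothetical stand-ins)."""
    side = {p: w for p, w in weights.items() if p in side_effect_pairs}
    indication = {p: w for p, w in weights.items() if p in indication_pairs}
    return side, indication

# Toy input: (drug entities, disease entities) per abstract. Illustrative only.
abstracts = [
    (["nefazodone"], ["depression", "insomnia"]),
    (["nefazodone", "fluoxetine"], ["depression"]),
    (["fluoxetine"], ["headache"]),
]
sider_pairs = {("nefazodone", "insomnia"), ("fluoxetine", "headache")}
ttd_pairs = {("nefazodone", "depression"), ("fluoxetine", "depression")}

weights = cooccurrence_links(abstracts)
side_links, indication_links = split_by_relation(weights, sider_pairs, ttd_pairs)
```

The resulting dictionaries map each (drug, disease) pair to its link weight, which is the quantity the scoring formulae in the next section operate on.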

Selection of drugs to be replaced
In this paper, we calculated the drug/side effect score and drug/indication score using Formulae 1 and 2. Link weight is the drug/side effect or drug/indication co-occurrence frequency.

$$\text{Side effect score} = \frac{\text{target side effect link weight}}{\text{average side effect link weight}} \tag{1}$$

$$\text{Indication score} = \frac{\text{target indication link weight}}{\text{average indication link weight}} \tag{2}$$

For every drug, we recorded the link weight of the co-occurrence side effects, and then used the target side effect link weight divided by the average side effect link weight. We calculated indication in the same way. If a drug is popular in the scientific field, the side effects and indications of that drug will be better documented than those of a less popular drug. Thus, we used two approaches to reduce the problem of bias in the literature: division by the average side effect or indication; and the use of average side effect score divided by indication score to rank the result.
The assumption of the proposed algorithm is that in the literature, if a drug has low toxicity, the conditions for which that drug is indicated will be mentioned more in publications than those of other drugs which have similar targets. A high indication score may therefore be related to low toxicity. Conversely, if a drug has high toxicity, its side effects will generally not be shared with other drugs. To evaluate the performance of the proposed algorithm based on Formulae 1 and 2, we matched the ranked results with the toxicity and half-life indicators of the target drug in DrugBank.
As shown in Algorithm 1, for each drug in our dataset, we calculated the side effect and indication scores using Formulae 1 and 2. A ratio of side effect score to indication score greater than 1 indicated that the drug may need an alternative for the corresponding indication. Eighty-four drugs were identified as having obvious side effects in our dataset.
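The selection step can be sketched as follows. This is one reading of the description, not the published Algorithm 1: it assumes that the "average link weight" of a side effect or indication is taken across all drugs linked to that term, that a drug's side effect score is the mean of its per-side-effect scores, and that the function names and toy weights are purely illustrative.

```python
from collections import defaultdict
from statistics import mean

def normalized_scores(links):
    """Divide each link weight by the mean weight of the same term across drugs."""
    by_term = defaultdict(list)
    for (_, term), weight in links.items():
        by_term[term].append(weight)
    return {(drug, term): weight / mean(by_term[term])
            for (drug, term), weight in links.items()}

def select_drugs(side_links, indication_links, target_indication):
    """Return (drug, ratio) pairs whose side effect / indication ratio exceeds 1."""
    side_scores = normalized_scores(side_links)
    indication_scores = normalized_scores(indication_links)
    per_drug = defaultdict(list)
    for (drug, _), score in side_scores.items():
        per_drug[drug].append(score)
    selected = []
    for (drug, indication), ind_score in indication_scores.items():
        if indication != target_indication or drug not in per_drug:
            continue
        ratio = mean(per_drug[drug]) / ind_score
        if ratio > 1:  # threshold stated in the text
            selected.append((drug, ratio))
    return sorted(selected, key=lambda item: item[1], reverse=True)

# Toy link weights (hypothetical co-occurrence counts).
side_links = {("A", "nausea"): 6, ("B", "nausea"): 2,
              ("A", "headache"): 4, ("B", "headache"): 4}
indication_links = {("A", "depression"): 2, ("B", "depression"): 6}
ranking = select_drugs(side_links, indication_links, "depression")
```

In this toy case drug A is selected (its side effects are over-represented and its indication under-represented relative to drug B), while drug B falls below the threshold.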

Herb data collection
These eighty-four drugs are indicated for sixty-six different indications, according to the drug/indication pairs in our dataset. We translated these indications into Chinese and searched for them in the CNKI (http://www.cnki.net/). We collected Chinese literature from the Chinese database because the literature on TCM published in English is insufficient, owing to the low recognition of TCM in Western countries, and we needed to find alternative herbs for each drug based on traditional Chinese medical prescriptions. Table 1 shows the top 20 indication names ranked by number of papers. The full indication names are shown in Additional file 1. A total of 47,103 papers were collected from the domain of traditional Chinese pharmacology.
After the Chinese abstracts were collected, we extracted the disease and herb entities from them. The disease and herb names came from a Chinese medicinal materials database, which includes 60,993 records of herb names and indications. The data came from the National Chinese herbal medicine compilation, a Chinese medicine dictionary, the Chinese herb and other sources. We obtained 153,595 disease-herb pairs linked by co-occurrence in one abstract.

Substitutability
The substitutability of a herb for a drug that shares a specific indication with the herb is calculated from a) the average dosage proportion of the herb, D(Herb), and b) the average number of elements in the intersection of the side effects of the drug and the indications of the herb's prescriptions (N), as shown in Formula (3).
To calculate the intersection number (N) of the side effects produced by a target drug and the indications of the target herb's prescriptions, we translated the side effects of the target drug into Chinese. Prescriptions came from the Chinese Traditional Prescriptions Database (CTPD中药方剂), which draws on more than 1000 famous medical books and includes 84,212 prescriptions with their composition, source, preparation method, efficacy, usage, and notices. Figure 2 shows the substitutability calculation and evaluation process. First, we translated the target indication, searched the CNKI, and extracted the disease-herb pairs. We searched for these herbs in the CTPD to calculate the average dosage proportion for the target indication and the average number of elements in the intersection of the drug's side effects and the indications of the herb's prescriptions. Then we took the top 10 herbs ranked by substitutability as alternative herbs for the target drug. Finally, we used the herb-ingredient-target-disease relations to evaluate our result. Table 2 shows an example. The target drug is mirtazapine, the target herb is 白芍, and the intersection of the prescription 1 indication set and the side effect set has three elements: agitation (烦躁), dizziness (头晕), and dry mouth (口燥). The intersection of the prescription 2 indication set and the side effect set also has three elements: anxiety (忧), vomiting (呕), and dizziness (头眩). N is the average of the per-prescription counts N_n, to which 1 is added because depression is also an indication in the prescription. In Table 2, the asterisk (*) in the prescription indications column marks indications shared with the side effects of the target drug, and the bold dot (•) marks the target indication (depression).
Before calculating the importance of each herb, we needed to unify the unit of measurement. In our prescription dataset, 1两 (Liang) = 10钱 (Qian) = 100分 (Fen), so we unified all to a minimum unit (Fen).
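The unit normalization can be sketched as a small lookup table. The conversion factors come from the text (1 Liang = 10 Qian = 100 Fen); the function names and the toy prescription are assumptions made for illustration.

```python
# Conversion factors to the smallest unit (Fen), as stated in the text.
FEN_PER_UNIT = {"两": 100, "钱": 10, "分": 1}

def to_fen(amount, unit):
    """Convert a dosage given in Liang (两), Qian (钱), or Fen (分) to Fen."""
    return amount * FEN_PER_UNIT[unit]

def total_dosage_fen(prescription):
    """Sum a prescription's dosages after converting each to Fen."""
    return sum(to_fen(amount, unit) for amount, unit in prescription.values())

# Hypothetical prescription mixing all three units.
prescription = {"白芍": (3, "钱"), "herb_b": (2, "两"), "herb_c": (20, "分")}
total = total_dosage_fen(prescription)  # 30 + 200 + 20 Fen
```

Converting to the smallest unit keeps all amounts as whole numbers for typical prescriptions, which avoids rounding issues when proportions are computed afterwards.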
We calculated the proportion of each herb's dosage in Traditional Chinese Medicine prescriptions. In prescription 1, the dosage of 白芍 (PAEONIAE RADIX ALBA) is 3 钱 (Qian) and the dosage of the whole prescription is 25 钱 (Qian); therefore, the importance of the target herb in prescription 1 is 3/25 = 0.12. The dosage proportion of the target herb in prescription 2 is 0.086, so the average dosage proportion is (0.12 + 0.086)/2 = 0.103. The substitutability of mirtazapine and 白芍 is (3 + 1) × 0.103 = 0.412. If a herb and the target indication appeared together in one prescription (a direct herb), we used the above method to calculate their substitutability. However, if the herb and the target indication did not appear together in any prescription, we called the herb an indirect herb. We calculated the co-occurrence proportion of indirect and direct herbs and then multiplied it by the dosage proportion of the directly connected herb to obtain the indirect herb dosage proportion as the score. We accumulated the scores over all related direct herbs to give the indirect herb dosage proportion. N_indirect is the average number of elements in the intersection of the side effects produced by the target drug and the indications of the target herb's prescriptions. Algorithm 2 shows the calculation of substitutability for these two cases.
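The worked example for a direct herb can be reproduced with a few lines of Python. The sketch assumes that Formula 3 has the form implied by the mirtazapine example, namely substitutability = (mean of N_n + 1) × D(Herb); the function name is hypothetical.

```python
from statistics import mean

def substitutability(dosage_proportions, intersection_counts):
    """Direct-herb substitutability, assuming the form (mean(N_n) + 1) * D(Herb)."""
    d_herb = mean(dosage_proportions)   # average dosage proportion of the herb
    n = mean(intersection_counts) + 1   # +1 because the target indication also matches
    return n * d_herb

# Values from the mirtazapine / 白芍 example in the text:
# dosage proportions 0.12 and 0.086, three shared terms in each prescription.
score = substitutability([0.12, 0.086], [3, 3])
```

Rounded to three decimals, `score` matches the value of 0.412 reported in the text.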

Evaluation of obvious side effect drugs
We evaluated all drugs in our dataset using the toxicity and half-life values from DrugBank. For toxicity, we used the oral LD50 in rats. We did not calculate toxicity for drugs whose oral LD50 in rats was not found in the DrugBank database. In addition, if the LD50 or half-life was given as a range, we used the median value. LD50 is the amount of a toxic agent that is sufficient to kill 50% of a population of animals within a certain time; a smaller number means higher toxicity. Half-life represents drug persistence. If drugs A and B have similar toxicity but the half-life of A is much longer than that of B, the dosing interval of drug A is longer. Table 3 shows the averaged toxicity and half-life of selected and unselected drugs, respectively. The averaged oral LD50 in rats for unselected drugs was 1.56 times higher than that of the selected obvious side effect drugs, which indicates that the selected drugs generally have higher toxicity. In addition, the averaged half-life of unselected drugs was longer than that of the selected obvious side effect drugs.
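The handling of ranged values can be illustrated as follows. This is a sketch under stated assumptions: values are taken to arrive as plain strings such as "2-4", the median of a two-ended range is taken to be its midpoint, and the function names and sample values are hypothetical rather than drawn from DrugBank.

```python
import re
from statistics import mean

def range_midpoint(text):
    """Return the midpoint of the numbers in a string such as '2-4', or None."""
    numbers = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", text)]
    if not numbers:
        return None  # value missing: the drug is skipped for this indicator
    return (min(numbers) + max(numbers)) / 2

def group_average(values):
    """Average an indicator over a group of drugs, ignoring missing values."""
    parsed = [range_midpoint(v) for v in values]
    return mean([p for p in parsed if p is not None])

# Hypothetical half-life strings (hours) for two groups of drugs.
selected = ["2-4", "10", "not available"]
unselected = ["24-72", "20"]
ratio = group_average(unselected) / group_average(selected)
```

A ratio above 1 for half-life, or for LD50, corresponds to the kind of group comparison summarized in Table 3.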

Case study: depression
In this section, we describe the alternative herbs found for obvious side effect drugs, using depression as the indication. We present the results in two parts: the output of Algorithm 1 (finding obvious side-effect drugs for depression in the dataset) and the output of Algorithm 2 (finding alternative herbs). To evaluate our methodology, we show the herb-ingredient-target-disease relations for salient alternative herbs. We also produced a protein-protein interaction network of Western drugs and alternative herbs, and performed enrichment analysis of the targets of the active ingredients of the herbs. We examined the enrichment of Gene Ontology terms (Biological Process, Cellular Component, and Molecular Function) and KEGG pathways to show how these targets affect different levels of gene expression.

Finding obvious side-effect drugs for depression
In our dataset, there are 12 different kinds of Western drugs for treating depression. However, 7 of them have side effect scores greater than the indication score defined in Algorithm 1. In Table 4, the replaceable score is the side effect score divided by the indication score. A higher score means this drug has a higher need for finding alternative herb medicines in the collected dataset. Table 4 shows the 7 drugs used as candidates among the 12 drugs which have the target indication of depression. Nefazodone ranked first in our dataset. We examined the toxicity and half-life indicators in DrugBank, and the half-life of Nefazodone is only two to four hours.
For five drugs, there was no need to find alternative herbal medicines (Table 4). For example, reboxetine and mirtazapine have low toxicity. Venlafaxine's half-life is short, but its toxicity is lower. Canadian clinical practice guidelines recommend venlafaxine as a first-line option for the treatment of depression [32]. Citalopram has a certain level of toxicity, but its half-life is longer than that of the other drugs. Fluoxetine has many side effects, but none of them are serious, and its half-life is the longest of all of the anti-depression drugs. Thus, fluoxetine appears to be the most suitable drug for the treatment of depression in our dataset. In the literature, if a drug has low toxicity, the conditions for which that drug is indicated will be mentioned more in publications than those of other drugs which have similar targets. A high indication score may therefore be related to low toxicity. Conversely, if a drug has high toxicity, its side effects will generally not be shared with other drugs. For example, headache is a side effect, but it is common to many drugs, so it does not raise the side effect score of these drugs in our algorithm. A higher side effect score may be related to higher toxicity. Finally, we divided the side effect score by the indication score to accentuate the difference.

The alternative herbs for obvious side-effect drugs for depression
In Algorithm 2, we used the average proportion of the target herb dosage in the target indication (depression) prescriptions and the average number of elements in the intersection of the side effects of the drug and the indications of the herb's prescriptions. We named this metric 'substitutability'. However, not all herb/disease pairs indicated by co-occurrence in the literature are directly connected in prescriptions. For these herbs, we could not directly calculate the average dosage proportion. In the Chinese Traditional Prescriptions Database (CTPD), there are 57 prescriptions related to depression. After relation extraction, we identified 258 herbs which co-occurred with depression in the literature, of which 97 also co-occurred with depression in prescriptions. For the 161 herbs which are not directly connected in prescriptions, we first calculated the co-occurrence proportion of these herbs and directly connected herbs, as shown in the lower part of Algorithm 2. We then multiplied this measure by the directly connected herb dosage proportion to calculate the indirect herb dosage proportion. In this way, we calculated the substitutability for all pairs of anti-depression drugs and herbs.
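The indirect case might be sketched as below. The text does not define the co-occurrence proportion precisely, so this example assumes it is the share of prescriptions containing the indirect herb that also contain a given direct herb, and that the contributions of all direct herbs are summed. The function name and toy data are hypothetical.

```python
def indirect_dosage_proportion(herb, prescriptions, direct_proportions):
    """Estimate a dosage proportion for a herb with no direct link to the indication.

    prescriptions: iterable of sets of herb names (any indication).
    direct_proportions: average dosage proportion of each direct herb.
    """
    containing = [p for p in prescriptions if herb in p]
    if not containing:
        return 0.0
    total = 0.0
    for direct_herb, d_prop in direct_proportions.items():
        together = sum(1 for p in containing if direct_herb in p)
        # Assumed definition: co-occurrence proportion times direct dosage proportion.
        total += (together / len(containing)) * d_prop
    return total

# Toy data: herb "x" is indirect; "a" and "b" are direct herbs for the indication.
prescriptions = [{"x", "a"}, {"x", "a", "b"}, {"x"}, {"b"}]
direct = {"a": 0.12, "b": 0.05}
d_indirect = indirect_dosage_proportion("x", prescriptions, direct)
```

Under these assumptions, the resulting value would take the place of D(Herb) when the substitutability of an indirect herb is computed with N_indirect.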
In Table 5, the No. 1 herb in the Nefazodone-related list is 藿香 (POGOSTEMON CABLIN BENTH). TCMSP shows that POGOSTEMON CABLIN BENTH contains quercetin, which is used to treat depression, insomnia, asthma, gout, and arthritis. Among these, depression is the indication of Nefazodone, while insomnia, asthma, gout, and arthritis are among its side effects, so the ingredients of POGOSTEMON CABLIN BENTH can treat the side effects of Nefazodone.

Relation of herb-ingredients-target gene or protein-disease(side effects)
As shown in Table 5, the Nefazodone-related herb list includes POGOSTEMON CABLIN BENTH, PERSICAE SEMEN, PERILLA FRUTESCENS, CARTHAMI FLOS, PLATYCLADI SEMEN, RADIX SALVIAE, CORTEX PHELLODENDR, REHMANNIAE RADIX PRAEPARATA, CITRI RETICULATAE PERICARPIUM VIRIDE, and PAEONIAE RADIX ALBA. The ingredients of these top 10 herbs are related to Nefazodone. For the herb ingredients, we selected those with oral bioavailability (OB) greater than 30% and drug-likeness (DL) greater than 0.18. Figure 3 shows the herb-ingredient-target gene or protein-disease relations. In this figure, we show only Nefazodone-related diseases, that is, side effects or indications of Nefazodone.
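The ingredient screening step amounts to a simple threshold filter. In the sketch below the thresholds come from the text, while the record layout, the function name, and the numeric values are hypothetical placeholders rather than TCMSP entries.

```python
def filter_ingredients(ingredients, min_ob=30.0, min_dl=0.18):
    """Keep ingredients with OB (%) above min_ob and DL above min_dl."""
    return [item for item in ingredients
            if item["ob"] > min_ob and item["dl"] > min_dl]

# Placeholder records; the numbers are illustrative only.
candidates = [
    {"name": "ingredient_a", "ob": 46.4, "dl": 0.28},
    {"name": "ingredient_b", "ob": 12.0, "dl": 0.40},  # fails the OB threshold
    {"name": "ingredient_c", "ob": 55.0, "dl": 0.05},  # fails the DL threshold
]
kept = [item["name"] for item in filter_ingredients(candidates)]
```

Both conditions are strict inequalities, matching the "greater than" wording of the selection criteria.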
We chose Nefazodone because it is the highest-ranked drug (Table 4). The relations between herbs and ingredients were matched using TCMSP, while ingredients and diseases were matched using DrugBank. Nefazodone has 206 side effects (in other words, diseases) in SIDER, and 13 of them, including gout, asthma, and migraine, match diseases that the herb ingredients (OB > 30, DL > 0.18) could treat. The diseases shown in Fig. 3 are these side effects. Nefazodone is an anti-depression drug, and target genes/proteins of herb ingredients such as SLC6A2, ADRB2, and MAOA are related to depressive disorder. Stigmasterol from 红花 (CARTHAMI FLOS), ranked 4th for Nefazodone, targets these three genes or proteins, which means that our selected herb not only has the same indication (depression) as the Western drug but could also mitigate the side effects of that drug. Figure 3 illustrates the implication of herb substitutability: herbs with a curative effect similar to that of the Western medicine can also reduce its side effects. Our depression-focused study could be generalized to other cases for discovering complementary compounds from alternative herbal medicines. Table 6 shows the targets of Nefazodone (DrugBank). We used the targets of Nefazodone and of the main active alternative herb ingredients (blue dots in Fig. 3) to create a protein-protein interaction network.  homology, gene co-occurrence, and text mining, we found that this protein could have interactions with ADRA1D, an alpha-adrenergic receptor which mediates its effect through the influx of extracellular calcium. ADRA1D is a target associated with the herb 丹参 (RADIX SALVIAE), which is effective against hypertension. In Fig. 4, MAOA is found to be related to HTR2A and ADRA2A by text mining, and to HTR2C and ADRA1B by co-expression with Nefazodone. MAOA is involved in the breakdown of the neurotransmitter serotonin. Signals transmitted by serotonin regulate mood, emotion, sleep, and appetite, which is relevant to the treatment of depression. MAOA is a target of stigmasterol, an ingredient of the herb 红花 (CARTHAMI FLOS). These relations show that alternative herbs could help the Western drug to improve the effectiveness of the treatment at the gene or protein level.

Enrichment analysis
We used all 221 target genes or proteins of alternative herbs for gene enrichment analysis. Figure 5 shows the enriched Gene Ontology terms for biological process, cellular component, and molecular function, and the KEGG pathways of target genes/proteins from the main active ingredients of alternative herbs, ranked by P-value.
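The text does not state which tool or statistical test produced the P-values. As an illustration only, the sketch below uses a one-sided hypergeometric test, a common choice for over-representation analysis; the function names, annotation sets, and background size are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def rank_terms(targets, annotations, background_size):
    """Return (term, overlap, p_value) tuples sorted by ascending P-value."""
    results = []
    for term, genes in annotations.items():
        overlap = len(targets & genes)
        p = hypergeom_pvalue(overlap, len(targets), len(genes), background_size)
        results.append((term, overlap, p))
    return sorted(results, key=lambda r: r[2])

# Toy data: four targets in a background of ten genes (illustrative only).
targets = {"MAOA", "ADRB2", "SLC6A2", "G4"}
annotations = {
    "term_x": {"MAOA", "ADRB2", "G9"},
    "term_y": {"G4", "G7", "G8"},
}
ranked = rank_terms(targets, annotations, background_size=10)
```

In practice a multiple-testing correction would normally be applied before ranking, and dedicated enrichment tools handle the background gene set and annotation sources.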

The relations of candidate drugs and their alternative herbs
In Table 7, we show the common target proteins between other western drugs (Table 5) and their alternative herbs. The targets of western drugs come from DrugBank. The relations of herbs, ingredients and targets were matched using TCMSP.

Discussions
In this study, we extracted drug and disease information from the PubMed records, and developed an algorithm to identify the drugs with obvious side effects to be replaced. The indications of those drugs were translated and searched in the CNKI to extract herb and disease relationships. We also developed an algorithm to calculate the substitutability of target drugs and target herbs by considering the proportion of the herb in the TCM prescriptions for the conditions for which the drug is indicated, and the side effects produced by the drug. The algorithms themselves are rather simple, but as shown in the case study of depression drug alternatives, the proposed approach could sort out viable candidates from multiple biomedical databases across the western and Chinese literature. Our case study of depression identified 10 alternative herbs for each candidate drug, ranked by our method. We chose the drug with the highest side effect ranking and its related herbs to evaluate our methods. The graph of herb-ingredient-target-disease relations shows that herbs with a curative effect similar to that of the western medicine can also reduce its side effects. The protein-protein interaction network implied that alternative herbs may help the western drug to improve the effectiveness of the treatment at the gene or protein level. Further, the gene enrichment analysis shows the enriched Gene Ontology terms for biological process, cellular component, and molecular function, and the KEGG pathways of targets from the main active ingredients of alternative herbs. With respect to biological process, the following two pathways are ranked in the top 10: the adenylate cyclase-activating adrenergic receptor signaling pathway (GO:0071880) and the adrenergic receptor signaling pathway (GO:0071875). In GO:0071880, activation of adenylyl cyclase increases the cyclic adenosine monophosphate (cAMP) concentration (https://www.ebi.ac.uk/QuickGO/term/GO:0071880).
cAMP is known as a second messenger in cellular activity [33]. Its concentration responds to the binding of an extracellular signal to a receptor on the cell surface, and it regulates the activity of intracellular enzymes and non-enzymatic proteins to carry and amplify signals in cell signal transduction pathways. Drugs acting on adrenergic receptors are used to treat hypertension and asthma, which are side effects of Nefazodone. In cellular component, the GABA-A receptor complex (GO:1902711) ranked in the top 10. Once GABA binds to the receptor, the receptor changes its conformation in the cell membrane. The channel pore opens, and chloride anions can pass down their electrochemical gradient. Activation of the GABA-A receptor thus stabilizes the resting potential. Moreover, it can hyperpolarize the cell, which weakens the depolarizing influence of excitatory neurotransmitters and decreases the likelihood of an action potential. Thus, the role of this receptor is to inhibit and reduce the activity of neurons [34]. Activation of the GABA-A receptor has anxiolytic, anticonvulsant, amnesic, sedative, hypnotic and euphoric effects. In terms of molecular function, G-protein coupled neurotransmitter receptor activity (GO:0099528) is ranked in the top 3. A G protein-coupled receptor (GPCR) activates its associated G protein by catalyzing the exchange of GDP for GTP. The activated G protein then participates in the next step of signaling. The receptor for serotonin is a G-protein-coupled receptor [35]. KEGG Pathways show that the targets of the main active ingredients of alternative herbs mainly act on the AGE-RAGE signaling pathway, which is important in diabetic complications, cancer, and hepatitis.
deoxyneocryptotanshinone, dihydrotanshinlactone, dihydrotanshinone I, epidanshenspiroketallactone, C09092, isocryptotanshinone, Isotanshinone II, miltionone I, Miltirone, neocryptotanshinone ii, 1-methyl-8,9-dihydro-7H-naphtho[5,6-g]benzofuran-6,10,11-trione, salviolone, tanshinone iia, (6S)-6-(hydroxymethyl)-1,6-dimethyl-8,9-dihydro-7H-naphtho[8,7-