Drug repositioning for non-small cell lung cancer by using machine learning algorithms and topological graph theory
© Huang et al. 2016
Published: 11 January 2016
Non-small cell lung cancer (NSCLC) is one of the leading causes of death globally, and research into NSCLC has been accumulating steadily over several years. Drug repositioning is the current trend in the pharmaceutical industry for identifying potential new uses for existing drugs and accelerating the development process of drugs, as well as reducing side effects.
This work integrates two approaches - machine learning algorithms and topological parameter-based classification - to develop a novel pipeline of drug repositioning to analyze four lung cancer microarray datasets, enriched biological processes, potential therapeutic drugs and targeted genes for NSCLC treatments. A total of 7 (8) and 11 (12) promising drugs (targeted genes) were discovered for treating early- and late-stage NSCLC, respectively. The effectiveness of these drugs is supported by the literature, experimentally determined in-vitro IC50 and clinical trials. This work provides better drug prediction accuracy than competitive research according to IC50 measurements.
With the novel pipeline of drug repositioning, the discovery of enriched pathways and potential drugs related to NSCLC can provide insight into the key regulators of tumorigenesis and the treatment of NSCLC. Based on the verified effectiveness of the targeted drugs predicted by this pipeline, we suggest that our drug-finding pipeline is effective for repositioning drugs.
Keywords: Non-small cell lung cancer; Drug repositioning; Microarray data analysis; Machine learning algorithm; Topological parameters; Protein-protein interactions; Enrichment analysis; Connectivity Map
Lung cancer is the leading cause of cancer-related death globally, and non-small cell lung cancer (NSCLC) accounts for more than 85 % of all lung cancer cases; adenocarcinoma is the most common subtype. Many efforts have been made to develop treatments for NSCLC, and they depend on finding suitable drugs within a practical time frame and at reasonable cost.
Drug repositioning, which seeks new indications for drugs that the Food and Drug Administration (FDA) has already approved and whose side effects are known, has become a major trend and has seen some success. Stachnik et al. showed that bisphosphonates can potentially be repurposed for the prevention and adjunctive therapy of HER1-driven cancers (such as NSCLC and breast cancers). Having constructed a drug-disease bipartite network, Chen et al. utilized two inference methods, ProbS and HeatS, to predict direct drug-disease associations based on node degree in the network. Lee et al. integrated the shared neighborhood scoring algorithm with a database of disease indications, drug development, and associated proteins, to identify new indications for known FDA-approved drugs. In earlier studies based on protein-protein interaction (PPI) communities, we established a systematic strategy for identifying potential drugs and target genes for treating NSCLC, which can be extended in several respects that are addressed in the present study. Those two previous studies did not use the four machine learning features that are used herein, which were proposed in our 2015 work on the prediction of cancer proteins.
Machine learning methods and the topological properties of biological networks have been used separately to identify cancer-related genes. For example, Bull et al. utilized proteins’ hydrophobicities, in vivo half-lives, propensity for being membrane-bound and the fraction of non-polar amino acids as features in the Random Forest classifier to predict drug targets. Carson et al. utilized topological metrics, such as betweenness centrality, neighborhood connectivity and radiality, as features and used an alternating decision tree (ADTree) classifier to identify disease-associated genes. Many works on identifying repositioned drugs have been based on various computational methods, such as mapping gene expression profiles using drug response profiles, the use of side-effect-based similarities, heterogeneous network clustering, and graph-based inference methods. However, most of these methods are either disease-centric or drug-centric. To the best of the authors’ knowledge, few works have addressed the problem of drug repositioning by integrating machine learning methods, graph theory and meta-analysis. This work integrates two state-of-the-art methods, machine learning and the graph-theoretic analysis of topological properties, to develop a new pipeline to identify potential therapeutic drugs and targeted genes for treating NSCLC.
In solving the targeted drug problem, the following issues must be addressed. First, different individuals may correspond to different sets of differentially expressed genes. Second, cancer is a heterogeneous disease: different stages of cancer require different drug targets and involve stage-specific cancer-associated genes. Third, the results of microarray profiling vary from study to study and a rigorous method is required to solve this problem. Fourth, the reliability of drug finding remains to be verified.
This study deals with the above four issues. First, to reduce the effect of biological heterogeneity among different individuals, tumor/adjacent non-tumor pairwise arrays for NSCLC were used, allowing pairwise statistical testing. Second, the samples were grouped into early-stage and late-stage samples. Third, meta-analysis was carried out to integrate multiple microarray profiles and results. Finally, potential drugs were validated by performing biochemical assays and with reference to the literature.
Microarray data for lung cancer were first separated into early- and late-stage data. Paired tests (based on normal and cancer tissues from the same patient) were performed to identify differentially expressed genes (DEGs). The Robust Multi-array Average (RMA) method was used to normalize gene expression, and eBayes analysis was then performed on the normalized data. DEGs were selected using an adjusted p-value threshold of 0.05. The selected DEGs were divided into two groups, an up-regulated group and a down-regulated group, based on the fold-changes (FC) in gene expression. These DEGs were then filtered separately by machine learning classifiers and by graph theory, yielding two corresponding sets of key genes. Gene set enrichment analysis and pathway analysis were conducted on the two sets of key genes, and drug-gene interaction databases and the Connectivity Map (cMap) were used to identify potential drugs (with cMap p-value <0.1 and enrichment score <0) for treating NSCLC. The common enriched pathways and drugs that were returned by both machine learning algorithms and the classification of topological parameters were further investigated. The predictions of targeted drugs were confirmed by IC50 experiments, a review of the literature and clinical trials. Finally, the targeted genes were prioritized for reference.
Table: Summary of microarray datasets. Columns: data source; number of samples (early-stage); number of samples (late-stage). Data sources: Taipei Veterans General Hospital; National Cancer Institute, NIH; National Taiwan University; National Yang Ming University.
To reduce the effect of biological heterogeneity among individuals when integrating the datasets, normal and cancer tissues were taken from each patient. Paired tests (on normal and cancer tissues taken from the same patient) were performed to identify differentially expressed genes (DEGs). Samples were divided into early- and late-stage samples. Early-stage samples were taken from patients with stage I, IA or IB cancer, while late-stage samples were obtained from patients with stage III or IV cancer.
Microarray data analysis
In this study, the publicly available microarray data analysis package Bioconductor was used to identify DEGs among a large number of gene expression profiles. Based on whether the base-2 logarithm of the fold-change (FC) in gene expression, log2FC, was greater than or less than zero, the selected DEGs were divided into two groups: up-regulated (up probes in Fig. 1) and down-regulated (down probes in Fig. 1), respectively. The FC value of any gene expression level with a fold change value of less than 5.64 was set to 5.64 to facilitate the cMap search.
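The selection and splitting of DEGs described above can be illustrated with a minimal Python sketch. It assumes the Benjamini-Hochberg procedure for the adjusted p-value (the text says only "adjusted p-value") and takes per-gene raw p-values and log2FC values as given; it does not reproduce the Bioconductor analysis, and the function names and gene labels are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here only as a common choice of adjustment.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def split_degs(genes, log2fc, pvalues, alpha=0.05):
    """Keep genes with adjusted p < alpha and split them by the sign of log2FC."""
    adjusted = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, lfc, q in zip(genes, log2fc, adjusted):
        if q >= alpha:
            continue  # not significant after adjustment
        if lfc > 0:
            up.append(gene)
        elif lfc < 0:
            down.append(gene)
    return up, down


# Hypothetical toy input: four genes with made-up statistics.
genes = ["G1", "G2", "G3", "G4"]
log2fc = [2.0, -1.5, 0.3, -0.2]
pvalues = [0.001, 0.01, 0.04, 0.5]
up_genes, down_genes = split_degs(genes, log2fc, pvalues)
print(up_genes, down_genes)
```

In this toy input only G1 and G2 pass the 0.05 threshold after adjustment, so they form the up- and down-regulated groups respectively.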
Machine learning algorithms
In our previous study, we developed a simple and effective machine learning method, based on domain-domain interactions (DDI), the weighted domain frequency score (DFS) and cancer linker degree (CLD) data, to predict cancer proteins. The one-to-one interaction model was used to quantify the likelihood that a DDI is cancer-specific; the weighted DFS feature measures the propensity of a domain to be present in cancer and non-cancer proteins; and the CLD feature identifies the partners with which cancer and non-cancer proteins interact. The machine learning algorithms were implemented in the Weka software tool, and ten-fold cross-validation was used to train and evaluate the supervised model. Our previous studies indicate that a balanced dataset typically provides better performance than an unbalanced one, so the machine learning algorithms were trained using positive and negative datasets of equal size.
Experimental results revealed that the proposed machine learning method identified cancer proteins with relatively high hit ratios (about 80 %). Five classifiers were used to identify potential cancer genes: the three with the highest F1 values (the LMT, SimpleCart and J48 algorithms) and the two with the highest AUC values (the LWL and Ridor algorithms). Strictly unanimous voting was applied, meaning that a protein was considered a cancer protein only if all five classifiers predicted it to be one. In the machine learning approach, the up- (down-) regulated DEGs are processed individually for each microarray dataset.
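The unanimous voting rule can be sketched in a few lines of Python. The classifier names come from the text; the protein identifiers and the function name are hypothetical, and the sketch assumes that each classifier's positive predictions are already available as a set.

```python
def unanimous_vote(predictions):
    """Return the proteins that every classifier predicts to be cancer proteins.

    predictions maps a classifier name to the collection of protein IDs
    that the classifier labels as cancer proteins.
    """
    predicted_sets = [set(ids) for ids in predictions.values()]
    if not predicted_sets:
        return set()
    return set.intersection(*predicted_sets)


# Hypothetical predictions from the five classifiers named in the text.
predictions = {
    "LMT": {"P1", "P2", "P3"},
    "SimpleCart": {"P1", "P2"},
    "J48": {"P1", "P2", "P4"},
    "LWL": {"P1", "P2", "P3"},
    "Ridor": {"P1", "P2"},
}
consensus = unanimous_vote(predictions)
print(sorted(consensus))
```

Because a single dissenting classifier removes a protein, this rule favors precision over recall, which matches the "strictly" unanimous wording above.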
Classification of topological parameters
Topological features provide valuable information for identifying crucial genes and clusters in a biological network. Recently, we proposed identifying the critical nodes of a network using topological parameters, which are classified into five groups: group 1: degree centrality; group 2: betweenness centrality; group 3: bridging centrality; group 4: closeness centrality and eccentricity centrality; group 5: clustering coefficient, brokering coefficient and local average connectivity. This classification enables nodes to be ranked by their topological importance in the networks. To apply topological parameter classification in this study, the up- (down-) regulated DEGs common to the microarray datasets must first be extracted. Next, for early- and late-stage NSCLC, the corresponding up- (down-) regulated network was constructed from the common DEGs of all microarray datasets and their neighbors in protein-protein interactions. The up- and down-regulated networks for early- and late-stage NSCLC are the inputs for the topological parameter-based classification.
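To make a few of these parameters concrete, the following Python sketch computes degree, closeness centrality and the clustering coefficient on a small undirected network stored as an adjacency dictionary. It is illustrative only: the toy network is hypothetical, closeness is taken as (number of reachable nodes) / (sum of shortest-path distances), which is one common definition, and the remaining parameters (betweenness, bridging, eccentricity, brokering coefficient, local average connectivity) are not covered.

```python
from collections import deque


def degree(adj, node):
    """Number of interaction partners of a node."""
    return len(adj[node])


def closeness(adj, node):
    """Closeness centrality from breadth-first shortest-path distances."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in adj[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbors that are themselves connected."""
    neighbors = list(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in adj[neighbors[i]]
    )
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy PPI network: a triangle A-B-C with D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
for protein in sorted(adj):
    print(protein, degree(adj, protein),
          round(closeness(adj, protein), 3),
          round(clustering_coefficient(adj, protein), 3))
```

In this toy network, node C has the highest degree and closeness, while A and B have the highest clustering coefficient, which shows why different parameter groups can rank the same nodes differently.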
Enrichment analysis of gene set
Given a gene list, DAVID performs batch annotation and GO term enrichment analysis to highlight the most relevant GO terms. In contrast, the ConsensusPathDB (CPDB) resource performs gene set analysis and metabolite set analysis. To find the enriched pathways of the proposed genetic signature for NSCLC, an over-representation pathway analysis was performed using both DAVID and CPDB with a p-value threshold of 0.05. Significant pathways were ranked by p-value. Both tools were utilized in this stage for cross-verification.
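The idea behind an over-representation test can be illustrated with a plain hypergeometric tail probability. This is a sketch under the assumption of a simple hypergeometric model; the statistics that DAVID and CPDB actually compute may differ in detail, and the function name and the numbers are hypothetical.

```python
from math import comb


def over_representation_p(background, pathway_size, selected, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(background, pathway_size, selected).

    background   : number of genes in the background set
    pathway_size : number of background genes annotated to the pathway
    selected     : number of key genes submitted
    overlap      : number of key genes annotated to the pathway
    """
    upper = min(pathway_size, selected)
    total = comb(background, selected)
    tail = sum(
        comb(pathway_size, x) * comb(background - pathway_size, selected - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 10 background genes, 4 in the pathway,
# 3 key genes submitted, 2 of which fall in the pathway.
p_value = over_representation_p(10, 4, 3, 2)
print(round(p_value, 4))
```

A pathway would be reported as enriched when this p-value falls below the chosen threshold (0.05 in the text), usually after correction for the number of pathways tested.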
Potential target genes and drug discovery
The two sets of key genes that were obtained using machine learning algorithms and topological parameter-based classification were each grouped into up- and down-regulated genes to query the cMap database, and potential drugs with p-values of less than 0.05 were retained. The drugs that were output by cMap were then mapped to known drug targets in the up- or down-regulated cancer PPI network.
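The filtering of cMap output can be sketched as follows. The sketch assumes that cMap results have already been parsed into a mapping from drug name to (enrichment score, p-value); it retains drugs with a negative enrichment score and a p-value below a threshold, and then intersects the hits from the two approaches. The drug names, values and function names are hypothetical.

```python
def negative_hits(cmap_results, p_threshold):
    """Drugs with enrichment score < 0 and p-value < p_threshold."""
    return {
        drug
        for drug, (enrichment_score, p_value) in cmap_results.items()
        if enrichment_score < 0 and p_value < p_threshold
    }


def common_drugs(ml_results, topo_results, p_threshold=0.05):
    """Drugs retained by both the machine learning and topological key gene sets."""
    return negative_hits(ml_results, p_threshold) & negative_hits(topo_results, p_threshold)


# Hypothetical cMap output for the two sets of key genes.
ml_results = {"drug_a": (-0.8, 0.01), "drug_b": (0.7, 0.01), "drug_c": (-0.5, 0.20)}
topo_results = {"drug_a": (-0.6, 0.03), "drug_c": (-0.9, 0.01)}
print(sorted(common_drugs(ml_results, topo_results)))
```

A negative enrichment score is kept because it suggests that the drug-induced expression signature opposes the disease signature.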
Combining datasets raises some issues, such as data heterogeneity, varying sample sizes, and data dependence. In principle, these issues can be resolved using meta-analysis. Meta-analysis is a set of statistical methods for summarizing the results of several investigations as a single value. The advantage of meta-analysis is that it can identify relationships across many studies.
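The p-values are combined across datasets; in its standard form (Fisher's combined probability test), the statistic is
\[ F_i = -2 \sum_{j=1}^{N} \ln p_{ij} \]
where F_i is tested against a χ² distribution with 2N degrees of freedom (N being the number of datasets combined) under the null hypothesis for gene i, and p_{ij} is the p-value of the ith gene in the jth dataset.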
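A minimal Python sketch of Fisher's combined probability test is given below, assuming that the statistic takes the standard form F = −2 Σ ln p. It relies on the closed-form upper tail of the χ² distribution for an even number of degrees of freedom, so no external library is needed; the function name and input values are hypothetical.

```python
from math import exp, log


def fisher_combined(pvalues):
    """Combine independent p-values with Fisher's method.

    Returns the statistic F = -2 * sum(ln p) and its p-value under a
    chi-square distribution with 2 * len(pvalues) degrees of freedom.
    All p-values must be strictly greater than zero.
    """
    k = len(pvalues)
    statistic = -2.0 * sum(log(p) for p in pvalues)
    half = statistic / 2.0
    # For 2k degrees of freedom: P(X >= x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, series = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        series += term
    return statistic, exp(-half) * series


# Hypothetical p-values for one item in two datasets.
statistic, combined_p = fisher_combined([0.05, 0.05])
print(round(statistic, 3), round(combined_p, 4))
```

Two borderline p-values of 0.05 combine into a smaller p-value, which illustrates how consistent evidence across datasets is strengthened by the combination.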
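The effect size is converted by Fisher's z-transformation, which in its standard form for a correlation coefficient r is
\[ z = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right) \]
and the variance of z is defined as V_z = 1 / (N − 3), where N is the sample size. The variance of z is thus approximately inversely proportional to N − 3 (as proved by R. A. Fisher) and is independent of the value of the correlation in the population from which the sample is drawn.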
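In the standard fixed-effect formulation, the 95 % confidence limits of the summary effect M are
\[ M \pm 1.96 \times SE_M \]
where SE_M is the standard error of the summary effect and equals $\sqrt{1 / \sum_{i=1}^{k} W_i}$, with W_i the weight of the ith of k studies.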
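The fixed-effect calculation can be sketched in Python as follows. The sketch assumes that each effect size is a correlation-like quantity in (−1, 1), applies Fisher's z-transformation, weights each study by the inverse variance W = N − 3, and reports a 95 % confidence interval; whether the original analysis used exactly these conventions is an assumption, and the function name and inputs are hypothetical.

```python
from math import atanh, sqrt, tanh


def fixed_effect_summary(effects, sample_sizes):
    """Fixed-effect summary of correlation-like effect sizes.

    effects      : effect sizes in (-1, 1), one per study
    sample_sizes : sample size N of each study (N > 3)
    Returns (summary effect, (lower, upper) 95 % limits, SE on the z scale).
    """
    z_values = [atanh(r) for r in effects]      # z = 0.5 * ln((1 + r) / (1 - r))
    weights = [n - 3 for n in sample_sizes]     # W = 1 / V_z = N - 3
    total_weight = sum(weights)
    summary_z = sum(w * z for w, z in zip(weights, z_values)) / total_weight
    se = sqrt(1.0 / total_weight)
    lower, upper = summary_z - 1.96 * se, summary_z + 1.96 * se
    # Back-transform from the z scale to the original scale.
    return tanh(summary_z), (tanh(lower), tanh(upper)), se


# Hypothetical effect sizes from two studies of 13 samples each.
summary, interval, se = fixed_effect_summary([0.5, 0.5], [13, 13])
print(round(summary, 3), tuple(round(x, 3) for x in interval), round(se, 4))
```

Working on the z scale keeps the variance independent of the effect size, which is the property attributed to Fisher in the text.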
The formula for the random-effects model can be found in the monograph by Borenstein. The above analyses enable the confidence interval of the ES to be determined.
The meta-analysis involves two models: the fixed-effect model and the random-effect model. In the fixed-effect model, only one true effect size is assumed to exist, and all differences among studies or batches are assumed to be caused by sampling errors only. In contrast, the random-effect model allows the effect size to vary among studies, and allows an effect size to be estimated for each study. This work considers both models.
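Heterogeneity among studies is assessed with the Q statistic and the I² statistic, which in their standard forms are
\[ Q = \sum_{i=1}^{k} W_i (Y_i - M)^2, \qquad I^2 = \frac{Q - (k - 1)}{Q} \times 100\,\% \]
where k, W, Y and M are the number of studies, the study weight, the size of the effect of interest in the study and the summary effect, respectively.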
A p-value of 0.1 for the I² statistic is used as the threshold for statistical significance. A p-value larger than or equal to 0.1 indicates little variation among batches, so a fixed-effect model may be appropriate; otherwise, the random-effect model applies. The I² value represents the degree of heterogeneity: an I² of less than 25 % implies no heterogeneity, whereas a value larger than 75 % indicates extremely high heterogeneity.
If the studies are homogeneous, then they are likely to have tested the same hypothesis. If the estimates are heterogeneous, then the studies probably did not test the same hypothesis, and it may not be possible to combine all of the study results in a single meta-analysis. In such a case, a separate analysis, such as a meta-regression analysis, must be performed for various subsets of studies.
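The model choice described above can be sketched in Python. The sketch assumes the standard definitions Q = Σ W (Y − M)² and I² = (Q − df) / Q × 100 % with df = k − 1, and it takes the heterogeneity p-value from the χ² distribution of Q; these conventions, the function names and the inputs are assumptions made for illustration.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        tail, d = exp(-half), 2          # exact tail for 2 degrees of freedom
    else:
        tail, d = erfc(sqrt(half)), 1    # exact tail for 1 degree of freedom
    # Recurrence: sf(x; d + 2) = sf(x; d) + (x/2)^(d/2) * exp(-x/2) / Gamma(d/2 + 1)
    while d < df:
        tail += half ** (d / 2.0) * exp(-half) / gamma(d / 2.0 + 1.0)
        d += 2
    return tail


def heterogeneity(effects, weights, alpha=0.1):
    """Return (Q, I2 in percent, p-value, suggested model) for k >= 2 studies."""
    k = len(effects)
    if k < 2:
        raise ValueError("at least two studies are required")
    summary = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - summary) ** 2 for w, y in zip(weights, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    p_value = chi2_sf(q, df)
    model = "fixed" if p_value >= alpha else "random"
    return q, i2, p_value, model


# Hypothetical inputs: identical effects versus strongly differing effects.
print(heterogeneity([0.2, 0.2, 0.2], [10, 10, 10]))
print(heterogeneity([0.0, 1.0], [10, 10]))
```

Identical effects give Q = 0 and point to the fixed-effect model, whereas widely differing effects give a large I² and point to the random-effect model.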
MTT™ cell viability test
To determine the cytotoxicity of the screened drugs, the MTT assay was used to measure cell viability and proliferation. The cancer cell lines (A549 and H460) were seeded in 96-well microplates and incubated for up to 24 h, depending on the baseline growth rate. After incubation, candidate drugs were added to the plate and incubated with the cells for 72 h. To perform the assay, 50 μl of MTT solution (2 mg/ml) was added to each well and incubated at 37 °C for 2 h. Then 150 μl of supernatant was removed from each well and DMSO was added to dissolve the precipitate. Absorbance was measured at 570 nm using an ELISA reader (Infinite® M1000, TECAN, Switzerland). The decrease in viability relative to the control group (taken as 100 % viable) was regarded as the inhibitory effect.
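The viability calculation described above amounts to simple arithmetic, sketched here in Python. The sketch assumes that viability is the ratio of treated to control absorbance at 570 nm with the control taken as 100 %; blank subtraction is not mentioned in the text and is therefore not included, and the absorbance values and function names are hypothetical.

```python
def percent_viability(treated_absorbance, control_absorbance):
    """Viability of treated cells relative to the control (control = 100 %)."""
    if control_absorbance <= 0:
        raise ValueError("control absorbance must be positive")
    return 100.0 * treated_absorbance / control_absorbance


def percent_inhibition(treated_absorbance, control_absorbance):
    """Inhibitory effect, defined as the decrease in viability from 100 %."""
    return 100.0 - percent_viability(treated_absorbance, control_absorbance)


# Hypothetical absorbance readings at 570 nm.
viability = percent_viability(0.45, 0.90)
inhibition = percent_inhibition(0.45, 0.90)
print(round(viability, 1), round(inhibition, 1))
```

An IC50 would then be estimated as the drug concentration at which this viability falls to 50 %, typically by fitting a dose-response curve over several concentrations.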
Two highly clonogenic lung cancer cell lines, A549 and H460, were used to perform the clonogenic assay. Cells were diluted to 500 cells per well and seeded in 6-well plates for up to 10 days, according to the growth rate. Each well contained 1.5 ml of RPMI medium, and the screening compounds were added 24 h after seeding. For longer incubations, the medium and compounds were changed every 4 days. To perform the assay, cells were washed with PBS, and the attached colonies were fixed with acetic acid (diluted 1:3 in methanol). The fixed colonies were stained with 0.5 % crystal violet. After the excess crystal violet had been removed by rinsing with tap water, the colonies were counted manually.
Microarray data analysis
In this study, multiple microarray source data were used for analysis. The Robust Multi-array Average (RMA) was used to normalize gene expression. DEGs were predicted using an adjusted p-value of 0.005. Integrating the DEG data with the BioGrid PPI data yielded a list of binary interactions among DEGs for both the up and down groups.
The use of various microarray platforms may raise the problem of heterogeneity, which can be tackled in the following two steps: (i) select the DEGs common to all platforms for further analysis, and (ii) perform meta-analysis and test heterogeneity to determine whether the fixed-effect model or the random-effect model should be used.
Results of machine learning
Table: The number of DEGs derived from the machine learning method for each microarray dataset. Columns: number of DEGs; number of predicted key genes.
Results of topological parameter-based classification
To identify key genes in the up- and down-regulated networks, the following process was implemented. For each group of DEGs classified by a topological parameter, a DEG that ranks in the top 20 % for that parameter receives a score (S) of one. A higher score for a DEG therefore indicates greater importance in the network. The DEGs with the highest scores in each group are selected as key genes. The key genes are the union of the two sets of highest-scoring DEGs in the up- and down-regulated networks. In this work, this stage yielded 104 and 123 key genes for early- and late-stage NSCLC, respectively. Restricting the selection to the top 10 % rather than the top 20 % yields only 41 and 56 key genes for early- and late-stage NSCLC, respectively. Relaxing the threshold to 30 % yields 170 and 200 key genes, respectively, which are too many; therefore, the top 20 % of classified genes were chosen as key genes.
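One possible reading of this scoring scheme is sketched below in Python. It assumes that each parameter group supplies a single value per gene, that larger values mean greater importance, and that a gene's score is the number of groups in which it ranks in the top fraction; the exact rule used in the study may differ, and all names and values are hypothetical.

```python
from math import ceil


def top_fraction(values, fraction=0.2):
    """Genes ranked in the top fraction (at least one gene) by value."""
    cutoff = max(1, ceil(len(values) * fraction))
    ranked = sorted(values, key=values.get, reverse=True)
    return set(ranked[:cutoff])


def topological_scores(groups, fraction=0.2):
    """Score each gene by the number of groups in which it is in the top fraction."""
    scores = {}
    for values in groups.values():
        for gene in top_fraction(values, fraction):
            scores[gene] = scores.get(gene, 0) + 1
    return scores


# Hypothetical parameter values for five genes in three parameter groups.
groups = {
    "degree": {"A": 5, "B": 3, "C": 2, "D": 1, "E": 1},
    "betweenness": {"A": 0.9, "B": 0.1, "C": 0.5, "D": 0.0, "E": 0.0},
    "closeness": {"A": 0.4, "B": 0.8, "C": 0.5, "D": 0.3, "E": 0.2},
}
scores = topological_scores(groups)
best = max(scores.values())
key_genes = sorted(gene for gene, score in scores.items() if score == best)
print(scores, key_genes)
```

Changing `fraction` to 0.1 or 0.3 mirrors the sensitivity check in the text, where a stricter threshold yields fewer key genes and a looser one yields more.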
Enriched biological pathways
In the machine learning method, the selected DEGs are microarray-specific. Common DEGs were collected from all microarray datasets as the key genes for biological pathway analysis. The key genes that were selected by topological parameter-based classification of genes in up- and down-regulated networks are merged into a single set. The two sets of key genes from the different approaches are submitted to DAVID and CPDB to extract the common enriched biological pathways.
Table: The common pathways obtained using DAVID and CPDB for early-stage NSCLC (the p_M-value and p_T-value are the p-values obtained by machine learning algorithms and topological parameter-based classification, respectively). Columns: pathway; p_M-value; p_T-value. Pathways listed: Hematopoietic cell lineage; Regulation of PLK1 Activity at G2/M Transition; Metabolism of nucleotides; Cell junction organization; Platelet activation, signaling and aggregation.
In REACTOME, cell-cell communication, glucose metabolism, regulation of PLK1 activity at the G2/M transition, metabolism of nucleotides, organization of the cell junction and platelet activation, signaling and aggregation are enriched pathways for early NSCLC. Of them, glucose metabolism is similar to glycolysis/gluconeogenesis and has been previously determined to be related to cancers. Tominaga et al. demonstrated that cancer-derived extracellular vesicles (EVs), which are mediators of cell–cell communication, trigger the breakdown of the blood–brain barrier, which controls the migration of cancer cells. Aird and Zhang proposed that nucleotide metabolism causes tumor progression, and considered how this pathway can be targeted for cancer therapy by inducing the senescence of cancer cells. Several cell junction components have functions that are associated with cell polarity and growth control and are specifically disrupted in cancerous cells. PLK1 seems to be involved in the tumor suppressor p53-related pathways. Evidence suggests that PLK1 inhibits the transactivation and pro-apoptotic functions of p53 by physical interaction and phosphorylation. Additionally, in cancer growth and dissemination, complex interactions between tumor cells and circulating platelets are critical. Evidence supports a role for physiological platelet receptors and platelet agonists in cancer metastases and angiogenesis.
Table: The common pathways obtained using DAVID and CPDB for late-stage NSCLC (the p_M-value and p_T-value are the p-values obtained by machine learning algorithms and topological parameter-based classification, respectively). Columns: pathway; p_M-value; p_T-value.
Pathways listed, first group: Cell Cycle Checkpoints; Cell Cycle, Mitotic.
Pathways listed, second group: Regulation of mitotic cell cycle; Inflammatory mediator regulation of TRP channels; APC/C:Cdc20 mediated degradation of mitotic proteins; Activation of APC/C and APC/C:Cdc20 mediated degradation of mitotic proteins; Thyroid hormone synthesis; Cell Cycle Checkpoints; Regulation of APC/C activators between G1/S and early anaphase; cGMP-PKG signaling pathway; Vascular smooth muscle contraction; Cdc20:Phospho-APC/C mediated degradation of Cyclin A; Cell Cycle, Mitotic; APC:Cdc20 mediated degradation of cell cycle proteins prior to satisfaction of the cell cycle checkpoint; Mitotic G1-G1/S phases; Resolution of Sister Chromatid Cohesion; Apoptotic cleavage of cellular proteins; Mitotic Metaphase and Anaphase; Apoptotic execution phase; Synthesis of DNA; Separation of Sister Chromatids.
DNA replication, repair and checkpoint activation pathways are highly regulated and coordinated. Defects in any of these functions cause genomic instability and may lead to cancer. For example, BRCA2 participates in homologous recombination and in regulating the S-phase checkpoint, and mutations or deficiencies in BRCA2 are strongly associated with tumorigenesis.
Table 4 agrees closely with the results of our previous work, which also identified the cell cycle, the mitotic anaphase, DNA replication, the sister-chromatid segregation process, the Cdc20:Phospho-APC/C-mediated degradation of Cyclin A, the M-phase and mitotic G1-G1/S phases.
Although defective apoptosis is critical to the development and progression of cancer, apoptosis is also important in the treatment of cancer, as it is a popular target of many treatment strategies.
Wong et al. noted that PKG-Iα kinase activity is necessary for maintaining high levels of cAMP response element binding (CREB) phosphorylation at Ser133, and promotes the formation of colonies in NSCLC cells. The gene expression signature of the responses of vascular smooth muscle contraction to serum exposure is associated with a significantly poorer prognosis in cases of human cancer, and vascular injury response is therefore potentially linked to tumor progression.
According to Table 4, the mitotic process and CDC20 are involved in many enriched pathways. Mitotic progression and sister-chromatid segregation are controlled by the anaphase promoting complex/cyclosome (APC/C). APC/C forms a protein complex with its mitotic co-activator, CDC20, which controls mitotic progression. The CDC20 protein level may directly influence the fate of cells during prolonged mitotic arrest, and its turnover rate may critically affect the response of a cancer patient to anti-mitotic therapies.
In summary, combining machine learning methods with the classification of topological parameters reveals many cancer related pathways, which are well supported by the literature, providing insight into key regulators of the tumorigenesis of NSCLC.
Potential drugs for treating NSCLC and their targeted genes
Table: The number of potential drugs filtered by meta-analysis for early- and late-stage NSCLC using the enrichment score (ES) and cMap p-value (less than 0.1). Filter criteria compared: ES < 0 & cMap p-value < 0.1; ES < 0 & cMap p-value < 0.5; ES < 0 & p_MA-value < 0.05; ES < 0 & p_MA-value < 0.1.
Table: The number of common drugs and JI score for early- and late-stage NSCLC using the enrichment score (ES) and cMap p-value (less than 0.1) for meta-analysis. Filter criteria compared: ES < 0 & p_MA-value < 0.1; ES < 0 & cMap p-value < 0.1.
Table: IC50 values of potential drugs for early-stage NSCLC. Sections: machine learning algorithms; topological parameter-based classification. Column: cMap drug name.
Table: IC50 values of potential drugs for late-stage NSCLC. Sections: machine learning algorithms; topological parameter-based classification. Column: cMap drug name.
Table: The common drugs identified by both methods
Some of the above common drugs have been undergoing clinical trials for NSCLC treatment, including mepacrine (clinical trial NCT01839955), MS-275 (clinical trial NCT02437136) and vorinostat (clinical trial NCT00667082). The results of this study are consistent with our previous work; both studies identified nine drugs, three of which had cytotoxic effects that were validated by IC50 experiments. These three drugs are trichostatin A, vorinostat and nortriptyline. Their potential use in lung cancer treatment warrants further exploration. Notably, the previous work treated early-stage and late-stage NSCLC on the same footing and was therefore not stage-specific.
The machine learning approach has a hit ratio similar to that of the topological parameter-based approach (early-stage: 8/60 vs. 2/17; late-stage: 5/49 vs. 5/37), as supported by in vitro IC50 measurements. Combining the machine learning approach with the topological parameter-based classification yielded the best hit ratio. The current method has a higher prediction accuracy (early-stage: 2/7 vs. 1/7; late-stage: 2/11 vs. 7/65) than the previously published method, consistent with the IC50 measurements.
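The hit ratios quoted above are easier to compare as decimals. The short Python sketch below converts the fractions given in the text; the pairing of labels with fractions follows the order in which the text lists them, and the helper name is hypothetical.

```python
from fractions import Fraction


def hit_ratio(hits, tested):
    """Fraction of tested drugs that were confirmed by IC50 measurement."""
    return Fraction(hits, tested)


# Fractions as quoted in the text, in the order given there.
ratios = {
    "early, machine learning": hit_ratio(8, 60),
    "early, topological": hit_ratio(2, 17),
    "late, machine learning": hit_ratio(5, 49),
    "late, topological": hit_ratio(5, 37),
    "early, combined": hit_ratio(2, 7),
    "early, previous method": hit_ratio(1, 7),
    "late, combined": hit_ratio(2, 11),
    "late, previous method": hit_ratio(7, 65),
}
for label, ratio in ratios.items():
    print(f"{label}: {float(ratio):.3f}")
```

The combined approach gives ratios of about 0.29 (early-stage) and 0.18 (late-stage), compared with about 0.10 to 0.14 for each single approach and for the previous method.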
Table: The targeted genes identified by the common drugs derived from both methods (the numbers in parentheses are the numbers of associated cMap drugs)
A concern arises regarding how the p-values that are obtained by different methods are combined. In fact, only the p-values and the enrichment scores (ES) that were obtained from cMap are combined in meta-analysis. Please refer to the workflow in Fig. 1. Four p-values were obtained by (1) identification of DEGs, (2) gene set enrichment analysis, (3) cMap drug analysis and (4) meta-analysis of cMap drugs.
The p-values that were obtained in the DEG analysis are used to identify significant DEGs. Also, the p-values that were obtained in (2) and (3) are not related to each other, and they do not have to be combined. Since different microarray datasets yielded different drug predictions, meta-analysis was conducted using the cMap p-values and ES to achieve results in which confidence is high.
Some missense mutations and non-synonymous SNPs (nsSNPs) may damage protein functions, disrupting the drug actions. Our future work will account for this effect. Numerous web-based tools are available to facilitate such analysis. PolyPhen2 is a tool that predicts the impact of an amino acid substitution on the function and structure of a protein using sequence-based and structure-based features. SNPdryad is a web-based tool that elucidates the effect of nsSNPs based on multiple sequence alignments of orthologous proteins. MutationTaster is another tool that uses NGS data to elucidate the effect of missense mutations on the expression and function of proteins.
In this study, two methods - machine learning algorithms and topological parameter-based classification - are compared and combined to identify potential reliable drugs for treating NSCLC, and meta-analysis is used to solve the problem of data heterogeneity. Since cancer is a multistage progressive disease, early- and late-stage cancer-related genes potentially differ substantially. Therefore, the proposed method was used to identify stage-specific DEGs, biological pathways and potential drugs. Some of the extracted biological pathways are supported by the literature, and some of the results herein concerning the identified drugs are supported by IC50 experiments. Seven and 11 potential drugs are discovered for treating early- and late-stage NSCLC, respectively, and warrant further investigation. Among them, perhexiline and trichostatin A are supported by the previous research. Interestingly, the UBC gene dominates all of the targeted genes associated with early- and late-stage NSCLC, so its role in the cancer pathway warrants further investigation.
Integrating machine learning algorithms and topological parameter-based classification herein increased drug prediction accuracy over that achieved in any previous research. This improvement is confirmed by IC50 experiments. The overlap of our discovered drug candidates with those that are undergoing clinical trials or are identified in the literature demonstrates the effectiveness of the proposed methods. The performance of the proposed methods can be further improved by incorporating more microarray datasets or verified gene-drug associations. In summary, many techniques were integrated to develop a novel pipeline of therapeutic drugs for NSCLC, and the efficiency of this pipeline was investigated. The approaches that were developed in this work are expected to inspire future studies, and the pipeline may be extended to the treatment of other diseases.
The work of Chien-Hung Huang is supported by the Ministry of Science and Technology of Taiwan (MOST) under grant MOST 104-2221-E-150-039. The work of Ka-Lok Ng is supported by MOST under grants MOST 102-2632-E-468-001-MY3 and MOST 104-2221-E-468-012, and also supported by Asia University under the grants 103-asia-06. The work of Peter Mu-Hsin Chang is supported by MOST under grant MOST 103-2314-B-075-025 and Taipei Veterans General Hospital under grant V104C-090. The work of Chi-Ying Huang is supported by MOST under grant MOST104-2325-B-010-002 and by Center of Excellence for Cancer Research at Taipei Veterans General Hospital phase II: Integrated approach to reduce cancer incidence and mortality (MOHW104-TDU-B-211-124-001). Ted Knoy is appreciated for his editorial assistance.
Publication of this article was funded by the Ministry of Science and Technology of Taiwan (MOST) under grant MOST 104-2221-E-150-039. This article has been published as part of BMC Bioinformatics Volume 17 Supplement 1, 2016: Selected articles from the Fourteenth Asia Pacific Bioinformatics Conference (APBC 2016). The full contents of the supplements are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/17/S1.
- Berman AT, James SS, Rengan R: Proton beam therapy for non-small cell lung cancer: current clinical evidence and future directions. Cancers (Basel). 2015, 7 (3): 1178-90. 10.3390/cancers7030831.
- Stachnik A, Yuen T, Iqbal J, Sgobba M, Gupta Y, Lu P, et al: Repurposing of bisphosphonates for the prevention and therapy of nonsmall cell lung and breast cancer. Proc Natl Acad Sci U S A. 2014, 111 (50): 17995-8000. 10.1073/pnas.1421422111.
- Chen H, Zhang H, Zhang Z, Cao Y, Tang W: Network-based inference methods for drug repositioning. Comput Math Methods Med. 2015, 2015: 130620.
- Lee HS, Bae T, Lee JH, Kim DG, Oh YS, Jang Y, et al: Rational drug repositioning guided by an integrated pharmacological network of protein, disease and drug. BMC Syst Biol. 2012, 6: 80. 10.1186/1752-0509-6-80.
- Huang CH, Wu MY, Chang PM, Huang CY, Ng KL: In silico identification of potential targets and drugs for non-small cell lung cancer. IET Syst Biol. 2014, 8 (2): 56-66. 10.1049/iet-syb.2013.0035.
- Huang CH, Chang PM, Lin YJ, Wang CH, Huang CY, Ng KL: Drug repositioning discovery for early- and late-stage non-small-cell lung cancer. Biomed Res Int. 2014, 2014: 193817.
- Huang CH, Peng HS, Ng KL: Prediction of cancer proteins by integrating protein interaction, domain frequency, and domain interaction data using machine learning algorithms. Biomed Res Int. 2015, 2015: 312047.
- Bull SC, Doig AJ: Properties of protein drug target classes. PLoS One. 2015, 10 (3): e0117955. 10.1371/journal.pone.0117955.
- Carson MB, Lu H: Network-based prediction and knowledge mining of disease genes. BMC Med Genomics. 2015, 8 (Suppl 2): S9. 10.1186/1755-8794-8-S2-S9.
- Iorio F, Bosotti R, Scacheri E, Belcastro V, Mithbaokar P, Ferriero R, et al: Discovery of drug mode of action and drug repositioning from transcriptional responses. Proc Natl Acad Sci U S A. 2010, 107 (33): 14621-14626. 10.1073/pnas.1000138107.
- Dudley JT, Sirota M, Shenoy M, Pai RK, Roedder S, Chiang AP, et al: Computational repositioning of the anticonvulsant topiramate for inflammatory bowel disease. Sci Transl Med. 2011, 3 (96): 96ra76. 10.1126/scitranslmed.3002648.
- Sirota M, Dudley JT, Kim J, Chiang AP, Morgan AA, Sweet-Cordero A, et al: Discovery and preclinical validation of drug indications using compendia of public gene expression data. Sci Transl Med. 2011, 3 (96): 96ra77. 10.1126/scitranslmed.3001318.
- Pacini C, Iorio F, Goncalves E, Iskar M, Klabunde T, Bork P, et al: DvD: An R/Cytoscape pipeline for drug repurposing using public repositories of gene expression data. Bioinformatics. 2013, 29 (1): 132-134. 10.1093/bioinformatics/bts656.
- Fortney K, Griesman J, Kotlyar M, Pastrello C, Angeli M, Sound-Tsao M, et al: Prioritizing therapeutics for lung cancer: an integrative meta-analysis of cancer gene signatures and chemogenomic data. PLoS Comput Biol. 2015, 11 (3): e1004068. 10.1371/journal.pcbi.1004068.
- Campillos M, Kuhn M, Gavin AC, Jensen LJ, Bork P: Drug target identification using side-effect similarity. Science. 2008, 321 (5886): 263-266. 10.1126/science.1158140.
- Yang L, Agarwal P: Systematic drug repositioning based on clinical side-effects. PLoS One. 2011, 6 (12): e28025. 10.1371/journal.pone.0028025.
- Duran-Frigola M, Aloy P: Recycling side-effects into clinical markers for drug repositioning. Genome Med. 2012, 4 (1): 3-10.1186/gm302.View ArticlePubMedPubMed CentralGoogle Scholar
- Hurle MR, Yang L, Xie Q, Rajpal DK, Sanseau P, Agarwal P: Computational drug repositioning: from data to therapeutics. Clin Pharmacol Ther. 2013, 93 (4): 335-341. 10.1038/clpt.2013.1.View ArticlePubMedGoogle Scholar
- Keiser MJ, Setola V, Irwin JJ, Laggner C, Abbas AI, Hufeisen SJ, et al: Predicting new molecular targets for known drugs. Nature. 2009, 462 (7270): 175-181. 10.1038/nature08506.View ArticlePubMedPubMed CentralGoogle Scholar
- Chiang AP, Butte AJ: Systematic evaluation of drug-disease relationships to identify leads for novel drug uses. Clin Pharmacol Ther. 2009, 86 (5): 507-510. 10.1038/clpt.2009.103.View ArticlePubMedPubMed CentralGoogle Scholar
- Cheng F, Liu C, Jiang J, Lu W, Li W, Liu G, et al: Prediction of drug-target interactions and drug repositioning via network-based inference. PLoS Comput Biol. 2012, 8 (5): 10.1371/journal.pcbi.1002503. Article ID e1002503Google Scholar
- Fukuoka Y, Takei D, Ogawa H: A two-step drug repositioning method based on a protein-protein interaction network of genes shared by two diseases and the similarity of drugs. Bioinformation. 2013, 9 (2): 89-93. 10.6026/97320630009089.View ArticlePubMedPubMed CentralGoogle Scholar
- Huang CH, Peng HS, Ng KL: Graph theory and stability analysis of protein complex interaction networks. IET Syst Biol. 2015Google Scholar
- Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al: NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res. 2013, 41 (Database issue): D991-5. 10.1093/nar/gks1193.View ArticlePubMedGoogle Scholar
- Su LJ, Chang CW, Wu YC, Chen KC, Lin CJ, Liang SC, et al: Selection of DDX5 as a novel internal control for Q-RT-PCR from microarray data using a block bootstrap re-sampling scheme. BMC Genomics. 2007, 8: 140-10.1186/1471-2164-8-140.View ArticlePubMedPubMed CentralGoogle Scholar
- Landi MT, Dracheva T, Rotunno M, Figueroa JD, Liu H, Dasgupta A, et al: Gene expression signature of cigarette smoking and its role in lung adenocarcinoma development and survival. PLoS One. 2008, 3 (2): 10.1371/journal.pone.0001651. Article ID e1651Google Scholar
- Lu TP, Tsai MH, Lee JM, Hsu CP, Chen PC, Lin CW, et al: Identification of a novel biomarker, SEMA5A, for non-small cell lung carcinoma in nonsmoking women. Cancer Epidemiol Biomarkers Prev. 2010, 19 (10): 2590-7. 10.1158/1055-9965.EPI-10-0332.View ArticlePubMedGoogle Scholar
- Wei TY, Juan CC, Hisa JY, Su LJ, Lee YC, Chou HY, et al: Protein arginine methyltransferase 5 is a potential oncoprotein that upregulates G1 cyclins/cyclin-dependent kinases and the phosphoinositide 3-kinase/AKT signaling cascade. Cancer Sci. 2012, 103 (9): 1640-50. 10.1111/j.1349-7006.2012.02367.x.View ArticlePubMedGoogle Scholar
- Lamb J: The Connectivity Map: a new tool for biomedical research. Nat Rev Cancer. 2007, 7 (1): 54-60. 10.1038/nrc2044.View ArticlePubMedGoogle Scholar
- Huang CH, Chou SY, Ng KL: Improving protein complex classification accuracy using amino acid composition profile. Comput Biol Med. 2013, 43 (9): 1196-1204. 10.1016/j.compbiomed.2013.05.026.View ArticlePubMedGoogle Scholar
- Kurubanjerdjit N, Huang CH, Lee Y, Tsai JP, Ng KL: Prediction of microRNA-regulated protein interaction pathways in Arabidopsis using machine learning algorithms. Comput Biol Med. 2013, 43 (11): 1645-1652. 10.1016/j.compbiomed.2013.08.010.View ArticlePubMedGoogle Scholar
- da Huang W, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009, 4 (1): 44-57. 10.1038/nprot.2008.211.View ArticlePubMedGoogle Scholar
- Gene Ontology Consortium: The Gene Ontology (GO) project in 2006. Nucleic Acids Res. 2006, 34 (Database issue): D322-6. 10.1093/nar/gkj021.View ArticleGoogle Scholar
- Kamburov A, Wierling C, Lehrach H, Herwig R: ConsensusPathDB--a database for integrating human functional interaction networks. Nucleic Acids Res. 2009, 37 (Database issue): D623-8. 10.1093/nar/gkn698.View ArticlePubMedGoogle Scholar
- Wolf FM: Meta-Analysis: Quantitative Methods for Research Synthesis. 1986, Sage publications, CaliforniaView ArticleGoogle Scholar
- Borenstein M, Hedges LV, Higgins JPT, Rothstein HR: Introduction to meta-analysis. 2009, Wiley press, United KingdomView ArticleGoogle Scholar
- Erich L, Lehmann -Fisher, Neyman: the Creation of Classical Statistics. Ch 2, Fisher's Testing Methodology. Springer Science & Business Media; 2011. p. 24.Google Scholar
- Breitkreutz BJ, Stark C, Reguly T, Boucher L, Breitkreutz A, Livstone M, et al: The BioGRID Interaction Database: 2008 update. Nucleic Acids Res. 2008, 36 (Database issue): D637-40.PubMedGoogle Scholar
- Croft D, O'Kelly G, Wu G, Haw R, Gillespie M, Matthews L, et al: Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res. 2011, 39 (Database issue): D691-7. 10.1093/nar/gkq1018.View ArticlePubMedGoogle Scholar
- Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M: The KEGG resource for deciphering the genome. Nucleic Acids Res. 2004, 32 (Database issue): D277-80. 10.1093/nar/gkh063.View ArticlePubMedPubMed CentralGoogle Scholar
- Ganapathy-Kanniappan S, Geschwind JF: Tumor glycolysis as a target for cancer therapy: progress and prospects. Mol Cancer. 2013, 12: 152-10.1186/1476-4598-12-152.View ArticlePubMedPubMed CentralGoogle Scholar
- Gillies RJ, Robey I, Gatenby RA: Causes and consequences of increased glucose metabolism of cancers. J Nucl Med. 2008, 49 (Suppl 2): 24S-42S. 10.2967/jnumed.107.047258.View ArticlePubMedGoogle Scholar
- Lanzetti L, Di Fiore PP: Endocytosis and cancer: an 'insider' network with dangerous liaisons. Traffic. 2008, 9 (12): 2011-21. 10.1111/j.1600-0854.2008.00816.x.View ArticlePubMedGoogle Scholar
- McCubrey JA, Steelman LS, Chappell WH, Abrams SL, Wong EW, Chang F, et al: Roles of the Raf/MEK/ERK pathway in cell growth, malignant transformation and drug resistance. Biochim Biophys Acta. 2007, 1773 (8): 1263-84. 10.1016/j.bbamcr.2006.10.001.View ArticlePubMedGoogle Scholar
- Holder JW, Elmore E, Barrett JC: Gap junction function and cancer. Cancer Res. 1993, 53 (15): 3475-85.PubMedGoogle Scholar
- Leithe E, Sirnes S, Omori Y, Rivedal E: Downregulation of gap junctions in cancer cells. Crit Rev Oncog. 2006, 12 (3–4): 225-56. 10.1615/CritRevOncog.v12.i3-4.30.View ArticlePubMedGoogle Scholar
- Tominaga N, Kosaka N, Ono M, Katsuda T, Yoshioka Y, Tamura K, et al: Brain metastatic cancer cells release microRNA-181c-containing extracellular vesicles capable of destructing blood–brain barrier. Nat Commun. 2015, 6: 6716-10.1038/ncomms7716.View ArticlePubMedPubMed CentralGoogle Scholar
- Aird KM, Zhang R: Nucleotide metabolism, oncogene-induced senescence and cancer. Cancer Lett. 2015, 356 (2 Pt A): 204-10. 10.1016/j.canlet.2014.01.017.View ArticlePubMedGoogle Scholar
- Gates KL, Howell HA, Nair A, Vohwinkel CU, Welch LC, Beitel GJ, et al: Hypercapnia impairs lung neutrophil function and increases mortality in murine pseudomonas pneumonia. Am J Respir Cell Mol Biol. 2013, 49 (5): 821-8. 10.1165/rcmb.2012-0487OC.View ArticlePubMedPubMed CentralGoogle Scholar
- Liu X, Erikson RL: Polo-like kinase (Plk)1 depletion induces apoptosis in cancer cells. Proc Natl Acad Sci U S A. 2003, 100 (10): 5789-94. 10.1073/pnas.1031523100.View ArticlePubMedPubMed CentralGoogle Scholar
- Bambace NM, Holmes CE: The platelet contribution to cancer progression. J Thromb Haemost. 2011, 9 (2): 237-49. 10.1111/j.1538-7836.2010.04131.x.View ArticlePubMedGoogle Scholar
- Mazouzi A, Velimezi G, Loizou JI: DNA replication stress: causes, resolution and disease. Exp Cell Res. 2014, 329 (1): 85-93. 10.1016/j.yexcr.2014.09.030.View ArticlePubMedGoogle Scholar
- Venkitaraman AR: Cancer susceptibility and the functions of BRCA1 and BRCA2. Cell. 2002, 108 (2): 171-82. 10.1016/S0092-8674(02)00615-3.View ArticlePubMedGoogle Scholar
- Wong RS: Apoptosis in cancer: from pathogenesis to treatment. J Exp Clin Cancer Res. 2011, 30: 87-10.1186/1756-9966-30-87.View ArticlePubMedPubMed CentralGoogle Scholar
- Wong JC, Bathina M, Fiscus RR: Cyclic GMP/protein kinase G type-Iα (PKG-Iα) signaling pathway promotes CREB phosphorylation and maintains higher c-IAP1, livin, survivin, and Mcl-1 expression and the inhibition of PKG-Iα kinase activity synergizes with cisplatin in non-small cell lung cancer cells. J Cell Biochem. 2012, 113 (11): 3587-98. 10.1002/jcb.24237.View ArticlePubMedGoogle Scholar
- Chi JT, Rodriguez EH, Wang Z, Nuyten DS, Mukherjee S, van de Rijn M, et al: Gene expression programs of human smooth muscle cells: tissue-specific differentiation and prognostic significance in breast cancers. PLoS Genet. 2007, 3 (9): 1770-84. 10.1371/journal.pgen.0030164.View ArticlePubMedGoogle Scholar
- Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, et al: DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 2008, 36 (Database issue): D901-6.PubMedGoogle Scholar
- Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al: A method and server for predicting damaging missense mutations. Nat Methods. 2010, 7 (4): 248-249. 10.1038/nmeth0410-248.View ArticlePubMedPubMed CentralGoogle Scholar
- Wong KC, Zhang Z: SNPdryad: predicting deleterious non-synonymous human SNPs using only orthologous protein sequences. Bioinformatics. 2014, 30 (8): 1112-1119. 10.1093/bioinformatics/btt769.View ArticleGoogle Scholar
- Schwarz JM, Cooper DN, Schuelke M, Seelow D: MutationTaster2: mutation prediction for the deep-sequencing age. Nat Methods. 2014, 11 (4): 361-2. 10.1038/nmeth.2890.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.