Identifying biomarkers for breast cancer by gene regulatory network rewiring

Wang, Yijuan; Liu, Zhi-Ping

doi:10.1186/s12859-021-04225-1

Volume 22 Supplement 12

Explainable AI methods in biomedical data science

Methodology
Open access
Published: 20 January 2022

BMC Bioinformatics volume 22, Article number: 308 (2021)

Abstract

Background

Mining gene regulatory networks (GRNs) is an important avenue for studying cancer mechanisms. Mutations in the cancer genome perturb the GRN and rewire an otherwise orchestrated network. Exploring GRN rewiring is therefore valuable for discovering potential biomarkers and indicators that discriminate cancer phenotypes.

Results

Here, we propose a new bioinformatics method for identifying biomarkers based on network rewiring between different states. It first reconstructs the GRN in each phenotypic condition from gene expression data and an a priori background network. We employ an algorithm based on path consistency and conditional mutual information to delete false-positive regulatory interactions between independent or only weakly related gene pairs. A differential gene regulatory network (D-GRN) is then constructed from the rewired parts of the two phenotype-specific GRNs. Community detection is applied to the D-GRN to detect functional modules. Finally, we apply a logistic regression classifier with recursive feature elimination to select biomarker genes in each module individually. The extracted feature genes form biomarker gene sets with a strong ability to distinguish disease samples from controls. We verify the identified biomarkers in an external independent validation dataset. As a proof-of-concept study, we apply the framework to identify diagnostic biomarkers of breast cancer. The identified biomarkers obtain a maximum AUC of 0.985 in the internal sample classification experiments and a maximum AUC of 0.989 in the external validation.

Conclusion

In conclusion, network rewiring reveals significant differences between phenotypes, which indicate dysfunctional mechanisms in cancer. With the development of sequencing technology, the amount and quality of available gene expression data keep increasing. Condition-specific gene regulatory networks that are close to the real regulations in different states will be established. Revealing the network rewiring will greatly benefit the discovery of biomarkers or signatures for phenotypes. D-GRN is a general method that meets this demand of deciphering high-throughput data for biomarker discovery. It can also be readily extended to identify biomarkers of other complex diseases beyond breast cancer.

Background

A gene regulatory network (GRN) is a model that characterizes the complex relationships between genes in a cell [1]. In a GRN, nodes represent genes and edges describe the regulatory relationships among them. From a physical perspective, genes interact through their products, such as proteins and RNAs. The direction and weight of an edge describe the direction and strength of an interaction. The alteration or mutation of one gene may affect the activity of many other genes through the network [2, 3].

Cancer is recognized as a complex disease caused by gene mutations, which perturb the normal interactions among genes and disrupt their connection patterns or strengths [4,5,6]. In other words, gene mutations cause perturbation and rewiring of GRNs [7, 8]. The rewired interactions alter normal biological processes, which is crucial for carcinogenesis. Thus, investigating GRN rewiring is important for discovering potential biomarkers that indicate certain phenotypic states.

Breast cancer is the most commonly diagnosed cancer and the second leading cause of cancer death in women worldwide [9, 10]. Biomarkers play important roles in its early diagnosis and prognostic evaluation [11,12,13]. However, accurately identifying biomarkers for the early detection of breast cancer remains very challenging. Some biomarkers, such as BRCA1 and HER2, have been validated [14]. Nevertheless, new biomarkers and their combinations are still urgently needed to quantify treatment effects alongside classical clinical prognostic factors. They can also indicate the potential risks and pathogenesis of breast cancer [15, 16].

With the development of high-throughput sequencing technologies, an increasing amount of gene expression data has become available. Various methods have been developed to find effective biomarkers from high-throughput data [17,18,19,20]. For instance, some methods construct a dynamic network model and integrate multi-omics data for biomarker discovery [21]. However, few methods approach this problem from the perspective of network rewiring, which reflects the dysfunctional mechanisms of cancer.

In this paper, we propose a framework to identify potential biomarkers of breast cancer based on network rewiring. The disease and normal GRNs are reconstructed from gene expression data and a reliable background GRN. The CMI-PC (conditional mutual information-based path consistency) algorithm is employed to delete, from the integrative background network, false-positive interactions between genes that are independent or not closely connected in a specific condition. By comparing the GRNs in the two phenotypic conditions, a differential GRN, called D-GRN, containing the rewired nodes with differential regulations is extracted. In the D-GRN, we detect community structures, i.e., densely connected nodes that form subnetwork modules. Finally, we apply logistic regression with recursive feature elimination (LR-RFE) to select biomarkers in each module separately, using cross-validation to find the optimal number of biomarkers for each module. The maximum AUC of these module-based biomarkers is 0.985 in the internal validation. The selected biomarkers are also verified in an external independent dataset, where they achieve a maximum AUC of 0.989 in classification.

Results

In this work, the proposed biomarker discovery framework focuses on the rewiring of the gene network between disease and normal conditions. Condition-specific GRNs are reconstructed by integrating prior knowledge, in the form of an integrative background network, with phenotypic gene expression data. The CMI-PC algorithm is employed to remove redundant regulatory interactions from the background network. The D-GRN is extracted from the two condition-specific networks, and we detect the communities in it. A machine learning method is then applied to find the best feature combination in classification experiments. The selected features are more likely to be potential biomarkers. Here, we apply our framework to breast cancer and identify potential network-based module biomarkers.

Network rewiring

The reconstructed normal GRN has 430 edges (regulations) and 198 nodes (genes), while the disease GRN has 301 edges and 137 nodes. The two networks share 71 genes and 115 edges. We merge the shared nodes that have different connections, together with their neighbors, to construct a D-GRN containing 509 regulations and 238 genes.

After community detection, the D-GRN is divided into 5 modules (see the next section). To illustrate the network rewiring between normal and disease states, Fig. 1a, b show the gene regulatory interactions of Module 4 in the normal and disease conditions, respectively. Figure 1c illustrates this part of the D-GRN, which includes 30 nodes. Black, green and red lines represent edges found in both networks, only in the normal network and only in the disease network, respectively. Figure 1d shows boxplots of gene expression in the normal and disease conditions, together with the P values of the differences. Most nodes have significantly different gene expression between the two conditions. Interestingly, a few genes are not differentially expressed, yet their regulatory interactions are rewired between the two conditions. Instead of node-centric differences, the D-GRN identifies edge-centric differences between the two phenotypes, i.e., the normal and cancer states, through the rewired gene regulations.

To further demonstrate the perturbations in the two GRNs, Fig. 2a, b present heatmaps of the Pearson’s correlation coefficient (PCC) between genes in the normal and disease conditions. There is a marked difference between them, which supports the effectiveness of our identification of the rewired GRN across the two conditions.

Detected communities

The community detection results for the D-GRN are shown in Fig. 3. Different colors correspond to different modules. The 5 modules contain 118, 46, 41, 30 and 3 genes, respectively. The global D-GRN is thus divided into five functional blocks in the form of network-based modules. These subnetworks provide a pool of module biomarker candidates. To remove redundant genes from the five detected modules, we perform feature selection in each module to discover biomarker gene sets.

Breast cancer biomarkers identification and validation

Table 1 lists the genes selected in each module by logistic regression with recursive feature elimination (LR-RFE) under tenfold cross-validation, together with the F1-scores from the classification experiments. Because the number of genes differs across modules and some genes discriminate better than others, the F1-score fluctuates across the five modules. However, all of them are above 0.86, which means that they perform well in distinguishing disease samples from controls. Figure 4 shows the receiver operating characteristic (ROC) curves of the selected biomarkers from the five modules in the internal validation dataset. The highest area under the ROC curve (AUC) is 0.985, and the lowest is 0.923.
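For readers who want to reproduce the two metrics reported here, the following minimal sketch computes an F1-score and an AUC from binary labels and classifier scores. The labels, scores and the 0.5 decision cutoff are made-up illustrations and are not taken from the paper. The AUC is computed from its pairwise ranking definition rather than by tracing the ROC curve.

```python
def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels (1 = disease, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.5, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed cutoff of 0.5

print(f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

The F1-score depends on the chosen cutoff, whereas the AUC summarizes ranking quality over all cutoffs, which is why both are worth reporting.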

Table 1 Five module biomarkers after LR-RFE selection

To validate the identified module biomarkers, we perform classification experiments on the external independent dataset GSE42568, which contains 104 disease samples and 17 controls. Figure 5a shows the ROC curves of each selected gene set in the independent validation data. As shown, all five gene modules perform well in classification. The highest AUC value is 0.989, and the lowest is 0.934. In addition, Fig. 5b compares the classification ability of the module biomarkers with that of random gene sets of the same size. The P-value is 0.0382, indicating a significant difference in classification performance between them. These results provide evidence that the selected genes are potential molecular markers for diagnosing breast cancer.

To further demonstrate the dysfunctions of the selected biomarkers, we employ network ontology analysis (NOA) [22] to perform gene ontology (GO) enrichment analysis on the rewired regulatory network across normal and disease states. Table 2 lists the functions enriched in the discovered biomarkers. As shown, several important cancer-related dysregulated processes, including ‘metabolic process’, ‘cell cycle’, ‘cell proliferation’ and ‘lymphocyte differentiation’, are significantly enriched. They are consistent with prior knowledge of breast cancer pathogenesis during occurrence and development [23]. In turn, the functional analysis provides further evidence for the effectiveness of our proposed biomarker discovery method via network rewiring.

Table 2 The enriched GO biological processes of identified biomarkers in D-GRN

Discussion

Identification of biomarkers for complex diseases such as cancer is of paramount importance in diagnosis, treatment and prognosis. Although numerous methods have been proposed to characterize biomarkers, few take the perspective of regulatory network rewiring. The GRN is an important tool for revealing disease mechanisms from a systematic perspective. Investigating cancer mutations and perturbations through GRN rewiring is important for addressing the underlying causal regulations that respond to phenotypic transitions. In this paper, we proposed a novel framework for identifying biomarkers based on network rewiring. Disease- and normal-specific GRNs were each reconstructed from gene expression data and an a priori background network. The gene regulatory interactions that changed between them illustrate the results of disease mutation and perturbation. The D-GRN is extracted, and its modules are then detected. LR-RFE is employed to find diagnostic biomarkers within the modules, and cross-validation is used to set the optimal number of biomarkers in each module.

Here, we applied the proposed D-GRN framework to identify biomarkers of breast cancer. An integrative background network based on prior knowledge and condition-specific gene expression data were used to construct the normal and disease GRNs. We acknowledge that the background network has limitations in terms of missing nodes and edges, and it should be made as complete as possible. In total, a D-GRN with 509 edges and 238 nodes was extracted. Five potential biomarker gene sets in the form of subnetwork modules were identified, and they performed well in classifying disease and normal samples in both the internal and external validation datasets.

The focus of this work is to provide a computational pipeline for cancer biomarker discovery. In our framework, we use machine learning to select the optimal genes in the network modules to serve as biomarkers. The rewired regulations, as well as the weights or coefficients on these regulations, have not been fully considered in biomarker discovery. The rewired edges and patterns are expected to be incorporated into future biomarker discovery. Another potential limitation of this work is that the rewiring mechanism and gene dysfunction across different phenotypes are not included in our feature selection. Genetic and epigenetic factors need to be integrated to address the causality of the identified gene regulatory rewiring. These will provide more valuable information for detecting more precise biomarkers for breast cancer.

Conclusion

In conclusion, network rewiring reveals significant information about cancer mechanisms. With the development of high-throughput technology, the amount of high-quality gene expression data will keep increasing. Condition-specific networks that are close to the real gene network will be established. The rewired network components will be revealed more clearly, which will greatly benefit the discovery of biomarkers or signatures for breast cancer diagnosis. Our proposed strategy is general and can be used to discover biomarkers for other complex diseases.

Methods

Data sources and pre-processing

The RNA-seq gene expression data are downloaded from the TCGA data portal and include 1097 patients with BRCA (breast invasive carcinoma) and 112 normal controls. The dataset provides mean-centered gene expression values for 17,924 genes in all samples. In this study, 60% of the samples are used for training and testing; we call them the internal training dataset. The remaining 40% of the samples are used for internal validation. We also download an independent dataset from the NCBI GEO database (ID: GSE42568) to validate the identified biomarkers. It has 104 cancer samples and 17 controls and is called the external independent validation data.

The integrative human GRN is downloaded from our RegNetwork knowledgebase [24]. RegNetwork is a comprehensive GRN repository that collects documented gene regulations from more than 20 databases together with gene regulations predicted by aligning transcription factor binding sites. Here, we use a new version containing 151,215 regulations among 19,719 genes.

Framework

Figure 6 shows the framework of biomarker identification, which contains three main steps. First, as shown in Fig. 6a, a background GRN is obtained from prior knowledge about gene regulations in humans. It is a non-specific regulatory network with many redundant gene regulations. Gene expression data from normal and disease samples are used to evaluate the prior gene–gene interactions in each phenotype and to eliminate redundant ones from the background GRN. Second, by comparing the normal- and disease-specific GRNs reconstructed from gene expression data, we identify the rewired network sections across the two phenotypic states and extract them as a differential GRN, called D-GRN. A community detection algorithm is then employed to find closely connected nodes in the form of modules, as shown in Fig. 6b. Third, we apply logistic regression with recursive feature elimination (LR-RFE) to find biomarker genes, as shown in Fig. 6c.

Gene regulatory network rewiring

Determining the GRN is an important avenue for revealing disease mechanisms. In this study, the disease GRN and the normal GRN are each reconstructed from the corresponding gene expression profiles on a background network. The prior network is taken from RegNetwork, a knowledge-based genome-wide regulatory network database that integrates a large number of data resources [24].

Numerous methods have been developed to reconstruct GRNs from gene expression profiles [25, 26]. Here, we are particularly concerned with the changes in regulatory connections between disease and normal states, so we apply the CMI-PC method to reconstruct the disease and normal GRNs [27,28,29]. Mutual information (MI) is a measure of the mutual dependence between two variables. It is increasingly popular in GRN reconstruction because it can measure non-linear dependency [30, 31]. The conditional mutual information (CMI) of a gene pair is the expected value of the mutual information of the two genes of interest given their joint regulation by other genes [32]. MI is the special case of CMI of order zero. The MI of variables X and Y and the CMI of X and Y given Z are calculated by a widely used estimation method [33] as

$$I(X,Y) = \sum\limits_{i,j} p(i,j) \log \frac{p(i,j)}{p_{x}(i)\,p_{y}(j)}$$

(1)

$$I(X,Y|Z) = \sum\limits_{i,j,k} p_{x,y,z}(i,j,k) \log \frac{p_{z}(k)\,p_{x,y,z}(i,j,k)}{p_{x,z}(i,k)\,p_{y,z}(j,k)}$$

(2)

The approach partitions the supports of X, Y and Z into bins of finite size, and the marginal, joint and conditional probability mass functions are denoted by p with the appropriate subscripts. Specifically, $p_{x}(i) = \int_{i} \mu_{x}(x)\,dx$, $p_{y}(j) = \int_{j} \mu_{y}(y)\,dy$ and $p(i,j) = \int_{i}\int_{j} \mu(x,y)\,dx\,dy$, where $\mu$ denotes the corresponding probability density and $\int_{i}$ means the integral over bin i.
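The binned estimators in Eqs. (1) and (2) can be sketched in a few lines. The version below is a plug-in estimate that replaces each probability with a relative frequency. The equal-width binning, the default of three bins and all function names are assumptions made for illustration, since the text does not specify the binning scheme.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in estimate of I(X,Y) from two discretized sequences, following Eq. (1)."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(
        (c / n) * math.log((c / n) / ((px[i] / n) * (py[j] / n)))
        for (i, j), c in pxy.items()
    )


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X,Y|Z) from discretized sequences, following Eq. (2)."""
    n = len(x)
    pxyz = Counter(zip(x, y, z))
    pxz, pyz, pz = Counter(zip(x, z)), Counter(zip(y, z)), Counter(z)
    return sum(
        (c / n) * math.log((pz[k] / n) * (c / n) / ((pxz[i, k] / n) * (pyz[j, k] / n)))
        for (i, j, k), c in pxyz.items()
    )


# Toy check: Y copies X exactly, so I(X,Y) = log 2, but nothing is left once X itself is given.
x = [0, 0, 1, 1]
y = [0, 0, 1, 1]
print(mutual_information(x, y), conditional_mutual_information(x, y, x))
```

Only bins that actually occur contribute to the sums, which matches the convention that empty cells add zero to the estimate.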

As with MI, a higher CMI value indicates a closer relationship between the variables X and Y given the variable(s) Z. The path consistency (PC) algorithm is used to remove edges from the background network based on CMI values. For an adjacent gene pair X and Y, the procedure first calculates the MI (zero-order CMI); if the value is zero or below the threshold, the edge between them is deleted. Next, it selects a gene Z adjacent to both and computes the first-order CMI I(X,Y|Z), again deleting edges between genes that are independent or not strongly connected, until no further edge can be deleted at that order. The procedure continues with increasing order until no higher-order CMI can be computed. The same deletion thresholds are used in the two conditions.
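The pruning procedure above can be sketched as follows. This is an illustrative reading, not the authors' implementation: edges are treated as unordered for the purpose of finding common neighbours, and an edge is dropped when the largest CMI over all conditioning sets of the current order falls below the threshold. That decision rule, the `max_order` cap, the threshold of 0.1 and the gene names `TF`, `G1` and `G2` are all assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def cmi(x, y, cond):
    """Plug-in I(X,Y|Z) for discretized sequences; an empty `cond` list gives plain MI."""
    n = len(x)
    z = list(zip(*cond)) if cond else [()] * n
    pxyz = Counter(zip(x, y, z))
    pxz, pyz, pz = Counter(zip(x, z)), Counter(zip(y, z)), Counter(z)
    return sum(
        (c / n) * math.log(pz[k] * c / (pxz[i, k] * pyz[j, k]))
        for (i, j, k), c in pxyz.items()
    )


def cmi_pc_prune(edges, expr, threshold, max_order=2):
    """Delete background edges whose zero- or higher-order CMI is below `threshold`."""
    kept = set(edges)
    for order in range(max_order + 1):
        # Neighbourhoods are taken from the network as it stood when this order began.
        neighbours = {}
        for a, b in kept:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
        tested = False
        for a, b in sorted(kept):
            common = sorted((neighbours[a] & neighbours[b]) - {a, b})
            if len(common) < order:
                continue  # not enough shared neighbours to condition on
            tested = True
            best = max(
                cmi(expr[a], expr[b], [expr[g] for g in subset])
                for subset in combinations(common, order)
            )
            if best < threshold:
                kept.discard((a, b))
        if not tested:
            break  # no edge admits a CMI of this order, so stop
    return kept


# Toy data: TF drives G1 and G2 independently, so the G1-G2 edge is indirect.
samples = []
for tf in (0, 1):
    noisy = [tf] * 9 + [1 - tf]  # each target copies TF in 9 of 10 cases
    samples += [(tf, g1, g2) for g1 in noisy for g2 in noisy]
expr = {
    "TF": [s[0] for s in samples],
    "G1": [s[1] for s in samples],
    "G2": [s[2] for s in samples],
}
background = [("TF", "G1"), ("TF", "G2"), ("G1", "G2")]
pruned = cmi_pc_prune(background, expr, threshold=0.1)
print(sorted(pruned))
```

In this toy case all three edges survive the zero-order test because G1 and G2 are marginally dependent, and only conditioning on TF reveals that their edge is redundant.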

In this way, we obtain two specific GRNs, one for disease samples and one for normal samples. The differences in gene interactions between them show the rewiring caused by the disease. We extract the rewired parts and construct a D-GRN. In detail, we find the genes that appear in both networks with different connections and add their adjacent genes. We then connect them according to the normal and disease GRNs.
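One possible reading of this construction is sketched below. A gene counts as rewired when it occurs in both networks but its set of incident edges differs, and the D-GRN keeps every edge from either network whose two endpoints are rewired genes or their neighbours. The function names, the edge labels and the toy gene names are hypothetical, and the exact rule used by the authors may differ.

```python
def incident_edges(edges):
    """Map each gene to the set of edges that touch it."""
    table = {}
    for edge in edges:
        for gene in edge:
            table.setdefault(gene, set()).add(edge)
    return table


def build_dgrn(normal_edges, disease_edges):
    """Return the rewired genes and a dict mapping each D-GRN edge to its origin."""
    normal, disease = set(normal_edges), set(disease_edges)
    inc_n, inc_d = incident_edges(normal), incident_edges(disease)
    # Genes present in both networks whose connections differ between them.
    rewired = {g for g in inc_n.keys() & inc_d.keys() if inc_n[g] != inc_d[g]}
    # Add the neighbours of each rewired gene from either network.
    nodes = set(rewired)
    for g in rewired:
        for u, v in inc_n[g] | inc_d[g]:
            nodes.update((u, v))
    dgrn = {}
    for edge in normal | disease:
        if edge[0] in nodes and edge[1] in nodes:
            if edge in normal and edge in disease:
                dgrn[edge] = "common"
            elif edge in normal:
                dgrn[edge] = "normal_only"
            else:
                dgrn[edge] = "disease_only"
    return rewired, dgrn


# Toy networks with made-up gene names; edges are (regulator, target) pairs.
normal_grn = [("A", "B"), ("A", "C"), ("D", "E")]
disease_grn = [("A", "B"), ("A", "F"), ("G", "H")]
rewired, dgrn = build_dgrn(normal_grn, disease_grn)
print(rewired, dgrn)
```

The three labels correspond to the three edge classes that the paper draws in black, green and red in Fig. 1c.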

Community detection

The communities in the D-GRN are important for understanding the functional modules that differ between normal and disease conditions. We apply a fast greedy detection algorithm [34] to the D-GRN to identify closely connected gene modules. Briefly, the algorithm starts by treating every node in the network as its own module. It then repeatedly merges the pair of modules whose merger increases the evaluation criterion, modularity (Q), the most (or decreases it the least), until all nodes belong to one module. The result is a tree (dendrogram) whose leaves represent gene nodes, and cutting the tree at different levels yields different module partitions. The most reliable partition corresponds to the maximum modularity. Modularity (Q) is defined as:

$$Q = \sum\limits_{i} \left( e_{ii} - a_{i}^{2} \right)$$

(3)

$$a_{i} = \sum\limits_{j} {e_{ij} }$$

(4)

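where e_ij is the fraction of edges (relative to the total number of edges) that connect module i and module j, with each between-module edge split equally between e_ij and e_ji as in [34], e_ii is the fraction of edges within module i, and a_i is therefore the fraction of edge ends attached to module i.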
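To make Eqs. (3) and (4) concrete, the sketch below computes Q for a given partition and then performs greedy agglomeration. It is a brute-force variant written for clarity: it recomputes Q for every candidate merge and considers every pair of modules, whereas the fast algorithm of [34] updates the change in Q incrementally. The toy graph and function names are illustrative assumptions.

```python
from itertools import combinations


def modularity(edges, communities):
    """Q = sum_i (e_ii - a_i^2) for an undirected edge list and a list of node sets."""
    m = len(edges)
    label = {node: idx for idx, group in enumerate(communities) for node in group}
    e_ii = [0.0] * len(communities)
    a = [0.0] * len(communities)
    for u, v in edges:
        a[label[u]] += 1 / (2 * m)  # each edge end contributes 1/(2m)
        a[label[v]] += 1 / (2 * m)
        if label[u] == label[v]:
            e_ii[label[u]] += 1 / m
    return sum(e - ai * ai for e, ai in zip(e_ii, a))


def greedy_modularity(edges):
    """Merge modules pairwise, always taking the merge with the highest Q; return the best level."""
    nodes = sorted({n for edge in edges for n in edge})
    current = [frozenset([n]) for n in nodes]
    best_q, best = modularity(edges, current), list(current)
    while len(current) > 1:
        candidates = []
        for i, j in combinations(range(len(current)), 2):
            merged = [c for k, c in enumerate(current) if k not in (i, j)]
            merged.append(current[i] | current[j])
            candidates.append((modularity(edges, merged), merged))
        q, current = max(candidates, key=lambda item: item[0])
        if q > best_q:  # remember the dendrogram level with the largest Q
            best_q, best = q, list(current)
    return best_q, best


# Toy graph: two triangles joined by a single bridge edge (3, 4).
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
q_max, modules = greedy_modularity(toy_edges)
print(round(q_max, 4), [sorted(m) for m in modules])
```

For the toy graph, each triangle holds 3 of the 7 edges and half of all edge ends, so the two-module partition gives Q = 2 × (3/7 − 0.25) ≈ 0.357, while merging everything into one module gives Q = 0.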

Biomarker discovery based on LR-RFE

Biomarkers should be able to effectively distinguish disease samples from normal samples [17, 35]. The detected network-based gene communities provide a pool of module biomarker candidates. To select better biomarkers in each module, we employ RFE with cross-validation based on a logistic regression [36] classifier. Compared with other machine learning methods, LR is easy to implement and interpret, and it is an efficient classification algorithm [37]. Because of its mathematical interpretability, it has a wide range of applications in biomedicine [38]. The logistic regression model can be written as follows

$$\pi_{i} = \Pr\left( y_{i} = 1 \mid x_{i};\theta \right) = f\left( x_{i}^{T}\theta \right) = \frac{\exp\left( x_{i}^{T}\theta \right)}{1 + \exp\left( x_{i}^{T}\theta \right)},\quad i = 1,2,\ldots,n.$$

(5)

where $x_{i} = (x_{i1}, x_{i2}, \ldots, x_{ip})^{T}$ denotes the p-dimensional gene expression vector of sample i, y_i is the corresponding binary class label, and θ is the vector of coefficients.

To avoid over-fitting, we use L₂ regularization, whose penalty term is defined as

$$P\left( \theta ;\lambda \right) = \lambda \sum\limits_{j = 1}^{p} \theta_{j}^{2}$$

(6)

where λ is a positive tuning parameter used to balance the loss term and penalty term.
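As a small worked illustration of Eqs. (5) and (6), the sketch below evaluates the logistic probability and a penalized objective. The text does not spell out the loss term, so the negative log-likelihood used here is an assumption, as are the function names and the toy inputs.

```python
import math


def predict_proba(x, theta):
    """pi = exp(x^T theta) / (1 + exp(x^T theta)), written in a numerically stable form."""
    z = sum(xj * tj for xj, tj in zip(x, theta))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def penalized_loss(X, y, theta, lam):
    """Assumed loss: negative log-likelihood plus the L2 penalty lam * sum_j theta_j^2 of Eq. (6)."""
    eps = 1e-12  # keeps log() finite when a probability saturates
    nll = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict_proba(xi, theta), eps), 1 - eps)
        nll -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return nll + lam * sum(t * t for t in theta)


# With theta = 0 every sample gets probability 0.5 and the penalty vanishes.
print(predict_proba([1.0, 2.0], [0.0, 0.0]))
print(penalized_loss([[1.0, 0.0]], [1], [0.0, 0.0], lam=1.0))
```

A larger λ pulls the coefficients toward zero, trading a slightly worse fit on the training samples for a smoother, less over-fitted model.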

RFE works by training the model repeatedly [39]. Each time training is completed, a specified number of low-importance features are deleted. The model is then retrained on the remaining features, feature importance is re-evaluated, and unimportant features are deleted again until the number of features meets the predefined setting. In this paper, we delete one gene at a time and use cross-validation to find the optimal number of features. If removing a feature would cause a performance loss, no further features are removed. The selected biomarkers are further verified in the validation datasets.
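The elimination loop described above can be sketched in plain Python. This is a minimal illustration under several assumptions that the text does not fix: coefficients are fitted by batch gradient descent, feature importance is the absolute coefficient, performance is cross-validated accuracy with five deterministic folds, and the data are made up. In practice a library routine such as scikit-learn's `RFECV` wrapped around `LogisticRegression` would be the more usual choice.

```python
import math


def fit_logistic(X, y, lam=0.01, lr=0.5, epochs=300):
    """L2-penalised logistic regression without intercept, fitted by batch gradient descent."""
    theta = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [2.0 * lam * t for t in theta]  # gradient of lam * sum(theta_j^2)
        for xi, yi in zip(X, y):
            z = sum(a * b for a, b in zip(xi, theta))
            p = 1.0 / (1.0 + math.exp(-max(min(z, 30.0), -30.0)))
            for j, a in enumerate(xi):
                grad[j] += (p - yi) * a / len(X)  # gradient of the mean log-loss
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def take(X, rows, cols):
    """Sub-matrix of X restricted to the given sample rows and feature columns."""
    return [[X[i][j] for j in cols] for i in rows]


def cv_accuracy(X, y, cols, k=5):
    """Mean held-out accuracy over k folds; sample i is assigned to fold i % k."""
    total = 0.0
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        test = [i for i in range(len(y)) if i % k == fold]
        theta = fit_logistic(take(X, train, cols), [y[i] for i in train])
        hits = sum(
            (sum(a * b for a, b in zip(row, theta)) > 0) == (y[i] == 1)
            for row, i in zip(take(X, test, cols), test)
        )
        total += hits / len(test)
    return total / k


def rfe_cv(X, y, k=5):
    """Remove the gene with the smallest |coefficient| one at a time; stop when accuracy would drop."""
    cols = list(range(len(X[0])))
    best = cv_accuracy(X, y, cols, k)
    while len(cols) > 1:
        theta = fit_logistic(take(X, range(len(y)), cols), y)
        weakest = min(range(len(cols)), key=lambda j: abs(theta[j]))
        candidate = cols[:weakest] + cols[weakest + 1:]
        score = cv_accuracy(X, y, candidate, k)
        if score < best:
            break  # removing this gene costs performance, so keep it and stop
        cols, best = candidate, score
    return cols, best


# Made-up data: 20 samples and 3 "genes"; only gene 0 tracks the class label.
y = [1, 0] * 10
X = [
    [(1.0 if label else -1.0) * (1.0 + 0.1 * (i % 3 - 1)),
     0.5 if (i // 2) % 2 else -0.5,
     0.5 if (i // 4) % 2 else -0.5]
    for i, label in enumerate(y)
]
selected, score = rfe_cv(X, y)
print(selected, score)
```

On this toy data the two uninformative columns are eliminated in turn and only the informative one remains, which mirrors how a module's gene list shrinks to a compact biomarker set.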

Availability of data and materials

The results published here are based in part upon data generated by the TCGA Research Network (http://cancergenome.nih.gov/).

Abbreviations

GRN: Gene regulatory network
D-GRN: Differential gene regulatory network
CMI-PC: Conditional mutual information-based path consistency
LR-RFE: Logistic regression with recursive feature elimination
PCC: Pearson’s correlation coefficient
ROC: Receiver operating characteristic
AUC: Area under the ROC curve
NOA: Network ontology analysis
GO: Gene ontology
BRCA: Breast invasive carcinoma
MI: Mutual information
CMI: Conditional mutual information
PC: Path consistency
Q: Modularity

References

1. McCarthy MI, Abecasis GAR, Cardon LR, Goldstein DB, Little J, Ioannidis JPA, Hirschhorn JN. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet. 2008;9(5):356–69.
2. Watson P, Lynch HT. Cancer risk in mismatch repair gene mutation carriers. Fam Cancer. 2001;1(1):57–60.
3. Liu J, Feng C, Kong X, Xu Y. Dual graph-Laplacian PCA: a closed-form solution for bi-clustering to find “Checkerboard” structures on gene expression data. IEEE Access. 2019;7:151329–38.
4. Creixell P, Schoof EM, Simpson CD, Longden J, Miller CJ, Lou HJ, Perryman L, Cox TR, Zivanovic N, Palmeri A. Kinome-wide decoding of network-attacking mutations rewiring cancer signaling. Cell. 2015;163(1):202–17.
5. Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, Davies H, Teague J, Butler A, Stevens C, et al. Patterns of somatic mutation in human cancer genomes. Nature. 2007;446(7132):153–8.
6. Billmann M, Chaudhary V, ElMaghraby MF, Fischer B, Boutros M. Widespread rewiring of genetic networks upon cancer signaling pathway activation. Cell Syst. 2018;6(1):52–64.
7. Bandyopadhyay S, Mehta M, Kuo D, Sung MK, Chuang R, Jaehnig EJ, Bodenmiller B, Licon K, Copeland W, Shales M, et al. Rewiring of genetic networks in response to DNA damage. Science. 2010;330(6009):1385–9.
8. Shang H, Liu ZP. Prioritizing type 2 diabetes genes by weighted pagerank on bilayer heterogeneous networks. IEEE/ACM Trans Comput Biol Bioinform. 2021;18(1):336–46.
9. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424.
10. DeSantis CE, Ma J, Gaudet MM, Newman LA, Miller KD, Goding Sauer A, Jemal A, Siegel RL. Breast cancer statistics, 2019. CA Cancer J Clin. 2019;69(6):438–51.
11. Waks AG, Winer EP. Breast cancer treatment: a review. J Am Med Assoc. 2019;321(3):288–300.
12. Sun YS, Zhao Z, Yang ZN, Xu F, Lu HJ, Zhu ZY, Shi W, Jiang J, Yao PP, Zhu HP. Risk factors and preventions of breast cancer. Int J Biol Sci. 2017;13(11):1387.
13. Kwa M, Makris A, Esteva FJ. Clinical utility of gene-expression signatures in early stage breast cancer. Nat Rev Clin Oncol. 2017;14(10):595–610.
14. Weigel MT, Dowsett M. Current and emerging biomarkers in breast cancer: prognosis and prediction. Endocrine Relat Cancer. 2010;17(4):R245–62.
15. Michailidou K, Lindström S, Dennis J, Beesley J, Hui S, Kar S, Lemaçon A, Soucy P, Glubb D, Rostamianfar A. Association analysis identifies 65 new breast cancer risk loci. Nature. 2017;551(7678):92.
16. Veronesi U, Boyle P, Goldhirsch A, Orecchia R, Viale G. Breast cancer. The Lancet. 2005;365(9472):1727–41.
17. Abeel T, Helleputte T, Van de Peer Y, Dupont P, Saeys Y. Robust biomarker identification for cancer diagnosis with ensemble feature selection methods. Bioinformatics. 2010;26(3):392–98.
18. Smolinska A, Blanchet L, Buydens LMC, Wijmenga SS. NMR and pattern recognition methods in metabolomics: from data acquisition to biomarker discovery: a review. Anal Chim Acta. 2012;750:82–97.
19. Feng CM, Xu Y, Hou MX, Dai LY, Shang JL. PCA via joint graph Laplacian and sparse constraint: identification of differentially expressed genes and sample clustering on gene expression data. BMC Bioinform. 2019;20(Suppl 22):716.
20. Liu ZP. Identifying network-based biomarkers of complex diseases from high-throughput data. Biomark Med. 2016;10(6):633–50.
21. Zhou W, Sailani MR, Contrepois K, Zhou Y, Ahadi S, Leopold SR, Zhang MJ, Rao V, Avina M, Mishra T. Longitudinal multi-omics of host–microbe dynamics in prediabetes. Nature. 2019;569(7758):663–71.
22. Wang J, Huang Q, Liu ZP, Wang Y, Wu LY, Chen L, Zhang XS. NOA: a novel network ontology analysis method. Nucleic Acids Res. 2011;39(13):e87.
23. Zhang BH, Yang J, Jiang L, Lyu T, Kong L-X, Tan YF, Li B, Zhu YF, Xi AY, Xu XJG. Development and validation of a 14-gene signature for prognosis prediction in hepatocellular carcinoma. Genomics. 2020;112(4):2763–71.
24. Liu ZP, Wu C, Miao H, Wu H. RegNetwork: an integrated database of transcriptional and post-transcriptional regulatory networks in human and mouse. Database. 2015;2015:bav095.
25. Tian D, Gu Q, Ma J. Identifying gene regulatory network rewiring using latent differential graphical models. Nucleic Acids Res. 2016;44(17):e140.
26. Dong C, Chu X, Wang Y, Wang Y, Jin L, Shi T, Huang W, Li Y. Exploration of gene-gene interaction effects using entropy-based methods. Eur J Hum Genet. 2008;16(2):229–35.
27. Xiao F, Gao L, Ye Y, Hu Y, He R. Inferring gene regulatory networks using conditional regulation pattern to guide candidate genes. PLoS ONE. 2016;11(5):e0154953.
28. Kalisch M, Bühlmann P. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J Mach Learn Res. 2012;8(2):613–36.
29. Zhang X, Zhao XM, He K, Lu L, Cao Y, Liu J, Hao JK, Liu ZP, Chen L. Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information. Bioinformatics. 2012;28(1):98–104.
30. Helena B, Joan-Josep G-C, Alfonso B, Montserrat V, Manuel SJ, Pere C, Alexandre P. MISS: a non-linear methodology based on mutual information for genetic association studies in both population and sib-pairs analysis. Bioinformatics. 2010;15:1811–8.
31. Butte AJ, Kohane IS. Mutual information relevance networks: Functional genomic clustering using pairwise entropy measurements. Pac Symp Biocomput Pac Symp Biocomput. 2000;5:418–29.
32. Wang K, Saito M, Bisikirska BC, Alvarez MJ, Lim WK, Rajbhandari P, Shen Q, Nemenman I, Basso K, Margolin AA, et al. Genome-wide identification of post-translational modulators of transcription factor activity in human B cells. Nat Biotechnol. 2009;27(9):829–37.
33. Shang H, Liu ZP: Prioritizing Congenital Heart Disease Genes from Transcriptone and Interactome via PageRank. In: 2018 IEEE 8th annual international conference on CYBER technology in automation, control, and intelligent systems (CYBER); 2018. pp. 424–9.
34. Newman ME. Fast algorithm for detecting community structure in networks. Phys Rev E. 2004;69(6):066133.
35. Ilyin SE, Belkowski SM, Plata-Salamán CR. Biomarker discovery and validation: technologies and integrative approaches. Trends Biotechnol. 2004;22(8):411–6.
36. Brandes U. On variants of shortest-path betweenness centrality and their generic computation. Soc Netw. 2008;30(2):136–45.
37. Hosmer Jr DW, Lemeshow S, Sturdivant RX: Applied logistic regression, vol. 398: Wiley; 2013.
38. Chan YH. Biostatistics 305. Multinomial logistic regression. Singap Med J. 2005;46(6):259.
39. Granitto PM, Furlanello C, Biasioli F, Gasperi F. Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products. Chemom Intell Lab Syst. 2006;83(2):83–90.

Acknowledgements

We would like to thank editors and anonymous reviewers for their constructive comments. Thanks are also due to Lingyu Li and Haixia Shang for their assistance in this project.

About this supplement

This article has been published as part of BMC Bioinformatics Volume 22 Supplement 12 2021: Explainable AI methods in biomedical data science. The full contents of the supplement are available at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-22-supplement-12.

Funding

This work was partially supported by the National Key Research and Development Program of China (No. 2020YFA0712402); the National Natural Science Foundation of China (NSFC) (61973190 and 61572287); the Natural Science Foundation of Shandong Province of China (ZR2020ZD25); the Shandong Provincial Key Research and Development Program (Major Scientific and Technological Innovation Project, 2019JZZY010423); and the Program of Qilu Young Scholars of Shandong University. Publication costs are funded by the National Key Research and Development Program and NSFC. The funding bodies had no role in the design of the study, the collection and interpretation of data, or the writing of the manuscript.

Author information

Authors and Affiliations

Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, 250061, Shandong, China
Yijuan Wang & Zhi-Ping Liu

Authors

Yijuan Wang
Zhi-Ping Liu

Contributions

YJW carried out the experiments, wrote the program, and drafted the manuscript. ZPL proposed the idea, coordinated the study, and revised the manuscript. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhi-Ping Liu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Wang, Y., Liu, ZP. Identifying biomarkers for breast cancer by gene regulatory network rewiring. BMC Bioinformatics 22 (Suppl 12), 308 (2021). https://doi.org/10.1186/s12859-021-04225-1

Received: 23 May 2021
Accepted: 01 June 2021
Published: 20 January 2022
DOI: https://doi.org/10.1186/s12859-021-04225-1
