Skip to main content

Identifying biomarkers for breast cancer by gene regulatory network rewiring

Abstract

Background

Mining gene regulatory network (GRN) is an important avenue for addressing cancer mechanism. Mutations in cancer genome perturb GRN and cause a rewiring in an orchestrated network. Hence, the exploration of gene regulatory network rewiring is significant to discover potential biomarkers and indicators for discriminating cancer phenotypes.

Results

Here, we propose a new bioinformatics method of identifying biomarkers based on network rewiring in different states. It firstly reconstructs GRN in different phenotypic conditions from gene expression data with a priori background network. We employ the algorithm based on path consistency algorithm and conditional mutual information to delete false-positive regulatory interactions between independent nodes/genes or not closely related gene pairs. And then a differential gene regulatory network (D-GRN) is constructed from the rewiring parts in the two phenotype-specific GRNs. Community detection technique is then applied for D-GRN to detect functional modules. Finally, we apply logistic regression classifier with recursive feature elimination to select biomarker genes in each module individually. The extracted feature genes result in a gene set of biomarkers with impressing ability to distinguish normal samples from controls. We verify the identified biomarkers in external independent validation datasets. For a proof-of-concept study, we apply the framework to identify diagnostic biomarkers of breast cancer. The identified biomarkers obtain a maximum AUC of 0.985 in the internal sample classification experiments. And these biomarkers achieve a maximum AUC of 0.989 in the external validations.

Conclusion

In conclusion, network rewiring reveals significant differences between different phenotypes, which indicating cancer dysfunctional mechanisms. With the development of sequencing technology, the amount and quality of gene expression data become available. Condition-specific gene regulatory networks that are close to the real regulations in different states will be established. Revealing the network rewiring will greatly benefit the discovery of biomarkers or signatures for phenotypes. D-GRN is a general method to meet this demand of deciphering the high-throughput data for biomarker discovery. It is also easy to be extended for identifying biomarkers of other complex diseases beyond breast cancer.

Background

Gene regulatory network (GRN) is a model that characterizes the complex relationship between genes in a cell [1]. In a GRN, nodes represent genes and edges describe the regulatory relationships among them. From a physical perspective, the interactions between genes are through their products like proteins and RNAs. The weight of edge describes the direction and strength of an interaction. The alternation or mutation of one gene may affect the activity of many other genes through the network [2, 3].

Cancer is recognized as a complex disease caused by gene mutations, which will perturb the normal interactions among genes and lead to the disorder of connection mode or strength [4,5,6]. In other words, gene mutations cause perturbation and rewiring of GRNs [7, 8]. The rewired interactions generate changes in normal biological processes and that is crucial for cancerogenesis. Thus, the investigation of the rewiring GRN is significant in discovering potential biomarkers of indicating certain phenotypic states.

Breast cancer is the most commonly diagnosed cancer and the second leading cause of cancer death in women worldwide [9, 10]. Biomarkers play important roles in its early diagnosis and prognostic evaluation [11,12,13]. Nowadays, the accurate identification of biomarkers for breast cancer early detection is still very challenging. There are some biomarkers that have been validated like BRCA1 and HER2 [14]. However, new biomarkers and their combinations are still urgently needed to quantify the treatment effects with classical clinical prognostic factors. They also indicate the potential risks and pathogenesis of breast cancer [15, 16].

With the development of high-throughput sequencing technologies, an increasing amount of gene expression data become available. Various methods have been developed to find efficient biomarkers from high-throughput data [17,18,19,20]. For instance, the methods construct a dynamic network model and perform a multi-omics data integration for biomarker discovery [21]. However, there are few methods to solve this problem from the perspective of network rewiring, which indicates the dysfunctional mechanism of cancer.

In this paper, we propose a framework to identify potential biomarkers of breast cancer based on network rewiring. The disease and normal GRN are reconstructed from gene expression data with a reliable background GRN. CMI-PC (conditional mutual information-based path consistency) algorithm is employed to delete false positive interactions between independent genes or pairs that are not connected closely in a specific condition from the integrative background network. Comparing the GRNs in the two phenotypic conditions, a differential GRN, called D-GRN, containing the rewired nodes with differential regulations will be extracted. In D-GRN, we detect the community structures which are intensively connected nodes in the form of subnetwork modules. Finally, we apply logistic regression with recursive feature elimination (LR-RFE) to select biomarkers in each module respectively. We use cross-validations to find the optimal number of biomarkers individually. The maximum AUC in these module-based biomarkers achieves 0.985 in the internal validation. The selected biomarkers are also verified in external independent datasets and they achieve the maximum AUC value of 0.989 in classification.

Results

In this work, the proposed biomarker discovery framework focuses on the rewiring gene network between disease and normal conditions. Condition-specific GRNs are reconstructed through the integration of prior knowledge of an integrative background network and phenotypic gene expression data. CMI-PC algorithm is employed to remove redundant regulatory interactions from the background network. D-GRN is extracted from the two specified networks in two different states. We detect the communities in the D-GRN. And then machine learning method is applied to find the best feature combination in classification experiments. The selected features are more likely to be potential biomarkers. Here, we apply our framework to breast cancer and identify potential network-based module biomarkers.

Network rewiring

The reconstructed normal GRN has 430 edges (regulations) and 198 nodes (genes), while the disease GRN has 301 edges and 137 nodes. There are 71 same genes and 115 common edges between them. We merge the same nodes that have different connections and their neighbors to construct a D-GRN which contains 509 regulations and 238 genes.

After community detection, the D-GRN has been divided into 5 modules (in the next section). To illustrate the network rewiring in normal and disease states, Fig. 1a, b show the Module 4’s gene regulatory interactions in normal condition and disease condition respectively. Figure 1c illustrates this part of D-GRN, including 30 nodes. Black, green and red lines represent edges in common, only in the normal network and only in the disease network respectively. Figure 1d shows the gene expression boxplot details in the normal and disease conditions and P values of difference. It can be easily observed that most nodes have significantly different gene expressions between the two conditions. Interestingly, few genes are not differentially expressed, but the regulatory interactions rewire in the two conditions. Instead of the node-centric difference, D-GRN identifies the edge-centric difference between the two phenotypes, i.e., normal and cancer state by the rewiring gene regulations.

Fig. 1
figure 1

Network rewiring and gene expression analysis of an identified module (Module 4). a Regulatory interactions in normal condition. b Regulatory interactions in disease condition. c D-GRN. d Gene expression profiles in normal and disease

To further demonstrate the perturbations in the two GRNs, Fig. 2a, b present the heatmaps of Pearson’s correlation coefficient (PCC) between genes in normal and disease conditions. Obviously, there is a marked difference between them and it proves the effectiveness of our identification of the rewiring GRN across two conditions.

Fig. 2
figure 2

Correlation between genes in Module 4. a Normal condition. b Cancer condition

Detected communities

The community detection results in D-GRN are shown in Fig. 3. Different colors correspond to different modules. The 5 modules include 118, 46, 41, 30, and 3 members of genes individually. The global D-GRN is then divided into five functional blocks in the form of network-based modules. These subnetworks provide a pool of module biomarker candidates. To remove the redundant genes in the five detected modules, we perform feature selection for discovering biomarker gene sets respectively.

Fig. 3
figure 3

Community detection result in D-GRN

Breast cancer biomarkers identification and validation

Table 1 lists the selected genes in each module after logistic regression with recursive feature elimination (LR-REF) with tenfold cross-validations. F1-scores in classification experiments are also shown. Due to the number of genes in each module is diverse and some specific genes may have better discrimination abilities, there is a fluctuation of F1-score in the five modules. However, all of them are over 0.86, which means they perform well in the classification of distinguishing disease samples from controls. Figure 4 shows the receiver operating characteristic (ROC) curves of the selected biomarkers underlying the five modules in the internal validation dataset. The highest area under the ROC curve (AUC) value achieves 0.985, and the lowest reaches 0.923.

Table 1 Five module biomarkers after LR-RFE selection
Fig. 4
figure 4

Classification performance of identified biomarkers in internal validation data

For validating of our finding module biomarkers, we perform the classification experiments in the external independent dataset GSE42568, which contains 104 disease samples and 17 controls. Figure 5a demonstrates the ROC curves of each selected gene sets in the independent validation data. As shown, the 5 gene modules all perform well in the classifications. The highest AUC value achieves 0.989, and the lowest AUC value reaches 0.934. In addition, Fig. 5b shows the diverse ability of classification in module biomarkers and in the corresponding random gene sets with the same size. The P-value is 0.0382, indicating a significant difference for them in classification. The results provide evidence that they are potential molecular markers for diagnosing breast cancer.

Fig. 5
figure 5

Classification performance of identified biomarkers and random genes in the independent validation data. a ROC curves of module biomarkers. b Comparison of the classification ability between module biomarkers and equal amount of random genes

To further demonstrate the dysfunctions of selected biomarkers, we employ network ontology analysis (NOA) [22] to perform gene ontology (GO) enrichment analysis on the rewired regulatory network across normal and disease states. Table 2 lists the enriched functions in the discovered biomarkers. As shown, some important cancerous dysregulations related ‘metabolic process’, ‘cell cycle’, ‘cell proliferation’ and ‘lymphocyte differentiation’ are significantly enriched. They are consistent with the prior knowledge of breast cancer pathogenesis during occurrence and development [23]. In turn, the functional analysis also provides evidence for the effectiveness of our proposed biomarker discovery method via network rewiring.

Table 2 The enriched GO biological processes of identified biomarkers in D-GRN

Discussion

Identification of biomarkers for complex diseases such as cancer is of paramount importance in treatment, diagnosis and prognosis. Although numerous methods have been proposed to characterize biomarkers, few are from the perspective of regulatory network rewiring. GRN is one important strategy for revealing the disease mechanism from a systematic perspective. The investigation of cancer mutation and perturbation through GRN rewiring is of significance for addressing the underlying causal regulations responding to phenotypic transition. In this paper, we proposed a novel framework for identifying biomarkers based on network rewiring. Disease and normal condition-specific GRNs have been reconstructed from gene expression data with a priori background network respectively. The gene regulatory interactions changed between them illustrated the results of disease mutation and perturbation. D-GRN is extracted and modules in it are detected sequentially. LR-RFE is employed to find diagnostic biomarkers from modules. And cross-validation is used to set optimal number of biomarkers in each module.

Here, we applied the proposed framework D-GRN for identifying biomarkers of breast cancer. The integrative background network based on prior knowledge and condition-specific gene expression data have been used to construct normal and disease GRNs. We have to admit that there is limitation on missing nodes and edges, which is also expected to be as complete as possible. Totally, a D-GRN including 509 edges and 238 nodes have been extracted. Five potential biomarker gene sets in the form of subnetwork modules have been identified and they performed well in the classification of disease/normal samples in both internal and external validation datasets.

The focus of this work is to provide a computational pipeline for cancer biomarker discovery. In our framework, we select optimal genes serving as biomarkers in the network modules by machine learning. The rewired regulations as well as the weights or coefficients on these regulations have not been fully considered in biomarker discovery. The rewiring edges and patterns are expected to be embedded in the future discovery of biomarkers. In this work, another potential limitation is that the rewiring mechanism and gene dysfunction across different phenotypes have not been included in our feature selection. The genetic and epigenetic factors need be integrated together for addressing the causality of these identified gene regulatory rewiring. These will provide more valuable information for detecting more precise biomarkers for breast cancer.

Conclusion

In conclusion, network rewiring reveals significant information about cancer mechanisms. With the development of high-throughput technology, the amount of high quality gene expression data will keep arising. Condition-specific networks that are close to the real gene network will be established. The rewiring network components will be more clearly revealed, which will greatly benefit the discovery of biomarkers or signatures for breast cancer diagnosis. Obviously, our proposed strategy is rather general and it can be used to discovering biomarkers for other complex diseases.

Methods

Data sources and pre-processing

The RNA-seq gene expression data are downloaded from the TCGA data portal that includes 1097 patients with BRCA (breast invasive carcinoma) and 112 normal controls. The dataset provides gene expression values in the form of mean-centered number for 17,924 genes in all samples. In this study, 60% samples are used for training and testing purpose. We call them as internal training datasets. The remaining 40% samples are used for internal validation. We also download an independent dataset from NCBI GEO database (ID: GSE42568) for validating the identified biomarkers. It has 104 cancer samples and 17 controls. They are called external independent validation data.

The integrative human GRN is downloaded from our RegNetwork knowledgebase [24]. RegNetwork is a comprehensive repository for GRN by collecting the documented gene regulations from more than 20 databases and the predicted gene regulations by aligning transcription factor binding sites. Here, we use a new version of it containing 151,215 regulations in 19,719 genes.

Framework

Figure 6 shows the framework of biomarker identification. It mainly contains three steps. First, as shown in Fig. 6a, it acquires the background of GRN through our prior knowledge about gene regulations in humans. It is a non-specific regulatory network with many redundant gene regulations. Gene expression data in normal and disease samples are used to evaluate the prior gene–gene interactions in specific phenotypes and eliminate redundant ones in the background GRN. Second, by comparing the normal and disease specific GRNs reconstructed from gene expression data, we can clearly identify the rewiring network sections across the two phenotypic states. A differential GRN called D-GRN can be extracted by comparing them. Community detection algorithm is then employed to find closely-connected nodes in the form of modules as shown in Fig. 6b. Third, we apply a logistic regression with recursive feature elimination (LR-RFE) approach to find biomarker genes as shown in Fig. 6c.

Fig. 6
figure 6

The framework of biomarker discovery based on network rewiring. a Reconstruct the disease and normal GRNs respectively by integrating prior background network and gene expression data. b Extract the rewiring regulations and establish a D-GRN. Module detection is implemented to find closely-connected nodes in D-GRN. c Identify biomarkers in each module through LR-RFE

Gene regulatory network rewiring

Determining GRN is an important avenue for revealing disease mechanisms. In this study, disease GRN and normal GRN are reconstructed respectively based on the corresponding gene expression profiling data on a background network. The prior network is deposited in RegNetwork, a knowledge-based genome-wide regulatory network database by integrating amount of data resources [24].

Numerous methods have been developed to reconstruct GRN from gene expression profile [25, 26]. Here, we particularly concern about the regulatory connection changes between disease and normal states. So we apply CMI-PC method to reconstruct the disease and normal GRNs [27,28,29]. Mutual information (MI) is a measure of the mutual dependence between the two variables. It is increasingly popular in GRN reconstruction for the ability to measure non-linear dependency [30, 31]. Conditional mutual information (CMI) in gene pairs is the expected value of the mutual information of two interest genes given the joint regulation by other genes [32]. MI is a special case of zero-order CMI. The MI of variables X and Y, CMI of variables X and Y given Z are calculated by a widely-used estimation method [33] as

$$I(X,Y) = \log \sum\limits_{i,j} {\log } \frac{p(i,j)}{{p_{x}(i) p_{y} (j)}}$$
(1)
$$I(X,Y|Z) = \sum\limits_{i,j,k} {\mathop p\nolimits_{x,y,z} (i,j,k)} \log \frac{{p_{z} (k)p_{x,y,z} (i,j,k)}}{{p_{x,z} (i,k)p_{y,z} (j,k)}}$$
(2)

The approach partitions the supports of X, Y, Z into bins with finite size, where the marginal, joint, and conditional probability mass functions are denoted by p with the appropriate subscripts. \(p_{x\left( i \right)} = \mathop \smallint \limits_{i}^{ } dx\mu_{x} \left( x \right), p_{y\left( i \right)} = \mathop \smallint \limits_{i}^{ } dy\mu_{y} \left( y \right), p\left( {i,j} \right) = \mathop \smallint \limits_{i}^{ } \mathop \smallint \limits_{j}^{ } dxdy\mu \left( {x,y} \right)\), and \(\mathop \smallint \limits_{i}^{ }\) means the integral over the bin i.

Similar to MI, a higher CMI value indicates a closer relationship between the variables X and Y given variable(s) Z. Path consistency (PC) algorithm is used to remove the edges from the background network based on CMI values. The process is, for an adjacent gene pair X and Y, first, calculate MI (0-order CMI). If the value is low or zero, delete the edge between them. Next, select the adjacent gene Z of them and compute first-order CMI I(X,Y|Z) and repeat the step to delete edges that are independent or not strongly connected until no edge that can be deleted. The procedure will continue until there is no higher order CMI. The threshold values for deletion are the same in the two different conditions.

In this way, we obtain two specific GRNs in disease and normal samples respectively. The different interaction between genes shows the rewiring raised by the disease effects. We extract the rewiring parts and construct a D-GRN. In detail, we find the same genes with different connections and add their adjacent genes. Then we connect them based on the normal and disease GRNs.

Community detection

The communities in D-GRN are imperative in the understanding of the functional module about the difference between normal and disease conditions. We apply a fast greedy detection algorithm [34] in the D-GRN to identify the closely-connected gene modules. This algorithm can be briefly described as follows: assuming every independent node in the network is a module. And then it merges modules to make the evaluation standard Modularity (Q) increase most until all nodes are involved in one module. Finally, a tree graph will appear with leaves representing gene nodes. Modules can be divided by different tree levels. The most reliable dividing corresponds to the maximum modularity. Modularity (Q) can be described as:

$$Q = \sum\limits_{i} \left({e{}_{ii} - a_{i}^{2} }\right)$$
(3)
$$a_{i} = \sum\limits_{j} {e_{ij} }$$
(4)

where eij is the ratio of numbers of edges connected module i and module j to total edges.

Biomarker discovery based on LR-RFE

Biomarkers should be able to effectively distinguish disease from normal samples [17, 35]. The detected network-based gene communities provide a pool of module biomarker candidates. To select better biomarkers in each module, we employ RFE with cross-validations based on logistic regression [36] classifier. Compared to other machine learning methods, LR is easier to implement, interpret, and also is a very efficient classification algorithm [37]. Because of its mathematical interpretability, it has a wide range of applications in the field of biomedicine [38]. The logistic regression can be considered as follows

$$\pi_{{\text{i}}} = Pr\left( {\left. {y_{i} } \right|x_{{\text{i}}} ;\theta } \right) = f\left( {x_{i}^{T} } \right) = \frac{{\exp \left( {x_{i}^{T} \theta } \right)}}{{1 + \exp \left( {x_{i}^{T} \theta } \right)}},\quad i = 1,2, \ldots ,n.$$
(5)

where \(X_{i} = (x_{i1} ,x_{i2} , \ldots ,x_{ip} )^{T} ,\) denotes the p-dimensional gene expression vector. yi is a corresponding binary variable. θ is the vector of the coefficients.

For over-fitting problem, we choose L2 regularization techniques to avoid, which is defined as

$$P\left( {\theta ;\lambda } \right) = \lambda \mathop \sum \limits_{j = 1}^{p} \theta_{j}^{2}$$
(6)

where λ is a positive tuning parameter used to balance the loss term and penalty term.

RFE law is in the process of continuously training the model [39]. Each time the training is completed, the specified number of low-importance features are deleted. Then new features are trained again. The importance of features is obtained again, and unimportant characteristics are deleted until the number of characteristics meets the predefined settings. In this paper, we delete one gene each time and through cross-validation to find the optimal number of features. If reducing the features will cause a performance loss, then no features will be removed. The selected biomarkers are further verified in the validation datasets.

Availability of data and materials

The results published here are based in part upon data generated by the TCGA Research Network (http://cancergenome.nih.gov/).

Abbreviations

GRN:

Gene regulatory network

D-GRN:

Differential gene regulatory network

CMI-PC:

Conditional mutual information-based path consistency

LR-RFE:

Logistic regression with recursive feature elimination

PCC:

Pearson’s correlation coefficient

ROC:

Receiver operating characteristic

AUC:

Area under the ROC curve

NOA:

Network ontology analysis

GO:

Gene ontology

BRCA:

Breast invasive carcinoma

MI:

Mutual information

CMI:

Conditional mutual information

PC:

Path consistency

Q:

Modularity

References

  1. Mccarthy MI, Abecasis GAR, Cardon LR, Goldstein DB, Little J, Ioannidis JPA, Hirschhorn JN. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet. 2008;9(5):356–69.

    Article  CAS  PubMed  Google Scholar 

  2. Watson P, Lynch HT. Cancer risk in mismatch repair gene mutation carriers. Fam Cancer. 2001;1(1):57–60.

    Article  CAS  PubMed  Google Scholar 

  3. Liu J, Feng C, Kong X, Xu Y. Dual graph-Laplacian PCA: a closed-form solution for bi-clustering to find “Checkerboard” structures on gene expression data. IEEE Access. 2019;7:151329–38.

    Article  Google Scholar 

  4. Creixell P, Schoof EM, Simpson CD, Longden J, Miller CJ, Lou HJ, Perryman L, Cox TR, Zivanovic N, Palmeri A. Kinome-wide decoding of network-attacking mutations rewiring cancer signaling. Cell. 2015;163(1):202–17.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, Davies H, Teague J, Butler A, Stevens C, et al. Patterns of somatic mutation in human cancer genomes. Nature. 2007;446(7132):153–8.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Billmann M, Chaudhary V, ElMaghraby MF, Fischer B, Boutros M. Widespread rewiring of genetic networks upon cancer signaling pathway activation. Cell Syst. 2018;6(1):52–64.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Bandyopadhyay S, Mehta M, Kuo D, Sung MK, Chuang R, Jaehnig EJ, Bodenmiller B, Licon K, Copeland W, Shales M, et al. Rewiring of genetic networks in response to DNA damage. Science. 2010;330(6009):1385–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Shang H, Liu ZP. Prioritizing type 2 diabetes genes by weighted pagerank on bilayer heterogeneous networks. IEEE/ACM Trans Comput Biol Bioinform. 2021;18(1):336–46.

    Article  CAS  PubMed  Google Scholar 

  9. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424.

    Article  PubMed  Google Scholar 

  10. DeSantis CE, Ma J, Gaudet MM, Newman LA, Miller KD, Goding Sauer A, Jemal A, Siegel RL. Breast cancer statistics, 2019. CA Cancer J Clin. 2019;69(6):438–51.

    Article  PubMed  Google Scholar 

  11. Waks AG, Winer EP. Breast cancer treatment: a review. J Am Med Assoc. 2019;321(3):288–300.

    Article  CAS  Google Scholar 

  12. Sun YS, Zhao Z, Yang ZN, Xu F, Lu HJ, Zhu ZY, Shi W, Jiang J, Yao PP, Zhu HP. Risk factors and preventions of breast cancer. Int J Biol Sci. 2017;13(11):1387.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Kwa M, Makris A, Esteva FJ. Clinical utility of gene-expression signatures in early stage breast cancer. Nat Rev Clin Oncol. 2017;14(10):595–610.

    Article  CAS  PubMed  Google Scholar 

  14. Weigel MT, Dowsett M. Current and emerging biomarkers in breast cancer: prognosis and prediction. Endocrine Relat Cancer. 2010;17(4):R245–62.

    Article  CAS  Google Scholar 

  15. Michailidou K, Lindström S, Dennis J, Beesley J, Hui S, Kar S, Lemaçon A, Soucy P, Glubb D, Rostamianfar A. Association analysis identifies 65 new breast cancer risk loci. Nature. 2017;551(7678):92.

    Article  PubMed  PubMed Central  Google Scholar 

  16. Veronesi U, Boyle P, Goldhirsch A, Orecchia R, Viale G. Breast cancer. The Lancet. 2005;365(9472):1727–41.

    Article  Google Scholar 

  17. Abeel T, Helleputte T, Van de Peer Y, Dupont P, Saeys Y. Robust biomarker identification for cancer diagnosis with ensemble feature selection methods. Bioinformatics. 2010;26(3):392–98.

  18. Smolinska A, Blanchet L, Buydens LMC, Wijmenga SS. NMR and pattern recognition methods in metabolomics: from data acquisition to biomarker discovery: a review. Anal Chim Acta. 2012;750:82–97.

    Article  CAS  PubMed  Google Scholar 

  19. Feng CM, Xu Y, Hou MX, Dai LY, Shang JL. PCA via joint graph Laplacian and sparse constraint: identification of differentially expressed genes and sample clustering on gene expression data. BMC Bioinform. 2019;20(Suppl 22):716.

    Article  Google Scholar 

  20. Liu ZP. Identifying network-based biomarkers of complex diseases from high-throughput data. Biomark Med. 2016;10(6):633–50.

    Article  CAS  PubMed  Google Scholar 

  21. Zhou W, Sailani MR, Contrepois K, Zhou Y, Ahadi S, Leopold SR, Zhang MJ, Rao V, Avina M, Mishra T. Longitudinal multi-omics of host–microbe dynamics in prediabetes. Nature. 2019;569(7758):663–71.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Wang J, Huang Q, Liu ZP, Wang Y, Wu LY, Chen L, Zhang XS. NOA: a novel network ontology analysis method. Nucleic Acids Res. 2011;39(13):e87.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Zhang BH, Yang J, Jiang L, Lyu T, Kong L-X, Tan YF, Li B, Zhu YF, Xi AY, Xu XJG. Development and validation of a 14-gene signature for prognosis prediction in hepatocellular carcinoma. Genomics. 2020;112(4):2763–71.

    Article  CAS  PubMed  Google Scholar 

  24. Liu ZP, Wu C, Miao H, Wu H. RegNetwork: an integrated database of transcriptional and post-transcriptional regulatory networks in human and mouse. Database. 2015;2015:bav095.

    Article  PubMed  PubMed Central  Google Scholar 

  25. Tian D, Gu Q, Ma J. Identifying gene regulatory network rewiring using latent differential graphical models. Nucleic Acids Res. 2016;44(17):e140.

    Article  PubMed  PubMed Central  Google Scholar 

  26. Dong C, Chu X, Wang Y, Wang Y, Jin L, Shi T, Huang W, Li Y. Exploration of gene-gene interaction effects using entropy-based methods. Eur J Hum Genet. 2008;16(2):229–35.

    Article  CAS  PubMed  Google Scholar 

  27. Xiao F, Gao L, Ye Y, Hu Y, He R. Inferring gene regulatory networks using conditional regulation pattern to guide candidate genes. PLoS ONE. 2016;11(5):e0154953.

    Article  PubMed  PubMed Central  Google Scholar 

  28. Kalisch M, Bühlman P. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J Mach Learn Res. 2012;8(2):613–36.

    Google Scholar 

  29. Zhang X, Zhao XM, He K, Lu L, Cao Y, Liu J, Hao JK, Liu ZP, Chen L. Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information. Bioinformatics. 2012;28(1):98–104.

    Article  CAS  PubMed  Google Scholar 

  30. Helena B, Joan-Josep G-C, Alfonso B, Montserrat V, Manuel SJ, Pere C, Alexandre P. MISS: a non-linear methodology based on mutual information for genetic association studies in both population and sib-pairs analysis. Bioinformatics. 2010;15:1811–8.

    Google Scholar 

  31. Butte AJ, Kohane IS. Mutual information relevance networks: Functional genomic clustering using pairwise entropy measurements. Pac Symp Biocomput Pac Symp Biocomput. 2000;5:418–29.

    Google Scholar 

  32. Wang K, Saito M, Bisikirska BC, Alvarez MJ, Lim WK, Rajbhandari P, Shen Q, Nemenman I, Basso K, Margolin AA, et al. Genome-wide identification of post-translational modulators of transcription factor activity in human B cells. Nat Biotechnol. 2009;27(9):829–37.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Shang H, Liu ZP: Prioritizing Congenital Heart Disease Genes from Transcriptone and Interactome via PageRank. In: 2018 IEEE 8th annual international conference on CYBER technology in automation, control, and intelligent systems (CYBER); 2018. pp. 424–9.

  34. Newman ME. Fast algorithm for detecting community structure in networks. Phys Rev E. 2004;69(6):066133.

    Article  CAS  Google Scholar 

  35. Ilyin SE, Belkowski SM, Plata-Salamán CR. Biomarker discovery and validation: technologies and integrative approaches. Trends Biotechnol. 2004;22(8):411–6.

    Article  CAS  PubMed  Google Scholar 

  36. Brandes U. On variants of shortest-path betweenness centrality and their generic computation. Soc Netw. 2008;30(2):136–45.

    Article  Google Scholar 

  37. Hosmer Jr DW, Lemeshow S, Sturdivant RX: Applied logistic regression, vol. 398: Wiley; 2013.

  38. Chan YH. Biostatistics 305. Multinomial logistic regression. Singap Med J. 2005;46(6):259.

    CAS  Google Scholar 

  39. Granitto PM, Furlanello C, Biasioli F, Gasperi F. Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products. Chemom Intell Lab Syst. 2006;83(2):83–90.

    Article  CAS  Google Scholar 

Download references

Acknowledgements

We would like to thank editors and anonymous reviewers for their constructive comments. Thanks are also due to Lingyu Li and Haixia Shang for their assistance in this project.

About this supplement

This article has been published as part of BMC Bioinformatics Volume 22 Supplement 12 2021: Explainable AI methods in biomedical data science. The full contents of the supplement are available at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-22-supplement-12.

Funding

This work was partially supported by National Key Research and Development Program of China (No. 2020YFA0712402); National Natural Science Foundation of China (NSFC) (61973190 and 61572287); Natural Science Foundation of Shandong Province of China (ZR2020ZD25) and Shandong Provincial Key Research and Development Program (Major Scientific and Technological Innovation Project, 2019JZZY010423); the Program of Qilu Young Scholars of Shandong University. Publication costs are funded by National Key Research and Development Program and NSFC. The funding bodies had no role in the design of the study, collection, the interpretation of data and in writing the manuscript.

Author information

Authors and Affiliations

Authors

Contributions

YJW carried out the experiments, wrote the program and drafted the manuscript. ZPL proposed the idea, coordinated this study and revised the manuscript. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhi-Ping Liu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, Y., Liu, ZP. Identifying biomarkers for breast cancer by gene regulatory network rewiring. BMC Bioinformatics 22 (Suppl 12), 308 (2021). https://doi.org/10.1186/s12859-021-04225-1

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12859-021-04225-1

Keywords