Here, we adopted a differential graphical model to investigate the differences in co-binding patterns of TFs between normal and disease conditions. We hypothesized that factors that dramatically change their partners during the transcription process would be altered to a larger degree in disease samples, and hence would have larger effects than TFs showing little difference in co-binding patterns in driving disease progression.

### ChIP-seq data collection and pre-processing

We aim to infer the differential TF co-regulation alterations among normal and disease conditions. In order to assess these alterations, we selected disease-to-normal cell types that shared at least 50 common TFs (ChIP-Seq) as well as had a strong expression correlation between the two cell types (Figure S1). Therefore, we decided to use the 369 and 143 ChIP-seq experiments from K562 and GM12878 cell lines, respectively. After de-duplicating and extracting common ChIP-Seq targets, we identified 94 common TFs among these two cell lines (Table S1). In order to investigate alterations in the joint activity between TFs and specific chromatin marks in disease, we also extracted nine histone modifications and chromatin accessibility datasets from these cell lines (Fig. 1a, b). The majority of the 94 common TFs were sequence-specific binding factors (TFSS in Fig. 1c), 31 of which showed significant expression changes between disease and control.

To uniformly process the data, we first divided the autosomal chromosome (hg38 version) into 100 base pair (bp) bins and removed bins that overlapped with genomic regions that have gaps or low mappability using BEDTools (version 2.27.1-foss-2018b) [31]. To remove any artifacts from non-peak regions, we further removed all 100 bp bins that did not overlap with any peaks from these 104 factors (Fig. 3a). In total, we kept 1,351,140 bins in our analysis.

For each factor, we calculated the fold change of read count between the TF ChIP-seq experiment and its matched ChIP-seq control experiment from ENCODE for all bins in order to normalize for read depth (Fig. 3b). We then calculated the average signal from each of the replicates or experiments from different labs when multiple datasets were present for a single factor. Details of the ChIP-seq signal files used has been listed in the supplementary data. We organized the resultant signal data into a matrix for K562 and GM12878 separately, with columns indicating factors and rows indicating bins in the genome (Fig. 3c). We used these two matrices as the inputs for the following differential graphical model (Fig. 3d).

### Infer differential graphical model to TF-TF network rewiring

#### Gaussian graphical model for co-regulation network in one state

In this section, we describe the details of the differential graphical model. Let *G*^{(D)} = (*V*^{(D)}, *E*^{(D)}) represent the network with nodes *V*^{(D)} and edges *E*^{(D)} for disease status *D*. Let \( {X}_j^{(D)} \) denote the vector of the average normalized ChIP-seq signal of factor *j* in status *D*, where *j* = 1⋯*J* . *D* = 1 indicates disease and 0 indicates normal. Next, for each *D*, we calculated \( {X}_1^{(D)},{X}_2^{(D)},\cdots, {X}_J^{(D)} \) over 100 bp bins for all the 104 factors. In our analysis, we included 94 TFs, nine histone modification marks, and one chromatin accessibility (*J* = 104).

Under one condition *D*, we assume that \( {X}_1^{(D)},{X}_2^{(D)},\cdots, {X}_J^{(D)} \) follows a multivariate Gaussian distribution such that \( {\boldsymbol{X}}^{(D)}={\left({X}_1^{(D)},\cdots, {X}_J^{(D)}\right)}^T\sim {N}_J\left({\boldsymbol{\mu}}^{(D)},{\boldsymbol{\Sigma}}^{(D)}\right) \). We can construct the TF co-regulatory network using a traditional GGM. Here, we are aiming to identify TF true physical interactions by highlighting conditional dependent binding profiles among TFs. Therefore, we used the precision matrix represented as **Θ**^{(D)} ≔ (**Σ**^{(D)})^{−1} to infer whether any TF pair has a non-random co-binding interaction. In other words, If \( {\boldsymbol{\Theta}}_{ij}^{(D)}=0, \) then \( {X}_i^{(D)} \) and \( {X}_j^{(D)} \) are independent of each other, conditioned on all the other TFs. As a result, \( \left(i,j\right)\notin {E}^{(D)}\ if\ {\boldsymbol{\Theta}}_{ij}^{(D)}=0 \).

#### Differential graphical model for co-regulation alteration in two states

Next, we used the difference between two networks *G*^{(1)} and *G*^{(0)}, called the differential network, to represent the degree of TF co-regulation alteration under two conditions (*D* = 1 and *D* = 0). Given the observed data \( {X}_1^{(1)},{X}_2^{(1)},\cdots, {X}_J^{(1)} \) in the disease cell and \( {X}_1^{(0)},{X}_2^{(0)},\cdots, {X}_J^{(0)} \) for the normal cell, edges in the differential co-regulatory network can be inferred from the difference between the two precision matrices **Δ** = **Θ**^{(1)} − **Θ**^{(0)} = (**Σ**^{(1)})^{−1} − (**Σ**^{(0)})^{−1}, where the co-regulation relationship between *TF*_{i} and *TF*_{j} changes if |Δ_{i, j}| ≠ 0. Note that **Σ**^{(1)}**ΔΣ**^{(0)} − (**Σ**^{(1)} − **Σ**^{(0)}) = 0. Hence, we can solve the following equation to estimate a reasonable **Δ**.

$$ {\hat{\boldsymbol{\Sigma}}}^{(1)}\boldsymbol{\Delta} {\hat{\boldsymbol{\Sigma}}}^{(0)}-\left({\hat{\boldsymbol{\Sigma}}}^{(1)}-{\hat{\boldsymbol{\Sigma}}}^{(0)}\right)=0 $$

Here \( {\hat{\boldsymbol{\Sigma}}}^{(D)} \) is the sample covariance matrix. Specifically, we used the penalized D-Trace loss model estimate **Δ** [16,17,18].

$$ l\left(\boldsymbol{\Delta} \right)=\frac{1}{2}\ tr\left(\boldsymbol{\Delta} {\hat{\boldsymbol{\Sigma}}}^{(1)}\boldsymbol{\Delta} {\hat{\boldsymbol{\Sigma}}}^{(0)}\right)- tr\left(\boldsymbol{\Delta} \left({\hat{\boldsymbol{\Sigma}}}^{(1)}-{\hat{\boldsymbol{\Sigma}}}^{(0)}\right)\right) $$

*tr* represents the trace of a matrix. To remove spurious differential edges, we introduced a non-negative regularization parameter *λ* to penalize the number of edges in the network.

$$ l\left(\boldsymbol{\Delta} \right)=\frac{1}{2}\ tr\left(\boldsymbol{\Delta} {\hat{\boldsymbol{\Sigma}}}^{(1)}\boldsymbol{\Delta} {\hat{\boldsymbol{\Sigma}}}^{(0)}\right)- tr\left(\boldsymbol{\Delta} \left({\hat{\boldsymbol{\Sigma}}}^{(1)}-{\hat{\boldsymbol{\Sigma}}}^{(0)}\right)\right)+\lambda {\sum}_{i,j}{\left|{\Delta }_{i,j}\right|}_1 $$

Here, *λ* controls the sparsity of the rewired network. For example, *λ* = 0 indicates no penalty and usually will result in a very dense network. In contrast, a large *λ* value will result in a sparse network.

#### Co-variance matrix inference

Under the Gaussian assumption, \( {\hat{\boldsymbol{\Sigma}}}^{(D)} \) can be directly obtained from the sample covariance matrix. However, One statistical concern of using a differential graphical model is the distribution of *X*^{(D)}. Here, we found that even after log transformation, almost all TFs severely contradicted the Gaussian assumption (for details see suppl. sect. S2). Therefore, going forward we used a non-parametric model instead of a GGM. Our assumption is that a set of monotonically increasing functions \( {\left\{{f}_j^{(D)}\right\}}_{j=1}^J \) exists such that, after transformation, \( {f}_1^{(D)}\left({X}_1^{(D)}\right),{f}_2^{(D)}\left({X}_2^{(D)}\right),\cdots, {f}_J^{(D)}\left({X}_J^{(D)}\right) \) follow a multivariate normal distribution *N*_{J}(**0**, **Σ**^{(D)}). Similar to the GGM, we can use the precision matrix **Θ**^{(D)} ≔ (**Σ**^{(D)})^{−1} to infer the conditional dependence between any pair of factors in the network. As described in [17], we adopted the rank-based scheme to estimate the sample covariance matrix without directly estimating \( {\left\{{f}_j^{(D)}\right\}}_{j=1}^J \). Specifically, let \( {r}_{li}^{(D)} \) represent the rank of bin *l* for TF *i* in status *D* among all the bins, and *n* is the total number of bins in the genome. The Spearman correlation of TFs *i* and *j* are represented as below.

$$ {\rho}_{ij}^{(D)}=\frac{\sum_{l=0}^n\left({r}_{li}^{(D)}-\frac{n+1}{2}\right)\left({r}_{lj}^{(D)}-\frac{n+1}{2}\right)}{\sqrt{\sum_{l=1}^n{\left({r}_{li}^{(D)}-\frac{n+1}{2}\right)}^2{\sum}_{l=1}^n{\left({r}_{li}^{(D)}-\frac{n+1}{2}\right)}^2}} $$

Then we replace the sample covariance matrix \( {\hat{\boldsymbol{\Sigma}}}^{(D)} \) by, \( {\hat{\mathbf{S}}}^{(D)} \) with elements

$$ {\hat{\mathbf{S}}}^{(D)}=\left\{\begin{array}{cc}2\sin \left(\frac{\pi }{6}{\hat{\rho}}_{ij}^{(D)}\right)& i\ne j\\ {}1& i=j\end{array}\right. $$

In cases where \( {\hat{\mathbf{S}}}^{(D)} \) was not positively semi-definite, we used a projection method as described in [17, 32].

### Model selection

It is critical to select an appropriate *λ* to reliably infer network changes in disease samples because different *λ* values can lead to different conclusions in downstream analyses. Researchers have proposed many methods, cross validation, Akaike information criterion (AIC), and Bayesian information criterion (BIC), to automatically select *λ* [33,34,35]. We chose a more interpretable approach in our ChIP-seq-based co-regulation network analysis than AIC and BIC, by using the Stability Approach to Regularization Selection (StARS) approach (Fig. 5) [15]. The key characteristic of this method is that it encourages the differential network to be inclusive to account for the true dynamics between disease and control networks, while guaranteeing an acceptable stability in the resultant differential network.

Specifically, we defined \( \Lambda =\raisebox{1ex}{$1$}\!\left/ \!\raisebox{-1ex}{$\lambda $}\right. \) as an alternative parameter to control network density so that a larger Λ indicates a denser network. During the model section process, we start from subsampling part of the genome for *S* times. Specifically, during the *s*^{th} sampling, *X*^{(1), s} and *X*^{(0), s} represent the binding profile matrices. \( {\psi}_{i,j}^s\left(\Lambda \right)=1 \) if there is an edge between TF *i* and TF *j* in the rewired network under Λ, otherwise \( {\psi}_{i,j}^s\left(\Lambda \right)=0 \). In the *s* = 1, 2, ⋯, *S* randomly sampled datasets, we defined \( {\theta}_{i,j}\left(\Lambda \right)=\frac{1}{S}{\sum}_{s=1}^S{\psi}_{i,j}^s\left(\Lambda \right) \), and *ξ*_{i, j}(Λ) = 2*θ*_{i, j}(Λ){1 − *θ*_{i, j}(Λ)} to be the fraction of times the networks disagree with the existence of the edge (*i*, *j*). Then, the overall instability of the networks over the sampling sets is

$$ \hat{D}\left(\Lambda \right)=\frac{\sum_{i<j}{\xi}_{i,j}\left(\Lambda \right)}{\left(\begin{array}{c}J\\ {}2\end{array}\right)} $$

It is clear that \( \hat{D}(0)=0 \) in an empty network because there is no instability when there are no edges. In general, the network becomes denser and more instable as Λ goes larger. However, when the network becomes very dense and even fully connected, \( \hat{D}\left(\Lambda \right) \) goes smaller again and eventually reduces to zero. As suggested in [15], we used the monotone function \( \overline{D}\left(\Lambda \right)={\mathit{\sup}}_{0\ll t\ll \lambda}\hat{D}\left(\Lambda \right) \) to remove such artifact effect in an extremely dense network. As a result, the optimal Λ should be \( {\hat{\Lambda}}_{opt}=\mathit{\sup}\left\{\Lambda :\overline{D}\left(\Lambda \right)\le \beta \right\} \) for a predefine network instability measure *β*.

In our analysis, we started from a broad spectrum of parameter values from *λ*_{1} = 0.01, *λ*_{2} = 0.05, ⋯*λ*_{m}, ⋯, *λ*_{M} = 10, representing a wide range of sparse networks, from almost fully connected to empty (Fig. 4). For each *λ*_{m}, we randomly selected half of the bins we used in “Infer differential graphical model to TF-TF network rewiring” section to run the LASSO penalized D-Trace loss model. We repeated this process *S* = 100 times for each *λ*_{m} and calculated the average network variance. We used *β* = 0.5% as our stability threshold and selected the optional *λ* is 0.2.