Deriving a Boolean dynamics to reveal macrophage activation with in vitro temporal cytokine expression profiles

Background Macrophages show versatile functions in innate immunity, infectious diseases, and progression of cancers and cardiovascular diseases. These versatile functions of macrophages are conducted by different macrophage phenotypes classified as classically activated macrophages and alternatively activated macrophages due to different stimuli in the complex in vivo cytokine environment. Dissecting the regulation of macrophage activations will have a significant impact on disease progression and therapeutic strategy. Mathematical modeling of macrophage activation can improve the understanding of this biological process through quantitative analysis and provide guidance to facilitate future experimental design. However, few results have been reported for a complete model of macrophage activation patterns. Results We globally searched and reviewed literature for macrophage activation from PubMed databases and screened the published experimental results. Temporal in vitro macrophage cytokine expression profiles from published results were selected to establish Boolean network models for macrophage activation patterns in response to three different stimuli. A combination of modeling methods including clustering, binarization, linear programming (LP), Boolean function determination, and semi-tensor product was applied to establish Boolean networks to quantify three macrophage activation patterns. The structure of the networks was confirmed based on protein-protein-interaction databases, pathway databases, and published experimental results. Computational predictions of the network evolution were compared against real experimental results to validate the effectiveness of the Boolean network models. Conclusion Three macrophage activation core evolution maps were established based on the Boolean networks using Matlab. Cytokine signatures of macrophage activation patterns were identified, providing a possible determination of macrophage activations using extracellular cytokine measurements.


Background
Over the past 30 years, extensive research has been dedicated to investigating the role of macrophages due to its versatile functions in innate immunity, infectious diseases, and progression of cancers and cardiovascular diseases, the top 2 leading causes of death in the world [1][2][3][4]. Macrophages have bactericidal and phagocytic functions and regulate immune responses and the development of inflammation by secreting monokines including enzymes, complement proteins, and regulatory factors such as IL-1 (Interleukin-1), IL-6, and IL-12 [5]. M2 macrophages turn off the damaging immune system activation, function in constructive processes like wound healing and tissue repair, and coordinate the chronic inflammatory response by regulating downstream cellular functions (inflammation resolution, endothelial cell, and fibroblast activation) [4,[6][7][8]. These contradictory functions of macrophages were conducted by different macrophage phenotypes classified as M1 (classically activated macrophages) and M2 (alternatively activated macrophages) due to different stimuli in the complex in vivo cytokine environment.
Analysis of macrophage activations has been reported by others and our group. We have reported differential equation models on macrophage activation and effects of matrix metalloproteinase-9, and − 28 on macrophage activations [16][17][18]. Martin and colleagues have validated a Boolean dynamics of genetic regulatory network for LPS-stimulated macrophage activation using temporal microarray (mRNA) data at 8-time points (15mins, 30mins, 1 h, 2 h, 4 h, 8 h, 16 h, 32 h) [19]. Rex's group characterized the inflammatory gene expression patterns of LPS-stimulated M1 activation and IL-4 and IL-13stimulated M2 activation at 0.5, 1, 2, 6, 10, and 24 h, and modeled macrophage activations by combining Boolean dynamics and differential equations [20]. Though less than 50 genes were considered in each macrophage activation model, the limited 6 temporal measurements still led to an unavoidable difficulty in parameter estimation for differential equation models. Recently, transcriptional and post-transcriptional graphical networks of macrophage in healthy and diseased hearts were reported [21]. All these models focused on the regulatory networks at the gene level and gave a deeper insight of what happened inside a cell. Interestingly, all models reported significant cytokine markers for macrophage activations. Furthermore, in an in vivo situation, stimuli and final secretion products of activated macrophages stay outside the cells. It's natural and practical to define macrophage phenotypes with cytokine expressions for an Input-Output representation only based on cytokine expressions outside of the cells. Therefore, in vitro cytokine profiles from macrophage activations serve perfectly to explore such abstract models.
We globally searched "temporal" or "time series" and "macrophage activation" or "macrophage polarization" in PubMed, GEO datasets/profiles, and ProteomeExchange to screen specific macrophage activation temporal profiles with different stimuli. A total of 173 datasets were found for Mus musculus and 156 for Homo sapiens as of June 18, 2019. Eleven studies focused on temporal macrophage activations. Currently, complete temporal in vivo genome or proteome expression profiles of macrophage activation has not been reported. However, in vitro temporal data has been deposited in public databases to shed a light on macrophage activation mechanisms. A dataset from Melton's group was selected to establish a Boolean model because macrophage activation expression profiles of 27 cytokines for 4 groups (a control group and 3 activation groups with 3 different stimuli, LPS, IL-4, and IL-10) at 7 points (0 h, 0.5 h, 1 h, 3 h, 6 h, 12 h, 24 h) were documented [22]. This dataset provides extra information on time points, sub-activation groups, and cytokine expression levels to existing models on gene regulations [19][20][21].
In this study, expression levels of the selected differentially expressed cytokines were binarized with "1" (upregulated) and "0" (down-regulated). Linear programming algorithm was applied to determine the structure (links and interaction strengths) of a regulatory network for macrophage activations among the 27 selected cytokines under 3 different stimuli. Based on the obtained network structure, Boolean functions were determined for each cytokine in the network using a Karnaugh Map. For the first time, core networks for macrophage activation models were generated and mathematically represented with the semi-tensor product. The core network for M1, M2a, and M2c only contained 6, 7, and 5 proteins, respectively. Links of macrophage activation networks were validated based on public Protein-Protein Interaction (PPI) and Kegg pathway databases. The core network for each activation pattern was validated by the coincidence of predicted and temporal experimental results. A novel temporal evolution map for each macrophage activation model was also generated to show all Boolean states, suggesting possible evolutionary paths of the networks with different initial conditions.

Results
Curve fitting with smooth spline algorithm An illustration of 3 typical expression profiles by curve fitting was shown in Fig. 1. The 3 proteins are C-C motif chemokines ligand 12 (CCL12, also named as MCP-5) in M2c activation, CCL7 (MCP-3) in M1 activation, and chemokine (C-X-C motif) ligand 10 (CXCL10, also named as IP-10) in M2a activation. Interestingly, MCP-5 followed a similar profile as promotion, (c t 1þt Þ, where c represents a constant of its max amplitude with respect to time t (Fig. 1a). MCP-3 followed a bell shape curve similar to a Gaussian distribution in Fig. 1b, and IP-10 followed an inhibition relatable to, ( c 1 1þt þ biasÞ , in Fig. 1c. The bias denotes an offset term since an expression may not necessarily reach 0. The average R-Squared error of the fitting algorithm was 0.9087, 0.8566, and 0.8106 for M1, M2a, and M2c group, respectively. A temporal profile is considered as a promotion if its final expression level is more than twice or inhibition if it is less than half of the initial expression level, and a bell-curve otherwise. It's worth to mention that an expression profile demonstrates different patterns with different time periods due to activation dynamics. About 138 out of 324 (27x4x3, 27 protein expression measured from 4 replicates in 3 activation groups) protein expression profiles followed a promotion pattern, 27 had inhibition profiles, and 148 had a bell curve within the first 6 h as shown in Table 1. Examining the complete 24 h time span, 78 profiles fit a promotion, 110 fit an inhibition while 125 for a bell curve, suggesting significant protein up-regulation in the first 6 h and gradually evolve to stable polarizations. M2 activation profiles demonstrate more inhibition patterns compared to M1 activation in Table 1. Chemokine (C-X-C motif) ligand 1 (CXCL1/KC-GRO) did not or barely reached detectable levels in both M2a and M2c activations. These profiles were denoted as N/A in Table 1. Therefore, KC-GRO was not considered in either M2 activation models for undetectable expression levels. GM-CSF (Granulocyte-macrophage colony-stimulating factor) from some replicates reached a detectable level 12 h post-stimulation in M2a and M2c and was included in both M2 models. KC-GRO was significantly up-regulated in M1 activation, illustrating their role to identify M1 activation [23]. All other protein expression levels are either elevated or down-regulated comparing against the average of the control group at least 1 out of 7 time points with p-value < 0.05.

Binarization
Both normal K-means and iterative K-means algorithms were applied to the 50 re-sampled data from the fitted curves. Figure 2 gives an illustration of 2 clusters with the normal K-means method and the final binarization of the iterative K-means method. The Iterative K-means generates better binarization results than the normal Kmeans method since it identifies more significant peaks in the conversion. A binarization obtained from the smooth spline and iterative K-means is shown in Fig. 3. The blue stars represent the initial time points (0 h, 0.5 h, 1 h, 3 h, 6 h, 12 h, 24 h) in the dataset, the red solid line represents the curve fitting result, the green solid line gives the threshold from the iterative K-means, and the yellow solid line is the final binarization result. An error band centered at the binarization threshold is denoted by dashed lines for possible binarization errors.

Statistical analysis
All experimental measurements at 7 sample times have been analyzed in Melton's paper for statistical significance [22]. We confirmed their analysis with the Kolmogorov-Smirnov test (p < 0.01). We fitted temporal expression profiles for all genes with detectable expression levels in 3 stimulated groups and 1 control group. Two-sample Kolmogorov-Smirnov tests were performed on fitted temporal profiles of each gene given a stimulus versus the control with a significance value p < 0.01.
The fitted smooth spline functions are determined by minimizing the summation of squared errors at seven time points. The average error at these seven sample time points can be obtained in Matlab. Such error may lead to a false determination of binarization value. To test the sensitivity of binarization, an error band was formed as a binarization threshold ± error. If any of the 50 re-sampled data from the averaged fitted expression stays in the error band, there might be a false binarization and the re-sampled data is sensitive to binarization. The maximum likelihood for false binarization is less than 5% for all genes in 3 stimuli groups. No outlier is detected in our experimental and fitted data.

Determining Boolean functions
There are three assumptions in determining the Boolean functions of a network model. 1) Significant functional changes appear in a period larger than the simulation intervals; 2) The cycles of macrophage activation are larger than our simulation interval; 3) The Boolean network will have a fixed structure and regulation during the 24h response period in the experiment.
Since macrophage activations have the most significant changes in the first 9 h post-stimulation, we put 40 samples during the first 9 h, which led to a simulation interval of about 13.5 min. The other 10 out of 50 samples were evenly re-sampled from 9 to 24 h. Therefore, the simulation interval is 13.5 min in this study.
Currently, most of the published results show a period of 24 h for macrophages' responses to stimuli. In addition, Fig. 2 Comparison of binarization threshold using K-means and iterative K-means methods. Analyzing the same expression profiles in (a), the K-means method identified the first 5 peaks with relatively high expression levels in (b) while the iterative K-means method identified all peaks in (c) experimental data are collected with different time periods ranging from 15 min to hours [19][20][21]24]. Our simulation time interval is very similar to the minimum sampling time of available experimental measurements (15 min). This validated assumptions 1 and 2. Since biological regulations have demonstrated stability and robustness, assumption 3 is a general assumption for modeling.
Boolean functions are determined with respect to binarized values in a time period instead of a single distinct time point. We verified the Boolean function determined by the averaged gene expressions with two methods: agreement with Boolean function dependent on experimental results and agreement of promotion/inhibition determined by Linear Programming. A total of 23 out of 27 datasets (85%) have shown the agreement of Boolean functions determined by the average expressions of genes from 4 replicates with a total of 50 samples for 24 h poststimuli and linear programming results.
However, variations of temporal responses exist in biological systems [25]. The variations may be more severe during the transition between two different states. If there is any disagreement of Boolean function determined by Karnaugh map and experimental measurement or Linear programming results, we further determined the Boolean function among proteins with respect to each individual experimental replicate instead of the average of 4 replicates and enriched our samples to be 200. An illustrative example showing the agreement of Boolean functions determined by the averaged expression and individual replicates is shown in Fig. 4. In a small network with two inputs (IL-10 and MIP-1β) and one output (MIP-2), the averaged profiles have a missing state (IL-10 = 1 and MIP-1β = 0), so the Boolean function determined by the Karnaugh map showed MIP-2 = 0, contradicting to our experimental results. In this case, expression levels of 4 replicates are considered together to determine the Boolean function. Note that there are only 49 (average) and 196 (4 replicates) sample time points for the iterative evolution ends at sample 49 to predict the state at sample time point 50. The agreement of this Boolean function is about 90%.

Properties of the networks established with linear programming algorithm
Linear programming algorithm was applied to determine the network structure for 3 macrophage activation patterns. As shown in Table 2, the networks obtained were still complex. To simplify the network structure, a link between two nodes was removed if its weight calculated by linear programming was less than 70% of the maximum link weight described in the Method Section. The 70% threshold was chosen based on the criteria to minimize the number of links and maximize the summation of link weights. Properties of the networks established with original linear programming and the simplified network were shown in Table 2. Input nodes are proteins with no parent nodes and output nodes are proteins with no child nodes. All 3 simplified networks have fewer nodes and links than the original results obtained from the linear programming algorithm. In addition, the complexities of M1 and M2a polarization networks are less than the networks for M2c polarization, specifically for the number of links and network density. While the M2c activation network has 3 nodes with ≥4 parent nodes, these 3 nodes are all output nodes and will not affect the regulation of Boolean functions, suggesting that the 50 sample data should be sufficient to establish the Boolean functions.

LPS induced M1 activation network
The structure for the LPS induced M1 activation network was shown in Fig. 5. Boolean regulations for each node were shown in Table 3. An overbar denotes a negation, while a double vertical bar for a logic 'OR' and a '&' symbol representing a logic 'AND'. In addition, interaction strengths calculated by the linear programming algorithm for each link were listed in Table 3.
There are 26 proteins in this network. TIMP-1 (Tissue Inhibitor of Metalloproteinases) is the only protein removed from the 27 measured proteins as an orphan node. The two input nodes are FGF-9 (Fibroblast Growth Factor-9) and IL-4. FGF-9 has been known to be an inflammation promoter in multiple sclerosis [26]. Interestingly, IL-4, not LPS, was annotated as input for the M1 activation network. Though IL-4 is known as a stimuli for M2a activation and anti-inflammatory responses, the combination of LPS and IL-4 has been reported for its promotion of other inflammation cytokines including IL-6, CCL-1, CCL-3 (macrophage inflammatory protein 1-α, MIP-1α), CCL-4 (macrophage inflammatory protein 1-β, MIP-1β), CCL-5 (Regulated upon Activation, Normal T cell Expressed, and Secreted, RANTES), TNF-α, and IFN-γ, which are also included in our M1 activation network [44]. In addition, LPS has  been shown to induce IL-4 gene expression, which confirmed the rationale for IL-4 serving as an input for the M1 activation network [59]. The M1 activation network has 10 output nodes: GM-CSF, IL-2, IL-6, IL-7, IL-12p70, IL-17A, IP-10, OSM (Oncostatin M), TNF-α, and VEGF-A (Vascular Endothelial Growth Factor A). All these proteins' expressions are significantly upregulated compared to the control group. There were 5 feedback loops as shown in Fig. 5: 1) IFN-γ attractor; 2) from IFN-γ to itself through MCP-1 The network for M1 activation includes 26 nodes with each node representing a protein (blue block), a black line with an arrow for a positive connection from a parent node to a child node, and logic negation being a red dashed connection with a 'T' arrow. The core network is shown as the diamond nodes in the network (monocyte chemoattractant protein), MCP-5, Lymphotactin, IL-1α, and SCF (Stem Cell Factor); 3) from IFN-γ to itself through Lymphotactin, IL-1α, and SCF; 4) a loop between Lymphotactin and MCP-5, and 5) a loop between MCP-3 and RANTES. Interestingly, the first three loops illustrated an IFN-γ attractor self-promotion loop during macrophage activation [60]. STRING database showed MCP-3 and RANTES, MCP-5 and Lymphotactin are interaction pairs. Figure 5 showed the complete Boolean network generated with our temporal profiles. The 10 output nodes (from the leftmost node IL-7 to the rightmost node VEGF-A at the bottom of Fig. 5) do not affect the regulations of the network and their states can be determined by their parent nodes. In addition, the signal flows from FGF-9 to IL-11 to MIP-1β and from SCF to IL-3 to MIP-1β form single direction forward links, therefore, given the status of input node FGF-9, the

Validation of LPS induced M1 activation network model
status of MIP-1β can be determined together with SCF. Similarly, signal flow from IL-4 to IL-10 to KC-GRO forms single directional links. The input node IL-4 and Lymphotactin expression levels determine IL-10 and KC-GRO expression levels, suggesting Lymphotactin should be a major in evolution and IL-10 and KC-GRO are dependent variables in the simulations and can be removed. Therefore, removing output and single direction transduction nodes led to a simplified network with only 8 proteins for two sub-networks: MCP-3 -RANTES loop for sub-network 1 and IFN-γ, MCP-1, MCP-5, Lymphotactin, IL-1α, and stem cell factor (SCF) for a core 6-protein (in diamond shapes) network as shown in Fig. 5. States of other nodes in the M1 activation network can be determined by the states of these 8 proteins and the input nodes. The state transition map for the M1 activation core network was shown in Fig. 6 64 and these states were illustrated in red boxes in Fig. 6. The binary values of these 5 transition states were shown in Table 4. The state numbers are illustrated in two's complement representation. Temporal transitions for the 6 proteins with binary states and sampled data were shown in Fig. 7.
It's worth to mention, not all 4 experiments for M1 activation converged at the same time. One replication of this experiment reached the state e 58 64 while the other 3 ended at the state e 62 64 . The coincidence of state transition from experimental results and computational results demonstrated the effectiveness of the core network model of M1 activation. Fig. 6 The evolution map of the core network for M1 activation showed all possible pathways predicted by the M1 Boolean network. The evolution of 64 (64 = 2 6 ) states for 6 nodes in the core network were illustrated from different initial conditions. The states in red boxes represent the evolution path based on the temporal cytokine profiles in this study

M2 activation networks
Boolean networks for IL-4 induced M2a and IL-10 induced M2c activations have been established and shown in Fig. 8 and Fig. 9, respectively. M2 activations were more complicated than M1 activation due to variant stimuli and M2 subtypes [11,14]. These were represented by the increased network density and number of links in the M2c activation network compared against the M1 activation network as shown in Table 2.
Though M2a and M2c networks shared common responses, different subtypes also illustrated specific responses [3]. In the M2a and M2c activation networks, there were 23 common nodes as shown in Tables 5 and 6. The only two different proteins between M2a and M2c are TIMP-1 and IL-11. TIMP-1 was included in both M2a and M2c networks from our linear programming results.
However, TIMP-1 was an orphan node in the M2c network and not shown in Table 6 and Fig. 9. Previous results have shown that TIMP-1 is consistently produced by macrophages and further induced by LPS (M1) and IL-10 (M2c) in fully differentiated macrophages [128,129]. Additionally, TIMP-1 was down-regulated by IL-4 in M2a [130]. Since TIMP-1 functions by forming one-to-one complexes with target metalloproteinases and there is no measurement on metalloproteinases in this study, TIMP-1 is classified as an orphan node. IL-11 is included in the M2c but not the M2a network in this study. IL-11 is a member of the IL-6-type cytokine family and may provide a link between inflammation and cancer [131,132]. The role of IL-11 in macrophage activation has not been well studied yet.
M2a network has one input, IL-1α, which remains low consistently in the binarized profiles [71]. The M2c network has no input node. The shared output nodes for M2a and M2c include GM-CSF, IL-3, IL-6, IL-7, IL-12p70, IL-17, TNF-α, and OSM. IL-6 is an inflammatory cytokine and has demonstrated a consistently low profile in both M2 activation models [133,134]. The low expression profile of IL-6 also indicates low expression profiles of Lymphotactin and IL-1a. TNF-α promotes the activation of inflammatory M1 cells and has a decreased value in M2 cells, which was also demonstrated by our M2 activation model [135,136]. In addition, IL-6 and TNF-α, outputs of M2 macrophage, can reduce  inflammation and kill the pathogen [134]. IL-12p70 is another common output that is elevated in M1 activation and remains as a logic low in both M2 models [137]. It has been reported that IL-17 induced the production of pro-inflammatory cytokines by human macrophage, but the quantitative analysis is still needed [138].

Validation of IL-4 induced M2a and IL-10 induced M2c activation models
Parent and child nodes, Boolean functions, attribution of each node, interaction strengths from linear programming algorithm, and published results confirming the links in M2a and M2c networks were also shown in Tables 5 and 6, respectively. The majority of the links can be verified based on the KEGG pathway or protein interaction database. The established M2a Boolean network includes a link between MIP-2 (Chemokine (C-X-C) motif ligand 2-CXCL-2) and FGF-9 which cannot be validated by STRING or KEGG database. The regulation between MIP-2 and FGF-9 is the most complicated Boolean function in all three activation networks. However, it has been reported that FGF-9 enhanced M2 differentiation post-myocardial infarction accompanying with significantly elevated IL-10 secretion [139]. MIP-2 as an inflammatory protein is typically secreted by M1 macrophages. Another un-validated link in the M2a network is inhibition of IL-7 by MCP-5. Since MCP-5 is typically expressed in M1 and IL-7 in M2 [10,11,13,136,140], we consider these as meaningful Boolean regulations and keep them in our network model. In addition, all these unvalidated links are links to output nodes in the network, which will not affect the core regulation. Once the expression profiles in the experiments agree with the Boolean function, the links are kept in the established M2a activation network.
The only un-validated link in the M2c network is MIP-1β to FGF-9. Again, the experimental results illustrated the effectiveness of the Boolean regulation and the link was kept in the network model. FGF-9 demonstrated elevated expression levels in three macrophage activation networks, while LPS stimulated macrophages produced much more FGF-9 than M2a and M2c [22].
M2 activation models were further validated by temporal transitions of their Boolean network models. M2a and M2c activation networks were simplified by  Figure S2).
Simplified M2c network contains a 5-protein core network including IL-11, IL-1α, SCF, MCP-5, and IFN-γ as shown in Fig. 9. This core network has MCP-5 and SCF self-inhibition loop, IL-1α, and IL-11 interaction loop, IL-1a and IFN-γ interaction loop, and feedback loops among IL-1α, IL-11, SCF, IFN-γ. The link between IFN-γ and IL-1α was obtained by removing MCP-3 since MCP-3 is just a signal transduction state in this branch. Separated core networks for M1, M2a, and M2c activation were also available in (Additional file 1: Figure S1, S2, and S3). All these interaction loops were confirmed by previous studies or the KEGG pathway as shown in Table 6.
Evolutionary maps of M2a and M2c were shown in Fig. 10 and Fig. 11, respectively. The evolution of M2a has a state transition map that contains 128 = 2 7 states while M2c has a state transition map for 32 = 2 5 states. The binarized M2a temporal profiles in our dataset showed the transition from e 3 128 →e 4 128 →e 20 128 →e 84 128 →e 120 128 and these states were illustrated in red boxes in Fig. 10.
The binary values of these 7 states were shown in Table 7. M2c network has binarized transitions from e 1 32 →e 5 32 →e 13 32 →e 10 32 →e 4 32 →e 24 32 →e 32 32 →e 28 32 illustrated in red boxes in Fig. 11. The binary values of these 5 states were shown in Table 8. The computational binary state transition was compared with temporal experimental profiles and converged to experimental results.

Discussion
The most significant contribution of this study includes the following facts. We established and validated novel Boolean networks for 3 macrophage activation patterns using in vitro temporal cytokine profiles. Our results confirmed 1 stable M1 macrophage activation state and 2 stable M2 activation states, which agreed with reported in vitro classification on macrophage activation patterns. Validation of the models was conducted based on the coincidence of predicted cytokine expression levels, public databases, and reported experimental results. The literature review confirmed the majority of links in the Boolean networks for M1, M2a, and M2c network models as shown in Tables 3, 5, and 6, respectively.  9 The network for M2c activation includes 24 nodes with each node representing a protein (blue block), a black line with an arrow for a positive connection from a parent node to a child node, and logic negation being a red dashed connection with a 'T' arrow. The core network is shown as the diamond nodes in the network More importantly, our results demonstrated possible key factors IFN-γ, IL-1α, Lymphotactin, MCP-1, MCP-5, and SCF for M1 activation, FGF-9, MCP-1, MCP-5, MIP-1β, MIP-2, SCF, and VEGF-A for M2a activation IFN-γ, IL-1α, IL-11, MCP-5, and SCF for M2c activation. Further, expression profiles of these cytokines can serve as signatures for macrophage activation patterns since other protein expressions can be determined by proteins in the core network and input proteins of the network. These core signatures provide a possible mixed combination for macrophage activation markers instead of using only static individual markers. Additionally, cytokine markers might be more useful for in vivo study for easy and less costly measurements.
Robustness of Boolean networks is an important research aspect while only a few results have been reported [141][142][143]. These results have shown a large basin for the robustness and stability of a Boolean network. In this study, the simulated biggest core network for M1 macrophage activation has two stable states: e 7 64 and e 58 64 as shown in Fig. 6. The state e 7 64 represents a state when MCP-5, Lymphotactin, IL-1α, and MCP-1 have elevated expression levels, SCF and IFN-γ have decreased expression levels. This is a single state attractor meaning the Boolean network starting from this initial state will stay here forever. All of the other 63 states of the network will evolve to the state of e 58 64 , which is a controversial state of e 7 64 (SCF and IFN-γ are elevated and other 4 proteins are down-regulated). Similarly, the core network for M2a has two stable states e 88 128 , and e 1 128 (Fig. 10). In the state of e 1 128 ; all 7 proteins have elevated expression levels. The core network of M2c activation has only one stable state e 28 32 as shown in Fig. 11. All these have demonstrated the robustness of the established Boolean networks.
The STRING PPI database integrates several PPI databases and is the most widely adopted database for protein-protein interactions. We only use the STRING PPI database to confirm possible interactions among proteins in the established Boolean networks. Since the STRING PPI database does not provide directional information for protein-protein interactions, the directional information of our Boolean networks is generated by the linear programming algorithm computationally. The confirmation of directions in the network can be indirectly confirmed by 1) their positions in up-or downstream of pathways since cytokines can be either stimuli or production of a pathway; and 2) the temporal sequence of significantly elevated or inhibited expressions in biological experiments.

Conclusion
The proposed method can also be applied to large scale datasets to establish regulatory Boolean networks for other biological processes. Due to the limited sampling time points in most of the deposited experimental datasets, Boolean networks might be much more promising than traditional differential equation models, which require determination and estimation on a huge amount of parameters. In addition, with measurements at limited time points, deep learning might not be a good approach to attack specific biological processes at the current stage. It's worth to mention, this method can be applied to a small set of key factors extracted from big data, and the following computational prediction and validation of the small networks are much easier than the big network. Currently, our software package (Matlab) can easily handle the evolution of a Boolean network with 11 nodes, leading to a total of 2 11 transition states using a normal workstation. For larger Boolean networks with more than 11 nodes, highperformance computing is more suitable.
We did notice some limitations of the proposed modeling method due to noisy/outlier functions. Outliers can still cause the linear programming to gather an extra connection, drop a connection or give a wrong sign to a connection. Additionally, a time delay can also cause problems in determining Boolean regulations depending on the amount of delay. For example, if protein A promotes protein B in a motif, up-regulation of protein A should give an instant high expression of protein B in the mathematical model. However, time delays and variations of temporal responses are very common in biological responses. One of our modeling assumptions is the Boolean network is fixed without changing nodes and changing Boolean functions in the response time period. Accordingly, Boolean functions are determined with respect to the whole experimental time period, not the single time point In this study. Therefore, small time-delays or variations of temporal response may affect the Boolean regulation at a short period of time, but won't affect the Boolean regulation in the whole Fig. 10 The evolution map of the core network for M2a activation showed all pathways predicted by the M2a activation Boolean network. The evolution of 128 (128 = 2 7 ) states for 7 nodes in the core network were illustrated from different initial conditions. The states in red boxes represented the evolution path based on the temporal cytokine profiles in this study time period. However, if experimental results have a long delay on the up-regulation of protein B, incorrect Boolean functions may be selected. Finally, the model will be limited to the proteins that were measured in the experiment. Any other proteins not sampled were left out. Further studies will be conducted to integrate crossplatform data for a more complete macrophage activation model in the future.
Three Boolean network models were established to elucidate the regulation of M1, M2a, and M2c macrophage activations. Prediction of the Boolean network models agree with experimental results and validated the established models. Signatures of core cytokine profiles were determined, which provided a possible examination for macrophage activations based only on the cell productions.
All proteins in the established networks are cytokines, which are small, nonstructural proteins, including interleukins, chemokines, interferons, and tumor necrosis factors [144]. A total of 34 cytokines have been reported in the literature and are secreted by different cell types including macrophages, T-cells, neutrophils, and fibroblasts. In the manuscript by Melton's group, 24 proteins were reported. IL-1α, IL-6, and TNF-α were measured in their experiments and included in their dataset, but not reported because the authors thought these three cytokines were related to M2b activation [22]. All 27 cytokines secreted by macrophages were used to establish the Boolean network model in this study. Due to the limited temporal measurements, a curve fitting algorithm was applied to enrich the dataset and this dataset served as inputs for a linear programming algorithm to find the most significant protein (node) and connections (links) for network structure. The same enriched data was also analyzed by an iterative K-means method to find the threshold for binarization. Thus, a network could be established where each node represents a protein and each link between the nodes represents an association of two proteins. Meanwhile, the expression level of each node was defined as a Boolean variable '0' or '1'. A Karnaugh map was used to determine the Boolean regulatory functions among these nodes/variables to create a Boolean network model. Verification of the regulations was conducted based on published interactions from PPI databases and literature searches. The predicted evolution of each Boolean network model for a macrophage activation pattern was further validated with public databases and published results.

Curve fitting
The smooth spine algorithm in Matlab was applied to approximate continuous secretion profiles for 27 proteins. The algorithm was chosen due to its ability to construct an accurate curve with noisy data. By minimizing the squared error between the constructed data and experimental data at the sampling time points, optimized fitting curvature was obtained using the following equation [145]: where p denotes the smoothing parameter controlling the level between the smoothness of the function and the error between the experimental data x i and the value of the fitting function s(t i ) at chosen time points t i . Parameter a i represents the weight of the error, which is set to be the default value in MATLAB. Once the continuous fitting function s(t i ) was gathered, we divided the 24-h time span into 50-time points with 40 being evenly sampled from the initial reading to the 9th hour and the last 10 samples coming from the 9th hour to the final 24. This was set because the majority of regulations occurred within the first few hours [19][20][21][22]. No negative value is allowed in the fitting algorithm to match with real biological situations.

Binarization
In a Boolean network model, protein expression is either up-regulated or down-regulated, therefore, binarization of the fitting function is needed. Any protein with similar expression levels during the 24 h time period is considered non-significant and will not be modeled. The iterative K-means method was used to cluster the expression levels for each protein and determine an activation threshold for each stimulus and experiment [146,147]. The objective function for iterative K-means method using the absolute difference from the point to the center of the cluster was defined as: Where k is the cluster number, C j is the centroid of the j th cluster, i represents each sample, and n is the total number of samples (50 in this study). We started with 8 clusters for iterative K-Means method and reduced to 2 clusters. The average of the mean distance from each cluster was used as the activation threshold for a specific protein. Any expression level below the activation threshold is defined as "0" and above the threshold is defined as "1" for binarization.
The threshold could be a value of 0 if there was no reading at any of the time points, or no significant change in protein expression levels. However, if a threshold of 0 will be considered in our modeling procedure, any non-zero low expression measurement will be considered active. To avoid such a conflict, a threshold of value "0" was changed to a low value claimed by the detection range of the biological experiment, such as 1 × 10 − 6 in this study.

Linear programming algorithm
A network with connections among 27 proteins and each interaction strength was determined by linear programming method with an R package [148,149]. In the network model, each protein and interaction among proteins is represented as a node with a link. Each node is assigned with an activity level x i ϵ R + which is the expression level of each protein. Each link is assigned a weight, w j, i , form parent node j to the child node i. The weights w j, i ϵ R with w j, i > 0 representing promotion and w j,I < 0 for inhibition. Also, each protein has an activation threshold, δ i, determined by the Binarization method above. The activity level of each node x i can be determined by its initial status w 0 i , the activity level of a parent node x j and contribution weight w j, i from the parent node as Considering biological networks are sparse, we minimize the sum of the absolute link weights X i; j j w j;i j, the bias term P w 0 i , and noise effects denoted by ζ l . A random variable ζ l ϵ R 0 +, l = 1., …, L, is introduced to accounts for data variations in the experimental data. L represents the set of inactive proteins in the experimental data and coincides with the number of constraints that might violate the linear programming. The objective function was defined as: subject to the constraints of ∀ði; mÞ where x i;m ≥ δ i : ∀ði; mÞ where x i;m < δ i : where the constraints are defined for each experiment performed, mϵ 1,2,3,4, representing the 4 replicates in each experimental group, and for each time point from the enriched fitting data. Constraint (5) selected proteins up-regulating specific protein while constraint (6) looks for proteins down-regulating the protein. Effects of the data variation are minimized by the term 1 λ X l ζ l in the objective function (4).
In the objective function, the production term, |w j, i |x j shows the contribution to the child node expression x i level from the parent node x j .Two parameters γ and λ are non-negative weights of the bias term and slack variables representing the penalties on the initial values and data variations in the objective function. For an ideal parameter, λ, 5-fold cross-validation was used with the smallest mean square error limiting the value of λ a maximum set of Lσ 2 (x j,m ), where σ is the variance of a variable. Parameter Υ in the objective function (4) represents the weights on the bias w 0 i . A bigger value of Υ (Υ = 100) in this study will put more penalty on the link of the proteins rather than bias. Minimizing this term helps to find the most significant connection to the child node, x i.
We further modified the linear programming algorithm to simplify the network structure. For each child node x i , a weighted contribution |w j, i |x j from a parent node (x j ) is calculated. The maximum value of all weighted contribution from a parent node can be determined. For any node, if a value of |w j, i |x j is larger than 70% of the maximum value related to the node, the link from node j to node i is kept; In addition, orphan nodes and a node without significant changes in expression levels were not included in our model. The final output of the modified linear programming is a set of nodes and links with interaction strength w j, i . Positive w j, i means promotion while negative w j, i means inhibition.

Establish the Boolean logic function
Theoretically, to determine a Boolean regulation with 3 nodes, 8 (2 3 ) states representing the status of the 3 nodes are needed. It means at least 8-time points are needed and each time point represents a unique status of the network. Since we only have data at 7-time points, curving fitting with respect to time and resample the fitted data at different time points is reasonable to enrich the dataset. The 50 samples generated from the fitted data allows us to determine a Boolean function for up to 5 nodes ideally since 2 5 = 32 < 50. Considering the same status of the system may occur at different time points and most biological sub-networks have dual or triplet nodes, a maximum 5-node sub-network motif is expected with 50-time points in this study. It's worth to mention, that the number of time points to be selected depends on how fast the dynamics evolve and the complexity of the network, is not the focus of this study and won't be discussed in detail.
Based on assigning expression levels '0' or '1' to each node, a truth table of a motif including parents and child nodes can be established. Boolean regulation between parent and child node was determined by the Karnaugh map.
If there is a mismatch of promotion or inhibition between Boolean relation obtained from Karnaugh Map and linear programming or experimental profiles, then an extended 200 samples were re-examined (50 from each replicate and 4 replicates in each group) to determine the Boolean function instead of using the average. This can occur when one set of data has a large variation from the other 3 replicates, causing a skew in the average or when a time delay affects promotion and inhibition responses. The 200-sample-set was also used to determine if self-regulation could have been an option, which was mainly done on the results that always remained logic high, occurring once in the M1 cell and validated with experimental results.

Mathematical modeling and validation
The linear programming approach determined the nodes and possible connections in a network. Boolean regulations among nodes were determined by the Karnaugh map and represented by Boolean functions. With each node in the network as a Boolean variable X, the evolution of the status of a network can be written as: where k represents current time point and k +1 for next time point, X (k) and X (k + 1) represent current and future states of the network, M is the state transition matrix determined by Boolean functions and ⋉ represents the semi-tensor product [150]. We and other research groups have established computational tools for Boolean network models that are available for free download [150][151][152]. .An example of determining a 3node network is shown in Fig. 12. The 8 states, state transition matrix, and evolution of the 8 states for the 3node network was computed with our own software package [151]. All codes associated with macrophage polarization can be found in our GitHub link (https:// github.com/RicardoRamirez2020/Macrophage_Boolean_ Network_LP).

Establishment of a core network
Dynamics of Boolean networks become more complicated with increasing numbers of nodes and Boolean functions, resulting in difficulties in understanding and analysis of the network. Given a network with N nodes, there exist 2 N states of the network and the state transition matrix will be 2 N by 2 N dimension, indicating a huge computational cost. Simplification of a Boolean network for easy analysis has been conducted to reduce both the number of nodes and functions while still keeping the capability of predicting all states of the network. It's worth to mention, output nodes of a network only have incoming signals and do not affect the regulations of the network. In addition, if multiple nodes are connected by single direction links, all intermediate nodes only transduce the signal with promotion (keep the signal) or inhibition (Not operation). These nodes with single direction connections can be simplified by keeping the starting and ending nodes and the corresponding Boolean function can be derived with basic logic operations. Therefore, by removing the output and intermediate transduction nodes, a core network can be established to represent a Boolean network with much simpler Boolean functions and fewer states for simulation. Meanwhile, with the determined states of a core network, states of removed nodes can be reconstructed. Fig. 12 The Boolean functions and evolutionary map for a 3-node network were illustrated