### Definitions

A Boolean model *B* consists of *n* genes *x*
_{1}, …, *x*
_{
n
} and *n* update functions *f*
_{1}, …, *f*
_{
n
} : {0, 1}^{n} → {0, 1}, with each *f*
_{
i
} being associated with gene *x*
_{
i
} (Fig. 1a). Each gene *x*
_{
i
} corresponds to a binary variable representing the expression value of the gene, i.e. *x* ∈ {0, 1}. Gene *x*
_{
i
} is a target gene when it acts as a response variable and an input gene when it acts as a predictor variable. Each update function *f*
_{
i
} can be evaluated to give a value to a target gene *x*
_{
i
}, and is expressed in terms of Boolean logic by specifying the relationships among a subset of the input genes *x*
_{1}, …, *x*
_{
n
} using Boolean operators AND (∧), OR (∨) and NOT (¬). An update function *f*
_{
i
} consists of an activation clause and an inhibition clause in the form of:

$$ \left(\mathit{\mathsf{activation}}\kern0.5em \mathit{\mathsf{clause}}\right) \wedge \neg \kern0.5em \left(\mathit{\mathsf{inhibition}}\kern0.5em \mathit{\mathsf{clause}}\right) $$

Each clause is individually expressed in disjunctive normal form, (*u*
_{1}) ∨ (*u*
_{2}) ∨ (*u*
_{3}) ∨ … ∨ (*u*
_{
n
}), where *u* represents a slot which can either take in a single input gene *x*
_{
i
} or a conjunction of two input genes *x*
_{
i
} ∧ *x*
_{
i + 1}. An example update function *f*
_{1}(*s*
_{
t
}) for a target gene *x*
_{1} with an input state *s*
_{
t
} is given below:

$$ {\mathit{\mathsf{x}}}_{\mathsf{1}} = {\mathit{\mathsf{f}}}_{\mathsf{1}}\left({\mathit{\mathsf{s}}}_{\mathit{\mathsf{t}}}\right)=\left(\left({\mathit{\mathsf{x}}}_{\mathsf{3}}\wedge {\mathit{\mathsf{x}}}_{\mathsf{4}}\right)\right)\wedge \neg \kern0.5em \left(\left({\mathit{\mathsf{x}}}_{\mathsf{5}}\right)\vee \left({\mathit{\mathsf{x}}}_{\mathsf{2}}\wedge {\mathit{\mathsf{x}}}_{\mathsf{9}}\right)\right) $$

A few constraints are imposed on the update functions during model learning in BTR. Firstly, the update function allows a conjunction of up to two input genes in each slot *u*. Secondly, each input gene *x*
_{
i
} can only be present in a single update function once, but the same input gene *x*
_{
i
} can be present in multiple update functions. Thirdly, a user is able to specify a soft limit on the number of input genes (i.e. in-degree) allowed per update function, where the default in BTR is 6 in-degree per gene. Lastly, by default no self-loop is allowed in BTR.

A model state given by a Boolean model *B* is represented by a Boolean vector *s*
_{
t
} = {*x*
_{1t
}, …, *x*
_{
nt
}} at simulation step *t*. A model state space *S* represents the set of all model states *s*
_{
t
} reachable from an initial model state *s*
_{1}, i.e. *S* = {*s*
_{1}, …, *s*
_{
t
}}. *S* can be obtained by simulating the model *B* starting from an initial model state *s*
_{1} using the asynchronous update scheme. The asynchronous update scheme specifies that at most one gene is updated between two consecutive states (Fig. 1b). Assuming we have a model state *s*
_{t} which is not a steady state, there will be *i* (*i* ≥ 1) genes in *s*
_{
t
} such that *x*
_{
it
} ≠ *f*
_{
i
}(*s*
_{
t
}). Therefore at simulation step *t* + 1, *s*
_{
t + 1} would have *i* possible configurations *s*
^{i}_{
t + 1}
, where *s*
^{i}_{
t + 1}
= {*x*
_{1t
}, …, *f*
_{
i
}(*s*
_{
t
}), …, *x*
_{
nt
}}. This simulation is repeated until it reaches a steady state. By definition, steady states are a set of states whose destination states also belong to the same set. That is, a steady state may be a single model state *s*
_{
t
}, or it may consist of a cyclic sequence of model states *s*
_{
t
}, …, *s*
_{
t + j
}.

The single-cell expression data used in this study are each a matrix consisting of *n* individual genes in the columns and *k* individual cells in the rows. The expression data are normalised and standardised to give *y*
_{
kn
} ∈ [0, 1]. A data state *v*
_{
k
} = {*y*
_{1}, …, *y*
_{
n
}} represents the expression state of cell *k* for *n* genes that are observed in the cell. A data state space *V* = {*v*
_{1}, …, *v*
_{
k
}} represents the set of all data states that are observed in an experiment.

### BTR model learning

The aim of BTR is to identify a Boolean model *B* with *x*
_{
n
} genes and *f*
_{
n
} update functions, that can produce a model state space which closely resembles an independent single-cell expression data (i.e. data state space). Note that model state space and data state space are defined in a similar way, the only difference being that the *n* genes take continuous values in [0, 1] within a data state, while the *n* genes take binary values 0 and 1 in a model state. The distance between model and data state spaces is measured by the pairwise distance between pairs of model and data states, as stated in the scoring function (See below). By iteratively modifying an initial Boolean model *B*
_{1}, the distance between the model and data state spaces can be minimised until a resulting final Boolean model *B*
_{
f
} with less distance is obtained.

BTR performs model learning by utilising techniques in discrete optimisation framework. In any optimisation problem, there are two important components, namely a scoring function and a search strategy.

#### BSS Scoring function in BTR

The scoring function used in BTR is a novel scoring function we developed, termed as Boolean state space (BSS) scoring function. BSS scoring function *g*(*S*, *V*) is a distance function, which consists of a base distance variable and two penalty variables. *g*(*S*, *V*) is given by:

$$ \mathit{\mathsf{g}}\left(\mathit{\mathsf{S}},\ \mathit{\mathsf{V}}\right)=\mathit{\mathsf{h}}\left(\mathit{\mathsf{S}},\ \mathit{\mathsf{V}}\right) + {\lambda}_{\mathsf{1}}{\varepsilon}_{\mathsf{1}} + {\lambda}_{\mathsf{2}}{\varepsilon}_{\mathsf{2}} $$

Where *h*(*S*, *V*) = base distance, *ε* = penalty variable, *λ* = constant for penalty variable.

The base distance *h*(*S*, *V*) is given by the following equation. To prevent multiple model states from matching to a single data state, one-to-one matching between model and data states is enforced if the number of data states, *N*
_{
v
}, are more than or equal to the number of model states, *N*
_{
s
}, i.e. *N*
_{
v
} ≥ *N*
_{
s
}. For cases where *N*
_{
v
} < *N*
_{
s
}, one-to-one matching between model and data states is enforced greedily up until the point where every data states have been assigned a matching model state, then non-unique matching will occur for the remaining model states with respect to each corresponding data state with the minimum distance.

$$ \mathit{\mathsf{h}}\left(\mathit{\mathsf{S}},\ \mathit{\mathsf{V}}\right) = \frac{{\displaystyle {\sum}_{\mathit{\mathsf{t}}=\mathsf{1}}^{{\mathit{\mathsf{N}}}_{\mathit{\mathsf{s}}}}}\mathit{\mathsf{m}}\mathit{\mathsf{i}}{\mathit{\mathsf{n}}}_{\mathit{\mathsf{k}}=\mathsf{1}}^{\mathit{\mathsf{N}}\mathit{\mathsf{v}}}\left(\mathit{\mathsf{d}}\left({\mathit{\mathsf{s}}}_{\mathit{\mathsf{t}}},\ {\mathit{\mathsf{v}}}_{\mathit{\mathsf{k}}}\right)\right)}{{\mathit{\mathsf{N}}}_{\mathit{\mathsf{s}}}\ \mathit{\mathsf{n}}} $$

Where \( \mathit{\mathsf{d}}\left({\mathit{\mathsf{s}}}_{\mathit{\mathsf{t}}},\ {\mathit{\mathsf{v}}}_{\mathit{\mathsf{k}}}\right) \) = pairwise distance between each model state *s*
_{
t
} and data state *v*
_{
k
} (0 ≤ *d*(*s*
_{
t
}, *v*
_{
k
}) ≤ 1), *N*
_{
s
} = number of model states, *N*
_{
v
} = number of data states, *n* = number of genes.

The distance between model state *s*
_{
t
} and data state *v*
_{
k
}, *d*(*s*
_{
t
}, *v*
_{
k
}), is defined as the sum of the absolute differences between values of each gene *i* in model state *s*
_{
t
} and data state *v*
_{
k
}.

$$ \mathit{\mathsf{d}}\left({\mathit{\mathsf{s}}}_{\mathit{\mathsf{t}}},\ {\mathit{\mathsf{v}}}_{\mathit{\mathsf{k}}}\right) = {\displaystyle \sum_{\mathit{\mathsf{i}}=\mathsf{1}}^{\mathit{\mathsf{n}}}}\left|\ {\mathit{\mathsf{x}}}_{\mathit{\mathsf{t}}\mathit{\mathsf{i}}} - {\mathit{\mathsf{y}}}_{\mathit{\mathsf{k}}\mathit{\mathsf{i}}}\right| $$

Where *x*
_{
ti
} ∈ {0, 1} is the value of gene *i* in model state *s*
_{
t
} and *y*
_{
ki
} ∈ [0, 1] is the value of gene *i* in data state *v*
_{
k
}.

The two penalty variables, *ε*
_{1} and *ε*
_{2}, in *g*(*S*, *V*) are used to prevent underfitting and overfitting. *ε*
_{1} penalises depending on the proportions of 0 s, *p*
_{0}, and 1 s, *p*
_{1}, across all genes and all states in a model state space. The concept of *ε*
_{1} is that it penalises complexity in Boolean models by their simulated model state spaces. We have shown that as a Boolean model becomes more complex (i.e. increase in the number of edges), both *p*
_{0} and *p*
_{1} of its model state space will become closer to 0.5 (See Additional file 5: Figure S4), therefore making *ε*
_{1} a good penalty for model complexity.

$$ {\varepsilon}_{\mathsf{1}} = {\mathit{\mathsf{e}}}^{-\mathit{\mathsf{a}}},\ \mathit{\mathsf{where}}\kern0.5em \mathit{\mathsf{a}}={\displaystyle \sum_{\mathit{\mathsf{i}}\in \left\{\mathsf{0},\ \mathsf{1}\right\}}}\frac{{\left({\mathit{\mathsf{p}}}_{\mathit{\mathsf{i}}}-\mathsf{0.5}\right)}^{\mathsf{2}}}{\mathsf{0}.\mathsf{5}} $$

*ε*
_{2} penalises based on the number of input genes present in each of the update function *f*
_{
i
} in a Boolean model *B*, given a specified threshold *z*
_{
max
}.

$$ {\varepsilon}_{\mathsf{2}} = {\displaystyle \sum_{\mathit{\mathsf{i}}=\mathsf{1}}^{\mathit{\mathsf{n}}}}{\mathit{\mathsf{w}}}_{\mathit{\mathsf{i}}} $$

Where *w*
_{
i
} the penalty for each update function *f*
_{
i
} is given by:

$$ {\mathit{\mathsf{w}}}_{\mathit{\mathsf{i}}} = \left\{\begin{array}{ll}\ \frac{{\mathit{\mathsf{z}}}_{\mathit{\mathsf{i}}}-{\mathit{\mathsf{z}}}_{\mathit{\mathsf{max}}}}{\mathit{\mathsf{n}}}, \hfill & \mathit{\mathsf{i}\mathsf{f}}\ {\mathit{\mathsf{z}}}_{\mathit{\mathsf{i}}}>{\mathit{\mathsf{z}}}_{\mathit{\mathsf{max}}}\hfill \\ {}\mathsf{0}, \hfill & \mathit{\mathsf{i}\mathsf{f}}\ {\mathit{\mathsf{z}}}_{\mathit{\mathsf{i}}}\le {\mathit{\mathsf{z}}}_{\mathit{\mathsf{max}}}\hfill \end{array}\right. $$

Where *z*
_{
i
} = the number of input genes in update function *f*
_{
i
}, *z*
_{
max
} = the maximum number of input genes allowed per update function. The default *z*
_{
max
} in BTR is 6, which means that each target gene is encouraged to have not more than 6 input genes.

#### Search strategy in BTR

A good search strategy is required in optimisation to locate the optimal solutions within a high dimensional and complex solution space. The search strategy in BTR is a form of swarming hill climbing strategy, in which multiple optimal solutions are kept at each search step and the search only ends when the score converges for all of the optimal solutions (Fig. 9). In BTR search algorithm, the search starts from an initial Boolean model, and iteratively explores the neighbourhood of the current Boolean model in the solution space by minimal modification. When no initial model is given to BTR, it will generate a random initial model whose degree distribution satisfies a power-law distribution with a degree exponent *γ* = 3.

The minimal modification of a Boolean model is performed by adding or removing a gene from a single update function in the Boolean model. The resulting modified model is then evaluated by the BSS scoring function. By repeating this procedure, BTR is able to explore the solution space and eventually arrives at a more optimal Boolean model. Due to the nature of Boolean models that multiple possible Boolean models can give rise to the exact same simulated state space, BTR usually retains a list of equally optimal Boolean models at the end of the search process. In such cases, a consensus model, whose edges are weighted according to the frequencies of their presence in the list of optimal Boolean models, will be generated. Due to the design of the search strategy, it is more geared towards a local search rather than a global search. Therefore in line with the results shown in Fig. 5, BTR is best used for iteratively improving a gene network with known biological knowledge using an independent set of single-cell expression data.

#### BTR data processing

BTR is capable of handling all types of expression data, including qPCR and RNA-Seq. Expression data should be processed and normalised before being used in BTR. In BTR, the expression data is further processed in order to facilitate score calculation by the BSS scoring function. Firstly, if the input data is qPCR expression data, it should be inversed such that the gene with a low expression level should have a low value and vice versa. Finally, the expression values for each gene in the data are scaled to continuous values with a range of 0 ≤ *x* ≤ 1.

### Calculation of F-score

F-score, which is the harmonic average of precision and recall, represents precision and recall concisely [35], is often used to assess the performance of network inference algorithms. Precision denotes the proportion of edges that are truly present among all edges classified as present, while recall denotes the proportion of edges that are truly present among all correctly classified edges (including both edges that are present and absent) [42]. The calculations were performed on directed adjacency matrix.

Precision is defined as:

$$ \mathit{\mathsf{p}}=\frac{\mathit{\mathsf{T}}\mathit{\mathsf{P}}}{\mathit{\mathsf{T}}\mathit{\mathsf{P}}+\mathit{\mathsf{F}}\mathit{\mathsf{P}}} $$

Where *TP* = true positive and *FP* = false positive.

Recall is defined as:

$$ \mathit{\mathsf{r}}=\frac{\mathit{\mathsf{T}}\mathit{\mathsf{P}}}{\mathit{\mathsf{T}}\mathit{\mathsf{P}}+\mathit{\mathsf{F}}\mathit{\mathsf{N}}} $$

Where *TP* = true positive and *FN* = false negative.

F-score is defined as:

$$ \mathit{\mathsf{F}}=\frac{\mathsf{2}\mathit{\mathsf{p}}\mathit{\mathsf{r}}}{\mathit{\mathsf{r}}+\mathit{\mathsf{p}}} $$

### Synthetic data

The synthetic data used for comparing scoring functions and network inference algorithms consist of true networks, expression data and lists of modified networks. The true networks and expression data were generated using GeneNetWeaver version 3.13 [28]. The true networks contain 10 genes each and were extracted from the gene network of yeast. Each true network generated by GeneNetWeaver was then categorised into acyclic and cyclic networks. A total of 5 acyclic and 5 cyclic true networks were used in this study. The expression data were generated using ordinary and stochastic differential equations based on the true networks. A single time series expression data with 1000 observations were generated per true network, and the expression data were simulated under steady state wild type condition. A coefficient of 0.05 was used for noise term in the stochastic differential equations. The synthetic expression data as generated by GeneNetWeaver is used as non zero-inflated data. In addition, the synthetic expression data is converted into a zero-inflated data to simulate drop-outs in single-cell expression data by calculating the probability of a reading being a drop-out (i.e. zero value) based on its expression level. The probability of a reading being a drop-out, *p*
_{
d
}, is modelled using the following equation:

$$ {\mathit{\mathsf{p}}}_{\mathit{\mathsf{d}}} = {\mathsf{2}}^{-\mathit{\mathsf{c}}\mathit{\mathsf{y}}} $$

Where *c* = a constant (in this study, *c* = 6), and *y* = a reading of the expression level of a particular gene,

The lists of modified networks were generated in R using the bnlearn package [43] for Bayesian networks and the BTR package for Boolean models. The modified networks were generated by modifying the number of edges that differ from the true network, ranging from 2 edges up to 40 differing edges. The modified Bayesian networks and the modified Boolean models were generated separately due to different underlying structural constraints imposed by each framework. In Bayesian framework all networks must be directed acyclic graphs, while Boolean models do not have such restrictions. In contrast, Boolean models require explicit specification of activation and inhibition edges, while Bayesian networks handle activation and inhibition implicitly without modifying the edges. Although the generation of modified Bayesian networks and Boolean models were done separately and therefore they are not identical, all modified networks contain the same number of differing edges (2 to 40 edges) with respect to the true network. Note that the differences in edges for acyclic modified networks are not cumulative, due to difficulties in generating a directed acyclic graph with cumulative edge differences. The differences in edges for cyclic modified networks are also not cumulative to maintain consistency with the acyclic modified networks.

For synthetic data, the initial state used for the simulation of Boolean models is the expression values at time *t* = 0.

### Haematopoietic data

Two Boolean models of haematopoiesis were used as initial models for model learning in this study, namely Krumsiek [39] and Bonzanni models [38]. The update functions of both models were converted into functions with an activation clause and an inhibition clause, in which each of the clauses are individually expressed in disjunctive normal form. Note that one of the nodes (EgrNab) in the Krumsiek model comprises of 3 different genes, Egr-1, Egr-2 and Nab-2. The initial states used in the simulation were obtained from both papers respectively.

A single-cell qPCR data and a single-cell RNA-Seq data, both obtained from Wilson et al. [10], were used for model learning. The single-cell qPCR data contain 44 genes from 1626 cells (992 HSCs, 178 LMPPs, 147 CMPs, 185 GMPs and 124 MEPs), while the single-cell RNA-Seq data are collected from 96 HSCs. The expression data are processed and normalised as described in the original paper.

For Bonzanni and Krumsiek models, the initial states used for the simulation Boolean models are obtained from each paper respectively.

### Network inference algorithms and analyses software used

BIC and its associated hill-climbing algorithm are implemented in bnlearn [43]. BestFit [29] is an algorithm for inferring Boolean models under synchronous framework implemented in BoolNet [44]. ARACNE [30] and CLR [31] are inference algorithms for inferring relevance networks based on mutual information. bc3net [32] and GeneNet [33] are inference algorithms based on Bayesian networks, while GENIE3 is a type of tree-based methods [34].

Plots in this study were generated using ggplot2 [45], except network plots that were generated using Cytoscape [46] and heat maps that were generated using gplots [47]. Steady state analysis was performed using genYsis [48], which search for steady states reachable from all possible initial states.