BINDER: computationally inferring a gene regulatory network for Mycobacterium abscessus

Background Although many of the genic features in Mycobacterium abscessus have been fully validated, a comprehensive understanding of the regulatory elements remains lacking. Moreover, there is little understanding of how the organism regulates its transcriptomic profile, enabling cells to survive in hostile environments. Here, to computationally infer the gene regulatory network for Mycobacterium abscessus we propose a novel statistical computational modelling approach: BayesIan gene regulatory Networks inferreD via gene coExpression and compaRative genomics (BINDER). In tandem with derived experimental coexpression data, the property of genomic conservation is exploited to probabilistically infer a gene regulatory network in Mycobacterium abscessus.Inference on regulatory interactions is conducted by combining ‘primary’ and ‘auxiliary’ data strata. The data forming the primary and auxiliary strata are derived from RNA-seq experiments and sequence information in the primary organism Mycobacterium abscessus as well as ChIP-seq data extracted from a related proxy organism Mycobacterium tuberculosis. The primary and auxiliary data are combined in a hierarchical Bayesian framework, informing the apposite bivariate likelihood function and prior distributions respectively. The inferred relationships provide insight to regulon groupings in Mycobacterium abscessus. Results We implement BINDER on data relating to a collection of 167,280 regulator-target pairs resulting in the identification of 54 regulator-target pairs, across 5 transcription factors, for which there is strong probability of regulatory interaction. Conclusions The inferred regulatory interactions provide insight to, and a valuable resource for further studies of, transcriptional control in Mycobacterium abscessus, and in the family of Mycobacteriaceae more generally. Further, the developed BINDER framework has broad applicability, useable in settings where computational inference of a gene regulatory network requires integration of data sources derived from both the primary organism of interest and from related proxy organisms.

and translational levels [4], a comprehensive understanding of regulatory elements is lacking. Without functional identification of the modes of regulation present, a complete understanding of how M. abscessus modulates its transcriptomic tendencies, enabling cells to survive and thrive in hostile environments such as in the presence of antibiotics or in the host sputum, remains out of reach.
Gene regulatory network (GRN) resources are typically split into two categories: generalist resources and specialist resources. The former category provides regulatory information (such as transcription factors, putative and confirmed target genes/operon structures, transcription factor binding sites (TFBS) motifs, upstream location coordinates) for a wide group of organisms. CollecTF [5] is one such resource that hosts a large collection of DNA binding sites for prokaryotic transcription factors. Although CollecTF comprises a small amount of regulatory information pertaining to mycobacteria, it currently does not contain any information on M. abscessus. Indeed most generalist resources tend not to comprise much content on regulatory information directly relevant to M. abscessus.
Specialist resources tend to provide regulatory information for a much narrower subgroup of organisms such as a single species or genus; RegulonDB [6] is one such resource which comprises information regarding transcriptional regulation in Escherichia coli. Most resources of both types provide curation based on techniques such as SELEX-based methods [7] as well as ChIP-seq [8]. Currently, for M. abscessus, there is no such existing specialist resource.
Many approaches have been designed for in silico inference of prokaryotic GRNs. Two popular strategies for regulon mapping include (1) the use of conservation data arising from comparative genomics analyses and (2) expression data in the form of transcriptional abundance comparison. The conservation approach relies on the observation that TFBSs are often conserved between related species. This implies that regulatory resources from a given organism can be leveraged to elucidate on transcriptional control in closely related organisms [9]. Further, if two organisms with a non-distant common ancestor share an orthologous gene that is understood to assist in achieving a certain biological process (such as transcriptional regulation) in one organism, it is likely to perform a similar role in the other organism [10]. Phylogenetic footprinting provides a conservationbased approach for determining conserved noncoding sequences and associated TFBSs; such methods typically involve quantifying the rate of occurrence of noncoding DNA sequences in the upstream regions of orthologs of genes of interest in related species [11,12].
Expression-based approaches tend to model the expression of a target gene candidate as a function of the expression or activation of a regulator gene. The GENIE3 [13] method frames the problem of deriving a regulatory network between p genes as p different regression tree-based ensemble models where the expression pattern of one gene is predicted by the expression pattern of all other genes in the collection. Other authors have noted the observed property that genes sharing a common network have a greater tendency to exhibit strong coexpression [14]. Weighted correlation network analysis (WGCNA) [15] is a software package that implements a suite of correlation-based methods for describing the coexpression patterns among genes across experimental samples designed with a view to uncovering gene networks of several varieties.
The literature on prokaryotic gene regulation is replete with ChIP-seq experiments detailing the specifics of transcriptomic control [16,17]. ChIP-seq provides a means of isolating target DNA sequences and transcription factor bound protein complexes stimulated in response to induced transcription factor production. This process facilitates the ascertaining of relationships between specific transcription factors and target binding site DNA sequences (including their downstream genic and intergenic units). Such data are not presently available for M. abscessus, due to its status as an emerging pathogen [3]. However, similar resources exist to varying degrees of completeness for closely related organisms, such as those in the family of Mycobacteriaceae [18,19]. Many efforts have focussed on the integration of ChIP-seq experimental data with RNA-based expression results to improve GRN inference [20].
In general, the concept of designing hybrid models that integrate existing regulatory information and expression abundance results has been the focus of much research. For example, iRafNet [21] implements a random forest approach to inferring GRNs while incorporating prior regulatory knowledge such that putative regulators used to build individual trees are sampled in accordance with the provided prior information. GRACE [22] integrates biological a priori data as well as heterogeneous data and makes use of Markov random fields to infer regulatory networks in eurkaryotic organisms. The RNEA [23] approach also combines prior knowledge from manual literature curation and experimental data with enrichment analysis to infer relevant subnetworks under experimental conditions. The multi-species cMonkey approach [24] includes gene expression data for multiple related organisms in addition to upstream sequence information and other network knowledge, iteratively building biclusters to detect putative co-regulated gene groupings.
Hierarchical Bayesian frameworks provide a natural choice for heterogenous data integration; Bayesian methods like COGRIM [25] and CRNET [26] have sought to exploit this quality. With a view to inferring GRNs, integrative Bayesian methods have focussed on directly modelling putative target gene expression data as a function of regulator activity in addition to binding strength and sequence information.
Herein, we introduce a novel statistical modelling approach to computationally inferring the GRN for M. abscessus: BayesIan gene regulatory Networks inferreD via gene coExpression and compaRative genomics (BINDER). BINDER is an integrative approach, hybridising coexpression data and comparative genomics profiles to infer prokaryotic regulons. BINDER requires two organisms: an organism of interest, here M. abscessus, and an annotated proxy organism, here Mycobacterium tuberculosis (M. tuberculosis). To computationally infer the GRN for M. abscessus we leverage existing resources: specifically we exploit several RNA-seq libraries elicited from M. abscessus generated across a range of experimental conditions, and the unique availability of a high-quality and comprehensively catalogued ChIP-seq-derived regulatory network in M. tuberculosis [27]. BINDER utilises a primary data stratum and an auxiliary data stratum. Here, the data forming the primary and auxiliary strata are derived from RNA-seq experiments and sequence information from M. abscessus as well as ChIP-seq data extracted from the related M. tuberculosis. BINDER is a Bayesian hierarchical model that appositely models the type and structure of both this primary and auxiliary data to infer the probability of a regulatory interaction between a regulator-target pair. The auxiliary data inform the prior distributions and the posterior distributions are updated by accounting for the primary coexpression data in a novel, apposite bivariate likelihood function. BINDER's Bayesian framework facilitates the borrowing of information across the genome yielding estimates of the probability of regulation between regulator and target candidate genes, as well as quantification of the inherent uncertainty in a probabilistically principled manner.
In what follows, we explore the performance of BINDER under a range of challenging simulated data settings, as well as in two case studies using Bacillus subtilis (B. subtilis) and Escherichia coli (E. coli) as the primary organisms of interest, for which regulatory interactions have been well-established. We present the regulatory interactions inferred on M. abscessus by BINDER, and explore in detail the putative inferred regulon corresponding to the transcriptional regulator zur. We also include an exploration of prior sensitivity concerns and some discussion. The "Methods" section describes the data utilised and details the architecture of the BINDER approach.
The results of this effort provide insight to, and a valuable resource for further studies of, transcriptional control in M. abscessus, and in the family of Mycobacteriaceae more generally. Further, the developed BINDER framework has broad applicability, useable in settings where computational inference of a GRN requires integration of data sources derived from both the primary organism of interest and from a related proxy organism. A software implementation for BINDER is provided by its associated R package, which is freely available from github.com/ptrcksn/BINDER.

Exploring M. abscessus and M. tuberculosis shared orthology
It has been established that there is high retention of gene regulation in prokaryotes between species [28]. Moreover, it has been demonstrated that gene function is also retained across wide phylogenetic distances in prokaryotes [29]. Given the availability of a large number of experimentally validated regulatory networks in M. tuberculosis [27], from the standpoint of inferring a GRN in M. abscessus using conservation phenomena, we quantifed the extent to which genes present in M. tuberculosis are conserved in M. abscessus. To do so, we employ the Ortholuge [64] procedure which facilitates bacterial and archaeal comparative genomic analysis and large-scale ortholog predictions. Through Ortholuge, we categorise orthologs as belonging to one of five tiers, ranging from more reliable to less reliable: supportingspecies-divergence (SSD), borderline supporting-speciesdivergence (borderline SSD), reciprocal best blast (RBB), similar non-supporting-species-divergence (similar non-SSD) and non-supporting-species-divergence (non-SSD). We found 1343 SSD putative orthologs, 116 borderline SSD putative orthologs, 845 genes that satisfied the RBB criteria but did not undergo any further analysis, 6 similar non-SSD putative orthologs and 85 non-SSD putative orthologs. In total, we found 2395 predicted orthologs of all qualities, equating to ≈ 48% of all annotated genes in M. abscessus.
In terms of regulatory interactions, for 34 orthologous regulators of interest and where possible, we performed a one-to-one mapping of all validated regulatory interactions in M. tuberculosis to their corresponding orthologs in M. abscessus. We found a mean regulon size in M. tuberculosis of 107.91 genes (sd: 128.78) (standard deviations in parentheses). Of these 34 regulons, the mean regulon proportion comprising orthologous interactions in M. abscessus is 0.61 (sd: 0.16) (Fig. 1). These results are suggestive of conserved regulatory interactions between M. tuberculosis and M. abscessus.

BINDER simulation study
In order to evaluate the performance of BINDER ("The BINDER model for inferring a GRN" section), we perform a simulation study across a number of settings. Our focus is on exploring the impact of BINDER's  hierarchical Bayesian model structure and on the influence of the inclusion of the auxiliary data when inferring a GRN. Specifically we focus on the parameter θ r,t representing the probability of an interaction in the (r, t)th regulator-target pair and consider two simplified versions of the BINDER model: • Deterministic model : each θ r,t is modelled deterministically as a linear function of the auxiliary data. Thus BINDER's prior on θ r,t is replaced by: logit(θ r,t ) = ζ r + τ ME r ME r,t + τ PE r PE r,t • Non-auxiliary model : no auxiliary data are used during inference on θ r,t , which are instead inferred based on the primary data only. In this case BINDER's prior on θ r,t is instead replaced by the prior logit(θ r,t ) ∼ U (−∞, ∞).
In addition, the impact on inference of noisy primary data and of large variability in the true underlying θ r,t parameters is also of interest. Since the primary data CP and CM are assumed to be N l (logit θ r,t ), ψ k r for k ∈ {CP, CM}, larger values of ψ k r reflect noisier primary data. Similarly, logit(θ r,t ) ∼ N γ r,t , φ r , with larger values of φ r reflecting larger variation in the underlying regulatory interaction probabilities. Hence, we compare the performance of BINDER, the deterministic model and the non-auxiliary model on 9 distinct dispersion parameterisations corresponding to the Cartesian product of ψ r = {ψ CM r , ψ CP r } = {low = 1, mid = 2, high = 3} and φ r = {low = 1, mid = 2, high = 3}.
For each of the nine dispersion settings, we simulate three data sets, each with N = 1, 000 regulatortarget pairs. To challenge the BINDER model, we consider weakly informative auxiliary data: ME and PE are generated from a Bernoulli distribution with success parameter 0.1. We compute γ r,t according to (1) where ζ r , τ ME r , τ PE r = (−3.5, 3.8, 2.9) and simulate logit(θ r,t ) ∼ N (γ r,t , φ r ). Finally, for the primary data, we simulate CM r,t ∼ N l (logit θ r,t ), ψ CP r and CP r,t ∼ N (logit(θ r,t ), ψ CM r ). Model performance across the 27 settings considered was assessed using the mean absolute deviation (MAD) [30] between each true simulated θ r,t and its resulting posterior mean estimate.
We observed competitive performance of the BINDER approach over both the deterministic and non-auxiliary approaches for the majority of settings considered in terms of lower MAD (Fig. 2). Specifically, the mean for the MAD statistics for the BINDER approach was 0.087 (sd: 0.034) as compared with 0.120 (sd: 0.050) and 0.120 (sd: 0.056) for the deterministic and non-auxiliary approaches respectively. The deterministic approach has a tendency to perform worse in instances where the dispersion around each θ r,t value is large (i.e. high values for φ r ). This is to be expected as the deterministic approach has insufficient flexibility to model θ r,t values that lie distant from their mean value resulting in higher MAD statistics. On the contrary, the deterministic approach does well in the setting of low φ r . In contrast, the nonauxiliary approach tends to be less sensitive to changes in the dispersion around the mean of the distribution of θ r,t .
However, given that the non-auxiliary approach only uses the primary data to infer θ r,t , when the level of dispersion around the mean of CP and CM is high (i.e. high values for ψ r ) the primary data contain a weaker signal leading to poor estimation of the true θ r,t and resulting in higher MAD statistics. As a compromise between the deterministic and non-auxiliary approaches, BINDER utilises the information contained in the auxiliary data whilst, simultaneously, providing the flexibility to accommodate observation-specific variation in the regulation interaction probabilities resulting in more accurate inference. BINDER outperforms the non-auxiliary model in all settings considered, and is only marginally outperformed in a minority of cases by the deterministic model in settings where φ r is mid or low.

Application of BINDER to Escherichia coli and Bacillus subtilis data
As a benchmarking exercise to assess the performance of BINDER on a bona fide regulatory interaction data set, we investigated BINDER's ability to infer interaction  Fig. 2 Simulation results illustrating the mean absolute deviation (MAD) between the true and estimated regulation interaction probabilities achieved by the deterministic, non-auxiliary and BINDER approaches across a range of dispersion parameter settings plausibility for the fur and lexA regulons in Escherichia coli [31] and Bacillus subtilis [32]. Where E. coli constitutes the organism of interest, Pseudomonas aeruginosa (P. aeruginosa) [33] constitutes the proxy organism and where B. subtilis is the organism of interest, Listeria monocytogenes (L. monocytogenes) [34] fulfils the role of the proxy organism. Considering two regulons across these well-researched settings allows for intra-regulon and inter-regulon analysis as well as intra-organism and inter-organism analysis. The ferric uptake regulator, or fur, is a transcriptional factor originally described as a repressive regulator of genes involved in iron import. Since then, aside from ironhomeostasis, fur has been shown to be associated with processes such as resistance to oxidative stress, pH homeostasis and quorum sensing as well as other cellular mechanisms [35]. In bacteria, the SOS response provides the means for responding to DNA damage; the expression of genes comprising the SOS regulatory network is under the control of lexA [36]. lexA is a global transcription factor that undergoes cleavage during stress permitting expression of DNA repair functions [37]. lexA also regulates genes that are not comprised within the SOS response program [36].
Here we avail of well-established regulator-target interactions as detailed by RegulonDB [6] for E. coli and well-established regulator-target interactions as per SubtiWiki [38] for B. subtilis. To build the primary data, we used E. coli expression data from COLOMBOS [39] and B. subtilis expression data from SubtiWiki [40]. For the auxiliary data, we use regulatory sequence motifs and orthologous target interactions from P. aeruginosa and L. monocytogenes curated by collecTF [5].
We consider the BINDER, deterministic and nonauxiliary approaches to infer the GRNs in Escherichia coli and in Bacillus subtilis from their primary and auxiliary data. Non-informative priors were employed with mean hyperparameters set to 0 and standard deviation hyperparameters set to 3, with the exception of the prior on φ r which was set to φ r ∼ N (0,∞) (1, 0.1) for regularisation purposes. Further, we also consider iRafNet [21] which employs an integrative prior-information-based approach to random forest inference of GRNs from expression data. For iRafNet, we applied the algorithm to each target candidate of interest individually using the fur and lexA regulator genes as predictors; further, in addition to the standardised expression matrix, for the iRafNet prior information matrix W, the element w ij , corresponding to the ith regulator and jth target candidate, was configured such that w ij = exp(1) if ME = 1 or PE = 1 and w ij = exp(0) for i = j.
In total, of the 4221 uniquely labelled genes present in RegulonDB with available expression data, 67 correspond to well-established regulatory interactions concerning fur and 23 correspond to well-established interactions concerning lexA in E. coli. For B. subtilis, of the 4162 uniquely labelled genes with available expression data, 58 correspond to well-established regulatory interactions with fur and 57 to well-established regulatory interactions with lexA.
For the fur regulon in E. coli, BINDER achieved an area under curve (AUC) of 0.880. Notably however, in contrast to BINDER, iRafNet omits data recorded under conditions for which expression levels for all genes are not available. Thus, in order to fairly compare performance with iRafNet, we applied BINDER to a reduced expression matrix comprising fewer conditions such that no missing data were present. BINDER achieved an AUC of 0.787 as compared with 0.710, 0.654 and 0.725 for the non-auxiliary, deterministic and iRafNet approaches respectively (Fig. 3, Table 1).
Interestingly, for BINDER applied to the reduced coexpression data, the mean posterior 50th percentile θ 50% fur,t ∀t ∈ T corresponding to validated regulatory interactions was only 0.0050 as compared with 0.0016 for the mean θ 50% fur,t corresponding to observations without evidenced regulatory interactions (Fig. 4). That this BINDER implementation achieved a corresponding AUC of 0.787 suggests that the distribution of θ 50% fur,t values is highly skewed to the right, and thus their relative magnitude is of importance when observing BINDER's output. Interestingly, we did not observe this effect when BINDER was applied to the complete expression data. Thus, we imposed a more informative prior φ fur ∼ N (0,∞) (10, 0.01) and applied BINDER again resulting in a mean θ 50% fur,t corresponding to validated regulatory interactions of 0.2427 as compared with 0.0183 for the mean θ 50% fur,t corresponding to observations without evidenced regulatory interactions (Fig. 4). However, with this informative prior the AUC dropped to 0.729. This is almost identical to the AUC for the non-auxiliary implementation which is intuitive because as φ fur increases, the auxiliary stratum provides diminishing influence (Fig. 3, Table 1).
For the lexA regulon in E. coli, BINDER achieves an AUC of 0.888. Once again, in order to compare performance with iRafNet, we re-applied BINDER to a reduced expression matrix comprising fewer conditions such that no missing data were present. For the reduced expression data BINDER achieved an AUC of 0.857 as compared with 0.768, 0.778 and 0.829 for the non-auxiliary, deterministic and iRafNet approaches respectively (Fig. 3, Table 1).
Performance was similar for the B. subtilis organism (Fig. 3 3 ROC analysis for θ 50% r,t posterior estimates for the BINDER, deterministic and non-auxiliary approaches and gene importance estimates for iRafNet for the r = fur and r = lexA regulons in E. coli and B. subtilis. BINDER (all) denotes results from analysis of BINDER applied to the complete coexpression data; BINDER relates to its application to the reduced data set the non-auxiliary, deterministic and iRafNet approaches respectively.
Not only does BINDER out perform all other considered approaches in terms of AUC, but, considering false positive rates in the neighbourhood of 0, BINDER tends to achieve higher true positive rates than any of the other approaches. This is particularly important because, owing to sparse regulatory connectivity across a given genome, regulon mapping is typically a minority class problem i.e. the vast majority of target candidates will constitute negatives for most regulators. This implies that a low false positive rate can still translate to a large number of false positives. The ability of BINDER to integrate and borrow information across primary and auxiliary data when inferring a GRN is demonstrated in Fig. 5 for the particular case of the lexA regulator in B. subtilis when there is no auxiliary evidence. Only the full BINDER implementation is capable of tempering estimates when there is disagreement between interaction status and auxiliary evidence; when there is an interaction but no auxiliary evidence BINDER is capable of exploiting the individual primary data values, CM and CP, to provide higher estimates to the regulator-target candidate; however, the deterministic approach lacks the flexibility to provide any high θ 50% lexA,t estimates in the absence of auxiliary evidence. Similarly, owing to the lack of auxiliary evidence, BINDER is capable of tempering its estimates for θ 50% lexA,t when there is no interaction and no auxiliary evidence; in contrast, the non-auxiliary approach results in high θ 50% lexA,t estimates for all observations with high primary data values CM and CP. BINDER's hierarchical modelling structure and ability to borrow local and global information from both the primary and auxiliary data sources results in more realistic estimates: higher θ 50% lexA,t estimates for putative interactions and lower θ 50% lexA,t estimates for putative non-interactions in general. Synoptically, BINDER's ability to integrate the information on whether a given regulator-target pair has an affinity for the predicted motif and/or an orthologous regulatory interaction in the proxy organism with the information provided in the primary data stratum provides greater flexibility.

Application of BINDER to M. abscessus data
With a view to producing a model of regulation in M. abscessus, we leveraged data from across 34 orthologous ChIP-seq validated interactions in M. tuberculosis and from 32 RNA-seq libraries from across 16 distinct experimental conditions in M. abscessus. We considered R = 34 orthologous regulators in M. tuberculosis, and T = 4920 target candidates in the M. abscessus genome, yielding N = 167, 280 regulator-target pairs. For computational efficiency, given the likelihood function can be factored by regulator, we run BINDER on the R = 34 orthologous regulators' data in parallel. To computationally infer the gene regulatory network for M. abscessus the posterior distribution p(θ r,t | . . .) is of key interest, for r ∈ R and t ∈ T with . . . denoting all auxiliary and primary data and other model parameters.

Prior sensitivity analysis
In order to assess the sensitivity of inference to the prior distribution specifications, we constructed three different For the lexA regulon in B. subtilis and for targets where the auxiliary data ME = 0 and PE = 0, estimates of θ 50% lexA,t for the BINDER, deterministic and non-auxiliary approaches, factored by known interaction status. The primary data values are CM and CP; points are jittered slightly for visibility prior parameterisation settings and compared the resulting inferences. The three settings considered were labelled as 'non-informative' , 'informative' and 'precise' ( Table 2). In particular, the informative settings reflect a priori beliefs that: (1) the auxiliary data PE and ME would encode a reliable positive indication as to whether a given regulatory interaction exists and (2) a negative intercept would be required to correctly model interaction plausibility. The precise setting reflects more extreme versions of the informative setting (in terms of smaller auxiliary data scale hyperparameters).
Inference was relatively insensitive to prior specification in terms of MAD scores for θ 50% r,t (uninformative versus informative: 0.0040, sd: 0.0094; uninformative versus precise: 0.0183, sd: 0.0466; informative versus precise: 0.0168, sd: 0.0437, Fig. 6). Using a classification criterion such that regulator-target pairs with a posterior 50th percentile θ 50% r,t > 0.9 are classified as positive regulation cases, comparing uninformative to informative positive regulation cases yielded an adjusted Rand index [41] of 0.9247, versus 0.5203 and 0.5553 for uninformative versus precise and informative versus precise respectively (an adjusted Rand index of 1 indicates perfect agreement). Thus, for the remainder of this work, with a view to allowing the data to determine the parameter estimates without imposing strong beliefs, we focus on the uninformative parameterisation.

Inferred regulatory interactions in M. abscessus
Of the N = 167, 280 regulator-target pairs considered in M. abscessus, under the uninformative parameterisation, BINDER identified 54 pairs across 5 transcription factors with a posterior 50th percentile θ 50% r,t > 0.9 (Table 3). Of these 54 interactions, 24 are known to have validated orthologous regulatory interactions in M. tuberculosis as per ChIP-seq data (Fig. 7); the number of interaction pairs almost doubles by reducing the threshold by 0.1 (102 pairs with 31 known orthologous interactions satisfying θ 50% r,t > 0.8 ). In comparison, under the informative parameterisation, a similar effect was observed with 54 pairs with 21 known orthologous interactions satisfying θ 50% r,t > 0.9. A more conservative effect was observed for the precise settings: 33 pairs across 28 transcription factors with a posterior 50th percentile θ 50% r,t > 0.9. As expected, for all parameterisations, the vast majority of posterior distributions of θ were centred at low values, suggesting low levels of regulatory connectivity across the M. abscessus interactome; the mean 50th percentile for all of θ was 0.085 (sd: 0.106) for the uninformative parameterisation and 0.087 (sd: 0.105) and 0.0885 (sd: 0.0995) for the informative and precise parameterisations respectively. It should be noted that in the benchmarking exercise ("Application of BINDER to Escherichia coli and Bacillus subtilis data" section) we observed  that the nominal value of a regulator-target pair's θ 50% is not always as informative as its relative magnitude to {θ r,1 , . . . , θ r,N }. In general, whilst there were many instances of plausible conserved interactions, the results suggest evidence for many non-conserved interactions that may be unique to M. abscessus. Further, it can be observed that for a given regulator, many of the regulated genes appear to be spatially clustered along the genome (Fig. 7). This observation lends support to the concept of gene colocalization arising as a means to affect efficient transcription [42,43]. The parameter ζ r in the auxiliary component influences the inferred probability of regulator-target interaction before any further regulator-target pair information is taken into account, with larger values of ζ r meaning higher interaction probabilities. In this sense, each ζ r is related to the ubiquity of regulation by regulator r across the genome. Under the uninformative parameterisation, we observed an average posterior mean of -6.63 across all regulator models (sd: 4.07). Hence, intuitively, conditional on the auxiliary data ME and PE being zero, the probability of a regulatory interaction is low.
The parameter τ ME r captures the influence the auxiliary ME data has on the prior mean of the inferred probability of a regulatory interaction between regulator r and target t, given all other covariates. Across all regulators, under the uninformative parameterisation, we observed an average posterior mean for τ ME r of 1.43 (sd: 0.9982) (Fig. 8). The parameter τ PE r has a similar interpretation for the auxiliary data PE. Across all regulators, under the uninformative parameterisation, we observed an average posterior mean for τ PE r of 1.95 (sd: 1.8981) (Fig. 8). These results suggest that, on average, both ME and PE are positively correlated with the primary data in the likelihood. Given the phenomenon of genomic conservation, this is as we would expect and lends credence to the BINDER approach. Furthermore, although the mean posterior means for τ ME r and τ PE r are quite similar, the latter has larger variation suggesting higher volatility in the influence of PE than in the influence of ME.
In terms of scale parameters, under the uninformative parameterisation, φ tended to have the lowest posterior mean values (average posterior mean of 1.12 with standard deviation 1.0067) (Fig. 9). Both ψ CM r and ψ CP r yielded larger posterior mean estimates. In particular, under the uninformative parameterisation, ψ CM r yielded an average posterior mean of 4.23 (sd: 1.7713) and ψ CP r yielded an average posterior mean of 3.63 (sd: 1.4499), suggesting that the primary CM data tend to lie further from logit(θ r,t ) than CP (Fig. 9). Also, the larger average posterior mean associated with ψ CM r compared with that of ψ CP r is intuitive, given the extra uncertainty associated with motif inference (comprised within CM) compared with validated orthologous interactions comprised within CP.

Interpretation of results: composition of the zur regulon
As an example of a putative discovery facilitated by BINDER, we examine the inferred regulon corresponding to the transcriptional regulator zur (MAB_1678c). The zur regulator present in M. tuberculosis and M. abscessus is a zinc-responsive transcription factor. Zinc is an essential element for life in many organisms [44]. In addition to its role as a structural scaffold for many proteins, it fulfils a critical function as a frequent enzyme and DNA-binding protein cofactor [45]. However, zinc can be toxic at high concentrations [46]. For prokaryotes, efficient zinc acquisition, concentration and tolerance are critical processes for survival and pathogenicity [47]. Zinc homeostasis in prokaryotes is achieved via cellular import and export, zinc binding, and zinc-sensing [47]. Cellular zinc levels are maintained by importer and exporter proteins which are then regulated at the transcriptional level by several zinc-responsive transcription factors [48], including the zur regulator.
As per ChIP-seq results, the original regulon pertaining to zur in M. tuberculosis (Rv2359/furB) comprised 26 target genes (12 directly regulated targets); under the uninformative parameterisation, of these targets, 14 (53.8%) contained orthologs in M. abscessus. Using the cutoff criterion θ 50% zur,t > 0.9, BINDER suggested 15 target candidate genes in M. abscessus be considered valid targets of zur, 8 of which correspond to evidenced interactions in M. tuberculosis. Gene ontological analysis carried out on the putative targets provided intuitive insight, revealing up-regulated biological processes (p ≤ 0.05) corresponding to metal ion transport.
BINDER also identified a number of interesting non-conserved putative targets for zur. For example, MAB_1046c, is annotated as a cobalamin synthesis protein. This is interesting as MAB_0335, one of the identified conserved targets, is also annotated as a cobalamin synthesis protein. This is perhaps owing to the role of cobalamin as a cofactor for cobalamin dependent methionine synthase in prokaryotes. Cobalamin dependent methionine synthase is involved in zinc ion binding [49]. Further, MAB_2698c and its immediately adjacent neighbour MAB_2699c also yield high θ 50% zur,t posterior estimates; gene ontology suggests that MAB_2699c, another unconserved putative target, is involved in pseudouridine synthesis/pseudouridine synthase activity; pseudouridine synthases catalyse the isomerisation of uridine to pseudouridine in RNA molecules and are thought to act as RNA chaperones. Intriguingly, pseudouridine synthase I (TruA) [50], one of the four distinct families of pseudouridine synthases, contains one atom of zinc essential for its native conformation and tRNA recognition [51]. Another unconserved target is the PPE-like gene MAB_0809c; PPE genes are widely considered to play a key role in pathogenesis. Interestingly, phagosomes containing PPE genes found to disrupt lysosome-phagosome fusion have been shown to display differences in zinc levels relative to corresponding phagosomes containing PPE-knockout mutants [52]. Another highly-probable unconserved interaction, MAB_1680, is annotated as a putative transmembrane protein. Given its association with zur, MAB_1680 is perhaps involved with zinc uptake in M. abscessus.

Discussion
In this work we have inferred the GRN in M. abscessus using the BINDER approach, the primary purpose of which is to infer the probability of pairwise interactions in a collection of regulator-target pairs. BINDER exploits experimental coexpression data in tandem with the property of genomic conservation to probabilistically infer a GRN in M. abscessus. To infer a GRN, BINDER proceeds by binding information from data in primary and auxiliary strata.
BINDER facilitates information sharing horizontally (by sharing parameters in the same layer of the model hierarchy) and vertically (by sharing of parameters in distinct strata of the hierarchy). The likelihood function assumes independence of the assumed logit-normal distributed primary data variables, conditional on the shared parameter of interest θ r,t , representing the probability of an interaction in the (r, t) th regulator-target pair. Further, the mean of this interaction probability's logit-normal distribution is informed by a linear function of the auxiliary data, serving as a proxy for genomic conservation information. Thus inference is strengthened through the borrowing of information across variables and strata.
With the exception of PE, the construction of all variables considered (i.e. ME, CM and CP) involves the choice of thresholds and/or decisions. For example, from the outset we have formed a TFBS-based module binary membership structure and an orthologous target binary membership structure, recorded in the auxiliary binary variables ME and PE respectively, on which the primary variables CM and CP rely. However, in order to circumvent potential loss of information associated with such hard membership, a "soft" approach using scale free topology or clustering coefficients may be worth exploring. Under these scenarios, the idea of membership has a continuous representation [15]. Further, the auxiliary variable ME is derived from thresholding a p-value and as such is sensitive to the cutoff point selected. The BINDER approach also implements a further two threshold points δ CM and δ CP ; clearly it is of paramount importance to choose these thresholds in an informed and careful manner. We have employed a hypergeometric framework for CM and CP, but any mapping to [0, 1] is possible. Again, topological overlap mapping or clustering coefficent mapping [15] are alternative approaches. With a view to foregoing the need to choose a threshold at all, simply mapping a regulator-target pair to the mean of its coexpression with members of the ME and PE modules is possible because the mean of a group of unsigned coexpressions will also lie in [ 0, 1]; validation studies suggests that this approach, although convenient, does not perform quite as well as the hypergeometric framework. It should be noted that, for our purposes, we had a relatively small-scale expression compendium with which to form our coexpression networks. Both the volume and diversity of RNA-seq conditions used to construct the coexpression networks may not be fully sufficient to computationally infer the entire GRN in M. abscessus. Small coexpression data sets are more likely to comprise noisy correlation results and similar experimental conditions have the effect of duplicating expression information leading to low numbers in terms of effective sample sizes. Similarly, for some regulators, we observed a lack of specificity in binding sites (owing to very long binding regions and small numbers of binding interactions); this has the effect of negatively impacting motif inference (i.e. false discovery of erroneous motifs). Naturally, more reliable data are preferable, however where data are less reliable, it is possible to account for this uncertainty through specification of the hyperparameters in the priors on the variable-specific parameters. Regardless, as the signal deteriorates (e.g. erroneous consensus motifs, inaccurate binding interactions), inference will suffer and thus it is important to ensure that all data sources are as accurate as possible. For the above reasons, it may be worthwhile to examine the more conservative BINDER parameterisations (i.e. the precise parameterisations) detailed above. This parameterisation implements a less diffuse prior distribution such that candidates lacking auxiliary support are less likely to achieve high θ r,t estimates.
Through the course of this analysis, with a view to focusing on inferred highly probable regulator-target interactions, we have examined pairs for which the posterior median θ 50% r,t > 0.9. However, the intention behind this model is not to define interaction probability on the basis of a single point estimate, but rather to provide a posterior distribution of θ r,t . This allows for a more nuanced analysis on interaction probability estimates than is typically provided by a simple binary classifier. Instead, we recommend that estimates are received in the context of the scientific question posed; varying the the number and severity of thresholds and tolerances will allow for differing results. Similarly, as noted in the fur regulon inference for E. coli explored in the benchmarking results, under certain scenarios BINDER estimates low values for all interaction candidates (both positive and negative cases); this is either due to influential hyperparameter settings and/or poor agreement between the auxiliary and primary data. However, even under these scenarios, BINDER can still estimate higher estimates for positive interaction cases. In such cases, as is good statistical practise, prior sensitivity analyses should be conducted or it may be worthwhile to consider regulator results individually.
One obvious limitation of any model that exploits conservation phenomena to perform inference in scarcely annotated organisms is that such a model can only make inference based on existing conservation data; indeed BINDER cannot infer interaction that may exist in M. abscessus on regulators not considered here. There are modelling approaches for "de novo" network inference that are based exclusively on coexpression analysis or other non-conservation based predictors, but such approaches can contain many false positives [53]. Instead BINDER aims to overcome such issues by allowing coexpression-based data have partial influence on model inference. Moreover, while BINDER requires a consensus sequence motif and a collection of orthologous regulatortarget interactions to perform inference, it is possible to run BINDER with a consensus sequence motif or a collection of orthologous interactions only. In this case, BINDER comprises one variable in the auxiliary stratum and one variable in the primary stratum.
One mechanism used by cells to refine and maintain transcription factor levels is autoregulation. It has been argued that the occurrence of autoregulation positively correlates with the developmental or physiological importance of the transcription factor [54]. Given that any gene will have a perfect coexpression with itself, most expression-based approaches (such as GENIE3 and iRafNet) to GRN inference are unable to detect transcription factor autoregulation. For a given regulator, BINDER uses the coexpression profiles of a target gene with genes under the control of the regulator to inform the probability of a regulator-target interaction. BINDER does not examine the coexpression of the target candidate with regulator directly. As a result, BINDER is able to detect autoregulation.
For each regulator considered here, we applied the BINDER approach to all 4920 annotated protein-coding genes in M. abscessus. However, in theory, BINDER could be applied to any desired subset of genes. With a view to accurately describing whole-population behaviour we recommend including all available data, albeit acknowledging the associated additional computational cost.
Pearson's correlation was employed here as a measure of coexpression. Although there are other options, with a view to remaining conservative and reducing false positives, Pearson's correlation gives high values when expression values are strongly linearly related. Common alternatives include the more flexible Spearman's method, but often with increased flexibility comes an increase in less biologically significant relationships. Although use of Pearson's correlation can come at the cost of increased false negatives, studies have suggested that many coexpression relationships are linear and monotonic so this issue may be overstated [55].
Recent studies have suggested that implementing an ensemble approach to motif identification can improve detection results [56]. BINDER could be extended to augment the number of motif search tools used in the analysis. Similarly, another suggestion might be to augment the number of proxy organisms from a single proxy organism to k proxy organisms, similar in vein to [24]. A spikeand-slab prior distribution [57] for the associated model parameters would provide insight on the information contained in the individual proxy organisms. Furthermore, it is possible to extend the dimensionality of the primary stratum. In general, data that are binary or lie in [0, 1] can be appended to the primary stratum: for example, the direct coexpression between a given regulator-target pair could be used to form a trivariate primary stratum. Although we have used exclusively binary variables in the auxiliary stratum, there is no restriction on the form of auxiliary data that can be modelled by BINDER.
It may be worthwhile to investigate the effect of incorporating more sophisticated levels of dependency in the BINDER model. Such dependencies could be based on operon comembership, on regulator family membership (e.g. the whiB-like family [58]), on target reoccurrence or on gene function using GO [59] or COG [60], for example. Here, we only consider the gene immediately downstream of a confirmed or putative TFBS to be under the regulation of the associated regulator. Recent studies suggest that operon organisation is dynamic and, hence, operon structures are capable of changing across conditions [61]. However, given that BINDER considers not only the existence of a precedent interaction and/or motif match for a given candidate, but also the coexpression of that candidate with other candidates that do comprise a precedent interaction and/or motif match, BINDER is capable of detecting adjacent gene coregulation. Members of operon structures that are cotranscribed across all conditions considered will exhibit greater coexpression than those that are only cotranscribed under a fraction of conditions considered; as a result, BINDER is able to reflect that behaviour through the θ r,t posteriors. Furthermore, it is possible to construct prior distribution parameterisations such that BINDER will tend to estimate higher θ r,t median values for genes in cotranscribed structures if they comprise a precedent interaction and/or motif match; this may facilitate the determination of gene importance in cotranscribed structures. Owing to the lack of assumptions made by BINDER with respect to transcription start sites and operon co-membership, we expect that the results generated by BINDER will sufficiently aid in the generation of dynamic regulatory networks, as well as the understanding of transcriptional unit plasticity.

Conclusions
We have sought to determine the evidence for gene regulation in M. abscessus using a range of expression data from M. abscessus and experimentally validated regulatory network data from M. tuberculosis. We have demonstrated the extent to which there is a correlation between gene regulation in M. tuberculosis and transcriptome coexpression in M. abscessus. Our results imply not only strong genic conservation between M. abscessus and M. tuberculosis but also evidence of conservation with respect to the modes of transcriptomic control between these two organisms.
We have implemented a Bayesian modelling approach to quantifying the probability of an interaction across a collection of 167,280 regulatory-target pairs. Of these, 54 regulator-target pairs, across 5 transcription factors, were inferred to have a posterior 50th percentile for θ r,t > 0.9 in M. abscessus.
The interactions identified in this study will form a valuable resource for further studies of transcriptional control in M. abscessus and in the family of Mycobacteriaceae more generally. Further, the BINDER framework is applicable across a wider range of organisms for which similar data are available.

Data
Given the paucity of data available from the primary organism M. abscessus (MAB), BINDER integrates data from a proxy organism M. tuberculosis (MTB) into the inferential procedure. Specifically, we leverage data from across orthologous ChIP-seq validated interactions in M. tuberculosis as proxy data and extract the primary data from 32 RNA-seq libraries across 16 distinct experimental conditions in M. abscessus. Thus we consider the set of all possible regulator-target interaction candidate pairs, arising from the set R = 34 orthologous regulators in M. tuberculosis, and T = 4920 target genes in the M. abscessus genome yielding N = 167, 280 regulator-target pairs of interest.

Auxiliary data: motif evidence (ME) and precedent evidence (PE)
Motif Evidence: With respect to a given regulator r, the TFBS status of a target t is encoded through a binary variable termed motif evidence (ME). Specifically, for a regulator-target pair, ME takes the value 1 if the corresponding target contains a putative TFBS for the regulator's motif in its upstream region and a value of 0 otherwise. Here, the binding motif is assumed to be identical to the binding motif in the proxy organism.
With a view to determining regulator motifs, we extracted binding sequences using the NCBI M. tuberculosis (Accession: AL123456) complete chromosome sequence and annotation, S MTB . The evidenced binding region coordinates were provided by ChIP-seq data sets ranging across several induced transcription factor experiments in M. tuberculosis. We subsequently categorised these binding sequences by regulator with a view to discovering binding sequence consensus motifs. The MEME motif discovery tool [62] was used to infer a single consensus binding motif M r for each regulator r ∈ R: in particular, using a DNA alphabet, we searched on both strands seeking zero or one occurrence per binding sequence of a single consensus motif between 10 and 30 nucleotides long.
To find putative TFBSs for the derived motifs in the M. abscessus genome, we defined a sequence region U t corresponding to the region -300nt to +50nt of the start of each target of interest t ∈ T. This interval size was chosen in light of the distribution of intergenic region lengths in the M. abscessus genome. In order to find putative TFBSs for each M r , we searched in each U t using the complete chromosome sequence and annotation S MAB provided by NCBI for M. abscessus (Accession: NC010397). In the scenario that the most upstream coordinate of an immediately adjacent upstream gene was annotated to occur within 300nt of an upstream region of interest, the upstream region of interest was truncated to the most upstream coordinate of the upstream gene. To perform this search, we used the FIMO tool [63] to find the highscoring upstream sequences with a q-value ≤ = 0.1. We provided a background file encoding 0-order nucleobase probabilities based on all upstream sequences of interest.
In summary, for each regulator-target pair (r, t) for r = 1, . . . , R and t = 1, . . . , T the motif evidence ME r,t is computed where: For a given regulator r, we refer to the set of all genes where ME r,t = 1 as the 'ME r module' . Precedent Evidence: The presence of an annotated orthologous regulator-target interaction in the proxy organism is encoded in the binary variable termed precedent evidence (PE). For a regulator-target pair, PE takes the value of 1 if such an orthologous interaction exists and takes the value of 0 otherwise. Specifically, given both the proxy genome G MTB and the primary genome of interest G MAB , Ortholuge [64] derived one-to-one orthologs were used to map orthologous regulator-target interactions from G MTB to G MAB . ChIP-seq data sets drawn from 34 induced transcription factor experiments in G MTB were scanned for orthologous regulator-target interactions with respect to G MAB ; orthologous regulator-target pairs were subsequently grouped by regulator to derive a rudimentary orthology of regulons in G MAB .
Thus, given the rudimentary orthology, for a given regulator r and target t: As in the ME case, for a given regulator r, we refer to the set of all genes where PE r,t = 1 as the 'PE r module' .

Primary data: coexpression of motif and precedent evidence
Coexpression of Motif Evidence: Exploiting the property that genes sharing a common regulator exhibit strong coexpression [14], we computed a measure termed coexpression of motif evidence (CM). For a given regulator, using the motif derived from the proxy organism, CM quantifies the extent to which a target gene coexpresses with genes that have strong affinity for the putative regulator motif in the primary organism. Specifically, for a regulator binding sequence motif M r inferred from G MTB , we define CM r,t for a given gene regulator-target pair (r, t) in G MAB . We define the reduced primary genome G MAB, where O t is a t-inclusive set of genes in G MAB that should not be used in the calculation of CM r,t . This set will naturally include t, but can contain any other genes that are not desired for calculation of CM r,t . The variable CM r,t lies in [0, 1], where values closer to 1 represent stronger correlation between expression levels of the target t with genes in G MAB,−O t producing strong matches to the inferred sequence motif M r . Specifically, for a regulator-target pair A Benjamini and Hochberg adjustment [65] is applied to these probabilities to relax the observed polarisation of probabilities around 0 and 1; for a given regulator r, the adjustment is relative to all targets t ∈ T. We expect genes under the control of regulator r to coexpress strongly with members of the ME r module. For our purposes, we vary the threshold such that each δ CM is specific to each target. For a given target t, assuming CX i,j represents the coexpression between genes i and j, we choose δ CM to be equal to the 95th percentile of all values in the set {CX t,g for g ∈ G MAB,−O t }.
Coexpression of Precedent Evidence: Analogous to CM, we develop a score of coexpression of precedent evidence, CP. For a given regulator, CP quantifies the extent to which a target gene coexpresses with orthologs of genes comprising regulator-target interactions in the proxy organism.
Specifically, for regulator r, we define the regulon P r as the collection of orthologous interactions annotated in G MTB . For a given gene regulator-target pair (r, t) in G MAB the variable CP r,t is defined on the interval [0, 1], where values closer to 1 represent stronger expression correlation of gene t with orthologs of genes from P r in G MAB,−O t . That is, Again, the probabilities are subject to Benjamini and Hochberg adjustment relative to all target candidates t ∈ T. We expect genes under the control of regulator r to coexpress strongly with members of the PE r module. Thus again we choose δ CP to be equal to the 95th percentile of all values in the set {CX t,g for g ∈ G MAB,−O t }.
With a view to quantifying coexpression in G MAB , the expression profiles (using RPKM [66]) of all genes constituting the NCBI GenBank annotation for the G MAB genome were computed across 32 RNA-seq libraries (comprising 16 distinct experimental conditions) elicited from a range of astringent response and control experiments. In order to compute the corresponding coexpression profiles, we generated the unsigned Pearson correlation coefficient of all possible pairwise annotated gene-pair combinations. All read files were aligned using Bowtie (version 1.2.2) [67] and totalled using Samtools (version 1.7) [68]. RNA-seq libraries can be found on NCBI's Gene Expression Omnibus (Accession: GSE78787).

The BINDER model for inferring a GRN
Borrowing strength across the primary and auxiliary data sets, we computationally infer the GRN for M. abscessus through a novel statistical modelling approach: BayesIan gene regulatory Networks inferreD via gene coExpression and compaRative genomics (BINDER). BINDER is a Bayesian hierarchical model that appositely models the type and structure of both the primary and auxiliary data to infer the probability of a regulatory interaction between a regulator-target pair candidate. Each of N = |R| × |T| observations is a regulator and target candidate pair (r, t) from the set of regulators R and the set of target candidates T in the M. abscessus genome. Interest lies in the probability θ r,t of there being an interaction between regulator r and target t. Thus, inferring θ r,t facilitates inference of the M. abscessus GRN.
As stated, BINDER integrates primary data from M. abscessus with data from the proxy organism M. tuberculosis. Specifically, the variables CM and CP ("Primary data: coexpression of motif and precedent evidence" section) constitute the primary data stratum whilst ME and PE ("Auxiliary data: motif evidence (ME) and precedent evi-dence (PE)" section) constitute the auxiliary stratum. As BINDER is a Bayesian hierarchical model, the Fig. 10 Graphical representation of the hierarchical BINDER model; squares correspond to observed data, large discs correspond to random parameters and small discs correspond to fixed hyperparameters; the surrounding boxes denote observation-specific parameters and data auxiliary data inform the prior distribution for each θ r,t ; the posterior distribution for each θ r,t is then updated by accounting for the primary data.
To define the likelihood function of the BINDER model we appositely model the primary data type and assume logit-normal distributions for CM and CP. As such, in the case where CM r,t or CP r,t were 0 or 1, they were increased or decreased respectively by a small factor (10 −4 ). Further we assume, given θ r,t , the regulator-target pairs and primary variables are conditionally independent: L(θ , ψ CM , ψ CP |CM, CP) = r∈R t∈T N l {CM r,t |logit(θ r,t ), ψ CM r }N l {CP r,t |logit(θ r,t ), ψ CP r } Here N l (x|a, b) denotes the logit-normal distribution of x with location and standard deviation parameters a and b respectively. The location parameter is common across the distributions for CM and CP. This shared parameter enables the borrowing of information across variables, in addition to facilitating tractability through the conditional independence assumption. The conditional independence assumption is widely employed in other settings, such as latent class analysis [69,70].
As with any Bayesian hierarchical model, prior distributions are specified on the BINDER model parameters. For each θ r,t we posit a logistic normal prior such that logit(θ r,t ) ∼ N (γ r,t , φ) where φ is the standard deviation parameter controlling the level of dispersion around the mean. The mean γ r,t is informed by the auxiliary data ME and PE on the regulator-target pair (r, t) through a linear model. Specifically: γ r,t = ζ r + τ ME r ME r,t + τ PE r PE r,t Independent priors are then posited on the parameters in (1) such that the intercept ζ r ∼ N (μ ζ , σ ζ ) and a truncated normal prior is assumed on the slope parameters: τ k r ∼ N (0,∞) (μ τ k , σ τ k ) for k ∈ {ME, PE}. This truncated normal prior with mass on the positive real line reflects the assumption that the presence of regulation in regulator-target pair (r, t) in the proxy organism is suggestive of the presence of such regulation in M. abscessus. To complete the model setup, prior distributions are placed on the scale parameters such that ψ l r ∼ N (0,∞) (μ ψ l , σ ψ l ) for l ∈ {CP, CM}. The hyperparameters of all the specified prior distributions must be set by the practitioner and their values are potentially influential; sensitivity of inference to their choice is explored in "Prior sensitivity analysis" section.
In order to infer the GRN for M. abscessus, the set of parameters {θ r,t : r ∈ R, t ∈ T} are of primary interest.
Thus the required posterior distribution is p(θ|CM, CP, ME, PE, μ, σ ) = τ . . . ψ p(θ , ψ, φ, τ , ζ |CM, CP, ME, PE, μ, σ )dψdφdζ dτ This posterior distribution is explored using Stan [71], a state-of-the-art platform for statistical modelling and computation for large data sets that employs Hamiltonian Monte Carlo methods [72] to draw samples from the posterior distribution of interest. An illustration of the BINDER model is provided in Fig. 10.