Generation of digital patients for the simulation of tuberculosis with UISS-TB

Background: The STriTuVaD project, funded by Horizon 2020, aims to test, through a Phase IIb clinical trial, one of the most advanced therapeutic vaccines against tuberculosis. As part of this initiative, we have developed a strategy for generating in silico patients consistent with the characteristics of the target population, which can then be used in combination with in vivo data in an augmented clinical trial.

Results: One of the most challenging tasks in using virtual patients is developing a methodology to reproduce the biological diversity of the target population, i.e., providing an appropriate strategy for generating libraries of digital patients. This has been achieved by creating the initial immune system repertoire stochastically, and by identifying a vector of features that combines biological and pathophysiological parameters and personalises the digital patient to reproduce the physiology and pathophysiology of the subject.

Conclusions: We propose a sequential approach to sampling from the joint population distribution of the features in order to create a cohort of virtual patients with specific characteristics, resembling the recruitment process for the target clinical trial. The cohort can then be used to augment the information from the physical trial and help reduce its size and duration.


Introduction
Tuberculosis (TB) is one of the world's deadliest diseases: one third of the world's population, mostly in developing countries, is infected with TB. TB is also becoming dangerous again for developed countries, owing to the increased mobility of the world population and the appearance of several new multidrug-resistant (MDR) bacterial strains. There is now a growing awareness that TB can be fought effectively only by working globally, starting from countries such as India, where the infection is endemic.
Once a person presents with the active disease, the most critical issue is the duration of the therapy, because of the high costs it involves, the increased chance of non-compliance (which increases the probability of developing an MDR strain), and the time during which the patient remains infectious to others. One exciting possibility for shortening the therapy is offered by new host-reaction therapies (HRT) used as a coadjuvant of the antibiotic therapy. The endpoints in clinical trials for HRTs are time to sputum culture conversion and incidence of recurrence. While for the first it is in some cases possible to obtain statistically powered evidence of efficacy in a phase II clinical trial, recurrence almost always requires a phase III clinical trial involving thousands of patients and huge costs.
In the STriTuVaD multidisciplinary consortium we are going to test, through a phase IIb clinical trial, two of the most advanced therapeutic vaccines against drug-sensitive tuberculosis (DS-TB) and multi-drug resistant tuberculosis (MDR-TB), i.e., the RUTI vaccine, provided by Archivel Farma S.L. (Spain), and the ID93+GLA-SE vaccine, provided by the Infectious Disease Research Institute (US).
In parallel, we extend the Universal Immune System Simulator (UISS) to include all relevant determinants of such a clinical trial, establish its predictive accuracy against the individual patients recruited in the trial, use it to generate digital patients and predict their response to the HRT being tested, and combine these predictions with the observations made on physical patients using a new in silico-augmented clinical trial approach based on a Bayesian adaptive design. This approach, if found effective, could drastically reduce the cost of innovation in this critical sector of public healthcare.
To reproduce the biological diversity of the subjects to be simulated, an appropriate strategy for generating libraries of digital patients has been developed. This has been achieved by identifying a vector of features that combines biological and pathophysiological parameters and personalizes the digital patient.
In this paper, after a brief review of UISS and its extension to tuberculosis (Sect. 2), we sketch the strategy we adopt to generate the cohort of digital patients (Sect. 3), and we show some preliminary results on the dynamics of Mycobacterium tuberculosis (MTB) in a subset of these patients (Sect. 4).

Extension of the UISS computational framework to reproduce TB
We briefly describe here the UISS computational framework and its extension to model tuberculosis, UISS-TB. The interested reader can find more details about UISS-TB in [1].

Introduction to the UISS modeling framework
UISS is a multi-agent framework for simulating immune system dynamics that can be extended to reproduce specific diseases and related treatments. Unlike classical top-down approaches, in which mean behaviors are studied by means of differential equations as presented in [2], [3], [4], agent-based models and multi-agent systems follow entities individually, and global nonlinear behaviors arise from the sum of individual behaviors. UISS has been developed as a multi-scale computer simulator of the immune system, as it takes into account both cellular and molecular entities and processes.
UISS has a long track record of successful applications, including, among others, modelling the effects of a vaccine against the onset of mammary carcinoma [5], [6] and consequent lung metastases [7], the initial stages of atherosclerosis [8], melanoma [9], and, more recently, the study of multiple sclerosis [10], [11] and testing the efficacy of citrus-derived adjuvants for influenza and human papilloma virus vaccines [12], [13].
We then extended UISS to include all the MTB dynamics along with the artificial immunity induced by vaccination strategies as presented in [1].
Finally, to depict individual diversity, a vector of features has been identified. It combines both biological and pathophysiological parameters that personalize the digital patient to reproduce the physiology and the pathophysiology of the subject. In particular, the digital patient model defines a specific patient through 26 features: Drug Sensitive (DS)/Multi-drug resistant (MDR); Bacteria Load (BL) in

Generation of Digital Patients: a Bayesian approach

The UISS-TB input vector
The UISS-TB model defines a specific patient through a vector of 26 features, as described in Sect. 2. In order to create an in silico patient, one needs to provide a single value for each of the 26 features. These values could be taken from individual physical patients; however, if a cohort of digital patients is to be produced, one needs a mechanism for producing as many different input vectors as required, each of which is biologically and physiologically plausible. Formally, this requires the characterisation of the joint distribution of the inputs in the population. We have compiled typical values and standard deviations for each feature, which provides a way to generate plausible values for one component at a time. However, proceeding in this way would neglect the biological correlations between features and thus would not guarantee a physiologically plausible input vector.
Hence, we must take these correlations into account. Given that we have 25 numerical input variables (DS/MDR is a factor), we should specify 25 × 24/2 = 300 correlations. Using relevant literature and expert opinion, we have qualified these correlations, determining that all of them are positive except for those between IL-10 and the rest of the features.
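The count of 300 pairwise correlations and the qualitative sign pattern can be checked with a short sketch. It is illustrative only: the feature names other than BL and IL-10 are hypothetical placeholders, and we assume here that "not positive" means IL-10 is negatively correlated with every other feature, which the text does not state explicitly.

```python
from itertools import combinations


def count_pairwise_correlations(d):
    """Number of distinct off-diagonal correlations among d variables."""
    return d * (d - 1) // 2


def qualitative_signs(features, negative_feature):
    """Sign (+1 or -1) assigned to each unordered pair of features.

    Assumption: every pair is positively correlated unless it involves
    `negative_feature`, in which case the sign is taken as negative.
    """
    signs = {}
    for a, b in combinations(features, 2):
        signs[(a, b)] = -1 if negative_feature in (a, b) else +1
    return signs


# Placeholder names: only "BL" and "IL-10" appear in the text.
features = ["BL", "IL-10"] + [f"feature_{k}" for k in range(3, 26)]
signs = qualitative_signs(features, negative_feature="IL-10")
n_pairs = count_pairwise_correlations(len(features))
n_negative = sum(1 for s in signs.values() if s < 0)
```

Under these assumptions, 24 of the 300 pairs involve IL-10; only the magnitudes of the correlations then remain to be elicited.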

Formalising in silico profile generation
In theory, one could elicit the joint distribution of the 25 features, i.e. describe mathematically how each feature relates to every other in a 25-dimensional space; however, this would be not only extremely difficult but also time-consuming and data-demanding. Our approach is to rely on the current consensus in mathematical biology and use a Gaussian to represent the population distribution. An additional advantage of this approach is discussed in the next section.
Formally, we say that the vector $x = \{x_1, \ldots, x_d\}$ follows a $d$-variate Gaussian distribution with joint probability density function (pdf)

$$N_d(x \mid \mu, \Sigma) = (2\pi)^{-d/2}\, |\Sigma|^{-1/2} \exp\left\{-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right\},$$

with mean $\mu = \{\mu_1, \ldots, \mu_d\}$ and covariance matrix $\Sigma = [\sigma_{ij}]$, where $\mathrm{Cov}(x_i, x_j) = \sigma_{ij}$ is related to the correlations $\rho_{ij}$ by

$$\rho_{ij} = \frac{\sigma_{ij}}{\sigma_i \sigma_j}.$$

So, if we are able to elicit a measure of correlation between two inputs, we can calculate their covariance. The diagonal elements, $\sigma_i^2$, are the marginal variances of each element $x_i$, and $\mu_i$ is the corresponding marginal mean. As mentioned above, we have already compiled a list of these values, so we have elicited values for $\mu$ and for the diagonal elements of $\Sigma$, $\sigma_i^2$.
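As a minimal sketch of this step, the covariance matrix can be assembled from elicited standard deviations and correlations using $\sigma_{ij} = \rho_{ij}\sigma_i\sigma_j$. The function names and the three-feature numbers are illustrative assumptions, not elicited values. A Cholesky factorisation is included as one common way to check that the assembled matrix is a valid (positive definite) covariance, since individually elicited correlations need not be jointly consistent.

```python
import math


def covariance_from_correlation(stds, corr):
    """Build Sigma with Sigma[i][j] = corr[i][j] * stds[i] * stds[j]."""
    d = len(stds)
    return [[corr[i][j] * stds[i] * stds[j] for j in range(d)]
            for i in range(d)]


def cholesky(sigma):
    """Return lower-triangular L with L L^T = sigma.

    Raises ValueError if sigma is not positive definite.
    """
    d = len(sigma)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


# Illustrative values only (hypothetical, three features).
stds = [2.0, 0.5, 1.5]
corr = [[1.0, -0.3, 0.4],
        [-0.3, 1.0, -0.2],
        [0.4, -0.2, 1.0]]
sigma = covariance_from_correlation(stds, corr)
L = cholesky(sigma)  # succeeds, so sigma is a valid covariance matrix
```

If `cholesky` raises, the elicited correlations are mutually inconsistent and would need to be revisited before any sampling is attempted.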

Cohort generation
Once $\mu$ and $\Sigma$ have been elicited, generating an in silico profile is a relatively trivial task: one must sample a point in the 25-dimensional space, consistent with $N_d(x \mid \mu, \Sigma)$. But we can exploit the properties of the Gaussian distribution to produce a cohort consistent with some specific characteristics. Say, for instance, that our target population has a particular range of BL; we would then like to produce digital patients consistent with that specific profile. Formally, let $x_1$ represent BL and $x_{-1} = \{x_2, \ldots, x_d\}$ the rest of the features; we would like to sample from

$$p(x_{-1} \mid x_1),$$

i.e. the conditional distribution of the rest of the features, given that BL has a specific value. This is a standard procedure, which can be readily implemented. We can go even further and sort the list of features according to either their importance in determining the profile of a patient or the precision of their elicited mean, variance and covariance, and then sample from the conditional distributions, one at a time.
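The conditional sampling step can be sketched as follows. For a Gaussian, the distribution of the remaining features given $x_1 = a$ is again Gaussian, with mean $\mu_{-1} + \Sigma_{-1,1}(a - \mu_1)/\sigma_1^2$ and covariance $\Sigma_{-1,-1} - \Sigma_{-1,1}\Sigma_{1,-1}/\sigma_1^2$. This is a pure-Python illustration under assumed names and made-up three-feature numbers, not the R script used for the results.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma (sigma positive definite)."""
    d = len(sigma)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def condition_on_first(mu, sigma, value):
    """Mean and covariance of x[1:] given that x[0] equals `value`."""
    d = len(mu)
    s11 = sigma[0][0]
    shift = (value - mu[0]) / s11
    cond_mu = [mu[i] + sigma[i][0] * shift for i in range(1, d)]
    cond_sigma = [[sigma[i][j] - sigma[i][0] * sigma[0][j] / s11
                   for j in range(1, d)] for i in range(1, d)]
    return cond_mu, cond_sigma


def sample_cohort_given_first(mu, sigma, value, n, seed=0):
    """Draw n profiles whose first feature is fixed at `value`."""
    rng = random.Random(seed)
    cond_mu, cond_sigma = condition_on_first(mu, sigma, value)
    L = cholesky(cond_sigma)
    m = len(cond_mu)
    cohort = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]  # standard normal draws
        rest = [cond_mu[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                for i in range(m)]
        cohort.append([value] + rest)
    return cohort


# Illustrative three-feature example; the numbers are not elicited values.
mu = [5.0, 1.0, 10.0]
sigma = [[4.0, -0.3, 1.2],
         [-0.3, 0.25, -0.15],
         [1.2, -0.15, 2.25]]
cond_mu, cond_sigma = condition_on_first(mu, sigma, value=7.0)
cohort = sample_cohort_given_first(mu, sigma, value=7.0, n=2000)
```

Applying `condition_on_first` repeatedly, each time fixing the next feature in the chosen ordering, gives one possible realisation of the sequential scheme. To target a range of BL rather than a single value, one could first draw BL within that range and then condition on each draw. The sketch does not truncate the other features to their admissible ranges.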

Preliminary results
To test the approach presented in Sect. 3, we created an R script for the generation of digital patients. Table 2 reports 30 digital patients generated using this approach. All the patients were then simulated using UISS-TB.
The following figures show typical output of the UISS-TB simulation framework when applied to the sample set of digital patients listed in Table 2. For each biological entity, we show both the mean behavior (line color varies by entity) and the +/- SD band (blue lines). We ran a total of 30 simulations of untreated digital patients. Figure 1 depicts the dynamics of alveolar macrophages during two phases. The first is the active TB phase, in which both the necrotic and MTB-infected populations increase. After the active phase, a latent phase is established, and necrotic alveolar macrophages contribute to typical granuloma formation.

Conclusions and future work
Setting up an in silico trial requires that the computational model involved is able to coherently reproduce the disease dynamics in different individuals. Consequently, it is important to establish a rigorous strategy for defining a credible cohort of digital patients. To this end, we presented an approach for creating a set of digital patients whose features can be made consistent with those of the real population.
Preliminary results about the execution of UISS-TB on the cohort of digital patients show that the simulator is able to capture the dynamics of this pathology.
The next step will focus on the generation of reference digital populations to be used as part of the technical validation of UISS-TB. Once the data from the clinical trials become available, we will regenerate the digital cohorts and use a Bayesian statistical modelling approach to explore specific use cases, such as in silico-augmented clinical trials, where digital and physical patients are combined.
by the European Commission, under the contract H2020-SC1-2017-CNECT-2, No. 777123. The information and views set out in this article are those of the authors and do not necessarily reflect the official opinion of the European Commission. Neither the European Commission institutions and bodies nor any person acting on their behalf may be held responsible for the use which may be made of the information contained therein.

Figure 1. Mean behavior of total (red lines), dead (yellow lines), infected (green lines) and necrotic (dark green lines) alveolar macrophages over a simulation time of 2 years, computed over the 30 random digital patients in Table 2 in the absence of treatment. Blue lines represent mean +/- SD.

TABLE 1. VECTOR OF FEATURES AND RELATIVE RANGE USED TO IDENTIFY AND DEFINE DIGITAL PATIENTS. (D) STANDS FOR DISCRETE VARIABLE; (C) STANDS FOR CONTINUOUS VARIABLE.

TABLE 2. VECTOR OF FEATURES FOR 30 DIGITAL PATIENTS GENERATED ACCORDING TO THE APPROACH DESCRIBED IN SECTION 3. THE COLUMN FOR DS/MDR STATUS IS NOT REPORTED IN THE TABLE.