Skip to main content

A strategy to detect metabolic changes induced by exposure to chemicals from large sets of condition-specific metabolic models computed with enumeration techniques

Abstract

Background

The growing abundance of in vitro omics data, coupled with the necessity to reduce animal testing in the safety assessment of chemical compounds and even eliminate it in the evaluation of cosmetics, highlights the need for adequate computational methodologies. Data from omics technologies allow the exploration of a wide range of biological processes, therefore providing a better understanding of mechanisms of action (MoA) related to chemical exposure in biological systems. However, the analysis of these large datasets remains difficult due to the complexity of modulations spanning multiple biological processes.

Results

To address this, we propose a strategy to reduce information overload by computing, based on transcriptomics data, a comprehensive metabolic sub-network reflecting the metabolic impact of a chemical. The proposed strategy integrates transcriptomic data to a genome scale metabolic network through enumeration of condition-specific metabolic models hence translating transcriptomics data into reaction activity probabilities. Based on these results, a graph algorithm is applied to retrieve user readable sub-networks reflecting the possible metabolic MoA (mMoA) of chemicals. This strategy has been implemented as a three-step workflow. The first step consists in building cell condition-specific models reflecting the metabolic impact of each exposure condition while taking into account the diversity of possible optimal solutions with a partial enumeration algorithm. In a second step, we address the challenge of analyzing thousands of enumerated condition-specific networks by computing differentially activated reactions (DARs) between the two sets of enumerated possible condition-specific models. Finally, in the third step, DARs are grouped into clusters of functionally interconnected metabolic reactions, representing possible mMoA, using the distance-based clustering and subnetwork extraction method. The first part of the workflow was exemplified on eight molecules selected for their known human hepatotoxic outcomes associated with specific MoAs well described in the literature and for which we retrieved primary human hepatocytes transcriptomic data in Open TG-GATEs. Then, we further applied this strategy to more precisely model and visualize associated mMoA for two of these eight molecules (amiodarone and valproic acid). The approach proved to go beyond gene-based analysis by identifying mMoA when few genes are significantly differentially expressed (2 differentially expressed genes (DEGs) for amiodarone), bringing additional information from the network topology, or when very large number of genes were differentially expressed (5709 DEGs for valproic acid). In both cases, the results of our strategy well fitted evidence from the literature regarding known MoA. Beyond these confirmations, the workflow highlighted potential other unexplored mMoA.

Conclusion

The proposed strategy allows toxicology experts to decipher which part of cellular metabolism is expected to be affected by the exposition to a given chemical. The approach originality resides in the combination of different metabolic modelling approaches (constraint based and graph modelling). The application to two model molecules shows the strong potential of the approach for interpretation and visual mining of complex omics in vitro data. The presented strategy is freely available as a python module (https://pypi.org/project/manamodeller/) and jupyter notebooks (https://github.com/LouisonF/MANA).

Peer Review reports

Introduction

Toxicology is entering a new era with the urgent need to follow a 3R (Reduce, Replace and Refine) policy when assessing risks of chemical molecules. The European Cosmetics regulation (EC) No 1223/2009 banning animal testing for cosmetic ingredients is a striking example of the need to develop non-animal approaches, particularly for systemic toxicity. In that context, new approach methodologies (NAMs) from the combination of in silico and in vitro methods are required to be fit for purpose and protective of human health. These NAMs are now being developed to support the so-called next generation risk assessment (NGRA) [1]. NAMs are already evolving at a fast pace thanks to the ever-increasing amount of omics data generated from in vitro experiments, creating unprecedented capacity to study biological systems. Omics screening allows, for instance, a more holistic classification of compounds based on their global effects [2, 3] and it can be integrated to improve quantitative structure–activity relationships strategies [4]. The exploration of most biological processes is covered by transcriptomics or proteomics data. Among these processes, endogenous metabolism is becoming a source of concern since chemicals have the capacity to affect metabolism at a cellular and tissue level, potentially leading to adverse effects for humans such as diabetes, obesity, or even organ dysfunction [5,6,7,8,9]. Moreover, generating and analyzing several large-omics datasets can be expensive and methodologically challenging. Such datasets are usually of large dimensions (thousands of genes, proteins, and metabolites) that will contain information for broader insights on how an organism or a cell is globally impacted by a chemical. One of the current key challenges in the field is to extract knowledge from these rich but complex datasets.

The volumes of data generated by omics approaches and their complexity require advanced statistical and computational solutions to pinpoint patterns of interest among thousands of variables. Dimension reduction (PCA, MDS, t-SNE, etc.) methods are widely used to describe and visualize these large multidimensional datasets, but they are not designed for functional interpretation of omics data. From that perspective, enrichment-based methods, which allow the identification of biological modules (gene sets, metabolic pathways and cellular functions) in which identified modulated variables are over-represented, have the advantage of providing an overview of the studied biological modulations [10]. Nevertheless, these methods rely on arbitrary definitions of the pathways and functional sets, which might differ depending on the selected database, and tend to hide functional processes spanning several pathways [11, 12]. Many studies [13,14,15,16] aim to take advantage of published gene expression data available in databases such as DrugMatrix [17], Connectivity Map [18, 19], ToxicoDB [20] and Open TG-GATEs [21] (https://toxico.nibiohn.go.jp) to improve chemical toxicity assessment. For instance, Heusinkveld et al. [22] implemented an approach based on the comparison of Open TG-GATEs top 50 DEG signatures ranked according to their t-statistic. They aimed at providing both a score to compare compounds and a mechanistic understanding based on enrichment-based methods. On the other hand, Ting Li et al. [2] trained a deep neural network model for drug-induced liver injury prediction based on the LINCS L1000 dataset [19]. In both cases, metabolism is not the primary focus of the study, and the functional interpretation relies on enrichment-based methods, with the limitations explained above.

Therefore, to improve our understanding of how chemicals impact endogenous metabolism and can trigger potential adverse effects, it is necessary to develop new computational methods that would allow provision of functional metabolic information from omics data including the diversity of possible mMoAs leading to adverse outcomes. Genome-scale metabolic networks (GSMNs) are biological networks representing all the possible biochemical reactions occurring in an organism. They are therefore well suited to consider the diversity of endogenous metabolic disruptions potentially associated with adverse outcomes at a cellular or at an organ level. These networks are reconstructed from an annotated genome, curated, and refined thanks to biochemical knowledge retrieved from the literature [23, 24]. Several GSMNs representing human metabolism such as Recon2 [25], Recon2.2 [26], Recon3D [27] and Human-GEM [28] have been published. GSMNs are composed of thousands of reactions and metabolites interconnected according to the stoichiometric matrix of the network. For instance, Recon2.2, which is one of the most used human GSMNs, is composed of 7785 reactions, 5324 metabolites, and 1675 associated genes distributed over 10 cellular compartments. The relationships between reactions and metabolites in the metabolic network are modelled by the stoichiometric matrix, which is the mathematical representation containing the proportions of substrates and products involved in each reaction. The stoichiometric matrix is the ground for constraint-based modeling approaches, which are largely used to model metabolic networks at the scale of cells, tissues, or organisms. Genes and reactions are linked by gene-protein-reaction (GPR) associations, which are formulated as Boolean rules, including all genes coding for either the isoenzymes (OR association) or the enzymatic complex subunits (AND association) catalyzing a reaction. Since these networks include all the reactions that can potentially be expressed in any tissue or cell type, and any condition for a given organism, they represent the global metabolism of this organism. It is then crucial to tailor these models to specifically represent the metabolism of one tissue, one cell, or one condition in order to accurately decipher the mMoA of chemical compounds.

Preliminary to any interpretation, it is essential to perform a condition-specific metabolic network reconstruction to avoid interpreting metabolic functions that would involve reactions that are not active in the biological condition or cell type under study. Condition-specific modeling approaches aim at exploiting experimental data to assemble a GSMN that more closely represents the condition under study. Many algorithms such as iMAT [29, 30] or FASTCORE [31] have been developed to build condition-specific metabolic networks, most of which rely on transcriptomic data. Unlike other algorithms, these 2 algorithms do not require optimizing a physiological objective function such as biomass production maximization: this is especially more relevant for studying eukaryotic differentiated cells which are usually not growing anymore and might be oriented toward performing several other objective metabolic tasks.

FASTCORE aims to find a minimal flux consistent metabolic network that contains a list of “core” reactions and as few reactions from the global GSMN as possible. These “core” reactions correspond to all the reactions that are strongly assumed to be active in the studied condition and are defined by the user with the method of its choice, making FASTCORE a generic algorithm for reconstructing condition-specific metabolic networks [31]. The iMAT algorithm aims to find the best consensus between the reaction activity inferred from categorized gene expression data and the activity inferred from the GSMN structure, which defines the biochemical interdependencies between reactions through consumed and produced metabolites. The output is a condition-specific subnetwork that more faithfully represents the metabolic state of the cell in the condition of the transcriptomic experiment. Selecting the best algorithm among all these possibilities is not trivial as several benchmarking studies stated that none of the benchmarked methods outperforms the others for all cases [32] and that selection of the method should be mainly guided by the aim of the study and the available data [33].

One issue that is overlooked by most of these algorithms and that we tackled in our strategy is the alternative optimal solutions [34] issue. Indeed, due to the high complexity of GSMNs and the relatively low amount of biological data, the mathematical problem solved by algorithms such as iMAT and FASTCORE to find a condition-specific metabolic network is under-constrained. Practically, it means that many equally optimal condition-specific metabolic networks exist for a single biological condition and that arbitrarily taking one optimal condition-specific network would limit the reproducibility and bias further analyses made for that condition [34].

To avoid this problem, partial enumeration methods have been developed [34,35,36,37]. These methods explore the solution space (i.e., the range of possible optimal condition-specific metabolic networks for this condition) to find a representative set of possible condition-specific metabolic networks. Rodriguez et al. formally described the enumeration problem by designing a network model with a finite and calculable set of alternative solutions which was used as a ground truth for evaluating partial enumeration performance. They highlighted that considering thousands of condition-specific metabolic networks is more robust and less error-prone than considering only one network in the solution space [34]. Nevertheless, it makes the interpretation more difficult since a condition will be associated with thousands of potential network configurations. There is therefore a need to define a new strategy to extract mechanistic information from these numerous condition-specific metabolic networks to allow the final identification of the mMoA.

In this study, we propose a strategy designed to reduce the increased analysis complexity resulting from constraint-based modelling coupled with partial enumeration by calculating DARs and exploiting them with graph analysis methods with the final aim to predict and visualize chemicals’ mMoA. To develop our strategy, we used gene expression data from the Open TG-GATEs database and we built condition-specific metabolic networks with a partial enumeration method adapted from the DEXOM [34] approach. Open TG-GATEs has been created from two projects spanning several years: Toxicogenomics Project One [38] (TGP1: 2004–2007) and Toxicogenomics Project Two [39] (TGP2: 2010–2011). Open TG-GATEs contains gene expression data generated on Crl:CD Sprague–Dawley rats and on primary human hepatocytes (PHH) and primary rat hepatocytes, after exposure to low cytotoxic doses of 150 pharmaceutical molecules. We selected gene expression data generated on PHH exposed for 24 h to the maximum dose, which corresponds to a dose inducing less than 20% of cytotoxicity. In order to evaluate the metabolic impact of a given chemical at a specific dose, we developed a statistical approach consisting of identifying DARs between two distinct sets of condition-specific metabolic networks representing respectively the exposure condition and the control (i.e., without exposure) condition. We took advantage of the GSMN structure to compute the metabolic distance between DARs in order to identify clusters of functionally interdependent metabolic reactions. We then extracted a minimal subnetwork for each cluster to better understand how these small biological processes are constructed and how reactions are interconnected. Finally, we used MetExploreViz [40] to interactively visualize the subnetworks and overlay additional annotation on the network such as cellular compartments, metabolic pathways, and custom mappings. This approach enables the analysis of the metabolic impact of a chemical at a global level but also at a very precise level such as the biochemical reaction scale. The workflow has been applied to eight molecules (ethanol, valproic acid, indomethacin, amiodarone, allopurinol, rifampicin, sulindac, and tetracycline) selected for their known hepatotoxic effects and the large amount of scientific literature describing their MoA. We further highlight capabilities of our strategy by predicting and analyzing the metabolic impact of two known hepatotoxic compounds: amiodarone and valproic acid.

Results

We set up a strategy to better understand the mMoA of chemicals through transcriptomic data integration with condition-specific modelling enhanced by partial enumeration: this strategy combines constraint-based modelling methods and graph techniques in a three-step workflow.

The first step (Fig. 1, box A) consists of building condition-specific metabolic networks. Indeed, GSMNs encompass all possible metabolic reactions regardless of the tissue, cell, or condition. Performing computational analysis on this generic model may raise inaccurate results (e.g., highlighting a pathway which is known to be inactive for a cell type). Condition-specific metabolic networks are composed of reactions predicted as active by a modeling algorithm for a given biological condition based on the generic GSMN and a set of gene expression data. In this study, we used Recon2.2 [26] as the initial GSMN to build reconstructions for several biological conditions (i.e., cells exposed to a chemical) and transcriptomic data obtained from the Open TG-GATEs database (described in “Methods” in the section “Transcriptomic data processing”). For each studied condition, a set of condition-specific metabolic networks optimally matching gene expression data, network topology and the stoichiometry of reactions is computed with an adapted version of DEXOM [34]. Our major adaptation of the DEXOM method consisted in combining two of the proposed algorithms (reaction-enum followed by diversity-enum) while reducing the number of enumerated solutions by randomly selecting 1% of the reaction-enum solutions as starting solutions for diversity-enum, rather than using all of them (details are provided in the “Methods” section “).We assessed the impact of this random selection on our final results by performing five different runs of our approach for the amiodarone control and treated conditions (S1 Appendix). These robustness tests showed that the diversity of enumerated solutions was not significantly impacted by this random and partial selection. We therefore consider that this adaptation is relevant to allow a drastic reduction in computing time while providing robust and consistent results.

Fig. 1
figure 1

General overview of the three-step strategy. In the first step (A), transcriptomics data are integrated to a GSMN with a partial enumeration approach adapted from DEXOM. In the second step (B), DARs are computed from the large number of metabolic networks obtained after the optimization and sampling step. Finally, in the third step (C), network analysis methods are developed and employed to interpret DARs and improve our understanding of the chemicals’ mMoA

The output is a dataset that gathers, for each reaction, the number of times it has been predicted as active by the algorithm across all optimal enumerated solutions. This result highlights the likelihood of each reaction of being active for a specific condition. Note that this step allows the prediction of activity for reactions, even if they are not associated with any gene (e.g., passive transport reactions), by inferring their activity from the activity of surrounding reactions.

In the second step (Fig. 1, box B) of the workflow, the objective is to identify reactions which activity changes in a significant manner between control and treated conditions. We therefore apply a statistical test between sets of reactions’ predicted activity in order to identify perturbed reactions between the two conditions (described in “Methods” in the section “Identification of differentially activated reactions”). Since some reactions cannot be correctly constrained by transcriptomic data (e.g., associated with less specific genes or no gene at all), they are more likely to be indiscriminately predicted as active or inactive while maintaining the same optimal fit with the data. The predicted activity of these reactions may therefore be more variable even across optimal subnetworks representing the same condition. We implemented a “baseline noise” calculation and filtration approach (described in “Methods” in the section “Baseline noise calculation”) to identify these reactions. Perturbed reactions passing this final filter are then considered as DARs. The list of DARs by itself is a very insightful result since it allows the listing of reactions whose activity is significantly affected by the studied condition (exposed vs. control cells).

The last step (Fig. 1, box C) of the workflow consists of deciphering where and how cell metabolism is perturbed by chemical exposure. DARs are not independent and may act in a coordinated manner through sequences of reactions, hence highlighting the potential modulated cascade of enzymatic reactions. To do so, a network-based approach has been developed to detect, visualize, and analyze the functional role of previously identified DARs. The method aims at stratifying the list of DARs into clusters of close reactions in the metabolic network (detailed in “Methods” in the section “DARs clustering”). Two reactions are considered to be close when they are connected to each other with only a few intermediary reactions. Once these clusters were identified, close reactions were used to extract small human-readable subnetworks (detailed in “Methods” in the section “Subnetwork extraction”) that would describe parts of the metabolic network that are specifically modulated in the studied condition, suggesting potential mMoA of the molecules.

Identification of differentially activated reactions (DAR) associated with exposure to eight chemicals

The strategy has been applied to eight molecules present in the Open TG-Gates database (ethanol, valproic acid, indomethacin, amiodarone, allopurinol, rifampicin, sulindac, and tetracycline) selected for their known liver toxicity and the associated MoAs reported in the literature. We used gene expression data generated in PHH exposed for 24 h at the highest concentration of each of these eight molecules along with their associated controls. Condition-specific metabolic networks were reconstructed (Fig. 1, box A) from gene expression data for each retrieved sample (two replicates per condition). On average, 10,000 alternative optimal solutions were enumerated for each sample. Each optimal solution is composed of a unique set of active and inactive reactions. In Table 1, the minimal, maximal, and average number of active reactions for the solutions calculated by DEXOM for each chemical and two vehicle controls (i.e., medium and DMSO) are reported. It is worth noting that the size (i.e., the number of active reactions) of DEXOM optimal solutions is in the same order of magnitude for all conditions (treatment or controls, see Table 1) with a minimal number of active reactions ranging from 3423 (DMSO condition) to 3639 (tetracycline 25 µM, 24 h), a maximal number of active reactions ranging from 4396 (valproic acid 5000 µM, 24 h) to 4645 (indomethacin 200 µM, 24 h), and a mean number of active reactions ranging from 4070 (valproic acid 5000 µM, 24 h) to 4570 (amiodarone 7 µM, 24 h).

Table 1 Number of predicted active reactions for each chemical and vehicle set of condition-specific metabolic networks. Sets of condition-specific metabolic networks were calculated by the adapted DEXOM approach implemented in our workflow (Fig. 1, box A). All solutions obtained for samples of the same molecule have been merged before calculating the minimal, maximal, and mean number of active reactions per condition

DARs were computed to identify metabolic reactions modulated following exposure to each selected molecule (Fig. 1, box B) compared to the corresponding control condition, after filtering out reactions that show a large variability in the baseline condition. Valproic acid, indomethacin, amiodarone and allopurinol perturbed reaction lists were only slightly affected by the noise filtration procedure with 12.6, 5.3, 6.7 and 11.4%, respectively, of reactions filtered out (Table 2). Conversely, for tetracycline and ethanol, 43.4 and 62.8% of perturbed reactions were respectively filtered out (i.e., reactions being modulated by both molecules and control conditions (S1 Table)). The number of predicted DARs ranged from 35 for PHH exposed to ethanol (10 mM, 24 h) to a maximum of 417 for PHH exposed to valproic acid (5000 µM, 24 h) (Table 2). For each condition, we also computed the DARs specificity ratio as the number of DARs retrieved solely in this condition and not in any other studied condition (i.e., DARs specific to the condition) divided by the total number of DARs retrieved for this condition (S2 Table). Calculated DARs specificity ratios ranged from 22.2% for indomethacin to 91.1% for valproic acid, indicating that the predicted mMoA for indomethacin was at least similar to one other condition in the study, whereas in contrast, the predicted mMoA for valproic acid was quite different from any other condition in the study.

Table 2 Number of perturbed reactions and DARs calculated for each condition. DARs were identified by the developed workflow for PHH after 24 h exposure to eight molecules known for their hepatotoxicity at dose levels yielding an 80–90% survival ratio

Global analysis of DARs within the context of the metabolic network

We performed a pathway over-representation analysis with DARs on Recon2.2 pathways (S1 Fig and S2 Fig) for the eight molecules. For amiodarone, predicted DARs were significantly enriched for four metabolic pathways: “Fatty acid synthesis”, “Pyrimidine synthesis”, “Aminosugar metabolism” and “Purine synthesis”. Similarly, DARs predicted for valproic acid were significantly enriched for 12 metabolic pathways: “Pyrimidine synthesis”, “Aminosugar metabolism”, “Sphingolipid metabolism”, “NAD metabolism”, “Thiamine metabolism”, “Glycolysis/gluconeogenesis”, “Fructose and mannose metabolism”, “Lysine metabolism”, “Tryptophan metabolism”, “Urea cycle”, “Limonene and pinene degradation”, and “Hyaluronan metabolism”. Interestingly, pathway enrichment results are very different when artificial reactions (S3 Table) (i.e. sink, pool and extracellular exchange reactions) are kept in the list of DARs, suggesting that the list of DARs should be carefully processed before performing pathway enrichment (see S1 Appendix). We observed that the fatty acid synthesis pathway was evidenced as significantly enriched from DARs for four molecules and compared how these four molecules were impacting this pathway (Fig. 2) by visually comparing the metabolic footprint of each chemical on the fatty acid synthesis pathway. Amiodarone and allopurinol affected the same reactions (Fig. 2A and C), which suggested that they share similar mMoAs regarding fatty acid synthesis. On the other hand, rifampicin and sulindac seemed to impact another part of the fatty acid synthesis pathway (Fig. 2B and D). These observations showed that, although pathway enrichment can provide information on which general pathways are impacted by a chemical, it is not sufficient to evidence more precise modulated functions or to differentiate between potential different mechanisms of action. Even if these four compounds are identified as being able to induce liver steatosis, underlying MoAs might differ [41].

Fig. 2
figure 2

Comparison of the metabolic impact of four molecules with fatty-acid synthesis pathway significantly enriched. Each metabolic graph represents which fatty acid pathway reactions are differentially activated (in blue) after in vitro exposure to amiodarone (A), sulindac (B), allopurinol (C), and rifampicin (D). Nodes represented by a square are metabolic reactions and nodes represented by a circle are metabolites. DARs links have been highlighted in blue to identify which part of the pathway is perturbed by the molecule

Two of the eight molecules were selected for further analysis: amiodarone (7 µM, 24 h) and valproic acid (5000 µM, 24 h). These two molecules were selected because (1) They are well described hepatotoxic compounds with several MoAs known to induce liver damage such as steatosis [42, 43], and (2) Their level of transcriptomics-based information differs, with 5709 and two differentially expressed genes (DEGs) for PHH exposed to valproic acid and amiodarone, respectively (described in “Methods” in the section “DEG identification”). Despite the low number of DEGs observed for amiodarone, 56 DARs were identified in PHH after 24 h exposure to 7 µM amiodarone using the developed method.

Figure 3 shows the visualization produced by MetExploreViz for DARs identified for amiodarone and valproic acid in the context of human GSMNs using Recon 2.2. For amiodarone (Fig. 3A), a main group is composed of well interconnected up-activated reactions, while few other groups of up- or down-activated DARs are disconnected with no shared substrates or products. For valproic acid (Fig. 3B), many small groups of DARs are disconnected from each other without any particular area of the metabolic network specifically impacted. Hence, this general observation indicates that the mMoAs might differ between these two molecules, and further in-depth analysis is required.

Fig. 3
figure 3

Visualization of DARs identified for amiodarone and valproic acid within the Recon2.2 metabolic network. This visualization was performed with MetExploreViz while removing side compounds (S4 Table). DARs identified for amiodarone (7 µM, 24 h) are highlighted in panel A and DARs identified for valproic acid (5000 µM, 24 h) are highlighted in panel B, with more frequently active reactions in the exposed versus control condition colored in red and less frequently active reactions in green. Nodes represent reactions and metabolites and are connected if a metabolite is a product or a substrate of a reaction. The two figures are based on the same network layout; thus, each reaction and metabolite is located at the same coordinates, allowing visual comparison. Interactive visualizations can be accessed through the following links: https://metexplore.toulouse.inrae.fr/userFiles/metExploreViz/index.html?dir=/5b6c886c4916c1de9e6c16a776cc6d64/networkSaved_1726726315 and https://metexplore.toulouse.inrae.fr/userFiles/metExploreViz/index.html?dir=/5b6c886c4916c1de9e6c16a776cc6d64/networkSaved_1522683843. The same visualization, with DARs colored according to their cluster’s group rather than their activity status, is also provided as supplementary material (S4 Fig)

Identifying subnetworks of DARs with graph-distance clustering and subnetwork extraction

In order to capture the mMoA of a molecule without having to rely on subjectively defined pathways [11, 12], it is assumed that the closeness within the network can be used as a measure of the metabolic proximity between reactions. Indeed, reactions are linked through compounds being produced and consumed by others. The shorter the chain between two reactions, the stronger the expected interdependency. Therefore, the metabolic distance, which is the length of the chain or path between two reactions in the metabolic network, is used to estimate interdependency. Hence, we propose grouping reactions that are closely located as a proxy to identify reactions involved in the same metabolic function.

The pairwise distance matrix between all reactions identified as differentially active for each of the two molecules, amiodarone and valproic acid, was computed based on the list of DARs. As described in the “Methods” section, the computation of pairwise distances was performed on the metabolic reaction undirected graph of Recon2.2, using the shortest path distance metric. Then, subsets of DARs were identified by clustering the pairwise distance matrix with a hierarchical clustering approach. Two clusters were selected for PHH exposed to 7 µM amiodarone for 24 h (Fig. 4A) and three clusters for PHH exposed to 5000 µM valproic acid for 24 h (Fig. 4B).

Fig. 4
figure 4

Biclustered heatmap of the pairwise reaction distance matrix for amiodarone and valproic acid. Hierarchical biclustering on the pairwise reaction distance matrix for amiodarone and valproic acid was performed with the Ward algorithm. Reactions corresponding to “pool” reactions and extracellular transports, as well as isolated reactions, were excluded from the distance matrix before performing the clustering. The biclustering was visualized as a heatmap computed with the Pheatmap R package. The color scale depicts the distance between two reactions. The distance ranges between zero (cells colored in blue) to eight for amiodarone (A) and 14 for valproic acid (B) (cells colored in red). Two main clusters (C1 and C2) were identified for amiodarone (A) and three (C1, C2, and C3) for valproic acid (B)

To go further in the interpretation and to better understand how these reactions interact and are implicated in the mMoA of our chemicals of interest, a subnetwork extraction step was implemented and applied to each cluster of DARs. Extracting a subnetwork allows one to visualize how the DARs are interconnected and fill gaps between disconnected DARs by adding the necessary intermediate reactions. The subnetwork extraction was performed using the Met4J library (https://forgemia.inra.fr/metexplore/met4j) implementation of the minimal Steiner tree on a pruned reaction graph of Recon2.2. The minimal Steiner tree extraction algorithm enables the extraction of a subnetwork connecting two lists of nodes (i.e., DARs) while minimizing the size of the final subnetwork. Therefore, this algorithm fits well with our objective of visualizing interactions between DARs while keeping it human-readable. To guide our analysis, the “DARs subnetwork coverage” metric was defined, which is calculated as the number of DARs in a subnetwork divided by the total number of reactions in this subnetwork. This metric provides an estimate of how rich in DARs a subnetwork is. A high DARs subnetwork coverage score indicates closely located and tightly interacting DARs.

For PHH exposed to 5000 µM of valproic acid for 24 h, a subnetwork with a 77% DARs subnetwork coverage was extracted (Cluster C2 in Fig. 4B, see S5 Table for details), meaning that most reactions within the extracted subnetwork were DARs that were closely interconnected. Among the DARs of this subnetwork, 21 were up-activated while only four were down-activated (Fig. 5A). Five DARs predicted as up-activated were associated with -oses metabolism such as fructose/mannose metabolism and glycolysis/gluconeogenesis pathway, some of which are involved in the phosphorylation of hexoses (Fig. 5B). Another group of up-activated DARs is associated with lysine metabolism in mitochondria (html links for visualization in S1 Appendix) and especially the degradation of lysine through production of L-saccharopinate from 2-oxoglutarate. One up-activated reaction is associated with transport of L-asparagine into the mitochondria, which is then used by the L-asparaginase to produce L-aspartate. The resulting L-aspartate is finally transported by an aspartate-glutamate shuttle that was not differentially modulated. The phosphatidate cytidylyltransferase associated with the glycerophospholipid metabolism was also predicted to be up-activated in PHH after exposure to 5000 µM valproic acid for 24 h. Finally, eight differentially up-activated reactions were associated with tryptophan metabolism and involved in reactions associated with kynurenate, L-glutamate and 2-oxoglutarate. Down-activated reactions were more sparsely located in the extracted subnetwork and were associated with hyaluronan metabolism, propanoate metabolism, where acetoacetate and coenzyme A are conjugated to produce acetoacetyl-CoA, and finally with pyrimidine synthesis, where glutamate and aspartate production/consumption were disturbed. The subnetwork extraction algorithm could add reactions that were not identified as DARs but were necessary for the connectivity of the subnetwork. These added reactions were associated with pathways also associated with DARs such as mitochondrial transport, glycolysis/gluconeogenesis, pyrimidine synthesis and lysine metabolism (Fig. 5B).

Fig. 5
figure 5

Metabolic visualization of the metabolic subnetwork extracted from one of the DARs clusters predicted for valproic acid. DARs were predicted by performing condition-specific reconstructions for PHH exposed to 5000 µM valproic acid for 24 h. The visualized subnetwork was computed from DARs in cluster #2, which is the cluster with the highest DARs subnetwork coverage among the clusters identified in the distance matrix (see Fig. 4B) for this condition. Nodes represented by a square are metabolic reactions and nodes represented by a circle are metabolites. A and B represent the same subnetwork with the same topology. Links in A are highlighted according to the direction of the perturbation (e.g., if the reaction is more frequently active in the exposed vs. control condition) and links in B are colored according to the metabolic pathway of the associated reaction. Interactive visualizations can be accessed through the following links: https://metexplore.toulouse.inrae.fr/userFiles/metExploreViz/index.html?dir=/5b6c886c4916c1de9e6c16a776cc6d64/networkSaved_292937465 and https://metexplore.toulouse.inrae.fr/userFiles/metExploreViz/index.html?dir=/5b6c886c4916c1de9e6c16a776cc6d64/networkSaved_1994092833

For PHH exposed to 7 µM of amiodarone for 24 h, a subnetwork with a DARs coverage of 95% (S5 Table) was extracted, indicating that nearly all the reactions that make up this subnetwork are DARs. 36 DARs are up-activated in the treated condition and only one DAR is down-activated in the treated condition (Fig. 6A). Most of the up-activated reactions (i.e., 31) of this subnetwork are associated with the fatty acid synthesis, four reactions are linked to the fatty acid oxidation pathway and one reaction is associated with aminosugar metabolism. Regarding the fatty acid synthesis pathway, many reactions associated with conjugation/deconjugation of acyl carrier protein (ACP) to fatty acids were perturbed. Fatty-acyl-CoA synthase reactions were also perturbed with many reactions involved in the addition of malonyl-CoA to fatty acids as being differentially up-activated. Some up-activated reactions associated with the fatty-acid oxidation pathway were also disturbed, such as the acetyl-CoA-ACP transacylase and the malonyl-CoA-ACP transacylase as well as two fatty-acyl-ACP hydrolases. Finally, the only down-activated reaction of this subnetwork is a reaction associated with the aminosugar metabolism involved in N-acetylglucosamine-6-phosphate’s production from acetyl-CoA and D-glucosamine-6-phosphate. In the subnetwork presented in Fig. 6, only two reactions associated with fatty-acyl-CoA elongation from the fatty-acid synthesis pathway were added by the subnetwork extraction algorithm.

Fig. 6
figure 6

Metabolic visualization of the subnetwork extracted from one the DARs cluster predicted for amiodarone. DARs were predicted by performing condition-specific reconstructions for PHH exposed to 7 µM amiodarone for 24 h. The visualized subnetwork was computed with DARs from cluster #2, which is the cluster with the highest DARs subnetwork coverage among the clusters identified in the distance matrix (Fig. 4A) for this condition. Nodes represented by a square are metabolic reactions and nodes represented by a circle are metabolites. A and B represent the same subnetwork with the same topology. Links in A are highlighted according to the direction of the perturbation (e.g., if the reaction is more frequently active in the exposed condition vs. control condition) and links in B are colored according to the metabolic pathway of the associated reaction. Interactive visualizations can be accessed through these links: https://metexplore.toulouse.inrae.fr/userFiles/metExploreViz/index.html?dir=/5b6c886c4916c1de9e6c16a776cc6d64/networkSaved_373423088 and https://metexplore.toulouse.inrae.fr/userFiles/metExploreViz/index.html?dir=/5b6c886c4916c1de9e6c16a776cc6d64/networkSaved_725935955

Discussion

Choosing a modelling algorithm able to explore alternative solutions

The proposed strategy has been applied to Recon2.2 for demonstration purposes. It can also be used with other Human GSMN like HumanGEM, Recon3D or any another GSMN following SBML standard. In fact, the only requirement is to be able to map transcriptomics data using adequate identifiers for gene expression integration. Choosing which GSMN to use among the variety of published GSMNs can be a challenging task due to the difficulty of comparing these metabolic networks [44, 45]. To our knowledge, no standard procedure exists to compare in an objective manner GSMNs [46]. According to [44, 45], the main challenge to compare GSMNs is the lack of a standardized procedure, the lack of published differences between GSMNs, and the lack of common identifiers. Therefore, each author should define their own criteria for selecting the best GSMN to answer the question raised in their study.

Integration of gene expression data into GSMNs allows additional information to be uncovered that could not be retrieved from transcriptomics data, such as the activity of reactions not associated with the expression of specific genes (passive transports, spontaneous reactions, etc.) and cellular localization. To be predicted as active, a reaction must have its source metabolite(s) being produced by another reaction in the metabolic network, and its product(s) being consumed by another reaction. Therefore, the objective of the modeling algorithm is to find the best consensus between reactions predicted as active according to gene expression data and these production/consumption constraints from the metabolic network.

As mentioned previously, the amount of information available in gene expression data for the modeling is not sufficient to allow the algorithm to find a unique condition-specific network. This limit has already been described by several authors [34,35,36, 47] and Rodriguez et al. demonstrated that the analysis of a single solution as it is usually done with standard implementations of widely used constraint-based algorithms (i.e. iMAT, FASTCORE, GIMME, …) return less robust results [34]. To circumvent this issue, partial enumeration approaches have been developed [34, 35, 37, 47] to obtain a set of alternative solutions that is representative of the diversity of solutions existing in the solution space. Such methods provide many similarly adequate optimal subnetworks based on the gene expression data. Rossell et al. proposed EXAMO to reconstruct a minimal condition-specific network from the alternative optimal solution space that prioritizes reactions found to be active in all alternative solutions. This strategy tends, however, to emphasize central metabolism although it might not necessarily be the part of metabolism most affected by chemical compound exposure. Poupin et al. identified required, inactive, and potentially active reactions, based on how frequently the reactions were found to be active in the alternative optimal solution space. In this method, the group of reactions defined as potentially active is difficult to interpret and ended up being merged with required reactions to perform the pathway enrichments analyses in their study. MOOMIN [37] also provides an alternative approach able to explore the solution space, by predicting whether reactions are likely to be up or down activated based on differential expression data.

In our strategy we chose to use the DEXOM method for its ability to explore the solution space and to return a set of alternative solutions that were shown to represent the diversity of solutions that can be found in the solution space [34]. DEXOM also has the advantage of outputting a qualitative prediction of reactions activity, not relying on computed metabolic flux values. Indeed, several studies [48,49,50] reported a low correlation between gene expression intensities and metabolic fluxes values, whereas other studies [32, 51] showed that gene expression data can be used as a good proxy for predicting metabolic fluxes. Given the lack of consensus and formal proof on a direct and quantitative relation between gene expression data and metabolic fluxes, the strategy adopted in DEXOM consists in binarizing flux values and considering a reaction as active if it is predicted to carry a non-zero flux.

As discussed above, exploring the alternative solution space is important to obtain more robust results. A complementary approach not explored in this study is to further constrain the space of solutions by combining gene expression data with medium metabolomics data (exometabolomics) that would add further constraints on metabolic fluxes, thereby reducing the uncertainty related to the number of possible alternative solutions and obtain more accurate metabolic fluxes predictions.

It should be noted, that as many methods that use gene expression data for building context-specific models, our strategy is based on the binarization of the gene expression data. As a first pre-processing step, genes are classified using percentile threshold on the z-score issued from barcode package. The choice of the threshold should be carefully considered with regard to the biological question, as it has been shown that it can impact the generated models [33]: a more stringent threshold will allow a larger diversity in the optimal retrieved subnetworks but will conversely increase the number of false possible alternatives [34]. In our strategy, we chose to use a threshold-based method for the classification of gene expression levels because of its simplicity and widespreadness, but other methods could be used instead. Indeed, changing the method would change the set of highly/not expressed genes, but does not change the purpose of our strategy. Also, we applied the threshold on z-score values calculated with BARCODE rather than directly on the gene expression values: this limits the issue related to probe effects, which usually prevents from directly comparing intensities between genes on a single array.

Identification of DARs from the partial enumeration of condition-specific reconstructions

Using DEXOM to explore the alternative solution space of each modelled condition is crucial to increase the robustness of the results. However, the subsequent results analysis is a challenging task because, for each condition, thousands of different condition-specific metabolic networks need to be analyzed. To solve this challenge, we introduce the DAR calculation approach, which aims to compare two sets of condition-specific metabolic networks and identify reactions that are significantly perturbed in one condition compared to the other one (e.g., treated vs. control). For that purpose, we chose to compare the activation frequency of each reaction between the two conditions using the R2 metrics. The identification of DAR based on the results of this measure, although relying on the choice of the threshold, avoids the biases that would be introduced if we were using classical tests for comparing the proportions between two populations, such as the sensitivity to extreme values, as with the odd ratio metric, or the sensitivity to the sample size as with the Fisher test for instance. However, some metabolic reactions are more sensitive to the under-constrained problem and will be predicted as active or inactive independently of experimental data specificity. To limit this effect, we added a filtration step called “baseline noise filter” that takes into account the uncertainty of predicting the activation status associated with some reactions. The baseline noise filter had a different impact on perturbed reactions lists depending on the condition. Most of the perturbed reactions filtered out for ethanol and tetracycline, which are the conditions the most impacted by the noise filtration, are transport and peroxisomal β-oxidation reactions. Indeed, these reactions are more prone to noisy predictions either because they are not associated with any genes (e.g., exchange reactions) or they are associated with complex GPR rules that tend to result in unconstrained reactions. The baseline noise filter is a rather conservative approach, but it ensures that the predicted perturbation is not due to a lack of constraints on the model. Indeed, one key point in the development of condition-specific reconstructions was to get enough biological constraints. Both the DAR computation and the baseline noise filtration steps implemented in our workflow can be adapted to other partial enumeration approaches as they only require activation frequencies for each metabolic reaction in two distinct conditions.

Comparison and complementarity with DEGs and pathway enrichment

To evaluate the added value of our workflow, in comparison with directly using gene expression data through approaches such as pathway over-representation on DEG, we performed a pathway ORA on DEGs for the valproic acid condition (S3 Fig). The first limit of this DEG-based approach is that it becomes inapplicable or unreliable when the list of DEGs is short, which is often the case for cells exposed to a low concentration of chemicals triggering subtle metabolic impacts. It is indeed the case in our study for the 24 h treatment of PHH with 7 µM amiodarone, where only two DEGs were evidenced. On the contrary, we showed that using our strategy, it is possible to analyze the metabolic impact of a chemical starting from transcriptomic data even when the DEGs list is short. Another limit of using DEG-based approaches to explore metabolic alterations is that it provides information on changes that are often not directly linked to metabolism but involve more generic cellular processes. For instance, when performing pathway ORA on DEGs with Reactome [52] (i.e., including all genes, not just metabolic genes) for PHH exposed to 5000 µM valproic acid for 24 h, we observed that many of the top 50 enriched terms are related to cell cycle, gene expression regulation, and cell signaling (S3 Fig). Although this is valuable information about the MoA of valproic acid, it is not directly associated with its metabolic impact in liver cells, suggesting that combining both gene expression data analysis and condition-specific reconstruction is beneficial. The purpose of integrating transcriptomics data in the metabolic network is to bring some additional information to the one that can be drawn directly from gene differential expression. This is quite clear for the Amiodarone example, for which no metabolic DEG can be evidenced although we could identify more than 50 DARs which were coherent with the literature. Regarding the valproic acid example, we noticed that conversely 87% of the DARs (347/417) were associated to at least 1 DEG, showing that DARs and DEG are consistent even if the method does not directly used differential expression information, but in addition, 70 DARs were inferred thanks to the topology of the network. More broadly, DEG and DARs are not designed for the same purposes. While DEG focus on differential expression between 2 conditions, allowing to evidence slight changes in the expression of genes, DARs are designed to detect strong switches in reaction activities between two conditions. This approach would therefore put more emphasis on reactions and/or pathways that are globally activated or inactivated under exposure than slightly modulated. The two approaches provide a complementary vision of the metabolism. One of the future challenges will be to integrate both approaches so we are able to quantify the flux modulations through DARs.

Metabolic distance-based clustering and subnetwork extraction for mMoA interpretation

To go beyond pathway enrichment-based analysis, we propose a clustering approach based on the metabolic distance in the network, which ensures the identification of groups of functionally interdependent modulated reactions. These groups can involve reactions belonging to different pathways, therefore enabling the study of the mMoA as a continuous phenomenon instead of a binary (i.e., enriched/not enriched pathway) phenomenon. However, hierarchical clustering suffers from some limitations. Indeed, hierarchical clustering is not designed to capture communities (i.e., groups of densely interconnected elements in a graph) and tends to consider reactions implicated at the ends of linear cascade as distant, therefore not biochemically related, even if according to the network topology these reactions are biochemically related and highly dependent. Community detection approaches could be an answer to some of these limitations but their performance may be affected by the particular topological properties of large networks [53] such as GSMNs, therefore requiring a thorough evaluation to find the most effective method.

To add metabolic context and functional information to the lists of DARs and find connected subnetworks that include clustered DARs, we extracted minimal subnetworks with an algorithm based on the Steiner minimal tree problem. This algorithm is well adapted to our principal objective of visualizing human-readable subnetworks because it will find the smallest subnetwork connecting all the nodes of interest (i.e., DARs) while adding as few nodes as possible for connectivity. However, the Steiner minimal tree problem is computationally hard (proven to be NP-Complete [54]). The metric closure graph [55] (or shortest distance graph) approximation was thus used. While employing an approximation is necessary, it introduces an arbitrary decision-making process within the algorithm, leading to the selection of a single alternative minimal Steiner tree among various possibilities. This choice, although necessary to extract a minimal subnetwork representative of the mMoA, could miss biologically interesting connections between DARs. The drawback of favoring the obtention of small graphs is that some interesting alternatives could be omitted. Another option could therefore be the k-shortest paths algorithm, which would make it possible to find these alternatives, at the expense of readability. Hence, regarding subnetwork extraction, method choice is guided by finding the right trade-off between readability and exhaustivity.

Because metabolic disruptions can impact a wide range of cellular functions spanning several metabolic pathways, the clustering and subnetwork extraction methods are key to connect and analyze disruptions. Interestingly, in the subnetwork presented in Fig. 5, most of the reactions added by the subnetwork extraction algorithm are transport reactions, which highlight how the subnetwork extraction algorithm can help to better understand the mMoA of the molecule as a continuous phenomenon spanning several pathways and compartments rather than localized and disconnected perturbations. The graph-based analysis developed in this strategy could be used with other approaches able to identify perturbed reactions between two conditions, such as MOOMIN.

In addition to elucidate and compare MoA of different chemical compounds, the same strategy could be interestingly applied for investigating the treatment duration-related effect of one specific molecule, by comparing how the clusters of identified DARs evolve with the treatment duration. This could be especially relevant in the context where the kinetic of the metabolic effects of a molecule is unknown and might differ across compounds.

mMoA analysis of PHH exposed to amiodarone or valproic acid

The perturbations predicted by our workflow and visualized in Figs. 5 and 6 for Amiodarone and valproic acid are mainly associated with -oses and lipid metabolism, which is line with Amiodarone and valproic acid being known to induce hepatotoxicity via the occurrence of steatosis [42, 43]. Valproic acid is also widely known for its impact on mitochondrial function [43] and is a histone de-acetylase inhibitor [56], a class of molecules known to impair many cell functions: the wide diversity of pathways subparts identified in the mMoA of valproic acid tend to support this mechanism (Fig. 5B).

To study the metabolic impact of valproic acid on PHH after 24 h exposure, we focused on visualizing cluster 2 (Fig. 5), which is the cluster with the highest DARs coverage for this condition. We observed that DARs from this cluster were associated with 12 pathways. We were also able to identify another group of four up-activated reactions that is associated with lysine degradation in mitochondria, a phenomenon associated with mitochondrial homeostasis disruption in mice [57]. This phenomenon could also be due to a decrease in the L-carnitine pool associated with valproic acid exposure [58, 59] that could trigger a compensating mechanism to restore L-carnitine which requires lysine as a substrate. Finally, a group of eight reactions predicted as up-activated is associated with the metabolism of tryptophan. Interestingly, an increase in the metabolism of tryptophan and kynurenine has already been reported as a potential effect of valproic acid in rats [60], and the valproic acid-induced increased conversion of tryptophan to nicotinamide has been observed in rats [61].

The impact of amiodarone on the fatty acid synthesis pathway predicted by our workflow (Fig. 6) has already been observed in HepaRG cell cultures [41]. It is suggested that the increase in fatty acid synthesis could be due to the activation of SREBP1, which is a transcription factor mediating the expression of several important genes for de novo lipogenesis. One of these genes, FASN, encodes the fatty-acid synthase, which our workflow predicted as DAR in the treated state (Fig. 6A). Interestingly, SREBP1 and FASN were not identified as differentially expressed in amiodarone’s transcriptomic data, suggesting that our workflow is able to retrieve mechanistic information from lowly informative transcriptomic data. A similar increase of de novo lipogenesis on 3T3L1 adipocytes has also been reported [62] and is linked to an increase in palmitate production, which is in line with the predicted up-activation of the fatty-acyl-ACP hydrolase hydrolyzing palmitoyl-ACP to palmitate (Fig. 6). In the subnetwork presented in Fig. 6, only two reactions have been added by the subnetwork extraction algorithm, suggesting that DARs identified in this cluster are closely located in the metabolic network, therefore functionally interdependent. Moreover, the visualized subnetwork accounts for 74% of predicted DARs for amiodarone. Since the majority of DARs in the visualized subnetwork (Fig. 6) are associated with fatty acid synthesis, predictions from our workflow suggest that amiodarone mMoA is mostly related to this pathway and quite localized in the metabolic network. This mechanistic observation would not have been possible with the initial transcriptomic data analysis that yielded only two DEGs for PHH exposed 24 h at 7 µM amiodarone, thus emphasizing the capacity of our workflow to extract mechanistic information from condition-specific metabolic networks constructed with lowly informative transcriptomic data.

Conclusion

In this study, we proposed a strategy dealing with the increased complexity of performing partial enumeration during condition-specific modeling by identifying DARs and visualizing the chemical’s mMoA with a graph-based approach. The proposed strategy starts by integrating metabolic gene expression data to a GSMN to construct sets of condition-specific metabolic networks. These sets of condition-specific metabolic networks are then analyzed through an original statistical approach to identify reactions that are likely to be differentially activated between the two compared conditions. To go further on the mechanistic understanding, we implemented a graph-based network analysis step to extract and visualize subnetworks corresponding to metabolic functional and connected groups of DARs. Finally, we showed that the strategy that we developed succeeded in retrieving parts of the mMoA described in the literature for two well-known hepatotoxic molecules (amiodarone and valproic acid), even when only few genes were significantly disrupted.

The presented strategy provides a way to take advantage of the increased robustness obtained with condition-specific enumerated networks while keeping the analysis doable and human readable, allowing a better and deeper understanding of chemicals’ mMoA. This strategy can be divided in several independent parts: data integration, condition-specific modelling and graph-based analysis, so that it can be combined with other condition-specific modelling approaches. This comprehensive metabolic network exploration and visualization comes at the expense of the computation time, which could make it challenging to apply this strategy in large-scale safety assessment studies and would require development in parallel of complementary methods providing a more global assessment of the metabolic impact of a compound. These complementary developments could be plugged into the current workflow quite easily thanks to its flexibility. Although we focused on studying the metabolic impact of chemicals on PHH, our workflow could also be used to model the metabolic impact of chemicals on different cellular types from different tissues, therefore paving the way for a more precise understanding of how a chemical impacts the metabolism of many organs.

Methods

Transcriptomic data processing

Transcriptomic data used in this study were obtained from the Open TG-GATEs [21] database. For this study, we selected in vitro data generated on PHH. Raw transcriptomic data were downloaded as CEL files from https://dbarchive.biosciencedbc.jp/en/open-tggates/download.html. CEL files were read with the affy [63] package and the resulting dataset was normalized with the robust multiarray average method [64] from the limma R package. Information about the different PHH cell batches used in the different experiments was obtained from the authors (S6 Table) and was used to correct the batch effect with Combat [65], available in the SVA package. We annotated the probes with the annotation database corresponding to the Affymetrix HG133U Plus 2 chips available in the hgu133plus2.db package [66] and the AnnotationDbi package [67]. Where several different probes mapped to the same gene, we selected the probe with the highest standard deviation of expression values in the dataset. Finally, to match DEXOM [34] requirements, we categorized transcriptomic data with Barcode [68,69,70]. We used the z-scores output available from the barcode package and applied a 25/75 percentiles cutoff to obtain a list of highly and lowly expressed genes. Genes below the 25th percentile were considered as lowly expressed genes and assigned a − 1 value, while genes above the 75th percentile were considered as highly expressed genes and assigned a + 1 value. Genes between the 25th and 75th percentiles were not considered (i.e., 0 value) and therefore did not have any impact on the modeling process.

DEG identification

Lists of DEGs were obtained from the ToxicoDB [20] database, which provides pre-processed differential gene expression data from several databases, including Open TG-GATEs. A jupyter notebook querying ToxicoDB API was developed to automate the process. First, it downloads the ToxicoDB compounds json file and iterates over the list of compounds obtained from this file to query the ToxicoDB API to get analyzed data associated with each compound. The retrieved data were then filtered by selecting genes from the Open TG-GATEs human study with an absolute value of log2(FC) larger than 0.26 and a q-value less than 0.05, from samples subjected to a “high” dose (three doses were investigated and available for this dataset: low, middle, and high) over 24 h. The filtered DEGs list for each compound was then stored in a.tsv file.

Metabolic model preparation

We used the Recon2.2 [26] (downloaded from https://www.ebi.ac.uk/biomodels/MODEL1603150001) GSMN as the initial framework for the modeling and network analysis steps. This GSMN is composed of 5324 metabolites, 7785 reactions, and 1675 genes. We modified the model biomass reaction, to account only for the precursors required for cell maintenance but not replication, in order to better represent the fact that PHH are differentiated cells with a short lifespan and do not proliferate under classical culture conditions [71]. To do so, we set the stoichiometric coefficient of the DNA precursor in the biomass reaction to zero and applied a correction coefficient (S1 Appendix) to other metabolites to keep the reaction balanced for the production of one unit of biomass. Next, we forced the lower bound of the modified biomass reaction, which is now representing the cost of maintaining the cell viability, to a value of 1 instead of 0 to ensure that the generated models will be able to maintain cell viability (i.e., have a non-zero flux through the maintenance reaction). Exchange reactions were left unconstrained as we did not have information on cell medium composition.

Condition-specific modeling with partial enumeration

We integrated the categorized transcriptomic data into the GSMN through GPRs to define an a priori set of active and inactive reactions. A highly expressed gene will be given the value of 1 whereas a lowly expressed gene will be given the value of − 1. Genes are associated with reactions by the GPRs. GPRs are Boolean rules that indicate which genes are required to produce a specific enzyme that catalyzes one or more reactions. When a reaction has an AND GPR, the algorithm will annotate the reaction as active if the minimal value of the categorized GPR’s gene expression values equal 1 (i.e., all genes of the GPR are highly expressed) and inactive otherwise, such as:

$$\begin{gathered} For\;Gene1 = 1,\;Gene2 = - 1,\;Gene3 = 1,\;Gene4 = - 1: \hfill \\ Gene1\;AND\;Gene2 = \min \left( {Gene1,Gene2} \right) = \min \left( {1, - 1} \right) = - 1 \hfill \\ \end{gathered}$$

In this case, if Gene1 is highly expressed AND Gene2 is lowly expressed, the reaction is considered inactive.

When a reaction has an OR GPR, the algorithm will annotate the reaction as active if the maximal value of the categorized GPR’s gene expression values equal one (i.e., at least one gene of the GPR is highly expressed) and inactive otherwise, such as:

$$Gene1\;OR\;Gene2 = \max \left( {Gene1,Gene2} \right) = \max \left( {Gene1,Gene2} \right) = \max \left( {1, - 1} \right) = 1$$

In this case, if Gene1 or Gene2 is highly expressed, the reaction is considered active.

These rules can be applied to more complex GPRs such as:

$$\begin{gathered} \left( {\left( {Gene1\;OR\;Gene2} \right)AND \left( {Gene3\;OR\;Gene4} \right)} \right) = \min \left( {\max \left( {Gene1,Gene2} \right),\max \left( {Gene3,Gene4} \right)} \right) \hfill \\ \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad = \min \left( {\max \left( {1, - 1} \right),\max \left( {1, - 1} \right)} \right) = \min \left( {1,1} \right) = 1 \hfill \\ \end{gathered}$$

In this case, if Gene1 or Gene2 is highly expressed and Gene3 or Gene4 is highly expressed, the reaction is considered active.

A reaction will be identified as a priori active if the resulting value equals 1 and as a priori inactive if the resulting value equals − 1. At the end of this process, we identified a list of a priori active/inactive reactions for each sample that was then provided to DEXOM. DEXOM is a constraint-based modeling algorithm based on iMAT [29, 30] whose objective is to find a steady state distribution of flux that maximizes the number of reactions whose flux is consistent with transcriptomic data levels [30].

To extract a representative set of solutions from the whole solution space, we adapted the partial enumeration from DEXOM and used the python implementation available in dexom-python (https://github.com/MetExplore/dexom-python). First, we applied a full Reaction-enum strategy (see [34] for the complete description). This method iterates over all the reactions of the network, blocking each of them successively and solving the resulting mixed integer linear problem. Next, we applied the DEXOM Diversity-enum strategy, starting with a random solution picked per range of Reaction-enum solutions as a starting point. Diversity-enum aims to find new solutions that are gradually more different from the starting solution, allowing one to explore the limits of the solutions’ space. To reduce the computational costs of the original DEXOM partial enumeration approach, we reduced the number of starting solutions to a set of 1% of the solutions obtained with the Reaction-enum approach. We used an adapted version of systematic sampling (i.e., one random solution picked per batch of solutions, S1 appendix for details) to generate a starting set representative of the complete set of Reaction-enum solutions. Finally, all the optimal solutions for a sample were grouped in a single tabulation-separated file. The list of parameters used for running the DEXOM algorithm can be found in S1 Appendix.

Identification of perturbed reactions

For each reaction, the predicted activation status across all optimal condition-specific metabolic networks enumerated by the DEXOM strategy (Table 1) is stored in a numeric vector. Therefore, for each condition, the output of the DEXOM enumeration is a matrix of binary vectors where columns correspond to reactions and rows correspond to enumerated optimal solutions (Table 3).

Table 3 DEXOM output example for one condition. Alternative solutions obtained with partial enumeration are stored as binary vectors

For each reaction and each condition, we can compute an activation frequency value f, as the number of solutions in which the reaction is predicted to be active, over the total number of solutions. To compare the activation frequency values of two conditions, we introduce a metric called R2.

$$R2 = \left( {\frac{{nActive_{ctrl} }}{{nTotal_{ctrl} }} - \frac{{nActive_{trt} }}{{nTotal_{trt} }}} \right)^{2} = \left( {f_{ctrl} - f_{trt} } \right)^{2}$$

where \(nActive\) is the number of times the reaction is predicted active and \(nTotal\) is the total number of solutions in the condition (i.e., \(nActive + nInactive\)). \(f_{ctrl}\) is the activation frequency of a reaction under the control condition and \(f_{trt}\) is the activation frequency of the same reaction under the treated condition.

We considered reactions as perturbed if the R2 value was higher than 0.2 when comparing treated versus non-treated conditions. This is a rather conservative threshold since it means that to be considered as differentially activated, a reaction needs to be at least almost twice (80%) more/less active in the treated condition compared to the non-treated condition.

Baseline noise calculation

The absence of gene expression information for some reactions, and therefore the absence of constraints on those reactions when using the DEXOM algorithm, leads to uncertainty (or “noise”) in the reaction activation frequency prediction. Indeed, in the absence of transcriptomic data on reactions, DEXOM can predict these reactions as active or inactive without any impact on the optimality score. This uncertainty could lead to biased conclusions when comparing two conditions. To avoid considering reactions that are loosely constrained as DAR, we implemented a method that aims at estimating the baseline noise associated with each reaction in the network. The noise refers to the variation of the activation frequency of each reaction between pairs of control conditions, which is due to the fact that the predicted activity of the reaction does not impact the consistency with the data. Because several control conditions and chemical exposure times have been used in this dataset, we computed the baseline noise for each reaction between control conditions at a specified time and a specified vehicle. Replicates were pooled for each molecule. Practically, for each reaction, we computed the R2 between all pairs of control conditions for a given vehicle at a given exposure time and calculated the median of all R2s for each reaction. Therefore, we obtained for each reaction a baseline noise estimation that is the median of all pairwise comparisons between selected control conditions.

Then we filtered out perturbed reactions with an R2 between two conditions of interest (e.g., control vs. treatment) that was less than two times the noise estimate for this reaction. The remaining perturbed reactions are considered as DARs.

Metabolic reaction graph construction

A metabolic reaction graph can be formally defined as a directed graph G = (V,E) where:

V is a set of vertices, where each vertex (\({v}_{i}\)) corresponds to a distinct metabolic reaction

E is a set of directed edges, in which reaction (\({v}_{i}\)) and reaction (\({v}_{j}\)) are connected by an edge (\({v}_{i},{v}_{j}\)) if a metabolite produced by reaction (\({v}_{i}\)) is a substrate of reaction (\({v}_{j}\)).

Edges can be annotated with the name of the metabolite consumed/produced by the two reaction nodes it connects. The metabolic reaction graph formalism allows to focus the attention on metabolic reactions and the interactions between these reactions therefore it is well adapted in our approach to study the identified DARs.

GSMNs, which are large and complex metabolic networks, contain hub nodes (i.e., nodes connected to many other nodes). In fact, as shown by Arita et al. [72], these compounds will create shortcuts in the network since they are connected to a large number of reactions. Hence, these side compounds will strongly impact any topology based algorithms like path search or subgraph extraction by resulting in non-relevant biochemical results [73]. Therefore, removing these side compounds is necessary to obtain a metabolic reaction graph topology more biochemically representative of the direct interactions between metabolic reactions and allow a precise understanding of the links and interdependencies between DARs. To solve this issue, we identified a list of “side compounds” corresponding to the main hub nodes of the network (S4 Table) that will be removed during the metabolic graph construction. To improve the performance of the graph theory methods, we also identified a list of reactions (S6 Table) to be excluded during the reaction metabolic graph construction. These reactions are blocked reactions (i.e., reactions that cannot carry a flux in the GSMN), reactions always inactive in our cellular model, pool, sink, and exchange reactions. Metabolic reaction graphs were constructed according to the parameters defined previously before applying any graph-based methods such as the distance matrix calculation and subnetwork extractions.

Reaction distance matrix calculation

We computed the pairwise distance between reactions of interest by computing all the shortest paths between the nodes of the network corresponding to these reactions, resulting in a reaction pairwise distance matrix. To that purpose, we implemented a java app using the Met4j metabolic network toolbox (https://forgemia.inra.fr/metexplore/met4j). This app takes as input the GSMN (i.e., Recon2.2) SBML file, a list of side compounds, a list of reactions to exclude, and a list of reactions between which the distances will be computed (e.g., DARs in our case).

After loading the SBML file, the GSMN is pruned by removing side compounds and reactions to exclude. Then the JGraphT [74] ManytoMany shortest path implementation [75] is used. This implementation has been preferred over more common implementations such as Dijkstra [76] or Floyd–Warshall [77] due to its performance and its ability to take a specific list of reactions as input. The distance matrix is then saved in a comma separated file.

DARs clustering

For each DARs distance matrix computed with Met4J, we performed a hierarchical clustering with the Ward algorithm implemented in the SciPy python module. To control the size of clusters, we visualized each clustering result with a dendrogram representation and manually determined the number of clusters. Clusters were then obtained with the cutree function.

Subnetwork extraction

For each cluster of DARs, we extracted a minimal subnetwork with the minimal Steiner tree algorithm implemented in Met4J. This algorithm searches for a minimum spanning tree that contains the set of DARs and a minimum set of nodes from the GSMN to connect them according to the initial network topology.

Subnetwork visualization

We visualized the extracted subnetworks with MetExploreViz [40]. MetExploreViz is a JavaScript library dedicated to visualization for GSMNs that is integrated within the MetExplore [78] webserver. The list of metabolic reactions contained in the subnetwork is first mapped on the selected GSMN with MetExplore (https://metexplore.toulouse.inrae.fr/index.html/). From this mapping, we can then visualize and prune (i.e., remove side compounds) the bipartite metabolic graph interactively with MetExploreViz. Two visualizations were done for each of the subnetworks. One with mapped up-activated DARs colored in red and down-activated DARs colored in green (Figs. 5A and 6A), and a second one with reaction links colored according to the pathway associated with the reaction (Figs. 5B and 6B). Interactive visualizations were saved online and are accessible via links provided in supplementary materials.

Workflow implementation

We partitioned the workflow into three jupyter notebooks with a common properties file. The first notebook, named “partial_enumeration.ipynb,” takes as input barcode z-scores and generates batch files to execute the partial enumeration protocol on a SLURM computing cluster. Of note, these batch files can also be executed directly on a standard Linux (i.e., Ubuntu, Debian, etc.) operating system. The second notebook, named “dars_calculation.ipynb,” takes as input the partial-enumeration results from the previous notebook and computes baseline noise and DARs. Finally, the third notebook, named “analysis.ipynb,” handles the network analysis step of our workflow, which includes computing the distance matrix for each list of DARs, hierarchical clustering, and extracting subnetworks with Met4J.

These notebooks provide an example as to how to run the pipeline from the datasets used in this paper. The code has been packaged and published on pypi (https://pypi.org/project/manamodeller/) with its associated API documentation (https://manamodeller.readthedocs.io/en/latest/?badge=latest). It provides the required functions to run the pipeline and apply it to other datasets. Python library code and jupyter notebooks are available on GitHub: https://github.com/LouisonF/MANA.

Computing environment

Condition-specific modeling and partial enumeration requires CPLEX v20.10 to be installed on your system. Batch files generated by the “partial_enumeration.ipynb” notebook are designed to be launched on a SLURM computing grid but can also be launched on a standard Linux operating system. A local version of dexom-python is also required and can be cloned from the dexom-python repository (https://github.com/MetExplore/dexom-python). Finally, the network analysis workflow calling Met4J apps requires at least Java 11 to be installed on your machine.

Availability of data and materials

All data used in this study were obtained from the Open TG-GATEs [21] database. Raw transcriptomic data were downloaded as CEL files from https://dbarchive.biosciencedbc.jp/en/open-tggates/download.html.

References

  1. Alexander-White C, Bury D, Cronin M, Dent M, Hack E, Hewitt NJ, et al. A 10-step framework for use of read-across (RAX) in next generation risk assessment (NGRA) for cosmetics safety assessment. Regul Toxicol Pharmacol. 2022;129:105094.

    Article  CAS  PubMed  Google Scholar 

  2. Li T, Tong W, Roberts R, Liu Z, Thakkar S. Deep learning on high-throughput transcriptomics to predict drug-induced liver injury. Front Bioeng Biotechnol. 2020;8:1366. https://doi.org/10.3389/fbioe.2020.562677.

    Article  Google Scholar 

  3. Harrill JA, Everett LJ, Haggard DE, Sheffield T, Bundy JL, Willis CM, et al. High-throughput transcriptomics platform for screening environmental chemicals. Toxicol Sci. 2021;181(1):68–89.

    Article  CAS  PubMed  Google Scholar 

  4. Chen Q, Wu L, Liu W, Xing L, Fan X. Enhanced QSAR model performance by integrating structural and gene expression information. Molecules. 2013;18:10789–801.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Heindel JJ, Blumberg B, Cave M, Machtinger R, Mantovani A, Mendez MA, et al. Metabolism disrupting chemicals and metabolic disorders. Reprod Toxicol. 2017;68:3–33.

    Article  CAS  PubMed  Google Scholar 

  6. Sarni ROS, Kochi C, Suano-Souza FI. Childhood obesity: an ecological perspective. J Pediatr. 2022;98:S38–46.

    Article  Google Scholar 

  7. Gore AC, Chappell VA, Fenton SE, Flaws JA, Nadal A, Prins GS, et al. EDC-2: the endocrine society’s second scientific statement on endocrine-disrupting chemicals. Endocr Rev. 2015;36(6):E1-150.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Heal DJ, Gosden J, Jackson HC, Cheetham SC, Smith SL. Metabolic consequences of antipsychotic therapy: preclinical and clinical perspectives on diabetes, diabetic ketoacidosis, and obesity. Handb Exp Pharmacol. 2012;212:135–64.

    Article  CAS  Google Scholar 

  9. Miranda RA, Silva BS, de Moura EG, Lisboa PC. Pesticides as endocrine disruptors: programming for obesity and diabetes. Endocrine. 2023;79(3):437–47.

    Article  CAS  PubMed  Google Scholar 

  10. Nguyen T-M, Shafi A, Nguyen T, Draghici S. Identifying significantly impacted pathways: a comprehensive review and assessment. Genome Biol. 2019;20(1):203. https://doi.org/10.1186/s13059-019-1790-4.

    Article  PubMed  PubMed Central  Google Scholar 

  11. Wieder C, Frainay C, Poupin N, Rodríguez-Mier P, Vinson F, Cooke J, et al. Pathway analysis in metabolomics: recommendations for the use of over-representation analysis. PLoS Comput Biol. 2021;17(9):e1009105.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Karp PD, Midford PE, Caspi R, Khodursky A. Pathway size matters: the influence of pathway granularity on over-representation (enrichment analysis) statistics. BMC Genomics. 2021;22(1):191. https://doi.org/10.1186/s12864-021-07502-8.

    Article  PubMed  PubMed Central  Google Scholar 

  13. Haider S, Black MB, Parks BB, Foley B, Wetmore BA, Andersen ME, et al. A qualitative modeling approach for whole genome prediction using high-throughput toxicogenomics data and pathway-based validation. Front Pharmacol. 2018;9:200. https://doi.org/10.3389/fphar.2018.01072.

    Article  CAS  Google Scholar 

  14. Liu Z, Zhu L, Thakkar S, Roberts R, Tong W. Can transcriptomic profiles from cancer cell lines be used for toxicity assessment? Chem Res Toxicol. 2020;33(1):271–80. https://doi.org/10.1021/acs.chemrestox.9b00288.

    Article  CAS  PubMed  Google Scholar 

  15. Soufan O, Ewald J, Viau C, Crump D, Hecker M, Basu N, et al. T1000: a reduced gene set prioritized for toxicogenomic studies. PeerJ. 2019;7:e7975.

    Article  PubMed  PubMed Central  Google Scholar 

  16. Mav D, Shah RR, Howard BE, Auerbach SS, Bushel PR, Collins JB, et al. A hybrid gene selection approach to create the S1500+ targeted gene sets for use in high-throughput transcriptomics. PLoS ONE. 2018;13(2):e0191105. https://doi.org/10.1371/journal.pone.0191105.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Ganter B, Snyder RD, Halbert DN, Lee MD. Toxicogenomics in drug discovery and development: mechanistic analysis of compound/class-dependent effects using the DrugMatrix database. Pharmacogenomics. 2006;7(7):1025–44.

    Article  CAS  PubMed  Google Scholar 

  18. Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ, et al. The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease. Science. 2006;313(5795):1929–35. https://doi.org/10.1126/science.1132939.

    Article  CAS  PubMed  Google Scholar 

  19. Subramanian A, Narayan R, Corsello SM, Peck DD, Natoli TE, Lu X, et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell. 2017;171(6):1437-1452.e17.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Nair SK, Eeles C, Ho C, Beri G, Yoo E, Tkachuk D, et al. ToxicoDB: an integrated database to mine and visualize large-scale toxicogenomic datasets. Nucleic Acids Res. 2020;48(W1):W455–62. https://doi.org/10.1093/nar/gkaa390.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Igarashi Y, Nakatsu N, Yamashita T, Ono A, Ohno Y, Urushidani T, et al. Open TG-GATEs: a large-scale toxicogenomics database. Nucleic Acids Res. 2015;43:D921–7.

    Article  CAS  PubMed  Google Scholar 

  22. Heusinkveld HJ, Wackers PFK, Schoonen WG, van der Ven L, Pennings JLA, Luijten M. Application of the comparison approach to open TG-GATEs: a useful toxicogenomics tool for detecting modes of action in chemical risk assessment. Food Chem Toxicol. 2018;121:115–23.

    Article  CAS  PubMed  Google Scholar 

  23. Terzer M, Maynard ND, Covert MW, Stelling J. Genome-scale metabolic networks. WIREs Syst Biol Med. 2009;1(3):285–97. https://doi.org/10.1002/wsbm.37.

    Article  CAS  Google Scholar 

  24. Thiele I, Palsson BØ. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat Protoc. 2010;5(1):93–121.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Thiele I, Swainston N, Fleming RMT, Hoppe A, Sahoo S, Aurich MK, et al. A community-driven global reconstruction of human metabolism. Nat Biotechnol. 2013;31(5):419–25.

    Article  CAS  PubMed  Google Scholar 

  26. Swainston N, Smallbone K, Hefzi H, Dobson PD, Brewer J, Hanscho M, et al. Recon 2.2: from reconstruction to model of human metabolism. Metabolomics. 2016;12:109.

    Article  PubMed  PubMed Central  Google Scholar 

  27. Brunk E, Sahoo S, Zielinski DC, Altunkaya A, Dräger A, Mih N, et al. Recon3D enables a three-dimensional view of gene variation in human metabolism. Nat Biotechnol. 2018;36(3):272–81.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Robinson JL, Kocabaş P, Wang H, Cholley P-E, Cook D, Nilsson A, et al. An atlas of human metabolism. Sci Signal. 2020;13:624.

    Article  Google Scholar 

  29. Zur H, Ruppin E, Shlomi T. iMAT: an integrative metabolic analysis tool. Bioinformatics. 2010;26(24):3140–2.

    Article  CAS  PubMed  Google Scholar 

  30. Shlomi T, Cabili MN, Herrgård MJ, Palsson BØ, Ruppin E. Network-based prediction of human tissue-specific metabolism. Nat Biotechnol. 2008;26(9):1003–10.

    Article  CAS  PubMed  Google Scholar 

  31. Vlassis N, Pacheco MP, Sauter T. Fast reconstruction of compact context-specific metabolic network models. PLOS Comput Biol. 2014;10(1):e1003424. https://doi.org/10.1371/journal.pcbi.1003424.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Machado D, Herrgård M. Systematic evaluation of methods for integration of transcriptomic data into constraint-based models of metabolism. PLOS Comput Biol. 2014;10(4):e1003580. https://doi.org/10.1371/journal.pcbi.1003580.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Opdam S, Richelle A, Kellman B, Li S, Zielinski DC, Lewis NE. A systematic evaluation of methods for tailoring genome-scale metabolic models. Cell Syst. 2017;4(3):318-329.e6.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Rodríguez-Mier P, Poupin N, de Blasio C, Le Cam L, Jourdan F. DEXOM: diversity-based enumeration of optimal context-specific metabolic networks. PLOS Comput Biol. 2021;17(2):e1008730. https://doi.org/10.1371/journal.pcbi.1008730.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Rossell S, Huynen MA, Notebaart RA. Inferring metabolic states in uncharacterized environments using gene-expression measurements. PLOS Comput Biol. 2013;9(3):e1002988. https://doi.org/10.1371/journal.pcbi.1002988.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Robaina-Estévez S, Nikoloski Z. On the effects of alternative optima in context-specific metabolic model predictions. PLOS Comput Biol. 2017;13(5):e1005568. https://doi.org/10.1371/journal.pcbi.1005568.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Pusa T, Ferrarini MG, Andrade R, Mary A, Marchetti-Spaccamela A, Stougie L, et al. MOOMIN—Mathematical explOration of’ Omics data on a MetabolIc Network. Bioinformatics. 2020;36(2):514–23. https://doi.org/10.1093/bioinformatics/btz584.

    Article  CAS  PubMed  Google Scholar 

  38. Urushidani T. Prediction of hepatotoxicity based on the toxicogenomics database. In: Sahu SC, editor. Hepatotoxicity: from genomics to in vitro and in vivo models. Hoboken: Wiley; 2007. p. 507–29.

    Chapter  Google Scholar 

  39. Noriyuki N, Igarashi Y, Ono A, Yamada H, Ohno Y, Urushidani T. Evaluation of DNA microarray results in the Toxicogenomics Project (TGP) consortium in Japan. J Toxicol Sci. 2012;37(4):791–801.

    Article  CAS  PubMed  Google Scholar 

  40. Chazalviel M, Frainay C, Poupin N, Vinson F, Merlet B, Gloaguen Y, et al. MetExploreViz: web component for interactive metabolic network visualization. Bioinformatics. 2018;34(2):312–3.

    Article  CAS  PubMed  Google Scholar 

  41. Allard J, Bucher S, Massart J, Ferron P-J, Le Guillou D, Loyant R, et al. Drug-induced hepatic steatosis in absence of severe mitochondrial dysfunction in HepaRG cells: proof of multiple mechanism-based toxicity. Cell Biol Toxicol. 2021;37(2):151–75. https://doi.org/10.1007/s10565-020-09537-1.

    Article  CAS  PubMed  Google Scholar 

  42. Anthérieu S, Rogue A, Fromenty B, Guillouzo A, Robin M-A. Induction of vesicular steatosis by amiodarone and tetracycline is associated with up-regulation of lipogenic genes in heparg cells. Hepatology. 2011;53(6):1895–905. https://doi.org/10.1002/hep.24290.

    Article  CAS  PubMed  Google Scholar 

  43. Mnif L, Sellami R, Masmoudi J. Valproic acid and hepatic steatosis: a possible link? About a case report. Psychopharmacol Bull. 2016;46(2):59–62.

    PubMed  PubMed Central  Google Scholar 

  44. Pantziri MDA, Klapa MI. Standardization of human metabolic stoichiometric models: challenges and directions. Front Syst Biol. 2022;2:1–7.

    Article  Google Scholar 

  45. Ravikrishnan A, Raman K. Critical assessment of genome-scale metabolic networks: the need for a unified standard. Brief Bioinform. 2015;16(6):1057–68.

    Article  CAS  PubMed  Google Scholar 

  46. Cook DJ, Nielsen J. Genome-scale metabolic models applied to human health and disease. Wiley Interdiscip Rev Syst Biol Med. 2017;9(6):1–18.

    Article  Google Scholar 

  47. Poupin N, Corlu A, Cabaton NJ, Dubois-Pot-Schneider H, Canlet C, Person E, et al. Large-scale modeling approach reveals functional metabolic shifts during hepatic differentiation. J Proteome Res. 2019;18(1):204–16. https://doi.org/10.1021/acs.jproteome.8b00524.

    Article  CAS  PubMed  Google Scholar 

  48. Yang C, Hua Q, Shimizu K. Integration of the information from gene expression and metabolic fluxes for the analysis of the regulatory mechanisms in Synechocystis. Appl Microbiol Biotechnol. 2002;58(6):813–22.

    Article  CAS  PubMed  Google Scholar 

  49. Rossell S, van der Weijden CC, Lindenbergh A, van Tuijl A, Francke C, Bakker BM, et al. Unraveling the complexity of flux regulation: a new method demonstrated for nutrient starvation in Saccharomyces cerevisiae. Proc Natl Acad Sci. 2006;103:2166–71.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Moxley JF, Jewett MC, Antoniewicz MR, Villas-Boas SG, Alper H, Wheeler RT, et al. Linking high-resolution metabolic flux phenotypes and transcriptional regulation in yeast modulated by the global regulator Gcn4p. Proc Natl Acad Sci. 2009;106(16):6477–82.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Robaina Estévez S, Nikoloski Z. Generalized framework for context-specific metabolic model extraction methods. Front Plant Sci. 2014. https://doi.org/10.3389/fpls.2014.00491.

    Article  PubMed  PubMed Central  Google Scholar 

  52. Milacic M, Beavers D, Conley P, Gong C, Gillespie M, Griss J, et al. The reactome pathway knowledgebase 2024. Nucleic Acids Res. 2024;52(D1):D672–8. https://doi.org/10.1093/nar/gkad1025.

    Article  PubMed  Google Scholar 

  53. Lee C, Cunningham P. Community detection: effective evaluation on large social networks. J Complex Netw. 2014;2(1):19–37. https://doi.org/10.1093/comnet/cnt012.

    Article  Google Scholar 

  54. Karp RM. Reducibility among combinatorial problems. In: Miller RE, Thatcher JW, Bohlinger JD, editors. BT—complexity of computer computations: proceedings of a symposium on the Complexity of Computer Computations, held March 20–22, 1972, at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York. Boston: Springer; 1972. pp. 85–103. https://doi.org/10.1007/978-1-4684-2001-2_9.

  55. Voß S. Steiner’s problem in graphs: heuristic methods. Discret Appl Math. 1992;40(1):45–72.

    Article  Google Scholar 

  56. Göttlicher M, Minucci S, Zhu P, Krämer OH, Schimpf A, Giavara S, et al. Valproic acid defines a novel class of HDAC inhibitors inducing differentiation of transformed cells. EMBO J. 2001;20(24):6969–78.

    Article  PubMed  PubMed Central  Google Scholar 

  57. Zhou J, Wang X, Wang M, Chang Y, Zhang F, Ban Z, et al. The lysine catabolite saccharopine impairs development by disrupting mitochondrial homeostasis. J Cell Biol. 2018;218(2):580–97. https://doi.org/10.1083/jcb.201807204.

    Article  CAS  PubMed  Google Scholar 

  58. Fromenty B, Pessayre D. Inhibition of mitochondrial beta-oxidation as a mechanism of hepatotoxicity. Pharmacol Ther. 1995;67(1):101–54.

    Article  CAS  PubMed  Google Scholar 

  59. Lheureux PER, Penaloza A, Zahir S, Gris M. Science review: carnitine in the treatment of valproic acid-induced toxicity—what is the evidence? Crit Care. 2005;9(5):431. https://doi.org/10.1186/cc3742.

    Article  PubMed  PubMed Central  Google Scholar 

  60. Maciejak P, Szyndler J, Kołosowska K, Turzyńska D, Sobolewska A, Walkowiak J, et al. Valproate disturbs the balance between branched and aromatic amino acids in rats. Neurotox Res. 2014;25(4):358–68.

    Article  CAS  PubMed  Google Scholar 

  61. Shibata K, Kondo R, Sano M, Fukuwatari T. Increased conversion of tryptophan to nicotinamide in rats by dietary valproate. Biosci Biotechnol Biochem. 2013;77(2):295–300.

    Article  CAS  PubMed  Google Scholar 

  62. Hubel E, Fishman S, Holopainen M, Käkelä R, Shaffer O, Houri I, et al. Repetitive amiodarone administration causes liver damage via adipose tissue ER stress-dependent lipolysis, leading to hepatotoxic free fatty acid accumulation. Am J Physiol Liver Physiol. 2021;321(3):G298–307. https://doi.org/10.1152/ajpgi.00458.2020.

    Article  CAS  Google Scholar 

  63. Gautier L, Cope L, Bolstad BM, Irizarry RA. affy—analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 2004;20(3):307–15. https://doi.org/10.1093/bioinformatics/btg405.

    Article  CAS  PubMed  Google Scholar 

  64. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4(2):249–64.

    Article  PubMed  Google Scholar 

  65. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28(6):882–3.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  66. Carlson M. hgu133plus2.db: affymetrix human genome U133 plus 2.0 array annotation data. 2016.

  67. Pagès H, Carlson M, Falcon S, Li N. AnnotationDbi: manipulation of SQLite-based annotations in Bioconductor. 2020.

  68. McCall MN, Bolstad BM, Irizarry RA. Frozen robust multiarray analysis (fRMA). Biostatistics. 2010;11(2):242–53.

    Article  PubMed  PubMed Central  Google Scholar 

  69. McCall MN, Uppal K, Jaffee HA, Zilliox MJ, Irizarry RA. The Gene Expression Barcode: leveraging public data repositories to begin cataloging the human and murine transcriptomes. Nucleic Acids Res. 2011;39:D1011–5.

    Article  CAS  PubMed  Google Scholar 

  70. McCall MN, Jaffee HA, Zelisko SJ, Sinha N, Hooiveld G, Irizarry RA, et al. The Gene Expression Barcode 3.0: improved data processing and mining tools. Nucleic Acids Res. 2014;42:D938–43.

    Article  CAS  PubMed  Google Scholar 

  71. Guguen-Guillouzo C, Guillouzo A. General review on in vitro hepatocyte models and their applications BT—hepatocytes: methods and protocols. Totowa: Humana Press; 2010. p. 1–40.

    Google Scholar 

  72. Arita M. The metabolic world of Escherichia coli is not small. Proc Natl Acad Sci USA. 2004;101(6):1543–7.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  73. Frainay C, Jourdan F. Computational methods to identify metabolic sub-networks based on metabolomic profiles. Brief Bioinform. 2017;18(1):43–56.

    Article  CAS  PubMed  Google Scholar 

  74. Michail D, Kinable J, Naveh B, Sichi JV. JGraphT—a java library for graph data structures and algorithms. ACM Trans Math Softw. 2020;46(2):1–29. https://doi.org/10.1145/3381449.

    Article  Google Scholar 

  75. Knopp S, Sanders P, Schultes D, Schulz F, Wagner D. Computing many-to-many shortest paths using highway hierarchies. In: 2007 proceedings of the workshop on algorithm engineering and experiments (ALENEX). Society for Industrial and Applied Mathematics; 2007. pp. 36–45. https://doi.org/10.1137/1.9781611972870.4.

  76. Dijkstra EW. A note on two problems in connexion with graphs. Numer Math. 1959;1(1):269–71. https://doi.org/10.1007/BF01386390.

    Article  Google Scholar 

  77. Floyd RW. Algorithm 97: shortest path. Commun ACM. 1962;5(6):345. https://doi.org/10.1145/367766.368168.

    Article  Google Scholar 

  78. Cottret L, Frainay C, Chazalviel M, Cabanettes F, Gloaguen Y, Camenen E, et al. MetExplore: collaborative edition and exploration of metabolic networks. Nucleic Acids Res. 2018;46:W495–502.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

Download references

Acknowledgements

We thank Dr. Hiroshi Yamada for providing supplementary data regarding the cell batches for the assays from the Open TG-GATEs database. We thank the Genotoul infrastructure for providing access to large HPC resources that were key in developing the presented approach. We also thank Dr. Pablo Rodriguez-Mier for his expertise on how to tailor DEXOM to our needs and the discussions regarding condition-specific modeling and partial enumeration. Finally, we thank Dr. Gladys Ouedraogo for proofreading the manuscript and discussions throughout the project, and Dr. Bathilde Ambroise for the very interesting discussions we had during the very first phase of this work.

Funding

This work is supported by the Association Nationale de la Recherche et de la Technologie (ANRT) and by the French National Facility in Metabolomics and Fluxomics, MetaboHUB (11-INBS-0010), launched by the French Ministry of Research and Higher Education and the French ANR funding agency within the “Investissements d’Avenir” program.

Author information

Authors and Affiliations

Authors

Contributions

LF, OP, AR, RG, FJ and NP designed the research; LF analysed the data; LF, AO, JCG, MS, CF FJ and NP contributed to the development of the methodology (including development and implementation of statistical, modelling and visualisation methods). AR, OP, BF, FJ and NP supervised the research; all authors read and approved the final manuscript.

Corresponding authors

Correspondence to Louison Fresnais or Nathalie Poupin.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

12859_2024_5845_MOESM1_ESM.pdf

Supplementary Material 1: S1 Appendix. Supplementary information. Pdf including: Correction coefficient formula, DAR specificity ratio formula, DEXOM parameters, systematic sampling and robustness analysis.

12859_2024_5845_MOESM2_ESM.tiff

Supplementary Material 2: S1 Fig. Pathway enrichment performed on DARs, after removing blocked and artificial reactions. Pathway over-representation analysis was performed on Recon2.2 metabolic pathways for DARs predicted for the eight selected hepatotoxic molecules? DARs were filtered to remove blocked and artificial reactions. P-values were computed using a Fisher Exact test, with a Benjamini-Hochberg correction.

12859_2024_5845_MOESM3_ESM.tiff

Supplementary Material 3: S2 Fig. Pathway enrichment performed on DARs. Pathway over-representation analysis was performed on Recon2.2 metabolic pathways for DARs predicted for the eight selected hepatotoxic molecules, without removing blocked and artificial reactions. P-values were computed using a Fisher Exact test, with a Benjamini-Hochberg correction.

12859_2024_5845_MOESM4_ESM.tiff

Supplementary Material 4: S3 Fig. Pathway enrichment performed on valproic acid DEGs with ReactomePA. Pathway over-representation analysis performed with a Fisher Exact test, p-values corrected with the Benjamini-Hochberg method on Reactome 2022 pathways (genes with log2(absFC)) > 0.26 and FDR-corrected p-values < 0.05 were considered as DEG).

12859_2024_5845_MOESM5_ESM.tiff

Supplementary Material 5: S4 Fig. Visualization of DARs identified for amiodarone and valproic acid within the Recon2.2 metabolic network, colored by cluster. This visualization was performed with MetExploreViz while removing side compounds (S4 Table). DARs identified for amiodarone (7 µM, 24 h) are highlighted in panel A and DARs identified for valproic acid (5000µM, 24h) are highlighted in panel B. In panel A, DARs highlighted in blue are associated with the cluster 1 for amiodarone and DARs highlighted in orange are associated with the cluster2 for amiodarone. In panel B, DARs highlighted in blue are associated with the cluster 1 for valproic acid, DARs highlighted in orange are associated with the cluster 2 for valproic acid and DARs highlighted in green are associated with the cluster 3 for valproic acid. The two figures are based on the same network layout; thus, each reaction and metabolite is located at the same coordinates, allowing visual comparison. Interactive visualizations can be accessed through these links https://metexplore.toulouse.inrae.fr/userFiles/metExploreViz/index.html?dir=/5b6c886c4916c1de9e6c16a776cc6d64/networkSaved_1726726315 and https://metexplore.toulouse.inrae.fr/userFiles/metExploreViz/index.html?dir=/5b6c886c4916c1de9e6c16a776cc6d64/networkSaved_1522683843.

12859_2024_5845_MOESM6_ESM.xlsx

Supplementary Material 6: S1 Table. Reactions filtered out by the baseline noise filter. This table lists reactions initially identified as DARs having a R2 that is less than two times the noise estimated for this reaction, therefore filtered out by the baseline noise filter.

12859_2024_5845_MOESM7_ESM.xlsx

Supplementary Material 7: S2 Table. DAR metrics. This table contains several metrics which aim at understanding how perturbed reactions are commonly perturbed between the eight studied chemicals. This table lists size, specificity ratio and average percentage of DARs intersection size.

12859_2024_5845_MOESM8_ESM.xlsx

Supplementary Material 8: S3 Table. Excluded reactions identifiers. This table lists the reactions that have been excluded in the network analysis stages.

12859_2024_5845_MOESM9_ESM.xlsx

Supplementary Material 9: S4 Table. Side compounds identifiers. This table lists the sides compounds used in the network analysis stages.

12859_2024_5845_MOESM10_ESM.xlsx

Supplementary Material 10. S5 Table. DAR subnetwork coverage. This table lists the DAR subnetwork coverage for each identified subnetwork for amiodarone and valproic acid.

12859_2024_5845_MOESM11_ESM.xlsx

Supplementary Material 11. S6 Table: DEG lists size obtained from the ToxicoDB database. This table summarize key metadata for each compound such as the dose, exposure time, number of replicates, cell type, cell batch, control vehicle, DEG signature size and metabolic DEG signature size. Were considered differentially expressed, genes having a log2(abs(FC)) above 0.26 and a FDR corrected p-value below 0.05.

12859_2024_5845_MOESM12_ESM.xlsx

Supplementary Material 12. S7 Table. DAR lists predicted for Amiodarone and Valproic Acid. This table lists the DARs identified for Amiodarone and Valproic Acid. DARs are stored by cluster.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Fresnais, L., Perin, O., Riu, A. et al. A strategy to detect metabolic changes induced by exposure to chemicals from large sets of condition-specific metabolic models computed with enumeration techniques. BMC Bioinformatics 25, 234 (2024). https://doi.org/10.1186/s12859-024-05845-z

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12859-024-05845-z

Keywords