
Comparison of seven in silico tools for evaluating daphnia and fish acute toxicity: case study on Chinese Priority Controlled Chemicals and new chemicals

Abstract

Background

A number of predictive models for aquatic toxicity are available; however, the accuracy and ease of use of these in silico tools in risk assessment still need further study. This study evaluated the performance of seven in silico tools for predicting acute toxicity to daphnia and fish: ECOSAR, T.E.S.T., Danish QSAR Database (Danish Q.D.), VEGA, KATE, Read Across and Trend Analysis. 37 Priority Controlled Chemicals in China (PCCs) and 92 New Chemicals (NCs) were used as the validation dataset.

Results

In the quantitative evaluation of PCCs, using the criterion of a 10-fold difference between experimental and estimated values, VEGA had the highest accuracy among all of the models for both daphnia and fish acute toxicity, with accuracies of 100% and 90%, respectively, after considering the applicability domain (AD). The performance of KATE, ECOSAR and T.E.S.T. was similar, with accuracies slightly lower than that of VEGA. The accuracy of Danish Q.D. was the lowest among the tools for which QSAR is the main mechanism. The performance of Read Across and Trend Analysis was the lowest among all of the tested in silico tools. The predictive ability of the models for NCs was lower than for PCCs, possibly because the NCs never appeared in the training sets of the models; ECOSAR performed best among the in silico tools for NCs.

Conclusion

QSAR-based in silico tools had greater prediction accuracy than the category approach (Read Across and Trend Analysis) in predicting acute toxicity to daphnia and fish. The category approach requires expert knowledge to be used effectively. ECOSAR performs well for both PCCs and NCs, and its application should be promoted in both risk assessment and prioritization activities. We suggest that the distribution of multiple data and water solubility should be considered when developing in silico models. Both more intelligent in silico tools and testing are necessary to identify the hazards of chemicals.

Background

Global regulations have called for systematic testing of potential environmental contaminants to protect human health and the environment from exposure to anthropogenic chemicals, such as industrial chemicals and pharmaceuticals. The ever-increasing number of chemicals, with more than 350,000 chemicals and mixtures of chemicals currently registered for production and use [1], presents challenges to traditional ecotoxicity testing strategies based on in vivo experiments, which are expensive, time-consuming, and reliant on large numbers of animal subjects. Therefore, it is virtually impossible to test the acute toxicity of all the chemicals used globally.

To mitigate the challenges associated with in vitro and in vivo toxicity testing, global regulations, including the European Chemicals Agency (ECHA) REACH initiative, the U.S. Toxic Substances Control Act and the Canadian Environmental Protection Act, encourage increased reliance on in silico approaches [2,3,4,5]. China is also exploring the possibility of using in silico approaches in chemical risk assessment.

The cost-benefit advantages and regulatory support of in silico methods have led to the development of a number of tools for ecotoxicity assessments [6]. The major in silico methods include (Quantitative) Structure–Activity Relationships (QSAR) and chemical category methods.

QSAR method uses a mathematical model that was derived from a training set of example chemicals. The training set includes the chemicals that were found to be positive and negative in a given toxicological study (e.g., the bacterial reverse mutation assay) or to induce a continuous response (e.g., Lowest Observed Adverse Effect Level in teratogenicity) that the model will predict. As part of the process to generate the model, physicochemical property based descriptors (e.g., molecular weight, octanol water partition coefficient (Kow)), electronic and topological descriptors (e.g., quantum mechanics calculations), or chemical structure-based descriptors (e.g., the presence or absence of different functional groups) are generated and used to describe the training set compounds. The model encodes the relationship between these descriptors and the (toxicological) response. After the model is built and validated, it can be used to make a prediction. The (physical) chemical descriptors incorporated into the model are then generated for the test compound and are used by the model to generate a prediction. This prediction is only accepted when the test compound is sufficiently similar to the training set compounds (i.e., it is considered within the applicability domain of the QSAR model, often considering the significance of descriptors). This applicability domain analysis may be performed automatically by some software to determine whether the training set compounds share similar chemical and/or biological properties with the test chemical [7].

Chemicals whose physical-chemical, toxicological and ecotoxicological properties are likely to be similar or follow a regular pattern as a result of structural similarity may be considered as a group, or ‘category’ of chemicals. The assessment of chemicals by using this category approach differs from the approach of assessing them on an individual basis, since the properties of the individual chemicals within a category are assessed on the basis of the evaluation of the category as a whole, rather than based on measured data for any one particular chemical alone. For (a) category member(s) that lacks data for one or more endpoints, the data gap can be filled in a number of ways, including by read-across from one or more other category members. Within a chemical category, the members are often related by a trend in an effect for a given endpoint, and a trend analysis can be carried out through deriving a model based on the data for the members of the category [8].

In 2007, the Organization for Economic Co-operation and Development (OECD) guidelines on the development and validation of QSAR models were issued [9]. They proposed that a QSAR model for practical application should be associated with an unambiguous algorithm [10], a defined endpoint, a defined applicability domain (AD), appropriate measures of goodness-of-fit, robustness and predictive ability, and a mechanistic interpretation, if possible [9, 11]. Despite these guidelines, the lack of external validation and of reported model performance on test sets, model overfitting, and poor AD definitions remain major concerns [12,13,14,15]. A clear AD definition would ensure that the model assumptions are met [16, 17].

A number of studies have developed in silico models for the endpoint of acute toxicity to daphnia and fish [18,19,20,21,22]. Specifically, some in silico tools were developed for ecological risk assessment and are widely used to support chemical regulation. These include: Ecological Structure Activity Relationships (ECOSAR) [23], Toxicity Estimation Software Tool (T.E.S.T.) [24], Kashinhou Tool for Ecotoxicity (KATE) [25], Virtual models for property Evaluation of chemicals within a Global Architecture (VEGA) [26], Danish QSAR Database (Danish Q.D.) [27], and QSAR Toolbox developed by OECD [28].

Regulators often use predictions from multiple in silico tools to arrive at a decision, for example in persistence, bioaccumulation, and toxicity/very persistent and very bioaccumulative (PBT/vPvB) assessment and prioritization [29]. For regulatory purposes, in silico tools need not only to be accurate but also to be easy to use and to serve different purposes, such as qualitative risk assessment, quantitative risk assessment, and even high-throughput screening [30].

Based on models for specific chemical classes and different classes of substances, some studies have compared the performance of QSAR models for acute toxicity. Moore et al. [31] evaluated the performance of six QSAR modelling packages that predict acute toxicity to fish: ECOSAR, TOPKAT, a Probabilistic Neural Network, a Computational Neural Network, the QSAR components of the Assessment Tools for the Evaluation of Risk (ASTER) system, and the Optimized Approach Based on Structural Indices Set (OASIS) system. Golbamaki et al. [32] evaluated and compared eight in silico modelling packages that predict daphnia acute toxicity: TOPKAT, ACD/Tox Suite, ADMET Predictor™, ECOSAR, TerraQSAR™, T.E.S.T. and two models implemented in VEGA. Cassotti et al. [33] evaluated the accuracy, stability and reliability of two models (MICHEM and ChemProp) for acute toxicity to daphnia.

However, some of the evaluated tools were not easy to use and were not developed for regulatory purposes. These evaluation studies did not include recently developed tools, such as QSAR Toolbox, Danish Q.D. and KATE, or the latest versions of prediction tools, such as VEGA. Finally, the performance of the chemical category approach for predicting acute toxicity to fish and daphnia has not been evaluated.

To implement the regulatory requirements of the “Action Plan for Prevention and Control of Water Pollution,” the Ministry of Ecology and Environment of China issued the List of Priority Controlled Chemicals (PCCs) (the first batch) at the end of 2017 [34]. The List of PCCs (the second batch) has been compiled and is under comment [35]. Most of these PCCs have been assessed and shown to have PBT/vPvB characteristics, especially hazards to aquatic ecosystems. If a model can identify chemicals that are eventually determined to be hazardous, it has great prospects for regulatory application. In addition, in silico tools should also be able to predict the hazard of emerging chemical substances in order to respond to premanufacture notifications for new chemical substances.

In this study, we selected seven in silico tools, namely ECOSAR, T.E.S.T., Danish Q.D., VEGA, KATE, Read Across and Trend Analysis, to predict acute aquatic toxicity to daphnia and fish, in order to provide insight into the applicability, accuracy and ease of use (convenience and the level of expert knowledge required) of these tools. The test sets used in this evaluation were PCCs, which represent chemicals at the final stage of the regulatory management process, and NCs, which represent emerging substances.

Methods

Validation datasets

Systematic and rigorous model evaluation requires reliable experimental data. Acute aquatic toxicity experimental data (48-h LC50 for daphnia and 96-h LC50 for fish) for PCCs were therefore taken preferentially from highly reliable sources, such as ECHA risk assessment reports, Good Laboratory Practice (GLP) reports, or studies using standard test methods. Other sources, such as ECHA, the OECD eChemPortal database and QSAR Toolbox, were also considered. If more than one value existed, the lowest reasonable value was used. Daphnia species consisted of Daphnia magna and Daphnia pulex. Fish species, all within Actinopterygii, included Lepomis macrochirus, Cyprinus carpio, Pimephales promelas, Poecilia reticulata, Oncorhynchus mykiss, Oryzias latipes, Brachydanio rerio and others.

A total of 92 tested NCs were used after removing mixtures and UVCBs (Chemical Substances of Unknown or Variable Composition, Complex Reaction Products and Biological Materials), comprising 42 daphnia 48-h LC50 values and 82 fish 96-h LC50 values. These NCs were tested from 2014 to 2017 using OECD test guidelines 202 [36] and 203 [36] under GLP conditions in the Lab of Chemical Testing and Assessment, Nanjing Institute of Environmental Sciences, Ministry of Environmental Protection (MEP), China. The daphnia species was Daphnia magna, and the fish species was zebrafish. As these NCs came from chemical companies, the test data are used for registration as required by the Measures for Environmental Management of New Chemical Substances in China. Because of confidentiality requirements, identifying information on these NCs, such as chemical structures, cannot be provided. The functional groups they contain were used for analysis and were obtained with the organic functional groups (nested) module in QSAR Toolbox.

Predictive tools

The following seven in silico methods were evaluated for predicting acute aquatic toxicity to daphnia and fish: ECOSAR, T.E.S.T., Danish Q.D., VEGA, KATE, Read Across in QSAR Toolbox, and Trend Analysis in QSAR Toolbox. All seven in silico tools were evaluated with the PCCs dataset. Five tools (ECOSAR, T.E.S.T., Danish Q.D., VEGA and KATE) were evaluated with the NCs dataset.

The Simplified Molecular Input Line Entry System (SMILES) string of each chemical was used as input to the models. A brief description of each program is provided below, and the pertinent details are summarized in Table 1.

Table 1 Summary of the predictive tools

ECOSAR

ECOSAR estimates acute aquatic toxicity via the Meyer–Overton relationship for chemicals within structurally similar classes. ECOSAR is trained on a large dataset of ecotoxicity studies from the ECOTOX database that follow the U.S. EPA Office of Chemical Safety and Pollution Prevention guidelines, and it covers 130 structural classes. The log10 KOW value of each training set chemical is predicted using the KOWWIN program from the U.S. EPA Estimation Programs Interface Suite (EPI Suite). Linear regression models between LC50 and log10 KOW were developed for the substances in each class. Predictions of acute toxicity for freshwater, rather than saltwater, species were selected for validation. Chemicals that do not meet the log10 KOW range are considered to lie outside the AD.

KATE

KATE estimates acute aquatic toxicity via the Meyer–Overton relationship for chemicals within a total of 40 structural chemical classes [37, 38]. KATE is trained on the US EPA fathead minnow (Pimephales promelas) and the Japanese Ministry of Environment Oryzias latipes datasets [25]. The log10 KOW value of the test chemical is obtained from an internal experimental database or is estimated with the alternative forced choice method. The relationship between LC50 and log10 KOW is obtained by linear regression. The log10 KOW of the predicted substance is compared with the range of log10 KOW values in each structural class of the training set, which internally defines the AD. The lowest predicted values were used for validation.

T.E.S.T.

T.E.S.T. estimates acute aquatic toxicity using several QSAR methodologies: hierarchical clustering, single model, the Food and Drug Administration method, multilinear regression method, group contribution method, mode of action method, nearest neighbour method and consensus method. In the default consensus method (used for validation), the predicted toxicity is simply the average of the predicted toxicities from the above QSAR methodologies (taking into account the applicability domain of each method). T.E.S.T. is trained on endpoint data from the EPA ECOTOX database [39]. T.E.S.T. has an AD for each method and a final AD in which predictions must be made by at least two methods for a consensus value to be used. If only a single QSAR methodology can make a prediction, the predicted value is deemed unreliable and is not used. We therefore treated a chemical as inside the AD whenever the consensus method returned a predicted value.
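The consensus rule described above can be sketched in a few lines. This is only an illustration of the averaging and the "at least two methods" condition as stated in the text; the function name, the method keys and the use of `None` for a method that gives no prediction are assumptions, not part of T.E.S.T. itself.

```python
from statistics import mean
from typing import Dict, Optional


def consensus_prediction(method_predictions: Dict[str, Optional[float]],
                         min_methods: int = 2) -> Optional[float]:
    """Average the per-method predictions that are available.

    A value of None means the method could not predict the chemical
    (for example, because it fell outside that method's AD).
    Returns None when fewer than `min_methods` methods gave a value,
    which this sketch treats as "outside the consensus AD".
    """
    available = [v for v in method_predictions.values() if v is not None]
    if len(available) < min_methods:
        return None
    return mean(available)


# Hypothetical log10 LC50 predictions from three methods; one method abstains.
example = {"hierarchical": 1.0, "group_contribution": 3.0, "nearest_neighbour": None}
print(consensus_prediction(example))  # average of the two available values
```

Under this rule a chemical predicted by only one method returns no consensus value and is counted as a missing prediction.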

VEGA

VEGA provides seven models to predict fish acute toxicity: (1) SarPy/IRFMN (V1.0.2), a QSAR classification model based on fragments built by the SarPy software. (2) KNN/Read-Across (V1.0.0), a Read-Across model. (3) NIC (V1.0.0), a quantitative QSAR model based on a neural network. (4) IRFMN (V1.0.0), a quantitative model. (5) IRFMN/Combase (V1.0.0), a quantitative model specific for biocides, developed by IRFMN for the Combase EU project. (6) EPA (V1.0.7), a QSAR model for fathead minnow LC50 (96 h) based on multiple linear regression; the model extends the original model implemented in the T.E.S.T. software. (7) KNN/IRFMN (V1.1.0), a KNN model for fathead minnow.

VEGA provides two models to predict daphnia acute toxicity: (1) EPA (V1.0.7), a QSAR model based on multiple linear regression; the model extends the original model implemented in the T.E.S.T. software. (2) DEMETRA (V1.0.4), a hybrid model built on two ANNs and a single PLS model, for pesticides.

Two sets of fragments have been implemented in VEGA and are freely available: Functional Groups, which account for 154 chemical groups, and Atom-Centred Fragments (ACF), comprising 115 fragments, each corresponding to a type of atom with different connectivity. The software that analyses the chemical space checks for the presence of the above-mentioned Functional Groups and ACF, then reports, for each of these chemical features, the total number of matches, the number of matches in each class, and its percentage. The overall reliability of the prediction is measured by combining statistical values, elements of case-based reasoning, and possibly the presence of active substructures. The possible reasons for concern are highlighted. All of these considerations are weighted and summed into an index (between 0 and 1) called the Applicability Domain Index (ADI) [26].

All seven models for fish acute toxicity and both models for daphnia acute toxicity were used with an integrated method (Fig. 1), except that experimental values were not used. Predictions with good reliability were deemed to be inside the AD; all others were deemed to be outside the AD.

Fig. 1 Recommended integrated assessment strategy with different models in VEGA when predicting fish acute toxicity

Danish Q.D.

Danish Q.D. includes nearly all organic single constituent substances that were pre-registered or registered under REACH (around 80,000). The database was developed by the Technical University of Denmark. The endpoints are modelled in two software systems (Leadscope and SciQSAR), and an overall battery prediction is made to reduce “noise” from the individual model estimates and thereby improve accuracy and broaden the AD [27, 40].

Leadscope is a software program for systematic sub-structural analysis of a chemical using predefined structural features stored in a template library, training set-dependent generated structural features (scaffolds) and calculated molecular descriptors. Leadscope has a default automatic descriptor selection procedure. This procedure selects the top 30% of the descriptors (structural features and molecular descriptors) according to a χ² test for a binary variable, or the top and bottom 15% of descriptors according to a t-test for a continuous variable. After selection of descriptors, the program performs partial least squares (PLS) regression for a continuous response variable, or partial logistic regression for a binary response variable, to build a predictive model.

The SciQSAR software provides over 400 built-in molecular descriptors such as connectivity indices, electrotopological (atom E and HE-state) indices, and other descriptors. For continuous data, regression analysis is used to build the predictive model, and a number of different regression methods are available such as regression on principal components and PLS.

The Battery result was used preferentially. If no Battery result was given, the lower of the Leadscope and SciQSAR toxicity values was selected for validation.
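The selection rule above amounts to a simple fallback, sketched here for illustration. The function and argument names are hypothetical, and the sketch assumes that the values are LC50 concentrations in mg/L, so that the "lowest toxicity value" is the lowest LC50 (the most conservative estimate).

```python
from typing import Optional


def select_danish_qd_value(battery: Optional[float],
                           leadscope: Optional[float],
                           sciqsar: Optional[float]) -> Optional[float]:
    """Pick the value used for validation from the three Danish Q.D. outputs.

    Assumption: all values are LC50 in mg/L and None means "no prediction".
    """
    if battery is not None:
        return battery  # the Battery result takes priority
    candidates = [v for v in (leadscope, sciqsar) if v is not None]
    # Fall back to the lower of the two single-model values, if any exist.
    return min(candidates) if candidates else None


print(select_danish_qd_value(None, 12.0, 3.5))  # falls back to the lower value
```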

Trend Analysis and Read Across

OECD QSAR Toolbox finds structurally and mechanistically defined analogues and chemical categories, which serve as sources for Read Across, Trend Analysis and QSAR for filling in data gaps. QSAR Toolbox has multiple functions, such as identifying analogues of a chemical, retrieving the existing experimental results of those analogues, and filling in data gaps through Read Across, Trend Analysis or QSAR.

The Read Across and Trend Analysis predictions were made by collecting, for each PCC, a set of test data for chemicals considered to be in the same category as the target molecule. The category was first defined using the “Organic functional groups (nested)” categorization method, and the analogues of each PCC were identified. Then all available experimental 48-h LC50 values for daphnia and 96-h LC50 values for Actinopterygii for the identified analogues were retrieved from the selected databases (Aquatic ECETOC, Aquatic Japan MoE, Aquatic OASIS, ECHA REACH, ECOTOX and Food TOX Hazard EFSA). Finally, Read Across and Trend Analysis were run with the internal standardized workflow. By default, for Read Across the QSAR Toolbox averages the results of the five “nearest” analogues (nearest in log10 Kow in this case) to estimate the result for the target chemical. The AD of each prediction was recorded as automatically assessed by combining the log10 Kow range and organic functional group similarity: the log10 Kow must be within the range of all collected analogues, and the organic functional groups must be among those of all collected analogues.
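The Read Across default described above (averaging the five analogues nearest in log10 Kow) and the log10 Kow part of the AD check can be illustrated as follows. This is a simplified sketch and not the QSAR Toolbox implementation: the function name and data layout are hypothetical, the averaging is assumed to be an arithmetic mean of log10 LC50 values, and the functional group part of the AD check is omitted.

```python
from statistics import mean
from typing import List, Tuple


def read_across(target_log_kow: float,
                analogues: List[Tuple[float, float]],
                k: int = 5) -> Tuple[float, bool]:
    """Estimate log10 LC50 for a target from its nearest analogues.

    analogues: (log10 Kow, log10 LC50) pairs for the category members.
    Returns (prediction, in_log_kow_range), where the flag covers only the
    log10 Kow range criterion of the AD.
    """
    if not analogues:
        raise ValueError("at least one analogue is required")
    # Rank analogues by distance in log10 Kow and keep the k nearest.
    nearest = sorted(analogues, key=lambda a: abs(a[0] - target_log_kow))[:k]
    prediction = mean(lc50 for _, lc50 in nearest)
    kows = [kow for kow, _ in analogues]
    in_range = min(kows) <= target_log_kow <= max(kows)
    return prediction, in_range


# Made-up analogue data, for illustration only.
data = [(1.0, 2.0), (2.0, 1.0), (3.0, 0.0), (4.0, -1.0), (5.0, -2.0), (9.0, -5.0)]
print(read_across(3.0, data))
```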

Statistical analysis

Two types of methods were used to quantify the performance of all the models for PCCs: qualitative and quantitative assessment. Only qualitative assessment was used for the five models applied to NCs, as most of the NCs were not harmful and only a limit test result of 96-h LC50 > 100 mg/L was given.

Qualitative assessment only requires chemicals to be classified according to their toxicity values (Table 2). The classes correspond to the toxicity classes described in the Globally Harmonized System of Classification and Labelling of Chemicals (GHS) [41], whose classification criteria are accepted by most countries as regulatory classes. In the qualitative assessment, the experimental and predicted data were classified into four classes based on the United Nations GHS criteria (Table 2). If the predicted and experimental values fall in the same regulatory category, the prediction is considered accurate regardless of the specific values.

Table 2 Classification criteria of acute toxicity according to GHS
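As a rough illustration of the qualitative comparison, the sketch below assigns an LC50 value to one of four classes and checks whether the experimental and predicted values agree. Table 2 is not reproduced here, so the cut-offs (1, 10 and 100 mg/L) are taken from the GHS acute aquatic hazard categories and are an assumption about the table's content; the function names and class labels are hypothetical.

```python
def ghs_acute_class(lc50_mg_per_l: float) -> str:
    """Map an acute LC50 (mg/L) to a GHS-style acute aquatic class.

    Cut-offs follow the GHS acute categories (assumed to match Table 2):
    Acute 1: <= 1, Acute 2: > 1 to <= 10, Acute 3: > 10 to <= 100 mg/L.
    """
    if lc50_mg_per_l <= 0:
        raise ValueError("LC50 must be positive")
    if lc50_mg_per_l <= 1:
        return "Acute 1"
    if lc50_mg_per_l <= 10:
        return "Acute 2"
    if lc50_mg_per_l <= 100:
        return "Acute 3"
    return "Not classified"


def same_class(experimental: float, predicted: float) -> bool:
    """A prediction counts as correct when both values share a class."""
    return ghs_acute_class(experimental) == ghs_acute_class(predicted)


print(ghs_acute_class(0.5), same_class(2.0, 9.0), same_class(9.0, 11.0))
```

Note that two values can differ by almost a factor of 10 and still agree (2 and 9 mg/L), while two close values on either side of a cut-off disagree (9 and 11 mg/L), which is one reason the qualitative and quantitative rankings of the tools can differ.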

Quantitative assessment requires an exact toxicity value in order to obtain the risk quotient [42]. In the quantitative assessment, the difference between predicted and measured LC50 values was analysed using difference factors of 10, 100 and 1000.
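The "within a factor of N" criterion can be written down explicitly. The sketch below is one straightforward reading of it, in which the ratio of the larger to the smaller LC50 must not exceed the factor; the function names are hypothetical.

```python
def fold_difference(experimental: float, predicted: float) -> float:
    """Ratio of the larger to the smaller LC50 (always >= 1)."""
    if experimental <= 0 or predicted <= 0:
        raise ValueError("LC50 values must be positive")
    return max(experimental, predicted) / min(experimental, predicted)


def within_factor(experimental: float, predicted: float,
                  factor: float = 10.0) -> bool:
    """True when the prediction lies within `factor` of the measured value,
    regardless of whether toxicity is over- or underestimated."""
    return fold_difference(experimental, predicted) <= factor


# A measured LC50 of 0.1 mg/L against a predicted 5 mg/L is a 50-fold difference.
print(within_factor(0.1, 5.0, 10), within_factor(0.1, 5.0, 100))
```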

A number of summary statistics were calculated to compare model performance. The coefficient of determination (R2), the coefficient of determination within the AD (R2AD), the root mean square error (RMSE), and the percentage accuracy between predicted and measured toxicity were calculated with Microsoft Excel. IBM SPSS Statistics (V19) was used to obtain the frequency distribution of the difference between log10 experimental LC50 and log10 estimated LC50.
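For readers who want to reproduce statistics of this kind outside a spreadsheet, a minimal sketch follows. It assumes that RMSE is computed on log10 LC50 values (consistent with the log10 units reported in the Results) and that R2 is the squared Pearson correlation between log10 experimental and log10 predicted values; the text does not state the exact formula used, so the latter is an assumption, and the function names are hypothetical.

```python
import math
from typing import Sequence


def log10_rmse(experimental: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error of log10(predicted) - log10(experimental)."""
    errors = [math.log10(p) - math.log10(e)
              for e, p in zip(experimental, predicted)]
    return math.sqrt(sum(err ** 2 for err in errors) / len(errors))


def pearson_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Made-up LC50 values in mg/L, for illustration only.
exp_lc50 = [0.5, 4.0, 30.0, 120.0]
pred_lc50 = [1.0, 2.0, 60.0, 100.0]
print(log10_rmse(exp_lc50, pred_lc50))
print(pearson_r2([math.log10(v) for v in exp_lc50],
                 [math.log10(v) for v in pred_lc50]))
```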

Total accuracy was calculated as:

$$\mathrm{Total}\ \mathrm{accuracy}=\frac{\mathrm{No}.\mathrm{of}\ \mathrm{correct}}{\mathrm{No}.\mathrm{of}\ \mathrm{all}-\mathrm{No}.\mathrm{of}\ \mathrm{missing}\ \mathrm{predictions}}\times 100\%$$

Similar to total accuracy, predictive power measures the total number of correct category assignments. However, lack of prediction was treated as an incorrect assignment:

$$\mathrm{Predictive}\ \mathrm{power}=\frac{\mathrm{No}.\mathrm{of}\ \mathrm{correct}}{\mathrm{No}.\mathrm{of}\ \mathrm{all}}\times 100\%$$
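The two formulas differ only in how missing predictions enter the denominator, as the short sketch below shows. The function names are hypothetical. The worked example uses the daphnia PCC figures reported for KATE in the Results (37 chemicals, 12 without a prediction); the count of 21 correct assignments is inferred from the reported percentages rather than stated in the text.

```python
def total_accuracy(n_correct: int, n_all: int, n_missing: int) -> float:
    """Percentage of correct predictions among chemicals that were predicted."""
    return 100.0 * n_correct / (n_all - n_missing)


def predictive_power(n_correct: int, n_all: int) -> float:
    """Percentage of correct predictions among all chemicals;
    a missing prediction counts as incorrect."""
    return 100.0 * n_correct / n_all


# KATE, daphnia, 37 PCCs, 12 missing; 21 correct is an inferred count.
print(round(total_accuracy(21, 37, 12)))  # 84
print(round(predictive_power(21, 37)))    # 57
```

A tool with a strict AD or a narrow training set can therefore score well on total accuracy while scoring poorly on predictive power.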

Results

Statistical distribution of experimental values

The 37 PCCs assessed in this study represent a diverse array of commercial substances. They include olefins, nitrobenzene, perfluorinated and polyfluoro compounds, halogenated hydrocarbons, halogenated benzenes, organophosphates, phenols, aldehydes, phthalates and polycyclic aromatic hydrocarbons. The experimental LC50 values of the 37 chemicals cover all regulatory categories (Fig. 2a and b). 43% of the chemicals are very toxic. Very toxic, toxic and hazardous chemicals together account for 92 and 86% of all the chemicals for daphnia and fish acute toxicity, respectively.

Fig. 2 Distribution of acute toxicity of experimental values (mg/L). a 48-h LC50 of daphnia for PCCs. b 96-h LC50 of fish for PCCs. c 48-h LC50 of daphnia for NCs. d 96-h LC50 of fish for NCs.

The NCs assessed in this study include almost all of the organic functional groups. They are much more complex, as many of them have two or more functional groups, and the most complex NC has 12 functional groups. The overall toxicity of the NCs is lower than that of the PCCs, as shown in Fig. 2c and d. Non-toxic NCs account for 57 and 65% of the total NCs for daphnia and fish, respectively.

Acute toxicity of daphnia

Experimental and predicted toxicity values to daphnia for the 37 PCCs are shown in Table 3; the results for NCs can be found in the section “Availability of data and materials”.

Table 3 Experimental and predicted toxicity values to daphnia for the 37 PCCs

Models performance across the entire data set

Model performance was evaluated on the entire set of 37 PCCs and 42 NCs. The performance metrics for all models tested in this evaluation for acute toxicity to daphnia are summarized in Table 4.

Table 4 Tool performance and comparison summary statistics to 48 h-LC50 of daphnia based on entire dataset

Prediction to 37 PCCs

In the qualitative assessment, based on classification of the entire set of 37 PCCs into the four toxicity classes, KATE had a total accuracy of 84%, the highest among all of the tested models. However, the predictive power of KATE decreased to 57% because it did not predict 12 of the PCCs, the most among all of the tested models. ECOSAR predicted all of the PCCs, so both its total accuracy and its predictive power were 65%. Based on total accuracy, the tested tools can be ranked in the following order from highest to lowest performers: KATE > ECOSAR > T.E.S.T. > Danish Q.D. > VEGA > Read Across > Trend Analysis. KATE showed excellent performance, as only five PCCs were predicted incorrectly.

In the quantitative assessment, based on comparison of the LC50 values of PCCs provided by the models, KATE and ECOSAR showed better performance, with accuracies of 80 and 76%, respectively, when predictions had to fall within a factor of 10 of the measured LC50. All of the models achieved an accuracy of 80% when the difference between measured and predicted toxicity was within a factor of 100, except for Trend Analysis, which reached only 55%. The R2 values in both the qualitative and quantitative assessments further indicate that KATE had the best performance.

Prediction to 42 NCs

In the qualitative assessment, based on classification of the entire set of 42 NCs into the four toxicity classes, total accuracy and predictive power decreased dramatically compared with those for PCCs. Danish Q.D. and KATE could not predict 18 and 22 chemicals, respectively, which is relatively higher than for the other models. These results indicate that the models performed poorly for NCs and that their predictive power for NCs is limited.

Model performance within AD

Robust and relevant AD definition is essential for model performance. Model performance within ADs is shown in Table 5.

Table 5 Model performance to 48 h-LC50 of daphnia for chemicals within each applicability domain

Prediction to 37 PCCs

ECOSAR had the most chemicals inside the AD, with 27 of the 37 PCCs. VEGA had the fewest chemicals inside the AD, with 10 of the 37 tested chemicals, reflecting a rigorous AD assessment mechanism.

In the qualitative assessment, the accuracy of VEGA increased slightly from 51 to 60% after considering the AD, and that of T.E.S.T. remained at 64%. The accuracies of the other five tools did not increase inside the AD.

The accuracies and R2AD of Danish Q.D., Read Across and KATE decreased after considering the AD, because some correctly predicted PCCs were excluded as being outside the AD. Danish Q.D., Read Across and KATE assess the AD by the range of log10 Kow and structural classes, and these methods are not as rigorous as that used by VEGA. A similar phenomenon was reported by Melnikov et al. [43], who found that the total accuracy of KATE decreased from 58 to 46% when the analysis was limited to compounds within its AD.

In the quantitative assessment, the performance of all tools improved inside the AD. VEGA showed the best performance, with 100% accuracy when predictions had to fall within a factor of 10 of the measured LC50. VEGA also had the lowest RMSE (0.48 log10 units) and the highest R2AD (0.82). Read Across and Trend Analysis had the worst predictive ability according to all of the indicators: accuracy, RMSE and R2AD.

In general, based on the accuracies in the quantitative assessment, the tested tools for daphnia can be ranked in the following order, from the highest to the lowest performers: VEGA > KATE > ECOSAR > T.E.S.T. > Danish Q.D. > Trend Analysis > Read Across.

Prediction to 57 NCs

More NCs were outside the AD or lacked a prediction for Danish Q.D., VEGA and KATE than for ECOSAR and T.E.S.T. The accuracies of ECOSAR and Danish Q.D. inside the AD remained as high as for PCCs, whereas those of T.E.S.T., VEGA and KATE were lower, at 29, 30 and 40%, respectively.

Figure 3 shows the error distribution of the daphnia toxicity predictions for PCCs and NCs with respect to under- and overestimation. Positive errors indicate that the predicted LC50 is above the experimental LC50, so that toxicity is underestimated. Considering the error between the log10 LC50 of the experimental value and the log10 LC50 of the estimated value provided by each model, over- and underestimation of daphnia toxicity by ECOSAR, T.E.S.T., Danish Q.D. and KATE are more or less similarly distributed. Daphnia toxicity predicted by VEGA appears to be overestimated, whereas that predicted by Read Across and Trend Analysis is significantly underestimated. Underestimated toxicity does not meet the principle of a reasonable worst case.

Fig. 3 Error distribution (predicted – experimental) of daphnia toxicity categories. Positive errors indicate that the predicted LC50 is above the experimental LC50 and toxicity was underestimated. The datasets for Read Across and Trend Analysis were based on PCCs; the others were based on both PCCs and NCs. Mean is the average error, SD is the standard deviation, and N is the number of chemicals.

Acute toxicity of fish

Experimental and predicted toxicity results to fish for the 37 PCCs are shown in Table 6; the results for the 86 NCs can be found in the section “Availability of data and materials”.

Table 6 Experimental and predicted toxicity results to fish for the 37 PCCs

Model performance across the entire test set

Model performance was first evaluated on the entire dataset, regardless of the AD, to assess the utility of each tool for any new or existing chemical. The performance metrics for all models tested in this evaluation for acute toxicity to fish are summarized in Table 7.

Table 7 Tool performance and comparison summary statistics to 96 h-LC50 of fish based on entire dataset

Prediction to 37 PCCs

In the qualitative assessment, based on the predictive power for classification of the entire dataset into the four toxicity categories, all models except ECOSAR performed poorly, with accuracies of no more than 50%. ECOSAR had the highest predictive power, with an accuracy of 54% and all 37 chemicals predicted. The performance of ECOSAR for fish was similar to that for daphnia. The next highest total accuracies were those of Danish Q.D., T.E.S.T. and VEGA, at 50, 49 and 47%, respectively. Read Across and Trend Analysis had the lowest total accuracies, as was the case for daphnia. The total accuracy of KATE was only 36%; its performance in predicting toxicity to fish was far lower than in predicting toxicity to daphnia.

In the quantitative assessment, comparing the log10 LC50 of the experimental and predicted values, VEGA and T.E.S.T. showed excellent predictive ability, achieving an accuracy of 80% when the deviation between predicted and experimental values was limited to a factor of 10. KATE and ECOSAR followed, with accuracies of 71 and 68%, respectively, at the same limit. The R2 values show the same tendency as the accuracies.

Prediction to 86 NCs

In the qualitative assessment based on classification of all 86 NCs into the four toxicity classes, total accuracies decreased compared with the predictions for PCCs. As T.E.S.T., Danish Q.D. and KATE could not predict 25, 45 and 49 NCs, respectively, the predictive power of these three tools was the lowest. Both the total accuracy and the predictive power of VEGA were about 20%, a dramatic decrease compared with the predictions for PCCs. ECOSAR had the highest total accuracy and predictive power among the tools; however, its accuracy was still only about 40%.

Model performance within the AD

Model performance within AD to fish toxicity is shown in Table 8.

Table 8 Tool performance to 96 h-LC50 of fish for chemicals within each applicability domains

Prediction to 37 PCCs

VEGA, Read Across and Trend Analysis have the largest numbers of PCCs inside the AD, with 29, 31 and 30 tested chemicals, respectively. T.E.S.T. and KATE have the fewest chemicals inside the AD.

In the qualitative assessment based on classification into the four toxicity categories, ECOSAR, Danish Q.D. and VEGA have the highest performance, with R2AD of 0.66, 0.58 and 0.57 and accuracies of 55, 58 and 55%, respectively. The tested tools for fish can be ranked in the following order, from the highest to the lowest performers: ECOSAR = Danish Q.D. = VEGA > T.E.S.T. > KATE > Read Across > Trend Analysis. The prediction accuracies inside the AD are not significantly improved in comparison with the accuracies over the entire dataset, which do not consider the AD. A similar pattern was observed in the prediction for daphnia.

In the quantitative assessment, four models (VEGA, KATE, ECOSAR and T.E.S.T.) reach prediction accuracies greater than 80% when the absolute error is limited to a factor of 10. VEGA reaches the highest accuracy of 90%, a significant increase after considering the AD. RMSE is a measure of accuracy: the lower the RMSE, the higher the prediction accuracy. ECOSAR has the best RMSE (0.71 log10 units) and Trend Analysis has the worst (2.09 log10 units). The RMSEs of ECOSAR, T.E.S.T., VEGA and KATE are all below 1 log10 unit, which places them at the same performance level.

In general, based on the predictive power in the quantitative assessment, the tested tools for fish can be ranked in the following order, from the highest to the lowest performers: VEGA > ECOSAR = KATE = T.E.S.T. > Danish Q.D. > Read Across > Trend Analysis.
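The two quantitative metrics used here can be sketched as follows. This is a minimal illustration, assuming that accuracy is the fraction of chemicals whose predicted LC50 lies within a factor of 10 of the experimental LC50 and that RMSE is computed on log10 LC50 values. The function names and the LC50 values are hypothetical and are not taken from the study dataset.

```python
import math


def log10_errors(predicted, experimental):
    """Signed errors log10(predicted) - log10(experimental); LC50 in mg/L."""
    return [math.log10(p) - math.log10(e) for p, e in zip(predicted, experimental)]


def accuracy_within_fold(predicted, experimental, fold=10.0):
    """Fraction of chemicals predicted within `fold` times the measured LC50."""
    limit = math.log10(fold)
    errors = log10_errors(predicted, experimental)
    return sum(abs(err) <= limit for err in errors) / len(errors)


def rmse_log10(predicted, experimental):
    """Root mean square error in log10 units."""
    errors = log10_errors(predicted, experimental)
    return math.sqrt(sum(err ** 2 for err in errors) / len(errors))


# Hypothetical LC50 values in mg/L, for illustration only
experimental = [1.0, 10.0, 100.0, 0.5]
predicted = [2.0, 150.0, 80.0, 0.4]

acc = accuracy_within_fold(predicted, experimental)  # 3 of 4 within 10-fold
rmse = rmse_log10(predicted, experimental)
print(f"accuracy = {acc:.2f}, RMSE = {rmse:.2f} log10 units")
```

Working in log10 units makes the 10-fold criterion a simple threshold of 1 on the absolute error, which is why an RMSE below 1 log10 unit is a natural reference point.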

Prediction to 86 NCs

The accuracies inside the AD of ECOSAR, T.E.S.T., Danish Q.D. and KATE are the same as for the predictions for PCCs, whereas the accuracy inside the AD of VEGA decreased from 55% for PCCs to 36% for NCs. The lower accuracy of VEGA for NCs is probably because most of the measured results of the NCs were non-toxic (LC50 > 100 mg/L), whereas the VEGA prediction used the lowest value among the 7 models included in VEGA, which increased the probability of a chemical being assigned to a toxic category.

Figure 4 shows the distribution of the 96 h-LC50 fish toxicity predictions with respect to under- and overestimation. Positive errors indicate that the predicted LC50 is above the experimental LC50, so toxicity is underestimated. In terms of the prediction error between the log10 LC50 of the experimental value and the log10 LC50 estimated by the model, over- and underestimation of fish toxicity by Danish Q.D. are more or less evenly distributed. Fish toxicity predicted by ECOSAR, T.E.S.T., VEGA and KATE appears to be overestimated more often than underestimated, which is consistent with the principle of a reasonable worst case.
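The error statistics reported with the figure (Mean, SD and N) and the split between over- and underestimation can be sketched as below. This is only an illustration under stated assumptions: errors are taken as log10(predicted) minus log10(experimental), SD is the sample standard deviation, and the input values are hypothetical.

```python
import math
import statistics


def error_summary(predicted, experimental):
    """Summarise signed log10 errors (predicted - experimental)."""
    errors = [math.log10(p) - math.log10(e) for p, e in zip(predicted, experimental)]
    return {
        "N": len(errors),
        "mean": statistics.mean(errors),
        "SD": statistics.stdev(errors),  # sample SD, an assumption
        # Negative error: predicted LC50 below measured, toxicity overestimated
        "overestimated": sum(err < 0 for err in errors),
        # Positive error: predicted LC50 above measured, toxicity underestimated
        "underestimated": sum(err > 0 for err in errors),
    }


# Hypothetical LC50 values in mg/L, for illustration only
experimental = [10.0, 10.0, 10.0, 10.0]
predicted = [1.0, 100.0, 1.0, 1.0]

summary = error_summary(predicted, experimental)
print(summary)
```

A negative mean error indicates a tool that tends to predict lower LC50 values than measured, i.e. the conservative direction.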

Fig. 4

Error distribution (predicted – experimental) of fish toxicity categories. Positive errors indicate that the predicted LC50 is above the experimental LC50, i.e. toxicity was underestimated. The datasets for Read Across and Trend Analysis were based on PCCs; the others were based on both PCCs and NCs. Mean is the average error, SD is the standard deviation, and N is the number of chemicals.

Discussion

Methods to assess AD

All models provide AD assessments that indicate whether predictions fall inside or outside the AD of the model. Most of these models (ECOSAR, KATE, Read Across and Trend Analysis) assess the AD directly with the range of log10 Kow. In addition to log10 Kow, these models also consider structural similarity. The ECOSAR package provides warnings when the model prediction is above the solubility limit of the substance or when the log10 Kow of the substance is outside the AD, which is helpful for non-expert users.

T.E.S.T. does not report the AD of results directly. However, T.E.S.T. has an AD for each method and a final AD in which predictions must be made by at least 2 methods for a consensus value to be used.

Although there is no criterion for judging the validity or invalidity of the predicted data, predicted results within the AD are preferred. The prediction accuracy inside the AD is not obviously improved over the total accuracy (not considering the AD) in the qualitative assessment, but it improved significantly in the quantitative assessment.

There is no single, absolute AD assessment method for a given model. Generally, the broader the definition of the AD, the lower the accuracy. This principle is confirmed in the prediction for daphnia, where VEGA has the largest number of PCCs outside the AD or without a prediction, yet performs best. In the quantitative evaluation within the AD with the 10-fold factor, the accuracy of VEGA is the highest among all of the models for both daphnia and fish toxicity, at 100 and 90%, respectively. The high accuracy of VEGA may be attributed to its detailed definition of the AD.

VEGA assesses the AD through an overall reliability, which is a relatively complex mechanism. The overall reliability of a prediction is measured quantitatively, with a value ranging from 0 to 1, by considering five factors: the Global AD Index, the similarity index of molecules with known experimental values, the accuracy index of predictions for similar molecules, the concordance index for similar molecules, and the index of the Atom Centered Fragments similarity check. These considerations are weighted and summed into the reliability of a model.

Difference between classification and quantitative assessment

The qualitative method has a certain randomness for substances near a classification boundary: such substances can easily be assigned to two different toxicity classes. Therefore, assessing accuracy with the qualitative toxicity classification method is less scientifically meaningful than with quantitative methods. The current aquatic acute classification scheme is based on 10-fold steps in toxicity values. The quantitative method with a 10-fold factor is similar to the toxicity classification method, but it overcomes the uncertainty at the boundary points and is more meaningful for accuracy evaluation. The results also show that the accuracy obtained with the quantitative method is higher than that obtained with the qualitative method. Therefore, the results of the quantitative method are a good indicator of the performance of the tested tools.
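The boundary effect can be illustrated with a short sketch. It assumes four classes separated at 1, 10 and 100 mg/L; the labels "toxic-3" and "non-toxic" follow the wording used in this article, while "toxic-1" and "toxic-2" and the exact handling of the boundary values are assumptions made for the example.

```python
import math


def toxicity_class(lc50_mg_per_l):
    """Assign one of four classes using assumed 10-fold boundaries (mg/L)."""
    if lc50_mg_per_l <= 1:
        return "toxic-1"
    if lc50_mg_per_l <= 10:
        return "toxic-2"
    if lc50_mg_per_l <= 100:
        return "toxic-3"
    return "non-toxic"


def within_tenfold(predicted, experimental):
    """True when the prediction is within a factor of 10 of the measurement."""
    return abs(math.log10(predicted) - math.log10(experimental)) <= 1.0


# Hypothetical pair of values straddling the 10 mg/L boundary
experimental, predicted = 9.0, 11.0

same_class = toxicity_class(experimental) == toxicity_class(predicted)
close = within_tenfold(predicted, experimental)
print(same_class, close)  # the classes differ although the values nearly agree
```

In this example the prediction counts as wrong under the classification criterion and as correct under the 10-fold criterion, even though the two values differ by only about 20%.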

Integrated assessment strategy when predicting the fish acute toxicity using VEGA

In the quantitative evaluation of predictions of both daphnia and fish toxicity inside the AD, VEGA performs very well, with the highest accuracy. However, seven models in VEGA can be used to predict fish acute toxicity, and some confusion remains even when the internal reliability is given. For example, several models may give the same reliability with different AD indices. In addition, the SarPy/IRFMN model is a classification model, which gives a toxicity class instead of a toxicity value. Therefore, it is crucial to choose the most rational value among the different models, and to make use of the toxicity class provided by the SarPy/IRFMN model in quantitative effect assessment.

To take full advantage of VEGA, we propose an integrated assessment strategy for fish acute toxicity, as shown in Fig. 1. This integrated assessment strategy was used in this study, except that experimental values were not used, and it proved to be useful.

  • Step 1: if an experimental value exists, it should be used; otherwise go to step 2.

  • Step 2: if a model shows a reliability of 3 stars with all ADIs = 1, its value should be used; otherwise go to step 3 in either of the following cases:

    • If more than one model has 3 stars, or

    • If the models have only 2 stars or 1 star.

  • Step 3: the model with the highest global ADI should be given priority; otherwise go to step 4.

  • Step 4: the model whose other ADIs outperform those of the other models should be given priority.

Notes: (1) The lowest toxicity value should be used when all ADIs are the same; (2) the toxicity class given by the SarPy/IRFMN model is transformed to its lower limit if needed, e.g. toxic-3 (between 10 and 100 mg·L−1) is transformed to 10.1 mg·L−1.
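The steps and notes above can be read as a selection procedure, sketched below. This is one possible reading and not the authors' implementation: the `ModelResult` structure, the function name, the use of the sum of the remaining ADIs as the step 4 tie-break, and the interpretation of "lowest toxicity value" as the lowest LC50 are all assumptions. Only the toxic-3 lower limit of 10.1 mg/L comes from the notes.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Lower limit (mg/L) substituted for a class label from a classification model.
# Only the toxic-3 entry (10.1 mg/L) is taken from the notes above.
CLASS_LOWER_LIMIT = {"toxic-3": 10.1}


@dataclass
class ModelResult:
    name: str                        # hypothetical model label
    stars: int                       # reliability, 1 to 3 stars
    global_adi: float                # global ADI
    other_adis: List[float] = field(default_factory=list)
    lc50: Optional[float] = None     # predicted LC50 in mg/L, if quantitative
    tox_class: Optional[str] = None  # class label, if a classification model

    def value(self) -> float:
        """Predicted LC50, or the lower limit of the predicted class (note 2)."""
        if self.lc50 is not None:
            return self.lc50
        return CLASS_LOWER_LIMIT[self.tox_class]


def select_value(results: List[ModelResult],
                 experimental: Optional[float] = None) -> float:
    # Step 1: an experimental value takes precedence
    if experimental is not None:
        return experimental
    # Step 2: a single 3-star model with all ADIs equal to 1 is used directly
    best_stars = max(r.stars for r in results)
    top = [r for r in results if r.stars == best_stars]
    if best_stars == 3 and len(top) == 1:
        only = top[0]
        if only.global_adi == 1 and all(a == 1 for a in only.other_adis):
            return only.value()
    # Step 3: prefer the highest global ADI among the best-rated models
    best_global = max(r.global_adi for r in top)
    top = [r for r in top if r.global_adi == best_global]
    # Step 4: break ties with the remaining ADIs (sum used here, an assumption)
    if len(top) > 1:
        best_other = max(sum(r.other_adis) for r in top)
        top = [r for r in top if sum(r.other_adis) == best_other]
    # Note 1: if the models are still tied, use the lowest value
    return min(r.value() for r in top)


# Hypothetical predictions for one chemical
results = [
    ModelResult("model_a", 2, 0.85, [0.9, 0.8], lc50=5.0),
    ModelResult("model_b", 2, 0.85, [0.9, 0.8], lc50=3.0),
    ModelResult("classifier", 2, 0.70, [1.0, 1.0], tox_class="toxic-3"),
]
chosen = select_value(results)
print(chosen)
```

Here the two quantitative models tie on every index, so the lower of their two values is selected; the classification model is excluded at step 3 because its global ADI is lower.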

QSAR vs Chemical category approach

ECOSAR, KATE, T.E.S.T., Danish Q.D. and some of the models in VEGA are QSAR methods. Read Across and Trend Analysis are both category approaches. QSAR models and category approaches have similarities and differences.

The QSAR Toolbox addresses the application strategy for Read Across, Trend Analysis and QSAR models. Read Across is recommended for “qualitative” endpoints (e.g. skin sensitisation or mutagenicity) or “quantitative” endpoints (e.g. 96 h-LC50 for fish) if only a low number of analogues with experimental results are identified. Trend Analysis is the appropriate data-gap filling method for “quantitative” endpoints (e.g. 96 h-LC50 for fish) if a high number of analogues with experimental results are identified. QSAR models can be used to fill a data gap if no adequate analogues are found for a target chemical.

The issue of chemical-to-chemical similarity is not directly present in the case of QSAR models. In the case of QSAR models, the target chemical is in some way compared with the whole population of chemicals as the basis of the model, and this is addressed within the AD of the model. Thus, the comparison is done not between one chemical and another, or a few others, as in the category approach, but with the whole set of compounds used for the model.

The overall structure of a QSAR model is like a collection of read-across models, in which similar structures or fragments are collected and analysed statistically. Identification of similar structures in QSAR models is completed automatically. In the category approach, the evaluation of similar compound(s) is often done manually, typically by an expert, which is quite subjective.

The accuracies of the Read Across and Trend Analysis methods are the lowest among the tested tools. Read Across may be used when experimental data from high-quality databases are available for one or more substances that are sufficiently similar to the target chemical. However, it is difficult to assess the quality of experimental data. The predictions in this research were based on categories defined by organic functional groups and on the standardized workflow in QSAR Toolbox. Trend Analysis can be further refined by subcategorization, such as eliminating analogues that are dissimilar to the target chemical with respect to mode of action or elemental composition. Expert judgement is always used when removing outliers. Each expert is guided by his or her past experience; pieces of information may escape his or her knowledge; and the weight assigned to each element of evidence may differ and be expressed in a subjective way (likely, plausible, reasonable, level of concern, etc.), so the result is often difficult to replicate. Besides, the category approach is typically not strictly formalized, and depends on the data for similar chemicals available in the internal database [44].

Figure 5 shows a case study in which the fish 96-h LC50 of 2,4,6-tri-tert-butylphenol was predicted using Trend Analysis. Figure 5a shows the result of the standardized workflow in QSAR Toolbox without any manual intervention; an outlier can be identified easily. However, after deleting that obvious outlier, it remains unclear how to refine the result further, as shown in Fig. 5b. Thus, the professional judgement required by chemical category methods limits their application for regulatory purposes, especially in high-throughput screening for risk assessment. QSAR Toolbox also offers several other category methods, such as acute aquatic toxicity classification by ECOSAR, acute aquatic toxicity Mode of Action by OASIS, and acute aquatic toxicity classification by Verhaar (Modified). The performance of these category methods needs further assessment, and their use should be limited to experts. At the same time, more intelligent technologies, such as artificial intelligence, should be applied to the category approach.

Fig. 5

Case study on predicting the fish 96-h LC50 of 2,4,6-tri-tert-butylphenol using Trend Analysis. (a) Using the standardized workflow in QSAR Toolbox without any manual intervention; (b) using the standardized workflow after deleting an obvious outlier substance.

PCCs that were frequently predicted incorrectly

For two PCCs, daphnia toxicity was predicted incorrectly by more than 2 models (Table 9). The water solubility of anthracene is 0.047 mg·L−1, which is close to the experimental LC50 value of 0.0356 mg·L−1, indicating that the experimental LC50 value may have been measured incorrectly. Only one experimental value was available for anthracene, so its acute toxicity to daphnia needs further testing.

Table 9 PCCs for which daphnia toxicity was predicted incorrectly by more than 2 models

The experimental daphnia LC50 value of dibutyl phthalate used for validation is 0.5 mg·L−1, which was evaluated and accepted by ECHA. However, the values gathered from the databases of these models range from 1.4 to 3.7 mg·L−1. The LC50 values of dibutyl phthalate predicted by T.E.S.T., Danish Q.D., Read Across and Trend Analysis are 6.61, 17.5, 6.68 and 73.6 mg·L−1, respectively. Therefore, the “incorrect predictions” for dibutyl phthalate by T.E.S.T., Danish Q.D. and Read Across are caused by the difference in experimental values. Trend Analysis still gives a value that differs from the experimental value by more than 10-fold, so it does not perform well.

For the acute toxicity to fish, under the evaluation criterion of a 10-fold difference between the experimental value and the predicted value, 6 substances were predicted incorrectly by more than 3 models, as shown in Table 10.

Table 10 PCCs for which fish toxicity was predicted incorrectly by more than 2 models
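The arithmetic behind the dibutyl phthalate case can be checked with a few lines. The predicted values, the accepted value of 0.5 mg·L−1 and the database range of 1.4 to 3.7 mg·L−1 are those quoted above; comparing each prediction with the nearest value inside the range is an assumption made for this sketch, and the helper names are hypothetical.

```python
# Values quoted in the text, all in mg/L
predicted = {
    "T.E.S.T.": 6.61,
    "Danish Q.D.": 17.5,
    "Read Across": 6.68,
    "Trend Analysis": 73.6,
}
accepted_value = 0.5          # value evaluated and accepted by ECHA
database_range = (1.4, 3.7)   # values gathered from the model databases


def fold_difference(a, b):
    """Ratio of the larger to the smaller of two positive values."""
    return max(a, b) / min(a, b)


def nearest_in_range(value, low, high):
    """Closest point to `value` inside [low, high]."""
    return min(max(value, low), high)


vs_accepted = {m: fold_difference(p, accepted_value) for m, p in predicted.items()}
vs_range = {
    m: fold_difference(p, nearest_in_range(p, *database_range))
    for m, p in predicted.items()
}
outside = [m for m, f in vs_range.items() if f > 10]

for model in predicted:
    print(f"{model}: {vs_accepted[model]:.1f}-fold vs 0.5, "
          f"{vs_range[model]:.1f}-fold vs database range")
```

Against 0.5 mg·L−1 all four predictions exceed the 10-fold limit, whereas against the database range only Trend Analysis does. The outcome for Danish Q.D. depends on the reference point: relative to the lower end of the range (1.4 mg·L−1) its prediction is 12.5-fold higher.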

Among them, five substances have low water solubility, below 1 mg·L−1. In principle, the experimental LC50 value of a substance should be lower than its water solubility. The water solubilities of musk xylene, 2,4,6-tri-tert-butylphenol and bis(2-ethylhexyl) phthalate show no significant difference from their experimental LC50 values. The water solubilities of heptadecafluorooctanesulfonic acid and pentadecafluorooctanoic acid are much lower than their experimental LC50 values, indicating incorrect experimental data. In fact, substances with low water solubility are classed as “difficult to test”, and the aquatic toxicity of these difficult substances is often tested improperly even under GLP conditions. Hence, special caution should be given to substances with low water solubility when developing models. Likewise, the validation and comparison of models using these PCCs with low water solubility carry uncertainty. As a result, some of the differences between model predictions and measured toxicity values can be partially attributed to the measured toxicity values themselves being less-than-perfect indicators of true toxicity. The errors associated with the measured toxicity values, however, should not affect our conclusions regarding the relative performance of the tested models (their rank orders), particularly in the common PCCs comparison, because all models are being evaluated against the same measured toxicity values.

Danish Q.D. produced large errors for heptadecafluorooctanesulfonic acid, perfluoro-1-octanesulfonyl fluoride, potassium perfluorooctane sulfonate and pentadecafluorooctanoic acid, with all LC50 values above 100,000 mg·L−1. There are two models in Danish Q.D.: Leadscope and SciQSAR. For heptadecafluorooctanesulfonic acid, for example, Leadscope predicts 0.00636 mg·L−1, which is much closer to its water solubility of 0.10 mg·L−1 than the SciQSAR prediction of 354,065 mg·L−1. The situation is similar for perfluoro-1-octanesulfonyl fluoride, potassium perfluorooctane sulfonate and pentadecafluorooctanoic acid. Therefore, the SciQSAR model in Danish Q.D. is not suitable for estimating the fish acute toxicity of perfluorinated compounds.

There are 54 experimental 96 h-LC50 fish values for benzene, ranging from 5.3 mg·L−1 to 542 mg·L−1, collected in QSAR Toolbox, covering 21 fish species within the Actinopterygii class. Many factors affect the experimental results, such as the test method, test conditions, species, or even experience in dealing with difficult substances.

It is difficult to select a fish species for comparing model performance, as the fish species in the training data of some models are not specified. Hence, this single-point comparison method has limitations when more than one experimental value exists. We therefore suggest that the distribution of multiple data, rather than a single value, should be considered when developing in silico models.
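The plausibility check described here, that a measured LC50 should not exceed water solubility, is simple to automate when screening a validation set. The sketch below is illustrative only: the substance names and values are hypothetical, and a strict comparison is used where a real screen might allow a tolerance for measurement uncertainty.

```python
def flag_above_solubility(records):
    """Return names of substances whose measured LC50 exceeds water solubility.

    Each record is (name, lc50_mg_per_l, solubility_mg_per_l).
    """
    return [name for name, lc50, solubility in records if lc50 > solubility]


# Hypothetical records, for illustration only
records = [
    ("substance_a", 0.3, 0.5),     # LC50 below solubility: plausible
    ("substance_b", 20.0, 0.1),    # LC50 far above solubility: suspect
    ("substance_c", 5.0, 1000.0),  # readily soluble: plausible
]

flagged = flag_above_solubility(records)
print(flagged)
```

Flagged entries are candidates for review of the test conditions rather than automatic exclusion.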
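The benzene example shows why a single reference value is limiting. Using only the two extremes quoted above (5.3 and 542 mg·L−1), the sketch below computes the spread of the measurements; the variable names are hypothetical.

```python
import math

lowest, highest = 5.3, 542.0  # mg/L, extremes of the 54 benzene values

fold_span = highest / lowest            # spread as a ratio
log_span = math.log10(fold_span)        # spread in log10 units

# The geometric midpoint is the value equally far (as a ratio) from both ends
midpoint = math.sqrt(lowest * highest)
within_tenfold_of_both = (midpoint / lowest <= 10) and (highest / midpoint <= 10)

print(f"span = {fold_span:.0f}-fold ({log_span:.2f} log10 units)")
print(f"midpoint = {midpoint:.1f} mg/L, within 10-fold of both ends: "
      f"{within_tenfold_of_both}")
```

Because the measurements span slightly more than 100-fold, no single predicted value can lie within 10-fold of both extremes, so whether a prediction counts as correct depends on which experimental value is chosen as the reference.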

Analysis of groups in NCs that were frequently predicted incorrectly

The functional groups of NCs that were predicted incorrectly by more than three models were analyzed. Among them, the functional groups with at least 2 occurrences are shown in Table 11.

Table 11 Groups in NCs that were frequently predicted incorrectly and the number of occurrences (≥2)

Of the 42 NCs in the daphnia toxicity prediction, 14 substances were predicted incorrectly by more than 3 models simultaneously. The most frequently occurring functional groups are aryl, aryl halide, and aromatic amine.

Of the 86 NCs in the fish toxicity prediction, 40 substances were predicted incorrectly by more than 3 models simultaneously. The most frequently occurring functional groups are aryl, aromatic amine, organic amide and thioamide, alkyl (hetero)arenes, ketone, diketone, aryl halide, ether moiety, and alkane branched with secondary carbon.

These functional groups should therefore receive more attention when developing in silico tools.

Outlook

In silico tools are developed on the basis of existing hazard information. However, over 350,000 chemicals and mixtures of chemicals have been registered for production and use [1]. These comprise various types of chemicals. As science and technology advance, synthesized or prepared chemicals are becoming more and more complicated. Existing in silico tools have not covered all types of chemicals. It is expected that most of the chemicals registered or used have not been tested for their hazards, and hence there are insufficient data to support the development of in silico tools. Besides, in silico tools mostly focus on individual compounds; it is difficult to identify the hazards of the many mixtures, polymers and UVCBs, of which there are over 75,000 [1].

Therefore, testing is still needed, whether to identify chemical hazards or to provide more information for developing in silico tools. In silico tools also need continuous development to improve their accuracy and to expand their AD to various substances, such as mixtures, polymers and UVCBs.

Conclusion

In this study, the performance of seven in silico methods (ECOSAR, T.E.S.T., Danish Q. D., VEGA, KATE, Read Across and Trend Analysis) for acute aquatic toxicity to daphnia and fish was evaluated and compared using PCCs and NCs datasets.

In the quantitative evaluation of PCCs with the criterion of a 10-fold difference between experimental and estimated values, the accuracy of VEGA is the highest among all of the models for both daphnia and fish acute toxicity, at 100 and 90%, respectively, after considering the AD. KATE, ECOSAR and T.E.S.T. perform at a similar level, with accuracies slightly lower than that of VEGA. The accuracy of Danish Q.D. is the lowest among the tools in which QSAR is the main mechanism. Read Across and Trend Analysis, run with the standardized workflow of QSAR Toolbox, perform the worst among all of the tested in silico tools, indicating that the chemical category approach should be limited to expert use at this stage. The main factors affecting the accuracies of in silico tools may be the distribution of multiple experimental data and the accuracy of experimental values for PCCs with poor water solubility.

The performance of the models for NCs, which are much more complex, is not as good as for PCCs, indicating that in silico tools need continuous development. Testing is still needed, whether to identify the hazards of NCs or to provide more information for developing in silico tools.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author or from: https://pan.baidu.com/s/19I6oMJDAhMDw2eatJ6EViA, with extraction code: t392.

Abbreviations

ACF:

Atom-Centered Fragments

AD:

Applicability domain

ADI:

Applicability Domain Index

ASTER:

Assessment Tools for the Evaluation of Risk

Danish Q. D.:

Danish QSAR Database

ECHA:

European Chemicals Agency

ECOSAR:

Ecological Structure Activity Relationships

EPISuite:

Estimation Programs Interface Suite

GHS:

The Globally Harmonized System of Classification and Labelling of Chemicals

GLP:

Good Laboratory Practice

KATE:

Kashinhou Tool for Ecotoxicity

Kow:

Octanol water partition coefficient

LC50:

Median lethal concentration

MEP:

Ministry of Environment Protection

NCs:

New Chemicals

OASIS:

Optimized Approach Based on Structural Indices Set

OECD:

Organization for Economic Co-operation and Development

PBT/vPvB:

Persistence, bioaccumulation, and toxicity/very persistent and very bioaccumulative

PCCs:

Priority Controlled Chemicals in China

PLS:

Partial least squares

QSAR:

(Quantitative) Structure–Activity Relationships

R2:

The correlation coefficient

R2AD:

Correlation coefficient of the AD

REACH:

Registration, Evaluation, Authorization and Restriction of Chemicals

RMSE:

Root mean square error

SMILES:

Simplified Molecular Input Line Entry System

T.E.S.T.:

Toxicity Estimation Software Tool

UVCBs:

Chemical Substances of Unknown or Variable Composition, Complex Reaction Products and Biological Materials

VEGA:

Virtual models for property Evaluation of chemicals within a Global Architecture

References

  1. 1.

    Wang Z, Walker GW, Muir DCG, Nagatani-Yoshida K. Toward a global understanding of chemical pollution: a first comprehensive analysis of national and regional chemical inventories. Environ Sci Technol. 2020;54(5):2575–84.

    CAS  Article  Google Scholar 

  2. 2.

    EC: Directive 2010/63/EU of the european parliament and of the council of 22. September 2010 on the protection of animals used for scientific purposes. In: Official Journal of the European Union. vol. L276; 2010: 33–76.

  3. 3.

    Sanderson H, Solomon K. Contaminants of emerging concern challenge ecotoxicology. Environ Toxicol Chem. 2009;28(7):1359–60.

    CAS  Article  Google Scholar 

  4. 4.

    Council NR. A framework to guide selection of chemical alternatives. Washington, DC: The National Academies Press; 2014.

    Google Scholar 

  5. 5.

    ECHA. The use of alternatives to testing on animals for the REACH regulation, European chemicals agency, third report under article 117(3) of the REACH regulation. Helsinki: European Chemicals Agency; 2017.

    Google Scholar 

  6. 6.

    Voutchkova AM, Osimitz TG, Anastas PT. Toward a comprehensive molecular design framework for reduced Hazard. Chem Rev. 2010;110(10):5845–82.

    CAS  Article  Google Scholar 

  7. 7.

    Myatt GJ, Ahlberg E, Akahori Y, Allen D, Amberg A, Anger LT, Aptula A, Auerbach S, Beilke L, Bellion P, et al. In silico toxicology protocols. Regul Toxicol Pharmacol. 2018;96:1–17.

    CAS  Article  Google Scholar 

  8. 8.

    OECD. Guidance on grouping of chemicals, second edition, OECD series on testing and assessment, no. 194. Paris: OECD Publishing; 2017.

    Google Scholar 

  9. 9.

    OECD. Guidance document on the validation of (quantitative) structure-activity relationship [(Q)SAR] models, OECD series on testing and assessment, no. 69. Paris: OECD Publishing; 2014.

    Google Scholar 

  10. 10.

    Cardoso-Silva J, Papageorgiou LG, Tsoka S. Network-based piecewise linear regression for QSAR modelling. J Comput Aided Mol Des. 2019;33:831–44.

    CAS  Article  Google Scholar 

  11. 11.

    Toropov AA, Raska I Jr, Toropova AP, Raskova M, Veselinovic AM, Veselinovic JB. The study of the index of ideality of correlation as a new criterion of predictive potential of QSPR/QSAR-models. Sci Total Environ. 2019;659:1387–94.

    CAS  Article  Google Scholar 

  12. 12.

    Lombardo A, Roncaglioni A, Benfenati E, Nendza M, Segner H, Jeram S, Pauné E, Schüürmann G. Optimizing the aquatic toxicity assessment under REACH through an integrated testing strategy (ITS). Environ Res. 2014;135:156–64.

    CAS  Article  Google Scholar 

  13. 13.

    Benfenati E, Diaza RG, Cassano A, Pardoe S, Gini G, Mays C, Knauf R, Benighaus L. The acceptance of in silicomodels for REACH: requirements, barriers, and perspectives. Chem Central J. 2011;5(1):58.

    CAS  Article  Google Scholar 

  14. 14.

    Feher M, Ewing T. Global or local QSAR: is there a way out? QSAR Combinatorial Sci. 2009;28(8):850–5.

    CAS  Article  Google Scholar 

  15. 15.

    Gramatica P. Principles of QSAR models validation: internal and external. QSAR Combinatorial Sci. 2007;26(5):694–701.

    CAS  Article  Google Scholar 

  16. 16.

    Nendza M, Muller M, Wenzel A. Discriminating toxicant classes by mode of action: 4. Baseline and excess toxicity. SAR QSAR Environ Res. 2014;25(5):393–405.

    CAS  Article  Google Scholar 

  17. 17.

    Cronin MTD, Schultz TW. Pitfalls in QSAR. J Mol Struct THEOCHEM. 2003;622(1):39–51.

    CAS  Article  Google Scholar 

  18. 18.

    Sheffield TY, Judson RS. Ensemble QSAR modeling to predict multispecies fish toxicity lethal concentrations and points of departure. Environ Sci Technol. 2019;53(21):12793–802.

    CAS  Article  Google Scholar 

  19. 19.

    Ding F, Wang Z, Yang X, Shi L, Liu J, Chen G. Development of classification models for predicting chronic toxicity of chemicals to Daphnia magna and Pseudokirchneriella subcapitata. SAR QSAR Environ Res. 2019;30(1):39–50.

    CAS  Article  Google Scholar 

  20. 20.

    Fan D, Liu J, Wang L, Yang X, Zhang S, Zhang Y, Shi L. Development of quantitative structure-activity relationship models for predicting chronic toxicity of substituted benzenes to Daphnia magna. Bull Environ Contam Toxicol. 2016;96(5):664–70.

    CAS  Article  Google Scholar 

  21. 21.

    Kluver N, Bittermann K, Escher BI. QSAR for baseline toxicity and classification of specific modes of action of ionizable organic chemicals in the zebrafish embryo toxicity test. Aquat Toxicol. 2019;207:110–9.

    Article  Google Scholar 

  22. 22.

    Jia Q, Zhao Y, Yan F, Wang Q. QSAR model for predicting the toxicity of organic compounds to fathead minnow. Environ Sci Pollut Res Int. 2018;25(35):35420–8.

    Article  Google Scholar 

  23. 23.

    Mayo-Bean K, Moran K, Meylan B, Ranslow P. Methodology document for the ECOlogical structure-activity relationship model (ECOSAR) class program. Washington DC: US-EPA; 2012.

    Google Scholar 

  24. 24.

    EPA: User’s Guide for T.E.S.T. (version 4.2) (Toxicity Estimation Software Tool): A Program to Estimate Toxicity from Molecular Structure. In. Cincinati, Ohio: U.S. Environme ntal Protection Agency 2016.

  25. 25.

    Furuhama A, Toida T, Nishikawa N, Aoki Y, Yoshioka Y, Shiraishi H. Development of an ecotoxicity QSAR model for the KAshinhou tool for Ecotoxicity (KATE) system, march 2009 version. SAR QSAR Environ Res. 2010;21(5–6):403–13.

    CAS  Article  Google Scholar 

  26. 26.

    Benfenati E, Manganaro A, Gini GC: VEGA-QSAR: AI inside a platform for predictive toxicology. In: Proceedings of the workshop popularize artificial intelligence co-located with the 13th conference of the Italian Association for Artificial Intelligence (AIxIA 2013): 2013; Turin, Italy. 21–28.

  27. 27.

    DTU: User Manual for the Danish (Q)SAR Database. In.: National Food Institute, DTU; 2018.

  28. 28.

    OECD. The guidance document for using the OECD (Q)SAR application toolbox to develop chemical categories according to the OECD guidance on grouping chemicals, OECD series on testing and assessment, no. 102. Paris: OECD Publishing; 2014.

    Google Scholar 

  29. 29.

    Pizzo F, Lombardo A, Manganaro A, Cappelli CI, Petoumenou MI, Albanese F, Roncaglioni A, Brandt M, Benfenati E. Integrated in silico strategy for PBT assessment and prioritization under REACH. Environ Res. 2016;151:478–92.

    CAS  Article  Google Scholar 

  30. 30.

    Gramatica P, Papa E, Sangion A. QSAR modeling of cumulative environmental end-points for the prioritization of hazardous chemicals. Environ Sci Process Impacts. 2018;20(1):38–47.

    CAS  Article  Google Scholar 

  31. 31.

    Moore DRJ, Breton RL, MacDonald DB. A comparison of model performance for six quantitative structure-activity relationship packages that predict acute toxicity to fish. Environ Toxicol Chem. 2003;22(8):1799–809.

    CAS  Article  Google Scholar 

  32. 32.

    Golbamaki A, Cassano A, Lombardo A, Moggio Y, Colafranceschi M, Benfenati E. Comparison of in silico models for prediction of Daphnia magna acute toxicity. SAR QSAR Environ Res. 2014;25(8):673–94.

    CAS  Article  Google Scholar 

  33. 33.

    Cassotti M, Consonni V, Mauri A, Ballabio D. Validation and extension of a similarity-based approach for prediction of acute aquatic toxicity towards Daphnia magna. SAR QSAR Environ Res. 2014;25(12):1013–36.

    CAS  Article  Google Scholar 

  34. 34.

    MEP-China. List of Priority Controlled Chemicals (The First Batch). Beijing: MEE,China; 2017.

    Google Scholar 

  35. 35.

    MEP-China. List of Priority Controlled Chemicals (The Second Batch)(Draft for Comment). Beijing: MEE; 2020.

    Google Scholar 

  36. 36.

    OECD: Test no. 202: Daphnia sp. Acute Immobilisation Test; 2004.

  37. 37.

    Results of eco-toxicity tests of chemicals conducted by Ministry of the Environment in Japan (- March) [http://www.env.go.jp/chemi/sesaku/02e.pdf].

  38. 38.

    Russom CL, Bradbury SP, Broderius SJ, Hammermeister DE, Drummond RA. Predicting modes of toxic action from chemical structure: acute toxicity in the fathead minnow (Pimephales promelas). Environ Toxicol Chem. 1997;16(5):948–67.

    CAS  Article  Google Scholar 

  39. 39.

    Gramatica P, Pilutti P: Evaluation of different statistical approaches for the validation of quantitative structure-activity relationships. Ispra, Italy, The European Commission-Joint Research Centre. Institute for Health and Consumer Protection–ECVAM 2004.

  40. 40.

    Jaworska JS, Comber M, Auer C, Leeuwen CJV. Summary of a workshop on regulatory acceptance of (Q)SARs for human health and environmental endpoints. Environ Health Perspect. 2003;111(10):1358–60.

    Article  Google Scholar 

  41. 41.

    UN. Globally harmonized system of classification and Labelling of chemicals (GHS), , eighth revised edition edn. New York and Geneva: United Nations; 2019.

    Google Scholar 

  42. 42.

    Nations U. Globally harmonized system of classification and labelling of chemicals. United Nations: New York and Geneva; 2011.

    Google Scholar 

  43. 43.

    Melnikov F, Kostal J, Voutchkova-Kostal A, Zimmerman JB, Anastas T. P: assessment of predictive models for estimating the acute aquatic toxicity of organic chemicals. Green Chem. 2016;18(16):4432–45.

  44.

    Benfenati E, Roncaglioni A, Petoumenou MI, Cappelli CI, Gini G. Integrating QSAR and read-across for environmental assessment. SAR QSAR Environ Res. 2015;26(7–9):605–18.

Acknowledgments

Not applicable.

Funding

This work was funded by the National Key Research and Development Program of China (No. 2018YFC1801504) and the Central Scientific Research Projects for Public Welfare Research Institutes (GYZX200102). The funders played no role in the design of the study; the collection, analysis, or interpretation of data; or the writing of the manuscript.

Author information

Contributions

LJZ compared the data. DLF and GXJ analyzed the data and were major contributors in writing the manuscript. WG researched the in silico tools. JJL and MQL revised the manuscript. ZW and WY ran the model predictions for the chemicals. YHX and LLS contributed to the design of the study. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Yanhua Xu or Lili Shi.

Ethics declarations

Ethics approval and consent to participate

Permission to use the datasets was obtained from their owner, the Key Lab of Pesticide Environmental Assessment and Pollution Control, MEE. All fish and daphnia experiments conducted by the Key Lab were approved by the ethics committee of the Nanjing Institute of Environmental Sciences, MEE.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Zhou, L., Fan, D., Yin, W. et al. Comparison of seven in silico tools for evaluating of daphnia and fish acute toxicity: case study on Chinese Priority Controlled Chemicals and new chemicals. BMC Bioinformatics 22, 151 (2021). https://doi.org/10.1186/s12859-020-03903-w

Keywords

  • QSAR
  • Category
  • Aquatic toxicity
  • Daphnia
  • Fish
  • In silico