Comparison of Seven in Silico Methods for Evaluating Ecotoxicological Acute Toxicity to Daphnia magna and Pimephales promelas: Case Study on Chinese Priority Controlled Chemicals

Background: Acute toxicity to aquatic organisms is an indispensable parameter in ecological risk assessment and in priority chemical screening (e.g. for persistent, bioaccumulative and toxic chemicals). A number of predictive models for aquatic toxicity are currently available; however, the accuracy of in silico tools in priority assessment and risk assessment remains to be studied further. This study evaluated the performance of seven Quantitative Structure–Activity Relationship (QSAR) in silico methods (Danish QSAR Database, Ecological Structure Activity Relationships, KAshinhou Tool for Ecotoxicity on PAS, Toxicity Estimation Software Tool, QSAR Toolbox, Read Across, and Virtual models for property Evaluation of chemicals within a Global Architecture) for assessing acute aquatic toxicity to Daphnia magna and Pimephales promelas, using the first batch of the List of Priority Controlled Chemicals in China. Results: Based on the values for the median lethal dose and the US Environmental Protection Agency's acute aquatic toxicity categories of concern, acute toxicity was classified into six categories. For Daphnia magna, the accuracy of toxicity category prediction was 25%–56%, the correlation coefficient ranged from 0.1236 to 0.6349, and the correlation coefficients of the applicability domain were 0.040 and 0.5148. The corresponding values for Pimephales promelas were 22%–44%, 0.1495–0.4144, and 0.2156 and 0.6793. Conclusion: Because the chemicals in the first batch of the List of Priority Controlled Chemicals in China are structurally complex, the accuracy of model prediction is low and depends on the quality of the constructed model and its applicability domain. Although in silico methods can be used to preliminarily estimate aquatic toxicity, experimental data validation is still required for prioritizing environmental hazard assessments and risk assessments.

Background
Global regulations have called for systematic testing of potential environmental contaminants to protect human health and the environment from exposure to anthropogenic chemicals, such as industrial chemicals and pharmaceuticals [1].
The ever-increasing number of chemicals, with more than 140000 currently in use on the global market, challenges traditional ecotoxicity testing strategies based on in vivo experiments, which are expensive, time-consuming, and reliant on large numbers of animal subjects. It is therefore virtually impossible to determine acute toxicity for all chemicals used globally, especially in light of new EU legislation to phase out animal testing [2]. The National Research Council (NRC) and global regulations, including the European Chemicals Agency's REACH initiative, the U.S. Toxic Substances Control Act (TSCA), and the Canadian Environmental Protection Act (CEPA), encourage increased reliance on in silico approaches to mitigate the challenges associated with in vitro and in vivo toxicity testing [3][4][5][6].
In the risk assessment of chemicals, Quantitative Structure-Activity Relationship (QSAR) models relate chemical molecular structures to physicochemical properties and environmental behavioural parameters. In the absence of toxicity data, they allow toxicities relevant to hazard identification to be assessed at minimal computational cost [7]. The cost-benefit advantages of in silico methods, together with regulatory support for them, have led to the development of a number of ecotoxicity assessment tools [8]. Such tools include the Ecological Structure Activity Relationships (ECOSAR) program ...
In 2007, the OECD guidelines on the development and validation of QSAR models were issued. They proposed that a QSAR model for practical application should be associated with an unambiguous algorithm [11], a defined endpoint, a defined applicability domain (AD), appropriate measures of goodness-of-fit, robustness, and predictive ability, and a mechanistic interpretation, if possible [12,13]. A number of studies have developed QSAR models for the endpoint of acute toxicity to Daphnia magna and Pimephales promelas [14][15][16][17][18]. Based on models for specific chemical classes and for different classes of substances, Golbamaki et al. and Moore et al. compared the performance of some QSAR models for acute toxicity [19,20], but their comparisons were neither systematic nor comprehensive. Despite these guidelines, the lack of external validation and of reported model performance on test sets, model overfitting, and poor AD definitions remain major concerns [21][22][23][24]. A clear AD definition would ensure that the model assumptions are met [25,26]. Compared to physicochemical QSARs, previous validation efforts neglected to conduct a strictly external validation, relied on small data sets, or evaluated one tool at a time. The results of such efforts suggested that model accuracy for aquatic toxicity endpoints decreases during validation [27]. Thus, to ensure practical utility, we validated specific acute toxicity in silico models using an external test set [28].
To implement the regulatory requirements of the "Action Plan for Prevention and Control of Water Pollution," the Ministry of Ecological Environment, Ministry of Industry and Information Technology, and Health Planning Commission (China) jointly issued the List of Priority Controlled Chemicals (the first batch) at the end of 2017 [29,30]. This list is presented in Table 1.

Predictive tools
The following seven in silico methods were evaluated for predicting acute aquatic toxicity to Daphnia magna and Pimephales promelas: ECOSAR, T.E.S.T., Danish QSAR Database, VEGA, KATE, QSAR Toolbox, and Read Across. A brief description of each program is provided below, and the pertinent details are summarized in Table 2.
... [32]. KATE is trained on the US EPA fathead minnow (Pimephales promelas) and the Japanese Ministry of Environment Oryzias latipes datasets [33]. The tool is available as a standalone application or as a web plug-in.
... [34]. The optimal descriptor set, determined by a genetic algorithm, is used to characterize the toxicity of the chemicals [35]. The data from the suitable cluster are used to make predictions for test compounds. Each Read Across or regression model has a specific AD and structural similarity coefficient. The program provides estimated LC50 thresholds based on each model's prediction as well as the most accurate estimate of the component models [36].
VEGA estimates physicochemical, environmental, ecotoxicological, and toxicological properties using the Read Across method. The six most similar compounds are identified using a chemical similarity algorithm [37]. VEGA thus provides a simple way to perform Read Across through a similarity search, which is also used to evaluate the reliability of the model prediction. The reliability evaluation is a sophisticated procedure that draws on several aspects of the model's AD (such as the similarity of the most similar compounds, descriptor space, descriptor sensitivity, outliers based on specific fragments, identification of the presence of rare fragments, and accuracy of the prediction assessment). The prediction results and the AD assessment are reported as output.
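As a rough illustration of similarity-based Read Across, the sketch below predicts a log-transformed LC50 for a target compound as the similarity-weighted mean of its k most similar training compounds, using Tanimoto similarity over sets of structural-feature identifiers. It is not VEGA's actual algorithm: the fingerprints, the toy training values, and the function names `tanimoto` and `read_across` are hypothetical, and the default k = 6 only mirrors the six neighbours mentioned above.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two feature-ID sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def read_across(
    target: FrozenSet[int],
    training: Dict[str, Tuple[FrozenSet[int], float]],
    k: int = 6,
) -> Tuple[float, List[Tuple[str, float]]]:
    """Return (predicted value, [(neighbour name, similarity), ...])."""
    # Rank all training compounds by similarity to the target, keep the top k.
    scored = sorted(
        ((name, tanimoto(target, fp), value) for name, (fp, value) in training.items()),
        key=lambda item: item[1],
        reverse=True,
    )[:k]
    total = sum(sim for _, sim, _ in scored)
    if total == 0.0:
        raise ValueError("no training compound shares any feature with the target")
    # Similarity-weighted mean of the neighbours' experimental values.
    prediction = sum(sim * value for _, sim, value in scored) / total
    return prediction, [(name, sim) for name, sim, _ in scored]


# Hypothetical toy data: feature-ID sets and log10 LC50 values (mg/L).
TRAINING = {
    "cmpd_A": (frozenset({1, 2, 3, 4}), 0.5),
    "cmpd_B": (frozenset({1, 2, 3}), 1.0),
    "cmpd_C": (frozenset({7, 8}), 2.5),
}

predicted, neighbours = read_across(frozenset({1, 2, 3, 4}), TRAINING, k=2)
```

The returned neighbour similarities could serve as a crude reliability indicator, since a prediction supported only by weakly similar compounds is less trustworthy. That is only one of the several AD aspects listed above.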
The Danish QSAR Database provides estimates from over 200 QSAR models for more than 600,000 chemicals, which can be sorted by chemical similarity to facilitate Read Across groupings. The database was developed at the Technical University of Denmark. The endpoints are modelled in three software systems (Leadscope, CASE Ultra, and SciQSAR), and an overall battery prediction is made to reduce "noise" from the individual model estimates and thereby improve accuracy and broaden the AD. The prediction results and the AD judgment are reported as output [38].
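The idea of a battery prediction can be sketched as follows: individual model estimates are combined only when enough of them fall inside their own AD. The combination rule used here (the mean of in-domain estimates, requiring at least two in-domain models) is an assumption made for illustration and is not taken from the Danish QSAR Database documentation. The `ModelEstimate` class, the `battery_prediction` function, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass(frozen=True)
class ModelEstimate:
    model: str        # e.g. "Leadscope", "CASE Ultra", "SciQSAR"
    log_lc50: float   # predicted log10 LC50 (mg/L); hypothetical value
    in_domain: bool   # whether the query falls inside this model's AD


def battery_prediction(
    estimates: List[ModelEstimate], min_in_domain: int = 2
) -> Optional[float]:
    """Average the in-domain estimates; return None if too few are in domain."""
    usable = [e.log_lc50 for e in estimates if e.in_domain]
    if len(usable) < min_in_domain:
        return None  # the battery call is treated as outside the AD
    return mean(usable)


estimates = [
    ModelEstimate("Leadscope", 1.0, True),
    ModelEstimate("CASE Ultra", 2.0, True),
    ModelEstimate("SciQSAR", 4.0, False),
]
battery = battery_prediction(estimates)
```

In this toy case the out-of-domain estimate is ignored, so a single outlying model does not shift the combined value.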
OECD QSAR Toolbox v4.2 finds structurally and mechanistically defined analogues and chemical categories, which serve as sources for Read Across and QSAR for filling in data gaps. The prediction results and the AD assessment are reported as output.
The Toolbox has multiple functions: identifying analogues of a chemical, retrieving the existing experimental results for those analogues, and filling in data gaps through Read Across or QSAR; classifying a large number of existing chemicals using the mechanism or behaviour model; using a QSAR model to fill in the data gaps for chemicals; evaluating the robustness of a potential analogue for Read Across; evaluating the applicability of a (Q)SAR model to a target compound for filling in missing data; and establishing QSAR models.

Results
External validation results for acute toxicity of Daphnia magna
... Table 3.

External validation results for acute toxicity of Pimephales promelas
The external validation results for the acute toxicity of Pimephales promelas are shown in Table 3. Based on the predictive power for classification into the six toxicity categories over the entire data set, the tested tools for Pimephales promelas can be ranked from the highest to the lowest performer as follows: QSAR ... The deviation (Fig. 2) ...

Discussion
Chapter R.6 of the "Guidance on Information requirements and Chemical Safety Assessment", which is devoted to QSAR and the grouping of chemicals [41], does not provide a list of QSAR tools that have regulatory acceptance. However, it does provide the criteria that must be met before a QSAR tool can be accepted in a regulatory context. These criteria refer to the OECD Setubal Principles for (Q)SAR Validation [42,43]. The results show that the accuracies for four categories of acute toxicity are significantly higher than those for six categories (see Table 5). ... [44]. By contrast, assessing toxicity mechanisms is complex, and relatively few experimental data are available [45].
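The effect of category granularity on classification accuracy can be illustrated with a small sketch. The LC50 cut-offs below are hypothetical placeholders, since the study's actual category boundaries are not reproduced in this section, and the measured and predicted values are invented toy data. The functions `category` and `accuracy` are likewise illustrative names.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical LC50 cut-offs in mg/L (assumption, not the study's boundaries).
SIX_CLASS_CUTS = [0.1, 1.0, 10.0, 100.0, 1000.0]
FOUR_CLASS_CUTS = [1.0, 10.0, 100.0]


def category(lc50_mg_per_l: float, cuts: Sequence[float]) -> int:
    """Return 0 for the most toxic bin and len(cuts) for the least toxic bin."""
    return bisect_right(cuts, lc50_mg_per_l)


def accuracy(
    measured: Sequence[float], predicted: Sequence[float], cuts: Sequence[float]
) -> float:
    """Fraction of chemicals whose predicted bin equals the measured bin."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(
        category(m, cuts) == category(p, cuts) for m, p in zip(measured, predicted)
    )
    return hits / len(measured)


# Invented toy LC50 values (mg/L) for six chemicals.
measured = [0.05, 0.5, 5.0, 50.0, 500.0, 5000.0]
predicted = [0.5, 0.2, 8.0, 20.0, 2000.0, 300.0]

acc_six = accuracy(measured, predicted, SIX_CLASS_CUTS)    # 0.5
acc_four = accuracy(measured, predicted, FOUR_CLASS_CUTS)  # 1.0
```

Because the four-class cut-offs in this sketch are a subset of the six-class cut-offs, every hit under six classes is also a hit under four, so accuracy cannot decrease when categories are merged. Part of any gain from coarser categories is therefore a consequence of the binning itself.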
Moreover, a lack of information about the quality of the prediction, external validation, and ADs increases the uncertainty of the model accuracy (e.g. for ECOSAR and KATE). VEGA, Read Across, and the Danish QSAR Database indicate whether predictions fall inside or outside the AD of the models. There is no single and absolute AD for a given model. Generally, the broader the definition of the AD, the lower the predictivity. The AD should be clearly defined, and the validation results should correspond to this defined domain, which is used again when the model is applied for predictions [46,47]. Although there is no criterion to judge the validity or invalidity of the predicted data, predicted results within the AD are preferred.
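One way to check whether predictions inside the AD track measured values more closely is to compute the correlation coefficient separately for in-domain and out-of-domain chemicals. The sketch below assumes a Pearson coefficient on log-transformed values, which this section does not state explicitly, and uses invented toy records. The names `pearson_r` and `r_by_domain` are hypothetical.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def r_by_domain(records: List[Tuple[float, float, bool]]) -> Tuple[float, float]:
    """records: (measured, predicted, in_domain). Returns (r_inside, r_outside)."""
    inside = [(m, p) for m, p, in_ad in records if in_ad]
    outside = [(m, p) for m, p, in_ad in records if not in_ad]
    return pearson_r(*zip(*inside)), pearson_r(*zip(*outside))


# Invented toy records: log10 LC50 measured, predicted, and an AD flag.
records = [
    (1.0, 1.1, True), (2.0, 1.9, True), (3.0, 3.2, True),
    (1.0, 3.0, False), (2.0, 1.0, False), (3.0, 2.5, False),
]
r_inside, r_outside = r_by_domain(records)
```

With only a handful of chemicals per group, as in a short priority list, such coefficients are sensitive to individual outliers and should be read with that in mind.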
Measured data are thus valuable for assessing the toxicities of priority controlled chemicals. Alternative test systems or endpoints, for instance those using fish embryos, may allow reduction or replacement of the fish early-life stage (FELS) test or help prioritize compounds for FELS testing [35].
Table 5 Summary of the accuracies of the predictive tools analysed in this work.

Conclusion
With regard to ecological risk assessments of organic chemicals, QSAR models play an important role in filling the data gaps of toxicity endpoints, decreasing experimental expenses, reducing and replacing actual testing (especially animal testing), and assessing the uncertainty of experimental data. In this study, the ...
Number of correct classifications for acute toxicity predictions of Pimephales promelas.