Test set and source | Number of random formulas | Mass range [Da] | Target DB top hit [%] | PubChem top hit [%] | PubChem false top hit [%] | Top 3 hits without DB query [%] |
---|---|---|---|---|---|---|
Pharmaceuticals (DrugBank) | 2400 | 30–1093 | 99 | 90 | 8 | 78 |
Natural Products (DNP) | 1200 | 92–2020 | 99 | 84 | 10 | 81 |
Toxic Chemicals (TSCA) | 1200 | 56–2170 | 98 | 87 | 8 | 78 |
Unknowns taken from Wiley+NIST | 1200 | 150–1536 | - | - | 78 | 65 |