
Table 1 DUGMO results for two B. subtilis genomes (i.e., Data1, Data2) and three E. coli genomes

From: DUGMO: tool for the detection of unknown genetically modified organisms with high-throughput sequencing data for pure bacterial samples

 

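Machine Learning columns: learning data (number of host genome CDSs, number of known GMO CDSs) and prediction data (number of potential GM CDSs). DUGMO final results columns: true positives, false positives, max false negatives, true negatives.

| Genome | Number of host genome CDSs (1) | Number of known GMO CDSs (2) | Number of potential GM CDSs (1) | True positives (3) | False positives (3) | Max false negatives (3) (4) | True negatives (3) |
|---|---|---|---|---|---|---|---|
| GM B. subtilis (Data1) | 4102 | 2714 | 39 | 25 | 0 | 12 | 2 |
| Wild type B. subtilis (Data2) | 3941 | 2724 | 4 | - | 0 | - | 0 |
| E. coli with genes of A. tumefaciens | 4033 | 2588 | 6 | 5 | 0 | 0 | 1 |
| E. coli with genes of M. tuberculosis | 4018 | 2589 | 5 | 4 | 0 | 0 | 1 |
| E. coli with genes of S. pyogenes | 4015 | 2587 | 6 | 5 | 1 | 0 | 0 |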

(1) After two BLASTN alignments on pangenomes without RNA.
(2) After filtering out CDSs of the known GMO databank that are too close to the host species (paragraph 3 of section 2.2).
(3) In "potential GM inserts". The "DUGMO final results" columns give the results obtained by combining the RF (random forest) and logit methods, using the learning data and prediction data.
(4) Estimate of the maximum number of false negatives: the true number cannot be deduced because the origins of some CDSs are unknown; they potentially come from the Bacillales family.
(-) Does not apply.
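Because footnote (4) gives only an upper bound on false negatives, the table supports an exact precision but only a lower bound on sensitivity. The short sketch below works through that arithmetic for the four GM genomes. It uses the standard definitions precision = TP/(TP+FP) and sensitivity = TP/(TP+FN); these metrics are not reported in Table 1, and the names `rows`, `precision` and `min_sensitivity` are hypothetical. It also assumes (an observation from the numbers, not a rule stated in the footnotes) that each potential GM CDS falls into exactly one of the four final-result categories.

```python
"""Illustrative arithmetic on the GM rows of Table 1 (a sketch, not DUGMO code)."""

# Counts transcribed from Table 1. Wild-type Data2 is left out: it carries no
# GM insert, so true positives and false negatives do not apply to it.
rows = {
    "GM B. subtilis (Data1)": {"potential": 39, "tp": 25, "fp": 0, "max_fn": 12, "tn": 2},
    "E. coli with genes of A. tumefaciens": {"potential": 6, "tp": 5, "fp": 0, "max_fn": 0, "tn": 1},
    "E. coli with genes of M. tuberculosis": {"potential": 5, "tp": 4, "fp": 0, "max_fn": 0, "tn": 1},
    "E. coli with genes of S. pyogenes": {"potential": 6, "tp": 5, "fp": 1, "max_fn": 0, "tn": 0},
}


def precision(tp: int, fp: int) -> float:
    """Fraction of CDSs reported as GM that are truly GM (exact, TP and FP are known)."""
    return tp / (tp + fp)


def min_sensitivity(tp: int, max_fn: int) -> float:
    """Lower bound on TP / (TP + FN), valid because the true FN count is <= max_fn."""
    return tp / (tp + max_fn)


metrics = {}
for name, r in rows.items():
    # Assumed consistency check: the four final-result counts add up to the
    # number of potential GM CDSs in every GM row of the table.
    assert r["tp"] + r["fp"] + r["max_fn"] + r["tn"] == r["potential"], name
    metrics[name] = (precision(r["tp"], r["fp"]), min_sensitivity(r["tp"], r["max_fn"]))
    print(f"{name}: precision = {metrics[name][0]:.3f}, "
          f"sensitivity >= {metrics[name][1]:.3f}")
```

For Data1 this prints a precision of 1.000 and a sensitivity of at least 0.676 (25/37); the true sensitivity is higher if some of the 12 candidate false negatives actually come from the Bacillales family rather than from the GM insert.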