Table 3 Benchmark description

From: geneRFinder: gene finding in distinct metagenomic data complexities

| Dataset name | Genomes | Sequences | CDS | Description |
|---|---|---|---|---|
| Training1 | 20 | 108,004 | 54,002 | First training set |
| Validation | 5 | 19,337 | 14,016 | Validation set used to set up the parameters of the model |
| Training2 | 129 | 712,886 | 356,443 | Second training set, used to build the final model |
| Test1 | 12 | 54,980 | 31,507 | First test set, used to evaluate geneRFinder |
| Test2low | 40 | 255,589 | 41,068 | Data extracted from the CAMI low-complexity metagenomic dataset |
| Test2medium | 132 | 347,642 | 57,894 | Data extracted from the CAMI medium-complexity metagenomic dataset |
| Test2high1 | 160ᵃ | 200,000 | 34,640 | Data extracted from the CAMI high-complexity metagenomic dataset (sample 01) |
| Test2high2 | 156ᵃ | 200,000 | 34,445 | Data extracted from the CAMI high-complexity metagenomic dataset (sample 02) |
| Test2high3 | 157ᵃ | 200,000 | 34,486 | Data extracted from the CAMI high-complexity metagenomic dataset (sample 03) |

ᵃ Estimated number of genomes, obtained by taxonomic analysis with the Kaiju tool [39].
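Comparing the CDS and Sequences columns shows how differently the benchmarks are composed. The short sketch below transcribes the counts from Table 3 and computes, for each dataset, the share of sequences that are CDS. The table does not state what the remaining (non-CDS) sequences are, so the sketch only computes the ratio. The names `Dataset`, `DATASETS`, and `cds_fraction` are illustrative and do not come from geneRFinder itself.

```python
from typing import NamedTuple


class Dataset(NamedTuple):
    """One row of Table 3. Genome counts for Test2high1-3 are Kaiju estimates."""
    name: str
    genomes: int
    sequences: int
    cds: int


# Counts transcribed from Table 3 (hypothetical helper structure, not from geneRFinder).
DATASETS = [
    Dataset("Training1", 20, 108_004, 54_002),
    Dataset("Validation", 5, 19_337, 14_016),
    Dataset("Training2", 129, 712_886, 356_443),
    Dataset("Test1", 12, 54_980, 31_507),
    Dataset("Test2low", 40, 255_589, 41_068),
    Dataset("Test2medium", 132, 347_642, 57_894),
    Dataset("Test2high1", 160, 200_000, 34_640),
    Dataset("Test2high2", 156, 200_000, 34_445),
    Dataset("Test2high3", 157, 200_000, 34_486),
]


def cds_fraction(ds: Dataset) -> float:
    """Share of a dataset's sequences that are counted as CDS in Table 3."""
    return ds.cds / ds.sequences


if __name__ == "__main__":
    # Print each dataset's CDS share, rounded to three decimals.
    for ds in DATASETS:
        print(f"{ds.name:<12} {cds_fraction(ds):.3f}")
```

Running it prints 0.500 for both training sets, which contain exactly twice as many sequences as CDS. Test1 and Validation come out at 0.573 and 0.725. The CAMI-derived test sets fall between 0.161 and 0.173.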