Table 8 Domain types for Genomics data in BoaG

Shared data science infrastructure for genomics data

Type           Attributes        Details
Genome         taxid             Taxonomy ID of each species
               refseq            RefSeq ID of the GFF file
               Sequence          List of sequence reads in each GFF file [26]
               AssemblerRoot     List of assembly programs associated with this genome
Sequence       accession         Accession number
               header            Header of the sequence
               FeatureRoot       List of features, including exon, gene, mRNA, and CDS, associated with this sequence
               seq               Actual DNA sequences from FASTA files
FeatureRoot    refseq            This field shows the key ID
               feature           List of features associated with this ID
Feature        accession         Accession code of the sequence
               seqid             Sequence ID
               source            A text qualifier that describes the algorithm or procedure that generated this feature
               ftype             Type of the feature
               start             Start point of the feature
               end               End point of the feature
               score             Score of the feature (a floating-point number)
               strand            + or - for the positive or negative strand, respectively
               phase             Phase of the feature (one of the integers 0, 1, or 2)
               Attribute         List of attributes for each feature
               parent            Parent of the attribute
Attribute      id                Attribute ID
               tag               Attribute tag, such as gbkey
               value             Value of the tag
AssemblerRoot  Assembler         List of assembly programs
               total-length      Total length, or genome size, in base pairs
               total-gap-length  Total gap length after genome assembly
               scaffold-N50      Scaffold N50 metric
               scaffold-count    Scaffold count metric
               contig-N50        Contig N50 metric
               contig-count      Contig count metric
Assembler      name              Assembly program used to assemble the genome
               desc              Program attributes: program name, program version, etc.
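
The nesting of the types in Table 8 can be easier to follow when written out as code. The following is a minimal Python sketch of that hierarchy, not BoaG's actual schema definition. The class layout mirrors the table, but the concrete field types (int, str), the underscore spellings of the hyphenated attribute names, the choice to hold FeatureRoot and AssemblerRoot as single nested objects, and the helper count_features are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    id: str
    tag: str    # attribute tag, e.g. "gbkey"
    value: str


@dataclass
class Feature:
    accession: str
    seqid: str
    source: str
    ftype: str
    start: int      # assumed integer coordinate
    end: int        # assumed integer coordinate
    score: float    # Table 8: floating-point number
    strand: str     # Table 8: "+" or "-"
    phase: int      # Table 8: one of 0, 1, or 2
    attribute: List[Attribute] = field(default_factory=list)
    parent: str = ""

    def __post_init__(self) -> None:
        # Enforce the two value constraints that Table 8 states explicitly.
        if self.strand not in ("+", "-"):
            raise ValueError(f"strand must be '+' or '-', got {self.strand!r}")
        if self.phase not in (0, 1, 2):
            raise ValueError(f"phase must be 0, 1, or 2, got {self.phase!r}")


@dataclass
class FeatureRoot:
    refseq: str
    feature: List[Feature] = field(default_factory=list)


@dataclass
class Sequence:
    accession: str
    header: str
    feature_root: FeatureRoot
    seq: str


@dataclass
class Assembler:
    name: str
    desc: str


@dataclass
class AssemblerRoot:
    assembler: List[Assembler]
    total_length: int       # total-length in Table 8 (base pairs)
    total_gap_length: int   # total-gap-length
    scaffold_n50: int       # scaffold-N50
    scaffold_count: int     # scaffold-count
    contig_n50: int         # contig-N50
    contig_count: int       # contig-count


@dataclass
class Genome:
    taxid: int
    refseq: str
    sequence: List[Sequence]
    assembler_root: AssemblerRoot


def count_features(genome: Genome, ftype: str) -> int:
    """Hypothetical helper: count features of one type across all sequences."""
    return sum(
        1
        for s in genome.sequence
        for f in s.feature_root.feature
        if f.ftype == ftype
    )
```

Walking from Genome through Sequence and FeatureRoot down to Feature, as count_features does, follows the same containment path that the table describes.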