Identifying elemental genomic track types and representing them uniformly

Gundersen, Sveinung; Kalaš, Matúš; Abul, Osman; Frigessi, Arnoldo; Hovig, Eivind; Sandve, Geir Kjetil

doi:10.1186/1471-2105-12-494

BMC Bioinformatics

Table 2 The track types supported by existing tabular, binary and XML formats

Format	Ref.	Data	Repr.	P	S	VP	VS	SF	F	L	Strand	#Cols	Value type
GFF3/GTF	[2]	General	Tab.	✓¹	✓	✓¹	✓			²	✓	9	Float³
BED/bigBed	[4]	General	Tab./Bin.	✓¹	✓	✓¹	✓			²	✓	3-12	Int(0-1000)/string⁴
BED15	[4]	Microarray	Tab.			✓¹	✓			²	✓	15	List of floats⁵
bedGraph	[4]	General	Tab.			✓¹	✓					4	Float
WIG/bigWig (fixedStep)	[8]	General	Tab./Bin.			✓	✓	✓	✓			1	Float
WIG/bigWig (variableStep)	[8]	General	Tab./Bin.			✓	✓					2	Float
CNT	[36]	Copy number	Tab.			✓						4	Float
Personal Genome SNP	[4]	Variation	Tab.			✓¹	✓					7	String⁶
VCF	[37]	Variation	Tab.			✓	✓					≥ 8	String⁶,³
GVF	[6]	General/Variation	Tab.	✓¹	✓	✓¹	✓			²	✓	9	Float³
PSL	[4]	Alignment	Tab.		✓		✓				✓	21	Int⁷
SAM/BAM	[38]	Alignment	Tab./Bin.		✓		✓				✓	11	Int/string⁸
BioHDF	[39]	Alignment	Bin.		✓		✓				✓	11	Int/string⁸
MAF	[4]	Multiple Alignment	Tab.		✓		✓			⁹	✓	2-7	Float/string⁸
FASTA	[40]	Sequence	Text						✓			N/A	Char
DAS XML	[12]	General	XML	✓¹	✓	✓¹	✓			²	✓	N/A	Float
BioXSD 1.0	[16]	General	XML	✓¹⁰	✓¹⁰	✓¹⁰	✓¹⁰			✓¹¹	✓	N/A	Float¹²
USeq	[19]	General	Bin.	✓	✓	✓	✓				✓	N/A	Int/float/string
Genomedata	[41]	General	Bin.			✓	✓	✓	✓			N/A	Int/float/char

The track type abbreviations are as follows: Points (P), Segments (S), Valued Points (VP), Valued Segments (VS), Genome Partition (GP), Step Function (SF), and Function (F). L refers to any of the linked track types. The table also denotes whether the format supports specification of strand, the number of columns of the tabular formats, and the type of the dominant value, if any.
¹ Points are specified using both start and end values. There is no way of specifying that a file contains only points.
² Only a special case of linked segments is supported, namely part-of relationships, such as an exon being a part of a gene.
³ The chosen value type refers to what may be considered the main score column of the format. The format also includes a configurable column containing values that may be extracted by specialized parsers.
⁴ We limit the bigBed format to the standard BED columns for simplicity, as the bigBed format is highly customizable through the use of AutoSQL configurations.
⁵ The float values represent a set of gene expression values from microarray experiments.
⁶ The values represent the possible alleles at a SNP position. Also, the allele frequencies and quality scores are reported and could be used as values.
⁷ E.g. the number of bases that match/do not match.
⁸ E.g. the mapping quality or the aligned sequence itself.
⁹ Links to alignments in other genomes.
¹⁰ There is no way of specifying that a record contains only points or only segments.
¹¹ No weights are supported in BioXSD 1.0.
¹² Numerical values are always signed double precision floats (8 bytes). A limited set of other value types is also allowed (e.g. sequence variation and alignments).
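
Footnote ¹ implies that a consumer of formats such as BED cannot read from the file itself whether a track holds points or segments, and has to infer this from the coordinates. The following Python sketch illustrates one way to do so for BED-like input. It assumes the common BED convention of 0-based, half-open coordinates, and it assumes that a point is written as a one-base interval (end = start + 1). The function names `bed_point` and `looks_like_points` are hypothetical and not part of any format specification or library.

```python
def bed_point(chrom, pos0):
    """Encode a point at 0-based position pos0 as a BED3 line.

    Assumption: a point is written as a one-base interval [pos0, pos0 + 1).
    """
    return f"{chrom}\t{pos0}\t{pos0 + 1}"


def looks_like_points(bed_lines):
    """Return True if every data record spans exactly one base.

    Header lines ("track", "browser", "#") and blank lines are skipped.
    An input without any data records returns False.
    """
    found = False
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        start, end = int(fields[1]), int(fields[2])
        if end - start != 1:
            return False  # at least one record is a segment
        found = True
    return found


if __name__ == "__main__":
    points = [bed_point("chr1", 99), bed_point("chr1", 250)]
    segments = points + ["chr1\t300\t400"]
    print(looks_like_points(points))
    print(looks_like_points(segments))
```

Because the check is a heuristic over the data, a segment track that happens to contain only one-base segments would also be classified as points, which is the ambiguity the footnote describes.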

ISSN: 1471-2105