Table 2 The track types supported by existing tabular, binary and XML formats

From: Identifying elemental genomic track types and representing them uniformly

Format

Ref.

Data

Repr.

P

S

VP

VS

GP

SF

F

L

Strand

#Cols

Value type

GFF3/GTF

[2]

General

Tab.

(note 1)

(note 1)

   

(note 2)

9

Float (note 3)

BED/bigBed

[4]

General

Tab./Bin.

(note 1)

(note 1)

   

(note 2)

3-12

Int(0-1000)/string (note 4)

BED15

[4]

Microarray

Tab.

  

(note 1)

   

(note 2)

15

List of floats (note 5)

bedGraph

[4]

General

Tab.

  

(note 1)

     

4

Float

WIG/bigWig (fixedStep)

[8]

General

Tab./Bin.

  

 

  

1

Float

WIG/bigWig (variableStep)

[8]

General

Tab./Bin.

  

     

2

Float

CNT

[36]

Copy number

Tab.

  

      

4

Float

Personal Genome SNP

[4]

Variation

Tab.

  

(note 1)

     

7

String (note 6)

VCF

[37]

Variation

Tab.

  

     

≥ 8

String (notes 6, 3)

GVF

[6]

General/Variation

Tab.

(note 1)

(note 1)

   

(note 2)

9

Float (note 3)

PSL

[4]

Alignment

Tab.

 

 

    

21

Int (note 7)

SAM/BAM

[38]

Alignment

Tab./Bin.

 

 

    

11

Int/string (note 8)

BioHDF

[39]

Alignment

Bin.

 

 

    

11

Int/string (note 8)

MAF

[4]

Multiple Alignment

Tab.

 

 

   

(note 9)

2-7

Float/string (note 8)

FASTA

[40]

Sequence

Text

      

  

N/A

Char

DAS XML

[12]

General

XML

(note 1)

(note 1)

   

(note 2)

N/A

Float

BioXSD 1.0

[16]

General

XML

(note 10)

(note 10)

(note 10)

(note 10)

   

(note 11)

N/A

Float (note 12)

USeq

[19]

General

Bin.

    

N/A

Int/float/string

Genomedata

[41]

General

Bin.

  

 

  

N/A

Int/float/char

The track type abbreviations are as follows: Points (P), Segments (S), Valued Points (VP), Valued Segments (VS), Genome Partition (GP), Step Function (SF), and Function (F). L refers to any of the linked track types. The table also denotes whether the format supports specification of strand, the number of columns of the tabular formats, and the type of the dominant value, if any.

  1. Points are specified using both start and end values. There is no way of specifying that a file contains only points.
  2. Only a special case of linked segments is supported, namely part-of relationships, such as an exon being a part of a gene.
  3. The chosen value type refers to what may be considered the main score column of the format. The format also includes a configurable column containing values that may be extracted by specialized parsers.
  4. We limit the bigBed format to the standard BED columns for simplicity, as the bigBed format is highly customizable through the use of AutoSQL configurations.
  5. The float values represent a set of gene expression values from microarray experiments.
  6. The values represent the possible alleles at a SNP position. The allele frequencies and quality scores are also reported and could be used as values.
  7. E.g. the number of bases that match or do not match.
  8. E.g. the mapping quality or the aligned sequence itself.
  9. Links to alignments in other genomes.
  10. There is no way of specifying that a record contains only points or only segments.
  11. No weights are supported in BioXSD 1.0.
  12. Numerical values are always signed double-precision floats (8 bytes). A limited set of other value types is also allowed (e.g. sequence variation and alignments).
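
Note 1 implies that a consumer of such a file has to infer from the data whether it holds points. The following is a minimal Python sketch of that inference for GFF3-style input, not a method taken from the paper. It assumes that a point is a record whose start equals its end in the 1-based, inclusive coordinates of GFF3; the function names `parse_gff3_intervals` and `looks_like_points` and the example records are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (seqid, start, end), 1-based inclusive


def parse_gff3_intervals(lines: Iterable[str]) -> List[Interval]:
    """Collect (seqid, start, end) from the feature lines of GFF3-style text."""
    intervals: List[Interval] = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, directives ("##...") and comments ("#...").
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
        # Columns 4 and 5 hold the start and end coordinates.
        intervals.append((cols[0], int(cols[3]), int(cols[4])))
    return intervals


def looks_like_points(intervals: List[Interval]) -> bool:
    """Assumption: a track is a points track if every record has start == end."""
    return bool(intervals) and all(start == end for _, start, end in intervals)


# Hypothetical example records, for illustration only.
points_only = [
    "##gff-version 3",
    "chr1\t.\tSNV\t100\t100\t.\t+\t.\tID=a",
    "chr1\t.\tSNV\t250\t250\t.\t-\t.\tID=b",
]
mixed = points_only + ["chr2\t.\tgene\t10\t90\t.\t+\t.\tID=g"]

print(looks_like_points(parse_gff3_intervals(points_only)))
print(looks_like_points(parse_gff3_intervals(mixed)))
```

Because the check is heuristic, a segments track that happens to contain only single-base features would also be classified as points, which is the ambiguity the note describes.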