Table 2 The track types supported by existing tabular, binary and XML formats

From: Identifying elemental genomic track types and representing them uniformly

Format Ref. Data Repr. P S VP VS GP SF F L Strand #Cols Value type
GFF3/GTF [2] General Tab. ¹ ¹     ² 9 Float³
BED/bigBed [4] General Tab./Bin. ¹ ¹     ² 3-12 Int(0-1000)/string⁴
BED15 [4] Microarray Tab.    ¹     ² 15 List of floats⁵
bedGraph [4] General Tab.    ¹       4 Float
WIG/bigWig (fixedStep) [8] General Tab./Bin.         1 Float
WIG/bigWig (variableStep) [8] General Tab./Bin.          2 Float
CNT [36] Copy number Tab.           4 Float
Personal Genome SNP [4] Variation Tab.    ¹       7 String⁶
VCF [37] Variation Tab.          ≥ 8 String⁶,³
GVF [6] General/Variation Tab. ¹ ¹     ² 9 Float³
PSL [4] Alignment Tab.          21 Int⁷
SAM/BAM [38] Alignment Tab./Bin.          11 Int/string⁸
BioHDF [39] Alignment Bin.          11 Int/string⁸
MAF [4] Multiple Alignment Tab.         ⁹ 2-7 Float/string⁸
FASTA [40] Sequence Text           N/A Char
DAS XML [12] General XML ¹ ¹     ² N/A Float
BioXSD 1.0 [16] General XML ¹⁰ ¹⁰ ¹⁰ ¹⁰     ¹¹ N/A Float¹²
USeq [19] General Bin.      N/A Int/float/string
Genomedata [41] General Bin.         N/A Int/float/char
The track type abbreviations are as follows: Points (P), Segments (S), Valued Points (VP), Valued Segments (VS), Genome Partition (GP), Step Function (SF), and Function (F). L refers to any of the linked track types. The table also indicates whether each format supports specification of strand, the number of columns in the tabular formats, and the type of the dominant value, if any.
  1. Points are specified using both start and end values. There is no way of specifying that a file contains only points.
  2. Only a special case of linked segments is supported, namely part-of relationships, such as an exon being part of a gene.
  3. The chosen value type refers to what may be considered the main score column of the format. The format also includes a configurable column containing values that may be extracted by specialized parsers.
  4. We limit the bigBed format to the standard BED columns for simplicity, as bigBed is highly customizable through AutoSQL configurations.
  5. The float values represent a set of gene expression values from microarray experiments.
  6. The values represent the possible alleles at a SNP position. Allele frequencies and quality scores are also reported and could be used as values.
  7. E.g., the number of bases that match or do not match.
  8. E.g., the mapping quality or the aligned sequence itself.
  9. Links to alignments in other genomes.
  10. There is no way of specifying that a record contains only points or only segments.
  11. No weights are supported in BioXSD 1.0.
  12. Numerical values are always signed double-precision floats (8 bytes). A limited set of other value types is also allowed (e.g., sequence variation and alignments).
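The two WIG variants differ in their column counts because fixedStep data lines carry only a value, with positions implied by the declaration line (start plus a fixed step), whereas variableStep data lines give an explicit position followed by the value. The Python sketch below expands both kinds of block into (chromosome, position, value) triples. It is a minimal illustration, not a full WIG parser: it assumes 1-based positions, ignores the optional `span` parameter, and `parse_wig` is a hypothetical helper rather than part of any existing library.

```python
from typing import Iterable, Iterator, Tuple


def parse_wig(lines: Iterable[str]) -> Iterator[Tuple[str, int, float]]:
    """Yield (chrom, position, value) from fixedStep/variableStep WIG blocks.

    Assumptions: 1-based positions, the optional span parameter is ignored,
    and track/browser/comment lines are skipped.
    """
    mode = None
    chrom = ""
    pos = step = 0
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith(("fixedStep", "variableStep")):
            fields = line.split()
            mode = fields[0]
            params = dict(field.split("=", 1) for field in fields[1:])
            chrom = params["chrom"]
            if mode == "fixedStep":
                pos, step = int(params["start"]), int(params["step"])
            continue
        if mode == "fixedStep":
            # One column: the position is implied by start + k * step.
            yield chrom, pos, float(line)
            pos += step
        elif mode == "variableStep":
            # Two columns: an explicit position followed by the value.
            position, value = line.split()
            yield chrom, int(position), float(value)
        else:
            raise ValueError(f"data line before any declaration line: {line!r}")


if __name__ == "__main__":
    example = [
        "track type=wiggle_0",
        "fixedStep chrom=chr1 start=100 step=10",
        "0.5",
        "1.5",
        "variableStep chrom=chr2",
        "300 2.0",
    ]
    print(list(parse_wig(example)))
    # [('chr1', 100, 0.5), ('chr1', 110, 1.5), ('chr2', 300, 2.0)]
```

Both variants end up as the same kind of valued-position data; only the amount of information stored per line differs.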
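Note 1 states that points are written with both a start and an end value, so a point can only be recognized from its coordinates, and a "points only" file can only be confirmed by checking every record. A minimal Python sketch, assuming the two common coordinate conventions: 0-based half-open intervals as in BED, where a single base is written as end = start + 1, and 1-based closed intervals as in GFF3, where a single base has start = end. The names `is_point` and `only_points` are hypothetical.

```python
from typing import Iterable, Tuple


def is_point(start: int, end: int, half_open: bool = True) -> bool:
    """Return True if the interval covers exactly one base pair.

    half_open=True:  0-based, end-exclusive coordinates (BED style),
                     so a single base is written as end == start + 1.
    half_open=False: 1-based, end-inclusive coordinates (GFF3 style),
                     so a single base is written as start == end.
    """
    return end - start == (1 if half_open else 0)


def only_points(intervals: Iterable[Tuple[int, int]], half_open: bool = True) -> bool:
    """Return True if every (start, end) record encodes a single base pair.

    The file carries no "points only" declaration, so each record is inspected.
    An empty input returns True, following the semantics of all().
    """
    return all(is_point(start, end, half_open) for start, end in intervals)


if __name__ == "__main__":
    print(only_points([(99, 100), (199, 200)]))                    # True
    print(only_points([(99, 100), (199, 250)]))                    # False
    print(only_points([(100, 100), (200, 200)], half_open=False))  # True
```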
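The seven unlinked abbreviations in the table note can be read as combinations of three yes/no properties: whether elements are separated by gaps, whether they have lengths, and whether they carry values. The Python sketch below encodes that reading. The property assignments are an assumption inferred from the type names, `classify` is a hypothetical helper, and the "L" prefix for linked variants (e.g. "LS") is an illustrative naming choice, not notation taken from the table.

```python
from typing import Dict, Tuple

# Assumed mapping, inferred from the type names rather than stated in the table:
# (gaps between elements, elements have lengths, elements carry values) -> abbreviation
TRACK_TYPES: Dict[Tuple[bool, bool, bool], str] = {
    (True, False, False): "P",   # Points
    (True, True, False): "S",    # Segments
    (True, False, True): "VP",   # Valued Points
    (True, True, True): "VS",    # Valued Segments
    (False, True, False): "GP",  # Genome Partition: segments tiling the genome
    (False, True, True): "SF",   # Step Function: a genome partition with values
    (False, False, True): "F",   # Function: one value per base pair
}


def classify(gaps: bool, lengths: bool, values: bool, linked: bool = False) -> str:
    """Return the track-type abbreviation, prefixed with 'L' if linked (hypothetical naming).

    Raises ValueError for the one combination (no gaps, no lengths, no values)
    that has no abbreviation among the seven listed in the note.
    """
    key = (gaps, lengths, values)
    if key not in TRACK_TYPES:
        raise ValueError(f"no abbreviation in the note for properties {key}")
    abbreviation = TRACK_TYPES[key]
    return "L" + abbreviation if linked else abbreviation


if __name__ == "__main__":
    print(classify(True, True, False))               # S
    print(classify(False, True, True))               # SF
    print(classify(True, False, True, linked=True))  # LVP
```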