
Table 1 Protocol Buffer specifications

From: Fragment assignment in the cloud with eXpress-D

| Field | Type | Description |
| --- | --- | --- |
| name | string | Unique query name of fragment in SAM file |
| paired | bool | Boolean specifying if both ends were sequenced |
| alignments | FragmentAlignments | Collection of alignments for fragment |
| target_id | uint32 | ID of target aligned to (index in SAM header) |
| read_l | ReadAlignment | Alignment information for 5’ (left) read, if it exists |
| read_r | ReadAlignment | Alignment information for 3’ (right) read, if it exists |
| first | bool | Boolean specifying if this end was sequenced first |
| left_pos | uint32 | 0-based left endpoint of alignment to reference |
| right_pos | uint32 | 0-based right endpoint of alignment to reference |
| mismatch_indices | byteArray | Positions in read that differ from reference |
| mismatch_nucs | byteArray | Nucleotides in read at mismatches, 2 bits/nuc |
| name | string | Unique name of target sequence |
| id | uint32 | Index of target in SAM header |
| length | uint32 | Number of nucleotides in target sequence |
| seq | byteArray | Nucleotides of target sequence, 2 bits/nuc |
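
The "2 bits/nuc" encoding used for the seq and mismatch_nucs fields can be illustrated with a minimal sketch. The table does not say which 2-bit code each nucleotide receives or how nucleotides are ordered within a byte, so the mapping (A=0, C=1, G=2, T=3), the low-bits-first layout, and the function names `pack_nucs` and `unpack_nucs` are all assumptions made for illustration, not the authors' implementation.

```python
# Assumed mapping; the source does not specify the actual 2-bit codes.
NUC_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_NUC = "ACGT"


def pack_nucs(seq: str) -> bytes:
    """Pack a nucleotide string into bytes, four nucleotides per byte."""
    out = bytearray((len(seq) + 3) // 4)  # round up to whole bytes
    for i, nuc in enumerate(seq):
        # Nucleotide i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
        out[i // 4] |= NUC_TO_BITS[nuc] << (2 * (i % 4))
    return bytes(out)


def unpack_nucs(packed: bytes, length: int) -> str:
    """Recover the first `length` nucleotides from a packed byte string."""
    return "".join(
        BITS_TO_NUC[(packed[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)
    )
```

Because the last byte may be padded, unpacking needs the nucleotide count to be stored separately, as in this sketch's `length` argument. A six-nucleotide sequence such as "ACGTAC" fits in two bytes instead of six.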
  1. eXpress pre-processes the input data (SAM/BAM and FASTA files) and converts it to a format that is compatible with the distributed file system’s partitioning scheme. The information for each target and fragment is stored in a space-efficient Protocol Buffer (retaining only the information necessary for optimization), which is then serialized and encoded in base64. Each target or fragment takes up exactly one line in the file created for input into eXpress-D.
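
The one-record-per-line layout described in this step can be sketched as follows. Serialized binary records may contain newline bytes, whereas standard base64 output does not, so encoding each record keeps it on a single line that a line-based partitioner can split safely. This is a sketch only: the payloads below are arbitrary byte strings standing in for serialized Protocol Buffer messages, and the helper names `write_records` and `read_records` are hypothetical rather than part of eXpress-D.

```python
import base64
import io
from typing import Iterable, Iterator, TextIO


def write_records(payloads: Iterable[bytes], out: TextIO) -> None:
    """Write each binary record as one base64-encoded line."""
    for payload in payloads:
        out.write(base64.b64encode(payload).decode("ascii"))
        out.write("\n")


def read_records(inp: TextIO) -> Iterator[bytes]:
    """Yield the original binary records, one per non-empty line."""
    for line in inp:
        line = line.strip()
        if line:
            yield base64.b64decode(line)


# Stand-in payloads containing raw newline bytes; a real pipeline would
# serialize Protocol Buffer messages here instead.
payloads = [b"\x08\x01\n\x02hi", b"\x00\xff\n\n"]
buf = io.StringIO()
write_records(payloads, buf)
lines = buf.getvalue().splitlines()
decoded = list(read_records(io.StringIO(buf.getvalue())))
```

Even though both payloads contain newline bytes, the file has exactly one line per record, and decoding each line returns the original bytes.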