Table 1 Protocol Buffer specifications

From: Fragment assignment in the cloud with eXpress-D

| Message | Field | Type | Description |
|---|---|---|---|
| Fragment | name | string | Unique query name of fragment in SAM file |
| Fragment | paired | bool | Boolean specifying if both ends were sequenced |
| Fragment | alignments | FragmentAlignments | Collection of alignments for fragment |
| FragmentAlignment | target_id | uint32 | ID of target aligned to (index in SAM header) |
| FragmentAlignment | read_l | ReadAlignment | Alignment information for 5’ (left) read, if it exists |
| FragmentAlignment | read_r | ReadAlignment | Alignment information for 3’ (right) read, if it exists |
| ReadAlignment | first | bool | Boolean specifying if this end was sequenced first |
| ReadAlignment | left_pos | uint32 | 0-based left endpoint of alignment to reference |
| ReadAlignment | right_pos | uint32 | 0-based right endpoint of alignment to reference |
| ReadAlignment | mismatch_indices | byteArray | Positions in read that differ from reference |
| ReadAlignment | mismatch_nucs | byteArray | Nucleotides in read at mismatches, 2 bits/nuc |
| Target | name | string | Unique name of target sequence |
| Target | id | uint32 | Index of target in SAM header |
| Target | length | uint32 | Number of nucleotides in target sequence |
| Target | seq | byteArray | Nucleotides of target sequence, 2 bits/nuc |
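
The "2 bits/nuc" encoding used for `seq` and `mismatch_nucs` can be illustrated with a minimal sketch. The nucleotide-to-code mapping and the bit order within each byte are assumptions made for illustration; the table does not specify them, and the function names `pack_nucs` and `unpack_nucs` are hypothetical.

```python
# Assumed 2-bit code per nucleotide; the source does not state the mapping.
NUC_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_NUC = "ACGT"


def pack_nucs(seq: str) -> bytes:
    """Pack a nucleotide string into bytes, four nucleotides per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, nuc in enumerate(seq):
        # Assumed layout: the first nucleotide fills the two lowest bits.
        out[i // 4] |= NUC_TO_BITS[nuc] << (2 * (i % 4))
    return bytes(out)


def unpack_nucs(packed: bytes, length: int) -> str:
    """Recover `length` nucleotides from a packed byte string."""
    return "".join(
        BITS_TO_NUC[(packed[i // 4] >> (2 * (i % 4))) & 0b11]
        for i in range(length)
    )
```

Because the last byte may be only partly filled, decoding needs the nucleotide count, which may be why a separate `length` field accompanies `seq` in the Target message.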
  1. eXpress pre-processes the input data (SAM/BAM and FASTA files) and converts it to a format that is compatible with the distributed file system’s partitioning scheme. The information for each target and fragment is placed in a space-efficient Protocol Buffer, which retains only the information necessary for optimization and is then serialized and encoded in base64. Each target or fragment occupies exactly one line in the input file created for eXpress-D.
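
The one-record-per-line layout described in this step can be sketched as follows. This is not the eXpress-D implementation: the records here are arbitrary byte strings standing in for serialized Protocol Buffer messages, and `encode_lines` and `decode_lines` are hypothetical helper names. The sketch only shows why base64 makes a line-oriented file safe for binary payloads.

```python
import base64


def encode_lines(records):
    """Encode each serialized record as one base64 line of text."""
    # b64encode never emits newline bytes, so each record stays on one line
    # even when the raw payload itself contains 0x0A.
    return "".join(base64.b64encode(raw).decode("ascii") + "\n" for raw in records)


def decode_lines(text):
    """Decode a line-oriented base64 file back into raw records."""
    return [base64.b64decode(line) for line in text.splitlines() if line]


# Stand-ins for serialized messages; the first contains a raw newline byte.
records = [b"\x08\x01\n\x02", b"\x12\x04ACGT"]
text = encode_lines(records)
restored = decode_lines(text)
```

Since every line is a complete record, a line-based partitioner can split the file at any line boundary without cutting a target or fragment in half.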