
Table 1 Protocol Buffer specifications

From: Fragment assignment in the cloud with eXpress-D

| Message | Field | Type | Description |
|---|---|---|---|
| Fragment | name | string | Unique query name of fragment in SAM file |
| Fragment | paired | bool | Boolean specifying if both ends were sequenced |
| Fragment | alignments | FragmentAlignments | Collection of alignments for fragment |
| FragmentAlignment | target_id | uint32 | ID of target aligned to (index in SAM header) |
| FragmentAlignment | read_l | ReadAlignment | Alignment information for 5’ (left) read, if it exists |
| FragmentAlignment | read_r | ReadAlignment | Alignment information for 3’ (right) read, if it exists |
| ReadAlignment | first | bool | Boolean specifying if this end was sequenced first |
| ReadAlignment | left_pos | uint32 | 0-based left endpoint of alignment to reference |
| ReadAlignment | right_pos | uint32 | 0-based right endpoint of alignment to reference |
| ReadAlignment | mismatch_indices | byteArray | Positions in read that differ from reference |
| ReadAlignment | mismatch_nucs | byteArray | Nucleotides in read at mismatches, 2 bits/nuc |
| Target | name | string | Unique name of target sequence |
| Target | id | uint32 | Index of target in SAM header |
| Target | length | uint32 | Number of nucleotides in target sequence |
| Target | seq | byteArray | Nucleotides of target sequence, 2 bits/nuc |
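
The `seq` and `mismatch_nucs` fields store nucleotides at 2 bits each. The following is a minimal sketch of such a packing, not the eXpress-D implementation: the A/C/G/T to 0/1/2/3 mapping, the bit order within each byte, and the function names `pack_nucleotides` and `unpack_nucleotides` are all assumptions made for illustration, since the table states only the bit width.

```python
# Assumed mapping: the table only says that 2 bits are used per nucleotide.
NUC_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_NUC = "ACGT"


def pack_nucleotides(seq: str) -> bytes:
    """Pack a nucleotide string into bytes, four nucleotides per byte."""
    packed = bytearray((len(seq) + 3) // 4)
    for i, nuc in enumerate(seq):
        # Nucleotide i occupies two bits of byte i // 4, lowest bits first
        # (an assumed bit order).
        packed[i // 4] |= NUC_TO_BITS[nuc] << (2 * (i % 4))
    return bytes(packed)


def unpack_nucleotides(packed: bytes, length: int) -> str:
    """Recover the first `length` nucleotides from a packed byte string."""
    return "".join(
        BITS_TO_NUC[(packed[i // 4] >> (2 * (i % 4))) & 0b11]
        for i in range(length)
    )


packed = pack_nucleotides("ACGTAC")
restored = unpack_nucleotides(packed, 6)
```

Because the last byte may be only partly filled, decoding needs the nucleotide count to be known separately. In this sketch it is passed explicitly as `length`.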

  1. eXpress pre-processes the input data (SAM/BAM and FASTA files) and converts them to a format that is compatible with the distributed file system’s partitioning scheme. The information for each target and fragment is placed in a space-efficient Protocol Buffer, which retains only the information necessary for optimization and is then serialized and encoded in base64. Each target or fragment occupies exactly one line in the file created as input to eXpress-D.
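
The one-record-per-line layout can be sketched as follows. This is an illustration only: the payloads are stand-in byte strings rather than real serialized Protocol Buffer messages, and the helper names `write_records` and `read_records` are hypothetical. Serialized binary data can contain newline bytes, whereas base64 text cannot, so encoding keeps each record on a single line that a line-based partitioner can split safely.

```python
import base64
import io


def write_records(payloads, out):
    """Write each serialized record as one base64-encoded line."""
    for payload in payloads:
        # b64encode emits no newline characters, so each record
        # occupies exactly one line of the output.
        out.write(base64.b64encode(payload).decode("ascii") + "\n")


def read_records(lines):
    """Yield the raw bytes of each record, one per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield base64.b64decode(line)


# Stand-in payloads (assumption): in the pipeline described above these
# would be serialized Protocol Buffer messages. They deliberately contain
# newline bytes to show that the encoded lines stay intact.
payloads = [b"target\x00\x01\n\xff", b"fragment\n\n\x02"]

buffer = io.StringIO()
write_records(payloads, buffer)
encoded_lines = buffer.getvalue().splitlines()
decoded = list(read_records(encoded_lines))
```

Here `decoded` equals `payloads`, and `encoded_lines` holds one line per record even though the raw bytes contain newlines.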