Fig. 1 | BMC Bioinformatics

Fig. 1

From: CARE 2.0: reducing false-positive sequencing error corrections using machine learning


Workflow of CARE 2.0. (a) The signature of an anchor read \(r_i\) is determined by minhashing and used to query the precomputed hash tables. The retrieved reads form the candidate read set \(C(r_i)\). (b) All reads in \(C(r_i)\) are aligned to \(r_i\). Reads with a relatively low semi-global pairwise alignment quality are removed, resulting in the filtered set of candidate reads \(F(r_i)\). (c) The initial multiple sequence alignment (MSA) is constructed around the center \(r_i\) using \(F(r_i)\). The MSA is then refined by removing candidate reads whose pattern differs significantly from the anchor (\(r_{15}\), \(r_{22}\), and \(r_7\) in the example). (d) The anchor read is corrected (the seventh nucleotide of \(r_i\) in the example), and optionally some of the candidates as well (the fifth nucleotide of \(r_2\) in the example), using a provided random forest trained for correction.
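
The candidate retrieval in the first panel can be illustrated with a small Python sketch. This is not CARE 2.0's implementation: the function names, the k-mer length, the number of hash functions, and the use of BLAKE2b as the hash are all assumptions chosen for illustration. It shows only the general idea of computing a minhash signature for each read, storing reads in one hash table per signature position, and collecting every read that shares at least one signature value with the anchor.

```python
import hashlib
from collections import defaultdict


def kmers(read, k):
    """Return the set of all k-mers of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def hash_kmer(kmer, seed):
    """Seeded 64-bit hash of a k-mer (BLAKE2b is an illustrative choice)."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(read, k=4, num_hashes=8):
    """Signature = smallest hash value over all k-mers, per hash function."""
    ks = kmers(read, k)
    if not ks:
        raise ValueError("read is shorter than k")
    return tuple(min(hash_kmer(km, seed) for km in ks)
                 for seed in range(num_hashes))


def build_tables(reads, k=4, num_hashes=8):
    """Precompute one table per hash function: signature value -> read ids."""
    tables = [defaultdict(set) for _ in range(num_hashes)]
    for read_id, read in reads.items():
        for t, value in enumerate(minhash_signature(read, k, num_hashes)):
            tables[t][value].add(read_id)
    return tables


def candidate_set(anchor_id, reads, tables, k=4, num_hashes=8):
    """Collect reads sharing at least one signature value with the anchor."""
    signature = minhash_signature(reads[anchor_id], k, num_hashes)
    found = set()
    for t, value in enumerate(signature):
        found |= tables[t].get(value, set())
    found.discard(anchor_id)  # the anchor is not its own candidate
    return found


# Toy data (hypothetical reads, not taken from the paper).
reads = {
    "r1": "ACGTACGTTGCA",
    "r2": "ACGTACGTTGCA",  # identical to r1, so it shares every signature value
    "r3": "GGGGGGGGGGGG",  # shares no k-mer with r1
}
tables = build_tables(reads)
print(candidate_set("r1", reads, tables))
```

In this sketch a read becomes a candidate as soon as one signature position matches, so the retrieved set is deliberately permissive. The alignment and MSA refinement steps described in the caption are what narrow it down afterwards.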
