Fig. 1 | BMC Bioinformatics

Fig. 1

From: GenoGAM 2.0: scalable and efficient implementation of genome-wide generalized additive models for gigabase-scale genomes

Schematic overview highlighting the differences between GenoGAM 1.0 and GenoGAM 2.0. Raw BAM files are read in, pre-processed, normalized, and written to hard drive in HDF5 format. In addition, normalization factors for sequencing depth variation are computed using DESeq2 [10]. The resulting object is the dataset on which fitting is done. Global hyperparameters are then estimated by cross-validation, and for each tile the coefficients are estimated via Newton-Raphson and the standard errors via the sparse inverse subset algorithm. The final model is written as a new object to hard drive in HDF5 format. Note that the schematic is a simplification: the pre-processed dataset and the fitted model are not generated in memory and written to HDF5 only at the end. Instead, all HDF5 matrices are initialized directly on hard drive and the writing is done on the fly. Blue (GenoGAM 1.0) and orange (GenoGAM 2.0) mark the differences between the two GenoGAM versions and thereby also outline the content of this paper
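
To illustrate the Newton-Raphson step named in the caption, the following is a minimal sketch, not the GenoGAM model itself. It assumes a deliberately simplified setting: a single log-rate coefficient for Poisson counts in one tile, with an optional ridge penalty. The function name `newton_raphson_poisson` and the parameter `penalty` are hypothetical and chosen only for this example.

```python
import math


def newton_raphson_poisson(counts, penalty=0.0, tol=1e-10, max_iter=100):
    """Maximize sum(y * b - exp(b)) - penalty / 2 * b**2 over a single coefficient b.

    Illustrative assumption: one coefficient and a Poisson likelihood,
    which is far simpler than the per-tile model described in the paper.
    """
    n = len(counts)
    total = sum(counts)
    b = 0.0  # starting value for the log-rate
    for _ in range(max_iter):
        mu = math.exp(b)
        gradient = total - n * mu - penalty * b  # first derivative of the objective
        hessian = -n * mu - penalty              # second derivative, always negative
        step = gradient / hessian
        b -= step                                # Newton-Raphson update
        if abs(step) < tol:
            break
    return b


if __name__ == "__main__":
    tile_counts = [1, 2, 4, 7, 12]  # made-up counts for one tile
    print(math.exp(newton_raphson_poisson(tile_counts)))
    print(math.exp(newton_raphson_poisson(tile_counts, penalty=2.0)))
```

Without a penalty, the estimate converges to the logarithm of the mean count; with a positive penalty, the coefficient is shrunk toward zero. Because the objective is concave, the second derivative stays negative and each update is well defined.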
