Flowchart of the DNAPROT algorithm. Starting from the set of protein-DNA complexes in the PDB, we filter the entries using the criteria described in the Methods section to eliminate redundancy. a) The culled training set is used to derive atomic matrices that capture the interaction preferences at binding interfaces. b) Taking as input the Cartesian coordinates of a TF-DNA complex with N complementary base pairs, DNAPROT mutates one by one all 4N nucleotides in the template. c) During the saturating mutation assay, each mutation is scored in terms of direct readout (using the atomic PWMs built in step a) and indirect readout (estimating the deformation cost of DNA upon mutation, as described in step b), and the combined scores are used to fill a position weight matrix (PWM). A sequence logo can be calculated from the structure-based PWM by stacking the best B oligonucleotides, usually 50.
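
The matrix-filling and ranking steps in the caption can be illustrated with a minimal sketch. This is not the DNAPROT implementation: the function names `build_pwm` and `best_oligos`, the scoring callables, the plain sum used to combine direct and indirect scores, and the convention that lower scores are better are all assumptions made for illustration. The exhaustive enumeration of sequences is only practical for short sites.

```python
from itertools import product
from typing import Callable, List

BASES = "ACGT"

# Hypothetical signature: (template sequence, position, substituted base) -> score.
ScoreFn = Callable[[str, int, str], float]


def build_pwm(template: str, direct: ScoreFn, indirect: ScoreFn) -> List[List[float]]:
    """Fill an N x 4 matrix by substituting every base at every position.

    Entry [i][j] holds the combined score of base BASES[j] at position i.
    Combining by simple addition is an assumption of this sketch.
    """
    pwm = []
    for pos in range(len(template)):
        row = [direct(template, pos, b) + indirect(template, pos, b) for b in BASES]
        pwm.append(row)
    return pwm


def best_oligos(pwm: List[List[float]], b: int = 50) -> List[str]:
    """Return the b best-scoring sequences under an additive model.

    Assumes lower scores are better (energy-like); ties are broken
    alphabetically by the sort on (score, sequence) tuples.
    """
    n = len(pwm)
    scored = []
    for combo in product(range(4), repeat=n):
        total = sum(pwm[i][j] for i, j in enumerate(combo))
        scored.append((total, "".join(BASES[j] for j in combo)))
    scored.sort()
    return [seq for _, seq in scored[:b]]
```

The sequences returned by `best_oligos` could then be stacked and passed to any logo-drawing tool; the default of 50 mirrors the value of B mentioned in the caption.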