Fig. 8 | BMC Bioinformatics

Fig. 8

From: Prop3D: A flexible, Python-based platform for machine learning with protein structural properties and biophysical data

HSDS affords significantly improved training runtimes. Using Prop3D, we trained an immunoglobulin-specific variational autoencoder with \(\approx \,\)25K domain structures, employing 64 CPUs to process data and four GPUs for 30 epochs of training (orange trace; [12]). A. Before we chose to implement HSDS in Prop3D, we stored and processed domain structures as simple plaintext PDB files (parsed with BioPython), along with the corresponding biophysical properties for all atoms in these structures as plaintext files of comma-separated values (CSV; parsed with Pandas). That computation took \(\approx \,\)24 h of wall-clock time for \(\approx \,\)50K ASCII files on a well-equipped GPU workstation. B. Reformulating and streamlining the Prop3D pipeline with HSDS yielded a substantial (\(\approx \,\)33%) speed-up: training runtimes across many epochs (orange) improved by \(\approx \,\)8 h (to \(\approx \,\)16 h total), with far more efficient CPU usage while reading all of the data (blue traces; note the different vertical scales in A and B). These data-panel images were exported from our Weights and Biases training dashboard.
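
The \(\approx \,\)33% figure in the caption can be read as the fraction of wall-clock time saved relative to the original run, which is not the same as the throughput ratio. The short sketch below works through that arithmetic using only the approximate runtimes stated in the caption (24 h and 16 h). The function and variable names are illustrative assumptions and are not part of Prop3D.

```python
def runtime_reduction(before_h: float, after_h: float) -> float:
    """Fraction of wall-clock time saved relative to the original runtime."""
    return (before_h - after_h) / before_h


def speedup_factor(before_h: float, after_h: float) -> float:
    """Ratio of the original runtime to the new runtime."""
    return before_h / after_h


# Approximate values taken from the caption (panels A and B).
before_h = 24.0  # plaintext PDB + CSV pipeline
after_h = 16.0   # HSDS-based pipeline

reduction = runtime_reduction(before_h, after_h)
factor = speedup_factor(before_h, after_h)

print(f"runtime reduction: {reduction:.0%}")  # 33%
print(f"speed-up factor: {factor:.2f}x")      # 1.50x
```

Under these rounded inputs, the 8 h saving corresponds to roughly a one-third reduction in runtime, or equivalently about 1.5 times the original training throughput.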