Fig. 1 | BMC Bioinformatics

Fig. 1

From: Prop3D: A flexible, Python-based platform for machine learning with protein structural properties and biophysical data

Overview of Prop3D and its components. Prop3D is a framework to create and share protein structures featurized with custom sets of properties (biophysical, phylogenetic, etc.), thereby providing ML-ready datasets for structural bioinformatics. This goal is represented by the green- and blue-background regions to the right and top of the schematic. One works towards it using the two distinct packages that lie at the core of Prop3D (yellow region at left): (i) ‘Meadowlark’, which enables one to prepare structures, compute and apply features, and run bioinformatics tools/utilities as Docker-ized software (sw); and (ii) ‘AtomicToil’, for performing massively parallel calculations, locally or in the cloud, using the Toil pipeline system. A dataset of structures featurized in this way can be readily used in the popular ML framework PyTorch, for instance with various representational schemes and types of ML models (language models, graphical models, etc.), as shown in the green region at right; Prop3D facilitates these steps by providing custom PyTorch data loaders that enable rapid, high-volume processing. Prop3D-20sf, a dataset that we created by applying Prop3D to CATH, is publicly available via an HSDS (Highly Scalable Data Service) endpoint.
