Fig. 1
From: CASTELO: clustered atom subtypes aided lead optimization—a combined machine learning and molecular modeling method

The general pipeline for CASTELO. The starting point is the generation of MD trajectories with tools such as GROMACS. In one route, RMSD clustering is performed, for example with VMD. In the other route, the MD trajectories are processed with Python scripts to obtain contact matrices, and atom subtype information is used to aggregate them. Dynamism tensors carrying temporal information are then generated on top of the contact matrices, again with Python scripts. A CVAE model encodes the dynamism data, and clusters are then computed with tools such as HDBSCAN. Finally, the two routes converge: clusters from conventional RMSD clustering are compared with clusters from CVAE clustering using the proposed comparison metrics. The atom subtypes are ranked, and this ranking is the final output of CASTELO. Using domain knowledge, we suggest modifications for the lowest-ranked atoms. Methods such as free energy perturbation calculations can be used to verify CASTELO's suggestions.
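
The contact-matrix and subtype-aggregation steps named in the caption can be illustrated with a minimal NumPy sketch. This is not the authors' script. The function names, the 4.5 Å distance cutoff, the ligand-versus-protein layout of the matrix, the subtype labels, and the choice to aggregate by summing rows are all assumptions made for illustration.

```python
import numpy as np


def contact_matrix(ligand_xyz, protein_xyz, cutoff=4.5):
    """Binary contact matrix for one frame (hypothetical helper).

    Rows are ligand atoms, columns are protein atoms. An entry is 1 when
    the two atoms are closer than `cutoff` (assumed value, in angstroms).
    """
    diff = ligand_xyz[:, None, :] - protein_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist < cutoff).astype(np.int64)


def aggregate_by_subtype(contacts, subtypes):
    """Sum the rows of `contacts` that share an atom subtype label.

    Summation is an assumed aggregation rule, not one taken from the paper.
    """
    labels = sorted(set(subtypes))
    index = np.array([labels.index(s) for s in subtypes])
    out = np.zeros((len(labels), contacts.shape[1]), dtype=contacts.dtype)
    np.add.at(out, index, contacts)  # unbuffered add handles repeated labels
    return labels, out


# Toy frame: three ligand atoms, two protein atoms (made-up coordinates).
ligand = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
protein = np.array([[1.0, 0.0, 0.0], [10.0, 1.0, 0.0]])
subtypes = ["C.ar", "O.3", "C.ar"]  # hypothetical subtype labels

contacts = contact_matrix(ligand, protein)
labels, aggregated = aggregate_by_subtype(contacts, subtypes)
```

Repeating this per frame and stacking the results along a time axis would give the kind of time-resolved input from which a dynamism tensor could be built. How CASTELO actually defines that tensor is not stated in the caption.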