
Table 3 Sample code for the evaluation of a multistep design pipeline

From: rstoolbox - a Python library for large-scale analysis of computational protein design data and structural bioinformatics

Action | Code sample

Load
import rstoolbox as rs
import matplotlib.pyplot as plt

Read
# With Rosetta installed, scoring can be run for a single structure
baseline = rs.io.get_sequence_and_structure('1kx8.pdb', minimize=True)
slen = len(baseline.iloc[0].get_sequence('A'))
# Pre-calculated sets can also be loaded to contextualize the data
# CATH domains, 70% homology filter
cath = rs.utils.load_refdata('cath', 70)
# Keep domains within +/-5 residues (a 10-residue window) of the expected design length
cath = cath[(cath['length'] >= slen - 5) & (cath['length'] <= slen + 5)]
# Designs were performed in two rounds
gen1 = rs.io.parse_rosetta_file('1kx8_gen1.designs')
gen2 = rs.io.parse_rosetta_file('1kx8_gen2.designs')
# Identifiers of the selected decoys
decoys = ['d1', 'd2', 'd3', 'd4', 'd5', 'd6']
# Load experimental data for d2 (best-performing decoy)
df_cd = rs.io.read_CD('1kx8_d2/CD', model='J-815')
df_spr = rs.io.read_SPR('1kx8_d2/SPR.data')

Plot
# Figure size is 170 x 170 mm, converted to inches (25.4 mm per inch)
fig = plt.figure(figsize=(170 / 25.4, 170 / 25.4))
grid = (3, 4)
# Compare scores between the two generations
axs = rs.plot.multiple_distributions(gen2, fig, grid, values=['score', 'hbond_bb_sc', 'hbond_sc', 'rmsd'], refdata=gen1, violins=False, showfliers=False)
# See how the selected decoys fit into domains of similar size
qr = gen2[gen2['description'].isin(decoys)]
axs = rs.plot.plot_in_context(qr, fig, (3, 2), cath, (1, 0), ['score', 'cav_vol'])
# Dashed lines mark the values of the initial template structure
axs[0].axvline(baseline.iloc[0]['score'], color='k', linestyle='--')
axs[1].axvline(baseline.iloc[0]['cavity'], color='k', linestyle='--')
# Plot experimental validation data
ax = plt.subplot2grid(grid, (2, 0), fig=fig, colspan=2)
rs.plot.plot_CD(df_cd, ax, sample=7)
ax = plt.subplot2grid(grid, (2, 2), fig=fig, colspan=2)
rs.plot.plot_SPR(df_spr, ax, fitcolor='black')
plt.tight_layout()
plt.savefig('BMC_Fig4.png', dpi=300)

The code shows how to combine data from multiple Rosetta simulations to compare score terms between two design populations, and how to compare the final designs with the initial structure template. In the published table, code comments are set in italics and rstoolbox functions in bold. Styling commands are omitted for readability but can be found in the repository's notebook.
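The length-window filter and the decoy selection above are ordinary pandas boolean indexing, assuming (as in rstoolbox) that the parsed tables behave like pandas DataFrames. The sketch below reproduces both steps on small hypothetical DataFrames standing in for the output of `rs.utils.load_refdata` and `rs.io.parse_rosetta_file`. The column names `length` and `description` follow the table; the `domain` column, the values, and the template length are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the CATH reference table and the gen2 designs.
cath = pd.DataFrame({
    "domain": ["c1", "c2", "c3", "c4"],
    "length": [50, 58, 62, 75],
})
gen2 = pd.DataFrame({
    "description": ["d1", "d7", "d2", "d9"],
    "score": [-120.5, -98.0, -131.2, -101.4],
})

slen = 60  # hypothetical length of the template chain

# Keep reference domains within +/-5 residues of the template length.
# Series.between is inclusive on both ends by default.
in_window = cath[cath["length"].between(slen - 5, slen + 5)]

# Select decoys by identifier; the mask is built from the same DataFrame
# that is being indexed, so mask and rows share one index.
decoys = ["d1", "d2", "d3", "d4", "d5", "d6"]
selected = gen2[gen2["description"].isin(decoys)]

print(in_window["domain"].tolist())      # ['c2', 'c3']
print(selected["description"].tolist())  # ['d1', 'd2']
```

Because `between` includes both bounds by default, it selects the same rows as the explicit `>=`/`<=` pair used in the table. Building the `isin` mask from the frame being filtered avoids depending on two different tables happening to share an index.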
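With `violins=False` and `showfliers=False`, `multiple_distributions` appears to draw the gen2 versus gen1 comparison as boxplots (`showfliers` is a boxplot option). A rough numerical counterpart is to compare quartiles per score term. The sketch uses only the standard library and hypothetical score values; in practice the lists would be the `score`, `rmsd`, and other columns of the `gen1` and `gen2` tables.

```python
from statistics import median, quantiles

# Hypothetical per-decoy values for two design rounds.
gen1 = {
    "score": [-95.0, -101.5, -99.2, -104.8, -97.3],
    "rmsd": [1.8, 2.4, 2.1, 1.6, 2.9],
}
gen2 = {
    "score": [-118.4, -121.0, -109.9, -125.6, -115.2],
    "rmsd": [1.2, 1.5, 1.1, 1.9, 1.4],
}


def summarize(values):
    """Return (Q1, median, Q3), the values spanned by a boxplot's box.

    The 'inclusive' method treats the data minimum and maximum as the
    0th and 100th percentiles.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q1, median(values), q3


for term in ("score", "rmsd"):
    ref = summarize(gen1[term])
    new = summarize(gen2[term])
    shift = new[1] - ref[1]
    print(f"{term}: gen1 median {ref[1]:.2f}, gen2 median {new[1]:.2f}, shift {shift:+.2f}")
```

A summary like this is only a coarse check; the plotted distributions show spread and overlap that a single median shift hides.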