
Table 3 Sample code for the evaluation of a multistep design pipeline

From: rstoolbox - a Python library for large-scale analysis of computational protein design data and structural bioinformatics

Action

Code Sample

Load

import rstoolbox as rs

import matplotlib.pyplot as plt

Read

# With Rosetta installed, scoring can be run for a single structure

baseline = rs.io.get_sequence_and_structure('1kx8.pdb', minimize=True)

slen = len(baseline.iloc[0].get_sequence('A'))

# Pre-calculated sets can also be loaded to contextualize the data

# 70% homology filter

cath = rs.utils.load_refdata('cath', 70)

# Length in a window of 10 residues around expected design length

cath = cath[(cath['length'] >= slen - 5) & (cath['length'] <= slen + 5)]

# Designs were performed in two rounds

gen1 = rs.io.parse_rosetta_file('1kx8_gen1.designs')

gen2 = rs.io.parse_rosetta_file('1kx8_gen2.designs')

# Identifiers of selected decoys:

decoys = ['d1', 'd2', 'd3', 'd4', 'd5', 'd6']

# Load experimental data for d2 (best performing decoy)

df_cd = rs.io.read_CD('1kx8_d2/CD', model='J-815')

df_spr = rs.io.read_SPR('1kx8_d2/SPR.data')

Plot

fig = plt.figure(figsize=(170 / 25.4, 170 / 25.4))

grid = (3, 4)

# Compare scores between the two generations

axs = rs.plot.multiple_distributions(gen2, fig, (3, 4), values=['score', 'hbond_bb_sc', 'hbond_sc', 'rmsd'], refdata=gen1, violins=False, showfliers=False)

# See how the selected decoys fit into domains of similar size

qr = gen2[gen2['description'].isin(decoys)]

axs = rs.plot.plot_in_context(qr, fig, (3, 2), cath, (1, 0), ['score', 'cav_vol'])

axs[0].axvline(baseline.iloc[0]['score'], color='k', linestyle='--')

axs[1].axvline(baseline.iloc[0]['cavity'], color='k', linestyle='--')

# Plot experimental validation data

ax = plt.subplot2grid(grid, (2, 0), fig=fig, colspan=2)

rs.plot.plot_CD(df_cd, ax, sample=7)

ax = plt.subplot2grid(grid, (2, 2), fig=fig, colspan=2)

rs.plot.plot_SPR(df_spr, ax, fitcolor='black')

plt.tight_layout()

plt.savefig('BMC_Fig4.png', dpi=300)

The code shows how to combine data from multiple Rosetta simulations, compare the scoring features of two design populations, and compare the final designs with the initial structure template. Code comments are presented in italics, while rstoolbox functions are highlighted in bold. Styling commands are omitted for readability but can be found in the repository’s notebook.
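
The two selection steps in the sample (the length window around the template and the lookup of the chosen decoys) can be tried without Rosetta or rstoolbox. The following is a minimal sketch using pandas alone, assuming the loaded tables behave like pandas data frames, as the `.iloc` and boolean-mask calls in the sample suggest. All values, and the identifiers `x7` and `x9`, are invented for illustration and are not taken from the study.

```python
import pandas as pd

# Hypothetical stand-ins for the tables in the sample; every value is invented.
slen = 60  # assumed length of the template sequence

cath = pd.DataFrame({
    "length": [50, 55, 58, 60, 65, 66, 80],
    "score": [-90.0, -101.5, -110.2, -115.0, -120.3, -98.7, -150.1],
})

gen2 = pd.DataFrame({
    "description": ["d1", "d2", "x7", "d5", "x9"],
    "score": [-118.0, -125.4, -99.1, -121.7, -102.3],
})

decoys = ["d1", "d2", "d3", "d4", "d5", "d6"]

# Keep reference domains whose length lies within 5 residues of the template.
# Series.between is inclusive on both ends by default, matching >= and <=.
window = cath[cath["length"].between(slen - 5, slen + 5)]

# Build the mask from the same frame that is being filtered, so that the
# boolean index lines up row by row with the data it selects from.
selected = gen2[gen2["description"].isin(decoys)]

print(window["length"].tolist())         # [55, 58, 60, 65]
print(selected["description"].tolist())  # ['d1', 'd2', 'd5']
```

Identifiers listed in `decoys` but absent from the table (here `d3`, `d4` and `d6`) are silently skipped by `isin`, so comparing `len(selected)` with `len(decoys)` is a quick way to detect missing designs.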