Table 4 Sample code for the comparison between 4 different decoy populations

From: rstoolbox - a Python library for large-scale analysis of computational protein design data and structural bioinformatics

Action Code Sample
Load
import pandas as pd
import rstoolbox as rs
import matplotlib.pyplot as plt
Read
df = []
# With Rosetta installed, scoring can be run for a single structure
baseline = rs.io.get_sequence_and_structure('4yod.pdb')
experiments = ['no_target', 'static', 'pack', 'packmin']
scores = ['score', 'LocalRMSDH', 'post_ddg', 'bb_clash']
scorename = ['score', 'RMSD', 'ddG', 'bb_clash']
for experiment in experiments:
    # Load Rosetta silent file from decoy generation
    ds = rs.io.parse_rosetta_file(experiment + '.design')
    # Load decoy evaluation from a pre-processed CSV file.
    # Casting pd.DataFrame into DesignFrame is as easy as shown here.
    ev = rs.components.DesignFrame(pd.read_csv(experiment + '.evals'))
    # Different outputs for the same decoys can be combined through
    # their 'description' field (decoy identifier)
    df.append(ds.merge(ev, on='description'))
# Tables can be joined together into a single working object
df = pd.concat(df)
# As we are comparing over BINDI's sequence, that is our reference.
df.add_reference_sequence('B', baseline.iloc[0].get_sequence('B')[:-1])
Plot
fig = plt.figure(figsize=(170 / 25.4, 170 / 25.4))
grid = (12, 4)
# Show the distribution for key score terms
axs = rs.plot.multiple_distributions(df, fig, grid, values=scores, rowspan=3,
                                     labels=scorename, x='binder_state',
                                     order=experiments, showfliers=False)
# Sequence score for selected decoys with standard-matrix weights
ax = plt.subplot2grid(grid, (3, 0), fig=fig, colspan=4, rowspan=4)
qr = df[df['binder_state'] == 'no_target'].sort_values('score').iloc[0]
rs.plot.per_residue_matrix_score_plot(qr, 'B', ax, 'BLOSUM62', add_alignment=False, color=0)
qr = df[df['binder_state'] == 'pack'].sort_values('score').iloc[0]
rs.plot.per_residue_matrix_score_plot(qr, 'B', ax, 'BLOSUM62', add_alignment=False, color=2,
                                      selections=[('43-64', 'red')])
# Small functions help edit the plot display
rs.utils.add_top_title(ax, 'no_target (blue) - pack (green)')
# Evaluate the variability of residue types in the binding region
ax = plt.subplot2grid(grid, (7, 0), fig=fig, colspan=2, rowspan=4)
qr = df[df['binder_state'] == 'no_target']
rs.plot.sequence_frequency_plot(qr, 'B', ax, key_residues='43-64', cbar=False,
                                clean_unused=0.1, xrotation=90)
rs.utils.add_top_title(ax, 'no_target')
ax = plt.subplot2grid(grid, (7, 2), fig=fig, colspan=2, rowspan=4)
ax_cbar = plt.subplot2grid(grid, (11, 0), fig=fig, colspan=4)
rs.plot.sequence_frequency_plot(df[df['binder_state'] == 'pack'], 'B', ax, key_residues='43-64',
                                cbar_ax=ax_cbar, clean_unused=0.1, xrotation=90)
rs.utils.add_top_title(ax, 'pack')
plt.tight_layout()
plt.savefig('BMC_Fig5.png', dpi=300)
  1. The code shows how to join data from multiple Rosetta experiments to assess the key differences between four design populations in terms of different scoring metrics and sequence recovery. Code comments are presented in italics while functions from rstoolbox are highlighted in bold. Styling commands are skipped to facilitate reading, but can be found in the repository's notebook.
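
The join pattern used in the Read step can be tried without Rosetta output files. The following is a minimal sketch using plain pandas only: the `toy_tables` helper, the decoy identifiers, and all score values are hypothetical stand-ins for what `rs.io.parse_rosetta_file` and the pre-processed CSV files would provide, and it assumes that `binder_state` holds the experiment name. It merges each pair of tables on `description`, concatenates the results once after the loop, and then picks the lowest-scoring decoy of each population.

```python
import pandas as pd

# Experiment names as in the table; the data below is invented for illustration.
experiments = ["no_target", "static", "pack", "packmin"]


def toy_tables(experiment, n=3):
    """Hypothetical helper: two small tables sharing a 'description' identifier."""
    ids = [f"{experiment}_{i:04d}" for i in range(1, n + 1)]
    designs = pd.DataFrame({
        "description": ids,
        "binder_state": experiment,
        "score": [-100.0 - i for i in range(n)],
    })
    evals = pd.DataFrame({
        "description": ids,
        "post_ddg": [-30.0 + i for i in range(n)],
    })
    return designs, evals


frames = []
for experiment in experiments:
    designs, evals = toy_tables(experiment)
    # validate="one_to_one" raises if a decoy identifier is duplicated
    # in either table, which would silently multiply rows otherwise.
    frames.append(designs.merge(evals, on="description", validate="one_to_one"))

# Concatenate once, after the loop, into a single working table.
df = pd.concat(frames, ignore_index=True)

# Lowest-scoring decoy of each population.
best = df.sort_values("score").groupby("binder_state", sort=False).head(1)
print(best[["description", "binder_state", "score", "post_ddg"]])
```

Collecting the merged tables in a separate list (`frames`) rather than reusing the name `df` keeps the list and the final table distinct, which makes it harder to concatenate inside the loop by accident.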