rstoolbox - a Python library for large-scale analysis of computational protein design data and structural bioinformatics

Bonet, Jaume; Harteveld, Zander; Sesterhenn, Fabian; Scheck, Andreas; Correia, Bruno E.

doi:10.1186/s12859-019-2796-3

Table 4 Sample code for the comparison between 4 different decoy populations

Action	Code Sample
Load	import pandas as pd
	import rstoolbox as rs
	import matplotlib.pyplot as plt
Read	df = []
	# With Rosetta installed, scoring can be run for a single structure
	baseline = rs.io.get_sequence_and_structure('4yod.pdb')
Read	experiments = ['no_target', 'static', 'pack', 'packmin']
	scores = ['score', 'LocalRMSDH', 'post_ddg', 'bb_clash']
	scorename = ['score', 'RMSD', 'ddG', 'bb_clash']
	for experiment in experiments:
	    # Load Rosetta silent file from decoy generation
	    ds = rs.io.parse_rosetta_file(experiment + '.design')
	    # Load decoy evaluation from a pre-processed CSV file.
	    # Casting pd.DataFrame into DesignFrame is as easy as shown here.
	    ev = rs.components.DesignFrame(pd.read_csv(experiment + '.evals'))
	    # Different outputs for the same decoys can be combined through
	    # their 'description' field (decoy identifier)
	    df.append(ds.merge(ev, on='description'))
	# Tables can be joined together into a single working object
	df = pd.concat(df)
	# As we are comparing over BINDI's sequence, that is our reference.
	df.add_reference_sequence('B', baseline.iloc[0].get_sequence('B')[:-1])
Plot	fig = plt.figure(figsize=(170 / 25.4, 170 / 25.4))
	grid = (12, 4)
	# Show the distribution for key score terms
	axs = rs.plot.multiple_distributions(df, fig, grid, values=scores, rowspan=3,
	                                     labels=scorename, x='binder_state',
	                                     order=experiments, showfliers=False)
	# Sequence score for selected decoys with standard-matrix weights
	ax = plt.subplot2grid(grid, (3, 0), fig=fig, colspan=4, rowspan=4)
	qr = df[df['binder_state'] == 'no_target'].sort_values('score').iloc[0]
	rs.plot.per_residue_matrix_score_plot(qr, 'B', ax, 'BLOSUM62',
	                                      add_alignment=False, color=0)
	qr = df[df['binder_state'] == 'pack'].sort_values('score').iloc[0]
	rs.plot.per_residue_matrix_score_plot(qr, 'B', ax, 'BLOSUM62',
	                                      add_alignment=False, color=2,
	                                      selections=[('43-64', 'red')])
	# Small functions help edit the plot display
	rs.utils.add_top_title(ax, 'no_target (blue) - pack (green)')
	# Evaluate the variability of residue types in the binding region
	ax = plt.subplot2grid(grid, (7, 0), fig=fig, colspan=2, rowspan=4)
	qr = df[df['binder_state'] == 'no_target']
	rs.plot.sequence_frequency_plot(qr, 'B', ax, key_residues='43-64', cbar=False,
	                                clean_unused=0.1, xrotation=90)
	rs.utils.add_top_title(ax, 'no_target')
	ax = plt.subplot2grid(grid, (7, 2), fig=fig, colspan=2, rowspan=4)
	ax_cbar = plt.subplot2grid(grid, (11, 0), fig=fig, colspan=4)
	rs.plot.sequence_frequency_plot(df[df['binder_state'] == 'pack'], 'B', ax,
	                                key_residues='43-64', cbar_ax=ax_cbar,
	                                clean_unused=0.1, xrotation=90)
	rs.utils.add_top_title(ax, 'pack')
	plt.tight_layout()
	plt.savefig('BMC_Fig5.png', dpi=300)

The code shows how to join data from multiple Rosetta experiments to assess the key differences between four design populations in terms of scoring metrics and sequence recovery. Code comments are presented in italics, while functions from rstoolbox are highlighted in bold. Styling commands are omitted for readability but can be found in the repository's notebook.
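
The join described above (per-experiment tables merged on the decoy identifier, then stacked into one working object) can be tried without Rosetta or rstoolbox installed. The following is a minimal sketch using plain pandas only; the decoy names and score values are made-up placeholders, not data from the paper, and plain DataFrames stand in for the DesignFrame objects that rstoolbox would return.

```python
import pandas as pd

# Hypothetical toy tables standing in for the parsed '.design' silent files.
# All identifiers and numbers below are invented for illustration.
designs = {
    'no_target': pd.DataFrame({'description': ['nt_0001', 'nt_0002'],
                               'score': [-210.5, -198.2],
                               'binder_state': ['no_target', 'no_target']}),
    'pack': pd.DataFrame({'description': ['pk_0001', 'pk_0002'],
                          'score': [-225.1, -230.4],
                          'binder_state': ['pack', 'pack']}),
}

# Hypothetical toy tables standing in for the '.evals' CSV files.
evaluations = {
    'no_target': pd.DataFrame({'description': ['nt_0001', 'nt_0002'],
                               'post_ddg': [-12.3, -9.8]}),
    'pack': pd.DataFrame({'description': ['pk_0001', 'pk_0002'],
                          'post_ddg': [-25.0, -27.6]}),
}

frames = []
for experiment in designs:
    # Inner join: rows are matched through the shared 'description' column
    merged = designs[experiment].merge(evaluations[experiment], on='description')
    frames.append(merged)

# Stack the per-experiment tables into a single working table
df = pd.concat(frames, ignore_index=True)

# Select the best-scoring (lowest 'score') decoy of one population,
# mirroring the sort_values('score').iloc[0] idiom used in the table
best_pack = df[df['binder_state'] == 'pack'].sort_values('score').iloc[0]
print(best_pack['description'])
```

Because `merge` defaults to an inner join, a decoy missing from either table is silently dropped, so comparing row counts before and after the merge is a cheap sanity check.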

ISSN: 1471-2105

BMC Bioinformatics