COZOID: contact zone identifier for visual analysis of protein-protein interactions

Furmanová, Katarína; Byška, Jan; Gröller, Eduard M.; Viola, Ivan; Paleček, Jan J.; Kozlíková, Barbora

doi:10.1186/s12859-018-2113-6

Research article
Open access
Published: 06 April 2018


BMC Bioinformatics volume 19, Article number: 125 (2018)


Abstract

Background

Studying the patterns of protein-protein interactions (PPIs) is fundamental for understanding the structure and function of protein complexes. The exploration of the vast space of possible mutual configurations of interacting proteins and their contact zones is very time-consuming and requires proteomic expert knowledge.

Results

In this paper, we propose a novel tool containing a set of visual abstraction techniques for the guided exploration of PPI configuration space. It helps proteomic experts to select the most relevant configurations and explore their contact zones at different levels of detail. The system integrates a set of methods that follow and support the workflow of proteomics experts. The first visual abstraction method, the Matrix view, is based on customized interactive heat maps; it provides the users with an overview of all possible residue-residue contacts in all PPI configurations and supports their interactive filtering. In this step, the user can traverse all input PPI configurations and obtain an overview of their interacting amino acids. Then, the models containing a particular pair of interacting amino acids can be selectively picked and traversed. Detailed information on the individual amino acids in the contact zones and their properties is presented in the Contact-Zone list-view. The list-view provides a comparative tool to rank the best models based on the similarity of their contacts to the template-structure contacts. All these techniques are interactively linked with the other proposed methods, the Exploded view and the Open-Book view, which represent individual configurations in three-dimensional space. These representations solve the problem of high overlap that arises when many configurations are displayed together. Using these views, the structural alignment of the best models can also be visually confirmed.

Conclusions

We developed a system for the exploration of large sets of protein-protein complexes in a fast and intuitive way. The usefulness of our system has been tested and verified on several docking structures covering the three major types of PPIs: coiled-coil, pocket-string, and surface-surface interactions. Our case studies show that our tool helps to analyse and filter protein-protein complexes in a fraction of the time required by previously available techniques.

Background

Understanding the constitution and biological function of proteins is essential in many research disciplines, such as medicine and pharmaceutics. Most of the proteins critical for cellular life act in a cooperative manner, forming multiprotein complexes. It is estimated that approximately 800 complexes exist in just one yeast cell [1].

All complexes are composed of subunits, which form the complex via mutual protein-protein interactions (PPIs). The main goal of protein-protein docking, the computational study of these PPIs, is to identify the appropriate spatial configuration of the interacting proteins. This configuration is represented by the mutual spatial orientation of the interacting proteins. Each configuration contains a contact zone, consisting of the set of amino acids from both interacting proteins that are within interaction distance of each other, usually from 3 to 5 Ångströms.
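The contact-zone definition above can be sketched in a few lines of Python. This is a minimal sketch, not the COZOID implementation: it assumes each residue is given as a label plus a list of atom coordinates in Å, and that the residue-residue distance is the smallest atom-atom distance (the text does not say which distance is used). The names `residue_distance`, `contact_pairs`, and `contact_zone` are hypothetical.

```python
import math
from itertools import product

# Hypothetical input format: (residue_label, [(x, y, z), ...]) with coordinates in angstroms.
Residue = tuple[str, list[tuple[float, float, float]]]


def residue_distance(a: Residue, b: Residue) -> float:
    """Smallest Euclidean distance between any atom of residue a and any atom of residue b."""
    return min(math.dist(p, q) for p, q in product(a[1], b[1]))


def contact_pairs(p1: list[Residue], p2: list[Residue],
                  lower: float = 3.0, upper: float = 5.0) -> list[tuple[str, str, float]]:
    """Residue pairs (one from each protein) whose distance lies in [lower, upper]."""
    pairs = []
    for r1, r2 in product(p1, p2):
        d = residue_distance(r1, r2)
        if lower <= d <= upper:
            pairs.append((r1[0], r2[0], d))
    return pairs


def contact_zone(p1: list[Residue], p2: list[Residue]) -> tuple[set[str], set[str]]:
    """Contact zone: residues of P1 and of P2 that take part in at least one contact pair."""
    pairs = contact_pairs(p1, p2)
    return {a for a, _, _ in pairs}, {b for _, b, _ in pairs}
```

Pairs closer than `lower` are not treated as contacts here; they correspond to the colliding residues that the Exploded view highlights separately.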

The structure determination of PPIs in laboratories is very challenging, as well as expensive and time-consuming. This is due to many problems related to the dynamic nature of proteins and to difficulties in their purification and sample preparation. Therefore, computational docking is often used to study the feasibility of proposed configurations. Many algorithms and tools for examining these configurations have appeared in recent years. A categorization of the existing algorithms, along with a description of their basic principles, was published recently by Huang [2].

However, these algorithms produce a large number of possible configurations, which need to be explored to identify the proteomically most relevant ones. Even though the computational tools usually provide the users with some score to rank the configurations, the resulting ordering does not necessarily correspond to their proteomic relevance. Therefore, the configurations have to be processed and examined manually, which requires a proper visual support to enhance the exploration process.

Even for the comparison of two configurations, a traditional overlay representation suffers from many occlusion problems and it is hard to perceive the differences between individual solutions. When comparing more configurations, even without a detailed visualization of the hot spot amino acids, the problem becomes even more apparent (Fig. 1).

Related work

As the selection of the most proteomically relevant PPI configurations is a very challenging task, several algorithms have already been published for re-ranking the configurations according to different criteria. They suggest a subset of configurations that should be explored in detail. As a representative of these attempts, Malhotra et al. [3] presented DockScore, a web server for ranking the individual configurations produced by docking tools. Their idea is based on a scoring scheme that considers several interface parameters, such as the surface area, hydrophobicity, and spatial clustering. This helps the user to reduce the number of configurations to a smaller set, which still has to be explored manually. For this exploration, visual support is essential, as it enables the user to see the spatial orientation of the contact zones and to compare different configurations. However, DockScore provides only a rudimentary visual representation of the top five configurations, which is insufficient for a proper exploration of the configuration space.

Finding a proper visual representation of PPIs can be approached from different perspectives. One group of techniques visualizes the contact zones and their interacting amino acids in 3D space. These spatial techniques have to address the problem of occlusion and visual clutter caused by the fact that the most interesting parts of the interacting proteins, the contact zones, face each other inside the configuration. Without transformations or visual enhancements (e.g., transparency), it is impossible to visually explore the contact zones. Jin et al. [4] presented an open-book view where the interacting proteins are rotated to orient the contact zones towards the camera. The main shortcomings of this solution are the missing information about the interacting amino acids and the uniform coloring of the contact zones. An alternative approach presented by Lee and Varshney [5] computes and visualizes the intermolecular negative volume and the area of the docking site. This way, the users can observe the volume between the interacting proteins without the need to display the contact zones themselves. This can serve proteomic experts as an interactive tool for studying possible docking configurations, but it does not support their comparison. Similar approaches suggest the construction of an interface surface between the interacting proteins [6, 7]. The surface is visualized as a 3D mesh, encoding information about the core and peripheral regions of the interface. However, this method also does not support the comparison of multiple configurations.

Two-dimensional abstract representations are also commonly used for the visualization of contact zones, such as the schematic representation used by the PDBsum database [8] (Fig. 2). In the overview visualization, each of the interacting proteins is represented by a circle equipped with information about the number of amino acids forming the contact zones and the number of different types of interactions in-between (e.g., salt bridges, disulphide bonds, hydrogen bonds, or non-bonded contacts). The detailed visualization in PDBsum lists all the contact zone amino acids. The interactions are visualized by lines of different color and thickness, which represent the type and strength of the interactions, respectively. This approach gives a comprehensible overview of one configuration, but comparing it with another configuration is not possible.

Lex et al. [9] proposed a visual analysis tool for the exploration of large-scale heterogeneous genomics data for the characterization of cancer subtypes. They use multiple views of the complex data, one of which is a method for the comparison of different datasets. The abstract representation shows the similarities in the datasets by connecting corresponding blocks of data. The thickness of a connection denotes the degree of similarity. This representation serves well for comparison, but it lacks detailed information about the individual items.

In this paper, we present COZOID, a system composed of a set of methods for the visualization, comparison, and selection of numerous docking configurations. The combination of our proposed methods addresses the problems associated with the existing solutions and provides proteomic experts with an intuitive and user-friendly tool for the interactive exploration of PPIs. Our tool is integrated into the CAVER Analyst software [10], which allows for the analysis and visualization of biomolecules and therefore contains many relevant features, such as different molecular visualization modes and measurement tools. The input PPI configurations are provided by existing computational tools, and our solution is designed specifically for dealing with a large number of configurations.

Methods

COZOID overview

Our newly proposed system enables the efficient visual exploration of a large number of PPI complexes. For a better understanding, we introduce the following notation. A protein P consists of a set of amino acids forming a polypeptide chain. A complex C is represented by a set of mutually interacting proteins. In our case, we focus primarily on the interactions between two protein structures P₁ and P₂, which form a complex C(P₁,P₂). The mutual spatial orientation of the interacting proteins in the complex forms a configuration. The i-th configuration of complex C(P₁,P₂), denoted as CONF_i(C(P₁,P₂)), represents one of the possible mutual orientations of this complex. Generally, there can be n possible configurations CONF_i(C(P₁,P₂)), 1 ≤ i ≤ n, for a given complex, and the task is to select the configuration that is most relevant from a proteomics point of view. The decision is based on various pieces of knowledge about the geometric arrangement of the configuration as well as other aspects, such as knowledge of the contacts between the amino acids in its contact zone. Therefore, the selection of the most relevant configurations cannot be completed automatically and requires insights from the proteomic expert. This represents a typical domain-related problem, which has to be supported by specifically designed visualizations.

The visualization methods proposed in this paper allow the user to visually explore a set of possible configurations detected by one of the existing computational tools and to select the most proteomically relevant ones. The users have to iteratively filter out those configurations that do not fulfill the given specific criteria. The proteomic expert workflow, along with our proposed visual support of its individual stages, is depicted in Fig. 3. The input datasets, consisting of dozens of configurations between two interacting proteins, were computed using the HADDOCK [11] and PyDock [12] tools. However, any of the existing tools for protein-protein docking can serve as a source of input data for our system.

The proposed visualizations are based on the precondition that the users already have initial knowledge about the interacting proteins. Thus, the experts are able to define a pair of amino acids that are expected to interact. This is not restrictive, as computational tools also require this information to produce a meaningful set of configurations. In other words, we are using similar input information as the computational tools. The second possibility is that the users do not have this information but are aware of an already explored protein complex with a similar structure that can serve as a reference (primary) complex for further comparison and exploration. In this case, the computational tools usually produce even more configurations, but most of them are irrelevant and have to be filtered out. Our tool can utilize the information about the interactions in the primary complex and enhance the filtering process.

Our methods have been designed specifically to help proteomic experts answer the following questions:

Q1: Which configurations contain a selected interacting pair of amino acids (and what is the frequency of the occurrence of this pair in all configurations)?
Q2: Which pairs of amino acids are present in a given configuration?
Q3: How close are the amino acids in the contact zone and which are the closest ones?
Q4: How similar and different are the contact zones in the configurations?
Q5: What are the physico-chemical properties of the amino acids in the contact zone?
Q6: What are the differences between the sets of amino acids in the contact zones of different configurations?

Answering these questions helps the proteomic experts to better understand the interactions in the protein-protein complexes and to evaluate the correctness of the given configurations. The proposed visualizations enable the user to find the answers by interactively exploring the configurations, as also demonstrated in the supplementary video (see Additional file 1). In the following sections, we introduce our proposed views in detail.

Matrix view

When using a computational tool to generate possible configurations, the resulting set S = {CONF_i(C(P₁,P₂)) : 1 ≤ i ≤ n} can be very large, with n ranging from dozens to hundreds. This amount is impossible to explore manually; thus, some preliminary filtering is crucial. The filtering stage is designed to answer question Q1. We propose a matrix-based visualization inspired by commonly used heat maps (Fig. 4 a). The rows and columns in the Matrix view correspond to the interacting proteins P₁ and P₂, respectively. Each row or column represents one amino acid that is present in the contact zone of at least one of the configurations CONF_i(C(P₁,P₂)); amino acids that are never in contact are omitted. The contact between the amino acids is based on their Euclidean distance. Two amino acids are considered to be in contact if their distance is between 3 and 5 Å; the user can change this range interactively. The color of each cell in the matrix corresponds to the number of configurations in the set S in which the corresponding pair of amino acids interacts. The colored lists of amino acids can be interpreted as histograms encoding the number of these occurrences. Intense red marks the pairs of amino acids that interact in most of the configurations. The Matrix view serves directly for filtering out improbable solutions through the interactive, user-driven selection of cells, which is performed by clicking on individual cells. Moreover, the matrix allows the expert to select a combination of several pairs of amino acids. This is useful if the user wants to further explore only those configurations that contain specific interactions, such as an interaction between amino acids A and B and simultaneously between amino acids C and D.
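The counting and filtering behind the Matrix view can be illustrated as follows. This is a sketch under the assumption that the contact pairs of every configuration have already been computed (for example with a distance test as described above) and are stored as a set of (residue in P₁, residue in P₂) labels per configuration; `contact_matrix` and `filter_by_cells` are hypothetical names, not part of CAVER Analyst.

```python
from collections import Counter

# Assumed input: configuration name -> set of contact pairs (residue in P1, residue in P2).
Configs = dict[str, set[tuple[str, str]]]


def contact_matrix(configs: Configs):
    """Rows = P1 residues, columns = P2 residues, cell = number of configurations with that pair."""
    counts = Counter()
    for pairs in configs.values():
        counts.update(pairs)  # a set, so each pair is counted at most once per configuration
    # Only residues in contact in at least one configuration become rows/columns.
    rows = sorted({a for a, _ in counts})
    cols = sorted({b for _, b in counts})
    matrix = [[counts[(r, c)] for c in cols] for r in rows]  # Counter returns 0 for absent pairs
    return rows, cols, matrix


def filter_by_cells(configs: Configs, selected: list[tuple[str, str]]) -> list[str]:
    """Keep the configurations that contain ALL selected cells (e.g. pair A-B and pair C-D)."""
    wanted = set(selected)
    return [name for name, pairs in configs.items() if wanted <= pairs]
```

Because each configuration only increments counters, the matrix size depends on the number of contacting residues, not on the number of configurations.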

A major advantage of the Matrix view is that its size is independent of the number of input configurations. The number of rows and columns is limited by the size of the interacting proteins, meaning that in the worst case, it corresponds to the total number of amino acids in these proteins. However, in most cases, the number of amino acids in the contact zones is much smaller than the total number of amino acids. Each configuration of the input dataset only increases the counters in the respective matrix cells. In the case of many interacting amino acids, the cells in the matrix can become too small. In these situations, the users can employ the table lens technique introduced by Rao and Card [13], which can be applied to both rows and columns in the matrix (Fig. 4 a).

To provide the users with more detailed information about individual configurations, the Matrix view contains an additional side view, which is positioned directly next to the matrix (Fig. 4 b). The user can select a primary configuration to which all the remaining configurations are compared. An example of a primary configuration can be a crystal structure downloaded from the PDB database. We propose the following ranking score, which indicates the similarity between the contact zone of a given configuration and the primary configuration. One of the interacting proteins, e.g., P₁, is selected as a reference protein, while the second protein, e.g., P₂, is marked as the paired protein. The score is computed in the following way.

For each amino acid of the reference protein that appears in the contact zones of both the compared and the primary configuration, the similarity score is increased by one.
For each interaction pair that appears in the contact zones of both the compared and the primary configuration, the similarity score is increased by four.
For each interaction pair that is missing when the contact zones of the compared and the primary configuration are matched, the similarity score is decreased by one.

These weights were determined experimentally while designing and testing the view (see the Results and discussion section). The central part of the side view consists of a scrollable list of individual configurations from the subset of S that was filtered with the Matrix view. The configurations are ordered according to their similarity scores, from the most similar to the least similar ones. The primary configuration is always displayed first, at the top of the list.
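The three scoring rules and the resulting ordering can be written down directly. This is a sketch under stated assumptions: each configuration is a set of (reference residue, paired residue) contact pairs, and the third rule is read as penalizing every pair that occurs in only one of the two configurations (their symmetric difference). The text could also be read as penalizing only primary pairs that are absent from the compared configuration. Function names are hypothetical.

```python
Pairs = set[tuple[str, str]]  # (reference-protein residue, paired-protein residue)


def similarity_score(primary: Pairs, compared: Pairs) -> int:
    ref_primary = {r for r, _ in primary}
    ref_compared = {r for r, _ in compared}
    score = len(ref_primary & ref_compared)  # +1 per reference residue in both contact zones
    score += 4 * len(primary & compared)     # +4 per interaction pair found in both
    score -= len(primary ^ compared)         # -1 per unmatched pair (assumed: symmetric difference)
    return score


def rank_configurations(primary_name: str, configs: dict[str, Pairs]) -> list[str]:
    """Primary configuration first, then the others from most to least similar."""
    primary = configs[primary_name]
    others = [name for name in configs if name != primary_name]
    others.sort(key=lambda name: similarity_score(primary, configs[name]), reverse=True)
    return [primary_name] + others
```

For example, against a primary with pairs M23-L97, K25-E90, and R30-D88, a configuration containing M23-L97 and K25-E91 scores 2 + 4 - 3 = 3 under this reading.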

The side view helps to answer questions Q2 and Q3, as it enables an iterative search through the list of configurations and the exploration of all pairs of interacting amino acids for each configuration. The user can select a configuration to focus on by clicking on it. By default, each configuration in focus contains one polyline connecting the two amino acids from the contact zone that are the closest among all possible pairs (Fig. 4 b). The user can hover the mouse over the lists of amino acids on the left and right side and inspect the corresponding connection lines for a given amino acid. By clicking on the rectangle representing a given amino acid, the connection lines remain in the view. The pairs of amino acids that form the configuration in focus can be highlighted in the matrix (green-bordered rectangles in Fig. 4 a). From the color of the matrix cells, the user can immediately estimate the number of configurations in which these pairs are present. Conversely, selecting cells in the matrix automatically filters the side view to show only those configurations that satisfy the filtering condition.
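The default polyline and the hover behaviour of the side view reduce to two small queries over a configuration's contact list. The following is a minimal sketch, assuming the contacts are stored as (left residue, right residue, distance) tuples and that residue labels are unique across both proteins; the function names are hypothetical.

```python
Contact = tuple[str, str, float]  # (left residue, right residue, distance in angstroms)


def default_polyline(contacts: list[Contact]) -> tuple[str, str]:
    """The closest pair in the contact zone, drawn by default for the configuration in focus."""
    left, right, _ = min(contacts, key=lambda c: c[2])
    return left, right


def connections_for(residue: str, contacts: list[Contact]) -> list[tuple[str, str]]:
    """Connection lines shown when hovering over `residue` on either side of the list."""
    return [(a, b) for a, b, _ in contacts if residue in (a, b)]
```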

The Matrix view serves as the first filtration tool for selecting only those configurations that contain a desired combination of interacting amino acids. This filtering cannot be automated because the frequency of a given pair in configurations does not correlate with the importance of these configurations. The most frequent pair of interacting amino acids can be of the same interest as a pair interacting only in one configuration. Therefore, insights from the proteomic expert in combination with the interaction possibilities from the Matrix view have proven to be a very efficient and powerful solution. Selected configurations can be further processed by the following visualization methods.

Exploded view

Proteomics experts are already familiar with the manipulation of molecules in a three-dimensional (3D) environment; thus, a 3D representation has to be an integral part of the workflow. Moreover, the 3D space helps to find answers to questions Q3-Q5, which are related to the appearance of the contact zones of selected configurations and the properties of interacting amino acids (expressed by different coloring schemes). Exploring and comparing many structures in 3D at once suffers from high overlap, occlusion, and visual clutter (Fig. 5 b), so traditionally used spatial representations are not sufficient. To overcome these limitations, we adapted an exploded-view technique to enlarge the distance between the interacting proteins. Figure 5 c shows the comparison of three configurations using our proposed Exploded view.

The main principle of the Exploded view is the following. First, all the reference proteins taken from the configurations selected in the Matrix view are aligned using the Combinatorial Extension (CE) structural-alignment algorithm [14] so that their 3D spatial representations overlap (Fig. 5). Here, it is important to understand that the reference protein shown in Fig. 5 b (the brown one) actually represents three overlapping aligned reference proteins, each coming from one configuration. The set of paired proteins interacting with the reference proteins is positioned around the aligned reference proteins at an enlarged distance.

To ensure that the paired proteins in the Exploded view do not collide with each other, we arrange the paired proteins into a parabolic regular grid. For each reference protein and its paired protein, the Exploded view retains the information about their interaction. If several configurations are exploded at once, the Exploded view contains many paired proteins arranged around the aligned reference proteins. As the change in the position of the exploded proteins can cause disorientation in the scene, the pairing between the corresponding reference proteins (aligned) and paired proteins ("exploded") is initially indicated by a partially transparent tube that connects the centers of their contact zones. The radius of the tube is modulated (it is smaller in the middle of the tube to reduce visual clutter). Once the user has grasped the spatial arrangement of the proteins, the tube can be switched off. The pairing information is also encoded by color (a different color is used for each configuration). If the contact zones contain colliding amino acids (i.e., their mutual distance is less than 3 Å), these residues are colored red.
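The geometric ingredients of this paragraph (contact-zone centers as tube endpoints, moving a paired protein away from the reference, and flagging collisions below 3 Å) can be sketched as follows. This is a simplification under stated assumptions: the paired protein is pushed radially along the line joining the two contact-zone centers, which is not the parabolic regular grid used in the paper, and all function names are hypothetical.

```python
import math

Point = tuple[float, float, float]


def centroid(points: list[Point]) -> Point:
    """Center of a contact zone (mean of its atom coordinates), usable as a tube endpoint."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def explode_offset(ref_zone: list[Point], paired_zone: list[Point], extra: float) -> Point:
    """Translation moving the paired protein `extra` angstroms further from the reference
    along the line joining the two contact-zone centers.
    Simplification: the paper places paired proteins on a parabolic regular grid instead."""
    c_ref, c_pair = centroid(ref_zone), centroid(paired_zone)
    direction = [b - a for a, b in zip(c_ref, c_pair)]
    length = math.hypot(*direction) or 1.0  # avoid division by zero for coincident centers
    return tuple(extra * x / length for x in direction)


def clashing_residues(contacts: list[tuple[str, str, float]], clash: float = 3.0) -> set[str]:
    """Residues involved in a pair closer than `clash` angstroms; these would be colored red."""
    return {r for a, b, d in contacts if d < clash for r in (a, b)}
```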

Figure 5 depicts a set of three configurations before (a, b) and after (c) applying the Exploded view. The Exploded view removes the problem of overlapping paired proteins. It also helps to see the shape and position of the contact zones. However, this solution does not solve the problem where the contact zones face each other, meaning that the user has to adjust the camera to observe the contact zones of the reference and paired proteins from a perpendicular viewing direction. This manipulation does not enable the user to see both contact zones simultaneously. This problem is solved by the proposed Open-Book view, which is presented in the following section.

Open-Book view

The Exploded view does not allow one to observe both parts of a given contact zone simultaneously. The proposed Open-Book view is designed to specifically answer questions similar to Q5, which addresses a detailed exploration of one selected contact zone in the complex C(P₁,P₂). This involves the presentation of the information about different properties of individual amino acids forming the contact zone and their pairing.

The Open-Book view is activated if the user selects one of the configurations from the Exploded view. The selection is performed by clicking on the connection tube from the desired configuration CONF_i(C(P₁,P₂)) in the Exploded view. The other configurations are automatically hidden, the selected configuration returns to its initial position (before applying the Exploded view), and an animated transition for the opening of CONF_i(C(P₁,P₂)) is launched. When animating the opening, the reference and paired proteins are rotated and translated so that they are positioned next to each other and the contact zones are facing towards the observer (see Fig. 6).

The algorithm performing the opening computes the vectors defining the orientation of the contact zones (their normal vectors). From the normal vectors and the camera position, we compute the rotation angle, which is then applied to the reference and paired proteins. To maintain the information about the amino acid pairings, the user can also visualize individual connections between these pairs through simple lines.
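The opening step can be illustrated with standard vector math. This sketch rests on two assumptions not stated in the paper: the contact-zone normal is estimated as the direction from the protein's center to its contact-zone center, and the rotation turning that normal towards the camera is expressed as an axis and angle applied with Rodrigues' rotation formula. All names are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def normalize(v: Vec) -> Vec:
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def centroid(points: list[Vec]) -> Vec:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def zone_normal(protein: list[Vec], zone: list[Vec]) -> Vec:
    """Assumed estimate: direction from the protein center to its contact-zone center."""
    return normalize(sub(centroid(zone), centroid(protein)))


def rotation_to_camera(normal: Vec, to_camera: Vec) -> tuple[Vec, float]:
    """Axis and angle of the rotation that turns `normal` towards the camera."""
    n, c = normalize(normal), normalize(to_camera)
    angle = math.acos(max(-1.0, min(1.0, dot(n, c))))
    axis = cross(n, c)
    if dot(axis, axis) < 1e-18:  # normal already (anti)parallel: any perpendicular axis works
        axis = cross(n, (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0))
    return normalize(axis), angle


def rotate(p: Vec, axis: Vec, angle: float, pivot: Vec) -> Vec:
    """Rodrigues' rotation of point p about the unit `axis` passing through `pivot`."""
    v = sub(p, pivot)
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    kxv, kdv = cross(axis, v), dot(axis, v)
    return tuple(pivot[i] + v[i] * cos_t + kxv[i] * sin_t + axis[i] * kdv * (1 - cos_t)
                 for i in range(3))
```

In an open-book layout, this would be applied separately to the reference and the paired protein, each with its own normal and pivot (for instance its centroid), before translating the two side by side; the paper does not detail these steps.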

The contact zones, represented by their surfaces, can be color-coded according to multiple criteria. The color can encode the distance between the amino acids or represent different physico-chemical properties of the amino acids or their atoms, such as hydrophobicity or partial charges. The coloring scheme used in the Matrix view represents the so-called conservation of the amino acids in all configurations, and it can also be used to color the contact zone. The surfaces can be augmented with labels to inform the users about the type and identifier of individual amino acids.

In both the Exploded view and the Open-Book view, a protein can also be represented by other traditionally used visualization styles, such as cartoon, spheres, balls&sticks, sticks, etc. Moreover, these methods can be combined. For example, the proteins can be represented by the cartoon style and the amino acids in the contact zones can be visualized using the sticks representation to see their spatial orientation.

If the task is to compare individual configurations with respect to the pairs of interacting amino acids, a further drill-down is necessary. Therefore, in the next section, we propose another abstract view supporting mainly the comparison of paired amino acids in individual contact zones from selected configurations.

Contact-Zone list-view

The Contact-Zone list-view helps to answer questions related to the comparison of the contact zones at the level of the individual amino acids, such as Q6. The list for one configuration consists of two sets of amino acids in the contact zones, each set coming from one interacting protein (see Fig. 7). By default, the left part of the view contains all amino acids from the reference protein, while the right part is formed by their interaction counterparts in the paired protein. However, the order of proteins in the list-view can be changed. The order depends on the current task, i.e., on whether we want to compare the constitution of the contact zones of the reference or of the paired protein in the given configurations. The view contains all possible connections (with respect to the distance) between the amino acids from both contact zones. To avoid intersections of the lines representing the connections, some amino acids on the right side are repeated, with one instance for each reference protein amino acid within a user-defined distance. This solution was adopted because without these repetitions, there would be many line intersections, which would substantially decrease the readability of the representation (see Fig. 2 b).
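The repetition rule that keeps connection lines from crossing can be sketched as a row layout: every (reference residue, partner) connection gets its own row, so each line is horizontal. This is a minimal sketch, assuming contacts are (reference residue, paired residue, distance) tuples and that the order of the left column is given; `list_view_rows` is a hypothetical name.

```python
Contact = tuple[str, str, float]  # (reference residue, paired residue, distance in angstroms)


def list_view_rows(contacts: list[Contact], left_order: list[str],
                   max_distance: float = 5.0) -> list[tuple[str, str]]:
    """One row per (reference residue, partner within max_distance).

    A paired residue is repeated for every reference residue it touches, so each
    connection is a horizontal line in its own row and no two lines intersect.
    A reference residue spans as many consecutive rows as it has partners,
    listed from the closest partner to the farthest."""
    rows = []
    for ref in left_order:
        partners = sorted((d, b) for a, b, d in contacts if a == ref and d <= max_distance)
        rows.extend((ref, b) for _, b in partners)
    return rows
```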

For each configuration, one list-view is created, and all the list-views are juxtaposed so that the user can see and visually compare the constitution of the contact zones from all selected configurations. The user can modify this representation by changing the color of the rectangles representing the amino acids, which can encode different amino acid properties. The properties are the same as those mapped onto the surface of the contact zone in the Exploded and Open-Book views. The left part of the list can then be sorted according to these properties (see Fig. 8). Moreover, by clicking on individual rectangles representing the amino acids, the corresponding amino acids are selected in the 3D view as well.

The principal steps for building the Contact-Zone list-view are the following. For all configurations to be visualized in the Contact-Zone list-view, we find the interacting pairs of amino acids in their contact zones.

Then, the list of amino acids present in the reference proteins from the selected configurations is created. Now, for each configuration, we take the interacting amino acids from the paired proteins, sort them according to a selected criterion (e.g., hydrophobicity), and add them to the Contact-Zone list-view. The amino acids in the left part of the Contact-Zone list-view are always sorted in the same way for all depicted configurations. Similar to the Matrix view, the user can select a primary configuration to which all the remaining configurations are compared (see Fig. 7 b) using the proposed ranking score described in the "Matrix view" section. The Contact-Zone list-view plots the configurations ordered from left to right by their similarity score, from the most similar to the least similar. The list of the primary configuration is always displayed first, on the left side of the view.
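The construction steps above can be sketched with hydrophobicity as the sorting criterion. This sketch assumes the Kyte-Doolittle hydropathy scale (the paper does not state which hydrophobicity scale is used), residue labels of the form three-letter code plus number (e.g. "MET23"), and that the shared left column is the union of the reference residues of all selected configurations. Ordering the columns by similarity would reuse a ranking function such as the scoring sketch given for the Matrix view. Names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale, used here only as an example sorting criterion
# (assumption: the paper does not specify which hydrophobicity scale COZOID uses).
KYTE_DOOLITTLE = {
    "ILE": 4.5, "VAL": 4.2, "LEU": 3.8, "PHE": 2.8, "CYS": 2.5,
    "MET": 1.9, "ALA": 1.8, "GLY": -0.4, "THR": -0.7, "SER": -0.8,
    "TRP": -0.9, "TYR": -1.3, "PRO": -1.6, "HIS": -3.2, "GLU": -3.5,
    "GLN": -3.5, "ASP": -3.5, "ASN": -3.5, "LYS": -3.9, "ARG": -4.5,
}

Pairs = set[tuple[str, str]]  # (reference residue, paired residue), labels like "MET23"


def hydrophobicity_key(label: str) -> tuple[float, str]:
    """Most hydrophobic first; ties broken by label. Assumes 3-letter code + number labels."""
    return (-KYTE_DOOLITTLE[label[:3]], label)


def build_list_views(configs: dict[str, Pairs], key=hydrophobicity_key):
    """Shared left column (reference residues of all configurations, sorted once)
    plus one sorted right column of paired residues per configuration."""
    left = sorted({ref for pairs in configs.values() for ref, _ in pairs}, key=key)
    right = {name: sorted({paired for _, paired in pairs}, key=key)
             for name, pairs in configs.items()}
    return left, right
```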

The user can select between two visualization modes: the compare and the compact list-view. In compare mode, the amino acids in the contact zone of the primary configuration that are not present in the contact zone of a compared configuration are depicted as white rectangles with labels giving the names of the missing amino acids (see Fig. 7 b). The compact mode omits these missing amino acids to save space. In both modes, the amino acids matching those in the primary configuration are highlighted with red-bordered rectangles and connecting lines. This way, the user can immediately see which amino acids are present in both the primary and the other configurations and which amino acids are missing. To guide the visual comparison, we also introduced interactive highlighting and, if necessary, zooming to corresponding amino acids in different configurations.
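The difference between the two modes can be expressed as a per-column status computation. This is a minimal sketch, assuming each list is given as an ordered list of residue labels and that the comparison is made between the primary and one configuration at a time; for simplicity, the white placeholders are appended after the configuration's own residues rather than placed at their sorted positions. `list_view_column` is a hypothetical name.

```python
def list_view_column(primary: list[str], config: list[str], mode: str = "compare"):
    """(label, status) entries for one configuration's list, compared to the primary.

    status: "match"   -> also in the primary contact zone (red border)
            "other"   -> only in this configuration
            "missing" -> only in the primary; white placeholder, shown in compare mode only
    Simplification: placeholders are appended after the configuration's own residues."""
    in_primary, in_config = set(primary), set(config)
    entries = [(r, "match" if r in in_primary else "other") for r in config]
    if mode == "compare":
        entries += [(r, "missing") for r in primary if r not in in_config]
    elif mode != "compact":
        raise ValueError(f"unknown mode: {mode!r}")
    return entries
```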

Results and discussion

To demonstrate the usability of our proposed techniques, we selected three representative basic types of PPI patterns present in SMC complexes [15]. SMC (Structural Maintenance of Chromosomes) complexes are key players in chromatin organization, where they ensure the stability and dynamics of chromosomes. The way the subunits of these complexes interact with each other is key for their functions [16]. A visual representation of such information is highly beneficial, as it helps to reveal the spatial relationships between the subunits in an intuitive way. The three basic PPI types are coiled-coil, pocket-string, and surface-surface interactions [17]. In the following subsections, we demonstrate the usefulness of our proposed visualizations on these three types of interactions.

Surface-surface interaction

The surface-surface interaction, the most frequent interaction type, was tested on the NSE1 and NSE3 proteins in the SMC5/6 complex. This interaction was analysed because it represents a dimer of kite proteins, which are critical for the function of eukaryotic SMC5/6 and bacterial SMC complexes [15, 18, 19].

The crystal structure of the human NSE1-NSE3 dimer has been examined in detail, and the resulting configuration is published in the PDBsum database under the PDB identifier 3NW0. Therefore, it can serve as a primary testing complex for both the computational tools and our proposed visualizations. To restrict the set of possible docking configurations, we used the web version of the HADDOCK tool and selected a pair of interacting amino acids, i.e., methionine with ID 23 from the reference protein and leucine with ID 97 from the paired protein (Fig. 2 b). This selection was based on experimental data from previous works [19–22]. The HADDOCK analysis resulted in 40 possible configurations. HADDOCK groups the configurations into clusters according to their similarity, which is defined internally by the HADDOCK score. In our case, this led to 10 clusters, each containing 4 configurations.

The computed configurations were loaded into our COZOID visualization system, which interactively links all the proposed visualizations. The Matrix view was computed first; for every pair of amino acids, it records in how many of the 40 configurations the pair lies within the interaction distance, here set to less than 4 Å. In our case, leucine 97 and methionine 23 were within this distance in only three of the 40 configurations (Fig. 4). A simple interaction with the Matrix view selected these three configurations immediately, and the other 37 configurations were automatically hidden in the remaining views.
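The counting behind the Matrix view can be sketched as follows. This is a minimal illustration, not COZOID's implementation: it assumes each configuration is given as two lists of residues with atom coordinates, and that two residues are in contact when their closest atoms are less than the cutoff apart. The data model and the helper names (`contacts`, `contact_frequencies`, `select_by_pair`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical data model: a residue is (label, [(x, y, z), ...]) and a
# configuration is (reference_residues, paired_residues).

def min_distance(atoms_a, atoms_b):
    """Smallest atom-atom distance between two residues, in Å."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)

def contacts(reference, paired, cutoff=4.0):
    """Set of (reference label, paired label) pairs closer than `cutoff`."""
    return {
        (label_a, label_b)
        for label_a, atoms_a in reference
        for label_b, atoms_b in paired
        if min_distance(atoms_a, atoms_b) < cutoff
    }

def contact_frequencies(configurations, cutoff=4.0):
    """Matrix-view counts: in how many configurations each residue pair is in contact."""
    freq = Counter()
    for reference, paired in configurations:
        freq.update(contacts(reference, paired, cutoff))
    return freq

def select_by_pair(configurations, pair, cutoff=4.0):
    """Indices of the configurations in which `pair` is in contact."""
    return [i for i, (reference, paired) in enumerate(configurations)
            if pair in contacts(reference, paired, cutoff)]

# Toy usage: one residue per protein; the pair is 3 Å apart in the first
# configuration and 6 Å apart in the second.
conf_a = ([("MET23", [(0.0, 0.0, 0.0)])], [("LEU97", [(3.0, 0.0, 0.0)])])
conf_b = ([("MET23", [(0.0, 0.0, 0.0)])], [("LEU97", [(6.0, 0.0, 0.0)])])
print(contact_frequencies([conf_a, conf_b])[("MET23", "LEU97")])  # 1
print(select_by_pair([conf_a, conf_b], ("MET23", "LEU97")))       # [0]
```

In a heat map, the rows and columns would correspond to reference and paired residues, and each cell would show the count returned by `contact_frequencies`.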

In the next step, we switched to the Contact-Zone list-view and compared the list of amino acids from the 3NW0 crystal structure with the lists of the three selected configurations. Figure 9 shows this comparison. The displayed portion of the Contact-Zone list-view clearly shows the similarities and differences, at the level of individual amino acids, between the 3NW0 crystal (leftmost list) and the three selected HADDOCK configurations. Pairs of interacting amino acids identical to those in the 3NW0 crystal structure are highlighted (red lines in Fig. 9). The left-to-right order of the modelled configurations in Fig. 9 reflects their similarity to the crystal structure, measured by the number of identical amino acid pairs; the best model is placed next to the crystal.
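The ranking criterion described above, the number of interacting pairs identical to the crystal, can be expressed with plain set operations. The sketch assumes that the contacts of each model and of the crystal are already available as sets of residue-label pairs; the function names and the toy labels are hypothetical.

```python
def shared_pairs(model_contacts, crystal_contacts):
    """Residue pairs of a model that are identical to pairs in the crystal."""
    return model_contacts & crystal_contacts

def rank_models(models, crystal_contacts):
    """Order (name, contact_set) tuples by the number of pairs shared with the
    crystal, best first. sorted() is stable, so ties keep their input order."""
    return sorted(models,
                  key=lambda item: len(shared_pairs(item[1], crystal_contacts)),
                  reverse=True)

# Toy contact sets with placeholder residue labels.
crystal = {("r1", "p1"), ("r2", "p2"), ("r3", "p3")}
models = [
    ("model_1", {("r1", "p1")}),
    ("model_2", {("r1", "p1"), ("r2", "p2"), ("r9", "p9")}),
    ("model_3", set()),
]
for name, pairs in rank_models(models, crystal):
    print(name, len(shared_pairs(pairs, crystal)))
```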

Finally, the 3NW0 crystal and the three selected configurations were examined in the 3D representations to explore the composition, mutual distances, and properties of the contact zones in detail. In 3NW0, NSE1, the first interacting protein, was selected as the reference protein, and all three configurations were aligned with respect to the paired proteins. The paired proteins were positioned around the reference one. Figure 5 a shows the three selected configurations visualized using a commonly available method: the proteins are represented as surfaces and the contact zones are highlighted in different colors. However, the most interesting parts, i.e., the contact zones, are hidden (Fig. 5 b).
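Superimposing several configurations in a common frame is a rigid-body fitting problem. COZOID's own alignment procedure is not described in this section; as a stand-in, the sketch below uses the Kabsch (SVD-based) least-squares superposition, assuming the corresponding atoms (for example, C-alpha atoms of identical residues) are already paired up in the same order. It relies on NumPy, and the function names are hypothetical.

```python
import numpy as np

def kabsch(mobile, target):
    """Rotation r and translation t that best map `mobile` onto `target`
    (least squares), so that mobile @ r.T + t approximates target.

    Both inputs are (N, 3) arrays of corresponding atoms.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    # 3x3 covariance of the centred coordinates.
    h = (mobile - mobile_center).T @ (target - target_center)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that r is a proper rotation (no reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = target_center - r @ mobile_center
    return r, t

def superpose(atoms, r, t):
    """Apply the rigid transform to every atom of a configuration."""
    return np.asarray(atoms, dtype=float) @ r.T + t

def rmsd(a, b):
    """Root-mean-square deviation between two equally sized coordinate sets."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

# Toy check: a point set rotated 90 degrees about z and shifted.
points = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]], dtype=float)
rot_z = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
moved = points @ rot_z.T + np.array([1.0, 2.0, 3.0])
r, t = kabsch(points, moved)
print(round(rmsd(superpose(points, r, t), moved), 6))
```

The transform is computed from one chosen subset of corresponding atoms and then applied to the whole configuration with `superpose`.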

Our Exploded view overcomes this limitation, making the individual contact zones of all the paired proteins clearly visible (Fig. 5 c). Moreover, if the camera is pointed towards the aligned reference proteins, the differences between the positions of the contact zones on the reference proteins can be observed as well. The Exploded view thus revealed the mutual positioning of the individual configurations with respect to the positions of their contact zones.
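One simple way to obtain an exploded layout is to push each paired protein away from the reference protein along the line joining their centres. The sketch below assumes that rule and a single user-chosen displacement distance; it illustrates the idea and is not necessarily how COZOID computes its Exploded view. The function names are hypothetical.

```python
import math

def centroid(points):
    """Mean position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def explode_offsets(reference_center, paired_centers, distance):
    """Translation for each paired protein that moves it `distance` Å further
    from the reference centre along the reference-to-partner direction."""
    offsets = []
    for center in paired_centers:
        v = [center[i] - reference_center[i] for i in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            # Partner centred on the reference: no direction to push along.
            offsets.append((0.0, 0.0, 0.0))
        else:
            offsets.append(tuple(distance * x / norm for x in v))
    return offsets

def apply_offset(atoms, offset):
    """Translate all atoms of one paired protein."""
    return [tuple(a[i] + offset[i] for i in range(3)) for a in atoms]

reference_atoms = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
partner_centers = [(10.0, 0.0, 0.0), (0.0, 3.0, 4.0)]
print(explode_offsets(centroid(reference_atoms), partner_centers, 5.0))
```

Scaling `distance` interactively would let the user pull the partners apart until their contact zones face the camera.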

Using our tool, the investigation can go even deeper, to the level where individual contact zones are explored in detail using the Open-Book view. By animating the opening of the protein complex, we were able to look inside the contact zone. The Open-Book view enhancements, i.e., labelling the surface of the contact zones with the names of the corresponding amino acids and coloring it according to different properties, were highly beneficial for exploring the physico-chemical and geometric properties of the individual amino acids.
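The opening animation can be thought of as rotating one protein of the pair about a hinge axis near the contact zone while the other protein stays fixed. The sketch below assumes such a hinge is given as input (a point and a direction, e.g. an axis lying in the contact-zone plane) and uses Rodrigues' rotation formula; the hinge choice, the 90-degree default opening angle, and the function names are assumptions for illustration.

```python
import math

def rotate_about_axis(point, axis_point, axis_dir, angle):
    """Rodrigues rotation of `point` by `angle` radians about the line through
    `axis_point` with direction `axis_dir`."""
    n = math.sqrt(sum(c * c for c in axis_dir))
    k = tuple(c / n for c in axis_dir)  # unit axis
    v = tuple(point[i] - axis_point[i] for i in range(3))
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    k_cross_v = (k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0])
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    rotated = tuple(v[i] * cos_a + k_cross_v[i] * sin_a + k[i] * k_dot_v * (1 - cos_a)
                    for i in range(3))
    return tuple(rotated[i] + axis_point[i] for i in range(3))

def open_book_frame(atoms, axis_point, axis_dir, progress, max_angle=math.pi / 2):
    """Positions of the moving protein's atoms at animation progress in [0, 1]."""
    angle = progress * max_angle
    return [rotate_about_axis(a, axis_point, axis_dir, angle) for a in atoms]

# Toy usage: three frames of opening a single atom about the z axis.
for progress in (0.0, 0.5, 1.0):
    print(open_book_frame([(1.0, 0.0, 0.0)], (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), progress))
```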

Coiled-coil interaction

For the second type of interaction, we picked the SMC3 coiled-coil arm from the SMC complex [16]. The interaction site is formed by two helical fragments of the SMC3 protein. The crystal structure is published under the PDB identifier 4UX3 [23].

For this structure, we tested the results of both the HADDOCK and the PyDock tools. The HADDOCK run produced 40 output configurations. In the Matrix view, we set the interaction distance threshold to between 3 and 5 Å and selected methionine 186 and isoleucine 1030 as the initial pair of interacting amino acids (Fig. 10). The same two amino acids had also served as the input restraints for the HADDOCK computation, so they were used to select the configurations satisfying these restraints in the Matrix view (Fig. 10).

Next, the selected configurations were structurally aligned to the 4UX3 crystal structure in 3D space. We then selected the first amino acid (A172) of the respective helices and visually compared their positions in the 3D view. In this case, other views were not even needed to see that the preselected HADDOCK configurations exhibited a wrong orientation of the aligned helices: in all the output models, A172 was located on the opposite side compared with the 4UX3 crystal (see Fig. 11). The 3D view of COZOID revealed this misorientation intuitively and quickly, without inspecting the HADDOCK configurations one by one.
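The visual orientation check with A172 can also be phrased geometrically: after superposition, a helix that is rotated the wrong way places the marker residue on the opposite side of the helix axis. The sketch below assumes the models are already superposed onto the crystal and that the axis is given by two points (for example, the ends of the reference helix); it illustrates the reasoning and is not a COZOID feature. The names are hypothetical.

```python
def perpendicular_offset(point, axis_start, axis_end):
    """Component of (point - axis_start) perpendicular to the axis line."""
    d = [axis_end[i] - axis_start[i] for i in range(3)]
    v = [point[i] - axis_start[i] for i in range(3)]
    s = sum(v[i] * d[i] for i in range(3)) / sum(x * x for x in d)
    return [v[i] - s * d[i] for i in range(3)]

def same_side(model_marker, crystal_marker, axis_start, axis_end):
    """True if the marker residue (e.g. the A172 C-alpha) points to the same
    side of the axis in the superposed model as in the crystal."""
    a = perpendicular_offset(model_marker, axis_start, axis_end)
    b = perpendicular_offset(crystal_marker, axis_start, axis_end)
    return sum(a[i] * b[i] for i in range(3)) > 0.0

axis = ((0.0, 0.0, 0.0), (0.0, 0.0, 10.0))
crystal_a172 = (1.0, 0.0, 5.0)
print(same_side((-1.0, 0.0, 3.0), crystal_a172, *axis))  # False: flipped helix
print(same_side((2.0, 0.5, 8.0), crystal_a172, *axis))   # True
```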

For PyDock, 28 of the 100 output models were selected in the Matrix view by filtering on the M186-I1030 interaction pair. Visual selection based on the position of A172 then yielded 14 models in the correct orientation (see Fig. 12).

In the final step, we compared the Contact-Zone lists of the selected models with that of the original crystal structure (4UX3). Figure 13 shows the similarities (highlighted in red) between the crystal and the best of the selected models, which fits the crystal structure very well. The Exploded view comparison of the contact zones of the crystal structure and this model is shown in Fig. 14.

Pocket-string interaction

For the pocket-string interaction type, we selected an interaction present in the crystal structure of the MukE-MukF complex (PDB identifier 3EUH) [24]. The pocket is formed by the winged helix domain of the MukE protein, and one of the MukF helical fragments sits inside this pocket (Fig. 15 a). This time, we selected valine 200 and arginine 300 as the pair of amino acids for the docking restraints. These two amino acids form the closest contact in the structure, as can be seen both from the Contact-Zone list ordered by the distance of the interacting amino acids (see Fig. 16) and from the Open-Book view of the crystal structure (Fig. 15 b).

The docking models were again generated with both the HADDOCK and PyDock docking tools. The HADDOCK run produced 32 output configurations, which were first filtered in the Matrix view using the initial V200-R300 amino acid pair. This first selection step removed only 8 models, leaving 24 models for further analysis. We then repeated the Matrix view filtering with the second tightest amino acid contact in the crystal, tyrosine 110 and arginine 302 (Fig. 15 b), which left 6 docking models. The Contact-Zone lists of these models were compared with that of the original crystal structure (3EUH), resulting in an ordered list of the best models (Fig. 16). Visual exploration confirmed that the first model in the Contact-Zone list fits the original structure best (Fig. 17).
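The successive Matrix-view selections above amount to repeatedly keeping only the models that contain the next residue pair. The sketch assumes per-model contact sets have already been computed at a fixed interaction distance; the function name is hypothetical and the toy data are made up for illustration, not taken from the study's models.

```python
def iterative_filter(model_contacts, pairs):
    """Apply one residue-pair filter after another.

    model_contacts: dict mapping model name -> set of contacting residue pairs.
    Returns the surviving models and the number of survivors after each step.
    """
    remaining = dict(model_contacts)
    survivors_per_step = []
    for pair in pairs:
        remaining = {name: c for name, c in remaining.items() if pair in c}
        survivors_per_step.append(len(remaining))
    return remaining, survivors_per_step

toy_models = {
    "m1": {("V200", "R300"), ("Y110", "R302")},
    "m2": {("V200", "R300")},
    "m3": {("Y110", "R302")},
    "m4": set(),
}
kept, counts = iterative_filter(toy_models, [("V200", "R300"), ("Y110", "R302")])
print(sorted(kept), counts)  # ['m1'] [2, 1]
```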

PyDock docking provided 100 models, which were analysed in the same way as the HADDOCK models. The two Matrix view selection steps, filtering first with the initial amino acid pair and then with the second pair, left 32 and 19 models, respectively. The Contact-Zone lists of these models were then compared with that of the original crystal structure, and the models that, according to the Contact-Zone list, most closely matched the crystal were explored in 3D using the Exploded view and the Open-Book view. This step revealed that the five best models in the list are very close to the original crystal, although none of them fits the crystal structure precisely.

Here, we took advantage of our testing setup (which uses the tightest contacts between the interacting amino acids) and altered the interaction distance parameter of the Matrix view used in the selection procedure. All PyDock models were re-evaluated with the distance parameter set to 4 Å instead of the previous default setting of 5 Å. As expected, fewer configurations contained the V200-R300 and Y110-R302 amino acid pairs within the 4 Å distance: the two Matrix view selection steps left 21 and 13 models, respectively. However, the altered distance parameter also changed the ranking of the configurations in the Contact-Zone lists. Figure 18 compares the Contact-Zone lists of the 3EUH crystal structure computed with the 5 Å and 4 Å settings: the smaller distance parameter removed several amino acid pairs with distances greater than 4 Å from the crystal structure's Contact-Zone list. These pairs were not considered in the new Contact-Zone list ranking, from which the five models most similar to the crystal were again selected (Fig. 19 a). Four of these five models were among the five best models found with the previous settings; however, a new model with a closer match was also identified (Fig. 19 b).
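Why a smaller interaction distance can reorder the Contact-Zone ranking is easiest to see in a small example: pairs whose distance lies between the two cutoffs drop out of both the crystal list and the model lists, so a model that matched mainly through such looser contacts loses its advantage. The sketch assumes per-pair contact distances are precomputed and scores models by the number of pairs shared with the crystal; the distances are invented toy values and the function names are hypothetical.

```python
def contacts_below(pair_distances, cutoff):
    """Residue pairs whose contact distance (Å) is below the cutoff."""
    return {pair for pair, d in pair_distances.items() if d < cutoff}

def rank_at_cutoff(crystal_distances, model_distances, cutoff):
    """Model names ordered by the number of contacts shared with the crystal."""
    reference = contacts_below(crystal_distances, cutoff)
    scores = {name: len(contacts_below(dists, cutoff) & reference)
              for name, dists in model_distances.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy distances (Å) for four hypothetical residue pairs p1..p4.
crystal_distances = {"p1": 3.0, "p2": 3.5, "p3": 4.5, "p4": 4.8}
model_distances = {
    "A": {"p1": 4.9, "p3": 4.2, "p4": 4.6},  # matches mostly via loose contacts
    "B": {"p1": 3.2, "p2": 3.8},             # matches the tight contacts
}
print(rank_at_cutoff(crystal_distances, model_distances, 5.0))  # ['A', 'B']
print(rank_at_cutoff(crystal_distances, model_distances, 4.0))  # ['B', 'A']
```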

This test indicates that our tool is robust to different parameter settings and has potential for experimental use in proteomics. Our tool can also be used to select an alternative input pair of interacting amino acids, which then serves as the input for the computational tools. Such amino acids might be selected based on the COZOID analysis of the 3NW0 crystal (using the Matrix view or Exploded view) by searching for the most central and closest amino acids.

Altogether, COZOID helped us quickly select the best docking configuration using several visualization approaches. First, the Matrix view allowed us to pick the models containing a particular pair of interacting amino acids. Next, with the Contact-Zone list, we sorted these models by the similarity of their contact zones to that of the original crystal structure. Finally, the best model was determined and confirmed using the 3D Exploded view. While the Exploded view is already available in some current 3D visualization tools, the strength of its combination with our other proposed approaches lies in the speed, user-friendly design, and highly interactive selection mechanism. Additionally, a similar workflow can be applied to select docking models of homologous proteins whose structures are not available in the PDB database but which are often studied when different model organisms are employed in proteomic studies.

As another example, our Contact-Zone list can support the experimental design of mutants in which key contact residues are replaced. Proteomic experts can use it to select amino acids in the contact zones that could be mutated, i.e., replaced by other amino acids. The ultimate goal of such mutations could be to strengthen the interactions in the contact zone or to abolish the interaction between the involved proteins completely.

Conclusions

In this paper, we have presented COZOID, a new tool for the visual exploration of configurations of two interacting proteins. It introduces a set of visualization methods for exploring large sets of configurations produced by existing computational tools and for evaluating their proteomic relevance. Our proposed methods were designed to follow and support the workflow of proteomic experts. We described the design rationale and the principles of these methods, as well as their linking and interaction possibilities. We tested these methods on real datasets of SMC complex subunits and demonstrated their usability in three studies covering the most common interaction types. Our aim was to overcome the drawbacks of existing methods for the visual analysis and comparison of configurations, which offer only a traditional 3D view and one-by-one exploration of individual configurations. Moreover, specialized techniques for exploring the content of the contact zone are missing entirely. Our proposed solution therefore provides proteomic experts with information that is very hard or even impossible to obtain using previously available methods. The system enables iterative filtering of configurations that do not satisfy given criteria at the individual stages of the workflow. The executable binary, along with the example dataset and user guide, is available in the supplementary material of the manuscript (Additional files 2, 3, and 4).

In the future, we plan to focus on the extension of our proposed techniques in cases where the user has no a priori knowledge about the protein complex, but can still feed in experimental data from mutagenesis or crosslink analysis.

Abbreviations

PPI: Protein-protein interaction
COZOID: Contact zone identifier
SMC: Structural maintenance of chromosomes
PDB: Protein data bank

References

1. Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B, Edelmann A, Heurtier MA, Hoffman V, Hoefert C, Klein K, Hudak M, Michon AM, Schelder M, Schirle M, Remor M, Rudi T, Hooper S, Bauer A, Bouwmeester T, Casari G, Drewes G, Neubauer G, Rick JM, Kuster B, Bork P, Russell RB, Superti-Furga G. Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006; 440(7084):631–6.
2. Huang SY. Search strategies and evaluation in protein-protein docking: principles, advances and challenges. Drug Discov Today. 2014; 19(8):1081–96.
3. Malhotra S, Mathew OK, Sowdhamini R. DOCKSCORE: a webserver for ranking protein-protein docked poses. BMC Bioinformatics. 2015; 16(1):1–6.
4. Jin L, Wang W, Fang G. Targeting protein-protein interaction by small molecules. Annu Rev Pharmacol Toxicol. 2014; 54(1):435–56.
5. Lee CH, Varshney A. Computing and displaying intermolecular negative volume for docking. In: Scientific Visualization: The Visual Extraction of Knowledge from Data. Berlin: Springer: 2005. p. 49–64.
6. Varshney A, Brooks Jr FP, Manocha D, Wright WV, Richardson DC. Defining, computing, and visualizing molecular interfaces. In: Proceedings of the 6th Conference on Visualization’95. Atlanta: IEEE Computer Society: 1995. p. 36–43.
7. Ban YEA, Edelsbrunner H, Rudolph J. Interface surfaces for protein-protein complexes. J ACM (JACM). 2006; 53(3):361–78.
8. Laskowski RA, Hutchinson GE, Michie AD, Wallace AC, Jones ML, Thornton JM. PDBsum: a web-based database of summaries and analyses of all PDB structures. Trends Biochem Sci. 1997; 22(12):488–90.
9. Lex A, Streit M, Schulz H, Partl C, Schmalstieg D, Park PJ, Gehlenborg N. StratomeX: visual analysis of Large-Scale heterogeneous genomics data for cancer subtype characterization. Comput Graph Forum (EuroVis ’12). 2012; 31(3):1175–84.
10. Kozlikova B, Sebestova E, Sustr V, Brezovsky J, Strnad O, Daniel L, Bednar D, Pavelka A, Manak M, Bezdeka M, et al. Caver analyst 1.0: graphic tool for interactive visualization and analysis of tunnels and channels in protein structures. Bioinformatics. 2014; 30(18):2684–5.
11. Dominguez C, Boelens R, Bonvin AMJJ. HADDOCK: A protein-protein docking approach based on biochemical or biophysical information. J Am Chem Soc. 2003; 125(7):1731–7.
12. Jimenez-Garcia B, Pons C, Fernandez-Recio J. pyDockWEB: a web server for rigid-body protein-protein docking using electrostatics and desolvation scoring. Bioinformatics. 2013; 29(13):1698–9.
13. Rao R, Card SK. The table lens: Merging graphical and symbolic representations in an interactive focus + context visualization for tabular information. In: Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI ’94). Boston: ACM: 1994. p. 318–22.
14. Shindyalov IN, Bourne PE. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 1998; 11(9):739–47.
15. Palecek JJ, Gruber S. Kite Proteins: a Superfamily of SMC/Kleisin Partners Conserved Across Bacteria, Archaea, and Eukaryotes. Structure. 2015; 23(12):2183–90.
16. Gligoris T, Lowe J. Structural Insights into Ring Formation of Cohesin and Related Smc Complexes. Trends Cell Biol. 2016; 26(9):680–93.
17. Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P. Molecular Biology of the Cell. 4th edn. New York: Garland Science; 2002.
18. Zabrady K, Adamus M, Vondrova L, Liao C, Skoupilova H, Novakova M, Jurcisinova L, Alt A, Oliver AW, Lehmann AR, Palecek JJ. Chromatin association of the SMC5/6 complex is dependent on binding of its NSE3 subunit to DNA. Nucleic Acids Res. 2016; 44(3):1064–79.
19. Doyle JM, Gao J, Wang J, Yang M, Potts PR. MAGE-RING protein complexes comprise a family of E3 ubiquitin ligases. Mol Cell. 2010; 39(6):963–74.
20. Hudson JJ, Bednarova K, Kozakova L, Liao C, Guerineau M, Colnaghi R, Vidot S, Marek J, Bathula SR, Lehmann AR, Palecek J. Interactions between the Nse3 and Nse4 components of the SMC5-6 complex identify evolutionarily conserved interactions between MAGE and EID Families. PLoS ONE. 2011; 6(2):17270.
21. Kozakova L, Vondrova L, Stejskal K, Charalabous P, Kolesar P, Lehmann AR, Uldrijan S, Sanderson CM, Zdrahal Z, Palecek JJ. The melanoma-associated antigen 1 (MAGEA1) protein stimulates the E3 ubiquitin-ligase activity of TRIM31 within a TRIM31-MAGEA1-NSE4 complex. Cell Cycle. 2015; 14(6):920–30.
22. van der Crabben SN, Hennus MP, McGregor GA, Ritter DI, Nagamani SCS, Wells OS, Harakalová M, Chinn IK, Alt A, Vondrová L, Hochstenbach R, van Montfrans JM, Terheggen-Lagro SW, van Lieshout S, van Roosmalen MJ, Renkens I, Duran K, Nijman IJ, Kloosterman WP, Hennekam E, Orange JS, van Hasselt PM, Wheeler DA, Palecek JJ, Lehmann AR, Oliver AW, Pearl LH, Plon SE, Murray JM, van Haaften G. Destabilized SMC5/6 complex leads to chromosome breakage syndrome with severe lung disease. J Clin Investig. 2016; 126(8):2881–92.
23. Gligoris TG, Scheinost JC, Burmann F, Petela N, Chan KL, Uluocak P, Beckouet F, Gruber S, Nasmyth K, Lowe J. Closing the cohesin ring: structure and function of its Smc3-kleisin interface. Science. 2014; 346(6212):963–7.
24. Woo JS, Lim JH, Shin HC, Suh MK, Ku B, Lee KH, Joo K, Robinson H, Lee J, Park SY, Ha NC, Oh BH. Structural studies of a bacterial condensin complex reveal ATP-dependent disruption of intersubunit interactions. Cell. 2009; 136(1):85–96.

Acknowledgements

We acknowledge the members of the Paleček lab for their participation in the COZOID testing.

Funding

This work was supported through grants from the Vienna Science and Technology Fund (WWTF) through project VRG11-010, the PhysioIllustration research project 218023 funded by the Norwegian Research Council, the Ministry of Education, Youth and Sports of the Czech Republic project CEITEC 2020 (LQ1601), and an Internal Masaryk University grant (MU/0822/2015). This funding enabled the authors to design, implement, and evaluate the proposed system and to write the manuscript.

Availability of data and materials

Data and materials are available here: http://decibel.fi.muni.cz/cozoid/.

Author information

Authors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Katarína Furmanová & Barbora Kozlíková
Department of Informatics, University of Bergen, Bergen, Norway
Jan Byška
Institute of Visual Computing & Human-Centered Technology, TU Wien, Wien, Austria
Eduard M. Gröller & Ivan Viola
National Centre for Biomolecular Research, Masaryk University, Brno, Czech Republic
Jan J. Paleček
Central European Institute of Technology, Masaryk University, Brno, Czech Republic
Jan J. Paleček

Contributions

KF participated in the design of the tool, its implementation, and paper editing. JB contributed to the implementation. IV and EMG contributed to the design of the visualizations and interactions of the tool. JJP contributed to the design, was responsible for testing and evaluating the tool, and edited the paper. BK contributed to the design and paper writing and coordinated the team and its activities. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Barbora Kozlíková.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1

Supplementary video. Video showcasing the software tool in action. (MP4 17,613 kb)

Additional file 2

Software build. Executable binary file of the software tool. (ZIP 118,784 kb)

Additional file 3

Example data. Testing dataset used in the manuscript. (ZIP 2662 kb)

Additional file 4

User guide. User guide for the software tool. (PDF 3502 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Furmanová, K., Byška, J., Gröller, E.M. et al. COZOID: contact zone identifier for visual analysis of protein-protein interactions. BMC Bioinformatics 19, 125 (2018). https://doi.org/10.1186/s12859-018-2113-6

Received: 30 November 2017
Accepted: 12 March 2018
Published: 06 April 2018
DOI: https://doi.org/10.1186/s12859-018-2113-6
