The output of CIG-P is a circular diagram. Both the reference protein sets and the bait protein are placed on the outer circle of the diagram; the bait is flanked by two white spaces that serve as separators and as a scale, since each white space corresponds to the width of three proteins. The interactions defined in the experiment file are drawn as arcs in the center of the circle (see Figure 1B and C). We believe this layout is intuitive and conveys the nature of an AP-MS experiment, in which all interactions, represented by arcs, originate from the bait protein. The default circle size and arc thickness can be adjusted using the controls in the top left of CIG-P’s graphical user interface. In addition, new experiment files, reference files or color schemes can be loaded live into the displayed circular diagram. Below, we present two distinct applications of CIG-P: the first is the rapid comparison of various AP-MS experiments with each other, while the second focuses on the visual integration of orthogonal proteomics datasets.
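The placement rule on the outer circle can be made concrete with a small calculation. The sketch below is a minimal illustration, not CIG-P’s own code: it assumes that every protein, including the bait, occupies one equal angular slot, that reference sets are laid out in file order, and that the bait follows the last reference protein between two white spaces of three slots each. The function name `layout_slots` and the constant `GAP_SLOTS` are hypothetical.

```python
from typing import Dict, List, Tuple

GAP_SLOTS = 3  # each white space spans the width of three proteins


def layout_slots(
    reference_sets: Dict[str, List[str]], bait: str
) -> List[Tuple[str, str, float]]:
    """Return (set name, protein, start angle in degrees) for every slot.

    Assumption: reference proteins are placed in file order, and a protein
    listed in several sets gets one slot per occurrence. The bait follows the
    last reference protein, separated by one gap on each side.
    """
    slots = [(name, p) for name, members in reference_sets.items() for p in members]
    total = len(slots) + 1 + 2 * GAP_SLOTS  # proteins + bait + two gaps
    width = 360.0 / total
    placed = [(name, p, i * width) for i, (name, p) in enumerate(slots)]
    # The first gap occupies the GAP_SLOTS slots right after the last protein.
    placed.append(("bait", bait, (len(slots) + GAP_SLOTS) * width))
    # The second gap fills the remaining slots before the circle closes at 0 degrees.
    return placed


print(layout_slots({"A": ["P1", "P2"], "B": ["P3"]}, "X"))
# → [('A', 'P1', 0.0), ('A', 'P2', 36.0), ('B', 'P3', 72.0), ('bait', 'X', 216.0)]
```

With three reference proteins, the circle is divided into ten slots of 36°; the bait sits in slot six, with slots three to five and seven to nine left empty as the two white spaces.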
In the lenticular function of CIG-P, the protein sets in the reference input file are defined as the high-confidence prey proteins of individual baits, so that multiple baits can be compared with each other. Alternatively, each protein set can be defined as the set of high-confidence prey proteins of the same bait under a different condition, in which the cellular system underwent a perturbation. For example, the primary protein set is defined as the prey proteins of a particular bait when the cellular system is treated with the carrier, while all subsequent protein sets are the prey proteins upon stimulation of the cellular system with a particular chemical compound. The resulting series of CIG-P circular diagrams rapidly visualizes the changes in the interactome of the bait as a function of the perturbation.
CIG-P is also equipped with a reappearance mapping function. If it is turned OFF, only the first instance of a match is mapped and displayed as an arc. Turning the reappearance function OFF is useful in the perturbation scenario described above, in which the primary protein set contains all prey proteins of the control, while all other protein sets contain the prey proteins under perturbation. This setup visualizes which prey proteins are new compared to the control AP-MS experiment (see Additional file 1: Figures S7-S10). Conversely, with the reappearance function ON all interactions are drawn redundantly, which is important if multiple reference protein sets contain the same protein, e.g. if a protein belongs to multiple functionally annotated protein complexes (see Application of CIG-P below).
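The difference between the two reappearance modes can be sketched as a mapping from experiment preys to arcs. In this minimal sketch, `map_arcs` is a hypothetical helper; reference sets are assumed to be scanned in the order in which they appear in the reference file, and the "first instance of a match" is taken to be the first set, in that order, that contains the prey. CIG-P’s internal implementation may differ.

```python
from typing import Dict, List, Tuple


def map_arcs(
    experiment_preys: List[str],
    reference_sets: Dict[str, List[str]],
    reappearance: bool,
) -> List[Tuple[str, str]]:
    """Return one (set name, prey) pair per arc drawn from the bait.

    Reappearance OFF: a prey is drawn only at the first set containing it.
    Reappearance ON: a prey is drawn in every set containing it.
    """
    preys = set(experiment_preys)
    arcs: List[Tuple[str, str]] = []
    already_mapped = set()
    for set_name, members in reference_sets.items():  # dicts keep file order
        for protein in members:
            if protein not in preys:
                continue  # protein was not detected in this experiment
            if not reappearance and protein in already_mapped:
                continue  # skip repeated matches in non-redundant mode
            arcs.append((set_name, protein))
            already_mapped.add(protein)
    return arcs


reference = {"control": ["A", "B"], "treated": ["A", "B", "C"]}
print(map_arcs(["A", "B", "C"], reference, reappearance=False))
# → [('control', 'A'), ('control', 'B'), ('treated', 'C')]
```

With the control set listed first, the OFF mode leaves only the perturbation-specific prey `C` in the treated set, mirroring the scenario above, whereas the ON mode draws all five matches.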
Considerable insight into individual AP-MS experiments can be gained by projecting the newly generated data onto orthogonal proteomics datasets. Orthogonal datasets could include native protein complex fractionation techniques [12] or functional fractionations and annotations of super-complexes such as the ribosome, proteasome or spliceosome [13]. Using this type of higher-order annotation, an individual AP-MS experiment is immediately placed into a wider context, enabling rapid interpretation of the data at hand.
To demonstrate the functionality of CIG-P, we visualize a published dataset [14] and draw conclusions from our circular diagrams that were not reported in the original publication, supporting our initial motivation that abstract visualization can guide scientists toward new working hypotheses.
The original dataset [14] encompasses the interactome of the CMGC clade of kinases. Four members of this clade show many interactions with splicing-related proteins; hence, we focus on these four kinases: SRPK1 (UniProt ID: Q96SB4), SRPK2 (P78362), SRPK3 (Q9UPE1) and PRPF4B (Q13523). Although all four kinases are associated with the spliceosome, the latter is an extremely dynamic ribonucleoprotein complex that catalyzes the excision of introns from precursor messenger RNA, so association with the spliceosome alone does not imply a shared set of interactors. To visualize which of these kinases have overlapping prey proteins, we used the lenticular function of CIG-P and defined the prey proteins of each kinase as the protein sets of the reference file. When the experiment file of SRPK1 is loaded in the non-redundant mapping mode, all 26 interactors are visualized (Additional file 1: Figure S2). To immediately visualize the overlap of the SRPK1 interactome with the prey proteins of the other kinases, the reappearance function of CIG-P was turned ON (Additional file 1: Figure S7). The redundant circular diagram shows that SRPK1 and SRPK2 share many prey proteins, while SRPK3 and PRPF4B have distinct interactomes.
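The overlap that the redundant diagram conveys visually can also be expressed as numbers. The sketch below is an illustrative complement and not a CIG-P feature: for the prey set of a query bait, it counts the preys shared with every reference set and reports the Jaccard index (shared preys divided by the size of the union). The helper name `overlap_table` is hypothetical, and the protein identifiers are placeholders, not data from [14].

```python
from typing import Dict, Set, Tuple


def overlap_table(
    query: Set[str], reference_sets: Dict[str, Set[str]]
) -> Dict[str, Tuple[int, float]]:
    """For each reference set, return (number of shared preys, Jaccard index)."""
    table: Dict[str, Tuple[int, float]] = {}
    for name, members in reference_sets.items():
        shared = len(query & members)
        union = len(query | members)
        # Jaccard index |A ∩ B| / |A ∪ B|, defined here as 0.0 for two empty sets.
        table[name] = (shared, shared / union if union else 0.0)
    return table


# Placeholder prey sets (not the published data).
query_preys = {"p1", "p2", "p3", "p4"}
others = {"KINASE_B": {"p1", "p2", "p3", "p5"}, "KINASE_C": {"p6"}}
print(overlap_table(query_preys, others))
# → {'KINASE_B': (3, 0.6), 'KINASE_C': (0, 0.0)}
```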
To illustrate the distinct interactome of PRPF4B, its experiment file was loaded into CIG-P from the graphical user interface. The circular representation (see Additional file 1: Figures S5 and S10) shows that PRPF4B has a distinct interactome, suggesting that it acts on a subset of spliceosomal proteins within the splicing cascade.
To follow up on the commonalities and differences among these four kinases with respect to their spliceosomal prey proteins, we used as reference a protein set derived from the extensive functional fractionation of some 300 spliceosomal proteins [13]. Projecting AP-MS data onto an orthogonal proteomics dataset allows scientists to place the AP-MS data into context (see Figure 1B). As already observed with the lenticular function of CIG-P, SRPK1 and SRPK2 share a largely overlapping network of interactors throughout the splicing cycle; notably, both completely lack interactors from the tri-snRNP (U5.U4/U6) (see Additional file 1: Figures S15 and S16). In contrast, PRPF4B interacts almost exclusively with tri-snRNP-associated proteins (see Additional file 1: Figure S18). While the lenticular CIG-P analysis already indicated that the interactors were quite dissimilar, projecting the AP-MS dataset onto an orthogonal, functionally fractionated proteomics dataset allows for rapid functional annotation and visualization of these differences.
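Projecting preys onto a functional fractionation amounts to annotating each prey with every fraction that contains it, which is why the reappearance function matters when a protein belongs to several subcomplexes. The following minimal sketch assumes the fractionation is available as a mapping from fraction names to protein identifiers; the helper names `annotate_preys` and `fraction_counts` and the toy identifiers `a`, `b`, `c` are hypothetical, and the fraction names merely echo those in the text.

```python
from collections import Counter
from typing import Dict, List


def annotate_preys(preys: List[str], fractions: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each prey to every fraction that contains it (redundant mapping)."""
    annotation: Dict[str, List[str]] = {prey: [] for prey in preys}
    for fraction, members in fractions.items():
        for protein in set(members):  # ignore duplicate entries within a fraction
            if protein in annotation:
                annotation[protein].append(fraction)
    return annotation


def fraction_counts(annotation: Dict[str, List[str]]) -> Counter:
    """Count how many preys fall into each fraction."""
    return Counter(f for hits in annotation.values() for f in hits)


fractions = {"U1 snRNP": ["a"], "17S U2": ["a", "b"], "tri-snRNP": ["x"]}
annotation = annotate_preys(["a", "b", "c"], fractions)
print(annotation)  # {'a': ['U1 snRNP', '17S U2'], 'b': ['17S U2'], 'c': []}
print(fraction_counts(annotation)["tri-snRNP"])  # 0: no prey maps to this fraction
```

A fraction with a count of zero, as for the tri-snRNP in this toy example, corresponds to a region of the circle that receives no arcs.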
Besides rapid comparison of kinase interactomes and integration of orthogonal proteomics datasets, CIG-P can also help create new working hypotheses: for SRPK1 and SRPK2, not only the prey proteins but also the in vitro kinase substrates were determined [14]. Hence, we took advantage of CIG-P’s option to draw either colored or black arcs (as defined in the experiment file). We define colored arcs as protein-protein interactions and black arcs as kinase substrates (see Figure 1B). In the case of SRPK1, we postulate that the kinase binds to 17S U2-related proteins and phosphorylates a U1 snRNP protein, presumably promoting a dynamic transition at the onset of the splicing process.
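The colored/black distinction can be sketched as a per-row rule applied when the experiment file is read. The layout assumed here, tab-separated `bait`, `prey` and `kind` columns with `kind` set to `interaction` or `substrate`, is hypothetical and not CIG-P’s documented format. So is the rule that an interaction arc takes the color of the single reference set its prey maps to, which corresponds to the non-redundant mode; `arc_colors` and the hex colors are likewise illustrative.

```python
from typing import Dict, List, Tuple

BLACK = "#000000"


def arc_colors(
    rows: List[str], set_of_prey: Dict[str, str], set_colors: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return (prey, color) for each experiment row whose prey is in the reference."""
    arcs: List[Tuple[str, str]] = []
    for line in rows:
        _bait, prey, kind = line.rstrip("\n").split("\t")
        if prey not in set_of_prey:
            continue  # prey not present on the outer circle: no arc is drawn
        if kind == "substrate":
            color = BLACK  # kinase substrate -> black arc
        else:
            color = set_colors[set_of_prey[prey]]  # interaction -> color of its set
        arcs.append((prey, color))
    return arcs


rows = ["SRPK1\tp1\tinteraction", "SRPK1\tp2\tsubstrate", "SRPK1\tp9\tinteraction"]
print(arc_colors(rows, {"p1": "17S U2", "p2": "U1 snRNP"},
                 {"17S U2": "#1f77b4", "U1 snRNP": "#ff7f0e"}))
# → [('p1', '#1f77b4'), ('p2', '#000000')]
```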