ICC-CLASS: isotopically-coded cleavable crosslinking analysis software suite
© Petrotchenko and Borchers; licensee BioMed Central Ltd. 2010
Received: 26 June 2009
Accepted: 28 January 2010
Published: 28 January 2010
Successful application of crosslinking combined with mass spectrometry for studying proteins and protein complexes requires specifically-designed crosslinking reagents, experimental techniques, and data analysis software. Using isotopically-coded ("heavy and light") versions of the crosslinker and cleavable crosslinking reagents is analytically advantageous for mass spectrometric applications and provides a "handle" that can be used to distinguish crosslinked peptides of different types, and to increase the confidence of the identification of the crosslinks.
Here, we describe a program suite designed for the analysis of mass spectrometric data obtained with isotopically-coded cleavable crosslinkers. The suite contains three programs called: DX, DXDX, and DXMSMS. DX searches the mass spectra for the presence of ion signal doublets resulting from the light and heavy isotopic forms of the isotopically-coded crosslinking reagent used. DXDX searches for possible mass matches between cleaved and uncleaved isotopically-coded crosslinks based on the established chemistry of the cleavage reaction for a given crosslinking reagent. DXMSMS assigns the crosslinks to the known protein sequences, based on the isotopically-coded and un-coded MS/MS fragmentation data of uncleaved and cleaved peptide crosslinks.
The combination of these three programs, which are tailored to the analytical features of the specific isotopically-coded cleavable crosslinking reagents used, represents a powerful software tool for automated high-accuracy peptide crosslink identification. See: http://www.creativemolecules.com/CM_Software.htm
Recent developments in the mass spectrometric analysis of proteins and peptides have led to renewed interest in the application of classical protein chemistry methods for structural elucidation of proteins and protein complexes. Advances in modern proteomics approaches, instrumentation, and methods for studying the structures of proteins and protein complexes have resulted in the development of the distinct field of structural proteomics. One of the most powerful methods in the structural proteomics toolbox is chemical crosslinking combined with mass spectrometry. The idea behind this combination of techniques is straightforward: to covalently modify proteins with reagents containing two reactive groups (i.e., crosslinkers), to identify the sites of crosslinking, and to deduce structural information about the protein system based on the spatial constraints derived from the length of the crosslinking reagents used. The current experimental paradigm is to enzymatically digest or otherwise fragment these crosslinked proteins, and then to identify the crosslinked peptides (crosslinks) by mass spectrometry, thus determining the sites of crosslinking and providing information about protein conformation and protein-protein interaction sites. This approach inevitably creates a complex mixture of peptides in which unambiguous identification of the crosslinks is difficult.
One of the solutions to this problem is to use isotopically-coded crosslinking reagents. When a crosslinker is supplied as a mixture of chemically-identical light and heavy isotopic forms, crosslinked products show up as doublets of peaks in the mass spectra. This provides a unique mass spectrometric "signature" for the detection of the crosslinks. Unambiguous assignment of the crosslinks based on their mass and MS/MS fragmentation patterns is another problem, due to the combinatorial possibilities of the constituents of inter-peptide crosslinks and the complexity of the MS/MS spectra caused by the simultaneous fragmentation of two peptides per crosslink. To address this challenge, cleavable crosslinking reagents have been proposed. Cleavage of the crosslinker converts the mass spectrometric analysis of a crosslink into the well-established analysis of the single peptides from which the inter-peptide crosslink was formed. Sequence data from the peptides making up the crosslink confirm the identity of these peptides and reduce the possibility of incorrect assignment of the crosslinks.
Particularly rewarding is a combination of these two features: isotopic coding and cleavage of the crosslinks. Cleavage of the isotopically-coded crosslinks creates a new distinct signature for the cleaved crosslinks because each of the resulting peptides might contain only a portion of the total number of isotopes in the uncleaved crosslink. For example, the cleavage of the crosslinker BiPS, which contains eight deuterium atoms, leads to two peptides from each inter-peptide crosslink. Both of these peptides contain residual portions of the crosslinker, with each portion containing four deuterium atoms. Thus, the uncleaved crosslink will appear in the MS spectrum as a doublet of peaks 8 Da apart, while the cleaved crosslink products will appear as doublets of peaks 4 Da apart. Knowledge of the cleavage reaction chemistry for each cleavable crosslinking reagent allows one to establish the specific mass relationships between uncleaved and cleaved crosslink signals, which can be used as diagnostic crosslink-identification tools. Furthermore, crosslink assignments can be unambiguously confirmed by MS/MS analysis of both the uncleaved and the corresponding cleaved crosslinks.
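The nominal 8 Da and 4 Da spacings in the BiPS example can be made exact with a few lines of arithmetic. The following is a minimal sketch, not part of ICC-CLASS: the function name `doublet_spacing` is hypothetical, the H and D masses are the values listed in the article's isotopic coding table, and singly charged ions (as is typical for MALDI) are assumed.

```python
# Masses (Da) as listed in the article's isotopic coding table.
MASS_H = 1.00728
MASS_D = 2.01355


def doublet_spacing(n_deuterium: int) -> float:
    """Expected light/heavy peak spacing (Da) for a singly charged ion
    whose heavy form carries n_deuterium D atoms in place of H."""
    return n_deuterium * (MASS_D - MASS_H)


uncleaved_spacing = doublet_spacing(8)  # intact d8 crosslinker
cleaved_spacing = doublet_spacing(4)    # each cleavage product keeps four D
print(f"uncleaved: {uncleaved_spacing:.4f} Da, cleaved: {cleaved_spacing:.4f} Da")
```

The exact spacings (about 8.05 Da and 4.03 Da) differ from the nominal values by more than the 0.01 to 0.05 Da tolerances quoted later in the article, so a doublet search should use them rather than the integers.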
Because of the large amount of data which is typically produced in the course of mass spectrometric analysis of crosslinked proteins, data analysis needs to be automated to make this analytical strategy feasible and applicable. Several software products have been proposed for this purpose [5–13], reviewed in . The simplest approach described is to match the mass of a crosslink to possible combinations of the individual peptides predicted from protein sequences [5–7]. The success of this analysis depends both on the simplicity of the protein system and the mass accuracy of the MS measurements. The next level of confidence in assignment is achieved by programs taking into account MS/MS fragmentation of the crosslinks [8–11]. The discrimination of crosslinker-containing and non-containing fragment masses, distinguished from one another by the isotopic coding of the crosslinker, is the next step towards improving the efficiency and the accuracy of the crosslink identification process [12, 13]. Unfortunately, straightforward use of the fragment ion masses is still sometimes not sufficient for unambiguous identification of the crosslinks derived from complex protein systems or whole proteomes .
To enable more confident and correct assignment of inter-peptide crosslinks, we have incorporated cleavage information in combination with isotopic coding into this new crosslink analysis software suite. We have developed a set of programs specifically designed for each step of an experiment where isotopically-coded cleavable crosslinking reagents are used. These steps are: 1) detection of the uncleaved and cleaved crosslinks (DX), 2) cleavage and identification of the cleavage products (DXDX), and 3) MS/MS fragmentation analysis of uncleaved and cleaved crosslinks (DXMSMS). We call this set of programs "ICC-CLASS" (Isotopically-Coded Cleavable CrossLinking Analysis Software Suite). Detection of the signals from the isotopically-coded crosslinks in the mass spectra based on their isotopic signatures (the DX program) is done by searching the data for pairs of peaks with a mass increment corresponding to the mass difference between the heavy and light isotopic forms of the crosslinker. The DXDX program is designed specifically to provide automated identification of the type of each isotopically-coded cleavable crosslink based on cleavage information. The DXMSMS program features separate inputs for isotopically-coded and non-coded fragment masses, an input for possible cleavage products, and output of fragment masses for the cleaved crosslinks. Incorporation of crosslink cleavage data into the analysis greatly enhances confidence in the crosslink assignments. Here we describe in detail the functions and algorithms used in each module, as well as the overall structure of this software suite.
The programs were written in Microsoft Visual Basic 2008 Express Edition. Downloadable files are posted at http://www.proteincentre.com and on the Creative Molecules website at http://www.creativemolecules.com/CM_Software.htm. Running the downloaded programs requires the Microsoft .NET Framework to be installed; it is freely available from http://www.microsoft.com.
These programs are primarily oriented towards Applied Biosystems (AB) MALDI-TOF/TOF data from HPLC fractions, but can be used with any tab-delimited mass lists and are therefore instrument independent. For instruments using AB's Data Explorer software, we have also provided the text of macros, along with installation instructions, which automatically generate mass lists and lists of ion signal doublets from multiple MALDI-MS spectra. All results are saved in text files as tab-delimited values and can therefore be easily copied into Excel spreadsheets.
Results and Discussion
Analysis of the data from experiments using isotopically-coded cleavable crosslinkers
At this step, the cleaved and uncleaved crosslinks can be matched to each other using the DXDX program, thus identifying dead-end and intra-peptide crosslinks, as well as possible cleaved components of the inter-peptide crosslinks. A dead-end crosslink is a single peptide modified with only one reactive group of the crosslinking reagent, while the second group is hydrolyzed or blocked. An intra-peptide crosslink is a single peptide in which two residues are crosslinked to each other. An inter-peptide crosslink is a pair of peptides bridged by a crosslinker molecule. The cleaved DX mass list can be used again as an additional restricting input parameter for the DXMSMS assignments of the uncleaved crosslinks (see below). Finally, the assignments can be verified by MS/MS analysis of the cleaved crosslinks, by matching their fragment ion masses with the masses predicted by DXMSMS.
The ICC-CLASS software package thus provides a means of automating every data analysis step in a mass spectrometric experiment done with isotopically-coded cleavable crosslinkers. This software facilitates the assignment of these crosslinks, while the use of cleavable crosslinkers strengthens the confidence of these assignments.
Searching for doublets (DX)
Isotopic coding mass differences: H (1.00728 Da), D (2.01355 Da).
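The doublet search performed by DX can be illustrated with a short sketch. This is not the DX source code (which is written in Visual Basic); it is an assumed, minimal re-implementation of the idea described in the text: discard peaks below a signal cut-off, then report every pair of peaks whose mass difference matches the light/heavy increment within a tolerance. The function name `find_doublets`, its parameters, and the example peak list are all hypothetical; the default cut-off (50 counts) and tolerance (0.02 Da) are taken from the ranges quoted in the applications section.

```python
from bisect import bisect_left, bisect_right


def find_doublets(peaks, delta, tol=0.02, min_intensity=50.0):
    """Return (light, heavy) mass pairs separated by delta +/- tol.

    peaks: iterable of (mass, intensity) tuples from one spectrum.
    Peaks with intensity below min_intensity are ignored.
    """
    kept = sorted(mass for mass, intensity in peaks if intensity >= min_intensity)
    doublets = []
    for light in kept:
        # Binary search for all candidate heavy partners of this peak.
        lo = bisect_left(kept, light + delta - tol)
        hi = bisect_right(kept, light + delta + tol)
        for heavy in kept[lo:hi]:
            doublets.append((light, heavy))
    return doublets


# Invented peak list: the 1600.000 peak is below the cut-off, so the
# 1608.050 peak is left without a light partner.
peaks = [(1500.800, 900), (1508.851, 850), (1600.000, 40),
         (1608.050, 700), (1700.000, 300)]
doublets = find_doublets(peaks, delta=8.05016)
print(doublets)
```

Sorting once and using binary search keeps the search fast even when mass lists from many HPLC fractions are concatenated.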
Comparison of the uncleaved and cleaved doublets (DXDX)
Mass additions for crosslink cleavage products.
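The matching of uncleaved to cleaved doublets can be sketched as follows. This is an illustrative assumption about how such a comparison could work, not the DXDX algorithm itself: a dead-end or intra-peptide crosslink yields one cleaved product whose mass differs from the uncleaved mass by a fixed, reagent-specific shift, whereas an inter-peptide crosslink yields two products whose masses sum to the uncleaved mass minus a fixed shift. The three shift parameters are placeholders for the reagent-specific mass additions, which are not reproduced here, and all numbers in the example are invented.

```python
from itertools import combinations_with_replacement


def classify_crosslinks(uncleaved, cleaved, dead_end_shift, intra_shift,
                        inter_shift, tol=0.02):
    """Map each uncleaved mass to a list of (type, cleaved masses) matches.

    The *_shift arguments are hypothetical reagent-specific constants:
      dead-end:      uncleaved - cleaved       == dead_end_shift
      intra-peptide: uncleaved - cleaved       == intra_shift
      inter-peptide: uncleaved - (c1 + c2)     == inter_shift
    """
    results = {}
    for u in uncleaved:
        hits = []
        for c in cleaved:
            if abs(u - c - dead_end_shift) <= tol:
                hits.append(("dead-end", (c,)))
            if abs(u - c - intra_shift) <= tol:
                hits.append(("intra-peptide", (c,)))
        # with_replacement allows for two peptides of identical mass.
        for c1, c2 in combinations_with_replacement(cleaved, 2):
            if abs(u - (c1 + c2) - inter_shift) <= tol:
                hits.append(("inter-peptide", (c1, c2)))
        results[u] = hits
    return results


# Invented masses and shifts, for illustration only.
cleaved = [900.45, 1100.55, 1300.65]
uncleaved = [2051.00, 1450.65]
matches = classify_crosslinks(uncleaved, cleaved, dead_end_shift=150.0,
                              intra_shift=-2.0, inter_shift=50.0)
for mass, hits in matches.items():
    print(mass, hits)
```

An uncleaved doublet with no match at all is still worth keeping: one of its cleavage products may simply fall outside the acquired mass range.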
Crosslink identification based on MS/MS fragmentation data (DXMSMS)
The cornerstone of successful crosslinking applications in structural proteomics is the confident identification of the crosslinks. Unfortunately, even if one cleaves the crosslinks and uses high mass-accuracy instruments, it is still challenging to unambiguously identify crosslinks based solely on mass, especially in digests from complex protein systems. MS/MS fragmentation information on the crosslinks needs to be included in the analysis in order to confirm the identification. MS/MS fragmentation of the individual peptides obtained by cleavage of the inter-peptide crosslinks adds a further level of confidence by providing partial sequence information for the peptides forming the crosslink. To address the complex nature of the MS/MS spectra from inter-peptide crosslinks, we incorporated additional features into the analysis of the fragmentation data by the DXMSMS program. Fragmentation of the isotopically-coded crosslinks produces two types of ions: ions which contain the isotopically-coded crosslinker, and ions that are not isotopically coded. Isotopically-coded fragment ions are the most informative and important for confident assignment of the crosslinks. Distinguishing between these two types of ions provides additional specificity for assignment of the crosslink fragments. Finally, the fragmentation of the uncleaved crosslink is compared with the fragmentation of the cleaved crosslinks. If corresponding ions are observed in both spectra, this provides additional confirmation of the crosslink assignment. Thus, we have designed our DXMSMS program to analyze MS/MS fragmentation data of the crosslinks by incorporating separate inputs for isotopically-coded and non-coded fragment ions from uncleaved crosslinks. In addition, the masses of the predicted fragments from the cleaved crosslinks are included in the output.
Mass additions for crosslinking reaction products.
The output for every match includes 1) the theoretical mass of the crosslink, 2) the mass difference between the experimental and theoretical masses in ppm, 3) the sequence of the crosslink, 4) the masses of all possible b- and y-ions for each intact and crosslinked peptide, 5) the masses of the crosslink cleavage products, and 6) the fragment ion masses for the individual cleaved peptides in the case of cleavable crosslinkers. This information allows one to re-inspect the MS/MS spectra and make confident crosslink assignments.
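Two of the quantities in this output, the ppm mass error and the b- and y-ion masses, follow from standard formulas and can be sketched briefly. The code below is an assumed illustration rather than the DXMSMS implementation: it handles a single linear peptide, uses standard monoisotopic residue masses, and treats a crosslinker remnant as a fixed mass addition at one residue. The names `ppm_error`, `b_y_ions`, and the `mods` argument are hypothetical, and the 100.0 Da modification in the example is a placeholder, not a real reagent mass.

```python
PROTON = 1.00728   # Da
WATER = 18.01056   # Da

# Standard monoisotopic amino acid residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def ppm_error(experimental, theoretical):
    """Signed mass error in parts per million."""
    return (experimental - theoretical) / theoretical * 1e6


def b_y_ions(sequence, mods=None):
    """Singly charged b- and y-ion m/z values for a linear peptide.

    mods: optional {0-based residue index: added mass}, e.g. a
    crosslinker remnant on a lysine (placeholder values only).
    """
    mods = mods or {}
    masses = [RESIDUE[aa] + mods.get(i, 0.0) for i, aa in enumerate(sequence)]
    b_ions, y_ions = [], []
    running = PROTON
    for m in masses[:-1]:            # b1 .. b(n-1), from the N-terminus
        running += m
        b_ions.append(running)
    running = PROTON + WATER
    for m in reversed(masses[1:]):   # y1 .. y(n-1), from the C-terminus
        running += m
        y_ions.append(running)
    return b_ions, y_ions


b_plain, y_plain = b_y_ions("GAK")
b_mod, y_mod = b_y_ions("GAK", mods={2: 100.0})  # placeholder remnant on K
print([round(x, 4) for x in b_plain], [round(x, 4) for x in y_plain])
print(round(ppm_error(1000.01, 1000.0), 2))
```

Fragments that contain the modified residue shift by the added mass while the others do not, which mirrors the distinction between isotopically-coded and non-coded fragment ions described above.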
Applications of the ICC-CLASS software suite
A typical crosslinking experiment is done by offline LC separation followed by MALDI analysis of the HPLC fractions. For a mid-size protein complex (50-100 kDa), this produces one to two hundred crosslinked peptide species, which are usually distributed over 24 to 48 fractions from a reversed-phase HPLC gradient. Following acquisition of the MS spectra, the mass lists for all of the fractions are searched with DX for doublets of signals corresponding to the isotopic coding of the crosslinker used. An important parameter for this step is an appropriate signal cut-off value to eliminate false-positive signals derived from noise. For an AB 4800 TOF/TOF instrument, we normally start with a value of 50 counts for this parameter. A strict tolerance value for the doublet mass difference is also helpful; we normally use values of 0.01 to 0.05 Da. Using DXDX to screen spectra from corresponding uncleaved and cleaved fractions allows one to rapidly eliminate most of the dominant (and less informative) dead-end and intra-peptide crosslinks from downstream analysis, and to more easily identify potential inter-peptide crosslinks, which are more structurally informative. At this point, the mass lists containing the masses of the potential cleaved and uncleaved inter-peptide crosslinks can be converted into an inclusion list for automated MS/MS acquisition. These MS/MS spectra can subsequently be analyzed with DXMSMS.
Several considerations which are intrinsic to crosslinking studies need to be taken into account. When a fixed number of chromatographic fractions is analyzed, the complexity of the spectra will inevitably increase with increasing complexity of the system studied. The probability of detecting uncleaved-cleaved pairs of false doublets which satisfy the mass relationships of the cleavage reaction is negligibly low. Eventually, however, increasing sample complexity could lead to overlap of signals from uncleaved and cleaved crosslinks with signals from free peptides, and could increase the chances of detecting "false doublets" in the spectra. This could interfere with fully-automated crosslink identification. This, however, is a common complication of crosslinking applications in complex protein systems. There are two possible solutions to this problem. The first is to reduce the complexity of the mixture by collecting additional fractions or by adding additional stages of separation, e.g., multidimensional chromatography. The second is to use affinity-purifiable isotopically-coded cleavable crosslinkers (Petrotchenko E. V., Thomas J. M., Borchers C. H.: DNBDPS: an isotopically-coded cleavable crosslinker affinity-purifyable with antibodies, submitted; Petrotchenko E. V., Serpa J. J., Borchers C. H.: An Isotopically-coded CID-cleavable biotinylated crosslinker: CBDPS, submitted). Affinity enrichment of the crosslinks eliminates most of the interfering signals due to non-crosslinked peptides and simplifies and clarifies the spectra. This makes matching the uncleaved and cleaved crosslink masses, as well as acquiring the MS/MS spectra, more feasible for complex protein systems. The situation where the mass of one of the cleaved products of an inter-peptide crosslink falls below the usual range for MALDI-MS (< 800 Da) also needs to be considered. In this case, the uncleaved crosslink will not be identified as inter-peptide by DXDX, but post-analysis re-acquisition of the MS spectra over a lower molecular weight range can often resolve this issue.
ICC-CLASS is a collection of programs which greatly facilitates data analysis from experiments using isotopically-coded cleavable crosslinking reagents. Together with advanced crosslinking reagents, selective crosslink purification strategies, and sophisticated mass spectrometric techniques, it provides a powerful analytical toolkit for structural proteomics crosslinking applications.
Availability and requirements
Project name: ICC-CLASS (DX, DXDX, DXMSMS).
Operating system: Windows XP.
Programming language: Visual Basic.
Other requirements: .NET Framework.
License: not required.
Restriction to use by non-academics: none.
- MALDI-MS: matrix-assisted laser desorption/ionization mass spectrometry
- LC-MS: liquid chromatography with mass spectrometry detection
- MS/MS: tandem mass spectrometry
- ESI-MS: electrospray ionization mass spectrometry
- HPLC: high performance liquid chromatography
This work was supported by Genome Canada and Genome British Columbia through a Technology Development Grant and a Science and Technology Platform Grant. The authors are thankful to Ashley Cabecinha for testing the software on real experimental data, and to Dr. Carol Parker for critical reading of the manuscript.
- Sinz A: Chemical cross-linking and mass spectrometry to map three-dimensional protein structures and protein-protein interactions. Mass Spectrom Rev 2006, 25: 663–682. doi:10.1002/mas.20082
- Müller DR, Schindler P, Towbin H, Wirth U, Voshol H, Hoving S, Steinmetz MO: Isotope-tagged cross-linking reagents. A new tool in mass spectrometric protein interaction analysis. Anal Chem 2001, 73: 1927–1934. doi:10.1021/ac001379a
- Bennett KL, Kussmann M, Björk P, Godzwon M, Mikkelsen M, Sørensen P, Roepstorff P: Chemical cross-linking with thiol-cleavable reagents combined with differential mass spectrometric peptide mapping--a novel approach to assess intermolecular protein contacts. Protein Sci 2000, 9: 1503–18. doi:10.1110/ps.9.8.1503
- Petrotchenko EV, Olkhovik VK, Borchers CH: Isotopically coded cleavable cross-linker for studying protein-protein interaction and protein complexes. Mol Cell Proteomics 2005, 4: 1167–1179. doi:10.1074/mcp.T400016-MCP200
- Tang Y, Chen Y, Lichti CF, Hall RA, Raney KD, Jennings SF: CLPM: a cross-linked peptide mapping algorithm for mass spectrometric analysis. BMC Bioinformatics 2005, 6(Suppl 2):S9. doi:10.1186/1471-2105-6-S2-S9
- de Koning LJ, Kasper PT, Back JW, Nessen MA, Vanrobaeys F, Van Beeumen J, Gherardi E, de Koster CG, de Jong L: Computer-assisted mass spectrometric analysis of naturally occurring and artificially introduced cross-links in proteins and protein complexes. FEBS J 2006, 273: 281–291. doi:10.1111/j.1742-4658.2005.05053.x
- Schilling B, Row RH, Gibson BW, Guo X, Young MM: MS2Assign, automated assignment and nomenclature of tandem mass spectra of chemically crosslinked peptides. J Am Soc Mass Spectrom 2003, 14: 834–850. doi:10.1016/S1044-0305(03)00327-1
- MS3D collaboratory [http://ms3d.org]
- Gao Q, Xue S, Doneanu CE, Shaffer SA, Goodlett DR, Nelson SD: Pro-CrossLink. Software tool for protein cross-linking and mass spectrometry. Anal Chem 2006, 78: 2145–2149. doi:10.1021/ac051339c
- Anderson GA, Tolic N, Tang X, Zheng C, Bruce JE: Informatics strategies for large-scale novel cross-linking analysis. J Proteome Res 2007, 6: 3412–3421. doi:10.1021/pr070035z
- Seebacher J, Mallick P, Zhang N, Eddes JS, Aebersold R, Gelb MH: Protein cross-linking analysis using mass spectrometry, isotope-coded cross-linkers, and integrated computational data processing. J Proteome Res 2006, 5: 2270–2282. doi:10.1021/pr060154z
- Rinner O, Seebacher J, Walzthoeni T, Mueller LN, Beck M, Schmidt A, Mueller M, Aebersold R: Identification of cross-linked peptides from large sequence databases. Nat Methods 2008, 5: 315–318. doi:10.1038/nmeth0808-748a
- Jin Lee Y: Mass spectrometric analysis of cross-linking sites for the structure of proteins and protein complexes. Mol Biosyst 2008, 4: 816–23. doi:10.1039/b801810c
- Yu W, Vath JE, Huberty MC, Martin SA: Identification of the facile gas-phase cleavage of the Asp-Pro and Asp-Xxx peptide bonds in matrix-assisted laser desorption time-of-flight mass spectrometry. Anal Chem 1993, 65: 3015–3023. doi:10.1021/ac00069a014
- Petrotchenko EV, Xiao K, Cable J, Chen Y, Dokholyan NV, Borchers CH: BiPS, a photo-cleavable, isotopically-coded, fluorescent crosslinker for structural proteomics. Mol Cell Proteomics 2009, 8: 273–286.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.