A reinforced merging methodology for mapping unique peptide motifs in members of protein families
© Chang et al; licensee BioMed Central Ltd. 2006
Received: 11 July 2005
Accepted: 25 January 2006
Published: 25 January 2006
Members of a protein family often have highly conserved sequences; most of these sequences carry identical biological functions and possess similar three-dimensional (3-D) structures. However, enzymes with high sequence identity may acquire differential functions other than the common catalytic ability. It is probable that each of their variable regions consists of a unique peptide motif (UPM), which selectively interacts with other cellular proteins, rendering additional biological activities. The ability to identify and localize such UPMs is paramount in recognizing the characteristic role of each member of a protein family.
We have developed a reinforced merging algorithm (RMA) with which non-gapped UPMs were identified in a variety of query protein sequences including members of human ribonuclease A (RNaseA), epidermal growth factor receptor (EGFR), matrix metalloproteinase (MMP), and Sma-and-Mad related protein families (Smad). The UPMs generally occupy specific positions in the resolved 3-D structures, especially the loop regions on the structural surfaces. These motifs coincide with the recognition sites for antibodies, as the epitopes of four monoclonal antibodies and two polyclonal antibodies were shown to overlap with the UPMs. Most of the UPMs were found to correlate well with the potential antigenic regions predicted by PROTEAN. Furthermore, an accuracy of 70% can be achieved in terms of mapping a UPM to an epitope.
Our study provides a bioinformatic approach for searching and predicting potential epitopes and interacting motifs that distinguish different members of a protein family.
Multiple protein sequence comparison can provide a valuable protein signature and, thus, contribute to the fields of structural biology and molecular evolution [1, 2]. In general, sequence similarity identified by multiple sequence alignment (MSA) among a set of query sequences suggests similar function among the proteins . These signatures can be readily obtained by web-based tools, such as BLAST , CLUSTALW , or MUSCA  systems. However, although the default parameters in most programs give satisfactory results, in some cases special variables need to be taken into consideration. For example, the allocation of major variations among a few query sequences can be achieved from the results of direct MSA, whereas the uniqueness of each sequence that is not well-aligned is difficult to reveal. It is also quite expensive and time-consuming to experimentally search for such unique peptide motifs (UPMs) that may involve the key biological functions of interest. Therefore, the need for effective and efficient identification of the UPMs located in a number of query proteins by novel bioinformatic tools is urgent. We have thus developed a reinforced merging algorithm (RMA) to rapidly identify the non-gapped UPMs for each member of a protein family.
In our study, the highly conserved human ribonuclease A (RNaseA) superfamily was initially tested to define the default input parameters. The RNaseA superfamily is composed of eight RNases including RNase1, 2, 3, 4, 5, 6, 7, and 8 with similar sequences and biological functions . A pairwise comparison between RNaseA sequences shows more than 29% similarity; especially for RNase2 v.s. RNase3 and RNase7 v.s. RNase8, which have as high as 69.6% and 78.2% identities, respectively. Molecular evolutionary analysis revealed that both gene pairs emerged as a result of a relatively recent gene duplication event . In addition to the RNaseA superfamily, other protein families with larger molecular weights including the members of epidermal growth factor receptor (EGFR), matrix metalloproteinase (MMP), and Sma-and-Mad related protein (Smad) family with respectively an average molecular weight of 150 kD, 50 kD and 55 kD were analyzed.
A monoclonal antibody (mAb) is commonly used in biological experiments for distinguishing the sequence specificity, quantity variation, and cellular localization of each member of the human protein families . Generally, the epitope is composed of a charged and hydrophilic peptide of 7 to 20 amino acids exposed at the structural surface of the protein [9, 10]. In this study, we generated one mAb specifically against RNase3 and the mapping experiments revealed that the epitope was indeed located at a unique region in RNase3 identified by RMA. Further analysis of several mAbs and polyclonal Abs against human RNaseA, EGFR, and MMP families demonstrated that they all recognized specific epitopes corresponding to the UPMs identified by RMA. We also mapped the UPMs on the resolved 3-D structures of the proteins and found that most of the UPMs were located on loop regions and exposed to the structural surfaces. Furthermore, the analytic comparison between RMA and PROTEAN , a commercially available software for prediction of antigenic regions, revealed that most of the UPMs identified by RMA matched well with the antigenic regions predicted by PROTEAN.
Identification of the UPMs in the human RNaseA superfamily by RMA
Characterization of UPMs in protein families identified by RMA
loop coverage (%)b
 Stratagene Cat: B50175
Santa Cruz Cat: sc-21731
Santa Cruz Cat: sc-21732
It should be noticed that the key enzymatic active site residues correspondent to His12, Lys41 and His119 of mature RNase1 have been found to be located in the well-conserved regions among all eight RNases (Figure 1A, open boxes), indicating that RMA is practical in distinguishing the UPMs from the conserved sequences in a protein family. The UPM P116-Y122 located in the last loop of the C-terminus of RNase3 has been determined as the epitope for a polyclonal Ab, D112-P123 Ab, which specifically recognized RNase3 and was generated by Boix . The UPM P64-F76 in RNase5 was located within a sheet-turn-sheet conformation involved in the protein-cell interaction (Figure 1B, labeled in green) .
Application of RMA on protein families with larger molecular weight
The sequences of mature human MMP1, 3, 8, 10, 12 and 13, which are categorized as members of the same subfamily and contain approximately 470 amino acids, were analyzed by RMA and forty five UPMs were identified. The results shown in Table 1 revealed that nine and six UPMs of five to ten residues in length were identified in MMP1 and MMP3, respectively. Currently only the structure of one of the catalytic domains (F1-P168) of MMP3 was determined in which two UPMs, Q103-T111 and T145-R151, could be identified by RMA. The former matched nicely with loop 7 and the latter matched well with loop 8 in the MMP3 structure . The UPM R273-I278 of MMP1 overlapped with the epitope for a mAb αMMP1 (3B6) (residues S267-H277) (Santa Cruz, Cat: sc-21731). Likewise, a UPM K218-P225 of MMP3 matched perfectly with the characterized epitope for another mAb αMMP3 (1B4) (residues K218-H228) (Santa Cruz, Cat: sc-21732). In addition, the enzymatic active sites of MMPs in the highly conserved motif DDXXGIXXXYG (residues D152-G162)  and Gln120  numbered correspondent to MMP1 were characterized to be located within the conserved regions, consistent with the results for the human RNaseA superfamily (data not shown).
Eight human Smads were analyzed by RMA and twenty four UPMs were identified. Currently only the C-terminal structures of approximately 230 residues of Smad2, Smad3 and Smad4 are available. Since the identified UPMs in Smad1 and Smad2 were located in the regions with unresolved structures, only the UPMs identified in Smad4 were shown in Table 1. The UPM G510-I518 was completely located in the L3 loop responsible for the binding with Ski protein, an oncoprotein that represses the TGF-β signaling. Another UPM E538-L551 with 64% of loop coverage in the last loop within the MH2 domain in the C-terminus of Smad4 was responsible for the interaction with the TGF-β receptors . In the Smad superfamily, Smad4 plays a central role as it is the shared hetero-oligomerization partner of other Smads and it possesses the DNA binding ability for regulation of gene expression . Within the DNA binding domain of Smad4, K45-K110, only one UPM T62-H67 was identified by RMA .
DOKs are phospho-proteins in the downstream of receptor tyrosine kinases. Six members of human DOK family, DOK1, 2, 3, 4, 5 and 6, of approximately 400 amino acids were analyzed by RMA and forty four UPMs were identified. A UPM Y402-G410 located at the C-terminus was found to be residing within the recognition site of an antibody αDOK2 (residues A393-K412) (Calbiochem, Cat: 506138); hence it is expected that αDOK2 can be used to distinguish DOK2 from the other DOKs (data not shown). Another protein family containing human carboxypeptidase D (CPD), E (CPE), M (CPM), N (CPN) was also analyzed. One of the UPMs in CPE matched well with the epitope for an anti-CPE mAb (BD Bioscience, Cat: 610758, data not shown). The analyses of EGFR, MMP, Smad, DOK, and CP families demonstrated that for members of protein families with large molecular weights, RMA still performed quite efficiently to provide useful structural and functional information.
Application of RMA on homologous proteins from different species
Characterization of UPMs in homologues from different species identified by RMA
loop coverage (%)b
MsbA is a 630-amino acid prokaryotic homologue of human multidrug resistant protein. Five prokaryotic MsbAs from E. coli, V. cholera, G. violaceus, S. oneidensis, and B. japonicum were analyzed by RMA and fifty two UPMs were identified. Among which thirteen UPMs of five to seven residues identified by RMA in E. coli and V. cholera MsbAs were shown in Table 2. According to the 3-D structures of MsbAs of E. coli  and V. cholera , five of the identified UPMs were completely located in the loop regions and 67%–83% residues of two UPMs were covered by the loops. As a result, RMA has been demonstrated to be useful in analysis of homologous proteins among several species to rapidly identify the unique signatures of each query sequence.
Specificity of mAbs against human RNase2 and RNase3
Epitope screening for mAb 3C1 and αRNase2
According to the identification by RMA, residues 51 to 57 of RNase3 were identical to those of RNase2; hence this region was further deleted to generate a recombinant RNase358–73-GFP (R358–73) protein. As expected, Figure 4B clearly showed that such a recombinant protein was probed by both αHis and mAb 3C1, indicating that the epitope for mAb 3C1 was indeed located within RNase358–73, consistent with the identification by RMA (Figure 1A, red double underlines).
The epitope for αRNase2 was not indicated in either its manufacture or other reports. To allocate the epitope region experimentally, three clones containing chimeric RNase3/RNase2 were generated to express the chimeric proteins in E. coli (Figure 4C). The chimeric junctions contain three separate regions where the residues are identical between RNase2 and RNase3. Immunoblotting analysis revealed that only chimera A was recognized by αRNase2, indicating that P58–P90 of RNase2 was quite important in determining the antigenicity (Figure 4D; Figure 1A, brown underline). Based on the RMA and experimental results, two recombinant clones RNase258–73-GFP (R258–73) and RNase273–90-GFP (R273–90) were further generated. The results showed that the former was not probed by αRNase2, whereas the latter was specifically recognized (Figure 4E, lanes 1 and 2). Since only five continuous residues in RNase273–90-GFP were predicted to be unique by RMA, the recombinant RNase273–77-GFP (R273–77) protein was further expressed and tested. Our result demonstrated evidently that a segment as short as only five residues of RNase2 were detected by αRNase2, indicating that the epitope for αRNase2 was successfully narrowed down to "H73SGSQ77" (Figure 4E, lane 3; Figure 1A, brown double underlines).
Comparison of RMA and PROTEAN
Performance analysis of RMA
The sequences of human protein families for RMA analysis
In this study, we reported the analysis of three large protein families including EGFR, MMP, and Smad with respectively 39%-52%, 46%-78%, and 20%-92% pairwise sequence identity to test the practical application of RMA. The UPMs in the extracelluar domains of four members in EGFR family were subsequently analyzed. Unlike RNases, the extracellular domains in ErbB, ErbB2 and ErbB3 showed quite different conformations. In this particular data set, we found that 71% of the UPMs in EGFR family were located in the loops (Table 1). The UPM S350-K356 located within domain III (also named as L2) has been identified as an EGF binding domain on the extracellular domain of ErbB, the only member of the EGFR family that EGF recognizes (Figure 2, labeled in green) . Combined with the structural features of human RNase and EGFR families, the UPMs provided additional information of many loops in the proteins with resolved structures. Thus, for the proteins without 3-D structural information, it is speculated that the UPMs may also be located in the loops to a considerable extent. Nevertheless, the possibility of a UPM being located in the well-defined α-helical or β-sheet structures cannot be excluded. As for the MMP superfamily, more than twenty members are currently classified, and several subfamilies are identified according to their domain compositions. It was thus necessary to classify the members of such a big protein family to several subgroups prior to RMA analysis. Accordingly, we have correlated two identified UPMs in MMP3 with the loop structures.
The experimental epitope mapping results suggest that some of the UPMs can serve as potential candidates to generate specific Abs to distinguish the highly conservative sequences in a protein family. In the past, sequential antigenic regions were determined based on the protrusion index, accessibility, mobility, charged amino acids or hydrophilic regions at the protein surface [9, 30–32]. However, a good antigenic region may not specifically distinguish one protein from the other members in the same protein family. Our RMA has been demonstrated to efficiently discriminate the accumulation of unique features of a peptide fragment to distinguish different regions in a protein family. Although eight members of human RNaseA superfamily share high similarity, identification by RMA has given thirty UPMs of four to sixteen amino acids in length, among which at least three were demonstrated to serve as epitopes for αRNase2, mAb 3C1 and D112-P123 Ab (Table 1). In our case, theoretically 975 UPMs of eight residues in length in the human RNaseA superfamily can be identified. Therefore, RMA techniques have at least increased the probability of finding a UPM in RNaseA superfamily from 1/975 to 1/30. Likewise, the probability of finding such UPMs in human EGFR, MMP, and Smad families can be raised from 1/5087, 1/2784, and 1/3707 up to 1/39, 1/31, and 1/24, respectively. In ErbB2, a UPM H42-Q51 was located within the epitope of αErbB2 N-term (P36-Q51) at the first α-helix in domain I (Table 1) . This Ab can recognize the structural surface of the well-folded extracellular domain. In addition, both αMMP1 and αMMP3 can probe denatured and well-folded proteins. Although 3-D structures containing the epitopes for αMMP1 and αMMP3 are not resolved yet, it is speculated that these regions are also exposed at the surfaces.
As for cross species protein analysis, the identity between bpRNaseA and hRNase1 was 68.8%. In these two highly conservative homologues, six identified UPMs of each RNase were localized in the corresponding positions in their primary sequences. Structurally, 83.3% of these regions were included in the loop areas of the resolved 3-D structures of bpRNaseA and hRNase1. Interestingly, except the first UPM, the loop coverage values of the second to the sixth UPMs in bpRNaseA are nearly the same as those of hRNase1 (Table 2), indicating that the loop areas of these two proteins are the most distinguishable. Similarly, the analysis of MsbAs by RMA revealed that the majority of the identified UPMs were located in the loop areas. Hence, RMA has been demonstrated to be useful to analyze homologous proteins from different species in addition to different proteins in one family in a particular species. The analysis of homologous proteins among different species may reveal the residue variation during molecular evolution. To our knowledge, it occurs sometimes that an antibody was not successfully produced as expected by injecting the immunized animals with a designed peptide or protein, probably due to lack of the uniqueness in the antigenic sequence . Our RMA can identify the UPMs that do not exist in the homologous protein in the immunized animals; hence it can be used to provide an initial screening for potential epitope design to raise the probability of obtaining polyclonal antibodies against synthetic peptides.
In comparison with other motif-finding methods, MEME (Multiple Em for Motif Elicitation) system can discover the motifs with highly conserved regions in a group of related protein sequences. It represents motifs as position-dependent letter-probability matrices which describe the probability of each possible letter at each position in the pattern [35, 36]. MAST (Motif Alignment & Search Tool) can search the biological sequence databases for sequences that contain one or more known motifs which are represented as position-dependent scoring matrices that describe the score of each possible letter at each position in the pattern . The MASIA algorithm recognizes the common patterns and properties in multiple aligned protein sequences. It converts a sequence to a properties matrix that can be scanned in both vertical and horizontal steps. Consistent patterns are recognized based on the statistical significance of their occurrence . PROSITE is a document database which collected a large number of biologically meaningful motifs that described as patterns or profile . The main characteristics of these four sequence prediction algorithms are determination of the motifs with similar features. We have specifically analyzed the eight human RNase A sequences using PROSITE. It was found that the biological motifs such as potential phosphorylation sites and glycosylation sites identified by PROSITE were actually located within the conservative regions, none of them was identified as a UPM by RMA analysis. We have also matched all the biological motifs or pattern matrices in the PROSITE database with the proteins in PDB database. It was found that 35.31%, 24.68%, 0.64% and 39.36% of the PROSITE patterns were located at the α-helix, β-sheet, turn, and undetermined regions, respectively. Unlike the RMA results, only a few PROSITE patterns were located at the loops. Our RMA reinforces the non-conserved patterns and extracts the motifs with more than 8 residues. These UPMs are demonstrated to be correlated well with the loops exposed on the structural surface, or regions that are known to be able to serve as molecular recognition sites. It is conceivable that such unique sequence-dependent functional characteristics may be detected by our method, although the novel functional properties still remain to be explored. Evolutionally, the random unique sequence pieces are expected to be evenly distributed or randomly scattered in the entire protein sequences by statistical estimation. They are not specifically located in the loops. If a particular random mutation occurs in a position that is crucial for maintaining the structure integrity or catalytic function of a protein family, either the preservation of the mutation during evolution is disfavoured, or a dysfunctional protein would be generated to cause disease. On the contrary, the UPMs identified by our method are evolutionarily preserved sequence motifs, hence apart from the conserved regions; the UPMs tend to be located in loops and may be correlated with different functions of members in a protein family.
RMA is proved to be useful for finding characteristic sequence motifs in highly conservative members of one protein family. The UPMs may have the potential to be used as minivaccines for protecting animals from pathogenic infections. In general, the peptide-induced anti-peptide Ab must specifically recognize the desired infectious pathogen, but not cross-react to others. The UPMs identified by RMA (Figure 5, labeled in blue) can be directly compared with the antigenic regions predicted by PROTEAN (Figure 5, labeled as red star), such that the most specific and antigenic peptide antigen may be allocated (Table 1 and Table 2, labeled as #). In a large-scale analysis on mAbs with known epitopes on members of human protein families, RMA demonstrated higher than 70% accuracy in terms of correlating UPMs with epitopes of mAbs (Figure 6). Recently, Ecale Zhou et al. developed computational approaches employing a combination of structure- and sequence-based alignments for identification of the antibody/protein binding pockets on the surface of ricin A chain . Structure alignment software and modified pocket-finding algorithm were required for extracting the residue-residue correspondence as well as for the determination of the conserved or unique binding pocket in the target molecules. Unlike the aforementioned method, our RMA directly identified and merged the non-gapped primary unique patterns; neither MSA nor structural information was prerequisite prior to the operation.
Taken together, we have developed a bioinformatic tool to facilitate identification of UPMs located mostly in the loops or on the surface in members of a protein family. The structural features of some peptides are correlated with molecular recognition including antigen-antibody, protein-protein, protein-ligand and protein-cell interactions. Our method provides an effective and efficient evaluation of sequence, structural and functional studies that allow biologists to examine their protein sequences of interest prior to experiments such as peptide antigen design and mutagenesis.
Development of a Reinforced Merging Algorithm
(A) Grouping module
The main purpose of the grouping module is to discriminate the tolerant features from UPM representation. Because the possible substitutions in different amino acids from observed frequencies in algorithms on related proteins are well-defined [41, 42], we provide a fundamental clustering algorithm that assembles 20 amino acids into several independent groups. Users are able to assign threshold parameters and referred scoring matrices. A substitution matrix can be selected from standard BLOSUM/PAM matrices or new ones can be created based on their own aligned block database. In this module, the traditional hierarchical methods are employed to cluster groups based on the substitution matrix and threshold values. Amino acids grouped in the same set are treated as possessing similar characteristics and substituted by a common symbol. For examples, if we select the BLOSUM62 matrix and set the threshold parameter as 2, the module provides the grouping results ((A), (R, K), (N), (D, Q, E), (C), (G), (H, F, W, Y), (I, L, M, V), (P), (S), (T)), where amino acids within the same cluster represent matching states of identity. In addition, users are allowed to specify the number of groups and the contents of grouped amino acids based on different considerations.
(B) Searching module
In this module, we adopted the Boyer Moore matching algorithm  to extract primary unique patterns to reduce the time complexity. Primary unique patterns are defined as the basic elements for merging operations. Because each set of grouped amino acids are represented by a unique symbol, the algorithm examines all candidate patterns and extracts the positional information of primary unique patterns by scanning all substituted sequences. If the clustering results are less than 20 groups from the previous module, the tolerant feature of uniqueness will be discriminated and fewer primary unique patterns can be extracted from the family set. In addition to adopting clustered results to describe the features of uniqueness, this module also provides a statistical analysis to show the level of determined characteristics for each extracted unique pattern. Users are able to assign the mismatch number, which represents the tolerant criteria for primary unique patterns and should be less than the length of a primary unique pattern. If there are N sequences in a family set Z, and Z I is represented as the I th sequence in Z. The length of a primary unique pattern is n and reserved m symbols for each pattern, i.e. allow (n-m) mismatches from matching processes, then the representative level of uniqueness can be defined as the following equation,
where we define the set of all primary unique descriptors with pattern length n in Z I by P n [Z I ,·]. Its jth primary unique descriptor is denoted as P n [Z I , j]. [Z I , j] represents a set of collection from the jth unique pattern, which allows (n-m) mismatched symbols for matching processes, and [Z I , j] is its kth tolerant pattern. The C(n, m) is the total number of possible combinatorial patterns set for reserved m symbols from a unique pattern with length n, and 〈[Z I , j]〉 ∊ [1, N] is the number of appearance of the kth tolerant unique pattern shown in family set Z. It is trivial that the value of 〈[Z I , j]〉 is greater than or equal to 1, since at least one sequence from set Z containing the primary unique pattern P n [Z I , j] must exist. If all the k tolerant unique patterns [Z I , j] can be matched in each sequence in Z, then 〈[Z I , j]〉 = N for all k, and it follows that R([Z I , j]) = 0%. This describes a special condition in which each possible tolerant pattern of the primary unique descriptor [Z I , j] can be discovered in all other sequences of Z. Therefore, its representative percentage of primary uniqueness will be decreased to 0%. On the other hand, if all of the combinatorial tolerant patterns from [Z i , j] cannot be found from other family sequences, then 〈[Z I , j]〉 = 1 for k ∊ [1, C(n, m)] and it follows that by R([Z I , j]) = 100%. This describes another extreme case in which the program provides all of the possible tolerant patterns for approximate matching, and the primary unique pattern of [Z I , j] still possesses 100% unique representative percentage. These calculated quantitative percentages represent the level of uniqueness and range from 0% to 100%. All of the primary unique patterns with different representative percentages will be sent to the next module to perform merge operations based on their neighbouring conditions and threshold settings for the representative levels of uniqueness.
(C) Merging module
The merging operation is proposed to enhance the discrepancies in a family set and emphasize the neighboring relationships instead of the traditional concatenation operation. Two matched primary unique patterns in sequence Z I can be merged if they possess overlapping symbols in Z I and both are satisfied the criteria of the minimum representative percentage. Given any two segments u 1 and u 2 with length n 1 and n 2 , the merged result is denoted by w and constructed by
where l is the number of overlapping symbols between u 1 and u 2 , and l is strictly greater than 0. If u 1 and u 2 do not contain any overlapping symbols between them, then the merging operation will not be performed, and both will retain their original relationship. It is obvious that if u 1 and u 2 are merged with respect to Z I , the length of the merged segment w is strictly less than the concatenated string uv. To reinforce the strength of uniqueness of the merged patterns, a strict merging operation is also provided to make sure each subsegment of the merged pattern still possess unique properties. Assume two primary unique patterns with same length n are merged. The strict merging requirements are satisfied only when the lengths of the overlapping symbols are equal to the primary pattern length minus one or the length of the merged segment is equal to n+1. Therefore, strictly merged unique patterns are constructed using fundamental unique descriptors and reserve the most unique characteristics in sequences. After obtaining all candidate merged segments, the system provides a trimming function which returns a substring from each merged segment with the symbols stripped off the beginning and the end. This function is achieved by evaluating the beginning and ending (n-1) symbols respectively from merged unique pattern set. The (n-1) symbols will be trimmed off when they are matched with another merged unique. Through these three designed modules, UPMs of each sequence from a protein family set will be allocated efficiently. After reinforced processing, these merged UPMs satisfy the tolerance and representative level of uniqueness criteria. The RMA is available free of charge for academic use [44, 45] and a supplementary document for guidelines and examples of the RMA system is also provided .
Analyses of protein sequences by RMA and localization of the identified UPMs
The protein sequences could be selected according to the following criteria: (1) If the protein of interest belongs to a protein family classified by the sequence similarity, each member of this protein family is selected from GenBank and saved as FASTA format. (2) If the protein of interest does not belong to any protein family, its sequence should be compared with other protein sequences in the database by BLASTp software or others and all the sequences with more than 20% identity should be selected and saved as FASTA format.
The sequences of protein families from different species for RMA analysis
Protein expression and generation of mAb
The recombinant clones were transformed into E. coli BL21 (DE3) and expressed in the presence of 0.5 mM IPTG. After homogenization, the RNase3 was purified by His-Bind® metal chelating chromatography under denaturing conditions according to the manufacturer's protocol (Novagen). The generation of mAb was carried out as described . Hybridomas producing anti-RNase3 mAb were obtained by indirect enzyme-linked immunosorbent assay screening with purified recombinant RNase3 as the antigen. The mAb was purified from the hybridoma supernatant using the Montage® Antibody Purification Kit (Millipore).
After electrophoresis, the proteins were transferred onto a polyvinylidene fluoride membrane (Pall). After blocking, hybridizing with the first Ab, washing, and probing with the secondary Ab, the target proteins were visualized using the SuperSignal West Pico (Pierce) or 3,3'-Diaminobezidine tetrahydrochloride (Amersco).
Analysis of protein sequences by PROTEAN
Each protein sequence was saved in PROTEIN format using EditSeq in DNAStar and executed by PROTEAN . The sequential sequences were selected as the antigenic index higher than 0.5 and compared with the UPMs identified by RMA. Between potential antigenic regions and UPMs, the coverage greater than 70% was labeled in Table 1 and 2.
Large-scale RMA analysis
To analyze the performance of RMA in large-scale, the input sequences were selected by the following steps. First, all information related to mAbs in website of Santa Cruz Biotechnology, Inc. was retrieved  and the mAbs containing descriptions about the exact recognition sites (epitopes) were selected. Then the corresponding antigen sequences belonging to human protein families were collected from GenBank and saved as FASTA format. Each family protein sequence set with pairwise identity higher than 20% was individually analyzed by RMA with default parameters as previously described.
green fluorescent protein
epidermal growth factor receptor
maltose binding protein, MMP, matrix metalloproteinase
multiple sequence alignment
reinforced merging algorithms
Sma-and-Mad related protein
transforming growth factor-β
unique peptide motif
The authors thank Drs. P.-C. Huang (Department of Biochemistry and Molecular Biology, The Johns Hopkins University, U. S. A.), C.-I Wu (Department of Ecology and Evolution, University of Chicago, U. S. A.), Y.-K. Lai, H.-Y. Chang, and W.-G. Chou (Department of Life Science, NTHU, Taiwan) for critical comments, as well as C. Teng, I. Lin and Y.-T. Lai for proofreading and maintenance of webpage. This work was supported by National Science Council, R. O. C. grants, NSC 94-2627-B-007-003 (M. D.-T. Chang) and NSC 94-2627-B-007-002 (C.-Y. Tang), as well as Veterans General Hospitals University System of Taiwan Joint Research Program, VGHUST 94-G2-02-1 (M. D.-T. Chang).
- Martinez HM: A flexible multiple sequence alignment program. Nucleic Acids Res 1988, 16: 1683–1691.PubMed CentralView ArticlePubMedGoogle Scholar
- Sobel E, Martinez HM: A multiple sequence alignment program. Nucleic Acids Res 1986, 14: 363–374.PubMed CentralView ArticlePubMedGoogle Scholar
- Chappey C, Danckaert A, Dessen P, Hazout S: MASH: an interactive program for multiple alignment and consensus sequence construction for biological sequences. Comput Appl Biosci 1991, 7: 195–202.PubMedGoogle Scholar
- Zhang J, Dyer KD, Rosenberg HF: RNase 8, a novel RNase A superfamily ribonuclease expressed uniquely in placenta. Nucleic Acids Res 2002, 30: 1169–1175. 10.1093/nar/30.5.1169PubMed CentralView ArticlePubMedGoogle Scholar
- Kennett RH: Hybridomas: a new dimension in biological analyses. In Vitro 1981, 17: 1036–1050.View ArticlePubMedGoogle Scholar
- Welling GW, Weijer WJ, van der Zee R, Welling-Wester S: Prediction of sequential antigenic regions in proteins. FEBS Lett 1985, 188: 215–218. 10.1016/0014-5793(85)80374-4View ArticlePubMedGoogle Scholar
- Lerner RA: Tapping the immunological repertoire to produce antibodies of predetermined specificity. Nature 1982, 299: 593–596. 10.1038/299592a0View ArticlePubMedGoogle Scholar
- Plasterer TN: PROTEAN. Protein sequence analysis and prediction. Mol Biotech 2000, 16: 117–126. 10.1385/MB:16:2:117View ArticleGoogle Scholar
- Boix E, Carreras E, Nikolovski Z, Cuchillo CM, Nogues MV: Identification and characterization of human eosinophil cationic protein by an epitope-specific antibody. J Leukoc Biol 2001, 69: 1027–1035.PubMedGoogle Scholar
- Leonidas DD, Shapiro R, Allen SC, Subbarao GV, Veluraja K, Acharya KR: Refined crystal structures of native human angiogenin and two active site variants: implications for the unique functional properties of an enzyme involved in neovascularisation during tumour growth. J Mol Biol 1999, 285: 1209–1233. 10.1006/jmbi.1998.2378View ArticlePubMedGoogle Scholar
- Ogiso H, Ishitani R, Nureki O, Fukai S, Yamanaka M, Kim JH, Saito K, Sakamoto A, Inoue M, Shirouzu M, Yokoyama S: Crystal structure of the complex of human epidermal growth factor and receptor extracellular domains. Cell 2002, 110: 775–787. 10.1016/S0092-8674(02)00963-7View ArticlePubMedGoogle Scholar
- Arumugam S, Van Doren SR: Global orientation of bound MMP-3 and N-TIMP-1 in solution via residual dipolar couplings. Biochemistry 2003, 42: 7950–7958. 10.1021/bi034545sView ArticlePubMedGoogle Scholar
- Hirose T, Patterson C, Pourmotabbed T, Mainardi CL, Hasty KA: Structure-function relationship of human neutrophil collagenase: identification of regions responsible for substrate specificity and general proteinase activity. Proc Natl Acad Sci U S A 1993, 90: 2569–2573.PubMed CentralView ArticlePubMedGoogle Scholar
- Arza B, De Maeyer M, Felez J, Collen D, Lijnen HR: Critical role of glutamic acid 202 in the enzymatic activity of stromelysin-1 (MMP-3). Eur J Biochem 2001, 268: 826–831. 10.1046/j.1432-1327.2001.01943.xView ArticlePubMedGoogle Scholar
- Wu JW, Krawitz AR, Chai J, Li W, Zhang F, Luo K, Shi Y: Structural mechanism of Smad4 recognition by the nuclear oncoprotein Ski: insights on Ski-mediated repression of TGF-beta signaling. Cell 2002, 111: 357–367. 10.1016/S0092-8674(02)01006-1View ArticlePubMedGoogle Scholar
- Shi Y, Hata A, Lo RS, Massague J, Pavletich NP: A structural basis for mutational inactivation of the tumour suppressor Smad4. Nature 1997, 388: 87–93. 10.1038/40431View ArticlePubMedGoogle Scholar
- Qin B, Lam SS, Lin K: Crystal structure of a transcriptionally active Smad4 fragment. Structure Fold Des 1999, 7: 1493–1503. 10.1016/S0969-2126(00)88340-9View ArticlePubMedGoogle Scholar
- Younus H, Owais M, Rao DN, Saleemuddin M: Stabilization of pancreatic ribonuclease A by immobilization on Sepharose-linked antibodies that recognize the labile region of the enzyme. Biochim Biophys Acta 2001, 1548: 114–120.View ArticlePubMedGoogle Scholar
- Chang G, Roth CB: Structure of MsbA from E. coli: a homolog of the multidrug resistance ATP binding cassette (ABC) transporters. Science 2001, 293: 1793–1800. 10.1126/science.293.5536.1793View ArticlePubMedGoogle Scholar
- Chang G: Structure of MsbA from Vibrio cholera: a multidrug resistance ABC transporter homolog in a closed conformation. J Mol Biol 2003, 330: 419–430. 10.1016/S0022-2836(03)00587-4View ArticlePubMedGoogle Scholar
- Acharya KR, Shapiro R, Allen SC, Riordan JF, Vallee BL: Crystal structure of human angiogenin reveals the structural basis for its functional divergence from ribonuclease. Proc Natl Acad Sci U S A 1994, 91: 2915–2919.PubMed CentralView ArticlePubMedGoogle Scholar
- Kumar K, Brady M, Shapiro R: Selective abolition of pancreatic RNase binding to its inhibitor protein. Proc Natl Acad Sci U S A 2004, 101: 53–58. 10.1073/pnas.0307268101PubMed CentralView ArticlePubMedGoogle Scholar
- Sorimachi K, Jacks AJ, Le Gal-Coeffet MF, Williamson G, Archer DB, Williamson MP: Solution structure of the granular starch binding domain of glucoamylase from Aspergillus niger by nuclear magnetic resonance spectroscopy. J Mol Biol 1996, 259: 970–987. 10.1006/jmbi.1996.0374View ArticlePubMedGoogle Scholar
- Korolkova YV, Tseng GN, Grishin EV: Unique interaction of scorpion toxins with the hERG channel. J Mol Recognit 2004, 17: 209–217. 10.1002/jmr.667View ArticlePubMedGoogle Scholar
- Kewley RJ, Whitelaw ML, Chapman-Smith A: The mammalian basic helix-loop-helix/PAS family of transcriptional regulators. Int J Biochem Cell Biol 2004, 36: 189–204. 10.1016/S1357-2725(03)00211-5View ArticlePubMedGoogle Scholar
- Stylos WA, Merryman CF, Maurer PH: Antigenicity of polypeptides (poly-alpha-amino acids). Distribution of sheep antibodies to polymers of alpha-L-amino acids of varying net electrical charge and lysozyme. Int Arch Allergy Appl Immunol 1970, 39: 381–390.View ArticlePubMedGoogle Scholar
- Maurer PH, Merryman CF, Stylos WA: Antigenicity of polypeptides (poly-alpha-amino acids). Distribution of sheep antibodies to dinitrinophenyl-(DNP) conjugates of polymers of alpha-L-amino acids of varying electrical charge. Int Arch Allergy Appl Immunol 1970, 39: 435–444.View ArticlePubMedGoogle Scholar
- Jasin HE, Glynn LE: The Antigenic Properties of Some Synthetic Poly-Iminoacids. Ii. The Antigenicity of Polypeptides Related to Collagen; Peptides Containing Hydroxyproline and Acetyl-Hydroxy-Proline. Immunology 1965, 32: 260–269.Google Scholar
- Cho HS, Mason K, Ramyar KX, Stanley AM, Gabelli SB, Denney DWJ, Leahy DJ: Structure of the extracellular region of HER2 alone and in complex with the Herceptin Fab. Nature 2003, 421: 756–760. 10.1038/nature01392View ArticlePubMedGoogle Scholar
- Harlow E, Lane D: Antibodies: a laboratory manual. New York, Cold Spring Harbor Laboratory; 1988:58.Google Scholar
- Bailey TL, Gribskov M: The megaprior heuristic for discovering protein sequence patterns. Proc Int Conf Intell Syst Mol Biol 1996, 4: 15–24.PubMedGoogle Scholar
- Bailey TL, Elkan C: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. AAAI; 1994:28–36.Google Scholar
- Bailey TL, Gribskov M: Combining evidence using p-values: application to sequence homology searches. Bioinformatics 1998, 14: 48–54. 10.1093/bioinformatics/14.1.48View ArticlePubMedGoogle Scholar
- Zhu H, Schein CH, Braun W: MASIA: recognition of common patterns and properties in multiple aligned protein sequences. Bioinformatics 2000, 16: 950–951. 10.1093/bioinformatics/16.10.950View ArticlePubMedGoogle Scholar
- Sigrist CJ, Cerutti L, Hulo N, Gattiker A, Falquet L, Pagni M, Bairoch A, Bucher P: PROSITE: a documented database using patterns and profiles as motif descriptors. Brief Bioinform 2002, 3: 265–274. 10.1093/bib/3.3.265View ArticlePubMedGoogle Scholar
- Ecale Zhou CL, Zemla AT, Roe D, Young M, Lam M, Schoeniger JS, Balhorn R: Computational approaches for identification of conserved/unique binding pockets in the A chain of ricin. Bioinformatics 2005, 21: 3089–3096. 10.1093/bioinformatics/bti498View ArticleGoogle Scholar
- Henikoff S, Henikoff JG: Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A 1992, 89: 10915–10919.PubMed CentralView ArticlePubMedGoogle Scholar
- Dayhoff MO, Schwartz RM, Orcutt BC: A model of evolutionary change in proteins. matrices for detecting distant relationships. In In atlas of protein sequence and structure. Volume vol. 5. Edited by: Dayhoff MO. Washington DC, National biomedical research foundation; 1978:345–358.Google Scholar
- Boyer RS, Moore JS: A fast string searching algorithm. Commun ACM 1977, 20: 762–772. 10.1145/359842.359859View ArticleGoogle Scholar
- RMA guidelines[http://bioinfo.life.nthu.edu.tw/rma/RMA-DOCS]
- Fang CY, Chang YS, Chow KP, Yu JS, Chang HY: Construction and characterization of monoclonal antibodies specific to Epstein-Barr virus latent membrane protein 1. J Immunol Methods 2004, 287: 21–30. 10.1016/j.jim.2004.01.021View ArticlePubMedGoogle Scholar
- Pous J, Mallorqui-Fernandez G, Peracaula R, Terzyan SS, Futami J, Tada H, Yamada H, Seno M, de Llorens R, Gomis-Ruth FX, Coll M: Three-dimensional structure of human RNase 1 delta N7 at 1.9 A resolution. Acta Crystallogr D Biol Crystallogr 2001, 57: 498–505. 10.1107/S0907444901001147View ArticlePubMedGoogle Scholar
- Swaminathan GJ, Holloway DE, Veluraja K, Acharya KR: Atomic resolution (0.98 A) structure of eosinophil-derived neurotoxin. Biochemistry 2002, 41: 3341–3352. 10.1021/bi015911fView ArticlePubMedGoogle Scholar
- Boix E, Leonidas DD, Nikolovski Z, Nogues MV, Cuchillo CM, Acharya KR: Crystal Structure of Eosinophil Cationic Protein at 2.4 A Resolution. Biochemistry 1999, 38: 16794–16801. 10.1021/bi9919145View ArticlePubMedGoogle Scholar
- Terzyan SS, Peracaula R, de Llorens R, Tsushima Y, Yamada H, Seno M, Gomis-Ruth FX, Coll M: The three-dimensional structure of human RNase 4, unliganded and complexed with d(Up), reveals the basis for its uridine selectivity. J Mol Biol 1999, 285: 205–214. 10.1006/jmbi.1998.2288View ArticlePubMedGoogle Scholar
- Cho HS, Leahy DJ: Structure of the extracellular region of HER3 reveals an interdomain tether. Science 2002, 297: 1330–1333. 10.1126/science.1074611View ArticlePubMedGoogle Scholar
- Wlodawer A, Sjolin L: Structure of ribonuclease A: results of joint neutron and X-ray refinement at 2.0-A resolution. Biochemistry 1983, 22: 2720–2728. 10.1021/bi00280a021View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.