A comparison of Pfam and MEROPS: Two databases, one comprehensive, and one specialised.
© Studholme et al; licensee BioMed Central Ltd. 2003
Received: 20 March 2003
Accepted: 9 May 2003
Published: 9 May 2003
We wished to compare two databases based on sequence similarity: one that aims to be comprehensive in its coverage of known sequences, and one that specialises in a relatively small subset of known sequences. One of the motivations behind this study was quality control. Pfam is a comprehensive collection of alignments and hidden Markov models representing families of proteins and domains. MEROPS is a catalogue and classification of enzymes with proteolytic activity (peptidases or proteases). These secondary databases are used by researchers worldwide, yet their contents are not peer reviewed. Therefore, we hoped that a systematic comparison of the contents of Pfam and MEROPS would highlight missing members and false-positives leading to improvements in quality of both databases. An additional reason for carrying out this study was to explore the extent of consensus in the definition of a protein family.
About half (89 out of 174) of the peptidase families in MEROPS overlapped single Pfam families. A further 32 MEROPS families overlapped multiple Pfam families. Where possible, new Pfam families were built to represent most of the MEROPS families that did not overlap Pfam. When comparing the numbers of sequences found in the overlap between a MEROPS family and its corresponding Pfam family, in most cases the overlap was substantial (52 pairs of MEROPS and Pfam families had an intersection size of greater than 75% of the union) but there were some differences in the sets of sequences included in the MEROPS families versus the overlapping Pfam families.
A number of the discrepancies between MEROPS families and their corresponding Pfam families arose from differences in the aims and philosophies of the two databases. Examination of some of the discrepancies highlighted additional members of families, which have subsequently been added in both Pfam and MEROPS. This has led to improvements in the quality of both databases. Overall there was a great deal of consensus between the databases in definitions of a protein family.
As an ever-growing number of genome sequencing projects reach completion, the numbers of protein sequences in primary sequence databases such as SWISSPROT and GenBank grow at an increasing rate. However, many of the newly deposited sequences are clearly homologous to previously known sequences, resulting in the need for strategies to classify sequences into clusters or families related by sequence similarity. This need has led to the proliferation of so-called 'secondary' databases, derived from the primary sequence databases but with value-added curation. These databases are invaluable for predicting the function of new sequences based on homology to previously characterised proteins.
Secondary databases such as InterPro and Pfam aim to be comprehensive in their coverage of protein families and classify proteins on the basis of sequence relationships. Similarly, databases such as SCOP attempt to provide a comprehensive classification of all known three-dimensional structures. However, there are also several databases that specialise in a particular subset of protein families, for example GPCRDB and CAZy. The MEROPS database provides a catalogue and a structure-based classification of peptidases (i.e. proteolytic enzymes or proteases). Peptidases are a large group of proteins, representing around 2% of all gene products, and are of particular importance in medicine and biotechnology.
Previously, Pfam was compared to SCOP. The aim of that study was to investigate the similarities and differences between a protein family database based on structural similarity and another based on sequence similarity. In the present study, we wished to compare two databases based on sequence similarity, one of which (Pfam) aims to be comprehensive in its coverage of known sequences, and the other (MEROPS) specialises in a relatively small subset of known sequences.
The two databases use different methods to identify family members. MEROPS selects a type example, identifies the peptidase unit within it, and then makes pairwise matches using any number of transitive relationships. In contrast, Pfam stores a hidden Markov model (HMM) profile constructed from a seed sequence alignment. Using the HMMER computer package, Pfam searches for matches to the HMMs. The threshold values used in the HMMER searches are chosen manually by the Pfam curators.
One of the motivations behind this study was quality control. These secondary databases are used by thousands of researchers worldwide and influence their work, yet the database contents are not peer reviewed. Therefore, we hoped that a systematic comparison of the contents of Pfam and MEROPS would highlight missing members and false positives, leading to improvements in the quality of both databases. An additional reason for carrying out this study was to explore the extent of consensus in the definition of a protein family. The InterPro project is helpful in this respect because it simultaneously displays content from several different databases. However, InterPro does not contain data from any of the specialised databases such as MEROPS.
Results and Discussion
We analysed the contents of Pfam release 7.8 (December 2002) and MEROPS release 6.1 (January 2003). The secondary databases Pfam and MEROPS attempt to classify protein sequences from the primary databases into families. Therefore they have to draw from an underlying primary database of protein sequences. The underlying primary sequence database for MEROPS6.1 was the NCBI non-redundant database (NR) as released on 1st November 2002. Pfam7.8 used an underlying sequence database, pfamseq7, which consisted of SwissProt 40 plus trEMBL 18 (released in September 2001). Before we could embark on a meaningful comparison between the two protein family databases, we had to determine the set of common sequences shared by both underlying primary databases. Therefore we attempted to map each sequence in pfamseq7 to a sequence in NR, using CRC64 checksums. We found that there were 571,017 (537,646 unique) sequences common to both pfamseq7 and NR. Only these sequences were included in our subsequent analyses. We excluded 13,923 sequences from pfamseq7 that were absent from NR and excluded a further 649,578 in NR that were absent from pfamseq7.
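The checksum-based mapping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the sequences are assumed to be held in dictionaries keyed by accession, the function names are hypothetical, the accessions are toy data, and `zlib.crc32` stands in for CRC64 because the Python standard library has no CRC64 routine. Because any checksum can collide, the sketch confirms each match by comparing the sequences themselves.

```python
import zlib
from collections import defaultdict


def checksum(seq: str) -> int:
    # Stand-in for CRC64 (assumption): CRC32 of the upper-cased sequence.
    return zlib.crc32(seq.upper().encode("ascii"))


def map_by_checksum(pfamseq: dict, nr: dict) -> dict:
    """Map each pfamseq accession to the NR accessions with an identical sequence."""
    # Index NR once by checksum so each lookup is cheap.
    index = defaultdict(list)
    for acc, seq in nr.items():
        index[checksum(seq)].append(acc)

    mapping = {}
    for acc, seq in pfamseq.items():
        # Compare the actual sequences to guard against checksum collisions.
        hits = [n for n in index.get(checksum(seq), [])
                if nr[n].upper() == seq.upper()]
        if hits:
            mapping[acc] = hits
    return mapping


# Toy data: two pfamseq accessions carry the same sequence.
pfamseq = {"P1": "MKTAYIAK", "P2": "MSSLLQ", "P3": "MKTAYIAK"}
nr = {"gi|1": "MKTAYIAK", "gi|2": "GGGG"}

common = map_by_checksum(pfamseq, nr)
unique_common = {pfamseq[acc] for acc in common}
```

In this toy input, two accessions are common to both collections but they share a single unique sequence, which is the same distinction the text draws between common sequences and unique common sequences.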
Overall correspondence between MEROPS and Pfam
Several MEROPS families had only a fairly small degree of overlap with a Pfam family (see Table 2), so we investigated these relationships further and made improvements to the MEROPS and/or Pfam families where appropriate. For example, family S41 shared six of its 49 unique sequences with PF02692 (Interphotoreceptor retinoid-binding protein), which contained a further 125 unique sequences. In all six shared sequences, Pfam reported a short fragment match to PF02692. On closer inspection, it was clear that these matches to PF02692 represented false positives. This conclusion was further supported by the fact that the fragment matches overlapped matches to the Smart domain TSPc (tail specific protease). Therefore we raised the cut-off values in Pfam for PF02692 such that the false positives would be excluded. We also extended the seed alignment for another Pfam family, PF03572, such that it will now correspond closely with MEROPS family S41.
MEROPS family A2 shared 648 unique sequences with PF00077 (Retroviral aspartyl protease), and it is clear that these two families are attempting to represent the same entity. However, there were a further 8008 sequences included in PF00077 but apparently absent from MEROPS (Table 1). These 8008 accession numbers have not been listed in MEROPS because they represent only minor variations (i.e. varying by only one or a few residues) of sequences that are included in MEROPS family A2. This discrepancy highlights an important difference between the two databases in what they 'regard' as a distinct protein. Whereas Pfam relies entirely on SWISSPROT/trEMBL and treats each sequence accession as a distinct sequence entity, in MEROPS the curators usually use the following criteria: splice variants and sequences with greater than 95% identity are not considered to be separate proteins unless there is evidence that they are encoded by different genes (or come from different species). For RNA virus sequences, a lower threshold identity is often used.
We attempted to build HMMs to represent the 53 MEROPS families that did not overlap a Pfam family. We were able to build new HMMs to represent most (36) of these families in Pfam (Table 3 - See Additional file: 3). However, in a few cases it was not possible to build a new Pfam family because the number of member sequences was very small (e.g. A18) or because sequence similarity to existing Pfam families (e.g. A11) led to violation of Pfam's rule against allowing overlaps between families.
Pfam contained several families annotated as having peptidase activity that did not have corresponding families in MEROPS. Most of these, including PF02338, PF04096, PF03926, PF04228 and PF04298 were deliberately excluded from MEROPS because there is insufficient experimental evidence for peptidase activity in these putative proteases. Another example was PF00905 (Penicillin binding protein transpeptidase domain). This was excluded because MEROPS does not attempt to comprehensively include transpeptidases. These differences reflect the different aims of the two databases: whereas Pfam attempts to comprehensively include all closely-related sequences within its families, in MEROPS the emphasis is much more upon biological significance of the member sequences.
How well do MEROPS families match Pfam families?
Using set theory as our approach, we quantified the closeness of a match between each MEROPS family and its corresponding Pfam family. For each MEROPS family and its closest matching Pfam family, we compared the number of members in the intersection (i.e. those sequences belonging to both the Pfam family and to the MEROPS family) against the number of members of the union (i.e. all those sequences belonging to either family) and expressed this ratio as a percentage (Tables 1 and 2). On average, the number of sequences in the intersection was 70% of the number of sequences in the union (Figure 2). The distribution was clearly skewed towards larger intersections, i.e. good matches between Pfam and MEROPS; out of 121 MEROPS families that intersected a Pfam family, 52 had an intersection size of greater than 75% of the union.
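The intersection-over-union measure described above (known elsewhere as the Jaccard index) can be sketched in a few lines. This is an illustrative sketch only: the function name and the toy member identifiers are hypothetical, and returning 0 for two empty families is an assumption rather than something the text specifies.

```python
def overlap_percentage(merops_members: set, pfam_members: set) -> float:
    """Size of the intersection as a percentage of the size of the union."""
    union = merops_members | pfam_members
    if not union:
        # Assumption: two empty families are scored as 0% rather than undefined.
        return 0.0
    return 100.0 * len(merops_members & pfam_members) / len(union)


# Toy families: two shared sequences out of four distinct sequences overall.
merops_family = {"seqA", "seqB", "seqC"}
pfam_family = {"seqB", "seqC", "seqD"}

score = overlap_percentage(merops_family, pfam_family)
```

A score of 100 would mean the two families contain exactly the same sequences, while a low score flags a pair worth inspecting by hand.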
Family and sub-family levels in MEROPS
As illustrated in Figure 1 and listed in Table 2, thirty-two MEROPS families overlapped more than one Pfam family. MEROPS uses a hierarchical classification system. One aspect of this hierarchy is that families are grouped together into structurally related 'clans'. Also some families are further sub-divided into subfamilies. Among the families divided into subfamilies are A22, C1, C2, M10, M12, M14, M15, M28, M50, S1, S8 and S9, each of which overlaps multiple Pfam families. Therefore we investigated whether in these cases the Pfam families more closely corresponded to subfamilies or to families in MEROPS.
Families with multiple overlap relationships
Several MEROPS families overlapped two or more Pfam families. With the exceptions of A22, C1 and M3, these multiple relationships could not be explained by a closer match at the subfamily level, so we investigated them further. For example, MEROPS family A2 overlaps five Pfam families (Table 2). On browsing the Pfam entries, it is clear that PF00077 (retroviral aspartyl protease) attempts to represent a domain that corresponds closely to A2, whilst the other four Pfam families represent other domains that are often found along with the peptidase unit in retrovirus polyproteins. Overlaps were reported due to discrepancies between the domain boundaries in Pfam and the peptidase units in MEROPS. For example, MEROPS6.1 reported the peptidase unit in O92805 to be between residues 30 and 121. According to Pfam7.8, however, the match to PF00077 was at residues 583 to 699. Furthermore, residues 2 to 87 matched PF02813 (Retroviral M domain) so that A2 and PF02813 were reported to overlap. On inspection of the sequence it was clear that the peptidase unit had been wrongly assigned in MEROPS, leading to erroneous reporting of the overlap.
In many cases, the discrepancies between Pfam domain boundaries and MEROPS peptidase units were not erroneous, but reflected differences in the design of the databases. Where there are several fragment sequences from viral polyproteins, MEROPS records the coordinates of the peptidase units with respect to the parent sequence rather than the fragment sequences. In contrast, Pfam maps domain boundary positions onto the individual fragment sequences, thus leading to discrepancies between the two databases with respect to domain boundaries. This scenario explains most of the multiple overlaps between families of viral proteases.
Nevertheless, we have now begun to introduce a process of checking and correcting the peptidase unit assignments in MEROPS. Similar situations explain most of the remaining cases where a MEROPS family has multiple overlaps. The exception is S49, which overlaps PF01343 and PF01957, the former of which represents MEROPS family S49 (formerly U7), and the latter includes a group of poorly characterised bacterial proteins of unknown function. Judging by alignments of these sequences in MEROPS, it appears that there is a close evolutionary relationship between these two Pfam families.
Five Pfam families (PF00004, PF02225, PF00851, PF00863 and PF00680) overlap multiple MEROPS families. In all five cases the reported overlaps could be explained by discrepancies in assignment of the peptidase unit in MEROPS in a similar manner to that involving MEROPS families with multiple overlaps.
Pfam family members absent from corresponding MEROPS family
As is clear from Tables 1 and 2, many Pfam families contained additional member sequences that were not found in the corresponding MEROPS families (Table 4 - See Additional file: 4). Most of this discrepancy could be explained by MEROPS and Pfam having different criteria for what they consider to be a different sequence, as discussed above. After eliminating discrepancies due to this difference, the majority of remaining sequences were clearly confirmed as members of existing peptidase families (on the basis of FastA and Blast searches) and have been subsequently added to MEROPS for inclusion in future releases. Most of these sequences would probably have been picked up for inclusion in MEROPS by the curators' routine similarity searches. However, some of the sequences were previously not detected by MEROPS because they had a transitive relationship to the family type examples (see Methods). In other words there was no statistically significant direct relationship to the family type example identifiable by Blast or FastA. The sequences were indirectly linked to the type example via similarity to an intermediate sequence. About 180 sequences were added to MEROPS as a direct result of this study. However, a few sequences could not be identified as homologues by pairwise similarity searches against the MEROPS library of peptidase units, even by transitive links.
Sequences Q66541, Q66620, and Q20521 gave only partial or fragment matches to Pfam peptidase families. On closer inspection it is apparent that these sequences are fragments of multi-domain proteins lacking regions of the sequence defined as the peptidase unit in MEROPS. For this reason they were excluded from MEROPS.
Sequences O44472, O45151, O45157, P91466, P91467, P91515, P91519, Q09539, Q09393 and Q9N566 from Caenorhabditis elegans, and Q9VN01 from Drosophila belonged to PF01431 (Peptidase_M13) and contain the characteristic HEXXH zinc-binding motif. Although PF01431 significantly overlapped MEROPS family M13, MEROPS did not include these sequences in family M13 since no similarity could be detected using FastA or Blast searches. It appears that the HMM representing Pfam family PF01431 is more sensitive and able to find additional homologues not detectable by Blast and FastA searches. Q9I304 belonged to Pfam family PF00227 (Proteasome), yet no similarity could be found between this sequence and members of the overlapping MEROPS family T1. This may be another example of a case where Pfam's HMM method for finding family members is more sensitive than MEROPS' Blast and FastA-based method.
Sequence P71878 shows a partial or fragment match to PF00814 (Peptidase_M22). Blast searches revealed that this Mycobacterium tuberculosis protein is related to 3-ketoacyl-CoA thiolase and acetyl-CoA C-acyltransferase, but not to any peptidases. It is possible that the match to PF00814 was a false positive in Pfam.
MEROPS family members absent from their corresponding Pfam families
There were 214 unique sequences that were classified into families in MEROPS but were absent from the corresponding Pfam families (Table 5 - See Additional file: 5). Just over half of these were sequences of protein fragments where the region or domain containing peptidase activity was missing. These fragments were treated differently in MEROPS as compared to Pfam. Whereas Pfam recognises only those sequence features that are actually present in the sequence, MEROPS assigns the fragments to families according to the properties of the complete parent sequence. A further 23 sequences were found to have been erroneously classified in MEROPS, and have subsequently been removed or moved to the correct family. Several SwissProt/trEMBL accessions had had their sequences updated recently, so different versions of the sequences were found in Pfam versus MEROPS; this accounted for a further nine of the discrepancies.
Aside from the fragment sequences and trivial errors, 72 sequences were included in MEROPS but not in the corresponding Pfam families. In a few cases, such as Q9A9N9 and Q9PFX5, although no statistically significant similarity could be found between these sequences and the rest of the family, the MEROPS curators decided that these sequences should be included on the basis of expert knowledge.
In some cases, MEROPS identified statistically significant sequence similarities that Pfam had failed to detect. For example, four sequences (Q9A748, Q9RDK4, Q9PAC4, and Q9KM08) are included in MEROPS family M48 but were not included in PF01435. Inspection of the sequence alignment for M48 in MEROPS confirmed that these sequences were bona fide members of the family. As a result of this discrepancy we expanded the alignment for family PF01435 in Pfam, used it to rebuild the HMM, and searched for new members of the family. The new HMM successfully identified Q9A748, Q9RDK4, Q9PAC4, and Q9KM08 as members of the expanded PF01435.
Family sizes and search sensitivity
Although most MEROPS families substantially overlapped Pfam families, there were some differences between the sets of MEROPS family members and the intersecting Pfam family members. These discrepancies have been examined in some detail in the previous paragraphs. One reason for there being an imperfect match between the Pfam family and the MEROPS family could be differences in sensitivity of family member-detection, in some cases at least due to the differing methods used to curate the two databases. This might be reflected in the relative sizes of the families between them. Therefore we compared the sizes (i.e. numbers of members) of each MEROPS family against each overlapping Pfam family. We found that in 19 cases, the MEROPS family was the same size as the Pfam family. In 95 cases, the Pfam family was larger than the MEROPS family. In the remaining 55 cases, the MEROPS family was larger than the Pfam family. In 34 cases, the MEROPS family was a proper subset of the Pfam family, and in 8 cases the reverse was true. This does not reveal any significant bias in the relative sizes of Pfam and MEROPS families. In other words there is no evidence that the HMM-based method is much more sensitive than Blast and FastA pairwise similarity search methods for detecting family members, at least in the context of Pfam and MEROPS.
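The two tallies reported above (relative size, and proper-subset relationships) can be computed per family pair with ordinary set operations. The sketch below is illustrative only; the function name and the category labels are hypothetical.

```python
def compare_families(merops: set, pfam: set) -> tuple:
    """Return (size relation, subset relation) for one MEROPS/Pfam family pair."""
    # Relative size: compare the numbers of member sequences.
    if len(merops) == len(pfam):
        size = "equal"
    elif len(pfam) > len(merops):
        size = "pfam_larger"
    else:
        size = "merops_larger"

    # Proper-subset test: '<' on Python sets means "is a proper subset of".
    if merops < pfam:
        subset = "merops_in_pfam"
    elif pfam < merops:
        subset = "pfam_in_merops"
    else:
        subset = "neither"
    return size, subset


nested = compare_families({"a"}, {"a", "b"})
partial = compare_families({"a", "b"}, {"b", "c"})
```

The second toy pair shows why the two tallies are reported separately: families of equal size can still differ in membership, so neither is a subset of the other.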
The fact that MEROPS and Pfam are similarly successful at identifying family members is surprising at first sight, given that MEROPS is based on pairwise similarity searches whilst Pfam uses HMMs. There are two major factors that contribute to high sensitivity in detecting MEROPS family members, beyond that which might be expected from a simple pairwise search procedure. Firstly, a candidate sequence is searched against every existing peptidase sequence in the MEROPS database, not just the type examples. This enables identification of family members that are outliers in sequence space. Thus families can be iteratively expanded to include sequences that are only transitively linked to the homologous type examples via one or more intermediate members. Secondly, catalytic residues are often highly conserved within families of peptidases. This frequently helps the curators to confidently make a judgement about whether or not a particular sequence belongs to a given family when the degree of sequence similarity is relatively low. It should be noted that there is a significant amount of human intervention in the curation of MEROPS families. This certainly improves the quality and coverage over what could be achieved by completely relying on automated similarity searches.
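The iterative expansion through transitive links described above amounts to collecting everything reachable from the type example in a graph of significant pairwise hits. The following is a minimal sketch of that idea, not the MEROPS procedure itself: the function name is hypothetical, the hit table is toy data, and the significant hits are assumed to have been computed already (for example by Blast or FastA). It also leaves out the manual curation that the text emphasises.

```python
from collections import deque


def expand_family(type_example: str, significant_hits: dict) -> set:
    """Collect every sequence linked to the type example by a chain of significant hits."""
    family = {type_example}
    queue = deque([type_example])
    while queue:
        current = queue.popleft()
        for hit in significant_hits.get(current, ()):
            if hit not in family:
                # A new member becomes a query in turn, which is what
                # lets outliers join via intermediate sequences.
                family.add(hit)
                queue.append(hit)
    return family


# Toy hit table: "s2" matches "s1" but has no direct match to the type example.
hits = {
    "type": ["s1"],
    "s1": ["type", "s2"],
    "s2": ["s1"],
    "other": ["s9"],
}

family = expand_family("type", hits)
```

Here "s2" is included only through the intermediate "s1", while "other" and "s9" stay outside the family because no chain of hits connects them to the type example.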
This study revealed that Pfam and MEROPS are largely consistent with each other in terms of their classification of proteins into families. Nevertheless, MEROPS also contains additional features and data not present in Pfam. These features include a facility for BLAST searching against the peptidase sequence database, systematic data on active sites and substrates of peptidases and inhibitors, and a very comprehensive literature database. Perhaps the most important difference in this context is that MEROPS uses a hierarchical classification whilst Pfam uses a flat classification. To implement all of these features in Pfam would not be feasible given that Pfam aims to be comprehensive in its coverage of proteins. The fact that MEROPS is currently accessed by about 10,000 academic users per month and has previously sold several commercial licenses confirms the value of MEROPS to the scientific community as more than merely a subset of any more general database.
Since there is no peer review process for assessing the contents of the protein family databases, we carried out a systematic comparison of the contents of two widely used protein family databases with the intention of checking and improving the quality of data in both. As a result, about 35 new families have been added to Pfam. The number of members (i.e. sequence coverage) has been increased for several families by identifying false negatives. Furthermore, accuracy has been improved as a result of identifying false positives highlighted by this comparison.
Notwithstanding the differences identified between the contents of the two databases, overall there was a high degree of consensus between the two databases, despite their being independently curated using different methodologies and with different objectives. In particular the families defined in Pfam corresponded closely to the family level in the MEROPS hierarchy in most cases. This suggests that the families are accurately curated and that the lack of peer-review has not led to gross errors in family assignments. These databases also receive feedback from users with suggestions and improvements, which help to keep the quality of data high.
The methods developed here for systematically comparing the contents of the two databases will be used in future as routine quality control procedures in production of Pfam and MEROPS to highlight errors and to help refine family boundaries. Thus we hope that close cooperation yet independence will lead to continuing benefits to both databases.
Blast and FastA searches for detection of MEROPS family members
Pairwise similarity searches were carried out using the routine MEROPS procedures. Data collection for the MEROPS database is done as follows. The best-characterised member in each peptidase subfamily is designated as the "type example", and the peptidase unit (that part of the sequence that bears the residues important for proteolytic activity, usually corresponding to one or two consecutive structural domains) of each type example is used as the query sequence in a BlastP search of the NR database from NCBI. In order to minimise redundancy, and to attempt to find known orthologues, each significant hit (e <= 0.001) from each BlastP search is used as a query sequence in a FastA search against the MEROPS collection of peptidase unit sequences. A sequence is appended to the MEROPS collection if it matches all the following criteria: (1) it is a significant hit from the FastA analysis (e <= 0.001), (2) it is less than 95% identical to an existing sequence in the collection, unless it is from a different species (but not a subspecies, strain or isolate) or is known to be the product of a different gene, and (3) it is not derived from an mRNA splice variant or alternative initiation. For any sequence failing to meet these criteria, only the database cross-references are added to the MEROPS database.
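The three inclusion criteria listed above can be expressed as a single predicate. This is a hedged sketch rather than MEROPS code: the `Candidate` record, its field names and the function name are all hypothetical, and the values are assumed to come from an earlier FastA search plus curator annotation.

```python
from dataclasses import dataclass

E_CUTOFF = 0.001         # significance threshold quoted in the text
IDENTITY_CUTOFF = 95.0   # percent identity threshold quoted in the text


@dataclass
class Candidate:
    evalue: float              # FastA e-value against the peptidase unit collection
    max_identity: float        # highest % identity to a sequence already in the collection
    different_species: bool    # from a different species (not a subspecies, strain or isolate)
    different_gene: bool       # known to be the product of a different gene
    splice_or_alt_init: bool   # derived from a splice variant or alternative initiation


def should_append(c: Candidate) -> bool:
    """Apply criteria (1) to (3); failing sequences get cross-references only."""
    significant = c.evalue <= E_CUTOFF
    distinct = (c.max_identity < IDENTITY_CUTOFF
                or c.different_species
                or c.different_gene)
    return significant and distinct and not c.splice_or_alt_init


novel = Candidate(1e-20, 60.0, False, False, False)
near_duplicate = Candidate(1e-50, 99.0, False, False, False)
orthologue = Candidate(1e-50, 99.0, True, False, False)
```

The near-duplicate is rejected because it is more than 95% identical to an existing entry from the same species and gene, whereas the equally similar orthologue is accepted. This is the behaviour that explains why many Pfam accessions have no separate entry in MEROPS.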
Comparison of Pfam versus MEROPS
We considered a MEROPS family and a Pfam family to overlap if two criteria were met: (1) the intersection between the MEROPS family set and the Pfam family set must contain at least one member sequence, and (2) for at least one of the member sequences in the intersection, the matches to the MEROPS and the Pfam families must be co-linear. Co-linearity is defined as follows. When Pfam finds that a protein sequence matches a given HMM, it reports which region of the sequence contains that match. For example on the 309-residue long sequence of Bacillus subtilis sporulation σE factor processing peptidase SP2G_BACSU (P13801), Pfam identifies a match to family PF03419 covering residues 1 to 300. MEROPS identifies residues 148 to 309 to be the peptidase unit (family U4). Since the 1–300 and 148–309 regions overlap by more than 50% of each of their respective lengths, we consider these matches to be co-linear.
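The co-linearity test can be made concrete with a short worked example. The sketch below assumes regions are given as inclusive (start, end) residue coordinates; the function names and the `min_fraction` parameter are hypothetical, and the coordinates are those quoted in the text.

```python
def overlap_length(a: tuple, b: tuple) -> int:
    """Number of residues shared by two inclusive (start, end) regions."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def is_colinear(a: tuple, b: tuple, min_fraction: float = 0.5) -> bool:
    """True if the shared region exceeds min_fraction of the length of each region."""
    shared = overlap_length(a, b)
    len_a = a[1] - a[0] + 1
    len_b = b[1] - b[0] + 1
    return shared > min_fraction * len_a and shared > min_fraction * len_b


# SP2G_BACSU (P13801): Pfam PF03419 match versus MEROPS U4 peptidase unit.
pfam_match = (1, 300)
merops_unit = (148, 309)

shared = overlap_length(pfam_match, merops_unit)   # residues 148 to 300
colinear = is_colinear(pfam_match, merops_unit)
```

For P13801 the shared region is 153 residues, which is 51% of the 300-residue Pfam match and about 94% of the 162-residue peptidase unit, so both conditions hold. The same function reproduces the O92805 case discussed earlier: the MEROPS unit at residues 30 to 121 is not co-linear with the PF00077 match at 583 to 699, but it is co-linear with the PF02813 match at 2 to 87.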
Building Pfam families
HMMs representing Pfam families were built using the standard Pfam procedures. Pfam stores a hidden Markov model (HMM) profile constructed from a seed sequence alignment. Using the HMMER computer package, Pfam searches for matches to the HMMs. The threshold values used in the HMMER searches are chosen manually by the Pfam curators.
HMM: Hidden Markov Model.
NR: The NCBI non-redundant sequence database
NCBI: National Centre for Biotechnology Information
AB is funded by the Wellcome Trust. AJB, NDR and DJS are funded by the Medical Research Council, UK.
- Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 2003, 31: 365–370. 10.1093/nar/gkg095
- Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL: GenBank. Nucleic Acids Res 2003, 31: 23–27. 10.1093/nar/gkg057
- Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Barrell D, Bateman A, Binns D, Biswas M, Bradley P, Bork P, Bucher P, Copley RR, Courcelle E, Das U, Durbin R, Falquet L, Fleischmann W, Griffiths-Jones S, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lopez R, Letunic I, Lonsdale D, Silventoinen V, Orchard SE, Pagni M, Peyruc D, Ponting CP, Selengut JD, Servant F, Sigrist CJ, Vaughan R, Zdobnov EM: The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res 2003, 31: 315–318. 10.1093/nar/gkg046
- Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL, Marshall M, Sonnhammer EL: The Pfam protein families database. Nucleic Acids Res 2002, 30: 276–280. 10.1093/nar/30.1.276
- Lo Conte L, Brenner SE, Hubbard TJ, Chothia C, Murzin AG: SCOP database in 2002: refinements accommodate structural genomics. Nucleic Acids Res 2002, 30: 264–267. 10.1093/nar/30.1.264
- Horn F, Bettler E, Oliveira L, Campagne F, Cohen FE, Vriend G: GPCRDB information system for G protein-coupled receptors. Nucleic Acids Res 2003, 31: 294–297. 10.1093/nar/gkg103
- Coutinho PM, Henrissat B: Carbohydrate-active enzymes: an integrated database approach. In Recent Advances in Carbohydrate Bioengineering (Edited by: Gilbert HJ, Davies G, Henrissat B, Svensson B). Cambridge: The Royal Society of Chemistry 1999, 3–12.
- Rawlings ND, O'Brien E, Barrett AJ: MEROPS: the protease database. Nucleic Acids Res 2002, 30: 343–346. 10.1093/nar/30.1.343
- Rawlings ND, Barrett AJ: MEROPS: the peptidase database. Nucleic Acids Res 1999, 27: 325–331. 10.1093/nar/27.1.325
- Elofsson A, Sonnhammer EL: A comparison of sequence and structure protein domain families as a basis for structural genomics. Bioinformatics 1999, 15: 480–500. 10.1093/bioinformatics/15.6.480
- Eddy SR: Profile hidden Markov models. Bioinformatics 1998, 14: 755–763. 10.1093/bioinformatics/14.9.755
- Press WH, Teukolsky SA, Vetterling WT, Flannery BP: Numerical recipes in C. 2nd edition. Cambridge: Cambridge University Press 1992.
- Letunic I, Goodstadt L, Dickens NJ, Doerks T, Schultz J, Mott R, Ciccarelli F, Copley RR, Ponting CP, Bork P: Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res 2002, 30: 242–244. 10.1093/nar/30.1.242
- Sonnhammer EL, Eddy SR, Durbin R: Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 1997, 28: 405–420. 10.1002/(SICI)1097-0134(199707)28:3<405::AID-PROT10>3.0.CO;2-L
- Turner AJ, Isaac RE, Coates D: The neprilysin (NEP) family of zinc metalloendopeptidases: genomics and function. Bioessays 2001, 23: 261–269. 10.1002/1521-1878(200103)23:3<261::AID-BIES1036>3.0.CO;2-K
- Pfam Home Page [http://www.sanger.ac.uk/Software/Pfam/]
- MEROPS: the Protease Database [http://merops.sanger.ac.uk]
- Wall L, Christiansen T, Orwant J: Programming Perl. 3rd edition. Sebastopol: O'Reilly & Associates 2000.
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.