Re-searcher: a system for recurrent detection of homologous protein sequences
© Repšys et al; licensee BioMed Central Ltd. 2008
Received: 21 December 2007
Accepted: 27 June 2008
Published: 27 June 2008
Sequence searches are routinely employed to detect and annotate related proteins. However, the rapid growth of databases necessitates frequent repetition of sequence searches and subsequent analysis of the results. Although several automatic systems are available for executing periodic sequence searches and reporting results, they all suffer from a lack of sensitivity, a restrictive choice of databases or limited flexibility in setting up search strategies. Here, a new sequence search and reporting software package designed to address these shortcomings is described.
Re-searcher is an open-source, highly configurable system for recurrent detection and reporting of new homologs for the sequence of interest in specified protein sequence databases. Searches are performed using PSI-BLAST at desired time intervals against either NCBI or local databases. In addition to searches against individual databases, the system can perform "PDB-BLAST"-like combined searches, in which the PSI-BLAST profile generated during a search against the first database is used to search the second database. The system supports multiple users, enabling each to separately keep track of multiple queries and query-specific results.
Re-searcher features a large number of options enabling automatic periodic detection of both close and distant homologs. At the same time it has a simple and intuitive interface, making the analysis of results even for a large number of queries a straightforward task.
Protein sequence database searches are routinely employed to detect homologs of the sequence used as a query. However, protein sequence databases are currently growing exponentially, necessitating frequent repetition of searches to find out whether new homologous sequences have been added. The analysis of results obtained during such repeated searches may also be tedious and time-consuming. The task of manually keeping up with changes in databases becomes unmanageable if one is interested in finding new homologs not for a single sequence, but for a few or a few dozen sequences. To help cope with the periodic detection of new homologs, a number of automatic procedures have been developed, including Swiss-Shop, DBWatcher, BLAST Search Updater, ReHAB and DbW. Most of them use BLAST, a popular sequence search engine. While BLAST is good at detecting closely related sequences, distant relatives often escape detection. Some recent systems for periodic searches use more powerful homology detection tools. For example, ReHAB uses PSI-BLAST and DbW utilizes a collection of several methods. ReHAB is designed to handle a large number of query sequences, while DbW attempts to include only functionally related new sequences. Both are very efficient in performing their tasks, but their common limitation is that most search parameters are predefined, leaving users with a "take it or leave it" choice.
Re-searcher is a new software package designed to overcome these limitations and provide the user with a highly configurable environment for performing recurrent sequence searches. Because of the flexibility in setting up the search strategy and specific parameters, Re-searcher can be used to find both closely related family-specific homologs and very distantly related matches.
Overview of the system
To detect new homologous protein sequences, Re-searcher uses PSI-BLAST as the search engine. However, if only BLAST functionality is desired, Re-searcher can be configured to run a single PSI-BLAST iteration. Searches can be performed at specified time intervals against either NCBI protein databases or locally installed custom sequence databases. The user can individually configure both the search parameters and the search periodicity for each query. Once a query is entered into the system, Re-searcher automatically performs the sequence search with the specified parameters at every query-specific time interval. During every search, all detected sequences are compared to those found for the query during all previous searches, and only non-identical sequences are added to Re-searcher's database and reported as new.
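The comparison against previously found sequences can be illustrated with a minimal Java sketch. It is not Re-searcher's actual code: the class NewHitFilter, its method name and the in-memory set are hypothetical stand-ins for whatever the system stores in its database, and "non-identical" is assumed here to mean exact sequence identity after removing whitespace and case differences.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: remembers the sequences already reported for one query. */
final class NewHitFilter {
    // Sequences seen in all previous searches for this query (assumed in-memory store).
    private final Set<String> knownSequences = new LinkedHashSet<>();

    /**
     * Returns the detected sequences that were not seen before and records
     * them as known, so that a later search does not report them again.
     */
    List<String> recordAndReturnNew(List<String> detectedSequences) {
        List<String> newHits = new ArrayList<>();
        for (String sequence : detectedSequences) {
            // Normalise whitespace and case so identical sequences compare equal.
            String normalised = sequence.replaceAll("\\s+", "").toUpperCase(Locale.ROOT);
            // Set.add returns false when the sequence is already known.
            if (knownSequences.add(normalised)) {
                newHits.add(normalised);
            }
        }
        return newHits;
    }

    public static void main(String[] args) {
        NewHitFilter filter = new NewHitFilter();
        // First search: both sequences are new.
        System.out.println(filter.recordAndReturnNew(Arrays.asList("MKTAYIAK", "GSHMLEDP")));
        // Second search: only the sequence not seen before is reported.
        System.out.println(filter.recordAndReturnNew(Arrays.asList("mktayiak", "AAPLVQRS")));
    }
}
```

Comparing sequences rather than database identifiers means that the same protein deposited under a new accession would not be reported twice, which matches the "non-identical sequences" wording above.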
Re-searcher can do more than straightforward recurrent PSI-BLAST searches: it can also perform combined searches involving two databases. Such a strategy is useful if the user is interested in detecting remote homologs within a small sequence database. Direct searches against such a database may be unable to generate the rich sequence profiles that are the main strength of PSI-BLAST. Therefore, Re-searcher can run an iterative search against the first (large) sequence database and then use the resulting profile (Position-Specific Scoring Matrix, or PSSM) to search the second, smaller database, which may be specialized or private. An example of such a scheme is the so-called "PDB-BLAST", in which the generated PSI-BLAST profile is used to detect distantly related sequences that have known PDB structures.
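The two steps of such a combined search can be sketched as a pair of command lines assembled in Java. This is only an illustration under stated assumptions: it uses the psiblast program from the NCBI BLAST+ suite with its -num_iterations, -out_pssm and -in_pssm options, which is not necessarily the PSI-BLAST executable or option set that Re-searcher itself invokes. The class name, file names and database names are placeholders.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the two commands behind a combined two-database search. */
final class CombinedSearchCommands {

    /** Step 1: iterate against the large database and save the resulting PSSM. */
    static List<String> buildProfileCommand(String queryFasta, String largeDb,
                                            int iterations, String pssmFile) {
        return Arrays.asList("psiblast",
                "-query", queryFasta,
                "-db", largeDb,
                "-num_iterations", String.valueOf(iterations),
                "-out_pssm", pssmFile);
    }

    /** Step 2: search the small database starting from the saved PSSM. */
    static List<String> buildProfileSearchCommand(String pssmFile, String smallDb) {
        // With -in_pssm the search starts from the saved profile, so no -query is passed.
        return Arrays.asList("psiblast",
                "-in_pssm", pssmFile,
                "-db", smallDb);
    }

    public static void main(String[] args) {
        // Placeholder file and database names; nothing is executed here.
        System.out.println(String.join(" ",
                buildProfileCommand("query.fasta", "nr", 3, "query.pssm")));
        System.out.println(String.join(" ",
                buildProfileSearchCommand("query.pssm", "pdbaa")));
    }
}
```

Each list could be handed to java.lang.ProcessBuilder to run the step. Keeping the profile in a file between the two steps is what lets the second, small database benefit from a profile it could not have produced on its own.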
In addition to the familiar NCBI-style PSI-BLAST form for setting up individual queries, Re-searcher provides informative easy-to-understand reports. Queries, for which new homologs have been detected, as well as newly detected matches within the list of all homologs found so far, are clearly marked. To simplify the analysis, resulting lists of homologs can also be sorted and filtered.
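Sorting and filtering a list of homologs might look like the following Java sketch. The Hit fields (identifier, E-value, "new" flag) and the filtering criteria are assumptions chosen for illustration; the text does not specify which columns Re-searcher actually sorts or filters on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical report helper: filters hits and orders them by significance. */
final class HitReport {

    /** Assumed minimal hit record: identifier, E-value and a "new" marker. */
    static final class Hit {
        final String id;
        final double eValue;
        final boolean isNew;

        Hit(String id, double eValue, boolean isNew) {
            this.id = id;
            this.eValue = eValue;
            this.isNew = isNew;
        }
    }

    /**
     * Keeps hits with an E-value at or below the cutoff (optionally only
     * those marked as new) and sorts them with the lowest E-value first.
     */
    static List<Hit> filterAndSort(List<Hit> hits, double maxEValue, boolean newOnly) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : hits) {
            if (hit.eValue <= maxEValue && (!newOnly || hit.isNew)) {
                kept.add(hit);
            }
        }
        kept.sort(Comparator.comparingDouble((Hit h) -> h.eValue));
        return kept;
    }

    public static void main(String[] args) {
        // Made-up identifiers and E-values, for illustration only.
        List<Hit> hits = Arrays.asList(
                new Hit("hitA", 1e-5, false),
                new Hit("hitB", 1e-30, true),
                new Hit("hitC", 5.0, true));
        for (Hit hit : filterAndSort(hits, 0.001, true)) {
            System.out.println(hit.id + " " + hit.eValue);
        }
    }
}
```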
Re-searcher supports multiple users. Each user can have an individual account, which is not visible to the public and holds all the user-specific queries and results.
Setting up queries
In addition to straightforward searches against the specified database, both local and remote setups offer the combined two-database search discussed above. Although the same kind of combined search can be done through the PSI-BLAST web page at NCBI, it can only be done manually, in a number of steps.
The setup for a new query also includes the search periodicity parameter and an option to notify the user of newly detected matches by e-mail.
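How a per-query periodicity parameter could drive the decision to rerun a search is sketched below in Java. The class SearchSchedule, the choice of whole days as the unit and the rule that a never-run query is always due are assumptions made for this example, not details taken from Re-searcher.

```java
import java.time.LocalDate;

/** Hypothetical scheduler check for one query with its own search interval. */
final class SearchSchedule {

    /**
     * Returns true when the query has never been searched, or when at least
     * intervalDays have passed since the last search.
     */
    static boolean isDue(LocalDate lastRun, int intervalDays, LocalDate today) {
        if (lastRun == null) {
            return true; // a newly entered query is searched right away
        }
        // Due on or after lastRun + intervalDays.
        return !today.isBefore(lastRun.plusDays(intervalDays));
    }

    public static void main(String[] args) {
        LocalDate lastRun = LocalDate.of(2008, 6, 1);
        System.out.println(isDue(lastRun, 7, LocalDate.of(2008, 6, 7))); // false
        System.out.println(isDue(lastRun, 7, LocalDate.of(2008, 6, 8))); // true
    }
}
```

A background task that calls such a check once a day for every stored query would be enough to give each query its own search interval.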
Reporting of the new matching sequences ("hits")
One of the common ways to inform a user of new matches to the query is to send an e-mail notification. However, sending detailed information about all new hits is not necessarily a good idea. For example, the initial search using multiple PSI-BLAST iterations can sometimes generate an overwhelming number of hits. If all this information is e-mailed to the user, the mailbox might easily get clogged.
Re-searcher is designed to support multiple password-protected user accounts. Depending on the desired user policy, the system may be configured either to allow unrestricted creation of new user accounts or to put the administrator in charge of adding new users. The administrator's account is also used to set up general parameters for the system, such as the IP address of the local BLAST server and the paths to the suite of BLAST programs and local databases. Re-searcher can also be used in a single-user mode.
The Re-searcher system is designed to answer a growing need among both computational and experimental biologists to be kept updated on a regular basis about new homologs for the protein sequence(s) of interest. Re-searcher combines the simplicity of installation and use with the flexibility of setting up sequence searches according to the needs of the user. More specifically, the system allows individualized search and reporting strategies for each query, including the search periodicity, choice of databases (remote or local), single- or two-database searches and various search parameters. Re-searcher can be run both in a single-user mode (e.g. on a PC) and as a centrally managed service for multiple users.
Availability and Requirements
Project name: Re-searcher
Operating system(s): Platform independent
Programming language: Java
Other requirements: Sun's Java Runtime Environment Version 6.0 or later
License: GNU GPL
Any restrictions to use by non-academics: No restrictions
This work was supported by the Howard Hughes Medical Institute and EU FP6 Marie Curie grants.
- Boone M, Upton C: BLAST Search Updater: a notification system for new database matches. Bioinformatics 2000, 16(11):1054–1055. 10.1093/bioinformatics/16.11.1054
- Whitney J, Esteban DJ, Upton C: Recent Hits Acquired by BLAST (ReHAB): a tool to identify new hits in sequence similarity searches. BMC Bioinformatics 2005, 6:23. 10.1186/1471-2105-6-23
- Prigent V, Thierry JC, Poch O, Plewniak F: DbW: automatic update of a functional family-specific multiple alignment. Bioinformatics 2005, 21(8):1437–1442. 10.1093/bioinformatics/bti218
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389–3402. 10.1093/nar/25.17.3389
- Apache Derby [http://db.apache.org/derby/]
- Jetty web server [http://www.mortbay.org/]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.