libcov: A C++ bioinformatic library to manipulate protein structures, sequence alignments and phylogeny

Butt, Davin; Roger, Andrew J; Blouin, Christian

doi:10.1186/1471-2105-6-138

Software
Open access
Published: 06 June 2005

Davin Butt¹, Andrew J Roger²,³ & Christian Blouin¹,²,³

BMC Bioinformatics volume 6, Article number: 138 (2005)

Abstract

Background

An increasing number of bioinformatics methods take into account the phylogenetic relationships between biological sequences. Implementing new methodologies within the maximum likelihood phylogenetic framework can be time-consuming.

Results

The bioinformatics library libcov is a collection of C++ classes that provides high- and low-level interfaces to maximum likelihood phylogenetics and sequence analysis, together with a data structure for structural biology methods. libcov can be used to compute likelihoods, search tree topologies, estimate site rates, cluster sequences, manipulate tree structures and compare phylogenies for a broad selection of applications.

Conclusion

Using this library, it is possible to rapidly prototype applications that use the sophistication of phylogenetic likelihoods without getting involved in a major software engineering project. libcov is thus a potentially valuable building block to develop in-house methodologies in the field of protein phylogenetics.

Background

With the development of genomics, research in biology and systems biology is becoming increasingly data-driven. The feedback between available data and hypotheses has accelerated the pace at which innovative ideas are generated. Life scientists are in a position to design novel methodologies but do not necessarily have the in-house skills to produce software implementations. Even conceptually simple methods, when built from complex building blocks such as maximum likelihood calculations, can require a major software development project before they can be prototyped. Libraries that supply these building blocks make rapid prototyping possible.

We present libcov, an object-oriented library for phylogenetic inference and for the manipulation of protein sequences and structures. The library is written in C++, complies with the GNU standards, and is packaged as a dynamic library that can be installed on most Unix distributions (including Mac OS X).

Other bioinformatics libraries are available, and many of them overlap with libcov in functionality. The PAL library [1], for example, is a Java implementation that offers a versatile object set for nucleotide and protein phylogeny. More generally, interested readers can visit the Open Bioinformatics Foundation [2], which links to a series of libraries written in popular scripting languages such as Perl and Python. Other C++ libraries include the Bioinformatics Template Library (BTL) [3] and CompBioTools++ [4], both of which focus on sequence manipulation.

libcov aims to offer a series of high-level functions that can each be invoked in one line of code and that do not force an implementation to adopt specialized custom types. As with any open source project, it is possible to use or extend the low-level Application Programming Interface (API) to add functionality or entirely new modules.

Implementation

libcov offers a high-level programming interface that follows an Object-Oriented (OO) approach, with classes representing distinct entities. For example, the class covTree represents a phylogenetic tree, covAlignment stores alignments, and the class PDBentity handles 3D protein structures in the PDB format. The PDBentity class is a hierarchical structure of peptide chains, residues and atoms. Other classes handle elements such as geometric transformations and substitution matrices.
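The chain, residue and atom hierarchy described above can be pictured with a minimal sketch. The type and function names below (Atom, Residue, Chain, Structure, countAtoms, makeExample) are hypothetical and chosen only for illustration; they are not the PDBentity interface, and the coordinates are placeholder values. The sketch assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration only; NOT libcov's PDBentity classes.
struct Atom {
    std::string name;   // atom name, e.g. "CA" for an alpha carbon
    double x, y, z;     // Cartesian coordinates (placeholder values below)
};

struct Residue {
    std::string name;           // three-letter code, e.g. "GLY"
    int number;                 // residue sequence number
    std::vector<Atom> atoms;    // atoms owned by this residue
};

struct Chain {
    char id;                        // chain identifier, e.g. 'A'
    std::vector<Residue> residues;  // residues owned by this chain
};

struct Structure {
    std::vector<Chain> chains;      // top level of the hierarchy
};

// Walk the hierarchy top-down and count every atom in the structure.
std::size_t countAtoms(const Structure& s) {
    std::size_t n = 0;
    for (const Chain& c : s.chains)
        for (const Residue& r : c.residues)
            n += r.atoms.size();
    return n;
}

// Build a tiny one-chain, two-residue example with made-up coordinates.
Structure makeExample() {
    Residue gly{"GLY", 1, {{"N", 0.0, 0.0, 0.0}, {"CA", 1.5, 0.0, 0.0}}};
    Residue ala{"ALA", 2, {{"N", 3.0, 1.0, 0.0},
                           {"CA", 4.5, 1.0, 0.0},
                           {"CB", 5.0, 2.4, 0.0}}};
    return Structure{{Chain{'A', {gly, ala}}}};
}
```

Because each level owns the level below it in a standard container, any traversal reduces to nested loops; countAtoms(makeExample()) visits one chain, two residues and five atoms.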

libcov is designed as a protein phylogeny library. The data structures and methods of its public interface can be integrated within application prototypes with minimal impact on software design. Most of the return types are Standard Template Library (STL) containers, which can be seamlessly integrated into ongoing software projects. Specialized classes can be derived by consulting the online API documentation. Examples of the integration of libcov within C++ source code are presented in Figure 1.
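To illustrate why returning STL containers keeps integration simple, the sketch below feeds a vector of per-site rates directly to standard algorithms. The function estimateSiteRates is a hypothetical stand-in that returns placeholder numbers; it is not a libcov call, and the values are not results from any analysis. The helper names fastestSite and meanRate are likewise assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a library call that returns per-site rates.
// The name and the values are placeholders, not libcov's API or real data.
std::vector<double> estimateSiteRates() {
    return {0.2, 1.7, 0.9, 3.1, 0.1};
}

// Index of the fastest-evolving site, found with a standard algorithm.
std::size_t fastestSite(const std::vector<double>& rates) {
    assert(!rates.empty());
    return static_cast<std::size_t>(std::distance(
        rates.begin(), std::max_element(rates.begin(), rates.end())));
}

// Mean rate across all sites.
double meanRate(const std::vector<double>& rates) {
    assert(!rates.empty());
    return std::accumulate(rates.begin(), rates.end(), 0.0) / rates.size();
}
```

Client code needs no adapter or custom iterator here: a call such as fastestSite(estimateSiteRates()) works because the return value is an ordinary std::vector.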

A summary of the functions offered by libcov is presented in Table 1. A more complete list of methods is available at the project's website.

Table 1 High-level functionalities

Currently, we have implemented three major applications using libcov. covTREE is our protein sequence simulator, which can simulate complex patterns of protein evolution and phylogenetic artifacts [5]. It uses the Monte Carlo-based simulation functions that libcov provides. covSEARCH is a tree searching program that uses the maximum likelihood and tree re-arrangement algorithms in libcov. covARES maps sequence and phylogenetic information onto protein models [6]. These applications are also available to the research community under the GNU GPL.
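As a rough illustration of what Monte Carlo simulation of protein evolution involves, the sketch below evolves a sequence along a single branch. It is not the covTREE algorithm: it assumes the simple equal-rates (Poisson) amino acid model rather than an empirical substitution matrix, treats sites independently, and uses hypothetical helper names (probSame, evolve) together with the standard std::mt19937 generator.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <string>

// The 20 standard amino acids, one-letter codes.
const std::string kAminoAcids = "ACDEFGHIKLMNPQRSTVWY";

// Probability that a site shows the same amino acid after a branch of
// length t (expected substitutions per site) under the equal-rates model:
//   P_same(t) = 1/20 + (19/20) * exp(-(20/19) * t)
// written as e + (1 - e)/20 so that P_same(0) is exactly 1.
double probSame(double t) {
    const double e = std::exp(-(20.0 / 19.0) * t);
    return e + (1.0 - e) / 20.0;
}

// Evolve a sequence along one branch by sampling each site independently.
// Illustrative helper; not part of libcov or covTREE.
std::string evolve(const std::string& ancestor, double t, std::mt19937& rng) {
    assert(t >= 0.0);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::uniform_int_distribution<int> pick(0, 18);  // one of 19 other states
    const double stay = probSame(t);
    std::string child = ancestor;
    for (char& c : child) {
        if (unit(rng) < stay) continue;   // site ends in its starting state
        // Otherwise choose uniformly among the 19 amino acids other than c.
        const std::size_t current = kAminoAcids.find(c);
        assert(current != std::string::npos);
        std::size_t k = static_cast<std::size_t>(pick(rng));
        if (k >= current) ++k;            // skip over the current state
        c = kAminoAcids[k];
    }
    return child;
}
```

Seeding the generator, for example with std::mt19937 rng(42), makes a simulation reproducible. A branch of length zero returns the ancestor unchanged, while a very long branch approaches a uniform draw over the 20 amino acids at each site.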

Conclusion

The libcov library is under active development, and we will release updated versions frequently. As libcov is the engine powering the phylogenetic application covSEARCH, future work will involve new tree-searching algorithms, confidence interval determination and the integration of structure-based models of substitution.

External contributions are welcome; the functionality of the library will evolve to match the research interests of developers of phylogenetic applications.

Availability and Requirements

Project's name: libcov

Project's website: http://www.cs.dal.ca/~cblouin/libcov/

Operating System: GNU C++ library. Tested on Linux, Mac OS X and other Unix-based operating systems.

License: GPL

Non-academic licensing: None.

References

1. Drummond A, Strimmer K: PAL: an object-oriented programming library for molecular evolution and phylogenetics. Bioinformatics 2001, 17: 662–663. 10.1093/bioinformatics/17.7.662
2. OBF: Open Bioinformatics Foundation. [http://www.open-bio.org]
3. Williams M: The Bioinformatics Template Library (BTL). [http://people.cryst.bbk.ac.uk/~classlib/bioinf/BTL99.html]
4. Durbin KJ: CompBioTools++. [http://people.cryst.bbk.ac.uk/~classlib/bioinf/BTL99.html]
5. Blouin C, Butt DJ, Roger AJ: The impact of taxon sampling on the estimation of rates of evolution at sites. Mol Biol Evol 2005, 22: 784–791. 10.1093/molbev/msi065
6. Blouin C, Boucher Y, Roger AJ: Inferring functional constraints and divergence in protein families using 3D mapping of phylogenetic information. Nucleic Acids Res 2003, 31: 790–797. 10.1093/nar/gkg151
7. Felsenstein J: Inferring Phylogenies. 1st edition. Sunderland, MA, Sinauer Associates, Inc.; 2004:664.
8. Saitou N, Nei M: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 1987, 4: 406–425.
9. Felsenstein J: PHYLIP (Phylogeny Inference Package) version 3.6. Seattle, WA, Distributed by the author, Dept. of Genetics, U. of Washington; 2002.
10. Yang Z: Among-site rate variation and its impact on phylogenetic analyses. Trends Ecol Evol 1996, 11: 367–372. 10.1016/0169-5347(96)10041-0
11. Kishino H, Hasegawa M: Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in hominoidea. J Mol Evol 1989, 29: 170–179.
12. Shimodaira H, Hasegawa M: Multiple Comparisons of Log-Likelihoods with Applications to Phylogenetic Inference. Mol Biol Evol 1999, 16: 1114–1116.
13. Kishino H, Miyata T, Hasegawa M: Maximum Likelihood inference of protein phylogeny and the origin of chloroplasts. J Mol Evol 1990, 30: 151–160.
14. Strimmer K, Rambaut A: Inferring confidence sets of possibly misspecified gene trees. Proc R Soc Lond B Biol Sci 2002, 269: 137–142. 10.1098/rspb.2001.1862
15. Pupko T, Graur D: Fast computation of maximum likelihood trees by numerical approximation of amino acid replacement probabilities. Computational Statistics & Data Analysis 2002, 40: 285–291. 10.1016/S0167-9473(02)00008-7
16. Jones DT, Taylor WR, Thornton JM: The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 1992, 8: 275–282.
17. Dayhoff MO, Schwartz RM, Orcutt BC: A model of evolutionary change in proteins. In Atlas of protein sequence and structure. Volume 5. Edited by: Dayhoff MO. Silver Spring, MD, National Biomedical Research Foundation; 1978:345–352.
18. Whelan S, Goldman N: A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol 2001, 18: 691–699.
19. Grassly NC, Adachi J, Rambaut A: PSeq-Gen: an application for the Monte Carlo simulation of protein sequence evolution along phylogenetic trees. Comput Appl Biosci 1997, 13: 559–560.
20. Wichmann BA, Hill ID: An efficient and portable pseudo-random number generator. Appl Stat 1982, 31: 188–190.
21. Bryant D: A Classification of Consensus Methods for Phylogenetics. In BioConsensus. Edited by: Janowitz M, Lapointe FJ, McMorris FR, Mirkin B and Roberts FS. DIMACS. AMS; 2003:164–184.

Acknowledgements

This work was supported by a Genome Atlantic grant on Prokaryotic genome diversity and evolution, and by the NSERC Discovery grant 298397-04 (CB). The authors would like to thank J. Murdoch for her contribution to the treeSPACE module.

Author information

Authors and Affiliations

Faculty of Computer Science, Dalhousie University, 6050 University Ave, Halifax, NS, B3H 1W5, Canada
Davin Butt & Christian Blouin
Dept. of Biochemistry and Molecular Biology, Dalhousie University, Tupper Medical Building, Halifax, NS, B3H 1X5, Canada
Andrew J Roger & Christian Blouin
Canadian Institute for Advanced Research (CIAR), Canada
Andrew J Roger & Christian Blouin

Corresponding author

Correspondence to Christian Blouin.

Additional information

Authors' contributions

C. Blouin – Scientific functionalities, high-level design, drafting of the manuscript.

D. Butt – Software design, implementation and testing.

A.J. Roger – Scientific functionalities, drafting of the manuscript.

About this article

Received: 05 November 2004
Accepted: 06 June 2005
