Using coevolution to improve protein subfamily classification

Simonetti, Franco; Banchero, Martin; Berenstein, Ariel J; Chernomoretz, Ariel; Marino Buslje, Cristina

doi:10.1186/1471-2105-16-S8-A6

Volume 16 Supplement 8

Highlights from the 1st ISCB Latin American Student Council Symposium 2014

Meeting abstract
Open access
Published: 30 April 2015

BMC Bioinformatics volume 16, Article number: A6 (2015)

Background

The common approach for protein subfamily classification relies on grouping protein sequences according to their degree of similarity. However, there is no single sequence similarity threshold for accurately grouping sequences into isofunctional groups.

Current subfamily classification methods use bottom-up clustering to construct a cluster hierarchy, then cut the hierarchy at the most appropriate locations to obtain a single partitioning. These methods usually integrate data such as protein sequence similarity, residue conservation within groups and HMM profiles. Although the approach is straightforward, it usually predicts a large number of subfamilies with few members and limited biological meaning.
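To make the bottom-up procedure concrete, the following is a minimal sketch of average-linkage clustering on pairwise sequence identity, stopped at a fixed distance cutoff. It is not the implementation of any published method: the function names, the toy alignment and the cutoff values are assumptions chosen for illustration, and real tools also use conservation and HMM profiles.

```python
from itertools import combinations


def identity_distance(a: str, b: str) -> float:
    """1 - fraction of identical residues over aligned, non-gap positions."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return 1.0 - sum(x == y for x, y in pairs) / len(pairs)


def average_linkage_clusters(seqs: list[str], cutoff: float) -> list[list[int]]:
    """Merge the two closest clusters repeatedly; stop when the smallest
    average inter-cluster distance exceeds `cutoff` (the tree cut)."""
    dist = {
        frozenset((i, j)): identity_distance(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }
    clusters = [[i] for i in range(len(seqs))]

    def linkage(c1: list[int], c2: list[int]) -> float:
        total = sum(dist[frozenset((i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        (a, b), d = min(
            (
                ((x, y), linkage(clusters[x], clusters[y]))
                for x, y in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy aligned sequences (hypothetical): two obvious groups.
toy_seqs = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEYGHIKL", "MNPQRSTVWY", "MNPQRSTVWF"]
loose = average_linkage_clusters(toy_seqs, cutoff=0.3)
strict = average_linkage_clusters(toy_seqs, cutoff=0.05)
```

On this toy input the looser cutoff yields two groups while the stricter one leaves every sequence in its own group, which illustrates why a single similarity threshold is hard to choose.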

The goal of this study is to identify subsets of functionally related sequences within a given superfamily. Since all proteins within a superfamily share a common ancestor, we hypothesize that functional diversity within superfamilies has arisen through a series of concerted changes that must have left an identifiable coevolutionary signal.

Material and methods

The challenge is to separate the coevolutionary signals of the individual subfamilies and use them for subfamily classification. This information can be used to guide a hierarchical clustering. Our approach combines Mutual Information, used to measure covariation, with commonly used clustering methods based on sequence similarity. We defined a select group of superfamilies from the Structure-Function Linkage Database as our gold standard dataset.
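As an illustration of how covariation can be measured, the sketch below computes plain Mutual Information (in bits) between pairs of columns of a multiple sequence alignment. It is a textbook formulation under simple assumptions (rows with a gap in either column are skipped, no sequence weighting or background correction); the abstract does not specify which corrections the authors applied, and the function names and toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(col_a: list[str], col_b: list[str]) -> float:
    """MI in bits between two alignment columns.

    Rows with a gap ('-') in either column are ignored (an assumption).
    """
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    freq_a = Counter(a for a, _ in pairs)
    freq_b = Counter(b for _, b in pairs)
    # sum over observed pairs of p(a,b) * log2( p(a,b) / (p(a) * p(b)) )
    return sum(
        (c / n) * log2(c * n / (freq_a[a] * freq_b[b]))
        for (a, b), c in joint.items()
    )


def mi_matrix(alignment: list[str]) -> dict[tuple[int, int], float]:
    """MI for every pair of columns (i < j) of an alignment."""
    length = len(alignment[0])
    columns = [[seq[i] for seq in alignment] for i in range(length)]
    return {
        (i, j): mutual_information(columns[i], columns[j])
        for i, j in combinations(range(length), 2)
    }


# Toy alignment (hypothetical): columns 0 and 1 change together,
# column 2 is fully conserved and therefore carries no covariation.
toy_alignment = ["AKG", "AKG", "DEG", "DEG"]
mi = mi_matrix(toy_alignment)
```

In the toy alignment the perfectly coupled columns 0 and 1 reach 1 bit, while any pair involving the conserved column 2 scores 0, since a conserved position cannot covary.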

Results

Different approaches were considered for integrating Mutual Information data in sequence clustering. Since Mutual Information can only be calculated for a group of sequences, a preliminary sequence clustering is performed. Using solely covariation data, our method can cluster groups of sequences from the same subfamily. For a complete clustering solution, it performs almost as well as a hierarchical clustering based on sequence similarity. The next step will be to integrate both methods.
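The abstract does not describe how covariation is compared between preliminary groups, so the following is only one hypothetical way to do it: compute a Mutual Information profile (one value per column pair) for each preliminary group, then score two groups by the cosine similarity of their profiles. The choice of cosine similarity, the function names and the toy groups are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2, sqrt


def mutual_information(col_a: list[str], col_b: list[str]) -> float:
    """Plain MI in bits between two columns, skipping rows with gaps."""
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    freq_a = Counter(a for a, _ in pairs)
    freq_b = Counter(b for _, b in pairs)
    return sum(
        (c / n) * log2(c * n / (freq_a[a] * freq_b[b]))
        for (a, b), c in joint.items()
    )


def mi_profile(group: list[str]) -> list[float]:
    """MI for every column pair of one preliminary group, as a flat vector."""
    length = len(group[0])
    columns = [[seq[i] for seq in group] for i in range(length)]
    return [
        mutual_information(columns[i], columns[j])
        for i, j in combinations(range(length), 2)
    ]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two profiles; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Hypothetical preliminary groups taken from the same alignment.
# In group_1 and group_2 columns 0 and 1 covary; in group_3 columns 2 and 3 do.
group_1 = ["AKGL", "AKGL", "DEGL", "DEGL"]
group_2 = ["AKGM", "DEGM", "AKGM", "DEGM"]
group_3 = ["GGAK", "GGAK", "GGDE", "GGDE"]

same_signal = cosine_similarity(mi_profile(group_1), mi_profile(group_2))
different_signal = cosine_similarity(mi_profile(group_1), mi_profile(group_3))
```

Here group_1 and group_2 share the same covarying column pair and score 1, whereas group_3 covaries at different positions and scores 0 against group_1; a score of this kind could then serve as the similarity for merging preliminary groups.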

Conclusions

Automated protein classification remains an active topic of research, and state-of-the-art methods are far from predicting biologically meaningful results. Covariation data has never been used before in this context, and further analyses are needed to improve the method.

Author information

Authors and Affiliations

Fundación Instituto Leloir, Buenos Aires, Argentina
Franco Simonetti, Martin Banchero & Cristina Marino Buslje
Universidad de Buenos Aires, Buenos Aires, Argentina
Ariel J Berenstein & Ariel Chernomoretz

Corresponding author

Correspondence to Franco Simonetti.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Simonetti, F., Banchero, M., Berenstein, A.J. et al. Using coevolution to improve protein subfamily classification. BMC Bioinformatics 16 (Suppl 8), A6 (2015). https://doi.org/10.1186/1471-2105-16-S8-A6
