Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores

Bastien, Olivier; Maréchal, Eric

doi:10.1186/1471-2105-9-332

wrong modelling of protein sequence evolution: a reply

Eric Marechal, CNRS

29 September 2008

A comment has been posted declaring that the model of protein evolution published here is wrong. It appears from this comment that the meaning of “information”, as used in our work, has not been fully understood by the reader.

“Information” can have different meanings.

- In the sense of the protein structure-function relationship (i.e. in terms of the emergence of a molecular function from a sequence), “information” can mean what is encoded in the sequence and is responsible for the function of the folded protein. In this sense, Mr Loewenstein is right to say that « the assertion that there is a constant amount of information in all amino acids clearly contradicts most of what we know on proteins ». Unfortunately, that is not the sense in which the term “information” was used here.

- In the sense of phylogenetic signal, “information” (in fact “mutual information” in the sense of Hartley, as clearly stated in the abstract of this paper) can mean what is shared by sequences, meant as “chains of characters”, that have evolved from an original ancestor. In this sense, as sequences diverge, we do lose the information they share with their ancestor, and in particular it becomes difficult to reconstruct the evolutionary history. This is another way to describe a well-known phenomenon called « loss of phylogenetic signal » (what L. Brocchieri calls « loss of phylogenetic informational content » in Phylogenetic Inferences from Molecular Sequences: Review and Critique. Theoretical Population Biology 59, 27-40 (2001)). Mr Loewenstein says that « As sequences diverge, we gain more information on the sequence-function connection (not less). » Evidently, this paper, which deals with sequence information in the sense of phylogenetic signal, has been read with information understood in the sense of the structure-function relationship.

- There is a third definition of information, which has not been treated here, concerning the “mutual information” of amino acids inside a given sequence that play a joint role in the protein function (these amino acids co-evolve, and this can be measured by their mutual information).
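
To make the third definition concrete, here is a minimal sketch of how the mutual information between two positions could be estimated from the columns of a multiple alignment. It is not the method of the paper: the function name `column_mutual_information` and the toy alignment are hypothetical, and the estimate uses plain observed frequencies with no correction for small samples or for phylogenetic relatedness.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_a, col_b):
    """Empirical mutual information (in bits) between two alignment columns."""
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)                # residue counts at position a
    count_b = Counter(col_b)                # residue counts at position b
    count_ab = Counter(zip(col_a, col_b))   # joint counts of residue pairs
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical toy alignment: four sequences, three positions each.
alignment = ["AKL", "AKV", "GEL", "GEV"]
columns = list(zip(*alignment))

# Positions 0 and 1 always change together; positions 0 and 2 vary independently.
mi_01 = column_mutual_information(columns[0], columns[1])
mi_02 = column_mutual_information(columns[0], columns[2])
print(mi_01, mi_02)
```

In this toy case the first two positions co-vary perfectly and share one bit, while the first and third positions share none.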

Things are not so simple, however, because these definitions are related. The “mutual information” of “homologous sequences” can be measured by aligning sequences that diverged from an original ancestor (for instance, the mutual information measured in the chains of characters “pere” <-> “father” after their divergence from, let's say, “pater”). However, for proteins, the evolutionary tree depends on natural selection constraints, in particular the conservation of function, and therefore on a level of information in the sense of the structure-function relationship.

Concerning the statement “Proteins are not just old machines that wear and tear, but quite the contrary”: in the sense of phylogenetic evolution, proteins are indeed old machines that have diverged from the original sequence, and they can actually lose the very function of the original sequence. This is what we mean by “the "reliability" of a sequence [that] refers to the ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure)” [i.e. the function of the ancestral sequence]. In the abstract, we do not say that we considered isolated proteins as distinct systems, but that « Homologous sequences were considered as systems ».

I hope I have made things clearer, and I will be happy to discuss them when we have an opportunity to meet.

Competing interests

One of the authors of the article

wrong modelling of protein sequence evolution

Yaniv Loewenstein, Hebrew University Of Jerusalem

28 August 2008

Reading the abstract, it seems to me that your assumptions are not just simplifying, but completely wrong. As sequences diverge, we gain more information on the sequence-function connection (not less). Proteins are not just old machines that wear and tear, but quite the contrary. You say that "The failure rate is related to the systems longevity.", but I would assume that in protein sequences it is the other way around - older proteins have more "experience", and more chances to learn new tricks to be more robust towards failures. Moreover, the assertion that there is a constant amount of information in all amino acids clearly contradicts most of what we know on proteins.

Competing interests

None declared

Archived Comments for: Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores


BMC Bioinformatics
