The human "magnesome": detecting magnesium binding sites on human proteins
© Piovesan et al.; licensee BioMed Central Ltd. 2012
Published: 7 September 2012
Magnesium research is increasing in molecular medicine due to the relevance of this ion in several important biological processes and associated molecular pathogeneses. It is still difficult to predict from the protein covalent structure whether or not a human chain is involved in magnesium binding. This is mainly due to the scarcity of information on the structural characteristics of magnesium binding sites in proteins and protein complexes. Magnesium binding features, unlike those of other divalent cations such as calcium and zinc, are elusive. Here we address a question that is relevant in protein annotation: how many human proteins can bind Mg2+? Our analysis takes advantage of the recently implemented Bologna Annotation Resource (BAR-PLUS), a non-hierarchical clustering method that relies on the pairwise sequence comparison of about 14 million proteins from over 300,000 species and on their grouping into clusters where annotation can safely be inherited after statistical validation.
After cluster assignment of the latest version of the human proteome, the total number of human proteins to which we can assign putative Mg binding sites is 3,751. Among these proteins, 2,688 inherit annotation directly from human templates and 1,063 inherit annotation from templates of other organisms. Protein structures are highly conserved inside a given cluster. Transfer of structural properties is possible after alignment of a given sequence with the protein structures that characterise a given cluster, as obtained with a Hidden Markov Model (HMM) based procedure. Interestingly, a set of 370 human sequences inherit Mg2+ binding sites from templates with which they share less than 30% sequence identity.
We describe and deliver the "human magnesome", a set of proteins of the human proteome that inherit putative binding of magnesium ions. With our BAR-hMG, 251 clusters including 1,341 magnesium binding protein structures corresponding to 387 sequences are sufficient to annotate some 13,689 residues in 3,751 human sequences as "magnesium binding". Protein structures act therefore as three dimensional seeds for structural and functional annotation of human sequences. The data base collects specifically all the human proteins that can be annotated according to our procedure as "magnesium binding", the corresponding structures and BAR+ clusters from where they derive the annotation (http://bar.biocomp.unibo.it/mg).
Magnesium is the most abundant divalent alkaline ion in living cells and it is an indispensable element for many biological processes. Magnesium deficiency in humans is responsible for many diseases, including osteoporosis [1] and metabolic syndrome (MetS), a combination of different metabolic disorders that increase the risk of developing cardiovascular diseases and diabetes [2]. Magnesium is characterised by specific chemico-physical properties: it is redox inert and it has a small ionic radius, which gives it a high charge density [3, 4]. In cells magnesium ions have both structural and functional roles. Magnesium plays a key role in stabilising protein structures, phosphate groups of membrane lipids and negatively charged phosphates of nucleic acids. It is also involved in catalysis, for example in the activation/inhibition of many enzymes [3, 4].
Observations on the structural geometry of Mg2+ binding sites in proteins known at atomic resolution may be derived from PROCOGNATE, a cognate ligand domain mapping for enzymes [5], and from the Protein Data Bank [PDB, http://www.rcsb.org]. Typical magnesium binding sites on proteins show three or fewer direct binding contacts with carbonyl oxygen atoms of the backbone and/or protein side chains, with a tendency to bind water molecules given the octahedral coordination geometry of the divalent cation [3, 6]. It is known that Mg2+ binding sites are less specific than those of other divalent cations such as Zn2+ and Ca2+, and that in particular conditions Zn2+ can dislodge Mg2+ from its pocket [3, 7]. Metal binding sites on proteins appear to satisfy constraints related to the physiological availability of the ions. Magnesium binds weakly to proteins and enzymes (Ka ≤ 10^5 M^-1) and its binding appears to depend on its high cellular concentration. Free Mg2+ concentration is higher than that of any other ion (0.5-1 mM). As a consequence magnesium binding sites are less conserved through evolution than those of other divalent cations and their detection is therefore difficult. Mg2+ binding sequence motifs have been described to be conserved in similar RNA and DNA polymerases [9, 10]. Three dimensional Mg2+ binding pockets derived from 70 Mg2+ binding proteins solved at atomic resolution were recognised in protein structures by implementing a structural alphabet [11].
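The interplay between a weak association constant and a high free Mg2+ concentration can be illustrated with a short calculation. The sketch below assumes the simplest possible model, a single independent binding site at equilibrium, which is our illustrative assumption and not a model used in this work; the function name `fraction_bound` is hypothetical.

```python
def fraction_bound(ka_per_molar: float, free_mg_molar: float) -> float:
    """Fraction of occupied sites for one independent site: Ka*[Mg] / (1 + Ka*[Mg])."""
    x = ka_per_molar * free_mg_molar
    return x / (1.0 + x)


# Ka values up to the upper bound quoted in the text (10^5 M^-1),
# evaluated at the two ends of the quoted free Mg2+ range (0.5-1 mM).
for ka in (1e3, 1e4, 1e5):
    for conc in (0.5e-3, 1.0e-3):
        theta = fraction_bound(ka, conc)
        print(f"Ka = {ka:.0e} M^-1, [Mg2+] = {conc * 1e3:.1f} mM -> fraction bound = {theta:.3f}")
```

Under this simplified model, even a site with Ka well below the quoted upper bound is substantially occupied at millimolar free Mg2+, which is consistent with the statement that binding depends on the high cellular concentration of the ion.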
In this work we describe how to assign putative Mg2+ binding sites to human proteins that lack structural information, including proteins that share less than 30% sequence identity with any available Mg2+ binding protein template. This is possible within our BAR-PLUS annotation resource (BAR+), a recently described non-hierarchical clustering method that relies on the pairwise sequence comparison of about 14 million proteins, including 998 complete proteomes of different species and Homo sapiens [12, 13]. To our knowledge this paper describes the first large scale investigation of magnesium binding sites at the human proteome level. The results highlight that residues involved in magnesium binding in protein structures (derived from the PDB) falling into the same BAR+ cluster are conserved and can be transferred to all the human sequences sharing the same cluster on the basis of a structure-to-sequence alignment with a cluster specific hidden Markov model (HMM). Magnesium binding sites within a given cluster are also conserved when the pairwise sequence identity between the target and the template/s is less than 30%. A data base (BAR-hMG) is made available from which, for a given human input sequence, the predicted magnesium binding site/s can be retrieved together with the corresponding structural template/s and the annotating BAR+ cluster.
The dataset of Mg2+ binding protein structures
A list of 4,710 magnesium binding protein structures was retrieved from the Ligand-Expo database [14] by searching "MG" as Mg2+ ligand identifier. Ligand Expo is a data warehouse that integrates databases, services and tools related to small molecules bound to macromolecules and is based on the PDB. It allows users to extract ligand information directly from the PDB, to perform chemical substructure searches of PDB ligands using a graphical interface and also to browse other relevant small molecule resources on the Web. It is updated daily and therefore provides the most current information on small molecules present in the PDB. Its reliability depends on the reliability of the structures from which information is derived and ultimately on the resolution of the electron density map of the molecule. Our set includes PDB entries with an average resolution of 0.23 nm (2.3 Å).
The list of magnesium binding residues and corresponding positions in the sequence for each PDB entry was obtained by parsing both the "LINK" and "SITE" fields of the coordinate files [15]. In order to guarantee that magnesium is part of a biologically significant PDB structure, we filtered out fragments and chimeric structures by requiring the coverage of the template PDB structure on its corresponding UniProtKB sequence (without signal peptide, when present) to be ≥70%. This threshold guarantees a satisfactory overlap between the sequence and its structure, which is essential in homology-based modelling procedures. Applying this criterion, we ended up with 1,341 PDB templates. For each PDB structure the reference sequence and the corresponding UniProtKB [16] accession were obtained from the SIFTS web server [17]. When multiple PDB entries contain different magnesium binding sites and refer to the same sequence, all the sites are mapped onto the protein sequence.
Human sequences were collected from UniProtKB (release 2011_02), including splicing isoforms, for a total of 110,464 sequences. Most of these sequences are annotated automatically in UniProtKB and lack any experimental evidence. When fragments are filtered out, the total number of human sequences adopted for our analysis is 84,520.
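The two filtering steps described above, reading magnesium contacts from "LINK" records and applying the ≥70% coverage threshold, can be sketched as follows. This is a minimal illustration and not the pipeline used in this work: the function names are hypothetical, only "LINK" records are handled (not "SITE"), water partners are skipped by our own choice, and the column positions follow the fixed-width layout of the PDB LINK record.

```python
def mg_partners_from_link(pdb_lines):
    """Return sorted (chain, residue name, residue number) tuples linked to an MG ion."""
    partners = set()
    for line in pdb_lines:
        if line[:6].strip() != "LINK":
            continue
        line = line.ljust(80)
        # Fixed-width fields of the LINK record (0-based slices):
        # residue name, chain identifier, residue number for each partner.
        first = (line[17:20].strip(), line[21], line[22:26].strip())
        second = (line[47:50].strip(), line[51], line[52:56].strip())
        for ion, other in ((first, second), (second, first)):
            # Keep only non-water partners of a magnesium ion.
            if ion[0] == "MG" and other[0] not in ("MG", "HOH"):
                partners.add((other[1], other[0], int(other[2])))
    return sorted(partners)


def passes_coverage(n_mapped_residues, sequence_length, signal_length=0, threshold=0.70):
    """True if the structure covers at least `threshold` of the mature sequence."""
    mature_length = sequence_length - signal_length
    return mature_length > 0 and n_mapped_residues / mature_length >= threshold
```

In practice `pdb_lines` would be the lines of a PDB coordinate file, and `n_mapped_residues` would come from a structure-to-sequence mapping such as the one provided by SIFTS.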
The BAR-PLUS annotation resource
BAR+ is an annotation resource based on the notion that sequences with high identity to a counterpart can inherit from it the same function/s and structure, if available (http://bar.biocomp.unibo.it/bar2.0/). The method has been recently described. Briefly, an extensive BLAST alignment [18] was performed for some 13,495,736 sequences in a GRID environment. The sequence similarity network was built by connecting two proteins only if their sequence identity is ≥40% with an overlap (Coverage, COV) ≥90%. A total of 913,762 clusters were obtained by splitting the connected components of the similarity network. Mapping of PDB structures, Pfam functional domains (http://pfam.sanger.ac.uk/) and GO terms (Gene Ontology terms, http://www.geneontology.org/) as listed in the UniProtKB protein files allows different annotation types within each cluster. Enrichment of Pfam domains [http://www.sanger.ac.uk/resources/databases/pfam.html] and GO terms for each cluster was statistically validated (by computing a Bonferroni corrected P-value and by selecting its significance threshold with a bootstrapping procedure). Only when P<0.01 are terms transferred from one protein to another in the same cluster, and annotation is then inherited by all the sequences in the cluster. When a sequence falls into a validated cluster it can inherit in a validated manner functional and structural annotation (PDB +/SCOP +/Pfam +/GOterms +/). Stand-alone sequences are called Singletons (30.4% of the total protein universe). Clusters can contain distantly related proteins that by this procedure can be annotated with high confidence. We verified that the 1,341 magnesium containing PDB structures were in BAR+ clusters and, when not present, we included them in the corresponding cluster.
In any case we verified that backbone structure was conserved within the same cluster (average Root Mean Square Deviation (RMSD) was about 2.0±0.2 Å; for the definition of RMSD see: http://cnx.org/content/m11608/latest/). The human sequences were then aligned against BAR+ clusters and only those satisfying the BAR+ constraints (ID≥40% and COV≥90%) were retained. Out of the 84,520 human sequences aligned against BAR+ with the required criteria, some 61,106 fell into 22,858 clusters and some 2,791 aligned with singletons. The remaining portion of the human proteome (aligned with sequences contained in BAR+ clusters with lower sequence identity and coverage than those required for a validated transfer of annotation) is not considered in the present analysis. In BAR+, each cluster endowed with structure/s is characterised by a computed cluster Hidden Markov Model (HMM) that is derived from a structure-to-sequence alignment within the cluster and can be adopted to model the cluster sequences on the structure template/s of the cluster. We took advantage of the cluster HMM both for structural alignments of the newly introduced PDB structures and for sequence-to-structure alignment.
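The graph construction behind BAR+ clusters (link two sequences when identity is ≥40% and coverage is ≥90%, then take the connected components) can be sketched with a union-find structure. This is an illustrative sketch only, not the BAR+ implementation: the function name, the tuple layout of `hits` and the toy identifiers are assumptions, and the real resource operates on millions of BLAST hits in a GRID environment.

```python
from collections import defaultdict


def cluster_sequences(ids, hits, min_identity=40.0, min_coverage=90.0):
    """Group sequences into connected components of the similarity network.

    `hits` is an iterable of (id_a, id_b, percent_identity, percent_coverage).
    Returns (clusters, singletons): a list of sets and a list of stand-alone ids.
    """
    parent = {seq_id: seq_id for seq_id in ids}

    def find(x):
        # Find the component root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        # An edge exists only if both thresholds are satisfied.
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(set)
    for seq_id in ids:
        groups[find(seq_id)].add(seq_id)

    clusters = [members for members in groups.values() if len(members) > 1]
    singletons = [next(iter(members)) for members in groups.values() if len(members) == 1]
    return clusters, singletons
```

Because membership is defined by connectivity, two sequences can end up in the same cluster without being directly linked, which is how distantly related proteins come to share a cluster.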
Selection of the "human magnesome"
Out of the 61,106 human sequences selected above, we focused on the subset that comprises all the chains included in the 251 clusters endowed with magnesium containing PDB structures. These clusters include 1,341 PDB structures. We checked all the PDB files, the corresponding UniProtKB files and the related literature. From this effort we were able to verify that for only 119 structures (9% of the total) in 21 clusters there is so far no published observation supporting any functional or structural role of Mg. Within the clusters, sequences could also safely inherit validated Pfam functional domains and GO functional terms (Molecular Function, Biological Process and Cellular Component, http://www.geneontology.org/).
Binding positions were transferred from the template/s to the target after pairwise alignment/s based on the cluster HMM. Since 251 clusters contain Mg binding templates, an equivalent number of HMMs were used to transfer Mg binding position/s to the human sequences in the clusters. Of these, 141 clusters contain 827 magnesium binding protein structures derived from non-human species (25 different Eukaryota, 42 different Bacteria, 9 different Archaea and 1 virus), and 110 clusters contain 514 human templates.
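Once a template and a target are aligned, transferring binding positions amounts to walking the alignment column by column. The sketch below assumes that the pairwise alignment has already been produced (in this work it derives from the cluster HMM, which is not reproduced here) and is given as two gapped strings of equal length; the function name and the toy sequences are hypothetical.

```python
def transfer_sites(template_aln, target_aln, template_sites, gap="-"):
    """Map 1-based template positions onto the target through a pairwise alignment.

    Returns {template_position: (target_position, target_residue)} for every
    template site that is aligned to a residue (not a gap) in the target.
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have the same length")

    mapping = {}
    template_pos = target_pos = 0
    for t_char, q_char in zip(template_aln, target_aln):
        if t_char != gap:
            template_pos += 1
        if q_char != gap:
            target_pos += 1
        if t_char != gap and q_char != gap:
            mapping[template_pos] = (target_pos, q_char)

    return {site: mapping[site] for site in template_sites if site in mapping}


# Toy example: template sites 3, 5 and 6; site 5 faces a gap in the target.
print(transfer_sites("MKD-TSE", "MRDGT-E", [3, 5, 6]))
```

A template site that faces a gap in the target is simply not transferred, so the number of inherited positions can be smaller than the number of template sites.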
Results and discussion
Finding Magnesium binding sites with BAR+
Clusters containing templates where Mg has a documented structural and functional role are labelled with a yellow star, and with a yellow star and the corresponding EC number, respectively. Bound Mg in the structure shown in the figure is not yet supported by any experimental observation highlighting a specific functional role; for this reason no label is present in the figure. The whole BAR-hMG data base contains 21 out of 251 clusters with templates that bind Mg without any functional or structural role having been experimentally determined so far. This information can be retrieved for each template from the corresponding PDB and UniProtKB files and the literature quoted therein. It should be considered that Mg ions may play a role in protein stability that is still not fully described, or even a role in protein-protein interactions, which are at the basis of many relevant biological processes. In many instances the formation of protein complexes has not yet been recognized due to its transient character. The question is therefore still open, and we included these cases in our data set for a comprehensive analysis of putative Mg binding sites.
Annotation of Mg2+ binding sites in human proteins
Table 1. Human sequences annotated with human structural templates
Columns: Cluster RMSD (Å); Newly annotated sequence (#); Annotated sequence (ID<30%)*
Rows (template groups): Mg and Ions; Mg and Ligands; Mg, Ions and Ligands
Table 2. Human sequences annotated with structural templates from other organisms
Columns: Cluster RMSD (Å); Newly annotated sequence (#); Annotated sequence (ID<30%)*
Rows (template groups): Mg and Ions; Mg and Ligands; Mg, Ions and Ligands
The 514 human PDB structures with bound magnesium uniquely identify 172 template sequences; within the BAR+ environment the number of annotated human sequences reaches 2,688 (Table 1; annotation inherited from human templates). A further 1,063 human sequences inherit annotation within BAR+ clusters where the structural templates are from other organisms (Table 2; annotation inherited from other organisms).
When several PDB structures fall into the same cluster (Tables 1 and 2) their RMSDs are very low (<1 Å) for all the groups. This indicates that the BAR+ clusters preserve structural specificity. Therefore, when a target sequence falls into a cluster characterised by Mg binding, the corresponding site annotation can be safely inherited. This also holds for very distantly related sequences (sequence identity <30%, last column) that are in the same cluster.
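The 30% threshold used to flag distantly related targets can be made concrete with a small identity calculation. The sketch below assumes one common definition, identical residues divided by the number of columns where both sequences have a residue; other definitions (for example, dividing by the full alignment length or by the shorter sequence) exist, and the text does not state which one was used. The function names are hypothetical.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identity over alignment columns where both sequences have a residue."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = [(a, b) for a, b in zip(aln_a, aln_b) if a != gap and b != gap]
    if not aligned:
        return 0.0
    matches = sum(1 for a, b in aligned if a == b)
    return 100.0 * matches / len(aligned)


def is_distant(aln_target, aln_template, threshold=30.0):
    """True when target and template share less than `threshold` percent identity."""
    return percent_identity(aln_target, aln_template) < threshold
```

With such a helper, each target/template pair within a cluster could be tallied as above or below the 30% line, which corresponds to the last column of Tables 1 and 2.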
In BAR-hMG some 3,751 human sequences are annotated as Mg binding. About 98% of this set is annotated for the first time. For these sequences the corresponding UniProtKB entry neither has any information on Mg binding nor contains any GO term related to Mg binding.
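The counts quoted in this section and in the Methods can be cross-checked with simple arithmetic. The short script below uses only figures stated in the text; the variable names are ours.

```python
# Figures quoted in the text.
human_sequences_analysed = 84_520
annotated_from_human_templates = 2_688
annotated_from_other_templates = 1_063
clusters_with_human_templates = 110
clusters_with_other_templates = 141
human_template_structures = 514
other_template_structures = 827
structures_without_documented_role = 119

# Derived totals.
annotated_total = annotated_from_human_templates + annotated_from_other_templates
cluster_total = clusters_with_human_templates + clusters_with_other_templates
template_total = human_template_structures + other_template_structures

# Derived percentages.
share_of_analysed = 100 * annotated_total / human_sequences_analysed
share_undocumented = 100 * structures_without_documented_role / template_total

print(annotated_total, cluster_total, template_total)  # 3751 251 1341
print(f"annotated share of analysed sequences: {share_of_analysed:.1f}%")
print(f"templates without a documented Mg role: {share_undocumented:.1f}%")
```

The totals reproduce the 3,751 annotated sequences, 251 clusters and 1,341 template structures reported above, and the share of templates without a documented role rounds to the 9% quoted in the Methods.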
Localising the human Mg2+ binding sequences
Localising the human magnesium binding sequences: GO terms (Cellular Component) per template group
- Mg + Ions + Ligands: endoplasmic reticulum lumen; endoplasmic reticulum part
- Mg + Ions: site of polarized growth; cell division site
- Mg + Ligands: cytoplasmic mRNA processing body; cytoplasmic membrane-bounded vesicle; cell leading edge; intracellular membrane-bounded organelle; plasma membrane enriched fraction; internal side of plasma membrane; intracellular membrane-bounded organelle
The "Human Magnesome" database
In this work we address the problem of annotating magnesium binding sites in proteins starting from their sequence. We take advantage of a recently introduced annotation resource (BAR+), where functional and structural features derived from PDB structures are implemented into HMMs that allow sequence-to-template alignment even when sequence identity is below 30%. This procedure is based on the notion of "cluster", a set of sequences retrieved as a connected component of a graph where two proteins are linked together when they share a sequence identity greater than or equal to 40% over at least 90% of the pairwise alignment length. By restricting our analysis to clusters containing human sequences and magnesium binding PDB structures, we align with the cluster HMMs some 3,751 human sequences that fall in the same clusters and thereby inherit the magnesium binding feature. Some 370 human sequences share less than 30% identity with the template.
We therefore show that magnesium binding sites can be inherited from a given template when the sequence falls inside a well annotated cluster from which it also derives validated Pfam functional domains and GO functional terms. Presently we can annotate some 5% of the human sequences analysed as inheriting the capability of binding magnesium ions. All the analysed sequences, their binding sites, and the corresponding clusters from which they derive annotation are included in the Human Magnesome data set (BAR-hMG), freely available at http://bar.biocomp.unibo.it/mg.
RC thanks the following grants: PRIN 2009 project 009WXT45Y (Italian Ministry for University and Research: MIUR), COST BMBS Action TD1101 (European Union RTD Framework Programme), and PON project PON01_02249 (Italian Ministry for University and Research: MIUR). DP is a recipient of a PhD fellowship from the Ministry of the Italian University and Research. GP is a recipient of a research contract from Health Science and Technologies-ICIR.
This article has been published as part of BMC Bioinformatics Volume 13 Supplement 14, 2012: Selected articles from Research from the Eleventh International Workshop on Network Tools and Applications in Biology (NETTAB 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S14
1. Rude RK, Singer FR, Gruber HE: Skeletal and hormonal effects of magnesium deficiency. J Am Coll Nutr. 2009, 28 (2): 131-141. [http://www.jacn.org/content/28/2/131.long]
2. Belin RJ, He K: Magnesium physiology and pathogenic mechanisms that contribute to the development of the metabolic syndrome. Magnes Res. 2007, 20 (2): 107-129.
3. Bertini I, Gray HB, Stiefel EI, Valentine EI: Biological Inorganic Chemistry: Structure and Reactivity. 2007, Sausalito (CA): University Science Books
4. Cowan JA: Metal Activation of Enzymes in Nucleic Acid Biochemistry. Chem Rev. 1998, 98 (3): 1067-1088. 10.1021/cr960436q.
5. Bashton M, Nobeli I, Thornton JM: PROCOGNATE: a cognate ligand domain mapping for enzymes. Nucleic Acids Res. 2007, 36: D618-D622. 10.1093/nar/gkm611.
6. Dudev T, Cowan JA, Lim C: Competitive Binding in Magnesium Coordination Chemistry: Water versus Ligands of Biological Interest. J Am Chem Soc. 1999, 121 (33): 7665-7673. 10.1021/ja984470t.
7. Dudev T, Lim C: Metal Selectivity in Metalloproteins: Zn2+ vs Mg2+. J Phys Chem B. 2001, 105 (19): 4446-4452. 10.1021/jp004602g.
8. Cowan J: Structural and catalytic chemistry of magnesium-dependent enzymes. Biometals. 2002, 15 (3): 225-235. 10.1023/A:1016022730880.
9. Zaychikov E, Martin E, Denissova L, Kozlov M, Markovtsov V, Kashlev M, Heumann H, Nikiforov V, Goldfarb A, Mustaev A: Mapping of Catalytic Residues in the RNA Polymerase Active Center. Science. 1996, 273 (5271): 107-109. 10.1126/science.273.5271.107.
10. Joyce CM, Steitz TA: Function and Structure Relationships in DNA Polymerases. Annu Rev Biochem. 1994, 63: 777-822. 10.1146/annurev.bi.63.070194.004021.
11. Dudev M, Lim C: Discovering structural motifs using a structural alphabet: Application to magnesium-binding sites. BMC Bioinformatics. 2007, 8 (1): 106. 10.1186/1471-2105-8-106.
12. Bartoli L, Montanucci L, Fronza R, Martelli PL, Fariselli P, Carota L, Donvito G, Maggi GP, Casadio R: The Bologna Annotation Resource: a Non Hierarchical Method for the Functional and Structural Annotation of Protein Sequences Relying on a Comparative Large-Scale Genome Analysis. J Proteome Res. 2009, 8: 4362-4371. 10.1021/pr900204r.
13. Piovesan D, Martelli PL, Fariselli P, Zauli A, Rossi I, Casadio R: BAR-PLUS: the Bologna Annotation Resource Plus for functional and structural annotation of protein sequences. Nucleic Acids Res. 2011, 39 (Web Server issue): W197-W202.
14. Feng Z, Chen L, Maddula H, Akcan O, Oughtred R, Berman HM, Westbrook J: Ligand Depot: a data warehouse for ligands bound to macromolecules. Bioinformatics. 2004, 20 (13): 2153-2155. 10.1093/bioinformatics/bth214.
15. Berman HM, Henrick K, Nakamura H: Announcing the worldwide Protein Data Bank. Nat Struct Biol. 2003, 10 (12): 98-
16. The UniProt Consortium: Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Res. 2011, 39: D214-D219.
17. Velankar S, McNeil P, Mittard-Runte V, Suarez A, Barrell D, Apweiler R, Henrick K: E-MSD: an integrated data resource for bioinformatics. Nucleic Acids Res. 2005, 33 (Database issue): D262-D265.
18. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.