- Research article
- Open Access
Structural genomics analysis of uncharacterized protein families overrepresented in human gut bacteria identifies a novel glycoside hydrolase
© Sheydina et al.; licensee BioMed Central Ltd. 2014
Received: 9 September 2013
Accepted: 31 March 2014
Published: 17 April 2014
Bacteroides spp. form a significant part of our gut microbiome and are well known for their optimized metabolism of diverse polysaccharides. Initial analysis of the archetypal Bacteroides thetaiotaomicron genome identified 172 glycosyl hydrolases and a large number of uncharacterized proteins associated with polysaccharide metabolism.
BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large protein family consisting entirely of uncharacterized proteins. Initial sequence analysis predicted that this protein has two domains, one at the N-terminus and one at the C-terminus. A PSI-BLAST search found over 150 full-length homologs and over 90 half-size homologs consisting only of the N-terminal domain. The experimentally determined three-dimensional structure of the BT_1012 protein confirms its two-domain architecture, and structural analysis of both domains suggests their specific functions. The N-terminal domain is a putative catalytic domain with significant similarity to known glycoside hydrolases. The C-terminal domain has a beta-sandwich fold typically found in the C-terminal domains of other glycosyl hydrolases, where such domains are usually involved in substrate binding. We describe the structure of the BT_1012 protein and discuss its sequence-structure relationship and the possible functional implications.
Structural and sequence analyses of the BT_1012 protein identify it as a glycosyl hydrolase, expanding an already impressive catalog of enzymes involved in polysaccharide metabolism in Bacteroides spp. Based on this, we have renamed the Pfam families representing the two domains found in the BT_1012 protein, PF13204 and PF12904, as “putative glycoside hydrolase” and “glycoside hydrolase-associated C-terminal domain”, respectively.
Human gut microorganisms form a specialized community, the human gut microbiome, that plays an important role in normal digestive metabolism, in nutrition and, possibly, in the development of the human immune system. As part of their adaptation to the gut environment, the bacterial species forming the microbiome have developed an extensive ability to metabolize a wide variety of polysaccharides. This allows humans to utilize a broad range of plant-derived and host-secreted glycans that would otherwise be indigestible. Bacteroides spp. are an essential part of the human gut microbiome and provide us with a broad range of metabolic enzymes [2, 3]. The Gram-negative bacterium Bacteroides thetaiotaomicron is a dominant member of the normal microbiota of the human distal intestine and colon and has a large repertoire of genes for harvesting nutrients from a wide range of polysaccharides derived from both plants and the host.
BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large family of uncharacterized proteins. Sequence analysis predicted that the BT_1012 protein is related to glycoside hydrolase family 5 (GH5) of the Carbohydrate-Active Enzymes (CAZy) classification. The classification of glycoside hydrolases into families based on amino acid sequence similarity has been in place for over two decades. However, structural analysis and comparison allow us to confirm and fine-tune function predictions based on sequence analysis alone.
A search against the Pfam database predicts that this protein has two domains: the N-terminal domain belongs to the PF13204 (DUF4038) family, and the C-terminal domain belongs to the PF12904 family, currently annotated as a collagen-binding domain. Many protein families annotated as DUFs represent divergent branches of already known, well-characterized families, and DUF4038 is no exception: it belongs to Pfam clan CL0058, the TIM barrel glycosyl hydrolase superfamily. This allows us to hypothesize that it may also be a carbohydrate hydrolase.
The Pfam database currently contains over 3,500 families annotated only as “domains of unknown function”. Such families are known by the acronym DUF and are distinguished by number, as in DUF4038. In a coordinated effort, the four large-scale centers of the NIH Protein Structure Initiative have determined the first three-dimensional structures for representatives of more than 400 such families, and our group previously analyzed the first 250 of them. In this paper we analyze the crystal structure of the BT_1012 protein and combine several bioinformatics approaches to suggest its function. The structure of the BT_1012 protein was solved by the Joint Center for Structural Genomics (JCSG) and deposited in the PDB as [PDB: 3KZS] in 2009.
Results and discussion
The crystal structure of the BT_1012 (NP_8009925.1) protein from Bacteroides thetaiotaomicron VPI-5482 was determined to 2.1 Å resolution by multi-wavelength anomalous diffraction (MAD) phasing. Data-collection, model, phasing, and refinement statistics are summarized in Additional file 1: Table S1. The final model includes four molecules (residues 27–483), sixteen sulfate ions, two (4S)-2-methyl-2,4-pentanediol (MPD) molecules, eight (4R)-2-methylpentane-2,4-diol (MRD) molecules, and 1208 water molecules in the asymmetric unit. Modeling of the electron density for 2-methylpentane-2,4-diol was subjective because of the 2.10 Å resolution limit, and further analysis showed that either the R or the S enantiomer could be modeled and refined. The structure is composed of twelve alpha-helices, five 3₁₀-helices, and twenty beta-strands. Gly 0 (which remained after cleavage of the expression/purification tag), the region Ala 22–Thr 26 in subunits A, C, and D, and Ala 22–Gln 27 in subunit B were disordered and were not modeled. Subunit D is partly disordered in the asymmetric unit, and its statistics differ slightly from those of subunits A–C. The Matthews coefficient (VM; Matthews, 1968) is 2.77 Å³ Da⁻¹, and the estimated solvent content is 55.6%. The Ramachandran plot produced by MolProbity shows that 94.1% of residues lie in favored regions, with seven outliers.
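The solvent content quoted above follows from the Matthews coefficient through the relation V_solv = 1 − 1.23/V_M, where 1.23 Å³ Da⁻¹ is the volume occupied by one dalton of protein at an assumed protein density of 1.35 g/cm³. The Python sketch below (function names are our own, not taken from any crystallography package) reproduces the 55.6% estimate from V_M = 2.77 Å³ Da⁻¹. Computing V_M itself requires the unit-cell volume and molecular weight, which are not given in this section, so that helper is shown without 3KZS-specific inputs.

```python
# Sketch of the Matthews (1968) packing relations; names are illustrative.
AVOGADRO = 6.02214076e23  # molecules per mole
A3_PER_CM3 = 1e24         # cubic Angstroms per cubic centimetre


def matthews_coefficient(cell_volume_a3: float, z: int, mw_da: float) -> float:
    """V_M in A^3/Da: unit-cell volume divided by the protein mass it contains.

    z is the number of protein molecules per unit cell
    (molecules per asymmetric unit x number of symmetry operators).
    """
    return cell_volume_a3 / (z * mw_da)


def solvent_fraction(vm: float, protein_density_g_cm3: float = 1.35) -> float:
    """Fraction of crystal volume occupied by solvent, given V_M in A^3/Da.

    One dalton of protein weighs 1/N_A grams; dividing by the assumed protein
    density gives its volume, about 1.23 A^3 at 1.35 g/cm^3.
    """
    protein_volume_per_da = (1.0 / AVOGADRO) / protein_density_g_cm3 * A3_PER_CM3
    return 1.0 - protein_volume_per_da / vm


# V_M reported for BT_1012 (3KZS); cell volume and mass are not needed here.
print(f"{solvent_fraction(2.77):.1%}")  # 55.6%
```

The 1.35 g/cm³ default is the conventional protein density behind the 1.23 constant; passing a different density shows how sensitive the solvent estimate is to that assumption.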
The C-terminal domain of the BT_1012 protein, corresponding to the Pfam PF12904 family, was previously annotated in Pfam as a collagen-binding domain. This domain has a beta-sandwich fold (Figure 1A). According to Pfam, it is found almost exclusively at the C-termini of proteins containing the PF13204 domain. Structural comparisons with DALI and FATCAT found this domain to be similar to the C-terminal domains of various hydrolases with a broad spectrum of substrate specificities (for example, galactosidases, xylosidases, and dextranases). In the CAZy classification, the top hits of the FATCAT and DALI searches are proteins whose catalytic domains belong to the GH27 (GH-D), GH39 (GH-A), and GH59 (GH-A) glycosyl hydrolase families (Additional file 2: Table S2). The top ten DALI and FATCAT hits differ from those found in the N-terminal domain search (Additional file 2: Table S2), and these proteins do not share similar domain combinations. Interestingly, several proteins with overall structural similarity to 3KZS (over the entire length of the structure) are cellulases from subfamilies 8 and 34 of the GH5 family (Additional file 4: Table S4). GH5 enzymes are associated with various carbohydrate-binding modules (CBMs), some of which (CBM6, CBM15, CBM29) have beta-sandwich folds. According to the CAZy database, 16 of the 394 B. thetaiotaomicron VPI-5482 enzymes possess CBM domains; the majority of these (11) belong to the CBM32 family, which also has a beta-sandwich fold.
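DALI compares structures through their intramolecular Cα distance matrices rather than by superimposing coordinates. As a rough illustration, the sketch below computes the elastic similarity score introduced with DALI by Holm and Sander, assuming the residue correspondence between the two structures is already known and taking the constants to be θ = 0.20 and α = 20 Å, as in the original description. It omits DALI's alignment search and Z-score normalization and does not represent FATCAT's flexible alignment; the helical trace is invented test data, not 3KZS coordinates.

```python
import math

THETA_E = 0.20  # similarity threshold of the elastic score (assumed default)
ALPHA = 20.0    # envelope width in Angstroms; damps distant residue pairs


def elastic_score(coords_a, coords_b):
    """Dali-style elastic similarity score for two pre-aligned C-alpha traces.

    coords_a[k] and coords_b[k] are (x, y, z) tuples of equivalent residues.
    Distances are compared within each structure, so no superposition is needed.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both traces must contain the same aligned residues")
    n = len(coords_a)
    score = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                score += THETA_E  # diagonal term is defined as the threshold
                continue
            d_a = math.dist(coords_a[i], coords_a[j])
            d_b = math.dist(coords_b[i], coords_b[j])
            d_mean = 0.5 * (d_a + d_b)
            if d_mean == 0.0:
                score += THETA_E  # degenerate pair: both distances are zero
                continue
            relative_deviation = abs(d_a - d_b) / d_mean
            envelope = math.exp(-((d_mean / ALPHA) ** 2))
            score += (THETA_E - relative_deviation) * envelope
    return score


# Toy helical trace (not real coordinates): a rotated copy keeps every
# internal distance, a copy stretched along the helix axis does not.
trace = [(2.3 * math.cos(k), 2.3 * math.sin(k), 1.5 * k) for k in range(12)]
rotated = [(-y, x, z) for x, y, z in trace]
stretched = [(x, y, 1.3 * z) for x, y, z in trace]
print(elastic_score(trace, trace), elastic_score(trace, rotated))
print(elastic_score(trace, trace) > elastic_score(trace, stretched))
```

Because only internal distances enter the score, rigid-body rotation leaves it unchanged, while any distortion of the fold lowers it.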
The whole structure 3KZS and functional prediction for each domain
Structure similarity searches with DALI and FATCAT using the whole BT_1012 structure still identified single-domain proteins as the top hits. We therefore separately analyzed the top multi-domain proteins found in these searches (Additional file 4: Table S4). With this constraint, the top match is the beta-galactosidase from Bacillus circulans sp. alkalophilus (PDB: 3TTS). This enzyme has three domains and an atypical active site. According to its authors, the function of the third domain, which has a beta-sandwich fold, is purely structural, because they found no surface clefts or cavities that could serve a carbohydrate-binding function.
Distant homology recognition for the BT_1012 protein with FFAS found that the top ten hits belong to the GH5 family. All of these GH5 hits have a single (catalytic) domain, with one exception: the arabinoxylan-specific xylanase from Clostridium thermocellum ATCC 27405 (PDB: 2Y8K), which has an additional domain belonging to the CBM6 family. This module was shown to increase the thermostability of the catalytic domain and to be involved in binding cellohexaose or xylohexaose. Another example of a glycoside hydrolase with an additional C-terminal domain that is a distant homolog of 3KZS is β-xylosidase II from Caulobacter crescentus CB15 (PDB: 4EKJ). This protein belongs to the GH39 family and has a C-terminal domain that regulates the accessibility and molecular topography of the active site. Thus, several examples support our hypothesis that the C-terminal domain has a supportive and regulatory function. This domain is always found in two-domain proteins, following the PF13204 domain, and about 60% of PF13204 family members carry it (see more examples in Additional file 4: Table S4).
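The statements that PF12904 always follows PF13204 and that about 60% of PF13204 members carry it are simple counts over domain architectures. The sketch below assumes each protein's architecture is available as an ordered list of Pfam accessions (N- to C-terminus); how those lists are retrieved from Pfam is not shown, and the toy input is hypothetical, not Pfam data.

```python
from typing import List, Sequence, Tuple

CATALYTIC = "PF13204"   # N-terminal putative glycoside hydrolase domain
C_TERMINAL = "PF12904"  # glycoside hydrolase-associated C-terminal domain


def architecture_stats(architectures: Sequence[List[str]]) -> Tuple[float, bool]:
    """Summarise ordered domain architectures (one list per protein, N to C).

    Returns the fraction of PF13204-containing proteins that also carry
    PF12904, and whether PF12904 always lies C-terminal to PF13204.
    """
    with_catalytic = [arch for arch in architectures if CATALYTIC in arch]
    with_both = [arch for arch in with_catalytic if C_TERMINAL in arch]
    always_after = all(
        arch.index(C_TERMINAL) > arch.index(CATALYTIC) for arch in with_both
    )
    fraction = len(with_both) / len(with_catalytic) if with_catalytic else 0.0
    return fraction, always_after


# Hypothetical architectures for illustration only.
toy_architectures = [
    ["PF13204", "PF12904"],
    ["PF13204", "PF12904"],
    ["PF13204", "PF12904"],
    ["PF13204"],
    ["PF13204"],
    ["OTHER_DOMAIN"],  # placeholder; ignored because it lacks PF13204
]
print(architecture_stats(toy_architectures))  # (0.6, True)
```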
Many carbohydrate-related proteins in Bacteroides are grouped into polysaccharide utilization loci, often with well-defined induction specificity. However, the gene encoding BT_1012 is not part of any of these loci. Genomic-context analysis using the MicrobesOnline database and STRING revealed that BT_1012 is colocalized with an alpha-rhamnosidase, and rhamnosidases (NOG10735 in STRING) are linked to it by genome neighborhood in other species too. Rhamnose is commonly bound to other sugars, but is also a common component of plant glycosides. This suggests that BT_1012 may act as a partner of alpha-rhamnosidase in plant sugar degradation. Together with the observation that the catalytic domain has longer loops than mannanases (for example, PDB: 1RH9 and 1UUQ), this may suggest that it can make additional interactions with long polysaccharide substrates.
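Genome-context resources such as MicrobesOnline and STRING score how consistently two gene families sit close together across many genomes, using their own, more elaborate methods. The minimal sketch below shows only the basic per-genome test, under two assumptions of ours: genes count as neighbors if they lie within a few positions of each other in coordinate order (the threshold of 3 is arbitrary), and strand agreement is optional. All locus tags and coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    locus_tag: str
    start: int   # leftmost coordinate on the contig
    end: int
    strand: str  # "+" or "-"


def are_neighbors(genes: List[Gene], tag_a: str, tag_b: str,
                  max_genes_apart: int = 3, require_same_strand: bool = False) -> bool:
    """True if two genes on one contig are at most max_genes_apart positions apart."""
    ordered = sorted(genes, key=lambda g: g.start)
    position = {g.locus_tag: idx for idx, g in enumerate(ordered)}
    by_tag = {g.locus_tag: g for g in genes}
    if require_same_strand and by_tag[tag_a].strand != by_tag[tag_b].strand:
        return False
    return abs(position[tag_a] - position[tag_b]) <= max_genes_apart


# Hypothetical contig; tags and coordinates are invented for illustration.
contig = [
    Gene("gene_1", 100, 900, "+"),
    Gene("bt1012_like", 1000, 2500, "+"),
    Gene("rhamnosidase_like", 2600, 4800, "+"),
    Gene("gene_4", 5000, 5600, "-"),
    Gene("gene_5", 6000, 7000, "+"),
    Gene("gene_6", 7100, 8000, "+"),
    Gene("gene_7", 8200, 9000, "+"),
]
print(are_neighbors(contig, "bt1012_like", "rhamnosidase_like"))  # True
print(are_neighbors(contig, "gene_1", "gene_7"))                  # False
```

Repeating such a test across many genomes, and asking how often the pair co-occurs beyond chance, is the step the dedicated resources add.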
The crystal structure of the BT_1012 protein, together with structure-based sequence-structure-function analysis, suggests that BT_1012 and its approximately 150 full-length homologs, ranging in sequence identity from 40 to 60%, are two-domain glycoside hydrolases comprising an N-terminal catalytic domain and a C-terminal auxiliary domain that may be involved in stabilizing or regulating the catalytic domain. Based on our findings, we have renamed the PF13204 family “putative glycoside hydrolase” and the PF12904 family “glycoside hydrolase-associated C-terminal domain”.
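Identity ranges such as the 40-60% quoted here depend on how percent identity is defined. A common convention, assumed in this sketch, counts identical residues over alignment columns where neither sequence has a gap. The alignment rows are invented for illustration; in practice they would come from a multiple sequence alignment of the homolog set.

```python
from typing import Dict


def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity of two aligned rows over columns where neither has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a.upper() == b.upper())
    return 100.0 * identical / len(pairs)


def identity_to_query(query: str, homologs: Dict[str, str]) -> Dict[str, float]:
    """Identity of each aligned homolog row to the query row."""
    return {name: percent_identity(query, row) for name, row in homologs.items()}


# Invented alignment rows for illustration; not BT_1012 sequences.
query = "MKT-LLVAG"
homologs = {"hom_1": "MKSALLIAG", "hom_2": "MRT-LIVSG"}
print(identity_to_query(query, homologs))  # {'hom_1': 75.0, 'hom_2': 62.5}
```

Other conventions divide by the shorter sequence length or by the full alignment length, and can give noticeably different ranges for the same homologs.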
Protein production and crystallization of BT_1012 were carried out using standard JCSG protocols. Data collection was performed at SSRL beamline 11-1. The crystal structure was determined by multi-wavelength anomalous diffraction (MAD) phasing using selenomethionine-derivatized protein. X-ray data collection, processing, structure solution, tracing, crystallographic refinement, and model building were performed using BLU-ICE, MOSFLM/SCALA, SHELXD/AUTOSHARP, ARP/wARP, REFMAC, and COOT. Modeling, phasing, and refinement statistics were calculated according to the standard JCSG protocol [33–36]. After chains A, B, and C had been built and refined, anomalous difference Fourier maps and isomorphous difference Fourier maps suggested that there was a fourth subunit in the crystallographic asymmetric unit. However, the electron density for this subunit is poor, and both the electron density map and the anomalous difference Fourier maps indicate that it is disordered. The anomalous difference Fourier peaks were used as a guide for building chain D, and their pattern supports modeling this subunit in two half-occupancy conformations. Note that although chain D part B would clash with its own symmetry mate, it does not clash with the symmetry mate of part A; likewise, chain D part A does not clash with the symmetry mate of chain D part B.
To find homologs for sequence conservation analysis, PSI-BLAST was used to search the UniRef90 database for three iterations with an e-value cutoff of 0.0001, identifying 150 homologs with sequence similarity of 35-95%. MAFFT was used for multiple sequence alignment. Figures were prepared using PyMOL and ESPript. Protein secondary-structure elements were assigned according to the DSSP database of secondary structure assignments. Phylogenetic analysis was performed using distance-based approaches, such as FastME 1.1 and neighbor-joining from PHYLIP 3.66; pairwise distances were calculated with TREE-PUZZLE 5.2 using the VT model. The phylogenetic tree was drawn and visualized with FORESTER. Pfam data are from release 27.0.
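For readers unfamiliar with distance-based tree building, the sketch below is a plain Python implementation of the textbook neighbor-joining algorithm (Saitou and Nei's method in the Studier-Keppler formulation), taking a precomputed distance matrix such as one produced by TREE-PUZZLE. It is not PHYLIP's implementation and makes no special provision for ties or negative branch lengths. The five-taxon matrix in the usage example is a small additive toy matrix, not data from this study; for additive distances, neighbor joining recovers the generating tree exactly.

```python
from itertools import combinations
from typing import Dict, List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted tree (Newick string) from a symmetric distance matrix."""
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    # Distances keyed by cluster label, so joined clusters can be added by name.
    d: Dict[str, Dict[str, float]] = {
        a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)
    }
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Q criterion: join the pair that minimises (n - 2) d(i, j) - r_i - r_j.
        i, j = min(
            combinations(nodes, 2),
            key=lambda pair: (n - 2) * d[pair[0]][pair[1]] - r[pair[0]] - r[pair[1]],
        )
        # Branch lengths from each joined cluster to the new internal node.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three clusters left: attach them to a single central node.
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    return f"({a}:{la:.4f},{b}:{d[a][b] - la:.4f},{c}:{d[a][c] - la:.4f});"


taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(taxa, matrix))
# (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

Minimum-evolution methods such as FastME start from the same kind of distance matrix but search for the tree with the smallest total branch length rather than joining pairs greedily.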
We are grateful to the Sanford Burnham Medical Research Institute (SBMRI) and UC San Diego for hosting the DUF annotation jamboree in June 2013, which allowed the authors to collaborate on this work. We would like to thank all the participants of this workshop for their intellectual contributions to this work; in particular, we acknowledge contributions from L. Aravind, Alex Bateman, Penny Coggill, Debanu Das, Rob Finn, William Hwang, Lukasz Jaroszewski, Alexey Murzin, Padmaja Natarajan, Marco Punta, Neil Rawlings, Mayya Sedova and John Wooley. We also thank the members of the JCSG high-throughput structural biology pipeline for determining the structure of the BT_1012 protein and depositing the coordinates in the PDB.
Wellcome Trust (grant number WT077044/Z/05/Z); Howard Hughes Medical Institute (R.D.F.); NIH U54 GM094586; National Science Foundation (IIS-0646708 and IIS-1153617). Funding for open access charge: NIH U54 GM094586 and SBMRI institutional funds. Portions of this research were carried out at the Stanford Synchrotron Radiation Lightsource, a Directorate of SLAC National Accelerator Laboratory and an Office of Science User Facility operated for the U.S. Department of Energy Office of Science by Stanford University. The SSRL Structural Molecular Biology Program is supported by the DOE Office of Biological and Environmental Research, and by the National Institutes of Health, National Institute of General Medical Sciences (including P41GM103393). The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of DOE, NSF, NIGMS, NCRR or NIH.
- Rosenstiel P: Stories of love and hate: innate immunity and host-microbe crosstalk in the intestine. Curr Opin Gastroenterol. 2013, 29 (2): 125-132. 10.1097/MOG.0b013e32835da2c7.
- Tasse L, Bercovici J, Pizzut-Serin S, Robe P, Tap J, Klopp C, Cantarel BL, Coutinho PM, Henrissat B, Leclerc M, Doré J, Monsan P, Remaud-Simeon M, Potocki-Veronese G: Functional metagenomics to mine the human gut microbiome for dietary fiber catabolic enzymes. Genome Res. 2010, 20 (11): 1605-1612. 10.1101/gr.108332.110.
- Quiocho FA: Carbohydrate-binding proteins: tertiary structures and protein-sugar interactions. Annu Rev Biochem. 1986, 55: 287-315. 10.1146/annurev.bi.55.070186.001443.
- Xu J, Bjursell MK, Himrod J, Deng S, Carmichael LK, Chiang HC, Hooper LV, Gordon JI: A genomic view of the human-bacteroides thetaiotaomicron symbiosis. Science. 2003, 299 (5615): 2074-2076. 10.1126/science.1080029.
- Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V, Henrissat B: The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Res. 2009, 37 (Database issue): D233-D238.
- Henrissat B, Davies G: Structural and sequence-based classification of glycoside hydrolases. Curr Opin Struct Biol. 1997, 7 (5): 637-644. 10.1016/S0959-440X(97)80072-3.
- Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, Pang N, Forslund K, Ceric G, Clements J, Heger A, Holm L, Sonnhammer EL, Eddy SR, Bateman A, Finn RD: The Pfam protein families database. Nucleic Acids Res. 2012, 40 (Database issue): D290-D301.
- Bateman A, Coggill P, Finn RD: DUFs: families in search of function. Acta Crystallogr Sect F Struct Biol Cryst Commun. 2010, 66 (Pt 10): 1148-1152.
- Jaroszewski L, Li Z, Krishna SS, Bakolitsa C, Wooley J, Deacon AM, Wilson IA, Godzik A: Exploration of uncharted regions of the protein universe. PLoS Biol. 2009, 7 (9): e1000205. 10.1371/journal.pbio.1000205.
- Chen VB, Arendall WB, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, Murray LW, Richardson JS, Richardson DC: MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr D Biol Crystallogr. 2010, 66 (Pt 1): 12-21.
- UniProt Consortium: Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2012, 40 (Database issue): D71-75.
- Dusko Ehrlich S, MetaHIT consortium: [Metagenomics of the intestinal microbiota: potential applications]. Gastroenterol Clin Biol. 2010, 34 (Suppl 1): S23-S28.
- Boraston AB, Bolam DN, Gilbert HJ, Davies GJ: Carbohydrate-binding modules: fine-tuning polysaccharide recognition. Biochem J. 2004, 382 (Pt 3): 769-781.
- Holm L, Sander C: Dali: a network tool for protein structure comparison. Trends Biochem Sci. 1995, 20 (11): 478-480. 10.1016/S0968-0004(00)89105-7.
- Ye Y, Godzik A: Multiple flexible structure alignment using partial order graphs. Bioinformatics. 2005, 21 (10): 2362-2369. 10.1093/bioinformatics/bti353.
- Jaroszewski L, Rychlewski L, Li Z, Li W, Godzik A: FFAS03: a server for profile-profile sequence alignments. Nucleic Acids Res. 2005, 33 (Web Server issue): W284-288.
- Shallom D, Shoham Y: Microbial hemicellulases. Curr Opin Microbiol. 2003, 6 (3): 219-228. 10.1016/S1369-5274(03)00056-0.
- Maksimainen M, Paavilainen S, Hakulinen N, Rouvinen J: Structural analysis, enzymatic characterization, and catalytic mechanisms of beta-galactosidase from Bacillus circulans sp. alkalophilus. FEBS J. 2012, 279 (10): 1788-1798. 10.1111/j.1742-4658.2012.08555.x.
- Correia MA, Mazumder K, Bras JL, Firbank SJ, Zhu Y, Lewis RJ, York WS, Fontes CM, Gilbert HJ: Structure and function of an arabinoxylan-specific xylanase. J Biol Chem. 2011, 286 (25): 22510-22520. 10.1074/jbc.M110.217315.
- Santos CR, Polo CC, Correa JM, Simao Rde C, Seixas FA, Murakami MT: The accessory domain changes the accessibility and molecular topography of the catalytic interface in monomeric GH39 beta-xylosidases. Acta Crystallogr D Biol Crystallogr. 2012, 68 (Pt 10): 1339-1345.
- Dehal PS, Joachimiak MP, Price MN, Bates JT, Baumohl JK, Chivian D, Friedland GD, Huang KH, Keller K, Novichkov PS, Dubchak IL, Alm EJ, Arkin AP: MicrobesOnline: an integrated portal for comparative and functional genomics. Nucleic Acids Res. 2010, 38 (Database issue): D396-400.
- Gotō M: Fundamentals of bacterial plant pathology. 1992, San Diego: Academic Press.
- Yin Y, Mao X, Yang J, Chen X, Mao F, Xu Y: dbCAN: a web resource for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 2012, 40 (Web Server issue): W445-451.
- Elsliger MA, Deacon AM, Godzik A, Lesley SA, Wooley J, Wuthrich K, Wilson IA: The JCSG high-throughput structural biology pipeline. Acta Crystallogr Sect F Struct Biol Cryst Commun. 2010, 66 (Pt 10): 1137-1142.
- McPhillips TM, McPhillips SE, Chiu HJ, Cohen AE, Deacon AM, Ellis PJ, Garman E, Gonzalez A, Sauter NK, Phizackerley RP, Soltis SM, Kuhn P: Blu-Ice and the distributed control system: software for data acquisition and instrument control at macromolecular crystallography beamlines. J Synchrotron Radiat. 2002, 9 (Pt 6): 401-406.
- Battye TG, Kontogiannis L, Johnson O, Powell HR, Leslie AG: iMOSFLM: a new graphical interface for diffraction-image processing with MOSFLM. Acta Crystallogr D Biol Crystallogr. 2011, 67 (Pt 4): 271-281.
- The CCP4 suite: Programs for protein crystallography. Acta Crystallogr D Biol Crystallogr. 1994, 50 (Pt 5): 760-763.
- Sheldrick GM: A short history of SHELX. Acta Crystallogr A. 2008, 64 (Pt 1): 112-122.
- Vonrhein C, Blanc E, Roversi P, Bricogne G: Automated structure solution with autoSHARP. Methods Mol Biol. 2007, 364: 215-230.
- Langer G, Cohen SX, Lamzin VS, Perrakis A: Automated macromolecular model building for X-ray crystallography using ARP/wARP version 7. Nat Protoc. 2008, 3 (7): 1171-1179. 10.1038/nprot.2008.91.
- Winn MD, Murshudov GN, Papiz MZ: Macromolecular TLS refinement in REFMAC at moderate resolutions. Methods Enzymol. 2003, 374: 300-321.
- Emsley P, Cowtan K: Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr. 2004, 60 (Pt 12 Pt 1): 2126-2132.
- Diederichs K, Karplus PA: Improved R-factors for diffraction data analysis in macromolecular crystallography. Nat Struct Biol. 1997, 4 (4): 269-275. 10.1038/nsb0497-269.
- Weiss MS, Hilgenfeld R: On the use of the merging R factor as a quality indicator for X-ray data. J Appl Crystallogr. 1997, 30 (2): 203-205. 10.1107/S0021889897003907.
- Weiss MS, Metzner HJ, Hilgenfeld R: Two non-proline cis peptide bonds may be important for factor XIII function. FEBS Lett. 1998, 423 (3): 291-296. 10.1016/S0014-5793(98)00098-2.
- Cruickshank DW: Remarks about protein structure precision. Acta Crystallogr D Biol Crystallogr. 1999, 55 (Pt 3): 583-601.
- Katoh K, Kuma K, Toh H, Miyata T: MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005, 33 (2): 511-518. 10.1093/nar/gki198.
- DeLano W: The PyMOL Molecular Graphics System, Version 1.2r3pre. 2002, DeLano Scientific: San Carlos, CA.
- Gouet P, Courcelle E, Stuart DI, Metoz F: ESPript: analysis of multiple sequence alignments in PostScript. Bioinformatics. 1999, 15 (4): 305-308. 10.1093/bioinformatics/15.4.305.
- Joosten RP, te Beek TA, Krieger E, Hekkelman ML, Hooft RW, Schneider R, Sander C, Vriend G: A series of PDB related databases for everyday needs. Nucleic Acids Res. 2011, 39 (Database issue): D411-419.
- Desper R, Gascuel O: Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle. J Comput Biol. 2002, 9 (5): 687-705. 10.1089/106652702761034136.
- Felsenstein J: PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics. 1989, 5: 164-166.
- Schmidt HA, Strimmer K, Vingron M, von Haeseler A: TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics. 2002, 18 (3): 502-504. 10.1093/bioinformatics/18.3.502.
- Han MV, Zmasek CM: phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics. 2009, 10: 356. 10.1186/1471-2105-10-356.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.