Variance adjusted weighted UniFrac: a powerful beta diversity measure for comparing communities based on phylogeny
 Qin Chang^{1},
 Yihui Luan^{1} and
 Fengzhu Sun^{2, 3}
DOI: 10.1186/1471-2105-12-118
© Chang et al; licensee BioMed Central Ltd. 2011
Received: 30 September 2010
Accepted: 25 April 2011
Published: 25 April 2011
Abstract
Background
Beta diversity, which involves the assessment of differences between communities, is an important problem in ecological studies. Many statistical methods have been developed to quantify beta diversity, and among them, UniFrac and weighted UniFrac (WUniFrac) are widely used. The WUniFrac is a weighted sum of branch lengths in a phylogenetic tree of the sequences from the communities. However, WUniFrac does not consider the variation of the weights under random sampling, which reduces its power to detect differences between communities.
Results
We develop a new statistic termed variance adjusted weighted UniFrac (VAWUniFrac) to compare two communities based on the phylogenetic relationships of the individuals. The VAWUniFrac is used to test if the two communities are different. To test the power of VAWUniFrac, we first ran a series of simulations, which revealed that it always outperforms WUniFrac and that it also outperforms UniFrac when the individuals are not uniformly distributed. Next, all three methods were applied to three large 16S rRNA sequence collections (human skin bacteria, mouse gut microbial communities, and microbial communities from hypersaline soil and sediments) and to a tropical forest census dataset. Both simulations and applications to real data show that VAWUniFrac can satisfactorily measure differences between communities, considering not only the species composition but also abundance information.
Conclusions
VAWUniFrac can recover biological insights that cannot be revealed by other beta diversity measures, and it provides a novel alternative for comparing communities.
Background
The assessment of differences between communities is an important problem in ecological studies. By comparing the compositions of natural communities from different environments, locations or time periods, we can learn how specific factors affect community assembly and how species or individuals associate with each other [1–3]. The development of next-generation high-throughput sequencers, such as the 454 Life Sciences Genome Sequencer FLX System, the Illumina 1G Genome Analysis System, and Applied Biosystems SOLiD Sequencing, has profoundly changed our approaches to ecological studies. With the rapid development of sequencing technologies, it is now possible to sequence a particular gene, such as the 16S rRNA gene, at very high depth without culturing [2, 4–6]. The new sequencing technologies also make it possible to efficiently and economically sequence the whole metagenome within a community [7, 8]. These techniques have revealed the high microbial diversity present in the ocean, soil and human tissues.
Many statistics have been proposed to compare communities based on genomic sequence data of a specific gene sampled from the communities. These include LIBSHUFF [9], ∫-LIBSHUFF [10], analysis of molecular variance (AMOVA) [11, 12], and homogeneity of molecular variance (HOMOVA) [13]. They mainly depend on the distances or similarities between sequences within the same community and between different communities. Other statistical methods for community comparison depend on a specific phylogenetic tree of the sequences, and the tree can be either predefined or inferred from the genomic sequences. Such statistics include the parsimony test [14, 15], UniFrac [16], and weighted UniFrac (WUniFrac) [17]. In the parsimony test, each sequence is labeled according to the community it belongs to, and then the parsimony score, the minimal number of changes along the tree necessary to explain all the labels of the sequences, is calculated according to Fitch's parsimony algorithm [18]. The statistical significance of the parsimony score has been evaluated using two different randomization procedures. The first procedure randomizes the tree for the sequences [14], and the second randomizes the labels of the sequences conditional on the tree [16]. These two randomization procedures test different hypotheses. The first evaluates whether the sequences from the communities cluster randomly, and the second evaluates whether the sequences are randomly distributed on the leaves of the given phylogenetic tree.
Lozupone et al. proposed a novel statistical method termed UniFrac [16] and a weighted UniFrac (WUniFrac) [17] to test if two communities are significantly different based on a phylogenetic tree. They have been widely applied in numerous recent studies to compare microbial communities, and significant biological insights have been obtained [2, 4, 19]. The procedures for calculating UniFrac and WUniFrac can be briefly described as follows. A phylogenetic tree composed of sequences from all the communities is first constructed using a phylogenetic analysis tool such as PHYLIP [20]. As in the parsimony test, each sequence is labeled according to the community it comes from. Then UniFrac measures the distance between communities by the fraction of the length of the tree branches that lead to descendants from a single community, but not from both communities [16]. The WUniFrac takes abundance information into consideration and weights each branch length by the difference of the fractions of sequences belonging to the branch for the two communities [17]. The significance of both tests is evaluated by randomizing the labels of the sequences. Using this randomization procedure, both UniFrac and WUniFrac test the hypothesis that the sequence labels are random along the leaves of the tree.
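The unweighted calculation described above can be illustrated with a minimal Python sketch. The tree encoding (a list of branch lengths paired with the leaves below each branch), the function name `unifrac_distance`, and the toy tree are assumptions made for illustration; they are not taken from the UniFrac software.

```python
def unifrac_distance(branches, labels):
    """Fraction of total branch length leading to leaves of only one community.

    branches: list of (length, leaves_below_branch) pairs (hypothetical encoding).
    labels:   dict mapping each leaf name to its community label, "A" or "B".
    """
    unique_length = 0.0
    total_length = 0.0
    for length, leaves in branches:
        communities = {labels[leaf] for leaf in leaves}
        total_length += length
        if len(communities) == 1:  # all descendants come from one community
            unique_length += length
    return unique_length / total_length


# Toy tree ((a1:1, b1:1):1, (a2:1, a3:1):2), listed branch by branch.
toy_branches = [
    (1.0, ["a1"]), (1.0, ["b1"]), (1.0, ["a1", "b1"]),
    (1.0, ["a2"]), (1.0, ["a3"]), (2.0, ["a2", "a3"]),
]
toy_labels = {"a1": "A", "a2": "A", "a3": "A", "b1": "B"}
toy_distance = unifrac_distance(toy_branches, toy_labels)
```

In this toy tree only the internal branch above a1 and b1 is shared by both communities, so six of the seven units of branch length count as unique.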
Despite the many studies on statistical methods to compare communities, there has been some confusion about the hypotheses being tested by the different statistics. Schloss [21] addressed this important issue using simulations. It was shown that AMOVA can be used to test if sequences from the different communities have the same mean (center) and HOMOVA can be used to evaluate if the variations within the communities are the same. On the other hand, the parsimony test, UniFrac and WUniFrac are valid for evaluating the general hypothesis that the communities are the same.
Note that in WUniFrac the length of a branch is weighted by the difference of relative abundances of the two communities for that branch of the tree. Under the null hypothesis that the two communities are the same, the weights for the different branches in WUniFrac have different variances, and we provide a formula for calculating the variance in this paper. Based on the variance formula, we propose a new weighting scheme for the branch length in WUniFrac by taking the variation of the weight into consideration. The resulting statistic is termed Variance Adjusted Weighted UniFrac (VAWUniFrac). The statistical significance of VAWUniFrac is evaluated by randomizing the labels of the sequences along the leaves of the tree. Similar to UniFrac and WUniFrac, the VAWUniFrac can be used to evaluate if two communities are different. More precisely, it tests the hypothesis that the sequences from the communities are randomly distributed along the leaves of the tree. To study the power of this new statistic, we first carried out simulation studies similar to those in [21] to detect differences between communities based on UniFrac, WUniFrac and VAWUniFrac. The power of VAWUniFrac is always higher than that of WUniFrac. When the individuals are uniformly distributed in both communities, UniFrac can be more powerful than both WUniFrac and VAWUniFrac. However, when the individuals are not uniformly distributed, VAWUniFrac is more powerful than both UniFrac and WUniFrac. We also used UniFrac, WUniFrac, and VAWUniFrac in a reanalysis of four real datasets: three 16S rRNA sequence collections from different studies and one forest census dataset. Since VAWUniFrac demonstrated a capacity to gain novel biological insights beyond that of either UniFrac or WUniFrac, we concluded that VAWUniFrac offers a highly useful alternative approach for comparing communities.
Methods
UniFrac and WUniFrac
WUniFrac is the ratio of a weighted sum of branch lengths (the numerator) to a normalizing factor (the denominator) [17]. In the numerator, n is the number of branches in the tree, b_{ i } is the length of branch i, A_{ i } and B_{ i } are the numbers of individuals that descend from branch i in communities A and B, respectively, and A_{ T } and B_{ T } are the total numbers of individuals in communities A and B, respectively. In the denominator, n' is the number of different individuals in the two communities, d_{ j } is the distance from the root to individual j, while α_{ j } and β_{ j } are the numbers of times the sequences were observed in communities A and B, respectively (all the above numbers of individuals should be counted with multiplicity, except n'). The same notation will be used in the rest of the paper.
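For reference, the weighted UniFrac ratio that these symbols describe can be written as below. This is a reconstruction based on the symbol definitions in this section and on the normalized weighted UniFrac of Lozupone et al. [17]; the name ω_{ i } for the branch weight is assumed to match its later use in this paper.

```latex
\[
  WU \;=\;
  \frac{\displaystyle\sum_{i=1}^{n} b_i \left| \frac{A_i}{A_T} - \frac{B_i}{B_T} \right|}
       {\displaystyle\sum_{j=1}^{n'} d_j \left( \frac{\alpha_j}{A_T} + \frac{\beta_j}{B_T} \right)},
  \qquad
  \omega_i \;=\; \left| \frac{A_i}{A_T} - \frac{B_i}{B_T} \right| .
\]
```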
A novel variance adjusted weighted UniFrac (VAWUniFrac) for comparing communities
From the definition of WUniFrac given above, we note that it does not consider the variance of the weight ω_{ i } for the ith branch length under the assumption that the sequence labels are randomly distributed along the leaves of the tree. By ignoring the variance of ω_{ i }, WUniFrac may not characterize the true relationships between communities well. Hence, we propose to adjust the weight ω_{ i } as follows. Given individuals from two communities, A and B, we first generate a phylogenetic tree with all the A_{ T } + B_{ T } individuals in the two communities as leaves. Each leaf is labeled "A" or "B" to represent the community from which it comes. We test the hypothesis that the labels of the individuals are randomly distributed on the phylogenetic tree.
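The variance of the branch weight under this null hypothesis can be worked out as sketched below. The derivation relies on the hypergeometric distribution of A_{ i } noted in the Conclusions; the symbols m and m_{ i } are introduced here for illustration, and the exact form of the adjusted statistic in the original article is not reproduced in this text.

```latex
% Assumed notation: m = A_T + B_T leaves in total; m_i = A_i + B_i leaves below branch i.
\begin{align*}
  A_i &\sim \mathrm{Hypergeometric}(m,\, A_T,\, m_i), \qquad B_i = m_i - A_i, \\
  \frac{A_i}{A_T} - \frac{B_i}{B_T}
      &= \frac{m}{A_T B_T}\, A_i - \frac{m_i}{B_T}, \\
  \mathrm{Var}(A_i)
      &= m_i \,\frac{A_T}{m}\,\frac{B_T}{m}\,\frac{m - m_i}{m - 1}, \\
  \mathrm{Var}\!\left( \frac{A_i}{A_T} - \frac{B_i}{B_T} \right)
      &= \frac{m^2}{A_T^2 B_T^2}\,\mathrm{Var}(A_i)
       = \frac{m_i\,(m - m_i)}{A_T B_T\,(m - 1)} .
\end{align*}
```

Because A_{ T }, B_{ T } and m are the same for every branch, the variance differs between branches only through the factor m_{ i }(m - m_{ i }), which suggests scaling each weight by the inverse square root of that factor.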
Similar to UniFrac and WUniFrac, the VAWUniFrac aims to test if the two communities are different and, more specifically, if the sequences are randomly distributed along the leaves of the tree. The statistical significance of VAWUniFrac is evaluated by randomizing the labels of the sequences. We are interested in which of the methods, UniFrac, WUniFrac (WU), VAWUniFrac (T) or SqT, is most powerful in detecting the relationship between two communities if they are related.
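A short Python sketch may help make the idea of variance adjustment concrete. It assumes that, under random labelling, the number of community A leaves below a branch is hypergeometric, which gives the per-branch variance m_i(m - m_i) / (A_T B_T (m - 1)) with m = A_T + B_T and m_i = A_i + B_i. The function names and the scaling by sqrt(m_i(m - m_i)) are illustrative assumptions, not the authors' implementation, and the normalizing denominator of the statistic is omitted.

```python
from math import comb, sqrt


def null_variance(a_total, b_total, m_i):
    """Closed-form variance of A_i/A_T - B_i/B_T under random labelling."""
    m = a_total + b_total
    return m_i * (m - m_i) / (a_total * b_total * (m - 1))


def null_variance_bruteforce(a_total, b_total, m_i):
    """Same variance, obtained by summing over the hypergeometric distribution."""
    m = a_total + b_total
    mean = 0.0
    second_moment = 0.0
    for k in range(max(0, m_i - b_total), min(m_i, a_total) + 1):
        prob = comb(a_total, k) * comb(b_total, m_i - k) / comb(m, m_i)
        weight = k / a_total - (m_i - k) / b_total
        mean += prob * weight
        second_moment += prob * weight * weight
    return second_moment - mean * mean


def variance_adjusted_sum(branches, a_total, b_total):
    """Sum of b_i * |A_i/A_T - B_i/B_T| / sqrt(m_i * (m - m_i)).

    branches: list of (length, A_i, B_i) triples (hypothetical encoding).
    Branches with m_i equal to 0 or m have zero variance and are skipped.
    """
    m = a_total + b_total
    total = 0.0
    for length, a_i, b_i in branches:
        m_i = a_i + b_i
        if m_i == 0 or m_i == m:
            continue
        weight = abs(a_i / a_total - b_i / b_total)
        total += length * weight / sqrt(m_i * (m - m_i))
    return total
```

Comparing `null_variance` with `null_variance_bruteforce` for a few small inputs is a quick way to check the closed form.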
Simulation studies to compare the power of the statistics for detecting the relationships between communities
Schloss [21] evaluated the power of several different statistics for comparing the relationships between communities and studied the validity of the different statistics for testing various hypotheses. These statistical techniques included TreeClimber [15], UniFrac [16], WUniFrac [17], ∫-LIBSHUFF [10], AMOVA [22], and HOMOVA [23]. In our study, similar simulation approaches are used to compare the power of UniFrac, WUniFrac, VAWUniFrac and its variant SqT. Our objective is to understand which statistics are the most powerful and under what conditions. Since the simulation approaches are similar to those in [21], we only present a very brief description.
In the simulations, a community was represented by the interior of a circle or an ellipse with a certain density. Changing the overlap between circles (ellipses) or the distribution patterns of samples in circles (ellipses) represented changing the differences between the communities. The maximum distance between any two points in one community was designed to be 0.3 units according to the distance between sequences from different phyla [21]. Three classes of overlapping patterns were simulated. In the first class, the two communities were represented as circles, and points were uniformly sampled from each circle. The different overlapping patterns were obtained by changing the center and the radius of one circle. In the second class, the two communities were represented as ellipses, and points were uniformly sampled from each ellipse. The overlapping patterns were obtained by rotating one of the ellipses. In the third class, the two communities were represented by the same circle. One community was uniformly sampled from the circle, and the distribution of the points in the other community was not uniform.
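Uniform sampling from a circle or an ellipse, as used in the first two classes, can be sketched in Python as follows. This is one common way to draw such samples and is not necessarily the procedure used in [21]; the function names and the ellipse semi-axes in the example are hypothetical, while the default radius of 0.15 matches the stated maximum distance of 0.3 units.

```python
import math
import random


def sample_disc(n, center=(0.0, 0.0), radius=0.15, rng=random):
    """Draw n points uniformly from the interior of a circle."""
    points = []
    for _ in range(n):
        # The square root makes the density uniform in area rather than in radius.
        r = radius * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        points.append((center[0] + r * math.cos(theta),
                       center[1] + r * math.sin(theta)))
    return points


def sample_ellipse(n, semi_major, semi_minor, angle=0.0,
                   center=(0.0, 0.0), rng=random):
    """Draw n points uniformly from an ellipse rotated by `angle` radians."""
    points = []
    for x, y in sample_disc(n, radius=1.0, rng=rng):
        # Scaling and rotating a uniform unit disc gives a uniform ellipse.
        sx, sy = semi_major * x, semi_minor * y
        points.append((center[0] + sx * math.cos(angle) - sy * math.sin(angle),
                       center[1] + sx * math.sin(angle) + sy * math.cos(angle)))
    return points


# Example: community A on a circle, community B on a rotated ellipse
# (the semi-axes 0.15 and 0.05 are illustrative values only).
example_rng = random.Random(0)
community_a = sample_disc(200, rng=example_rng)
community_b = sample_ellipse(200, 0.15, 0.05, angle=math.radians(12),
                             rng=example_rng)
```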
For one comparison, we first sampled 200 points from each community, resulting in a total of 400 points. Second, Euclidean distances among all 400 points were calculated. Third, a phylogenetic tree was generated from this distance matrix using the neighbor-joining method in the neighbor program in PHYLIP [20]. Fourth, the four statistics, UniFrac, WUniFrac, VAWUniFrac (T), and SqT, were calculated. Fifth, the labels of the 400 points were randomized 1000 times with the tree topology unchanged, and the corresponding four statistics were calculated for each randomized dataset. Finally, a P-value was calculated as the proportion of randomizations that result in statistics equal to, or greater than, the original statistic. P-values less than 0.05 were considered significant. After 1000 independent samplings, the proportion of significant P-values was obtained, representing the type I error rate when the two communities were the same and the statistical power when the communities were different.
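The label-randomization step can be sketched in Python as below. The statistic used here is only the unnormalized weighted sum of branch lengths, and the tree encoding, function names and toy tree are assumptions for illustration; any of the four statistics could be passed in its place.

```python
import random


def weighted_sum(branches, labels):
    """Sum of b_i * |A_i/A_T - B_i/B_T| over all branches.

    branches: list of (length, leaf_indices) pairs (hypothetical encoding).
    labels:   list of "A"/"B" labels, indexed by leaf.
    """
    a_total = labels.count("A")
    b_total = labels.count("B")
    total = 0.0
    for length, leaves in branches:
        a_i = sum(1 for leaf in leaves if labels[leaf] == "A")
        b_i = len(leaves) - a_i
        total += length * abs(a_i / a_total - b_i / b_total)
    return total


def permutation_p_value(statistic, branches, labels, n_perm=1000, seed=0):
    """Proportion of label shuffles whose statistic is >= the observed one."""
    rng = random.Random(seed)
    observed = statistic(branches, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # the tree itself is left unchanged
        if statistic(branches, shuffled) >= observed:
            hits += 1
    return hits / n_perm


# Toy tree: eight leaves in two clades of four, perfectly separated by label.
toy_branches = [(1.0, [i]) for i in range(8)]
toy_branches += [(2.0, [0, 1, 2, 3]), (2.0, [4, 5, 6, 7])]
toy_labels = ["A"] * 4 + ["B"] * 4
toy_p = permutation_p_value(weighted_sum, toy_branches, toy_labels)
```

For this toy tree only 2 of the 70 possible labellings reach the observed value, so the estimated P-value should be small.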
Applications to four real data sets
We applied UniFrac, WUniFrac and VAWUniFrac to reanalyze three datasets consisting of 16S rRNA sequences and one tropical forest census dataset. First, Costello et al. [4] investigated how environmental factors and foreign transplants shape skin bacterial communities. Plots on two skin sites of volunteers, the forehead and the left volar forearm, were first disinfected, then inoculated with foreign microbiotas from other tissues, and, finally, followed over 2, 4 and 8 hours. The data were downloaded from the European Read Archive [ERA:ERA000159]. Second, Ley et al. [2] studied the effects of obesity and kinship on mouse distal intestinal microbial communities based on 16S rRNA gene sequence collections. They sampled 16S rRNA gene sequences obtained from the distal ceca of 19 mice, including 3 heterozygous (ob/+) mothers (M1, M2, and M3) and their 16 offspring with all three possible genotypes (obese ob/ob mice, lean ob/+ and wild-type +/+ mice). The final data, including all the sequences, ARB alignment [24] and phylogenetic tree, are publicly available at http://gordonlab.wustl.edu/mice. Third, Hollister et al. [25] studied the microbial diversity of soil and sediments using both Sanger sequencing and pyrosequencing. Samples were collected at eight locations (T30, T365, T3130, T3195, T3260, T3325, T3390, and T3455) along a geographical transect from the shoreline of a hypersaline lake and lakebed. Point T30 is the terrestrial end of the transect, while point T3455 is the aquatic end. For each sample, both Sanger sequencing and pyrosequencing were performed. A total of 39590 16S rRNA sequences were generated through 454 sequencing, and 1693 16S rRNA sequences were generated through cloning and single-pass Sanger sequencing. The pyrosequencing libraries ranged in size from 1403 sequences at site T30 to 6745 sequences at site T3325. The Sanger clone collections ranged in size from 185 sequences at T30 and T3130 to 230 sequences at T3390 [25]. All the sequences were downloaded from NCBI [GenBank:CQ893028-CQ894720, SRA:SRA009427.2]. The fourth dataset involves tropical forest census data in three plots across a precipitation gradient in central Panama [26–28]. The Cocoli 4-ha plot is located in a dry, semideciduous forest on the Pacific side, and it has three censuses: 1994, 1997, and 1998. The 50-ha BCI plot is located in the tropical moist forest of Barro Colorado Island (BCI) in central Panama, and it has six censuses: 1981-1983, 1985, 1990, 1995, 2000, and 2005. The third plot is the Sherman 5.6-ha plot, the wettest of the three, located near the Atlantic coast, 55 km northwest of the Cocoli site. This plot has three censuses: 1996, late 1997 to early 1998, and 1999. These censuses recorded all freestanding woody plants with stem diameter 1 cm or above in the plots [26]. Unlike the above applications, the original abundance information for each species was available.
For each dataset, we first calculated the distances between each pair of communities using the three statistics: UniFrac, WUniFrac and VAWUniFrac. Then the unweighted pair group method with arithmetic averages (UPGMA) clustering [29] was used to cluster the communities. The resulting clusters were then analyzed based on the characteristics of the individuals in each cluster. Principal coordinate analysis (PCoA) [30] was also used to project the communities onto a two-dimensional plane determined by the first two principal coordinates to determine whether communities with similar characteristics tend to cluster together.
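The UPGMA step can be illustrated with a small pure-Python sketch that repeatedly merges the two clusters with the smallest average pairwise distance. The function name, the dictionary-based distance matrix and the example distances are hypothetical; in practice a library implementation of average-linkage clustering would normally be used.

```python
from itertools import combinations


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering.

    names: list of community names.
    dist:  dict mapping frozenset({x, y}) to the distance between x and y.
    Returns the merges in order as (cluster_1, cluster_2, height) triples,
    where each cluster is a tuple of community names.
    """
    def average_distance(c1, c2):
        pair_sum = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return pair_sum / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        merges.append((c1, c2, average_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical pairwise distances between three communities.
example_dist = {
    frozenset(("A", "B")): 0.1,
    frozenset(("A", "C")): 0.5,
    frozenset(("B", "C")): 0.7,
}
example_merges = upgma(["A", "B", "C"], example_dist)
```

Here A and B are merged first, and C then joins them at the average of its distances to A and B.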
Results and Discussion
In order to study our new methods and compare their performance to UniFrac and WUniFrac, we carried out simulation studies according to the simulation methods developed in [21]. We then used UniFrac, WUniFrac, and VAWUniFrac to reanalyze four real datasets, three 16S rRNA sequence collections from different research laboratories, and a tropical forest census dataset.
Results from Simulation Studies
We carried out three classes of simulations for two communities: 1) both were uniform samples from two circles with different centers and radii; 2) both were uniform samples from two ellipses with different orientations; and 3) one community was a uniform sample, while the other was an uneven sample from the same circle.
Simulation 1: communities were uniformly distributed on two circles
Simulated power of four statistics, UniFrac (UniF), WUniFrac (WUniF), VAWUniFrac (T), and SqT in Simulation 1.
Radius of B (overlap)  Statistic  Offset (overlap)
                                  0 (100%)  0.012 (95%)  0.024 (90%)  0.035 (85%)  0.047 (80%)
0.15 (100%)            UniF       0.050     0.224        0.898        0.999        1.000
                       WUniF      0.049     0.152        0.600        0.945        0.999
                       T          0.057     0.208        0.761        0.994        1.000
                       SqT        0.061     0.200        0.755        0.991        1.000
0.134 (80%)            UniF       0.820     0.918        0.997        1.000        1.000
                       WUniF      0.124     0.339        0.726        0.973        1.000
                       T          0.311     0.578        0.926        0.997        1.000
                       SqT        0.252     0.546        0.933        1.000        1.000
0.116 (60%)            UniF       1.000     1.000        1.000        1.000        1.000
                       WUniF      0.778     0.886        0.974        0.998        1.000
                       T          1.000     1.000        1.000        1.000        1.000
                       SqT        0.999     1.000        1.000        1.000        1.000
Simulation 2: communities were uniformly distributed on two ellipses
Simulated power of four statistics, UniFrac (UniF), WUniFrac (WUniF), VAWUniFrac (T), and SqT in Simulation 2.
Pivot         0°     6°     12°    26°    71°
Power  UniF   0.050  0.185  0.822  1.000  1.000
       WUniF  0.047  0.126  0.339  0.981  1.000
       T      0.050  0.166  0.577  1.000  1.000
       SqT    0.049  0.157  0.554  1.000  1.000
Simulation 3: one community was uniformly and the other was unevenly distributed on a circle
Simulated power of four statistics, UniFrac (UniF), WUniFrac (WUniF), VAWUniFrac (T), and SqT in Simulation 3.
c  6  4  3 
 1 


Power  UniF   0.908  0.433  0.154  0.117  0.274  0.660
       WUniF  0.802  0.357  0.124  0.150  0.649  1.000
       T      0.965  0.506  0.160  0.196  0.681  1.000
       SqT    0.951  0.522  0.167  0.199  0.685  1.000
Summary of results from simulation studies
Results of the three simulations reveal that VAWUniFrac always performs better than WUniFrac. In Simulation 1 and Simulation 2, UniFrac has the highest statistical power to detect differences between communities. These observations can be explained as follows. In the first two simulations, both communities are uniformly distributed, and there is no need to weight the branch lengths. The inclusion of weights for the branch lengths in both WUniFrac and VAWUniFrac introduces more noise into the statistics, resulting in lowered power to detect differences between the communities. In Simulation 3, one of the communities is not uniformly distributed, and since the inclusion of weights can adjust for the uneven distribution, the weighted version is more powerful in general. Because the variance adjusted version takes both the abundance difference and its variance into consideration, it has the most power. Since VAWUniFrac has power similar to SqT, we only utilize UniFrac, WUniFrac, and VAWUniFrac in the following analyses of real data.
Application 1: a study of bacterial communities on human skin across time after transplantation
We first studied the variable region 2 (V2) of bacterial 16S rRNA sequence data from Costello et al. [4] to understand the relationship between microbial communities in certain tissues after transplantation from another tissue. We present our results for the analysis of 80 microbial samples from four individuals (F2, F3, M1, M4), collected over two days and from two plots, in which microbial organisms were transplanted from the forehead to the left volar forearm and sampled at four time points (0, 2, 4, and 8 hours post-transplantation). The samples are listed in additional file 1.
We first assigned each sequence in samples to its closest relative in a phylogeny of the Greengenes core set [31] using BLAST's megablast [32] as in Hamady et al. [33]. Then we used only the phylogeny of the Greengenes core set and removed the leaves that were not involved in the comparison when comparing two samples. The Greengenes core set and the phylogeny were downloaded from the FastUnifrac website [33].
Application 2: comparison of microbial communities in mouse gut
We then applied the three statistics to analyze the 16S rRNA sequences from mouse gut communities [2]. Lozupone et al. [17] applied UniFrac and WUniFrac to this dataset and showed that analyses using the two different versions of UniFrac can lead to completely different conclusions. Therefore, we reanalyzed this dataset using VAWUniFrac. We calculated the three statistics for each pair of the 19 communities and used hierarchical clustering and principal coordinate analysis (PCoA) to analyze the results. For each comparison, we used the same phylogenetic tree, but the leaves that were not in these two communities were removed so that the results of different comparisons were comparable.
Application 3: comparison of sequence collections derived by Sanger and pyrosequencing technologies
In order to see how the statistics perform when they are applied to sequence collections derived by different technologies, but from the same sample, we analyzed the 16S rRNA sequence data from soil and sediments from [25]. The same methods as in Application 1 were used to build the phylogeny of the sequences. Some short sequences that could not be assigned to any sequences in the Greengenes core set, most of which were less than 200 bp in length, were ignored in our analysis.
In fact, each pair of sequence libraries from the same sample using Sanger sequencing and pyrosequencing is very different because pyrosequencing detected a greater variety of low-abundance taxa than Sanger sequencing [25]. UniFrac emphasized those low-abundance taxa to a greater degree than the two weighted statistics. Consequently, UniFrac clustered the data from the two techniques separately. However, the results from such an analysis can be misleading, as pyrosequencing usually generates a very large number of sequences and tends to be more prone to error, while the number of sequences from Sanger sequencing is usually relatively small, but the sequences tend to be more accurate. In some cases, we want samples from the same community to cluster together irrespective of which sequencing technology is used. Likewise, when comparing communities based on sequence data from different studies, methods that are not highly sensitive to sequencing depth or sequencing technology are preferred. The weighted methods, such as WUniFrac and VAWUniFrac, are preferred in such cases.
Application 4: analysis of compositions of tropical forests in central Panama
Although UniFrac and WUniFrac were originally proposed to measure differences between microbial communities, they can also be applied to other communities, as long as the phylogeny of the individuals is available. Therefore, as another example, we applied the three statistics to tropical forest census data in three plots across a precipitation gradient in central Panama [26–28]. Unlike the above applications, the original abundance information for each species present was available.
In order to obtain the phylogeny of tree species in these censuses, we referred to a dated phylogenetic tree of all angiosperm families [34]. It was downloaded from http://svn.phylodiversity.net/tot/megatrees/davies04.bl.new at the Phylomatic [35] website. There were 420 out of a total of 467 detectable species of the censuses included in this phylogeny. We reconstructed the tree by positioning the genera and species at 2/3 and 1/3 the age of the corresponding family, respectively, similar to [36].
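The node placement rule above amounts to simple arithmetic, shown here as a minimal Python sketch. The function name and the family age of 90 (in the same time units as the dated tree) are hypothetical values chosen for illustration.

```python
def node_ages(family_age):
    """Ages assigned to genus and species nodes within a family of given age."""
    return {
        "genus": family_age * 2 / 3,  # genera placed at 2/3 of the family age
        "species": family_age / 3,    # species placed at 1/3 of the family age
    }


example_ages = node_ages(90.0)
```

For a family dated at 90, this places the genus nodes at 60 and the species nodes at 30.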
Within-site comparisons showed some interesting differences between UniFrac (Figure 9a) and its weighted variations (WUniFrac and VAWUniFrac, Figures 9b and 9c). For example, plot Sherman6 clearly stood apart from the other Sherman plots based on the weighted measures. On the other hand, UniFrac separated Sherman1, 2, 3 from Sherman4, 5, 6. This revealed that the Sherman6 plot had a species composition similar to Sherman4, 5, but differed significantly in species abundance. The result is consistent with the fact that the Sherman6 plot was in a very young forest, probably cleared within the past 20 years [37]. The clustering results of the 9 BCI plots also showed the superiority of the weighted measures. In Figures 9b and 9c, the clustering of the 9 small BCI plots is consistent with the geographic distribution of the plots (Figure 7). On the other hand, UniFrac separated plot BCI5 from the other BCI plots, which has no apparent explanation. These results indicate that within a site, the differences between communities were mainly in abundances, while between sites, they were mainly in species presence/absence.
We also studied the effects of different tree construction methods for the sequences, e.g. neighbor-joining, maximum parsimony, and maximum likelihood, on the clustering results of the communities, using the 16S rRNA sequence data in [25] as an example. The results are given in additional file 2. The clustering of communities based on VAWUniFrac does depend on the tree construction method; however, the differences are generally small.
Conclusions
In this paper, we studied UniFrac, a widely used phylogenetic method for comparing compositions of microbial communities, and a weighted variation, WUniFrac, which takes abundance information into account. Both UniFrac and WUniFrac can be written as a weighted sum of all the branch lengths in the phylogenetic tree. However, different weighting methods resulted in differences in performance. For each branch, we showed that the number of sequences of one community that belong to the branch follows a hypergeometric distribution under the null hypothesis that community labels are not correlated with phylogeny. From this perspective, we developed a new variance adjusted weighted UniFrac that takes into account the variation of the weights to test if two communities are different. Both simulations and applications to real data showed that VAWUniFrac is more powerful than WUniFrac. From real data analyses, we showed that our method could reveal biological insights not possible with either UniFrac or weighted UniFrac. Furthermore, our results supported the conclusion of Lozupone et al. [17] that the different versions of UniFrac can lead to different conclusions. With the increase of data containing abundance information, we expect that our new statistic will help to obtain new insights into community differences, especially for situations where the species are similar, but the differences in relative abundance are of great interest.
Declarations
Acknowledgements
We thank the Center for Tropical Forest Science of the Smithsonian Tropical Research Institute for providing BCI, Cocoli and Sherman datasets, and Drs. Wenhui Wang and Shuyun Wang for helpful discussion. This research was partially supported by NSFC grants 11071146, 60928007, and 60805010, and the National Basic Research Program of China (973 Program, No. 2007CB814901). QC is supported by Graduate Independent Innovation Foundation of Shandong University (GIIFSDU). FS is partially supported by US NSF DMS1043075. The BCI forest dynamics research project was made possible by National Science Foundation grants to Stephen P. Hubbell: DEB0640386, DEB0425651, DEB0346488, DEB0129874, DEB00753102, DEB9909347, DEB9615226, DEB9615226, DEB9405933, DEB9221033, DEB9100058, DEB8906869, DEB8605042, DEB8206992, DEB7922197, support from the Center for Tropical Forest Science, the Smithsonian Tropical Research Institute, the John D. and Catherine T. MacArthur Foundation, the Mellon Foundation, the Celera Foundation, and numerous private individuals, and through the hard work of over 100 people from 10 countries over the past two decades. The plot project is part of the Center for Tropical Forest Science, a global network of large-scale demographic tree plots.
References
 Pyke CR, Condit R, Aguilar S, Lao S: Floristic composition across a climatic gradient in a neotropical lowland forest. Journal of Vegetation Science 2001, 12: 553–566. 10.2307/3237007View Article
 Ley RE, Backhed F, Turnbaugh P, Lozupone CA, Knight RD, Gordon JI: Obesity alters gut microbial ecology. Proc Natl Acad Sci USA 2005, 102: 11070–11075. 10.1073/pnas.0504978102PubMed CentralView ArticlePubMed
 Mathur J, Bizzoco RW, Ellis DG, Lipson DA, Poole AW, Levine R, Kelley ST: Effects of abiotic factors on phylogenetic diversity of bacterial communities in acidic thermal springs. Appl Environ Microbiol 2007, 73: 2612–2623. 10.1128/AEM.0256706PubMed CentralView ArticlePubMed
 Costello EK, Lauber CL, Hamady M, Fierer N, Gordon JI, Knight R: Bacterial Community Variation in Human Body Habitats Across Space and Time. Science 2009, 326: 1694–1697. 10.1126/science.1177486PubMed CentralView ArticlePubMed
 Grice EA, Kong HH, Renaud G, Young AC, Bouffard GG, Blakesley RW, Wolfsberg TG, Turner ML, Segre JA: A diversity profile of the human skin microbiota. Genome Res 2008, 18: 1043–1050. 10.1101/gr.075549.107PubMed CentralView ArticlePubMed
 Nasidze I, Li J, Quinque D, Tang K, Stoneking M: Global diversity in the human salivary microbiome. Genome Res 2009, 19: 636–643. 10.1101/gr.084616.108PubMed CentralView ArticlePubMed
 Gill S, Pop M, DeBoy R, Eckburg P, Turnbaugh P, Samuel B, Gordon J, Relman D, FraserLiggett C, Nelson K: Metagenomic analysis of the human distal gut microbiome. Science 2006, 312: 1355–1359. 10.1126/science.1124234PubMed CentralView ArticlePubMed
 Dinsdale EA, Edwards RA, Hall D, Angly F, Breitbart M, Brulc JM, Furlan M, Desnues C, Haynes M, Li L, McDaniel L, Moran MA, Nelson KE, Nilsson C, Olson R, Paul J, Brito BR, Ruan Y, Swan BK, Stevens R, Valentine DL, Thurber RV, Wegley L, White BA, Rohwer F: Functional metagenomic profiling of nine biomes. Nature 2008, 452: 629–632. 10.1038/nature06810View ArticlePubMed
 Singleton DR, Furlong MA, Rathbun SL, Whitman WB: Quantitative comparisons of 16S rRNA gene sequence libraries from environmental samples. Appl Environ Microbiol 2001, 67: 4374–4376. 10.1128/AEM.67.9.43744376.2001PubMed CentralView ArticlePubMed
 Schloss PD, Larget BR, Handelsman J: Integration of microbial ecology and statistics: a test to compare gene libraries. Appl Environ Microbiol 2004, 70: 5485–5492. 10.1128/AEM.70.9.54855492.2004PubMed CentralView ArticlePubMed
 Anderson M: A new method for nonparametric multivariate analysis of variance. Austral Ecology 2001, 26: 32–46.
 Excoffier L, Smouse P, Quattro J: Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 1992, 131(2):479–491.
 Stewart CNJ, Excoffier L: Assessing population genetic structure and variability with RAPD data: application to Vaccinium macrocarpon (American cranberry). Journal of Evolutionary Biology 1996, 9(2):153–171. 10.1046/j.1420-9101.1996.9020153.x
 Martin AP: Phylogenetic approaches for describing and comparing the diversity of microbial communities. Appl Environ Microbiol 2002, 68: 3673–3682. 10.1128/AEM.68.8.3673-3682.2002
 Schloss P, Handelsman J: Introducing TreeClimber, a test to compare microbial community structure. Appl Environ Microbiol 2006, 72: 2379–2384. 10.1128/AEM.72.4.2379-2384.2006
 Lozupone C, Knight R: UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 2005, 71: 8228–8235. 10.1128/AEM.71.12.8228-8235.2005
 Lozupone CA, Hamady M, Kelley ST, Knight R: Quantitative and qualitative diversity measures lead to different insights into factors that structure microbial communities. Appl Environ Microbiol 2007, 73: 1576–1585. 10.1128/AEM.01996-06
 Fitch W: Toward defining the course of evolution: minimum change for a specific tree topology. Systematic Zoology 1971, 20(4):406–416. 10.2307/2412116
 Ley R, Hamady M, Lozupone C, Turnbaugh P, Ramey R, Bircher J, Schlegel M, Tucker T, Schrenzel M, Knight R, Gordon JI: Evolution of mammals and their gut microbes. Science 2008, 320: 1647–1651. 10.1126/science.1155725
 PHYLIP [http://evolution.gs.washington.edu/phylip.html]
 Schloss PD: Evaluating different approaches that test whether microbial communities have the same structure. ISME J 2008, 2: 265–275. 10.1038/ismej.2008.5
 Excoffier L, Smouse PE, Quattro JM: Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 1992, 131: 479–491.
 Stewart CN, Excoffier L: Assessing population genetic structure and variability with RAPD data: application to Vaccinium macrocarpon (American cranberry). J Evol Biol 1996, 9: 153–171. 10.1046/j.1420-9101.1996.9020153.x
 Ludwig W, Strunk O, Westram R, Richter L, Meier H, Buchner A, Lai T, Steppi S, Jobb G, Förster W, Brettske I, Gerber S, Ginhart AW, Gross O, Grumann S, Hermann S, Jost R, König A, Liss T, Lüßmann R, May M, Nonhoff B, Reichel B, Strehlow R, Stamatakis A, Stuckmann N, Vilbig A, Lenke M, Ludwig T, Bode A, Schleifer KH: ARB: a software environment for sequence data. Nucleic Acids Res 2004, 32: 1363–1371. 10.1093/nar/gkh293
 Hollister E, Engledow A, Hammett A, Provin T, Wilkinson H, Gentry T: Shifts in microbial community structure along an ecological gradient of hypersaline soils and sediments. ISME J 2010, 4: 829–838. 10.1038/ismej.2010.3
 Hubbell SP, Condit R, Foster RB: Barro Colorado Forest Census Plot Data. 2005. [http://ctfs.arnarb.harvard.edu/webatlas/datasets/bci]
 Condit R: Tropical Forest Census Plots. Berlin, Germany, and Georgetown, Texas: Springer-Verlag and R. G. Landes Company; 1998.
 Hubbell SP, Foster RB, O'Brien ST, Harms KE, Condit R, Wechsler B, Wright SJ, de Lao SL: Light gap disturbances, recruitment limitation, and tree diversity in a neotropical forest. Science 1999, 283: 554–557. 10.1126/science.283.5401.554
 Sneath PHA, Sokal RR: Numerical taxonomy: the principles and practice of numerical classification. San Francisco, CA: W. H. Freeman; 1973.
 Gower JC: Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 1966, 53: 325–338.
 DeSantis T, Hugenholtz P, Larsen N, Rojas M, Brodie E, Keller K, Huber T, Dalevi D, Hu P, Andersen G: Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 2006, 72: 5069–5072. 10.1128/AEM.03006-05
 Altschul S, Gish W, Miller W, Myers E, Lipman D: Basic local alignment search tool. J Mol Biol 1990, 215: 403–410.
 Hamady M, Lozupone C, Knight R: Fast UniFrac: facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data. ISME J 2010, 4: 17–27. 10.1038/ismej.2009.97
 Davies TJ, Barraclough TG, Chase MW, Soltis PS, Soltis DE, Savolainen V: Darwin's abominable mystery: insights from a supertree of the angiosperms. Proc Natl Acad Sci USA 2004, 101: 1904–1909. 10.1073/pnas.0308127100
 Webb CO, Donoghue MJ: Phylomatic: tree assembly for applied phylogenetics. Molecular Ecology Notes 2005, 5: 181–183. 10.1111/j.1471-8286.2004.00829.x
 Hardy OJ: Testing the spatial phylogenetic structure of local communities: statistical performances of different null models and test statistics on a locally neutral community. Journal of Ecology 2008, 96: 914–926. 10.1111/j.1365-2745.2008.01421.x
 Condit R, Aguilar S, Hernandez A, Perez R, Lao S, Angehr G, Hubbell SP, Foster RB: Tropical forest dynamics across a rainfall gradient and the impact of an El Niño dry season. Journal of Tropical Ecology 2004, 20: 51–72. 10.1017/S0266467403001081
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.