Statistical issues in the analysis of Illumina data
© Dunning et al. 2008
Received: 12 October 2007
Accepted: 06 February 2008
Published: 06 February 2008
Illumina bead-based arrays are becoming increasingly popular due to their high degree of replication and reported high data quality. However, little attention has been paid to the pre-processing of Illumina data. In this paper, we present our experience of analysing the raw data from an Illumina spike-in experiment and offer guidelines for those wishing to analyse expression data or develop new methodologies for this technology.
We find that the local background estimated by Illumina is consistently low, and subtracting this background is beneficial for detecting differential expression (DE). Illumina's summary method performs well at removing outliers, producing estimates which are less biased and are less variable than other robust summary methods. However, quality assessment on summarised data may miss spatial artefacts present in the raw data. Also, we find that the background normalisation method used in Illumina's proprietary software (BeadStudio) can cause problems with a standard DE analysis. We demonstrate that variances calculated from the raw data can be used as inverse weights in the DE analysis to improve power. Finally, variability in both expression levels and DE statistics can be attributed to differences in probe composition. These differences are not accounted for by current analysis methods and require further investigation.
Analysing Illumina expression data using BeadStudio is reasonable because of the conservative estimates of summary values produced by the software. Improvements can however be made by not using background normalisation. Access to the raw data allows for a more detailed quality assessment and flexible analyses. In the case of a gene expression study, data can be analysed on an appropriate scale using established tools. Similar improvements can be expected for other Illumina assays.
A BeadArray is an array of randomly positioned, three micron diameter, silica beads. A specific oligonucleotide sequence is assigned to each bead type, which is replicated about 30 times on an array. A series of decoding hybridisations is used to identify every bead . BeadArrays can be used in a wide range of genomic applications including SNP genotyping, methylation profiling, and copy number variation analysis. In this paper we concentrate on data from single-channel gene expression BeadArrays . After hybridisation and washing, each array is scanned by Illumina's scanning software (BeadScan) to produce an image in the Tagged Image File Format (TIFF), along with files in a proprietary file format containing intensity and location information. The latest version of BeadScan can also output a text file giving the identity and position of each bead on the array. The collection of TIFF images and text files is referred to as the bead level data for an experiment and are presented in the same format, regardless of the assay used.
Analysis of BeadArray data is routinely carried out using the BeadStudio application developed by Illumina. This software takes the proprietary files and produces bead summary data, comprising a mean and standard error on the original unlogged scale, for each bead type on each array. When the raw data are analysed using BeadStudio, there is no control over image processing or the method used to combine the replicate observations for a given bead type into a single, summarised, value. Also, the user loses the ability to perform a detailed quality assessment of the data.
Information about how to obtain bead level data has only recently been released and these data cannot be generated retrospectively. Therefore, no publications have taken the processing of bead level data into account apart from our own preliminary investigations . In comparison to Affymetrix, which is an established technology, there is a lack of in-depth literature on the low-level analysis of Illumina data. Publications involving Illumina expression data tend to use Illumina's recommendations for normalisation, with the exception of  who found this method to have a negative impact on data quality and who used quantile normalisation instead. Also, there is no publicly available dataset for which the bead level data may be obtained and for which there is some expectation about the results. Such datasets are available for Affymetrix and have allowed researchers to understand more about the technology and to develop new analysis methods .
This paper presents our guidelines regarding the pre-processing of Illumina data and explores some common issues raised during the analysis of other microarray platforms. After discussing the bead level properties and quality assessment of a spike-in experiment, we investigate the effects of the image processing and summarisation methods used by Illumina on a typical DE analysis.
Image processing is an important consideration for microarrays and involves calculating foreground intensities using the pixels that make up each feature on the array, and estimating a local background intensity using the pixels surrounding each feature. A background correction calculation is then used to correct for non-specific or random contributions to the overall signal. Within BeadScan, the local background measures are subtracted from the bead foreground values to produce the intensities in the bead level text files. This method is the same for all Illumina technologies. Previous studies have shown that background subtraction introduces missing values, increases variability and has a negative impact on the detection of differentially expressed genes [6, 7]. Therefore, it is crucial to investigate the automatic background adjustment used within BeadScan.
In addition to local background subtraction, Illumina also recommends a secondary calibration procedure for expression data, known as background normalisation, which is performed on summarised data. For a given array, this method involves subtracting the mean intensity of the negative controls (bead types with randomly permuted probe sequences attached and no targets in the genome) from each bead summary value. This is intended to compensate for differences between arrays in both non-specific binding of dye and cross-hybridisation. After background normalisation, an additional normalisation using rank invariant genes or cubic splines can be carried out between arrays. We demonstrate the effect of applying this normalisation to the spike-in experiment.
For Affymetrix data, it has been shown that the base composition of the 25-mer probes can have an effect on the observed intensity, and methods have been proposed to deal with this effect . Other probe effects have also been described for two-colour microarrays with longer probes , but have not been reported for Illumina data, which typically uses 50-mer probes. Additionally, the reliability of Affymetrix probe sets has been called into question, with a large percentage of probes on an array sometimes not mapping to the intended transcript. Naturally, this can lead to misleading conclusions in a DE study . We investigate if such effects are apparent in Illumina gene expression data.
The Illumina spike-in experiment investigated in this paper consists of eight customised Mouse-6 version 1 BeadChips hybridised with a complex mouse background. In addition to the ~48,000 bead types included as standard, the bead pool for these chips was modified to include 33 bead types chosen to target 9 different bacterial and viral genes absent from the Mouse genome . These 33 bead types are referred to as spikes in this paper and the remaining bead types on the array are referred to as non-spikes. Each array also includes a number of standard Illumina controls, including 1616 negative controls. Each BeadChip comprises six arrays and each array is made up of two strips on the chip surface. The first strip interrogates targets from the curated MEEBO database, and the second strip mostly contains targets from the RefSeq and RIKEN databases.
The spikes were added at concentrations of 1000, 300, 100, 30, 10 and 3 pM on the six arrays from the first four BeadChips. A further four chips were hybridised with spikes at concentrations of 1, 0.3, 0.1, 0.03, 0.01 and 0 pM. The spikes on a given array were all added at the same concentration. Each concentration was allocated to the same position on all replicate BeadChips. For example, 1000 pM was always array 1 on a chip and 300 pM was array 2 and so on. The spike-in experiment was designed and scanned by Illumina. Raw data from the experiment, which includes the TIFF image and text file for each strip, are available as supplementary materials . Annotation information (including the 50-mer probe sequence attached to each bead type), additional figures and R scripts are also available online .
Access to the bead level data allowed us to identify a significant spatial artefact on one BeadChip in the experiment. An error in the scanning of the chip resulted in the x coordinates of many beads on the left-hand side taking negative values. Consequently, the Illumina algorithm for image processing was unable to calculate foreground and background intensities for these beads and set their foreground intensities to zero. This problem affected between 4.4% and 7.3% of the total number of beads from each strip on this chip, with the percentage of affected beads decreasing from the top to the bottom of the chip. These beads were subsequently removed as outliers by Illumina's summary algorithm. No other significant spatial effects were found on the remaining chips and arrays. We note that after background correcting the bead level data, the median percentage of beads with negative intensities on a strip was 0.32% with a 75th percentile of 0.56%.
After applying the Illumina summary method, the median number of beads per bead type on an array was 36 with 25th and 75th percentiles of 31 and 41 respectively. The median number of beads per bead type per array removed as outliers was 1 with a 95th percentile of 4.
Note that even though the data shown are not normalised, we can see that the replicates of the same array processed using the same method show low variability and only slight differences in the medians. The same trend can be seen for all background correction methods. A saturation effect can be seen between 300 pM and 1000 pM, as the increase in the concentration of the spikes is not reflected in a change in observed intensity. At 3 pM, there is a clear difference between arrays with no background adjustment and arrays where a local background estimate has been subtracted. The linear relationship between spike concentration and observed intensity persists below 3 pM for the background subtracted data, whereas without background adjustment, an attenuation in signal is evident below 3 pM.
The variances of each method are similar in the range 1000 pM to 100 pM. However, at 10 pM we see a steep decrease in the variability for the non-adjusted data, whilst the background subtracted data shows a slight increase. The rate of decrease in variability for the no background adjustment option is greater than the rate of increase in variability for the subtracted data. For concentrations of 0.1 pM and below, the variance of the spikes does not decrease any further with decreasing concentration.
In Figure 4C we show the same comparison for data which have been background subtracted and background normalised. This is equivalent to processing the raw data using BeadStudio's recommended settings. For visualisation purposes, and to compare with the other methods, we log2 transformed the background normalised data. The difference that this makes to the MA-plot is striking. We see a much increased range of M-values as A decreases. There are clearly a large number of genes which would be selected as differentially expressed if a simple cut-off approach were used, even though few genes are expected to change for this comparison. In addition, the log-ratios of the spikes are systematically over-estimated by this method. There are also a large number of bead types with negative intensities on each array after applying background normalisation, ranging from 11.18% to 49.35% (median 39.06%). These become missing values after a log2 transformation, which is undesirable in downstream analyses.
Five non-spikes are seen to have high log-odds both before and after weights were used. These bead types were ranked amongst the spikes for all contrasts and had a similar expression profile to the spikes. Further investigation revealed that these probes are controls from the MEEBO database and are not used in current Illumina chips. Generally, the spikes were the top ranked probes for each contrast with very few false discoveries. The choice of background correction method was found to have little impact on the number of false discoveries (see supplementary materials).
When weights are used in the linear model for a particular correction method, we see an increase in the log-odds scores calculated for the spikes. At the same time, we do not see a substantial change for the non-spikes. The most dramatic increase in log-odds from using weights is seen for the non-adjusted data. It is interesting to note that the log-odds of the different methods are more comparable when weights are used. We produced the same plot for all contrasts in the linear model (data not shown). The log-odds typically increased for all contrasts in the middle of the concentration range when weights were used. However, for contrasts comparing 0.3 pM to lower concentrations, we found little improvement, or sometimes a decrease, in log-odds. Figure 5B shows the estimated coefficients for the comparison between 3 pM and 1 pM. For this contrast, we would expect the spikes to have a log-ratio of log2(3), or approximately 1.58. For data processed without background adjustment, the highest log-ratio seen for the spikes is not much greater than 1. For the subtract and normexp methods, the log-ratios are centred around the expected value. For other contrasts (data not shown), the log-ratios were often underestimated by all methods, especially at high and low concentrations. Only the pairwise contrasts 30 pM to 10 pM, 10 pM to 3 pM and 3 pM to 1 pM accurately recovered the predicted log fold-changes. The non-adjusted data consistently produced the most biased values for all contrasts.
After re-annotation of the available probe sequences, bead types were categorised according to where they map in the genome. Some probe sequences were found to match to intronic and intergenic regions. The percentages of bead types on the second strip with intronic and intergenic matches were 27.67% and 9.64%, respectively, compared to 2.84% and 0.32% on the first strip. There were 16,332 and 7,317 unique genes represented on the first and second strips respectively, in addition to the 2,263 genes interrogated on both strips. As expected, re-annotation of the spikes and negative controls produced no matches in the genome.
The "hump" seen in Figure 9 was evident for most contrasts in the linear model. We have observed similar trends in experiments which use Human version 1 BeadChips (data not shown).
The data produced using Illumina technology are widely reported to be of high quality. Naturally, we would still recommend careful quality assessment of Illumina arrays and not to take high data quality for granted. Whilst initial quality assessment using the raw data showed little variation between arrays, we were able to detect a consistent spatial effect on a particular BeadChip. However, we found that in this case, there was no impact on further analysis due to the random placement of beads and robust summary method used by Illumina. Although BeadStudio is capable of giving a good overview of an experiment, it will miss important artefacts on arrays, as spatial information is lost when the data are summarised. We found that the two strips for each array show consistently different intensities with the first strip showing a wider range of expression values. We have noticed this effect for other mouse expression data and also for Human-6 Version 1 chips (data not shown). The second strip contains a large number of bead types with sequences that target rare transcripts, or match to intronic or intergenic regions, which could partly explain the lower signal on this strip. The default options within the BeadStudio software combine the two strips for every array on a whole genome BeadChip. In the Version 2 whole genome BeadChips, the replicates of each bead type are spread between the two strips with potentially 20 replicates on each strip. Clearly, the summary value could be affected by any differences in underlying intensity between the strips, in which case analysing strips separately would be appropriate.
It is interesting to note the consistency of the estimated background for individual beads, which is observed within and between arrays. The background estimation used by Illumina takes an average of the five dimmest pixels within a comparatively large area surrounding each bead. This gives a very low estimate for background that is related to the optical properties of the array surface rather than being specific to the sequence attached to each bead. In contrast, background estimation for two-colour arrays typically uses the mean or median value of pixels surrounding each feature, producing higher, more variable estimates. The approach Illumina uses is more akin to a morphological background estimation, which has been shown to perform well for two-colour arrays [6, 7].
In other data sets, we have used the predictability of the background signal as a simple diagnostic to identify poor quality arrays on which the background level was considerably higher and more variable than usual (data not shown). When analysing this experiment, we found that subtracting this low estimate of the background was beneficial for detecting differentially expressed genes. At low spike concentrations (around 1 pM), the observed values for the spikes are close to the negative controls. Therefore, when comparing arrays with low spike concentrations that have not been background subtracted, the calculated log-ratios will be biased towards zero as the difference in spike concentration is obscured by the background noise. Background subtraction reduces this bias, although, as anticipated, we see an increase in variability after a log2 transformation of the subtracted data. The results of the simplest method of subtracting the background estimates are comparable to those of the model-based approach of normexp. This is due to the low percentage (less than 1%) of negative intensities produced using the subtract method, hence methods that avoid these negative values have little scope for improvement. This is encouraging for users without access to raw data who perform pre-processing using Illumina's default settings.
Illumina's default summary method was able to handle around 20% of outlier beads before the estimates became noticeably biased in our simulations. This provides a rough guideline on how much of an array can be corrupted before the analyst needs to worry about biases creeping into the estimates and inflating the variances. In addition, Illumina's method is better at accommodating asymmetric outliers than regular trimmed means. This is desirable, as these artefacts arise frequently in data sets we have analysed.
In this study we did not conduct a thorough investigation into normalisation methods. Given the low variability of replicate observations, it is important that the data are not "over-normalised", thus removing potentially interesting biological information. An important conclusion from the spike-in experiment is that the background normalisation recommended by Illumina is not appropriate for some DE analyses. This method is seen to introduce substantial variability into the data, particularly at low intensities, and also to increase the numbers of false positives. Another consequence of this normalisation is that low expression values become negative and cannot be log2 transformed. In the spike-in experiment, we found that around 40% of the data were missing on average per array. Illumina keep the bead summary data on the unlogged scale and their model for differential expression takes the relationship between the mean and variance of each bead type into account . DE analyses performed outside of BeadStudio, such as limma eBayes , SAM , or other methods, usually require data that have been subjected to a log2, or similar, transformation to ensure the gene-wise variances are comparable. Therefore we recommend that only non-normalised data are exported from BeadStudio if they are to be analysed using established statistical methods. Otherwise, a small offset could be added to the intensities to ensure positivity of the background normalised data.
We find that using the bead type variances as inverse weights increases the evidence for DE for the spikes under each background correction method in most contrasts. At the same time, the log-odds scores for the non-spikes are not affected; this represents a gain in statistical power. The weights also produce more comparable log-odds between processing methods. Less precise observations arising from arrays with quality issues, or intensity-dependent trends in variability introduced by the chosen pre-processing option, are down-weighted in the analysis. At very low concentrations (less than 1 pM), this improvement was reversed, with DE statistics decreasing for the spikes. Although this would seem undesirable, it indicates that after considering the underlying variability of the observations, it is difficult to distinguish between very small changes in concentration, which is a limitation of any microarray technology. Having access to the bead level data allows bead type variances to be calculated on the appropriate scale so that they may be used in the linear model.
Probe annotation should be given careful consideration during the analysis. The second strip should assist in the investigation of rare transcripts, but it also contains many probes that will not produce any meaningful signal in many gene expression studies. One way of exporting data from BeadStudio is to average the probes for the same gene. We would discourage combining the signal from two probes, as there may be differences in the reliability of the probes, particularly if one maps to an intron or intergenic region. We found the intensities of probes on an array to be related to base composition. In particular, probes with a higher GC content were seen to have a higher intensity, as were probes with a G or C at the first base. These effects were observed on normalised data from strip 1 and persisted in the between-array comparisons. Inflated differential expression statistics were found for probes with 17 to 22 GCs in their sequence.
A possible probe effect is also suggested by the intensity differences in both spikes and negative controls. Given these observations and previous work on Affymetrix arrays, it would seem that more sophisticated methods than background normalisation are needed to account for sequence-specific hybridisation effects. The correlation of observed intensities with the melting temperature or free energy is seen to decline with the spike concentration, presumably as the signal-to-noise ratio diminishes. Similar correlations were seen in Affymetrix spike-in experiments. While it is clear that some of the behaviour of the negative controls can be explained by their thermodynamic properties, at best 2–3% of the variation is explained. Since the intensities of the negative controls are reproducible across strips and arrays, it seems implausible that the remainder consists of random noise. Thus, we conclude that other sources of variation remain to be identified.
In this paper, we describe the advantages of analysing a gene expression experiment using bead level data. We also anticipate that the analysis of other Illumina assays (e.g. GoldenGate, Infinium, DASL) can benefit from using bead level data. For instance, recent genotyping methods for Affymetrix technology successfully use the full raw data and therefore having access to the bead level data is likely to be useful in developing similar methods for Illumina. If log-ratios are required for genotyping, the situation is similar to expression data where the values output by BeadStudio are not on the desired scale. With the raw data, it is possible to obtain log ratios for every bead and then calculate an average and variance for each bead type.
The main findings presented in this paper are:
Access to bead level data promotes more detailed quality assessment and more flexible analyses. Bead level data can be summarised on a relevant scale. We were able to use the means and variances of the log2 data in the DE analysis to improve our ability to detect known changes in the spikes.
The background correction and summarisation methods used in BeadStudio reduce bias and produce robust gene expression measurements. However, we find that background normalisation introduces a significant number of negative values and much increased variability.
Base composition of probes has an effect on intensity, and further investigation is required to remove this effect.
The raw data were read using the beadarray package (version 1.6.0), available through the Bioconductor project .
The foreground estimation algorithm used by Illumina is a two-step process described in more detail in . The main steps in Illumina's image analysis are:
i) All pixel intensities are altered using a sharpening transformation. The intensity of a particular pixel is made higher/lower if its intensity is higher/lower in comparison to the intensities of the pixels surrounding it.
ii) Foreground intensities are calculated as a weighted average of signals obtained using the four pixels nearest to each bead centre as a "virtual bead centre". Sharpened pixel intensities are used in the calculation. The value returned is unlogged.
Background intensities are estimated using an average of the five dimmest pixels (unsharpened intensities) within the 17 × 17 pixel area around each bead centre.
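The background rule above can be sketched as follows; this is our own minimal illustration of "mean of the five dimmest pixels in a 17 × 17 window", and the function name and arguments are ours, not Illumina's implementation:

```python
import numpy as np

def local_background(image, x, y, half=8, n_dimmest=5):
    """Estimate the local background for a bead centred at pixel (x, y).

    Takes the mean of the n_dimmest (unsharpened) pixel intensities
    within the (2*half + 1) x (2*half + 1) window around the bead
    centre (17 x 17 with the defaults), clipped at the image edges.
    Illustrative sketch, not Illumina's code.
    """
    rows, cols = image.shape
    r0, r1 = max(0, y - half), min(rows, y + half + 1)
    c0, c1 = max(0, x - half), min(cols, x + half + 1)
    window = image[r0:r1, c0:c1].ravel()
    dimmest = np.sort(window)[:n_dimmest]  # five lowest pixel values
    return dimmest.mean()
```

Note that, consistent with the discussion earlier in the paper, this estimate reflects the dimmest part of the array surface near the bead rather than the typical surrounding intensity, which is why it is consistently low.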
We also read the TIFF images using the EBImage package . We then created an incidence matrix indicating whether each pixel was within a bead or part of the background. This allowed us to assess the intensity of pixels in the area immediately surrounding each bead. These pixels were ranked in order of increasing intensity to establish the variation in the pixels used to estimate the local background.
The following methods were used to adjust the foreground (y_f) of each bead using the corresponding background estimate (y_b) to obtain an estimate of the true signal due to hybridisation (y).
No adjustment – Use the estimated foreground (y = y_f) in the analysis (assume y_b = 0).
Subtract – The estimated background is subtracted from the foreground for each bead (y = y_f - y_b). No guarantee is given against negative values appearing.
Normexp – A normal-exponential convolution model is fitted to the background subtracted signal (y = y_f - y_b) to adjust the intensities from each strip separately (see  for details). This model has the advantage of returning strictly positive intensities.
The background correction method used in BeadScan is the Subtract method.
Most analyses in this paper used the background adjusted (see above), log2 transformed data from replicate beads on a given array and summarised these values using Illumina's default method. This method removes outliers greater than 3 median absolute deviations (MADs) from the median and calculates a mean, standard error and number of observations for the remaining intensities. To assess how robust Illumina's method is relative to other summarisation methods (mean, trimmed mean removing the 10% highest and lowest intensities, or median), we measured the bias of each method on simulated data to which varying numbers of outliers were added (from 0% to 40% in increments of 5%). The true values were assumed to be the means calculated from the original data. Data from a good quality BeadChip in this experiment were replaced at random by intensities at the saturation level (2^16). Saturation artefacts are fairly common in experiments we have analysed, and often occur at the edges of an array. By varying the number of outliers, we can roughly assess the break-down point of Illumina's summary method. Data for each simulation were summarised on the original and log2 scales. Bias was computed by subtracting the summary values obtained from the simulated BeadChip from the summary values obtained from the original data for each probe on each array. Per-array, per-probe variances were also calculated within each simulated data set. The bias values and variances were averaged across arrays and probes within each simulation to produce the values plotted in Figure 2.
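The default summary rule described above can be sketched as follows. This is our own reading of "remove outliers greater than 3 MADs from the median"; whether Illumina rescales the MAD (e.g. by 1.4826, as R's mad() does by default) is not specified here, so this sketch uses the raw MAD:

```python
import numpy as np

def illumina_summary(intensities, n_mads=3.0):
    """Summarise replicate beads for one bead type on one array.

    Works on the log2 scale: removes beads more than n_mads raw MADs
    from the median, then returns (mean, standard error, n) of the
    remaining log2 intensities. Illustrative sketch of the default
    rule, not Illumina's code.
    """
    x = np.log2(np.asarray(intensities, dtype=float))
    med = np.median(x)
    mad = np.median(np.abs(x - med))          # unscaled MAD (assumption)
    keep = x[np.abs(x - med) <= n_mads * mad]
    n = len(keep)
    se = keep.std(ddof=1) / np.sqrt(n) if n > 1 else 0.0
    return keep.mean(), se, n
```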
We use MA-plots to show the variability between arrays for different methods. For two given arrays (k1 and k2), the summarised intensities for a given bead type, y_k1 and y_k2, were used to calculate log-ratios M, where M = log2(y_k1) - log2(y_k2), and average log intensities A, where A = (log2(y_k1) + log2(y_k2))/2. The M- and A-values were then plotted on the y- and x-axis respectively. A value of M = 0 indicates no differential expression between arrays, and most probes in the spike-in experiment should be centred around this value.
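The M- and A-value calculation is straightforward; a minimal sketch (function name ours):

```python
import numpy as np

def ma_values(y1, y2):
    """M and A values for two arrays' summarised (unlogged) intensities.

    M = log2(y1) - log2(y2)   : log-ratio between the arrays
    A = (log2(y1) + log2(y2))/2 : average log intensity
    """
    l1 = np.log2(np.asarray(y1, dtype=float))
    l2 = np.log2(np.asarray(y2, dtype=float))
    return l1 - l2, (l1 + l2) / 2.0
```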
The log2 summarised data were quantile normalised as in . This approach is reasonable given that the majority of genes do not change between arrays, and hence the intensity distribution between arrays should be the same. In principle, this could affect the intensity of high intensity spikes, although we found the effect to be small. Background normalisation was carried out on the non-normalised, background subtracted data by subtracting the average value of the negative controls on each array output by BeadStudio from the summarised intensities of the non-control probes. M- and A-values were calculated to allow comparison with quantile normalised results.
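A simple form of the quantile normalisation used above can be sketched as follows (a basic version with no tie handling, unlike production implementations such as Bioconductor's normalize.quantiles; function name ours):

```python
import numpy as np

def quantile_normalise(Y):
    """Quantile normalise a matrix with probes as rows, arrays as columns.

    Each array's sorted values are replaced by the mean of the sorted
    values across all arrays, forcing identical intensity distributions.
    Minimal sketch: ties are broken arbitrarily by argsort order.
    """
    Y = np.asarray(Y, dtype=float)
    order = np.argsort(Y, axis=0)          # sort order within each array
    ranks = np.argsort(order, axis=0)      # rank of each value
    mean_sorted = np.sort(Y, axis=0).mean(axis=1)  # reference distribution
    return mean_sorted[ranks]
```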
The limma package  was used to assess DE. Bead level data were created using different background correction methods and then summarised to give an expression matrix Y, where y_jk represents the log2 normalised expression of probe j on array k. A linear model E[y_j] = Xα_j was fitted to each probe, where y_j is the jth row of Y and α_j is a vector of coefficients to be estimated for each probe at the 12 different concentrations. The contrasts of interest are given by β_j = C^T α_j, where C is a contrast matrix created to make all pairwise comparisons between concentrations (e.g. 1000 pM vs 300 pM, 300 pM vs 100 pM etc.). After empirical Bayes variance shrinkage, the moderated t-statistics and log-odds scores for each contrast were analysed separately to assess the performance of different background correction methods. A second series of linear models was fitted to take the variability of bead types into account. We now assume that Var(y_jk) = σ_j^2/w_jk, where w_jk is a weighting factor for bead type j on array k. Weights w_jk = 1/s_jk^2 were used, where s_jk^2 is the sample variance calculated using the standard error and the number of observations of bead type j on array k. Using inverse variances as weights gives less influence to observations with higher variability in the linear model. The coefficients, α_j, were estimated using weighted least squares and the contrasts, β_j, were calculated as before.
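The weighted least squares step for a single probe can be sketched directly from its normal equations (this is a generic WLS solver, not limma's lmFit; names ours):

```python
import numpy as np

def wls_fit(X, y, w):
    """Weighted least squares for one probe.

    Minimises sum_k w_k * (y_k - x_k' a)^2 by solving the normal
    equations (X' W X) a = X' W y, with W = diag(w). With
    w_k = 1/s_k^2, noisier observations get less influence.
    """
    W = np.diag(np.asarray(w, dtype=float))
    A = X.T @ W @ X
    b = X.T @ W @ np.asarray(y, dtype=float)
    return np.linalg.solve(A, b)
```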
See supplementary materials for the R code used to fit the linear models.
Probe sequences were BLASTed and BLATed against the corresponding mouse genome and transcriptome, which included UCSC Genome Browser , RefSeq, and GenBank transcripts. The subsequent annotation and probe classification were performed with a Perl script, comprising BioPerl modules , and relied on transcriptomic annotation tables downloaded from the UCSC Genome Browser. The resulting annotation table is available in supplementary materials.
We defined A, C, G and T to be matrices of binary values with rows j = 1, ..., ~48,000 and columns p = 1, ..., 50 representing the sequence of each probe, where A_jp = 1 if the sequence for probe j contained an "A" at position p, and 0 otherwise.
The total number of As (a_j) in the sequence of the jth probe is simply the row sum a_j = Σ_p A_jp. The totals for Cs (c_j), Gs (g_j) and Ts (t_j) were defined in a similar fashion. The GC content for probe j was then defined as g_j + c_j.
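For illustration, these counts can be computed directly from each probe's sequence string rather than via the binary matrices (function name ours):

```python
def base_counts(seq):
    """Base counts for one probe sequence.

    Returns (a, c, g, t, gc) where gc = g + c is the GC content,
    equivalent to the row sums of the binary matrices A, C, G and T
    described above.
    """
    a = seq.count("A")
    c = seq.count("C")
    g = seq.count("G")
    t = seq.count("T")
    return a, c, g, t, g + c
```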
We then plotted the normalised intensities of the strip 1 probes on a given array in terms of their a_j, c_j, g_j, t_j and GC content. Similarly, for a particular contrast, we ranked the same probes according to their log-odds scores and plotted probes with the same GC content together.
The linear model E[y_k] = Aα_k + Cβ_k + Gγ_k was fitted to the intensities and variances of the kth array to estimate coefficients α, β and γ representing the effect of having an A, C or G at each position, relative to having a T at that position.
The melting temperature, free energy (ΔG), entropy (ΔS) and enthalpy (ΔH) were calculated for each spike and negative control probe using values taken from  with code calibrated against OligoCalc , although we recognise that these values may not be strictly applicable to 50-mer oligos.
We thank Illumina for their permission to publish the data, and in particular Gary Nunn for transferring the data to us and Semyon Kruglyak for advice on Illumina algorithms and supplying probe sequences. We also thank Natalie Thorne, Christina Curtis, Gordon Smyth and Vincent Carey for useful discussions, Mike Smith for assisting with development of the beadarray library and Roman Sasik for supplying C code to read TIFF images into R. We are grateful to the reviewers for their constructive feedback on this manuscript. The authors were supported in part by grants from the MRC (MJD), CRUK grant number C14303/A8646 (NLBM, AGL and ST) and the Isaac Newton Trust (MER).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.