Evaluating the impact of scoring parameters on the structure of intra-specific genetic variation using RawGeno, an R package for automating AFLP scoring
© Arrigo et al; licensee BioMed Central Ltd. 2009
Received: 18 August 2008
Accepted: 26 January 2009
Published: 26 January 2009
Since the transfer and application of modern sequencing technologies to the analysis of amplified fragment-length polymorphisms (AFLP), evolutionary biologists have included an increasing number of samples and markers in their studies. Although justified in this context, the use of automated scoring procedures may result in technical biases that weaken the power and reliability of further analyses.
Using a new scoring algorithm, RawGeno, we show that scoring errors – in particular "bin oversplitting" (i.e. when variant sizes of the same AFLP marker are not considered as homologous) and "technical homoplasy" (i.e. when two AFLP markers that differ slightly in size are mistakenly considered as being homologous) – induce a loss of discriminatory power, decrease the robustness of results and, in extreme cases, introduce erroneous information in genetic structure analyses. In the present study, we evaluate several descriptive statistics that can be used to optimize the scoring of the AFLP analysis, and we describe a new statistic, the information content per bin (Ibin) that represents a valuable estimator during the optimization process. This statistic can be computed at any stage of the AFLP analysis without requiring the inclusion of replicated samples. Finally, we show that downstream analyses are not equally sensitive to scoring errors. Indeed, although a reasonable amount of flexibility is allowed during the optimization of the scoring procedure without causing considerable changes in the detection of genetic structure patterns, notable discrepancies are observed when estimating genetic diversities from differently scored datasets.
Our algorithm appears to perform as well as a commercial program in automating AFLP scoring, at least in the context of population genetics or phylogeographic studies. To our knowledge, RawGeno is the only freely available public-domain software for fully automated AFLP scoring, from electropherogram files to user-defined working binary matrices. RawGeno is implemented as an R CRAN package (with a user-friendly GUI) and can be found at http://sourceforge.net/projects/rawgeno.
Defining amplicon size categories (i.e. "bins") that ideally represent AFLP loci.
Recording the presence/absence of an amplicon within each bin and for each sample. As a result, an AFLP locus is coded as a binary state, where the presence of an amplicon is coded as 1 ("present" allele) while the absence of the amplicon is coded as 0 ("null" allele).
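The two steps above can be illustrated with a minimal Python sketch. It assumes that the bins are already defined as non-overlapping size intervals and that the peak table holds (sample, size) pairs; the function name, the toy bins and the toy peaks are hypothetical and are not taken from RawGeno.

```python
from bisect import bisect_right

def score_presence_absence(peaks, bins):
    """Build a binary matrix from a peak table.

    peaks: list of (sample, size_bp) tuples.
    bins:  list of non-overlapping (start_bp, end_bp) intervals, sorted by start.
    Returns (samples, matrix) where matrix[i][j] is 1 if sample i has at
    least one amplicon falling in bin j, and 0 otherwise.
    """
    samples = sorted({sample for sample, _ in peaks})
    row_of = {sample: i for i, sample in enumerate(samples)}
    starts = [start for start, _ in bins]
    matrix = [[0] * len(bins) for _ in samples]
    for sample, size in peaks:
        j = bisect_right(starts, size) - 1  # last bin starting at or before size
        if j >= 0 and size <= bins[j][1]:
            matrix[row_of[sample]][j] = 1
    return samples, matrix

# Toy data (hypothetical values, for illustration only)
peaks = [("ind1", 100.1), ("ind1", 152.4), ("ind2", 100.3),
         ("ind2", 230.0), ("ind3", 152.6)]
bins = [(99.8, 100.6), (152.0, 153.0), (229.5, 230.5)]

samples, matrix = score_presence_absence(peaks, bins)
for sample, row in zip(samples, matrix):
    print(sample, row)
# ind1 [1, 1, 0]
# ind2 [1, 0, 1]
# ind3 [0, 1, 0]
```

Amplicons falling outside every bin are simply ignored in this sketch; a real scoring pipeline would need an explicit rule for them.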
Size homoplasy, which can arise in two ways: a) when two non-identical amplicons are considered to be homologous because they display identical mobility or b) when two amplicons are scored as absent at the same locus for differing reasons (e.g. amplicon size polymorphism or mutation in the restriction site).
Occurrence of bin definition errors resulting from a non-optimal scoring parameterization. This bias can lead to two contrasting errors: either a) "oversplitting", in which bins are too narrow and may split variant locations of the same amplicon into smaller and erroneous sub-bins, or b) including an exaggerated range of amplicon sizes within the same bin, thus introducing an artificial similarity between unrelated samples. Although the two differ in their causes, we assume that this second error has consequences for dataset quality comparable to those caused by size homoplasy. We will therefore term this bias "technical homoplasy" in order to distinguish it from size homoplasy.
Difficulties in detecting amplicons due to variable quality of AFLP reactions. Low quality runs can lead to the introduction of a noisy signal (i.e. "false-negatives" or "false-positives") within the dataset. This bias can be limited by using optimized and standardized laboratory AFLP protocols and by running blank samples in order to determine the background noise associated with the genotyping machine.
In addition, recent studies propose the evaluation of the quality of both bins and alleles in order to increase the final dataset quality. Several of these procedures are applied once the scoring is complete. However, it is also possible to proceed before the scoring phase, while analysing the AFLP profiles, for instance by tuning the peak detection parameters.
The scope of the present study covers the methodology of automated AFLP scoring in the framework of population genetics and phylogeography. We propose a new automated solution for scoring AFLP electropherograms: RawGeno, a program implemented as a package in the widely used R CRAN freeware. We investigate the effects of sub-optimal settings on our algorithm, by focusing on two upstream processes of the AFLP electropherogram analysis: the bin definition and the recording of alleles. For this purpose, we tuned scoring parameters and produced five datasets differing in the average width of bins. This strategy, applied to the model species Cerastium uniflorum (Caryophyllaceae), produced five datasets with increasing technical homoplasy. In parallel, we produced five analogous datasets with the commercial software GeneMapper. Finally, we scored the dataset manually by using the freeware Genographer. Using these eleven datasets, we first evaluate several descriptive statistics that can be used to optimize the scoring of the AFLP data. We then investigate the effects of the AFLP scoring settings, as well as the choice of the scoring method, on downstream analyses such as data mining statistics (ordination techniques), inter-individual and inter-population distance, Maximum Likelihood clustering (using PSMix, an algorithm designed to investigate patterns of genetic structure) and population diversity indices.
Technical features of RawGeno
The analysis begins by detecting and calculating the size of peaks within the AFLP profiles. This preliminary analysis is conducted either with GeneScan V3.1.2 (ABI) or with the freeware PeakScanner V 1.0 (ABI, http://www.appliedbiosystems.com/peakscanner [verified on December 30, 2008]) to produce an exhaustive list of detected amplicons generated by the AFLP reaction. This list records the size, fluorescence and sample origin of each amplicon, which are used as the input data for RawGeno. It should be noted that our program can potentially be modified to handle results from other genotyping machines.
Empirical data and experimental design
In order to investigate the effects of the scoring procedure on genetic analyses, a dataset from an extensive study on intra- and inter-specific plant biodiversity (IntraBiodiv Consortium) was chosen as a model. This dataset includes samples covering the whole geographic range of Cerastium uniflorum (Caryophyllaceae), a perennial, diploid (2n = 36) plant distributed throughout the European Alps in subnival habitats. A total of 209 individuals (including 40 individuals that were replicated in the DNA extraction step) from 46 populations (four individuals per population on average) were analysed with three selective AFLP primer pairs. Details on the sampling scheme, the primer pairs used and the scoring methods are provided in Gugerli et al. and Bonin et al. Raw data were obtained after running AFLP reactions on an ABI 3100 sequencing machine (ABI) and analysing electropherograms using GeneScan V3.1.2 (ABI). The program was set up with default detection parameters; only peaks ranging between 50 and 500 bp with a minimum fluorescence intensity of 50 rfu (i.e. the minimum reportable peak height according to Gilder et al.) were included in the analysis. In the context of the IntraBiodiv Consortium, the scoring was performed manually using Genographer V1.6, through a classical user-defined protocol, in which non-reproducible bins were discarded before being recorded (as proposed in Bonin et al.). As a consequence, this manually scored dataset did not include information regarding rates of reproducibility of alleles and individuals. The resulting scored matrix (referred to as "manual" below) was coded as binary states, with rows corresponding to samples and columns to bins, each cell recording the presence or absence of an amplicon. Finally, ten datasets were produced by tuning the scoring parameters of RawGeno and GeneMapper.
We tuned the "minimum bin width" parameter of RawGeno and the "bin width" parameter of GeneMapper in order to force the algorithms to assign amplicons displaying close but differing sizes to the same bin. As a consequence, the technical homoplasy of the datasets increased proportionally while increasing the bin width.
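The effect of bin width can be sketched with a deliberately naive fixed-width binning in Python. This is not the RawGeno or GeneMapper algorithm (neither is detailed in this section); the function names and the toy peak table are hypothetical and serve only to show how narrow bins split a locus while wide bins merge distinct amplicons.

```python
from collections import Counter
from math import floor

def fixed_width_cells(peaks, width):
    """Assign each (sample, size_bp) peak to the bin floor(size / width).

    Returns a Counter mapping (sample, bin_index) to the number of
    amplicons of that sample falling into that bin.
    """
    return Counter((sample, floor(size / width)) for sample, size in peaks)

def summarize(peaks, width):
    """Return (number of bins, number of sample/bin cells holding >1 amplicon)."""
    cells = fixed_width_cells(peaks, width)
    n_bins = len({bin_index for _, bin_index in cells})
    collisions = sum(1 for count in cells.values() if count > 1)
    return n_bins, collisions

# Toy peak table (hypothetical): four loci near 100, 103, 107.5 and 152 bp
peaks = [("ind1", 100.1), ("ind1", 103.3), ("ind1", 151.7),
         ("ind2", 100.3), ("ind2", 107.5),
         ("ind3", 103.1), ("ind3", 151.9)]

for width in (0.25, 1, 10):
    print(width, summarize(peaks, width))
# 0.25 (7, 0)   every peak gets its own bin (oversplitting)
# 1 (4, 0)      one bin per locus
# 10 (2, 2)     distinct amplicons of one sample share a bin
```

In this toy case, cells holding more than one amplicon of the same individual appear only at the widest setting, which is the signature of technical homoplasy described above.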
Automated scoring procedures
The parameters of RawGeno were determined as follows (refer to Figure 1 for a scheme of the scoring algorithm and its parameters): the "maximum bin width" was left unconstrained and set as being the maximum observed amplicon size in the dataset (default value from the original algorithm). As a consequence, RawGeno systematically avoided the oversplitting bias (see above). In contrast, the "minimum bin width" was constrained during the analysis and set to 0.2 bp, 1 bp, 2 bp, 5 bp and 10 bp. This strategy produced datasets with an increasing amount of technical homoplasy since RawGeno was forced to use wider bins. It bears repeating that this setting strategy was used to intentionally increase the technical homoplasy in the produced datasets and not to produce optimally scored datasets (refer to Additional file 1 for recommendations on conducting a proper scoring with RawGeno). The parameters in GeneMapper were determined using the standard detection settings, and a polynomial degree of three was used for peak recognition in the electropherograms. The scoring step was achieved by tuning the "bin width" parameter with values ranging from 0.2 bp to 10 bp (identical to the RawGeno settings). The precise effect of the "bin width" parameter on the GeneMapper scoring algorithm could, however, not be predicted a priori and was deduced from the results we obtained. Replicated samples (i.e. ~20% of the sampling) were included in all ten datasets, which allowed the calculation of error rates (see below) and, subsequently, the removal of non-reproducible bins (as done during the manual scoring) in the final dataset. Monomorphic bins and singletons were also removed. The whole set of downstream analyses (see below) was carried out on these cleaned datasets. The ten resulting matrices were coded as binary states in the same way as for the manual dataset (see above). RawGeno datasets were labelled as follows: RG_0.2, RG_1, RG_2, RG_5 and RG_10. GeneMapper datasets were labelled as follows: GM_0.2, GM_1, GM_2, GM_5 and GM_10.
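The cleaning step described above (removal of non-reproducible bins, monomorphic bins and singletons) can be sketched in Python as follows. The sketch rests on assumptions that are not stated in the text: a bin is treated as non-reproducible as soon as one sample-replicate pair disagrees, and a singleton is a bin with a single presence among the non-replicate samples. All names and data are hypothetical.

```python
def bins_to_keep(matrix, replicate_pairs):
    """Return the indices of bins that survive cleaning.

    matrix:          dict mapping sample name -> list of 0/1 scores.
    replicate_pairs: list of (sample, replicate) name pairs.
    Assumptions (not from the paper): any replicate mismatch discards the
    bin; presences are counted on non-replicate samples only.
    """
    replicates = {rep for _, rep in replicate_pairs}
    primary = [name for name in matrix if name not in replicates]
    n_bins = len(next(iter(matrix.values())))
    keep = []
    for j in range(n_bins):
        reproducible = all(matrix[a][j] == matrix[b][j] for a, b in replicate_pairs)
        presences = sum(matrix[name][j] for name in primary)
        monomorphic = presences in (0, len(primary))
        singleton = presences == 1
        if reproducible and not monomorphic and not singleton:
            keep.append(j)
    return keep

# Toy dataset (hypothetical): three individuals, two of them replicated
matrix = {
    "ind1":     [1, 1, 0, 1, 1],
    "ind1_rep": [1, 0, 0, 1, 1],
    "ind2":     [1, 1, 0, 1, 0],
    "ind3":     [1, 0, 1, 0, 1],
    "ind3_rep": [1, 0, 1, 0, 1],
}
pairs = [("ind1", "ind1_rep"), ("ind3", "ind3_rep")]

kept = bins_to_keep(matrix, pairs)
print(kept)  # [3, 4]: bin 0 is monomorphic, bin 1 non-reproducible, bin 2 a singleton
```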
Several indices were computed. First, the final number of bins (nbin) was recorded for each dataset. Second, the mean homoplasy rate was computed within the RawGeno datasets. The homoplasy rate (HR) was defined as the number of amplicons belonging to the same individual that are assigned within the same bin. This statistic was calculated for each sample/bin and averaged for the whole dataset. The frequency of the "present" allele was computed for each bin and frequencies were plotted against the bin sizes (in bp). Finally, the level of correlation was calculated between each of the ten automatically scored datasets and the manually scored one (i.e. performing Pearson's correlations; hereafter referred to as "R2 Manual"). This was achieved by: I. Calculating Jaccard similarity indices between samples, within each dataset. This index is asymmetric, as it only accounts for presences in individual genotypes while shared absences are not considered. II. Calculating Pearson's correlation between the resulting similarity matrices, using the similarity matrix obtained with the manually scored dataset as reference.
The datasets produced from the automated analysis were compared to the manually scored one by using a partial constrained correspondence analysis. We used the "vegan" package implemented in the R CRAN environment and applied the "cca" function (using a "scaling 1" procedure, in order to optimally represent the sample coordinates). This analysis was used to produce residuals containing information that is specific to the automatically scored dataset, according to the following model: Va - Vm = Ra, where Va is the variance of the automatically scored dataset, Vm is the variance of the manually scored dataset and Ra denotes the residuals that are specific to the automatically scored dataset. We further measured the ability of these residuals to discriminate populations, in order to assess whether the automatically scored datasets effectively contained more information than the manually scored one. This calculation was achieved by applying a Mantel test between the residuals matrix and a contrast matrix comprising the population origin of each sample. This test required the computation of Euclidean distances on the contrast and residual matrices, for which we used the "mantel" function of the "vegan" R CRAN package (1000 permutations).
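The two-step comparison to the manual dataset (Jaccard similarities, then Pearson's correlation between similarity matrices) can be sketched in Python as follows. The sketch assumes that both binary matrices list the same samples in the same order; the function names and the toy matrices are hypothetical.

```python
from itertools import combinations
from math import sqrt

def jaccard(a, b):
    """Jaccard similarity between two 0/1 genotypes (shared absences ignored)."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    union = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return shared / union if union else 0.0

def pairwise_jaccard(matrix):
    """Jaccard similarity for every pair of samples (upper triangle, row order)."""
    return [jaccard(matrix[i], matrix[j])
            for i, j in combinations(range(len(matrix)), 2)]

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Toy matrices (hypothetical): same four samples, different numbers of bins
manual = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 1, 1], [1, 1, 0, 1]]
automated = [[1, 1, 0, 0, 1], [1, 0, 1, 0, 0], [0, 1, 1, 1, 0], [1, 1, 0, 1, 1]]

r = pearson(pairwise_jaccard(manual), pairwise_jaccard(automated))
print(round(r, 3))
```

Because the comparison is made on sample-by-sample similarities, the two datasets do not need to share the same bins, only the same samples.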
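The Mantel step can be illustrated with a plain permutation test written in Python. This is a generic sketch of the technique, not the code of the "mantel" function of vegan; the one-tailed p-value convention, the random seed, the toy residual coordinates and all function names are assumptions made for the example.

```python
import random
from math import sqrt

def euclidean(rows):
    """Full Euclidean distance matrix between the rows of a numeric table."""
    return [[sqrt(sum((a - b) ** 2 for a, b in zip(r, s))) for s in rows]
            for r in rows]

def upper_triangle(dist, order):
    """Upper-triangle distances after relabelling the samples by `order`."""
    n = len(order)
    return [dist[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def mantel(d1, d2, permutations=1000, seed=1):
    """Return (observed correlation, one-tailed permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    observed = pearson(x, upper_triangle(d2, identity))
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute the sample labels of the second matrix
        if pearson(x, upper_triangle(d2, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Toy residual coordinates (hypothetical) for six samples from two populations
residuals = [[0.1, 0.0], [0.2, 0.1], [0.0, 0.2],
             [1.0, 1.1], [1.2, 0.9], [0.9, 1.0]]
populations = ["A", "A", "A", "B", "B", "B"]

# Contrast matrix: one indicator column per population
labels = sorted(set(populations))
contrast = [[1.0 if pop == lab else 0.0 for lab in labels] for pop in populations]

r_obs, p_value = mantel(euclidean(residuals), euclidean(contrast))
print(round(r_obs, 3), p_value)
```

A high correlation here means that samples from the same population also lie close together in the residual space, which is what the test above was used to assess.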
Error rates and optimality criterion
The mismatch error rate, defined as Eb = Mrepl/nbin, where Mrepl is the total number of mismatches between a sample and its replicate and nbin is the total number of bins. This statistic was computed for each sample-replicate pair.
The Bayesian error rates ε1.0 and ε0.1, which represent, respectively, the probability of mis-scoring the presence or the absence of AFLP fragments. We used MasterBayes (an R CRAN package) and the AFLPScore R CRAN script collection to compute 1000 estimates of these statistics.
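As an illustration of the first of these estimators, the mismatch error rate Eb = Mrepl/nbin can be computed for one sample-replicate pair with the short Python sketch below. The function name and the toy genotypes are hypothetical; the Bayesian error rates are not reproduced here since they rely on MasterBayes.

```python
def mismatch_error_rate(sample, replicate):
    """Eb = Mrepl / nbin for one sample-replicate pair of 0/1 genotypes."""
    if len(sample) != len(replicate):
        raise ValueError("sample and replicate must be scored on the same bins")
    mismatches = sum(1 for a, b in zip(sample, replicate) if a != b)
    return mismatches / len(sample)

# Toy genotypes (hypothetical): 2 mismatches over 8 bins
sample = [1, 0, 1, 1, 0, 0, 1, 0]
replicate = [1, 0, 0, 1, 0, 1, 1, 0]

print(mismatch_error_rate(sample, replicate))  # 0.25
```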
These two estimators required the inclusion of replicated samples and could not be used to assess the quality of datasets from which non-reproducible bins had been removed (e.g. the manually scored dataset). As a consequence, we propose a new "optimality criterion" based on the information content per bin (Ibin). This statistic was calculated for each sample of the dataset and was defined as Ibin = Msampling/nbin, where Msampling is the average number of mismatches between the considered sample and the other samples of the dataset and nbin is the total number of bins in the dataset. This criterion can be computed at any stage of the scoring process and does not require the inclusion of replicated samples. Here, we applied it after the removal of non-reproducible bins.
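The definition of Ibin translates directly into a few lines of Python. The sketch below follows the formula given above (Ibin = Msampling/nbin, one value per sample); the function name and the toy matrix are hypothetical.

```python
def information_content_per_bin(matrix):
    """Return Ibin for each sample of a binary matrix (list of 0/1 rows).

    For each sample, Msampling is the mean number of mismatches with every
    other sample, and Ibin = Msampling / nbin.
    """
    n_samples = len(matrix)
    n_bins = len(matrix[0])
    values = []
    for i, row in enumerate(matrix):
        total_mismatches = sum(
            sum(1 for a, b in zip(row, other) if a != b)
            for k, other in enumerate(matrix) if k != i
        )
        m_sampling = total_mismatches / (n_samples - 1)
        values.append(m_sampling / n_bins)
    return values

# Toy matrix (hypothetical): three samples scored on four bins
matrix = [[1, 1, 0, 0],
          [1, 0, 1, 0],
          [0, 0, 1, 1]]

ibin = information_content_per_bin(matrix)
print(ibin)  # [0.75, 0.5, 0.75]
```

No replicate is involved in this calculation, which is why the criterion remains available for datasets such as the manually scored one.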
We investigated the spatial genetic structure in our datasets at two levels of complexity: first, between individuals, by computing the Jaccard similarity index; second, between populations, by using an estimator of the FST and by performing Maximum Likelihood clustering (assuming Hardy-Weinberg equilibrium). FSTs were computed with the program AFLP-Surv (allelic frequencies were estimated with a Bayesian method using a non-uniform prior distribution and assuming Hardy-Weinberg equilibrium). Jaccard similarity indices (see above) and FST values obtained with the various datasets were compared using scatterplots and linear regressions. Maximum Likelihood clustering was performed using PSMix, a package implemented under the R CRAN environment. It uses Maximum Likelihood methods in order to assign individuals to a predefined number of groups with an associated probability. The algorithm assumes Hardy-Weinberg equilibrium within groups and linkage equilibrium between loci. The datasets were coded as follows: each individual genotype was duplicated, in order to simulate a diploid co-dominant dataset. The absences (0) of the original genotype were coded as absences in the duplicated genotype (i.e., "0" is coded as "0-0") whereas presences (1) occurring in the original genotype were coded as missing data in the duplicated genotype (i.e., "1" is coded as "1-?"). This coding scheme is adapted from Bonin et al. The default settings of PSMix were applied (except for itMax [i.e. the maximum number of iterations], which was set to 100000). The number of investigated groups (K) ranged from two to nine, with ten replicated runs per K. For each K value, only the run showing the highest likelihood value was selected for further analysis. The resulting assignment probabilities were compared using pie-charts mapped on geographical maps.
Three indices, revealing the level of diversity in each population, were computed: I. The estimated heterozygosity (Hj), using AFLP-Surv (we used the same parameters as above). II. The percentage of polymorphic loci (PLP), using AFLPdat, an R CRAN script collection. III. The presence/absence rarity index (i.e. "rarity 2" according to the IntraBiodiv Consortium's explanations at http://www.intrabiodiv.eu/IMG/doc/Diversity_Uniqueness_EN_v03.doc [verified on December 30, 2008]). The values estimated with the various datasets were compared using scatterplots and linear regressions.
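The recoding of dominant genotypes into a pseudo co-dominant form ("0" becomes "0-0", "1" becomes "1-?") can be sketched in Python as follows. The function name is hypothetical, and the "?" symbol only mirrors the notation used in the text; the actual missing-data code expected by PSMix is not specified here and should be checked before use.

```python
def to_pseudo_codominant(genotype, missing="?"):
    """Duplicate a 0/1 genotype into allele pairs.

    An absence (0) is certain for both copies: (0, 0).
    A presence (1) leaves the second copy unknown: (1, missing).
    """
    pairs = []
    for allele in genotype:
        if allele not in (0, 1):
            raise ValueError("genotypes must be coded as 0 or 1")
        pairs.append((0, 0) if allele == 0 else (1, missing))
    return pairs

# Toy genotype (hypothetical)
coded = to_pseudo_codominant([1, 0, 0, 1])
print(coded)  # [(1, '?'), (0, 0), (0, 0), (1, '?')]
```

The presence is left half-missing because a dominant marker cannot tell a heterozygote from a homozygote for the "present" allele.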
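Of these three indices, the percentage of polymorphic loci is the simplest to illustrate. The Python sketch below uses the most basic definition (a bin is polymorphic in a population when both states are observed there), which is an assumption: AFLPdat may apply additional criteria, and Hj and the rarity index are not reproduced. Names and data are hypothetical.

```python
from collections import defaultdict

def percent_polymorphic_loci(matrix, populations):
    """Return {population: PLP in percent}.

    matrix:      list of 0/1 rows, one per sample.
    populations: list of population labels, in the same order as the rows.
    A bin counts as polymorphic in a population if both 0 and 1 occur there
    (simplest definition, assumed here).
    """
    rows_by_pop = defaultdict(list)
    for row, pop in zip(matrix, populations):
        rows_by_pop[pop].append(row)
    plp = {}
    for pop, rows in rows_by_pop.items():
        n_bins = len(rows[0])
        n_poly = sum(1 for column in zip(*rows) if 0 < sum(column) < len(rows))
        plp[pop] = 100.0 * n_poly / n_bins
    return plp

# Toy dataset (hypothetical): two populations scored on four bins
matrix = [[1, 0, 1, 0], [1, 1, 0, 0],
          [0, 1, 1, 1], [0, 1, 1, 0], [1, 1, 0, 1]]
populations = ["A", "A", "B", "B", "B"]

plp = percent_polymorphic_loci(matrix, populations)
print(plp)  # {'A': 50.0, 'B': 75.0}
```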
Results and discussion
Applying the scoring
Using manual scoring as a reference point
We first investigated the quality of datasets by referring to the manually scored dataset. Technical homoplasy, generated by forcing the scoring algorithm to create large bins (i.e. RG_5, RG_10, GM_2, GM_5 and GM_10), clearly impacted the dataset quality, as shown by a decreasing correlation to the manual dataset with increasing bin width (see Figure 2B). This result matched our expectations and outlined the effects of technical homoplasy. Interestingly, we observed an optimal bin width effect in the GeneMapper datasets (Figure 2B), where datasets scored with small bin widths showed less correlation to the manually scored one than the datasets scored with a greater bin width (i.e. GM_0.2 and GM_1 showed correlations of 0.558 and 0.713, respectively, with the manually scored dataset). Conversely, in RawGeno this phenomenon was avoided since correlation values remained relatively constant between datasets that were scored with small to medium bin widths (i.e. RG_0.2, RG_1 and RG_2 showed correlations of 0.771, 0.774 and 0.779, respectively, with the manually scored dataset). Divergences between RawGeno and GeneMapper can probably be explained by differences in algorithm settings. While RawGeno allows the user to set both the "minimum" and the "maximum bin width", we suspect that the "bin width" parameter of GeneMapper might have effects similar to those of the "maximum bin width" parameter of RawGeno. Decreasing the "maximum bin width" parameter can exaggerate the splitting of bins and, as a result, appropriately sized bins might be split into smaller and erroneous sub-bins. This bias is expected to produce datasets with a high number of low-quality bins ("oversplitting of bins"). Such a hypothesis is in accordance with the very high number of bins produced in GM_0.2 and GM_1, for instance. Holland et al. showed that, with GeneMapper, choosing a bin width below 0.4 bp was misleading since it resulted in oversplitting. However, we could not obtain absolute confirmation of this hypothesis since the GeneMapper algorithms are largely a black box.
Divergences related to varying bin width parameters were also detected by the partial constrained correspondence analysis (Figure 2C). This procedure inspected the residuals of the automatically scored datasets (after having removed the variation explained by the manually scored dataset). It appeared that the residuals contained relevant information in several datasets (e.g. the residuals of RG_2, GM_0.2, GM_1 and GM_2 significantly discriminated populations). This was particularly true for GeneMapper datasets, where the scoring algorithm outlined several additional biogeographic patterns (see Additional file 2). These patterns, however, were seldom interpretable since they segregated single populations from the rest of the samples and were identified neither by the manual scoring nor by RawGeno (see below). It is nevertheless possible that the very high number of bins associated with these GM datasets (GM_0.2, GM_1 and GM_2) included such additional private alleles. Finally, both oversplitting and technical homoplasy decreased the information content of residuals. This result was reinforced by an optimum bin width effect: RG_2 among the RawGeno datasets, and GM_1 and GM_2 among the GeneMapper datasets, showed the highest residual information content, while technically biased datasets presented lower values.
Error rates and optimality criteria
Scoring a dataset manually is a time-consuming process that becomes more complicated as the number of samples increases. Using an automated system can significantly increase the reproducibility of the dataset, as all the electropherograms are scored uniformly and genotyping errors are limited to technical factors (e.g. PCR or migration variations). Our study showed that, with high quality AFLP and GeneScan raw data, automated procedures can be particularly efficient in producing ready-to-use datasets, at least in the context of population genetic or phylogeographic studies (i.e., RawGeno was not tested in a genomics framework [e.g., for gene mapping] and further investigations are needed before validating its extension to this field). However, the automated scoring of AFLPs is a multiple-step process and a trade-off based on several quality criteria may be desirable since it might provide more relevant information than a single statistic. Optimizing the parameters used in the scoring algorithm therefore represents one of the most important steps of the whole analysis. Using RawGeno, we were able to evaluate the impact of technical homoplasy and bin oversplitting on optimality criteria and genetic structure patterns by intentionally biasing our starting datasets. Interestingly, our results demonstrated that a high number of redundant and informative bins might overcome technical homoplasy due to scoring errors, at least when investigating biogeographic structures. While allowing for some plasticity during the optimization of the scoring procedure, this result also reinforces the use of the AFLP methodology for its ability to produce highly informative datasets. By contrast, estimates of genetic diversity should be considered with caution since scoring biases are likely to reinforce problems caused by size homoplasy.
Finally, RawGeno provided results at least as accurate as those obtained by scoring the dataset manually (even when considering bin widths as wide as 2 bp, representing an error range much higher than the technical error rate of the genotyping machines) or by using commercial software such as GeneMapper. To our knowledge, RawGeno is the only freely available program proposing a fully automated scoring solution, from electropherogram files to user-defined working binary matrices. Benefiting from the open source R platform, RawGeno can potentially be enhanced and used by any user.
The authors would like to thank the IntraBiodiv Consortium for providing raw AFLP datasets and particularly Conny Thiel-Egenter for producing the GeneMapper outputs. They also warmly thank Sarah Kenyon, Russell Naisbit and three anonymous referees for providing valuable comments on earlier versions of the manuscript. RawGeno includes software developed by SAIC for the National Cancer Institute and by Tommy Gerdes for the Rigshospitalet. Roberto Guadagnuolo, François Felber, Celia Bueno, Sven Buerki, Michael Stipe, Thom Yorke and Anouk Beguin provided helpful ideas and discussions; Fabien Fivaz and Anthonny Lehmann helped to code the Graphical User Interface. This study was funded by the National Centre of Competence in Research (NCCR) Plant Survival and by the Swiss National Science Foundation (project No. 3100A0-116778/1).
- Avise JC: Molecular markers, natural history, and evolution. Sunderland: Sinauer; 2004.
- Karp A, Seberg O, Buiatti M: Molecular techniques in the assessment of botanical diversity. Ann Bot 1996, 78: 143–149. 10.1006/anbo.1996.0106
- Botstein D, White RL, Skolnick M, Davis RW: Construction of a genetic-linkage map in man using Restriction Fragment Length Polymorphisms. Am J Hum Genet 1980, 32: 314–331.
- Williams JGK, Kubelik AR, Livak KJ, Rafalski JA, Tingey SV: DNA polymorphisms amplified by arbitrary primers are useful as genetic-markers. Nucleic Acids Res 1990, 18: 6531–6535. 10.1093/nar/18.22.6531
- Tautz D: Hypervariability of simple sequences as a general source for polymorphic DNA markers. Nucleic Acids Res 1989, 17: 6463–6471. 10.1093/nar/17.16.6463
- Vos P, Hogers R, Bleeker M, Reijans M, Vandelee T, Hornes M, Frijters A, Pot J, Peleman J, Kuiper M, Zabeau M: AFLP – A new technique for DNA-fingerprinting. Nucleic Acids Res 1995, 23: 4407–4414. 10.1093/nar/23.21.4407
- Vos P, Kuiper M: AFLP analysis. In DNA markers: Protocols, Applications and Overviews. Edited by: Caetano-Anolles G, Gresshoff PM. New-York: J. Wiley & Sons; 1997:115–131.
- Pompanon F, Bonin A, Bellemain E, Taberlet P: Genotyping errors: causes, consequences and solutions. Nat Rev Genet 2005, 6: 847–859. 10.1038/nrg1707
- Meudt HM, Clarke AC: Almost forgotten or latest practice? AFLP applications, analyses and advances. Trends Plant Sci 2007, 12: 106–117. 10.1016/j.tplants.2007.02.001
- Benham J, Jeung JU, Jasieniuk M, Kanazin V, Blake T: Genographer: a graphical tool for automated AFLP and microsatellite analysis. J Agric Genomics 1999, 4: 3.
- Bonin A, Bellemain E, Bronken Edeisen P, Pompanon F, Brochmann C, Taberlet P: How to track and assess genotyping errors in population genetics studies. Mol Ecol 2004, 13: 3261–3273. 10.1111/j.1365-294X.2004.02346.x
- DeHaan LR, Antonides R, Belina K, Ehlke NJ: Peakmatcher: software for semi-automated fluorescence-based AFLP. Crop Science 2002, 42: 1361–1364.
- Vekemans X, Beauwens T, Lemaire M, Roldan-Ruiz I: Data from amplified fragment length polymorphism (AFLP) markers show indication of size homoplasy and of a relationship between degree of homoplasy and fragment size. Mol Ecol 2002, 11: 139–151. 10.1046/j.0962-1083.2001.01415.x
- Whitlock R, Hipperson H, Mannarelli M, Butlin RK, Burke T: An objective, rapid and reproducible method for scoring AFLP peak-height data that minimizes genotyping error. Mol Ecol Res 2008, 8: 725–735. 10.1111/j.1755-0998.2007.02073.x
- Holland BR, Clarke AC, Meudt HM: Optimizing automated AFLP scoring parameters to improve phylogenetic resolution. Syst Biol 2008, 57: 347–366. 10.1080/10635150802044037
- Wu BL, Liu NJ, Zhao HY: PSMIX: an R package for population structure inference via maximum likelihood method. BMC Bioinformatics 2006, 7: 317–325. 10.1186/1471-2105-7-317
- The caMassClass 1.0. Library (Tuszynski 2003) [http://cran.r-project.org/web/packages/]
- Gugerli F, Englisch T, Niklfeld H, Tribsch A, Mirek Z, Ronikier M, Zimmermann NE, Holderegger R, Taberlet P: Relationships among levels of biodiversity and the relevance of intraspecific diversity in conservation – a project synopsis. Perspect Plant Ecol 2008, 10: 259–281. 10.1016/j.ppees.2008.07.001
- Bonin A, Ehrich D, Manel S: Statistical analysis of amplified fragment length polymorphism data: a toolbox for molecular ecologists and evolutionists. Mol Ecol 2007, 16: 3737–3758. 10.1111/j.1365-294X.2007.03435.x
- Gilder JR, Ford S, Doom TE, Raymer ML, Krane DE: Systematic differences in electropherogram peak heights reported by different versions of the GeneScan® Software. J Forensic Sci 2004, 49: 92–95. 10.1520/JFS2003001
- Legendre P, Legendre L: Numerical ecology. Amsterdam: Elsevier; 2000.
- Vegan: community ecology package (Oksanen et al. 2008) [http://cc.oulu.fi/~jarioksa/softhelp/vegan.html]
- Hadfield JD, Richardson DS, Burke T: Towards unbiased parentage assignment: combining genetic, behavioural and spatial data in a Bayesian framework. Mol Ecol 2006, 15: 3715–3730. 10.1111/j.1365-294X.2006.03050.x
- AFLP-SURV version 1.0. (Vekemans 2002) [http://www.ulb.ac.be/sciences/ecoevol/aflp-surv.html]
- Ehrich D: AFLPdat: a collection of R functions for convenient handling of AFLP data. Mol Ecol Notes 2006, 6: 603–604. 10.1111/j.1471-8286.2006.01380.x
- Caballero A, Quesada H, Rolan-Alvarez E: Impact of Amplified Fragment Length Polymorphism size homoplasy on the estimation of population genetic diversity and detection of selective loci. Genetics 2008, 179: 539–544. 10.1534/genetics.107.083246
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.