Table 1 Genome simulators

From: FIGG: Simulating populations of whole genome sequences for heterogeneous data analyses

| Tool | Description | Outputs |
| --- | --- | --- |
| ART [9] | Simulation of sequence reads with error models for multiple platforms (454, Solexa, SOLiD). | Single- or paired-end sequence reads. |
| MetaSIM [10] | Simulation of sequence reads for metagenomics, particularly for highly variable data (taxonomically distinct but related organisms). | Single- or paired-end sequence reads. |
| GENOME [14] | Population simulation within a set of alleles using genome-level events such as recombination, migration, bottlenecks, and expansions. | Alleles identified as mutated (1) or not (0) across the simulated population. |
| GWASimulator [12] | Simulation of loci across a population that follows a given LD structure in case–control type studies. | SNVs per individual for input loci. |
| FreGene [13] | Mutation simulation using a theoretical sequence of a given size with hotspot, conversion, and selection parameters. | Mutation selection across population for a theoretical sequence. |
| genomeSIMLA [11] | Simulation of disease loci within a family or case–control setting using specific LD patterns for investigations of disease. | Affy-identified SNPs selected by disease association. |
| ALF [15] | Population simulation for a specific gene set using a model for variation at the sequence and individual level. | FASTA protein and DNA sequences for specific genes. |

  1. Example simulators used in various types of genome investigations. Many use the Wright-Fisher model of population genetics [8] to generate populations that vary over time given a set of event frequencies such as LD, hotspots, and population bottlenecks (GENOME, genomeSIMLA, FreGene); others provide a set of sequences that could be generated by a given sequencing technology with an error model (ART and MetaSIM). The choice of simulator depends on the type of investigation. In planning a new GWAS, for instance, a simulator that uses LD patterns and can provide predicted genomic regions for disease-related mutations would be selected. However, such a simulator would be of no use in planning a metagenomic study of an organism that may not yet be fully sequenced or is highly variable. None of these simulators provides whole-genome FASTA as output.
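
For readers unfamiliar with the Wright-Fisher model mentioned in the note above, the following is a minimal sketch of its core idea: each generation is formed by random sampling from the allele pool of the previous one. It is not taken from any of the tools in the table. The function name `wright_fisher`, the `pop_sizes` parameter, and the bottleneck schedule are assumptions made for illustration, and the sketch tracks a single biallelic locus with no mutation, selection, or recombination.

```python
import random


def wright_fisher(pop_sizes, p0, seed=None):
    """Return the allele-frequency trajectory of one biallelic locus.

    pop_sizes: number of gene copies in each successive generation
               (hypothetical parameter; varying it models a bottleneck).
    p0:        starting frequency of the tracked allele, between 0 and 1.
    """
    rng = random.Random(seed)
    freq = p0
    trajectory = [freq]
    for n in pop_sizes:
        # Binomial sampling: each of the n copies in the new generation
        # carries the allele with probability equal to its current frequency.
        count = sum(1 for _ in range(n) if rng.random() < freq)
        freq = count / n
        trajectory.append(freq)
    return trajectory


if __name__ == "__main__":
    # Illustrative schedule: stable size, a short bottleneck, then recovery.
    sizes = [200] * 20 + [10] * 5 + [200] * 20
    for generation, f in enumerate(wright_fisher(sizes, p0=0.5, seed=1)):
        print(generation, round(f, 3))
```

Because smaller generations sample fewer copies, the frequency typically drifts more during the bottleneck than before or after it. The population simulators in the table layer further events (recombination, migration, selection, hotspots) on top of this kind of sampling step.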