Seq4SNPs: new software for retrieval of multiple, accurately annotated DNA sequences, ready formatted for SNP assay design

Table 2 Data sources

Datum	Source	Notes	Step
SNP rs or ss number*	User input	File or text input	1
Trivial name	User input	Same file as above	1
Size of assay sequence	User input	e.g. 200 specifies 200 nucleotides each side of assay SNP (401 altogether)	1
New rs number	NCBI dbSNP cluster page*	New rs retrieved when rs no longer in use** or if ss number submitted***	2
Fasta sequence, allele	Ditto	Fasta output with allele in header (major allele first)	2
Major allele, validation of assay SNP, heterozygosity	Ditto	'Allele' report	2
Fasta sequence (second attempt)	NCBI contig fasta sequence****	Used if the cluster-page sequence is too short; contig reference taken from cluster page*	2
Gene, chromosome	NCBI cluster page*	'Gene' report	2
Masked sequences	RepeatMasker (see text)	Takes fasta output above and produces fasta for next step.	3
Platform	User input	Choose TaqMan, SNPstream or Sequenom	3
Chromosome position, adjacent SNP list with 21-nucleotide sequence, etc.	Local MySQL database with dbSNP data	Annotation of assay sequence using the Seq4SNPs algorithm	4
Validation, heterozygosity	Ditto	Part of Adjacent SNP Report (Fig 3) detailing each SNP and flagging placement mismatches	4
SNP assay sequences		Final output compatible with assay designers	4
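Step 1 takes a flank size n and yields an assay sequence of 2n + 1 nucleotides (200 gives 401). A minimal sketch of cutting that window and writing the SNP in the bracketed `[allele/allele]` style accepted by many assay-design tools follows. The function names and the bracket convention are assumptions for illustration, not a description of the exact output Seq4SNPs writes for each platform.

```python
from __future__ import annotations


def assay_window(sequence: str, snp_index: int, flank: int) -> tuple[str, str]:
    """Return the left and right flanks around a 0-based SNP index (hypothetical helper)."""
    if not 0 <= snp_index < len(sequence):
        raise ValueError("SNP index lies outside the sequence")
    start = snp_index - flank
    end = snp_index + flank + 1  # exclusive end: window is 2 * flank + 1 long
    if start < 0 or end > len(sequence):
        # Table 2's fallback for this case is to fetch contig sequence instead.
        raise ValueError("sequence too short for the requested flank")
    return sequence[start:snp_index], sequence[snp_index + 1:end]


def format_assay(sequence: str, snp_index: int, flank: int,
                 alleles: list[str]) -> str:
    """Write the window as LEFT[A1/A2]RIGHT (assumed bracket convention)."""
    left, right = assay_window(sequence, snp_index, flank)
    return f"{left}[{'/'.join(alleles)}]{right}"
```

For example, `format_assay("ACGTACGTAGCATGCATGCA", 10, 3, ["C", "T"])` returns `"TAG[C/T]ATG"`. Passing the alleles major-first keeps the order used in the fasta header.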

Data used by Seq4SNPs are drawn from the sources listed here; Seq4SNPs inputs are set in italics and outputs in bold. Some items are taken from web pages accessed by uniform resource locator (URL) or from FTP download sites, examples of which are shown below.
Example URLs:
*dbSNP rs cluster page: http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=123
**New rs number: http://www.ncbi.nlm.nih.gov/sites/entrez?db=snp&cmd=search&term=rs840 (if cluster page not available)
***rs number for ss:
http://www.ncbi.nlm.nih.gov/sites/entrez?db=snp&cmd=search&term=ss19333593
****NCBI contig download:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucletide&val=NT_007819.16&dopt=fasta&from=24454804&to=24456004
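The example URLs follow simple query-string templates. The sketch below rebuilds them with Python's standard `urllib.parse.urlencode`; the function names are hypothetical, and because NCBI has since retired several of these endpoints the templates should be read as historical. The contig helper assumes the requested region is centred on the SNP's contig position, and it spells the database `nucleotide`, whereas the footnote URL reads `nucletide`, which appears to be a typo.

```python
from urllib.parse import urlencode

# Templates taken from the Table 2 footnotes (historical NCBI endpoints).
CLUSTER_PAGE = "http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi"
ENTREZ_SEARCH = "http://www.ncbi.nlm.nih.gov/sites/entrez"
CONTIG_FASTA = "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi"


def cluster_page_url(rs_number: int) -> str:
    """dbSNP rs cluster page (footnote *)."""
    return f"{CLUSTER_PAGE}?{urlencode({'rs': rs_number})}"


def entrez_snp_search_url(term: str) -> str:
    """Entrez SNP search for a retired rs or an ss number (footnotes ** and ***)."""
    return f"{ENTREZ_SEARCH}?{urlencode({'db': 'snp', 'cmd': 'search', 'term': term})}"


def contig_fasta_url(contig: str, snp_pos: int, flank: int) -> str:
    """Contig fasta download (footnote ****), assumed centred on snp_pos."""
    params = {"db": "nucleotide", "val": contig, "dopt": "fasta",
              "from": snp_pos - flank, "to": snp_pos + flank}
    return f"{CONTIG_FASTA}?{urlencode(params)}"
```

With a hypothetical contig position of 24455404 and a 600-nucleotide flank, `contig_fasta_url("NT_007819.16", 24455404, 600)` reproduces the `from=24454804` and `to=24456004` values of the example contig URL.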
dbSNP downloads (human): fasta sequences and chromosome positions, respectively, from:
ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/rs_fasta
ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/chr_rpts
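The rs_fasta files pair each flanking sequence with a header line describing the SNP. A minimal sketch of reading such headers follows. The pipe-separated `key=value` layout, the field names (`pos`, `alleles`, `taxid`) and the 1-based meaning of `pos` are assumptions about the file format that should be checked against the downloaded files, and the example header is made up.

```python
from __future__ import annotations


def parse_rs_fasta_header(header: str) -> dict[str, str]:
    """Split a header line into fields.

    Assumed layout: >gnl|dbSNP|rs<N> key=value|key=value|...
    """
    ident, _, rest = header.strip().lstrip(">").partition(" ")
    fields = {"id": ident.split("|")[-1]}  # e.g. "rs123"
    for item in rest.split("|"):
        key, sep, value = item.partition("=")
        if sep:  # skip fragments that are not key=value pairs
            fields[key.strip()] = value.strip().strip('"')
    return fields


def snp_index(fields: dict[str, str]) -> int:
    """Convert an assumed 1-based 'pos' field into a 0-based Python index."""
    return int(fields["pos"]) - 1
```

For the made-up header `>gnl|dbSNP|rs123 rs=123|pos=201|len=401|taxid=9606|alleles="A/G"`, the parser yields `id` `rs123`, `alleles` `A/G`, and a 0-based SNP index of 200.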
Note that to extend Seq4SNPs' adjacent-SNP annotation to other species, data for those species may be downloaded from the organisms folder and loaded into the MySQL database alongside the human SNPs.
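The adjacent-SNP annotation in step 4 amounts to a range query on chromosome position. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL; the `snp_pos` table, its columns, the `adjacent_snps` function and the rows are all hypothetical. Storing a taxonomy ID with each row is one way to keep SNPs from several species in the same database.

```python
import sqlite3

# Hypothetical schema; the actual Seq4SNPs MySQL tables may differ.
SCHEMA = """
CREATE TABLE snp_pos (
    rs    INTEGER PRIMARY KEY,
    taxid INTEGER NOT NULL,  -- 9606 = human; other species share the table
    chrom TEXT    NOT NULL,
    pos   INTEGER NOT NULL   -- chromosome coordinate, e.g. from chr_rpts
);
CREATE INDEX idx_snp_loc ON snp_pos (taxid, chrom, pos);
"""


def adjacent_snps(conn, taxid, chrom, pos, flank, assay_rs):
    """Return (rs, offset) pairs for other SNPs within `flank` bases of `pos`."""
    cur = conn.execute(
        "SELECT rs, pos - ? FROM snp_pos "
        "WHERE taxid = ? AND chrom = ? AND pos BETWEEN ? AND ? AND rs != ? "
        "ORDER BY pos",
        (pos, taxid, chrom, pos - flank, pos + flank, assay_rs),
    )
    return cur.fetchall()


def demo_connection():
    """Build an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO snp_pos VALUES (?, ?, ?, ?)",
        [(1001, 9606, "1", 5000), (1002, 9606, "1", 5150),
         (1003, 9606, "1", 5300), (1004, 9606, "2", 5100),
         (1005, 10090, "1", 5050)],  # 10090 = mouse
    )
    return conn
```

With these rows, `adjacent_snps(demo_connection(), 9606, "1", 5000, 200, 1001)` returns `[(1002, 150)]`: the SNP on chromosome 2, the mouse SNP and the one 300 bases away all fall outside the filter.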

ISSN: 1471-2105