SNP HiTLink: a high-throughput linkage analysis system employing dense SNP data

Background
Over the past decade, microarray-based single nucleotide polymorphism (SNP) data have become more widely used as markers for linkage analysis in the identification of loci for disease-associated genes. Although microarray-based SNP analyses have markedly reduced genotyping time and cost compared with microsatellite-based analyses, applying these enormous data to linkage analysis programs is a time-consuming step, thus necessitating a high-throughput platform.

Results
We have developed SNP HiTLink (SNP High Throughput Linkage analysis system). In this system, SNP chip data of the Affymetrix Mapping 100 k/500 k array set and Genome-Wide Human SNP array 5.0/6.0 can be directly imported and passed to parametric or model-free linkage analysis programs: MLINK, Superlink, Merlin and Allegro. Various marker-selecting functions are implemented to avoid the effects of typing-error data and of markers in linkage disequilibrium, or to select informative data.

Conclusion
The results using the 100 k SNP dataset were comparable or even superior to those obtained from analyses using microsatellite markers in terms of LOD scores obtained. General personal computers are sufficient to execute the process, as the runtime for whole-genome analysis was less than a few hours. This system can be widely applied to linkage analysis using microarray-based SNP data, and with it one can expect high-throughput and reliable linkage analysis.


Annotation files
Annotation files containing SNP information such as general IDs and chromosomal positions can be obtained from the Affymetrix web page (https://www.affymetrix.com/support/technical/annotationfilesmain.affx). Download the annotation files corresponding to the SNP chip in use. The Annotation File Manager on the Main Menu registers each annotation file so that it is recognized in linkage analysis. When using a Mapping 100K or 500K array, register both Mapping50K_Hind and Mapping50K_Xba, or both Mapping250K_NSP and Mapping250K_Sty, respectively.

Allele frequency files
Allele frequencies can be calculated automatically from the CHP files of control samples by the Allele Frequency Data Maker. Click on Allele Frequency Data Maker, specify the directory where the CHP files are located, choose the array type, and enter a title name. Clicking the Make icon creates the allele frequency data.
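
The calculation behind this step can be illustrated with a minimal sketch. It assumes genotype calls have already been read from the CHP files as the strings "AA", "AB", "BB" or "NoCall" for one SNP; the function name and the input representation are hypothetical, and this is not the code used by the Allele Frequency Data Maker.

```python
from collections import Counter


def allele_frequencies(calls):
    """Estimate the A and B allele frequencies of one SNP.

    `calls` holds one genotype call per control sample ("AA", "AB", "BB");
    any other value, such as "NoCall", is ignored (an assumption).
    Returns (freq_A, freq_B), or None when no usable call is present.
    """
    counts = Counter(c for c in calls if c in ("AA", "AB", "BB"))
    n_alleles = 2 * sum(counts.values())  # two alleles per typed sample
    if n_alleles == 0:
        return None
    freq_a = (2 * counts["AA"] + counts["AB"]) / n_alleles
    return freq_a, 1 - freq_a


if __name__ == "__main__":
    print(allele_frequencies(["AA", "AB", "BB", "NoCall", "AA"]))
```

The minor allele frequency (MAF) used by the marker-selection methods is the smaller of the two returned values.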

LD data files
LD data files can be downloaded from our web site (https://www.dynacom.co.jp/adachi/linkage/data/hapmap). These files contain all the D' and r² data of the four ethnic populations available from the HapMap database. Download the LD data files of interest and save them in an appropriate directory. Users can also make LD data files from their own samples by using LD Data Maker in the Main Menu. Click on LD Data Maker and specify the directory where the chip files are located.
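
For reference, the two LD measures can be computed for a pair of biallelic SNPs as sketched below. The sketch uses the standard textbook definitions and assumes that haplotype and allele frequencies are already known; how LD Data Maker estimates these from unphased genotypes is not described here, and the function name is hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    Returns None if either SNP is monomorphic (the measures are undefined).
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if d_max == 0 or denom == 0:
        return None
    return abs(d) / d_max, d * d / denom


if __name__ == "__main__":
    print(ld_measures(0.5, 0.5, 0.5))   # complete LD
    print(ld_measures(0.25, 0.5, 0.5))  # no LD
```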

Input files
1. CHP files
CHP files are generated by Affymetrix Genotyping Console™ from the CEL files first created in genotyping assays. It is preferable that the CHP files used in the same linkage analysis be saved in the same directory. Each CHP file name should begin with a sample-specific identifier (e.g., aaa_SNP6.chp, bbb_SNP6.chp). Do not begin file names with letters common to all files, such as the version of the SNP chip or the date of the assay (e.g., SNP6_aaa.chp, SNP6_bbb.chp).
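
Before importing, the file names can be checked against the rule not to start names with common letters. The following is a minimal sketch, assuming that the leading part of each name is the text before the first underscore, as in the examples above; the function name and this convention are assumptions, not part of SNP HiTLink.

```python
def check_chp_names(filenames):
    """Report CHP file names whose leading token is shared with another file.

    The leading token is taken to be the text before the first underscore
    (an assumption based on the examples aaa_SNP6.chp and bbb_SNP6.chp).
    Returns a list of messages; an empty list means no problem was found.
    """
    problems = []
    first_seen = {}  # leading token -> first file name that used it
    for name in filenames:
        token = name.split("_", 1)[0]
        if token in first_seen:
            problems.append(
                f"{name} starts with '{token}', as does {first_seen[token]}"
            )
        else:
            first_seen[token] = name
    return problems


if __name__ == "__main__":
    print(check_chp_names(["aaa_SNP6.chp", "bbb_SNP6.chp"]))
    print(check_chp_names(["SNP6_aaa.chp", "SNP6_bbb.chp"]))
```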

Interval settings (Merlin and Allegro)
Choose a method for setting inter-marker distances.

Min-max method
The minimum and maximum intervals are set. Among the SNPs in the region defined by these intervals, the one with the highest MAF is selected.
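
One possible reading of this method is sketched below. It assumes that the region for the next marker runs from (previous marker + minimum interval) to (previous marker + maximum interval), that the first region opens at the first SNP, and that when a region contains no SNP the next available SNP is taken. These details, the function name and the (position, MAF) input format are assumptions and may differ from the actual implementation.

```python
def select_min_max(snps, min_interval, max_interval):
    """Pick, window by window, the SNP with the highest MAF.

    snps: list of (position, maf) tuples sorted by position.
    Positions and intervals must share one unit (e.g., cM or bp).
    """
    if min_interval > max_interval:
        raise ValueError("min_interval must not exceed max_interval")
    selected = []
    if not snps:
        return selected
    lo = snps[0][0]                          # first window opens at the first SNP
    hi = lo + (max_interval - min_interval)
    i = 0
    while i < len(snps):
        # Skip SNPs that lie closer than the minimum interval.
        while i < len(snps) and snps[i][0] < lo:
            i += 1
        if i == len(snps):
            break
        # Collect the SNPs that fall inside the current window.
        j = i
        while j < len(snps) and snps[j][0] <= hi:
            j += 1
        end = j if j > i else i + 1          # empty window: take the next SNP
        best = max(range(i, end), key=lambda k: snps[k][1])
        selected.append(snps[best])
        lo = snps[best][0] + min_interval
        hi = snps[best][0] + max_interval
        i = best + 1
    return selected


if __name__ == "__main__":
    demo = [(0, 0.1), (50, 0.4), (120, 0.2), (180, 0.45), (400, 0.3)]
    print(select_min_max(demo, 100, 200))
```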

Min MAF & interval method
The minimum interval and minimum MAF are set. After SNPs with MAFs below the minimum are filtered out, the next SNP located more than the minimum interval from the previously selected SNP is chosen.
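
A minimal sketch of this method follows, assuming SNPs are given as (position, MAF) tuples sorted by position and that the first SNP passing the MAF filter is always selected; the function name and input format are hypothetical.

```python
def select_min_maf_interval(snps, min_maf, min_interval):
    """Filter by MAF, then thin the markers by a minimum spacing.

    snps: list of (position, maf) tuples sorted by position.
    A SNP is kept when its MAF is at least min_maf and it lies more than
    min_interval beyond the previously kept SNP.
    """
    selected = []
    last_pos = None
    for pos, maf in snps:
        if maf < min_maf:
            continue  # fails the MAF filter
        if last_pos is None or pos - last_pos > min_interval:
            selected.append((pos, maf))
            last_pos = pos
    return selected


if __name__ == "__main__":
    demo = [(0, 0.1), (50, 0.4), (120, 0.2), (180, 0.45), (400, 0.3)]
    print(select_min_maf_interval(demo, 0.25, 150))
```

Unlike the min-max method, this method does not compare MAFs among candidates; any SNP above the MAF threshold is accepted as soon as the spacing condition is met.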

Use LD settings
LD data files are available from our download site, or users can make them from their own samples.
Users set the LD parameters by specifying thresholds for D' and r². By referring to the LD data file specified by the user, the program constructs 'LD blocks', in which neighboring SNPs with D' or r² higher than the specified thresholds are placed in the same block. When a marker is in the same LD block as the preceding markers, the program skips it and moves to the next marker. Click on 'Test' to see information on the LD blocks defined by D' and r².
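
The block construction and marker skipping described above can be sketched as follows. The sketch assumes that LD values are available for adjacent marker pairs in a dictionary, that a pair missing from the dictionary is treated as not being in LD, and that the first marker of each block is the one retained. The marker names, the dictionary layout and the function names are hypothetical and do not reflect the LD data file format.

```python
def build_ld_blocks(markers, pair_ld, d_prime_min, r2_min):
    """Group ordered markers into LD blocks.

    markers: marker IDs in chromosomal order.
    pair_ld: {(left_id, right_id): (d_prime, r2)} for adjacent markers.
    A marker joins the current block when D' or r^2 with its left
    neighbor exceeds the corresponding threshold.
    """
    blocks = []
    for marker in markers:
        if blocks:
            previous = blocks[-1][-1]
            d_prime, r2 = pair_ld.get((previous, marker), (0.0, 0.0))
            if d_prime > d_prime_min or r2 > r2_min:
                blocks[-1].append(marker)
                continue
        blocks.append([marker])  # start a new block
    return blocks


def skip_markers_in_ld(markers, pair_ld, d_prime_min, r2_min):
    """Keep one marker (the first) from each LD block."""
    blocks = build_ld_blocks(markers, pair_ld, d_prime_min, r2_min)
    return [block[0] for block in blocks]


if __name__ == "__main__":
    ids = ["snp1", "snp2", "snp3", "snp4"]
    ld = {
        ("snp1", "snp2"): (0.95, 0.80),
        ("snp2", "snp3"): (0.20, 0.05),
        ("snp3", "snp4"): (0.50, 0.90),
    }
    print(build_ld_blocks(ids, ld, 0.9, 0.8))
    print(skip_markers_in_ld(ids, ld, 0.9, 0.8))
```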

Transfer of lkin files to unix OS
Transfer the lkin files to the Unix system in binary mode.

Running linkage program
A Perl program, run_linkage.pl, available from the Dynacom website, should be transferred to the Unix system and decompressed. To decompress, type the following.
When users check the 'Do haplotype' option in Allegro, haplo.out, founder.out, ihaplo.out and inher.out will be produced in each of the chromosome directories. The haplotype viewer included in the SNP HiTLink package displays those files in a table format that can be easily copied to an Excel sheet for analysis.
When users analyze a family with parental data, errors due to inconsistency between parents and children may occur. Multipoint analysis by Allegro is usually interrupted by inconsistent genotypes. SNP HiTLink can detect and skip those SNPs in multipoint analysis by referring to the 'unknown' (mlink) or 'superlink' (superlink) errors recorded in the output_xx.txt files produced by pairwise analysis. Pairwise analysis should therefore precede multipoint analysis by Allegro so that inconsistent markers are skipped effectively. When run_linkage.pl is applied to lkin files for multipoint analysis, it should be run in the same directory where the output_xx.txt files are saved.
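
The kind of parent-child inconsistency referred to above can be illustrated with a minimal Mendelian check for autosomal SNPs. SNP HiTLink itself relies on the errors reported in the output_xx.txt files, so the sketch below is only an illustration of the concept; the function names, the genotype encoding ("AA", "AB", "BB", with None for untyped individuals) and the marker names are assumptions.

```python
def mendelian_consistent(father, mother, child):
    """Check one autosomal SNP in a father-mother-child trio.

    Genotypes are two-letter strings such as "AA", "AB" or "BB".
    An untyped individual (None or "") cannot reveal an inconsistency,
    so the trio is then reported as consistent.
    """
    if not (father and mother and child):
        return True
    # Every genotype obtainable from one paternal and one maternal allele.
    possible = {"".join(sorted(f + m)) for f in father for m in mother}
    return "".join(sorted(child)) in possible


def inconsistent_snps(trio_genotypes):
    """Return the IDs of SNPs showing a Mendelian inconsistency.

    trio_genotypes: {snp_id: (father, mother, child)}
    """
    return [
        snp_id
        for snp_id, (father, mother, child) in trio_genotypes.items()
        if not mendelian_consistent(father, mother, child)
    ]


if __name__ == "__main__":
    trios = {
        "snp1": ("AA", "AA", "AB"),  # child carries B, absent in both parents
        "snp2": ("AB", "BB", "BB"),
        "snp3": ("AA", None, "BB"),  # mother untyped: not flagged
    }
    print(inconsistent_snps(trios))
```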