Subjects
Blood samples from thirty-nine patients with Acute Myeloid Leukemia (AML) were collected by Professor Giovanni Martinelli (Dep. of Hematology and Oncological Sciences "L. and A. Seràgnoli" - University of Bologna). Genomic DNA was extracted from peripheral blood samples with standard methods. Written informed consent was obtained from each individual, and the study was designed according to the ethical principles for medical research involving human subjects stated in the World Medical Association Declaration of Helsinki. The study was also approved by the Department of Hematology and Oncological Sciences "L. and A. Seràgnoli". The individuals were genotyped with the following two platforms: DMET™ Plus GeneChip array (DMET) and Genome Wide Human SNP 6.0 array (AFFY, Affymetrix Inc, Santa Clara, CA). A brief description of each follows.
Genotyping
DMET Plus
The DMET™ Plus GeneChip array (Affymetrix Inc, Santa Clara, CA) contains 1931 SNPs and five Copy Number Variants (CNVs) distributed over 225 genes encoding drug-metabolizing enzymes and transporters. Amplified and non-amplified DNA samples were combined for the annealing and amplification steps, in which molecular inversion probe (MIP) technology was used to genotype all the genomic sites of interest in a single reaction. DNA samples were subsequently purified, fragmented, labeled and hybridized to the array, which was then scanned with the GeneChip Scanner 3000 (Affymetrix Inc, Santa Clara, CA).
DMET Console version 1.1 (Affymetrix Inc, Santa Clara, CA) was used to perform genotype calls using standard parameters.
Affymetrix Genome Wide Human SNP 6.0
DNA samples were genotyped using the Genome Wide Human SNP 6.0 array (Affymetrix Inc, Santa Clara, CA) according to the manufacturer's instructions, retrieving genotype information for approximately 906,000 loci. Genomic DNA samples were first digested with the Nsp I and Sty I restriction enzymes, then adaptor-ligated and PCR-amplified using a primer that recognizes the adaptor sequence. PCR products were subsequently purified, fragmented, labeled, denatured and hybridized to oligonucleotide probes attached to the surface of the array. Washing and staining followed, and the array was scanned with the GeneChip Scanner 3000 (Affymetrix Inc, Santa Clara, CA). The Genotyping Console 3.0 package was used to perform genotype calls using standard parameters.
SNPs in the study
The study was performed on 1860 of the 1931 DMET markers; the remaining 71 markers were discarded for the following reasons:
- 13 markers have been coded in PharmGKB [4] but have not yet been validated in dbSNP [5], and therefore lack the coding needed to be recognized in the reference panel, if present;
- 5 markers have two different annotated positions;
- 2 markers were duplicated;
- 5 markers presented 3 alleles in the study sample;
- 46 markers mapped to chromosome X (the software IMPUTE 2 handles autosomal markers only).
Quality control
Before proceeding to the analysis, we performed quality control checks on the data. First, we tested the concordance between genetic and reported sex to check for sample labeling errors. Second, any subject with a genotype call rate < 95% was to be removed. Third, SNPs mapping to the regions of interest (i.e. those containing the drug metabolism genes, about 6000 SNPs) were removed if their Hardy-Weinberg p-value was below 0.00001.
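The call-rate and Hardy-Weinberg filters above can be illustrated with a minimal sketch. The text does not state which Hardy-Weinberg test was used; the 1-df chi-square test below is an assumption chosen for brevity (an exact test is often preferred for small samples), and the function and constant names are hypothetical.

```python
import math

MIN_CALL_RATE = 0.95   # subjects below this call rate are removed
HWE_P_MIN = 0.00001    # SNPs with a smaller HWE p-value are removed


def call_rate(genotypes):
    """Fraction of non-missing genotype calls; None marks a no-call."""
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) Hardy-Weinberg p-value from genotype counts.

    Assumption: the source does not name the test; this is one common choice.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0                    # monomorphic SNP: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def keep_subject(genotypes):
    return call_rate(genotypes) >= MIN_CALL_RATE


def keep_snp(n_aa, n_ab, n_bb):
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= HWE_P_MIN
```

A SNP with no heterozygotes among 40 subjects split evenly between the two homozygous classes, for example, would fail the filter, while one with counts in exact Hardy-Weinberg proportions would pass.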
Categories of SNPs
The DMET SNPs investigated were grouped into 3 classes according to their presence in the AFFY platform and in the reference panel, as follows:
- Shared: 205 markers present in both the DMET and AFFY arrays (genotype matching is performed between experimental genotypes).
- Reference Panel Only (RPO): 654 markers present in DMET and in the reference panel but not in AFFY (genotype matching is performed between DMET experimental and AFFY imputed genotypes).
- Neither in AFFY nor in reference panel (NAR): 1001 markers present in DMET but in neither AFFY nor the reference panel. Imputation was therefore not performed for this group of SNPs.
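The three classes amount to simple set-membership tests, sketched below. The marker identifiers and the function name are hypothetical; only the class definitions and the marker counts come from the text.

```python
def classify_marker(marker_id, affy_ids, reference_ids):
    """Assign a DMET marker to the Shared, RPO or NAR class."""
    if marker_id in affy_ids:
        return "Shared"   # genotyped on both arrays
    if marker_id in reference_ids:
        return "RPO"      # imputable from the reference panel only
    return "NAR"          # cannot be imputed


# Hypothetical identifiers, for illustration only.
affy_ids = {"rs_a"}
reference_ids = {"rs_a", "rs_b"}
classes = {m: classify_marker(m, affy_ids, reference_ids)
           for m in ("rs_a", "rs_b", "rs_c")}

# The class sizes reported in the text add up to the 1860 markers studied.
assert 205 + 654 + 1001 == 1860
```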
Independently of SNP class (Shared, RPO and NAR), markers were also subdivided according to their minor allele frequency (MAF). We used the following seven ranges: 0, 0-0.05, 0.05-0.10, 0.1-0.2, 0.2-0.3, 0.3-0.4 and 0.4-0.5.
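As an illustration of the MAF grouping, the sketch below computes the MAF of a biallelic SNP and assigns it to one of the seven ranges. The text does not say which side of each range is inclusive; treating the upper bound as inclusive is an assumption, as are the genotype coding (0, 1 or 2 copies of one allele, None for a no-call) and the function names.

```python
from bisect import bisect_left

# Upper edges of the six non-zero ranges (assumed inclusive).
BIN_EDGES = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50]
BIN_LABELS = ["0-0.05", "0.05-0.10", "0.1-0.2", "0.2-0.3", "0.3-0.4", "0.4-0.5"]


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as copies of allele B (0, 1, 2) or None."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    freq_b = sum(called) / (2 * len(called))
    return min(freq_b, 1 - freq_b)


def maf_bin(maf):
    """Label of the MAF range; monomorphic SNPs form their own class '0'."""
    if not 0 <= maf <= 0.5:
        raise ValueError("MAF must lie in [0, 0.5]")
    if maf == 0:
        return "0"
    return BIN_LABELS[bisect_left(BIN_EDGES, maf)]
```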
Imputation
Imputation is a statistical process used to predict genotypes that are not directly assayed in a sample of individuals. The term often refers to the situation in which a reference panel of individuals genotyped at a dense set of SNPs is used to impute into a study sample of individuals that have been genotyped at a subset of the SNPs [3].
Imputation was performed using the method implemented in the software IMPUTE 2 [6]. IMPUTE 2 returns the full probability distribution of the imputed genotypes at each SNP for each individual. We generated discrete imputed genotypes by accepting a call if the posterior probability for a genotype reached a pre-specified threshold and setting the genotype as missing otherwise. The AFFY genotypes represent the study sample here. The reference panel was the one prepared by Marchini et al. [6] for the CEU population, which includes information from the 1000 Genomes Pilot and HapMap 3 (release Jun 2010/Feb 2009) [7].
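The conversion from posterior probabilities to discrete calls can be sketched as follows. This is an illustrative helper, not part of IMPUTE 2; the genotype coding (0, 1, 2 for the three genotypes, None for missing) and the inclusive comparison at the threshold are assumptions.

```python
def call_genotype(posteriors, threshold):
    """Turn (P(AA), P(AB), P(BB)) into a discrete call.

    Returns 0, 1 or 2 for the most probable genotype when its posterior
    probability reaches the threshold (assumed inclusive), else None.
    """
    best = max(range(3), key=lambda g: posteriors[g])
    return best if posteriors[best] >= threshold else None


# The same posterior distribution may be called or not depending on the threshold.
example = (0.10, 0.20, 0.70)
calls = [call_genotype(example, t) for t in (0.5, 0.7, 0.9)]
```

Raising the threshold trades call rate for accuracy: fewer genotypes are called, but those that are called carry higher posterior support.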
The imputation algorithm, designed for large population genetic datasets and built on a model developed by Li and Stephens [8] to capture important features of the recombination process, is based on a Markov chain Monte Carlo (MCMC) algorithm. Imputation typically involves a reference panel genotyped at a dense set of SNPs and a study sample genotyped at a subset of these SNPs. We chose this imputation method over the others because it allows the use of multiple reference panels. Imputation was performed on the untyped markers of AFFY using the freely downloadable CEU reference panel prepared by Marchini et al. [6].
To speed up the procedure and reduce the computational load, the chromosomal regions to be imputed were subdivided into partially overlapping 2-Mb chunks, which were processed independently.
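A minimal sketch of how a region might be split into partially overlapping 2-Mb chunks is given below. The size of the overlap is not stated in the text; the 250-kb value, the coordinate convention and the function name are assumptions made for illustration.

```python
def overlapping_chunks(start, end, chunk_size=2_000_000, overlap=250_000):
    """Split [start, end) into chunks of chunk_size bp that share `overlap` bp.

    Assumption: the overlap length is hypothetical; the source only says
    that the chunks partially overlap.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    pos = start
    while True:
        chunk_end = min(pos + chunk_size, end)
        chunks.append((pos, chunk_end))
        if chunk_end >= end:
            break
        pos += step
    return chunks


# A 5-Mb region yields three chunks, the last one shorter than 2 Mb.
region_chunks = overlapping_chunks(0, 5_000_000)
```

Each chunk can then be imputed as a separate job, and the overlapping margins limit edge effects at chunk boundaries.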
We defined the following parameters to evaluate the results of the imputation:
- Discordance: the proportion of genotype calls for which the imputed genotype did not match the experimental genotype call, averaged over all SNPs.
- Successful SNPs (SSNPs): SNPs for which the imputed genotype matched the experimental genotype in at least 37 of 39 subjects (roughly corresponding to a genotype error rate of 5%).
- Genotype error rate: the proportion of unmatched genotypes over the total number of genotypes.
The discordance was evaluated at three different values of the imputed genotype calling threshold (IGCT). An imputed genotype was called if the corresponding posterior probability estimated by the imputation software (IMPUTE 2) was higher than the IGCT under investigation. Imputed genotypes below the IGCT were set as no-calls (i.e. missing genotypes). The IGCT was set to 50%, 70% and 90%.
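The three evaluation parameters can be computed from paired genotype vectors as in the sketch below. The text does not specify how no-calls enter each measure; here pairs with a missing genotype on either side are excluded from the comparison and do not count as matches, which is an assumption, as are the data layout and the function name.

```python
from statistics import mean


def imputation_metrics(imputed, experimental, min_matches=37):
    """Compare imputed and experimental genotypes SNP by SNP.

    Both arguments map a SNP id to a list of genotypes (one per subject,
    same subject order); None marks a no-call. Assumption: pairs with a
    missing genotype are skipped and never count as matches.
    """
    per_snp_discordance = []
    total_mismatches = 0
    total_compared = 0
    successful = []
    for snp, exp_calls in experimental.items():
        pairs = [(i, e) for i, e in zip(imputed[snp], exp_calls)
                 if i is not None and e is not None]
        matches = sum(i == e for i, e in pairs)
        mismatches = len(pairs) - matches
        if pairs:
            per_snp_discordance.append(mismatches / len(pairs))
        total_mismatches += mismatches
        total_compared += len(pairs)
        if matches >= min_matches:
            successful.append(snp)
    return {
        # Mean of the per-SNP mismatch proportions.
        "discordance": mean(per_snp_discordance) if per_snp_discordance else float("nan"),
        # Mismatches pooled over all compared genotypes.
        "genotype_error_rate": total_mismatches / total_compared if total_compared else float("nan"),
        "successful_snps": successful,
    }


# Toy data with four subjects; min_matches is lowered accordingly.
experimental = {"snpA": [0, 1, 2, 1], "snpB": [0, 0, 1, 2]}
imputed = {"snpA": [0, 1, 2, 1], "snpB": [0, 1, None, 2]}
result = imputation_metrics(imputed, experimental, min_matches=3)
```

With 39 subjects the default of 37 matches applies. Running the function once per IGCT, on genotypes called at 50%, 70% and 90%, gives the three sets of values compared in the evaluation.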