Multiplex PCR Design (MPD) software consists of a C library and programs used to design and pool compatible primers and a Perl package that provides convenience functions for sanitizing inputs, executing and processing the C programs, and summarizing results. To minimize human error, the package can write specifically formatted files to enable bulk oligonucleotide ordering via direct upload and addition of appropriate adapters to primers for compatibility with the 48.48 Access Array System.
The MPD C program designs primers using k-mers in a similar fashion to how BLAT finds compatible sequences [5]. It takes a specially prepared hashed version of the genome, flat dbSNP files, standard PCR parameters, and a bed file of target regions. All possible primers that cover a user-specified region are examined. Primers are immediately excluded if any of the following is true: 1) they form hairpins, 2) dimerize to each other, 3) have Tm outside the user specified range, 4) have GC content outside the user specified range, 5) occur within a repeat-masked region of the genome, 6) overlap a high frequency SNP, or 7) if the last 7 bases of the primer anneal within the amplified product. Tm and other primer characteristics were calculated using established algorithms [6]. Primers not rejected for any of these criteria are given a “quality score” which is an estimate of the primers commonness within the genome. Smaller scores represent primers with less common subsequences within them. A score of 1 would indicate that every k-mer of size 15 or smaller within the primer was absolutely unique, which is not actually possible, but scores near 1 indicate that most k-mers of size 14–15 are nearly unique. Primers with non-unique 15mers at the 3’ end of the primer are given large penalties. After all primers have been identified compatible with the supplied specification, a matrix of compatibility is created, and primer pairs are determined to be compatible if all of the following are true: 1) no primer dimerizes with another, 2) all primers have Tm’s within 2 °C, 3) primer pairs do not target overlapping regions, and 4) amplified regions are within 20% of the maximum allowable amplicons size of one another (usually, 20–30 bp). The final criterion is important to avoid race conditions where smaller amplicons predominate the reaction. Pooling begins by either selecting the primer compatible with the most or least primers and proceeds recursively until all compatible primers are pooled.
The MPD Perl package offers convenience functions to process bed files into unique regions, launch and process the MPD C program output, and check MPD primers against a local compiled version of isPcr (UCSC genome browser’s in silico PCR tool [7]). Figure 1 and an included example script demonstrates the most common usage: a configuration file and target bed file are supplied, the bed file is sanitized to the unique regions, and primer pools are created that match the design specifications on the first iteration. Primers above a set threshold are retained, and optional additional iterations are made to loosen PCR parameters up to a set threshold. Optionally, isPcr may be used to provide an orthogonal validation for PCR primer uniqueness and genomic coordinates. After the final PCR pool design, all primers are written to (1) a plain text file, (2) a file suitable for use with isPcr, and (3) an excel file that is suitable for upload for batch synthesis of oligonucleotides in 96-well plate format. Additionally, a coverage file is provided indicating which primer(s) cover what target regions. To facilitate use with the 48.48 Access Array System, required forward and reverse primer sequencing adapters may be optionally added.
Web application
The full Multiplex Primer Design (MPD) program is accessible online (http://multiplexprimer.io). The web application allows users to submit primer design jobs by uploading a list of coordinates to amplify as a simple bed file. Once the bed file is uploaded, the web server submits the data to a job queue, waiting until a worker hosted on Amazon’s EC2 cloud computer platform is available to run the job. Under typical conditions this happens within seconds. Once a worker reserves a job, it sends real-time progress updates back to the browser, allowing the user to monitor the progress of the primer design submission from anywhere in the world. Users may also opt-in to email notifications of major state changes, such as primer design success. Once completed, output files may be downloaded, and the design summary may be viewed directly in the browser.
Genomic DNA samples
Human DNA samples used in this study were provided by the Emory Alzheimer’s Disease Research Center (ADRC), which recruits community volunteers for studies of aging and memory. Genomic DNA was extracted from human blood using the Gentra Puregene Blood kit (Qiagen) following the manufacturer’s protocol.
Primer design and capture
MPD was used to design primers for all exonic regions of the following genes: ABCA7, APOE, BIN1, CD2AP, CD33, CLU, MS4A6A, and PICALM using conditions recommended for the Access Array System and compatible for sequencing on an Illumina MiSeq. For validation purposes, we restricted our analysis to the first 47 pools identified so only 1 Access Array System would be required per 48 samples. The primers were synthesized on 6 plates using standard desalting and normalized to 60 mM concentration with the appropriate forward and reverse adapters added to the respective primers. Individual primers were pooled and amplification of 48 samples of genomic DNA was performed using the Access Array as per manufacturer’s protocol. All samples were barcoded according to the manufacturer’s protocol and 250 bp paired-ended sequencing was performed on an Illumina MiSeq. Of note, the forward and reverse sequencing adapters add about 100 bp of sequence to the resultant amplicons.
Primer design validation
All raw fastq files were mapped against hg38 build of the human genome using PE Mapper (https://github.com/wingolab-org/pecaller) and trimmed by 27 bp, which is 1 bp larger than the longest primer, from the 5’ end of the read to avoid sequencing the primers directly. Base calling and variant detection was performed using PE Caller with default parameters (theta = 0.001, probability to call = 0.95), and annotation was performed using SeqAnt [8].
Quality control was performed in 2 phases. First, samples were examined within groups that underwent capture together. Primer regions with >3 SD missing sites were dropped from all samples, and samples with >3 SD missing data were likewise excluded. Second, samples from all batches were combined, and those with >3 SD missing data or excess heterozygosity were dropped. Reported sites are those with >95% completeness, and variant sites that failed Hardy-Weinberg filtering at 10-7 were excluded; however, no site failed Hardy-Weinberg filtering.