Position dependent mismatch discrimination on DNA microarrays – experiments and model
© Naiser et al. 2008
Received: 12 August 2008
Accepted: 01 December 2008
Published: 01 December 2008
The propensity of oligonucleotide strands to form stable duplexes with complementary sequences is fundamental to biological and biotechnological processes as diverse as microRNA signalling, microarray hybridization and PCR. Yet our understanding of oligonucleotide hybridization, in particular in the presence of surfaces, is rather limited. Here we use oligonucleotide microarrays made in-house by optically controlled DNA synthesis to produce probe sets comprising all possible single base mismatches and base bulges for each of 20 sequence motifs under study.
We observe that mismatch discrimination is mostly determined by the defect position (relative to the duplex ends) as well as by the sequence context. We investigate the thermodynamics of the oligonucleotide duplexes on the basis of a double-ended molecular zipper model. Theoretical predictions of the defect positional influence as well as of long-range sequence influence agree well with the experimental results.
Molecular zipping at thermodynamic equilibrium explains the binding affinity of mismatched DNA duplexes on microarrays well. The position dependent nearest neighbor (PDNN) model can be inferred from it. Quantitative understanding of microarray experiments from first principles is within reach.
The well-known double-helix structure of nucleic acids results from sequence-specific binding between complementary single strands. Sequential formation of A·T and C·G base pairs along the two complementary strands results in stable duplexes. This so-called hybridization process is fundamental to many biological processes and biotechnologies. Microarrays consist of surface-tethered probe sequences, which act as specific scavengers for their respective complementary target sequences. This molecular recognition enables highly parallel detection of nucleic acid sequences in complex target mixtures. Hybridization also occurs with single mismatched (MM) base pairs; however, these duplexes are significantly less stable than the corresponding perfect match (PM) [1, 2]. The single base pair mismatch discrimination capability of short (~20 nt) oligonucleotide probes provides an important diagnostic tool for the detection of point mutations and single nucleotide polymorphisms (SNPs). DNA duplex stability arises from hydrogen bonding and base stacking interactions (the latter comprise van der Waals, electrostatic and hydrophobic interactions between adjacent base pairs). According to the well-established nearest-neighbor (NN) model, the thermodynamic stability of a nucleic acid duplex can be treated as the sum of these nearest-neighbor interactions [4–6]. The binding free energy of an oligonucleotide duplex can thus be predicted from the nearest-neighbor free energy parameters: the helix propagation parameters (one for each of the 10 possible base-pair doublets in the case of a DNA/DNA duplex) account for the duplex sequence. Further parameters provide corrections for duplex initiation, terminal A·T pairs, or a symmetry penalty in the case of self-complementary sequences. The NN model adequately predicts oligonucleotide duplex melting temperatures T_M in bulk solution.
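As a concrete illustration of the nearest-neighbor sum, the sketch below estimates the two-state ΔG°37 of a perfectly matched DNA duplex. It assumes the unified Watson-Crick parameter set of SantaLucia (1998) at 37 °C in 1 M NaCl, with initiation applied per terminal pair (0.98 kcal/mol for a terminal G·C, 1.03 kcal/mol for a terminal A·T) and a 0.43 kcal/mol symmetry penalty. The function names are ours, and mismatch, salt and surface corrections are deliberately left out.

```python
# Unified Watson-Crick NN free energies at 37 °C, 1 M NaCl (kcal/mol),
# SantaLucia (1998). Keys are 5'->3' dinucleotides of the top strand.
NN_DG37 = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58,
    "CA": -1.45, "GT": -1.44, "CT": -1.28,
    "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
INIT_GC = 0.98   # initiation with a terminal G·C pair (per end)
INIT_AT = 1.03   # initiation with a terminal A·T pair (per end)
SYMMETRY = 0.43  # penalty for self-complementary duplexes

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def stack_dg(dinuc: str) -> float:
    """Free energy of one NN doublet; the 6 doublets missing from the table
    are the same stacks read from the other strand (reverse complement)."""
    if dinuc in NN_DG37:
        return NN_DG37[dinuc]
    return NN_DG37[reverse_complement(dinuc)]


def duplex_dg37(seq: str) -> float:
    """Two-state NN estimate of dG°37 for seq paired with its perfect complement."""
    seq = seq.upper()
    dg = sum(stack_dg(seq[i:i + 2]) for i in range(len(seq) - 1))
    for end in (seq[0], seq[-1]):
        dg += INIT_GC if end in "GC" else INIT_AT
    if seq == reverse_complement(seq):
        dg += SYMMETRY
    return dg


print(round(duplex_dg37("CGTTGA"), 2))  # -5.35
```

The ten table entries are the ten unique doublets mentioned above; this position-independent sum is exactly the two-state picture that the following paragraphs show to be insufficient on microarrays.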
Datasets of Watson-Crick NN parameters provide the basis for nucleic acid structure and melting temperature prediction software such as the DINAMelt web server (UNAFold), the HYTHER server and others. The NN model can be extended beyond the Watson-Crick pairs to include single base MM defects [7, 10].
In spite of good knowledge of nucleic acid hybridization in solution, the prediction of binding affinities on DNA microarrays remains empirical. Recent microarray studies [11–15] report that the influence of even a point defect on hybridization signal intensity cannot be predicted easily. In particular, the influence of defect position on the hybridization signal is stronger than the influence of MM type [12, 14, 16].
Experiments show that the two-state nearest-neighbor (TSNN) approach, which has been very successful in predicting duplex stability in solution, does not appropriately describe MM binding affinities on DNA microarrays. The NN model does not account for the position of the individual NN pairs, except for the outermost ones. Based on microarray data, Zhang et al. proposed a position dependent nearest-neighbor (PDNN) model, which assumes that the duplex binding free energy can be expressed as a weighted sum of stacking energies with empirically derived positional weight parameters [17–21].
The purpose of this study is to investigate the influence of point defects on (surface-bound) hybridization experimentally and theoretically. Previous studies investigated mismatch discrimination with samples of very different sequence motifs [11, 12]. However, other effects such as secondary structure formation or competitive binding may obscure the impact of the MM defect on the binding affinity. To avoid such complications, we performed experiments with fixed sequence motifs and focus on small variations of the probe sequences: we perform hybridization studies with home-made microarrays comprising sets of very similar probe sequences. We use a single target sequence in each hybridization assay in order to avoid inter-target binding as well as competition of different target sequences for one and the same probe sequence. In order to avoid excluded volume interactions or secondary structure, we limit the length of the target sequence to the order of the probe length. These simplifications (described in detail in ) enable a detailed investigation of the influences of defect type, defect position, flanking base pairs and the sequence motif on the binding affinity. The extensive set of hybridization affinities obtained from our experiments enables a comprehensive analysis.
We compare the experimental data to theoretical modeling based on a double-ended molecular zipper approach (the double-ended nucleic acid zipper has been described previously [22–26]). We find that in order to reproduce the microarray hybridization signal in our model, the heterogeneity of binding affinities, which mostly arises from in situ synthesis-related probe defects (e.g. probe polydispersity), needs to be taken into account. Moreover, these synthesis defects turn out to be useful for the parallel detection of many different sequences.
Protocols for the preparation of dendrimer-functionalized microarray substrates (adapted from ) and for the light-directed synthesis (based on NPPOC-phosphoramidites), as well as details on the hybridization assay and on fluorescence microscopy-based microarray analysis (Fig. 1), are provided in Naiser et al. [14, 15].
In each microarray hybridization assay, a probe set of cognate probes with purposefully introduced point mutations, all derived from a common probe sequence motif, is hybridized against a single target sequence that perfectly matches the probe sequence motif. We systematically vary defect type and defect position to obtain the complete "defect profile" of hybridization affinities for each probe set. Besides all single base mismatches (MMs), we also include single base bulges (originating from insertions and deletions) as well as probes with multiple defects, in order to investigate mismatch discrimination in the broader context of other sequence defects. Since the ca. 130 probes within each probe set differ only by single bases, we are able to distinguish between defect-positional and sequence influences. Under our experimental conditions, hybridization equilibrium is reached after a few tens of minutes. Further details can be found in .
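The single-defect part of such a probe set can be sketched by enumerating every one-base variant of a motif. The function and the example motif below are hypothetical illustrations, not the study's design software. Using sets makes one detail visible: deleting any base of a homopolymer run, or inserting a base next to an identical one, produces the same sequence, so the exact probe count depends on the motif. Probes with multiple defects are not generated here.

```python
BASES = "ACGT"


def single_defect_probes(motif: str) -> dict:
    """All distinct probes differing from motif by one substitution,
    one deleted base or one inserted base (hypothetical helper)."""
    mismatches = {
        motif[:i] + b + motif[i + 1:]
        for i, orig in enumerate(motif)
        for b in BASES
        if b != orig
    }
    # Deletions inside a run of identical bases collapse onto one sequence.
    deletions = {motif[:i] + motif[i + 1:] for i in range(len(motif))}
    # Insertions next to an identical base also coincide.
    insertions = {
        motif[:i] + b + motif[i:]
        for i in range(len(motif) + 1)
        for b in BASES
    }
    return {"mismatch": mismatches, "deletion": deletions, "insertion": insertions}


motif = "ATCGGTACCTAGGCTA"  # hypothetical 16-mer, not one of the study's motifs
probes = single_defect_probes(motif)
print({kind: len(seqs) for kind, seqs in probes.items()})
```

For a motif of length L this yields 3L mismatch probes, one deletion probe per homopolymer run, and 3(L + 1) + 1 distinct insertion probes.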
As Fig. 2 shows directly, defects in the middle of the probes are the most destabilizing. In the center of a 16-mer duplex, a single nucleotide MM typically reduces the hybridization signal to 0–40% of the corresponding PM duplex hybridization signal. Defect type and nearest-neighbor effects have less influence on the hybridization signal than defect position. Our experiments show a mostly monotonic decrease of hybridization signals over a range of typically 5–8 defect positions (for 16-mer probes, and up to 14 positions for some 25-mer sequence motifs) from the duplex ends towards the center of the duplex. This is consistent with previous work [11, 12].
We also perform hybridization experiments on oligonucleotide duplexes with two single base deletion defects at varying positions x and y. The results show that the binding affinity also depends on the relative position of the defects (for details see Additional file 1, Fig. S5 and ). The hybridization signal is largest if each defect is located close to an end. The lowest binding affinities are observed for defect configurations that divide the sequence into three roughly equally long subsequences. For closely spaced defects (less than four nucleotides apart), the impact systematically increases with the distance between them.
In thermodynamic equilibrium, duplex nucleation (determined by the slow nucleation rate k_nuc) is balanced by duplex dissociation with the dissociation rate k_diss. The widely used two-state nearest-neighbor model (including mismatched NN dimers as described by ) cannot explain this positional influence, because it does not account for the position of the individual nearest-neighbor dimers. We assume that the nucleation rates k_nuc of very similar duplexes (differing by a single base pair, e.g. a PM duplex and a corresponding mismatched duplex) are virtually identical. Thus, the positional dependence observed experimentally can be expected to result from differences in k_diss. In agreement with  we show that the positional influence originates from end-domain unzipping. Our experimental findings suggest a common mechanism for the defect positional influence (DPI) that is independent of the defect type. Further, the relatively long range of the DPI (Fig. 3A and 3B) suggests that molecular dynamics may well be a good candidate for an explanation. The symmetry of the DPI (with respect to the duplex ends) and sequence-specific deviations from this symmetry indicate a zipping-related mechanism. Thus, in order to account for partially denatured duplex states, we use a double-ended zipper model of the oligonucleotide duplex to determine mismatched oligonucleotide duplex stabilities as a function of defect position. We consider a situation in thermodynamic equilibrium.
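The step from "equal nucleation rates" to "the positional dependence lies in k_diss" can be written out in one line. The notation below (superscripts for PM and MM duplexes, K_MM(x) for a mismatch at position x) is ours, introduced for illustration:

```latex
K = \frac{k_{\mathrm{nuc}}}{k_{\mathrm{diss}}}
\quad\Longrightarrow\quad
\frac{K_{\mathrm{MM}}(x)}{K_{\mathrm{PM}}}
= \frac{k_{\mathrm{diss}}^{\mathrm{PM}}}{k_{\mathrm{diss}}^{\mathrm{MM}}(x)}
\qquad \text{if } k_{\mathrm{nuc}}^{\mathrm{MM}} \approx k_{\mathrm{nuc}}^{\mathrm{PM}}
```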
k_+ and k_- are the fast zipping and unzipping rates, determined by the nearest-neighbor propagation parameters of the individual base-pair doublets. The time evolution of the oligonucleotide zipper can be considered a biased random walk with a finite probability of complete dissociation (described by the duplex dissociation rate k_diss). Since we consider thermodynamic equilibrium, we can use a partition function for fast numerics.
We use a partition function approach [22–25] and investigate whether the double-ended zipper model can reproduce our experimental results. On the basis of unified NN parameters, we calculate statistical weights of partially denatured duplex states. The effect of partial binding on microarray data was discussed earlier in [24–26].
While defects near the duplex ends result in only low mismatch discrimination (i.e. a small reduction of K with respect to the PM binding affinity), defects in the center result in higher MM discrimination, as K then approaches the value of the two-state equilibrium constant. NN-pair free energy increments for single base MMs are in the range of 1 to 3 kcal/mol per NN pair (derived from NN parameters [8, 10]). Employing these values in Eq. 9 for Δg° = -1.4 kcal/mol (Fig. 5A), DPI propagation is restricted to 3 or 6 NN pairs, respectively. However, in subsequences with weakly bound NN pairs (as demonstrated in Fig. 5B), the DPI can propagate further towards the middle of the duplex.
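The positional effect described above can be reproduced qualitatively with a minimal double-ended zipper. This is a sketch, not the paper's implementation, and it does not reproduce Eq. 9. It assumes a homogeneous 16-mer in which every Watson-Crick stack has Δg° = -1.4 kcal/mol (the value quoted for Fig. 5A), T = 325 K, and, hypothetically, a free energy of +1.0 kcal/mol for both stacks that contain the mismatched base pair. Bound states are contiguous runs of closed stacks (no internal loops), and the initiation term is dropped because it cancels in the ratio K_MM(x)/K_PM.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)


def zipper_K(g, T=325.0):
    """Sum of Boltzmann weights over all bound states of a double-ended zipper.

    g[k] is the free energy (kcal/mol) of NN stack k (joining base pairs k
    and k+1). A bound state is one contiguous run of closed stacks a..b;
    the initiation term is omitted because it cancels in K_MM / K_PM.
    """
    RT = R * T
    K = 0.0
    for a in range(len(g)):
        G = 0.0
        for b in range(a, len(g)):
            G += g[b]                 # free energy of the closed run a..b
            K += math.exp(-G / RT)
    return K


def mm_profile(n_bp=16, dg_pm=-1.4, dg_mm=1.0, T=325.0):
    """K_MM(x) / K_PM for one mismatch at base-pair position x = 0..n_bp-1."""
    pm = [dg_pm] * (n_bp - 1)         # hypothetical homogeneous PM duplex
    K_pm = zipper_K(pm, T)
    ratios = []
    for x in range(n_bp):
        g = list(pm)
        for k in (x - 1, x):          # the two stacks containing base pair x
            if 0 <= k < n_bp - 1:
                g[k] = dg_mm          # hypothetical mismatched-stack value
        ratios.append(zipper_K(g, T) / K_pm)
    return ratios


for x, r in enumerate(mm_profile()):
    print(f"defect at position {x:2d}: K_MM/K_PM = {r:.2e}")
```

With these placeholder values the ratio drops steeply (by close to an order of magnitude per position) over the first few positions from either end and then levels off at a plateau set by the fully zipped mismatched duplex. Making some stacks in the profile weaker lets the decrease extend further towards the center.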
In order to compare our numerical analysis to the experimentally observed hybridization signals, we need to understand how the hybridization signal (fluorescence intensity from hybridized targets) is linked to duplex stability. As detailed below, the Langmuir adsorption model, which assumes a single (homogeneous) binding affinity within a microarray feature, does not describe the experimentally observed hybridization signal intensities well. In this section we account for the heterogeneity that is introduced by in situ synthesis-related random mutations of the microarray probe sequences.
Assuming a stepwise error rate of 10%, more than 90% of the 25-mer duplexes contain at least one synthesis error (1 - 0.9^25 ≈ 0.93). Since the number of synthesis errors per probe follows a binomial distribution, the majority of the strands contain between one and three single base defects.
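The binomial argument is easy to check numerically. The sketch assumes, as the text does, a 10% error probability per coupling step acting independently on each of the 25 positions; any dependence of the error rate on base or position is ignored.

```python
import math


def error_count_distribution(length, p_err):
    """P(m synthesis errors), m = 0..length, for independent per-step errors."""
    return [
        math.comb(length, m) * p_err**m * (1 - p_err) ** (length - m)
        for m in range(length + 1)
    ]


dist = error_count_distribution(25, 0.10)
print(f"P(>=1 error)  = {1 - dist[0]:.3f}")      # 0.928
print(f"P(1-3 errors) = {sum(dist[1:4]):.3f}")   # 0.692
```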
We calculate the binding constants K_i of the individual, randomly "mutated" probe sequences on the basis of the zipper model. Using the approach of Forman et al., we obtain the total hybridization signal by summing over the distribution of probes, where the contribution θ_i of each individual mutated probe is described by a Langmuir equation (Eq. 10) with the binding constant K_i. Probe polydispersity (in length as well as in sequence) produces a "stretched isotherm" (similar to a Sips isotherm) with a significantly broadened transition region. This explains our experimental results in Figs. 2 and 6 well. A simulation of the transfer function θ for various error rates and a comparison between the experimental data in Fig. 6 and the corresponding simulation results are provided in Additional file 1, Fig. S6.
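A minimal sketch of the Langmuir sum θ = Σ_m P(m) K_m c / (1 + K_m c), with c the target concentration (our symbol), shows how polydispersity stretches the isotherm. The assumptions are ours and purely illustrative. Probes in a feature are grouped by their number m of synthesis defects, drawn from the binomial distribution above. Every defect lowers the binding free energy by the same hypothetical 1.5 kcal/mol, regardless of position, and K0 = 1e9 M^-1 is a placeholder PM binding constant. Truncated probes are ignored. The sketch reports the concentration span between 10% and 90% occupancy, which is exactly a factor of 81 for a single Langmuir isotherm.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)


def polydisperse_theta(c, K0, length=25, p_err=0.10, ddg=1.5, T=325.0):
    """Mean Langmuir occupancy of a feature with binomially distributed defects.

    Probes with m defects (probability P(m)) are assumed to bind with
    K_m = K0 * exp(-m * ddg / RT); ddg is a hypothetical uniform penalty.
    """
    RT = R * T
    theta = 0.0
    for m in range(length + 1):
        p_m = math.comb(length, m) * p_err**m * (1 - p_err) ** (length - m)
        K_m = K0 * math.exp(-m * ddg / RT)
        theta += p_m * K_m * c / (1 + K_m * c)   # Langmuir term per defect class
    return theta


def concentration_at(theta_target, K0, **kwargs):
    """Bisect on log10(c) for the concentration giving theta_target."""
    lo, hi = -30.0, 30.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if polydisperse_theta(10**mid, K0, **kwargs) < theta_target:
            lo = mid
        else:
            hi = mid
    return 10 ** (0.5 * (lo + hi))


K0 = 1e9  # hypothetical PM binding constant in 1/M
for ddg in (0.0, 1.5):
    span = concentration_at(0.9, K0, ddg=ddg) / concentration_at(0.1, K0, ddg=ddg)
    print(f"ddg = {ddg} kcal/mol: c90/c10 = {span:.3g}")
```

Setting ddg = 0.0 removes the heterogeneity and recovers the factor of 81. With the placeholder penalty, the span covers several orders of magnitude, which corresponds to the broadened transition region described above. In the full model each K_i would instead come from the zipper partition function of the mutated probe sequence.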
To model the experimental results with the partition function approach, we treat the NN free energy of the mismatched base pair as a free parameter. A value of 1 kcal/mol (at T = 325 K) describes our experimental observations best (in particular the dominating positional influence relative to defect type-related influences), see Fig. 3. This value is also in good agreement with bulk solution parameters. Results of the numerical simulation (Fig. 3A) demonstrate that the shallower slope of the hybridization signal at the right duplex end corresponds to a series of weak NN pairs (as anticipated by Eq. 9). The partition function Z(x) largely determines the positional influence. Additionally, as shown in Fig. 3B, the defect type-related influence (the difference δΔg° between MM and PM free energies affects the statistical weight w_D of the completely dissociated duplex) is reflected in the hybridization affinity K(x) and in the hybridization signal θ(x). In addition to single base pair defects, our binding affinity model also reproduces our experimental results on the binding affinities of oligonucleotide duplexes with two single base deletion defects well (for details see Additional file 1, Fig. S5).
To test the generality of our findings, we examine whether PDNN models, which fit experimental data well, can be inferred from our model framework. We note that zippering has been previously proposed as the rationale behind the PDNN model in .
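For reference, the PDNN idea described in the introduction can be written as follows, in our own notation: l is the duplex length, ε(b_k b_{k+1}) the stacking free energy of the doublet at positions k and k+1, and ω_k an empirically fitted positional weight. Setting all ω_k = 1 and adding the initiation term recovers the position-independent two-state NN sum. The question is therefore whether the zipper model predicts weights ω_k that drop towards the duplex ends.

```latex
\Delta G_{\mathrm{PDNN}} = \sum_{k=1}^{l-1} \omega_k \, \varepsilon(b_k b_{k+1}),
\qquad
\Delta G_{\mathrm{TSNN}} = \Delta G_{\mathrm{init}} + \sum_{k=1}^{l-1} \varepsilon(b_k b_{k+1})
```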
In the following we investigate the contribution of each base pair to duplex stability and ask if there is a position-dependent contribution of Watson-Crick NN pairs in the same way as for defects.
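One way to connect the zipper to position-dependent weights, offered as our illustration and not necessarily the derivation used in the paper, is to linearize the effective free energy ΔG_eff = -RT ln Z. Differentiating with respect to the free energy g_k of stack k gives ∂ΔG_eff/∂g_k = P(stack k closed), so the closing probability of each stack acts as a local, PDNN-like weight. The sketch assumes a homogeneous 16-mer with all stacks at -1.4 kcal/mol and T = 325 K. Bound states are contiguous runs of closed stacks, and the initiation term is omitted because it cancels in these probabilities.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)


def closing_probabilities(g, T=325.0):
    """P(stack k closed | duplex bound) in a double-ended zipper.

    g[k] is the free energy (kcal/mol) of NN stack k; a bound state is any
    contiguous run a..b of closed stacks. Returns one probability per stack,
    which equals d(-RT ln Z)/d g[k].
    """
    RT = R * T
    m = len(g)
    Z = 0.0
    closed = [0.0] * m
    for a in range(m):
        G = 0.0
        for b in range(a, m):
            G += g[b]
            w = math.exp(-G / RT)      # statistical weight of state a..b
            Z += w
            for k in range(a, b + 1):  # stacks closed in this state
                closed[k] += w
    return [c / Z for c in closed]


weights = closing_probabilities([-1.4] * 15)  # hypothetical uniform 16-mer
print([round(w, 3) for w in weights])
```

The resulting weights are lowest at the two ends and approach 1 in the center. In other words, terminal Watson-Crick pairs contribute less to the effective binding free energy than central ones, which is the kind of end-to-center profile a PDNN fit has to capture.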
In this paper we studied, experimentally and theoretically, the stability of short (l < 26 bp) linear surface-bound oligonucleotide duplexes with single base defects. We demonstrated that the rationale behind position-dependent models of oligonucleotide duplex stability is the partial denaturation of the duplexes. We have shown that the strong influence of the defect position on mismatch discrimination [11–14, 16, 49] and the influence of the sequence context beyond nearest neighbors [14, 34] can be quantitatively inferred from a molecular zipper model. Partial (end-domain) denaturation of the duplex, as proposed by us in  as well as in [16, 24, 25], results in a positional influence that is entropic in nature. The zipping process is modulated by the sequential arrangement of the base pairs, and the model confirms the observed influence of the sequence context beyond the nearest neighbors. Further, the zipper model provides a theoretical foundation for the position dependent nearest-neighbor model of Zhang et al.
In the commonly employed two-state nearest-neighbor model, nucleic acid duplex hybridization/denaturation is considered to be an all-or-none process. According to the literature, end-fraying effects are indeed expected to be small beyond three bases; in the case studied here, however, we conclude that end fraying plays a non-negligible role. This is surprising, since the dissociation probability of individual base pairs decreases exponentially towards the center of the duplex (see Additional file 1, Fig. S4) and remains very low for most NN pairs.
We propose that the effect of the defect position on probe-target binding affinities becomes apparent in the hybridization signal intensities owing to the unavoidable probe polydispersity of optical synthesis. Indeed, the positional dependence of single base MM discrimination appears to be more commonly observed on photolithographically produced DNA oligonucleotide arrays [11–14] than (in large-scale studies) on spotted microarrays [40, 50, 51] or in solution-phase experiments. We note, however, that in small studies (investigating few sequences) a positional influence has been reported both in solution and on spotted microarrays. The probe polydispersity in our experiments smooths out the steep sigmoidal relation between hybridization intensity and binding free energy ΔG_D that is expected for defect-free probes. It thereby explains why variations of the binding free energies within a relatively broad range of ≈ 20 kcal/mol (for example the influence of the defect position) are reflected, via an approximately linear relation, in the hybridization signal intensities.
The authors thank Dr. Pramod Pullarkat for many helpful discussions and suggestions on this work. Our research was supported by the University of Bayreuth.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.