Opentarget sparse sensing of biological agents using DNA microarray
 Mojdeh Mohtashemi^{1, 2}Email author,
 David K Walburger^{1},
 Matthew W Peterson^{1},
 Felicia N Sutton^{1},
 Haley B Skaer^{1} and
 James C Diggans^{1}
DOI: 10.1186/1471210512314
© Mohtashemi et al; licensee BioMed Central Ltd. 2011
Received: 28 March 2011
Accepted: 29 July 2011
Published: 29 July 2011
Abstract
Background
Current biosensors are designed to target and react to specific nucleic acid sequences or structural epitopes. These 'targetspecific' platforms require creation of new physical capture reagents when new organisms are targeted. An 'opentarget' approach to DNA microarray biosensing is proposed and substantiated using laboratory generated data. The microarray consisted of 12,900 25 bp oligonucleotide capture probes derived from a statistical model trained on randomly selected genomic segments of pathogenic prokaryotic organisms. Opentarget detection of organisms was accomplished using a reference library of hybridization patterns for three test organisms whose DNA sequences were not included in the design of the microarray probes.
Results
A multivariate mathematical model based on the partial least squares regression (PLSR) was developed to detect the presence of three test organisms in mixed samples. When all 12,900 probes were used, the model correctly detected the signature of three test organisms in all mixed samples (mean(R^{ 2 } )) = 0.76, CI = 0.95), with a 6% false positive rate. A sampling algorithm was then developed to sparsely sample the probe space for a minimal number of probes required to capture the hybridization imprints of the test organisms. The PLSR detection model was capable of correctly identifying the presence of the three test organisms in all mixed samples using only 47 probes (mean(R^{ 2 } )) = 0.77, CI = 0.95) with nearly 100% specificity.
Conclusions
We conceived an 'opentarget' approach to biosensing, and hypothesized that a relatively small, nonspecifically designed, DNA microarray is capable of identifying the presence of multiple organisms in mixed samples. Coupled with a mathematical model applied to laboratory generated data, and sparse sampling of capture probes, the prototype microarray platform was able to capture the signature of each organism in all mixed samples with high sensitivity and specificity. It was demonstrated that this new approach to biosensing closely follows the principles of sparse sensing.
Background
To date, most biosensors can be considered to be 'targetspecific' systems in that their detection elements are built to respond to a fixed number of organisms, and are designed to be nonresponsive in the absence of those organisms. In fielded sensors, PCRbased technologies are often selected for their specificity and low perassay cost. While this targeted approach is very effective in an environment where specific biological events are expected, a biosensing infrastructure capable of rapidly responding to new or engineered biological threats while maintaining a low cost of operation requires increased flexibility. Targeted platforms, like those using specific PCR primers for qualitative or quantitative amplification for detection, require creation of new physical capture reagents when new organisms are targeted [1]. These platforms are also often limited in the total possible number of parallel assays run at any one time as multiplexing tens or hundreds of PCR reactions greatly increases assay complexity. To mitigate the limitations of such approaches, there have been previous efforts to design highdensity microarrays that are representative of groups or families of organisms and while these sensors would likely still offer information for novel threats, assured classification at the species or strain level would be impossible without reengineering and redeployment of sensing devices [2–4].
Microarraybased detection and identification approaches often consist of a series of probes designed with particular target genomes in mind; if a probe hybridizes, the analyst can be reasonably sure the organism or target represented by that probe is present in the original sample. In some cases, multiple probes can be used to create 'fingerprints' representative of particular organisms, but this requires a great deal of upfront probe design effort [5] such as assuring specificity of probe sequence and lack of crosshybridization. This approach has been used previously to detect viruses [2, 3, 6, 7]; in one example by designing 70mer probes unique to each of more than 100 viral species [2]. Microarrays with species or strainspecific probes have also been designed to differentiate between strains of Staphylococcus aureus by generating lists of thermodynamicallyfavorable probes from regions of sequence unique to particular strains [8–10]. Additional efforts have also constructed systems for the design of probes specific at the level of individual gene families [10], recognizing that some of these families will be specific for related pathogens.
While these approaches achieve an increase in robustness by using multiple, parallel measurements for each target organism, they still rely upon a priori knowledge of agent sequence. They are also limited in the scope of intended detection capability to only those organisms for which the individual arrays have been explicitly designed. However, the constraints placed on probes generated to match unique sequence regions in a family of organisms, by definition limit the capacity for these probes to hybridize to distinct novel or engineered organisms. An opentarget design would provide data regardless of whether a particular biological event was expected, thus allowing new microorganisms to be recognized, characterized and managed in short order.
One presumed drawback in the design of an opentarget biosensor, however, is that the greater the number of biological species to be detected, the larger the array size required. Thus, to detect the presence of even a few microorganisms, conventional wisdom dictates that the microarray would have to be very large to capture distinct genomic patterns with high degree of specificity, an endeavour that is not cost effective in environmental monitoring.
It has recently been suggested that many natural phenomena are sparse in that they can be represented in a compressed format using the proper basis [11–16]. Sparsity denotes that, to recover a signal of interest, the number of degrees of freedom needed to approximate the signal may, in principle, be much smaller than the length of the signal. This is the foundation for the new theory of sparse, or compressive sensing (CS) [13–15]. The main principle of CS is that for a signal x of length N, if x is Ksparse in some basis (K << N), which implies that it has K nonzero entries and NK zero elements, then M linear measurements of x suffice to recover the signal, M < N and M =O(K log(N/K)). Let y be the vector of M measurements of x. Then in matrix notation we have y = Φx. The key challenge in this framework lies in the design of a M × N sensing matrix Φ, which together with y and the sparsity condition imposed on x, would be capable of accurate recovery or detection of x. For CS to apply, in addition to the constraint that x must be sparse, the sensing matrix must satisfy the restricted isometry property (RIP) [15] which implies that the rows of Φ should be incoherent with respect to the signal sparsity basis.
Recently, Dai et al. have proposed that DNA microarrays can be designed using the notion of CS [17]. They used the NCBI Clusters of Orthologous Groups (COG) database, which contains orthologous sets of proteins from 66 organisms corresponding to conserved protein domains. Challenges of this approach include how to translate protein back to less conserved DNA sequences and species which lack certain clustered proteins. Species which DNA encode these proteins differently than the array probe sequences would also not be detected.
In this paper, we put forward the notion that an opentarget design is a viable approach to biosensing based on the principle of sparsity. Using laboratorygenerated data, we provide strong evidence that: First, the underlying genomic imprints of multiple biological organisms can be captured succinctly using a small codebook, or collection of microarray probes, not specifically designed to respond to the target organisms. And second, our design approach follows closely the principles of sparse sensing, and thus CS is an applicable and sensible notion for biological sensing.
Methods
Microarray Probe Design
Pathogenic Sequences
Species  Pathogenicity  GenBank ID 

Bacillus anthracis (Ames strain)  Anthrax  NC_003997 
Yersinia pestis (CO92)  Bubonic plague  NC_003143 
Francisella tularensis (Schu 4)  Tularemia  NC_006570 
Brucella suis  Brucellosis  NC_004310 
Burkholderia mallei  Glanders  NC_006348 
Burkholderia pseudomallei  Melioidosis  NC_006350 
Escherichia coli O157 H7 str. Sakai  Hemolytic uremic syndrome  NC_002695 
To investigate the impact of sequence sampling lengths and strategies on the final probe design, VLMCs for three different training sets were used to generate probes by sampling:

500 bp from each of 7 genomic sequences, resulting in a total of 3,500 long input sequence

5000 bp from each of 7 genome sequences, resulting in a total of 35,000long input sequence

12,000 bp from each of 3 of the 7 sequences (identified in bold in Table 1), resulting in 36,000long input sequence
Samples were taken randomly from each genome without regard for higherorder genomic structure (e.g., coding sequence). For each training set, samples from individual genomes were concatenated endtoend to produce single DNA sequences to train a VLMC model.
Training a VLMC model was performed using the context algorithm [18, 22], based on a previously developed data compression technique [23], which requires a single parameter, K. A larger value for K results in more pruning of a VLMCderived tree, which leads to a less complex tree, and thus a model of smaller dimension. To determine an optimal value for K, we applied an approach similar to that of Mächler [18]. In brief, initial values of K (0, 0.5, 1.0, 2.0, 5.0, 10.0, and 15.0), termed K_{0}, were used to create multiple VLMC models. For each K_{0}, pruned VLMC models were used to emit n+1 base pairs. The first 10,000 base pairs were discarded to allow the simulation model to stabilize. Subsequent VLMC models were created for values of K from 1 to 20 in increments of 0.1 and used to predict the (n+1)^{th} base pair from the initial VLMC output. This process was iterated 1,000 times for each value of K_{0}, and the number of correct predictions was recorded.
Microarray Hybridization
To hybridize against the VLMCderived probe set and generate data, the purified genomic DNA from 3 different simulant strains: Bacillus cereus (BC), Bacillus subtilis (BS) (as withingenera standins for B. anthracis), and Pantoea agglomerans (PA) (as a gramnegative standin for Yersinia pestis), was fragmented and amplified using a Sigma GenomePlex^{®} Whole Genome Amplification (WGA) kit. 10 ng of purified genomic DNA was randomly fragmented using the WGA kit to yield fragment lengths of 75  1500 base pairs with an average fragment length of 400 base pairs. Fragmented DNA was then flanked by universal priming sites and amplified through 14 rounds of PCR. Amplified DNA was precipitated using 1/10 volume of 3 M sodium acetate (pH 5.2) and 2 volumes of 100% pure ethanol at 80°C for 2 hours. DNA was fluorescently labeled by reacting with the N7 of guanine using the with ULYSIS Alexa Fluor^{®} 546 Nucleic Acid Labeling kit (Invitrogen). Excess dye was removed with an Agilent Genomic DNA Purification Module spin kit. Samples were then concentrated to 250 ng of DNA in 7μl. Labeled DNA was prepared for hybridization with 4.5μl Agilent 10 × GE Blocking Agent and 22.5μl Agilent 2 x CGH Hybridization buffer using an Agilent Oligo aCGH Hybridization kit. Samples were denatured at 95°C for 3 minutes followed by 30 minutes at 37°C. 11μl of KreaBlock was added to each sample to reduce background fluorescence. 40μl of prepared sample was then loaded onto Agilent 8 × 15 K Custom Arrays which were hybridized for 16 hours at 42°C. Arrays were washed (Agilent Oligo Wash Buffer Kit) for 5 minutes and then scanned on a Molecular Devices GenePix 4100 A scanner. Feature extraction was performed using Agilent's Feature Extraction software v9.5.3.1 and samples underwent quantile normalization via the Bioconductor limma package [29] in R.
Experimental Design
Genomic DNA  # Arrays  gDNA 

B. subtilis  10  250 ng 
B. cereus  10  250 ng 
P. agglomerans  10  250 ng 
B. subtilis/B. cereus  2  125 ng/species 
B. cereus/P. agglomerans  2  125 ng/species 
B. cereus/B. subtilis/P. agglomerans  2  125 ng/species 
Oligo spikeins  2  2.5 ng and 25 ng 
Detection Model
A multivariate mathematical model based on partial least squares regression (PLSR) was developed to capture the signature of each simulant organism. Briefly, given a number of predictors, or independent variables, PLSR iteratively finds the best fit for one or more response, or dependent variables by maximizing the correlation between the two variables [30, 31]. PLSR seeks to maximize correlation between the response and predictor variables while capturing and explaining most of the variation within the covariate space by constructing new predictor variables, or latent variables, as linear combinations of the original predictor variables.
In this study, the covariate matrix, X = (x_{1},...,x_{ m }), is a (n × m) matrix of n = 12,900 observations and, m = 4 predictor variables. Each variable, x_{ j }, for j ∈ {1,2,3}, represents the vector of hybridization values, x_{ ij }, i = 1,...,n, averaged over 10 replicate arrays for the j^{th} simulant species (see Table 2), and x_{4} represents that of the oligos averaged over two arrays (see Table 2). The response matrix, Y = (y_{1},...,y_{ s }), is a (n × s) matrix of s = 8 dependent variables representing 4 possible combinations of the three simulant organisms, with two replicate arrays for each combination, hybridized against the probe set. Both the predictor and response matrices were then standardized (meancentered and scaled) before analysis was performed.
where k is the number of sample observations per probe, T_{ i }is the vector of k sample observation scores in row i, for i = 1,...,k, μ_{ i } is the mean value of k observation scores in row i, and S^{1} is the inverse of the sample covariance matrix. All scripts were written in Matlab 7.6.0 (R2008a).
Results
Signal Detection
The first three latent variables from the PLSR model, h = 3, achieved maximum correlation with the response variables while together they captured most of the variation in the predictor matrix (>86%) and response matrix (>74%). Thus, the PLSR model was calibrated using the first three components to build a predictive model of the response matrix.
Sparse Sampling and Signal Detection
 1.
For each mixed sample
 a.
Probes were sorted in decreasing order of their T^{ 2 } values.
 b.
Probes with high T^{ 2 } values were selected for further investigation, if their value was greater than μ _{ T } ^{2}+ cσ _{ T } ^{2}, where μ _{ T } ^{2}and σ _{ T } ^{2}are the respective mean and standard deviation of the sample T ^{2} values, and c is a scalar.
 2.
Probes with high T ^{2} values shared by four out of eight samples (or two out of four combination groups) from step 1.2 were selected as the final set for PLSR analysis.
Sparse Sensing
In this section we demonstrate that, in retrospect, the sparse sampling algorithm, developed in the previous section, closely follows the principles of compressive sensing when the matrix of intensity values is properly mapped to generate a sensing matrix. Recall the main condition of CSthat for a signal to be compressively sensed, it must be sufficiently sparse (Ksparse). Here, the target vector, x, has only three nonzero elements, namely the concentrations of the three simulant organisms in captured samples and the remaining N3 entries are zero. Because in principle, the number of potential organisms in a location at a point in time, N, is very large, x is considerably sparse (K = 3). The vector of M measurements, y, consists of 12,900 intensity values for each mixed sample. The key challenge in the application of sparse sensing is in the design of the sensing matrix that satisfies the RIP and results in accurate recovery of x using the matrix notation y = Φx. It has been shown that sparse binary random matrices satisfy the RIP [17]. Here, we show how the results of our sparse sampling algorithm can be mapped to a sparse binary random sensing matrix that together with the hybridization measurements uniquely identifies the presence of each simulant organism in the mixed samples.
Discussion
It is well understood that in spite of vast amount of shared sequences among biological organisms, most comprise unique sets of oligomers based on which they can be differentiated at various biological scales. This critical finding has enhanced the ability to design microarraybased biosensors capable of detecting multiple biological agents whose signatures are included in the array. As more viral and bacterial species are sequenced and their DNA signatures are retrieved, microarray scalability presents a challenge to the design of targetspecific biosensors. At the same time, such a targeted approach to biosensing is illequipped when a biological threat is due to the presence of an agent whose signature is not considered in the microarray design either because it was outside the realm of expectation (e.g., previously eradicated but reemerging pathogens) or is unknown (e.g., newly emerging strains or an engineered pathogenic sequences). An open system approach to biosensing is a new concept. If properly designed, an open system biosensor can address the aforementioned challenges from which conventional biosensors suffer.
The equivalence of our sparse sampling algorithm and compressive sensing in the context of opentarget sensing has important implications for biosensing. First, that the genomic imprints of biological organisms can be represented in a compressed format, and thus a relatively small DNA microarray can be used to decode the signature of multiple organisms in mixed, and potentially complex environmental samples. Second, that the sparsity condition likely applies to environmental sampling and detection of biological events, and thus the cost and size of the array can be kept in check. And third, that the previously unencountered microorganisms can be detected if they are present in the environment at sufficient concentrations, even though their unique DNA sequences are not explicitly accounted for in the array design.
Two potential limitations of this study must be addressed for future consideration. First, despite relatively extensive laboratory experimentations performed for this study, the number of biological organisms tested and selected to generate mixed samples is small. To demonstrate the utility, efficiency, and robustness of an open system approach to biosensing, a greater spectrum of biological agents must be tested and their hybridization patterns evaluated against the microarray probes.
Second, with respect to the probe design a set of evaluations were performed to select the final design of the probe set, where the specificity of the randomly generated and VLMCderived probes were compared by aligning each set of 12,900 25mer probes against a panel of twelve Grampositive and negative prokaryotic organisms (Figure 2). While the specificity of all three VLMCderived probe sets was substantially higher than that of random probe sequences, the average performance of the three VLMCderived sets of probes is relatively the same across all organisms. It is important to note, however, that we only generated one set of probes for each sampling strategy. In principle, the average outcome of multiple runs of simulations is required to arrive at statistically significant results. We selected the first sampling strategy, a random sampling of 500 bp from each of the seven pathogenic sequences, for designing the final probe set based on its slightly higher prediction accuracy than those of the two probe sets generated using the competing sampling strategies. A more comprehensive examination of these and other sampling strategies are needed to determine which strategy, or set of strategies, leads to the best probe sequences design for differentiating between the DNA signatures of multiple organisms.
Conclusions
In this paper, we hypothesized and demonstrated that a relatively small nonspecifically designed DNA microarray was capable of identifying the presence of three test organisms in mixed DNA samples with high sensitivity and specificity without specifically targeting these organisms. Coupled with a multivariate detection model and sparse sampling of the microarray probes our prototype opentarget biosensor was demonstrated to follow the design principles of CS.
Three observations are worthy of note here, and should also be considered in future work. First, sparse sampling of 12,900 probes, based on a twolayer filtering, led to the selection of the smallest set consisting of 47 probes capable of accurate identification of three simulant organisms in the mixed samples. This resulted in a considerable reduction in the array size, based on which a sparse, binary, random sensing matrix was designed. However, our goal was not to derive the minimum number of probes required to differentiate across three test organisms in mixed DNA samples, but to demonstrate the feasibility of designing a small DNA microarray for 'opentarget' sensing of multiple organisms and applicability of sparse sampling to biosensing. It remains uncertain whether a mathematical function can be formulated that describes the relationship between the number of organisms to be sensed and the size of an 'opentarget' microarray.
Second, qualitative examination of the relationship between the size of the array and its detection specificity uncovers an important difference between 'opentarget' and 'targetspecific' microarraybased sensing platforms. In 'targetspecific' sensing, as the size of the microarray is increased to include molecular signatures of newly sequenced organisms, the falsepositive rate is expected to decrease, or equivalently the specificity is expected to increase. In 'opentarget sparse sensing', the falsepositive rate approached zero, or equivalently the specificity reached 100%, as the size of the array was substantially reduced by pruning the less informative probes. This observed dichotomy between 'opentarget' and 'targetspecific' sensing with respect to the relationship between the array size and detection specificity, while promising, will have to be further validated in future studies.
Third, the distribution of the intensity values of the final set of 47 selected probes is qualitatively different than that of the average of 500 runs of 47 randomly selected probes (see Figure 9). The sparse sampling algorithm was applied to 12,900 probes without any constraint imposed on probe selection except that a selected probe would have a high T^{2} value. Indeed, the application of sparse sampling algorithm resulted in the selection of high T^{2} probes which captured the difference in the hybridization patterns of BC and BS, and greatly reduced the false positive rate previously observed (compare Figures 4 and 7). This finding should be more closely examined by testing more organisms and by the sequence alignment of each selected probe against the genomic sequence of each organism.
To our knowledge, this is the first study to introduce an 'opentarget' approach to DNA microarray based biosensing, and demonstrate a proof of concept through three elements of probe design, laboratory data generation, and mathematical modelling. Future directions of this work include improvement to the probe design as guided by the analysis and experiments, expansion of the reference library to encompass additional test organisms, and environmental testing by external air sampling to provide a more realistic and complex environmental background.
Declarations
Acknowledgements
This project was fully funded by The MITRE Corporation.
Authors’ Affiliations
References
 Sabelnikov A: Probability of realtime detection versus probability of infection for aerosolized biowarfare agents: A model study. Biosens Bioelectron 2006, 21: 2070–77. 10.1016/j.bios.2005.10.011View ArticlePubMedGoogle Scholar
 Wang D, Coscoy L, Zylberberg M, Avila P, Boushey H, Ganem D, DeRisi J: Microarraybased detection and genotyping of viral pathogens. PNAS 2002, 99: 15687–92. 10.1073/pnas.242579699PubMed CentralView ArticlePubMedGoogle Scholar
 Lim DV, Simpson JM, Kearns EA, Kramer MF: Current and Developing Technologies for Monitoring Agents of Bioterrorism and Biowarfare. Clin Microbiol Rev 2005, 18: 583–607. 10.1128/CMR.18.4.583607.2005PubMed CentralView ArticlePubMedGoogle Scholar
 Schulze A, Downward J: Navigating gene expression using microarrays  a technology review. Nat Cell Biol 2001, 3: E190E195. 10.1038/35087138View ArticlePubMedGoogle Scholar
 Satya RV, Zavaljevski N, Kumar K, Reifman J: A highthroughput pipeline for designing microarraybased pathogen diagnostic assays. BMC Bioinformatics 2008., 9:Google Scholar
 Wang D, Urisman A, Liu Y, Springer M, Ksiazek TG, Erdman DD, Mardis ER, Hickenbotham M, Magrini V, Eldred J, Latreille JP, Wilson RK, Ganem D, Derisi JL: Viral Discovery and Sequence Recovery Using DNA Microarrays. PLoS Biology 2003, 1: 257–260.View ArticleGoogle Scholar
 Urisman A, Fischer KF, Chiu CY, Kistler AL, Beck S, Wang D, Derisi JL: EPredict: a computational strategy for species identification based on observed DNA microarray hybridization patterns. Genome Biology 2005., 6:Google Scholar
 Charbonnier Y, Gettler B, François P, Bento M, Renzoni A, Vaudaux P, Schlegel W, Schrenzel J: A generic approach for the design of wholegenome oligoarrays, validated for genomotyping, deletion mapping and gene expression analysis on Staphylococcus aureus. BMC Genomics 2005., 6:Google Scholar
 Kim B, Park J, Gu M: Implementation of random bacterial genomic DNA microarray chip (RBGDMC) for screening of dominant bacteria in complex cultures. Appl Biochem Biotechnol 2010, 8: 2284–93.View ArticleGoogle Scholar
 Feng S, Tillier ER: A fast and flexible approach to oligonucleotide probe design for genomes and gene families. Bioinformatics 2007, 23: 1195–1202. 10.1093/bioinformatics/btm114View ArticlePubMedGoogle Scholar
 Candès E: Stable signal recovery from incomplete and inaccurate measurements. Comm Pure Appl Math 2006, 59: 1208–1223.View ArticleGoogle Scholar
 Candès E, Robust : Uncertainty principles: exact signal reconstruction. IEEE Trans Info Theory 2006, 52: 489–509.View ArticleGoogle Scholar
 Donoho D: Compressed Sensing. IEEE Trans Info Theory 2006, 52: 1289–1306.View ArticleGoogle Scholar
 Duarte M: Singlepixel imaging via compressive sampling. IEEE Signal Processing Magazine 2008.Google Scholar
 Candès E, Wakin M: Introduction to compressive sampling. IEEE Signal Processing Magazine 2008.Google Scholar
 Gilbert A, Indyk P: Sparse recovery using sparse matrices. Proceedings of the IEEE: June 2010 2010, 98: 15687–92.View ArticleGoogle Scholar
 Dai W, Sheikh MA, Milenkovic O, Baraniuk RG: Compressive sensing DNA microarrays. EURASIP Journal on Bioinformatics & Systems Biology 2009, 1–12.Google Scholar
 Mächler M, Bühlmann P: Variable Length Markov Chains: Methodology, Computing, and Software. J Comput Graph Statist 2004, 13: 435–455. 10.1198/1061860043524View ArticleGoogle Scholar
 CRAN: The Comprehensive R Archive Network[http://cran.Rproject.org]
 Rimour S, Hill D, Militon C, Peyret P: GoArrays: highly dynamic and efficient microarray probe design. Bioinformatics 2005, 21: 1094–1103. 10.1093/bioinformatics/bti112View ArticlePubMedGoogle Scholar
 Benson D, KarschMizrachi I, Lipman D, Sayers E: GenBank. Nucleic Acids Research 2009., 37:Google Scholar
 Buhlmann P, Wyner A: Variable Length Markov Chains. Annals of Statistics 1999, 27: 480–513. 10.1214/aos/1018031204View ArticleGoogle Scholar
 Rissanen J: A Universal Data Compression System. IEEE Trans Info Theory 1983, 29: 656–664. 10.1109/TIT.1983.1056741View ArticleGoogle Scholar
 Markham N, Zucker M: UNAFold: software for nucleic acid folding and hybridization. Bioinformations 2008, 2: 3–31.View ArticleGoogle Scholar
 Rozen S, Skaletsky H: Primer3. Methods Mol Biol 1998, 132: 365–386.Google Scholar
 Sen D, Gilbert W: Formation of parallel fourstranded complexes by guaninerich motifs in DNA and its implications for meiosis. Nature 1988, 334: 364–366. 10.1038/334364a0View ArticlePubMedGoogle Scholar
 Guschlbauer W, Chantot J, Thiele D: Fourstranded nucleic acid structures 25 years later: from guanosine gels to telomere DNA. J Biomol Struct Dyn 1990, 8: 491–511.View ArticlePubMedGoogle Scholar
 Hoffman S, Otto C, Kurtz S, Sharma C, Khaitovich P, Vogel J, Stadler P, Hackermüller J: Fast mapping of short sequences with mismatches, insertions and deletions using index structures. PLoS Computational Biology 2009., 5:Google Scholar
 Smythe G: Limma: linear models for microarray data. In Bioinformatics and Computational Biology Solutions using R and Bioconductor. New York: Springer; 2005.Google Scholar
 De Jong S: SIMPLS: An Alternative Approach to Partial Least Squares Regression. Chemom Intell Lab Syst 1993, 18: 251–263. 10.1016/01697439(93)85002XView ArticleGoogle Scholar
 Jobson J: Applied Multivariate Analysis Volume II: Categorical and Multivariate Methods. New York: SpringerVerlag; 1992.View ArticleGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.