Bacterial genotyping by 16S rRNA mass cataloging
© Jackson et al; licensee BioMed Central Ltd. 2006
Received: 13 March 2006
Accepted: 23 June 2006
Published: 23 June 2006
It has recently been demonstrated that organism identifications can be recovered from mass spectra using various methods, including base-specific fragmentation of nucleic acids. Because mass spectrometry is extremely rapid and widely available, such techniques offer significant advantages in some applications. A key element in favor of mass spectrometric analysis of RNA fragmentation patterns is that a reference database for analysis of the results can be generated from sequence information. In contrast to hybridization approaches, the genetic affinity of any unknown isolate can in principle be determined within the context of all previously sequenced 16S rRNAs without prior knowledge of what the organism is. Unlike the original RNase T1 cataloging method, however, analysis of digestion products by mass spectrometry cannot distinguish products with the same base composition. Hence, it is possible that organisms that are not closely related (having different underlying sequences) might be falsely identified by mass spectral coincidence. We present a convenient spectral coincidence function for expressing the degree of similarity (or distance) between any two mass spectra. Trees constructed using this function are consistent with those produced by direct comparison of primary sequences, demonstrating that the inherent degeneracy in mass spectrometric analysis of RNA fragments does not preclude correct organism identification.
Neighbor-joining trees for important bacterial pathogens were generated using distances based on mass spectrometric observables and the spectral coincidence function. These trees demonstrate that most pathogens will be readily distinguished using mass spectrometric analyses of RNA digestion products. A more detailed, genus-level analysis of pathogens and near relatives was also performed, and it was found that assignments of genetic affinity were consistent with those obtained by direct sequence comparisons. Finally, typical values of the coincidence between organisms were also examined with regard to phylogenetic level and sequence variability.
Cluster analysis based on comparison of mass spectrometric observables using the spectral coincidence function is an extremely useful tool for determining the genetic affinity of an unknown bacterium. Additionally, fragmentation patterns can determine within hours if an unknown isolate is potentially a known pathogen among thousands of possible organisms, and if so, which one.
Determinative bacteriology often relies on culture-based methods involving time-consuming isolation, cultivation, and characterization of phenotypic traits. While in a few cases a rapid identification can be made using phenotypic methods, the phylogenetic resolution of such methods is usually quite low. Characterization of cells based on morphology, staining, and metabolic traits is often not discriminatory and can take days to weeks for unambiguous identification [1–3]. Perhaps most importantly, many pathogens are fastidious or even uncultivable under laboratory conditions, so that culture-based methods are not applicable. Finally, such methods are labor-intensive, not amenable to automation, and require extensive "hands-on" time and interpretation by the trained microbiologist.
In the "post-genome" era, molecular methods are rapidly supplanting phenotypic characterization. Although a variety of nucleic acid based approaches are in use, most current bacterial diagnostic research is focused on comparative sequencing of PCR-amplified genes, in situ hybridization with labeled probes or molecular beacons, and phylogenetic microarrays [4–11]. Methods that rely on hybridization are effective at leveraging genomic information, but they typically face the significant drawback of requiring construction of one or more probes based on a priori knowledge of the genus or species that needs to be detected. Complete or partial genomic sequencing requires no such preliminary knowledge, but even the fastest sequencing separations, which require minutes or hours, are time-consuming compared to mass spectrometry [12, 13]. In contrast to sequencing by capillary electrophoresis, which requires a labeling step, the in vitro transcription and fragmentation reactions described below can be analyzed directly by rapid mass spectrometry, so that the greatest gains in overall efficiency are obtained when processing multiple samples.
Mass spectrometry for sequencing and compositional characterization
A number of methods for characterization of nucleic acids using mass spectrometric analysis have been proposed. Among these, MALDI-TOF (matrix-assisted laser desorption ionization time-of-flight) mass spectrometry is the method of choice for measuring the mass of oligonucleotides, especially mixtures thereof. Employing this idea, MALDI-TOF has been used for chain-termination sequencing [14–18]. However, the maximum read length so far demonstrated using such an approach is ~56 nucleotides. Very high resolution measurement of PCR product composition (~100 bp) using electrospray ionization Fourier transform ion-cyclotron resonance (ESI-FTICR) for microbial identification has also been proposed [19, 20]. Unfortunately, the resolution required for unambiguous compositional assignment (±1 ppm) of such large molecules requires instrumentation out of reach for many laboratories. Given these limitations, it is advantageous to introduce a fragmentation step, which reduces the resolution requirements while retaining valuable information. In this regard, analysis of 16S rRNA fragmentation patterns produced by RNase T1 was used with great success to determine phylogenetic relationships between bacteria (prior to the onset of modern sequencing technologies) [21–25]. Early studies suggest that despite the information loss associated with compositional rather than sequential analysis of such fragments, microbial identification based upon base-specific fragmentation patterns appears extremely promising [26–31].
In this paper, we compare genetic affinities deduced from observable mass fragment spectra and phylogenies based on complete 16S rRNA sequences of the same organisms using neighbor-joining to construct distance-based trees. This approach allows us to examine the extent to which unrelated organisms might be incorrectly identified by mere coincidence of mass spectral patterns and to determine the degree (i.e. family-, genus-, or species-level) to which the genetic affinity of various organisms may be resolved. Using a straightforward spectral comparison metric, we derive distance matrices. Neighbor-joining trees are then constructed to obtain insight into the limit of resolution of the method when a single cleavage reaction is used. These in silico analyses were based on an experimentally reasonable protocol: First, universal primers (appended with 5'-RNA polymerase promoters) are used to amplify a homologous sequence region of the 16S rDNA from the organism(s) in a sample. Next, the DNA amplicons are transcribed to RNA and subjected to base-specific fragmentation. The mass of the resulting products is then determined by MALDI-TOF mass spectrometry. Finally, measured masses are cataloged and compared to mass databases derived from rRNA sequence databases to determine the genetic affinity of the sample organism.
Comparison of mass spectra
In order to quantitatively compare mass spectral "fingerprints" produced by base-specific fragmentation, we formulated the scalar or inner product defined by equation 1. We define a scalar product (often referred to as a 'dot product') of two mass spectra as

\[ \langle S_1 | S_2 \rangle = \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} \delta(m_i, m'_j) \tag{1} \]

where $m_i$ are the masses of each of the $N_1$ individual fragments in the spectrum for species 1 and $m'_j$ are the masses of each of the $N_2$ fragments for species 2, and $\delta$ is the discrete (Kronecker) delta function defined as

\[ \delta(m_i, m'_j) = \begin{cases} 1, & m_i = m'_j \\ 0, & m_i \neq m'_j \end{cases} \tag{2} \]

It can be easily verified that the following commutative, distributive, and positive-definiteness conditions for an inner product are satisfied:

\[ \langle S_1 | S_2 \rangle = \langle S_2 | S_1 \rangle, \qquad \langle S_1 | S_2 + S_3 \rangle = \langle S_1 | S_2 \rangle + \langle S_1 | S_3 \rangle, \qquad \langle S_1 | S_1 \rangle \geq 0 \tag{3} \]

Using this inner product, we then define the following metric or "coincidence function":

\[ c(S_1, S_2) = \frac{\langle S_1 | S_2 \rangle}{\sqrt{\langle S_1 | S_1 \rangle \, \langle S_2 | S_2 \rangle}} \tag{4} \]

This function provides a normalized (i.e. between 0 and 1) representation of the extent to which two spectra are similar. Using this metric, a coincidence (or similarity) matrix, $C$ with elements $c_{ij}$, can be generated to tabulate the degree of similarity between the fragment catalogs of every pair of organisms. Likewise, a matrix of distances, $D$ with elements $d_{ij} = 1 - c_{ij}$, can be created and used as input to conventional cluster analysis algorithms.
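The coincidence and distance matrices described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the coincidence is the inner product normalized by the geometric mean of the two self-products (one natural way to obtain a value between 0 and 1), it uses exact mass equality for the Kronecker delta, and the function names and example masses are hypothetical.

```python
from math import sqrt

def inner_product(s1, s2):
    """Count the pairs (m, m') of equal masses, one from each spectrum."""
    return sum(1 for m in s1 for mp in s2 if m == mp)

def coincidence(s1, s2):
    """Inner product normalized to [0, 1] (assumed normalization)."""
    norm = sqrt(inner_product(s1, s1) * inner_product(s2, s2))
    return inner_product(s1, s2) / norm if norm else 0.0

def distance_matrix(catalogs):
    """Square matrix with elements d_ij = 1 - c_ij for a list of mass catalogs."""
    n = len(catalogs)
    return [[1.0 - coincidence(catalogs[i], catalogs[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical mass catalogs (Da) for three organisms.
catalog_a = [1000.0, 2000.0, 3000.0]
catalog_b = [1000.0, 2000.0, 4000.0]
catalog_c = [5000.0]
distances = distance_matrix([catalog_a, catalog_b, catalog_c])
```

With these catalogs, two of three masses are shared between the first two organisms, so their coincidence is 2/3, while the third organism shares no masses with either and lies at the maximum distance of 1.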
Universal primers directed at bacterial 16S ribosomal RNA (rRNA)
Since the early demonstration that phylogenetic information is obtainable from catalogs of RNase T1 digests of 16S rRNA [21–25], comparison of 16S rRNA (or rDNA) sequences has become the most widely used single molecular method to characterize the phylogenetic position or genetic affinity of Bacteria and Archaea. Over 180,000 accessions containing partial or full rDNA sequences are available from the Ribosomal Database Project (as of RDP release 9.32, 72,540 sequences are of length 1200 nt or more).
Weisburg and Lane have previously reported "universal" primers yielding PCR products from various 16S rDNA (or rRNA) sequence regions for a large fraction of all known bacteria. When present, these primers yield almost full coverage of the approximately 1,500 bp 16S sequence, and have thus been used routinely for phylogenetic study. In addition to primer sequences shared by many organisms, primer pairs yielding amplicons of ~500 bp or less are also of interest because the mass spectra acquired from such shorter regions will have minimal complexity. For example, transcription and subsequent RNase T1 cleavage of a 400 bp amplicon will typically yield only ~22–25 distinct RNA fragment masses.
Organisms of greatest interest
The National Institute of Allergy and Infectious Diseases (NIAID) has prioritized research on a number of organisms categorized as "Category A, B, or C" pathogens for the purposes of epidemic prevention and for biodefense [35–37]. To examine to what extent mass spectrometric methods might resolve these organisms from one another (using the primer sets above) all the 16S rDNA sequences of every bacterial strain listed in NIAID Categories A, B, or C were downloaded. In some cases an entire genus such as Salmonella, for example, is specified. Based on initial experimental results, we expected to obtain at least genus-level resolution and, in many cases, species-level resolution of bacteria using masses cataloged from just a single cleavage reaction. Of course, since pathogenicity can arise from minor genomic changes (presence of plasmid, a single gene, changes in regulation, etc.) even complete 16S rRNA sequencing cannot always distinguish pathogenic strains from non-pathogenic strains. For example, pathogenic strains of E. coli have the same rRNA sequence as non-pathogenic strains. Thus, the best possible outcome in these cases is to assign an unknown organism to a genus or species whose members include pathogens.
With large databases of predicted mass-fragment catalogs in hand (typically ~20–25 masses from a given "universally amplifiable" 16S rDNA sequence subregion), we calculated thousands of mass-spectral coincidences between all catalogs using equation 4. We investigated the average value of the coincidence function at different phylogenetic levels for a number of organism lineages and compared coincidence values to the average entropy of several multi-sequence alignments. Finally, we used coincidence values to generate distance matrices and these distances were placed in a format acceptable for input into the freely-available program MEGA 3.1 for generating neighbor-joining trees.
Occurrence of the adjacent Weisburg and Lane primer pairs
Table 1. Coverage and amplicon characteristics for the adjacent Weisburg and Lane universal primer sets in 47,257 bacterial sequences.

| Forward primer | Reverse primer | Occurrence of primer pair in all bacterial sequences | Occurrence of primer pair in enteric sequences | Average amplicon length ± std. dev. (all bacteria) | Average amplicon length ± std. dev. (enterics only) |
| --- | --- | --- | --- | --- | --- |
| "Weisburg FWD" | "Lane A" | 9337/47257 = 19.76% | 333/1655 = 20.12% | 518.78 ± 37.94 bp | 530.42 ± 3.38 bp |
| "Lane A" | "Lane B" | 38310/47257 = 81.07% | 1553/1655 = 93.84% | 408.83 ± 7.35 bp | 409.17 ± 2.40 bp |
| "Lane B" | "Lane C" | 36058/47257 = 76.30% | 1401/1655 = 84.65% | 500.83 ± 13.94 bp | 501.26 ± 1.06 bp |
| "Lane C" | "Weisburg REV" | 9991/47257 = 21.14% | 373/1655 = 22.54% | 123.62 ± 57.24 bp | 122.24 ± 8.20 bp |

Primer sequence variants searched for each primer:
"Weisburg FWD": AGAGTTTGATCCTGGCTCAG, AGAGTTTGATCATGGCTCAG, AGAGTTTGATCCTGGCTTAG, AGAATTTGATCTTGGTTCAG
"Lane A": CAGCAGCCGCGGTAATAC, CAGCAGCCGCGGTAATTC, CAGCCGCCGCGGTAATAC, CAGCCGCCGCGGTAATTC
"Lane B": AAACTCAAAGGAATTGACGG, AAACTCAAATGAATTGACGG, AAACTTAAAGGAATTGACGG, AAACTTAAATGAATTGACGG
"Lane C": GTACACACCGCCCGT, GCACACACCGCCCGT
"Weisburg REV": GGTTGGATCACCTCCTTA, AAGTCGTAACAAGGTAACCGT, AAGTCGTAACAAGGTAGCCGT, AAGTCGTAACAAGGTATCCGT
Spectral coincidence vs. phylogenetic level and positional entropy
Table 2. Average mass spectral coincidence values for 11 representative organisms at the genus and species levels.
Generation of trees based on separate analysis of the "Lane-AB" or "Lane-BC" RNase T1 mass catalogs
Figure 4 shows the corresponding neighbor-joining tree for the same pathogens, based on coincidence analysis of RNase T1 fragmentation of the Lane-BC amplicon. Note that while in Figure 3, Salmonella enterica was poorly resolved from S. typhi and S. typhimurium, fragmentation of the Lane-BC sequence separates S. enterica from the other two species. Likewise, analysis of the Lane-BC region improves resolution between Vibrio cholerae and V. parahaemolyticus as well as between Yersinia pestis and Y. enterocolitica. On the other hand, Brucella and Burkholderia species are better separated by the Lane-AB sequence region.
Occurrence of universal primers
Since we did not account for sequences which may have contained one or more indeterminate bases within the primer substring, it is likely that the published primer sequences occur in more strains than reported here. For the purposes of a broad bacterial identification platform, in practice, the primers will likely be employed under conditions of "permissive" PCR in which up to several mismatches will still result in a product, extending the organism coverage of a primer pair by reduced stringency. In any event, many other universally conserved sequences have been proposed for phylogenetic positioning of bacteria and the methods described here are certainly amenable to various primer pairs. Indeed, for some applications it may be best to design custom primer pairs to achieve the best possible resolution between target species.
To some extent the standard deviations in coincidence (as well as the averages themselves) presented in Figures 1 and 2 may be affected by the prevalence of certain "popular" strains or species in the RDP such as E. coli (represented by 80 accessions of length 1200 nt or longer). Conversely, just as sequence variability may be over-estimated due to a shortage of information on a particular genus, so might the variation in mass-spectral coincidence. Although the coincidences at the genus level for Campylobacter and Clostridium stand out as lower than the rest (see Table 2), no attempt was made to normalize the presented coincidences by the number of times a species was used in a calculation. Finally, the lower values of coincidence for the C. botulinum lineage, even at the species level, help to explain the difficulty in distinguishing that organism from its near neighbors.
Masses employed and isotopic distribution
Determination of RNA compositions is challenging because of the small mass difference between U and C. (The repeating G, A, U, and C monomer masses for RNA are, respectively, 345.2, 329.2, 306.2, and 305.2 Da, with pairwise differences of 16, 39, 40, 23, 24, and 1 Da.) In preliminary experimental work, we have routinely incorporated amino-allyl uridine residues as a 100% substitute for natural U in RNA transcripts, thereby increasing the 1 Da difference between U and C to 55 Da. Under 100% amino-allyl U (aaU) modification all the fragments are separated by at least ~8 Da. This 8 Da difference is not attributable to a single monomer difference in composition but rather occurs when a C and one aaU residue in one fragment are replaced by two As in a second fragment of otherwise identical composition, e.g. fragment 1, AA UU CG = 2048.4 Da, and fragment 2, CU UU CG = 2056.4 Da. Thus, incorporation of amino-allyl uridine will provide a significant increase in the resolving power of the mass spectrometric approach.
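The pairwise monomer differences quoted above can be checked directly from the four natural monomer masses. The short sketch below does only that and then shows why a single U/C substitution is hard to resolve; it ignores fragment end groups and does not model the amino-allyl modification, and the function names are hypothetical.

```python
from itertools import combinations

# Average repeating-monomer masses (Da) as quoted in the text.
MONOMER_MASS = {"G": 345.2, "A": 329.2, "U": 306.2, "C": 305.2}

def pairwise_differences(masses):
    """Absolute mass difference for every unordered pair of monomers."""
    return {
        f"{a}-{b}": round(abs(masses[a] - masses[b]), 1)
        for a, b in combinations(masses, 2)
    }

def composition_mass(fragment, masses=MONOMER_MASS):
    """Sum of monomer masses; end groups are ignored (an assumption of this sketch)."""
    return round(sum(masses[base] for base in fragment), 1)

differences = pairwise_differences(MONOMER_MASS)
smallest_pair = min(differences, key=differences.get)
```

Two fragments that differ by a single U-to-C substitution, such as the hypothetical AAUCG and AACCG, differ by only 1 Da under natural masses, which is the case the modified uridine is meant to address.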
Those familiar with mass spectrometry of large biomolecules will also appreciate that isotopic distribution of the molecular masses to be measured is an important consideration. We and others have previously published detailed discussions of these issues [31, 42]. Experimentally, we have found that isotopic distributions do not prevent accurate organism identification, and use of average masses gives very acceptable results. Finally, automated spectral processing algorithms for "centroiding" vastly reduce the complexity of observed spectra by placing an idealized peak with a peak-width of zero at the center-of-mass of each major peak in the spectrum. We have therefore used average monomer masses in calculating all fragment masses, and amino-allyl U in place of natural U. While the spectral coincidence values calculated here would be the same regardless of the mass used for U, experimentally, as discussed above, the use of amino-allyl U in place of natural U facilitates resolution of fragments with only a single U/C difference in composition (and their underlying isotopic distributions).
In all cases, we assumed the RNase T1 digests to be complete. That is, no fragments containing an internal G-residue remained. In practice, significant incomplete digestion due to secondary structure of RNA would be readily recognizable because there would be many more large fragments than expected. Such incomplete digestion was never a problem in the original rRNA digest cataloging studies [21–25] and heavy masses that would be symptomatic of incomplete digestion have not been encountered in our experimental studies.
Although approximately 1% of bases in the naturally occurring 16S rRNA are known to be post-transcriptionally modified and therefore have unexpected masses, the possible presence of such modified nucleotides was not considered here, for several reasons. First, it is known that very few modifications occur in the larger RNase fragments (length 9 or more). Second, the fragments that contain the modifications and the modification itself are typically highly conserved, and it would therefore be straightforward to identify them and make the needed mass adjustment. Finally, because in vitro transcription is likely to be used to generate sufficient 16S rRNA to meet the sensitivity limits of contemporary MALDI instruments, the actual samples will not contain the biological modifications.
Using these methods then, we feel the trees presented are reasonable representations of the theoretical limit of resolution of the method when all expected masses are observed and no unexpected masses are observed, and the nearest two masses in any observed spectrum are well resolved.
Use of coincidence function with real spectra
The above discussion points out perhaps the most useful aspect of the coincidence function and trees presented here. The inner-product-based metric can be used to compare peak lists from real spectra to large predicted databases of fragments (and we have now routinely used this to rapidly identify bacteria in less than 40 minutes beyond the initial PCR step). Von Wintzingerode et al. described comparison of base-specific fragmentation patterns derived from Bordetella species against the patterns predicted by virtual fragmentation of 50 published 16S rDNA sequences, including 13 sequences which were known to be closely related. Discriminating masses (those non-degenerate between the strains under consideration) were compared and strains were typed by inspection. Lefmann et al. used similar methods to rank the identification of mycobacteria. Using the coincidence function described herein with the following tolerance parameter

\[ \delta(m_i, m'_j) = \begin{cases} 1, & |m_i - m'_j| \leq tol \\ 0, & \text{otherwise} \end{cases} \]

we have successfully identified bacteria in the laboratory by comparison of acquired fragment patterns to mass catalogs corresponding to over 47,000 rDNA sequences in a rapid, automated, and quantitative fashion. In practice, for mass calibration, an internal fragment common to all reactions is generated from the reverse complement of the antisense primer. (This mass also serves as a confirmation that the RNA transcription was full-length.) In general, such a single-point calibration results in sufficient mass accuracy that when the tol parameter is set to 1.0 Da or less, correct organism identification is still obtained.
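Matching an observed peak list against predicted catalogs with a mass tolerance might look like the following sketch. It is illustrative only: the tolerance test |m - m'| <= tol, the normalization of the coincidence, the function names, and the two-entry database are all assumptions, not the laboratory software described here.

```python
from math import sqrt

def matches(spectrum_1, spectrum_2, tol):
    """Count mass pairs that agree to within tol (Da)."""
    return sum(1 for m in spectrum_1 for p in spectrum_2 if abs(m - p) <= tol)

def coincidence(observed, predicted, tol=1.0):
    """Tolerance-based inner product, normalized by the self-products (assumed form)."""
    norm = sqrt(matches(observed, observed, tol) * matches(predicted, predicted, tol))
    return matches(observed, predicted, tol) / norm if norm else 0.0

def rank_catalogs(observed, database, tol=1.0):
    """Return (coincidence, name) pairs, best match first."""
    scores = [(coincidence(observed, catalog, tol), name)
              for name, catalog in database.items()]
    return sorted(scores, reverse=True)

# Hypothetical predicted catalogs and an observed, slightly miscalibrated peak list.
database = {
    "organism_A": [2048.4, 2500.0, 3100.5],
    "organism_B": [2056.4, 2500.0, 3300.0],
}
observed_peaks = [2048.9, 2500.3, 3100.1]
ranking = rank_catalogs(observed_peaks, database, tol=1.0)
```

Because a tolerance window is not a strict equality, closely spaced peaks within one spectrum can match each other; in that situation the normalized value is only approximately bounded by 1, which is one reason well-resolved fragment masses matter.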
By repeated acquisition of spectra from known model organisms, it should be possible to relate the fidelity of real spectra (that is the number of expected and unexpected peaks observed) to the branch lengths calculated by distance-based clustering methods such as neighbor-joining. Low fidelity spectra will obviously limit organism-resolution, and branches beyond a certain threshold distance would effectively be collapsed. Finally, the results of such cluster analyses should help us to develop faster reporting of organism identification when comparing observed spectra to predicted mass catalogs. Comparison of observed mass spectra to subsamples of mass catalogs from smaller, mass-based organism clusters of the database will speed our identification routines.
Cluster analysis based on mass spectrometric observables is an extremely useful tool for examining the extent to which specificity is preserved in rapid, mass-based assays in comparison with results obtained using complete sequences. The analysis also facilitates evaluation of the relative utility of alternative choices of the region to be amplified. In contrast to, for example, pairwise lists of organisms which have matching or closely matching mass spectra, the trees presented here provide a quick means for visually assessing the resolution achieved using a particular sequence region and cleavage after a particular base. These trees indicate that complete cleavage after just one base using primer pairs that produce 20–30 informative fragments should provide at least genus-level resolution for most bacteria. This resolution can be readily improved by analysis of a second 16S rRNA sequence region, transcription and fragmentation of the antisense strand, and/or cleavage after an alternative base. (A number of methods have been described for generating RNA transcripts which are mass-modified, and RNA may be cleaved mono-specifically after bases other than G [26, 28, 44].) For the purposes of developing a broad-based "sentinel" bacterial assay, this level of resolution may be acceptable, especially if a result is provided rapidly. In situations, for example, in which an enriched unknown substance is presented, whether the rapid analysis indicates Bacillus anthracis, B. cereus, or several other near phylogenetic neighbors may be irrelevant to the near-term prophylactic steps to be taken if an assay is sufficiently rapid. In the case of clinical diagnostics and response, many antibiotics have broad organism activity, so a rapid, genus-level diagnostic test will often be of higher resolution than actually needed.
Finally, the results of such cluster analyses should help us to develop faster reporting of organism identification by serving as a template for database segmentation when comparing observed spectra to large numbers of predicted mass catalogs.
All 16S rDNA sequences used in this study were downloaded in FASTA format from the Ribosomal Database Project, release 9.32. All downloaded sequences were 1200 nt or longer with all alignment gaps removed. For many initial sequence manipulations of FASTA files, the freely available program BioEdit was used. Multiple sequence alignments were performed using the CLUSTALW algorithm within BioEdit using the default gap opening and gap extension penalties. The number of pairwise distances for n organisms is n(n-1)/2. For groups containing large numbers of organisms, such as the phylum Firmicutes, with over 20,000 sequences in the RDP, this would lead to over 200 million pairwise distances. Average coincidences were therefore calculated by taking random samples of 200 sequences. For many other groups of sequences, such as the explicitly named Category A, B, or C pathogens, only several hundred sequences were typically involved. In these cases, the entire sequence set was used for coincidence analysis or tree generation. For the generation of Figure 2, following multiple sequence alignments, positional entropies were calculated as described within BioEdit [39, 40] using default parameters. Briefly, entropy in BioEdit gives a measure of uncertainty at each position relative to other positions and is calculated as H(l) = -∑f(b, l) ln(f(b, l)), where f(b, l) is the frequency at which residue b is found at position l. The resulting entropy values at each position in the alignment of a particular phylogenetic grouping of sequences were then averaged as a single measure of the variability of the sequence region in the group.
The freely available program, Molecular Evolutionary Genetics Analysis or "MEGA" v3.1 was used to generate all neighbor-joining (NJ) trees presented. NJ trees were generated using the default parameters and the MEGA 3.1 implementation of the NJ algorithm [32, 47].
A suite of custom Linux shell scripts, C programs, and Matlab/Octave functions was written under Linux 2.2.13 and used to create and analyze mass fragment databases. Complete or partial unaligned sequences were downloaded in FASTA format and split into separate files with unique identifier tags which were cross-referenced in an index. Each sequence file consisted of a single string composed of A's, G's, C's, and/or T's and was searched for the existence of two (forward and reverse) primer strings. For sequences which contained both strings, new strings representing the inter-primer regions (including the primer sequences themselves) were placed along with the relative position of the first primer into separate files which were also cataloged by the unique identifiers. Next, these "amplicons" were subjected to base-specific cleavage after every G residue, and the resulting sequence fragments were used to compute a list of masses corresponding to those fragments. Finally, based on previous determination of the minimum informative length for a given fragment, the mass lists were "high-pass" filtered for fragments which were 6-mers or longer, and the resultant lists were cataloged in separate files using their unique identifier tags.
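The entropy average and the pair count described above can be illustrated with a short sketch. It follows the stated formula H(l) = -∑f(b, l) ln(f(b, l)) but is not BioEdit's code; in particular, treating a gap character as just another residue is an assumption of this sketch, and the function names are hypothetical.

```python
from collections import Counter
from math import log

def positional_entropy(column):
    """H(l) = -sum over residues b of f(b, l) * ln f(b, l) for one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * log(n / total) for n in counts.values())

def mean_entropy(alignment):
    """Average of the positional entropies over all columns of equal-length sequences."""
    entropies = [positional_entropy(column) for column in zip(*alignment)]
    return sum(entropies) / len(entropies)

def pair_count(n):
    """Number of pairwise distances among n organisms, n(n-1)/2."""
    return n * (n - 1) // 2

# Hypothetical two-sequence alignment that differs only at the last position.
example_alignment = ["ACGU", "ACGA"]
example_mean = mean_entropy(example_alignment)
```

A fully conserved column contributes zero entropy, and a column split evenly between two residues contributes ln 2, so the example alignment averages to ln 2 / 4. For 20,000 sequences, `pair_count` gives 199,990,000 distances, which is why random subsampling was used for large groups.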
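The cataloging steps above (primer search, extraction of the inter-primer region, cleavage after every G, mass calculation, and length filtering) can be sketched as follows. This is a simplified Python illustration rather than the original shell/C/Octave code: it assumes both primer strings are searched on the same strand, uses the natural monomer masses quoted earlier, and ignores fragment end groups and mass-modified nucleotides. All names and the example sequence are hypothetical.

```python
# Average repeating-monomer masses (Da) quoted in the text for natural RNA.
MONOMER_MASS = {"G": 345.2, "A": 329.2, "U": 306.2, "C": 305.2}

def extract_amplicon(sequence, forward, reverse):
    """Return the region from forward through reverse primer (inclusive), or None."""
    start = sequence.find(forward)
    if start < 0:
        return None
    end = sequence.find(reverse, start + len(forward))
    if end < 0:
        return None
    return sequence[start:end + len(reverse)]

def transcribe(dna):
    """Write the DNA string as RNA (T becomes U)."""
    return dna.replace("T", "U")

def cleave_after_g(rna):
    """Complete base-specific cleavage after every G residue."""
    fragments, current = [], ""
    for base in rna:
        current += base
        if base == "G":
            fragments.append(current)
            current = ""
    if current:  # trailing fragment without a terminal G
        fragments.append(current)
    return fragments

def mass_catalog(fragments, min_length=6):
    """Distinct masses of fragments at least min_length long (end groups ignored)."""
    return sorted({round(sum(MONOMER_MASS[b] for b in f), 1)
                   for f in fragments if len(f) >= min_length})

# Hypothetical sequence with short stand-in "primers".
sequence = "TTAGAGACCATTCAGTTGAACCACTGCCGTAA"
amplicon = extract_amplicon(sequence, "AGAG", "CCGT")
fragments = cleave_after_g(transcribe(amplicon))
catalog = mass_catalog(fragments)
```

Storing the catalog as a set of distinct masses reflects the point made earlier that fragments of identical composition cannot be distinguished by mass.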
In order to perform quantitative "distance" analysis based on mass-fragment spectra, the mass lists generated above were concatenated into a single file whose first column contained integers representing the origin of the masses and whose second column contained the corresponding lists of masses. Octave (version 2.0.17) could then be used to read this file and generate the square matrix representing the coincidence (using an implementation of the coincidence function introduced previously) between each pair of mass spectra in the file. The resulting matrix was manipulated into a vector which contained the corresponding distances (distance ≡ 1 – coincidence) and which conformed to the structure required for the MEGA 3.1 input file. Additional Linux shell scripts then created a MEGA-compatible file by generating the appropriate header information, relating the unique identifiers back to organism names, and attaching the distance output list.
From start to finish, the entire process can be accomplished automatically by specifying the name of a FASTA formatted file containing sequence data, the desired forward and reverse primers for amplification, and the desired nucleotide(s) for cleavage. The amplification process includes the ability to accommodate arbitrary degrees of primer degeneracy (for example by specifying an 'N' in one of the primer strings), and the mass calculation can be tailored to include mass modified nucleotides, for example, aminoallyl-U for uridine. For an input file containing 220 bacterial 16S sequences (~2,000 bp each), the entire process took just under 3 minutes on an older 1 GHz AMD Athlon (32-bit) personal computer with 512 MB of RAM. The same analysis of 403 complete viral genomes (~10,000 bp each) took approximately 6 minutes.
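Primer degeneracy of the kind mentioned above (an 'N' in a primer string) can be handled by translating the primer into a regular expression. The sketch below is one possible approach, not the authors' implementation; the IUPAC codes R and Y are included in addition to N as an assumption, and the function names are hypothetical.

```python
import re

# Subset of IUPAC nucleotide codes; only 'N' is mentioned in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}

def primer_regex(primer):
    """Compile a degenerate primer string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer))

def find_primer(sequence, primer):
    """Return the index of the first match of the primer, or -1 if absent."""
    match = primer_regex(primer).search(sequence)
    return match.start() if match else -1

# The two "Lane C" variants differ only at the second base (T or C),
# so both are covered by a single degenerate string with Y at that position.
lane_c_degenerate = "GYACACACCGCCCGT"
position = find_primer("AAGCACACACCGCCCGTTT", lane_c_degenerate)
```

Writing one degenerate string per primer avoids searching each listed variant separately, at the cost of also matching any unlisted combination that the degenerate positions allow.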
The work reported here was supported in part by grants from NASA (grant NNJ04HF43G) and the Welch Foundation to GEF (E1451) and RCW (E1264) and the Institute of Space Systems Operations to GEF, as well as a NASA Small Business Innovation and Research Grant, NNM06AA44C to GWJ (BioTex, Inc). GWJ would also like to thank Dr. Dan Graur and Eran Elhaik for consultation on the various methods described here.
- Ghoshal U, Prasad KN, Singh M, Tiwari DP, Ayyagari A: A comparative evaluation of phenotypic and molecular methods for the detection of oxacillin resistance in coagulase-negative staphylococci. J Infect Chemother 2004, 10(2):86–89. doi:10.1007/s10156-003-0293-z
- Ellepola AN, Hurst SF, Elie CM, Morrison CJ: Rapid and unequivocal differentiation of Candida dubliniensis from other Candida species using species-specific DNA probes: comparison with phenotypic identification methods. Oral Microbiol Immunol 2003, 18(6):379–388. doi:10.1046/j.0902-0055.2003.00103.x
- Willis RC: Improved molecular techniques help researchers diagnose microbial conditions. Modern Drug Discovery 2004.
- Cannone JJ, Subramanian S, Schnare MN, Collett JR, D'Souza LM, Du Y, Feng B, Lin N, Madabusi LV, Muller KM, Pande N, Shang Z, Yu N, Gutell RR: The Comparative RNA Web (CRW) Site: An Online Database of Comparative Sequence and Structure Information for Ribosomal, Intron, and other RNAs. BioMed Central Bioinformatics 2002, 3:2 [Correction: BioMed Central Bioinformatics 3:15].
- Cole JR, Chai B, Farris RJ, Wang Q, Kulam SA, McGarrell DM, Garrity GM, Tiedje JM: The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res 2005, 33(Database issue):D294–6. doi:10.1093/nar/gki038
- Amann R, Fuchs BM, Behrens S: The identification of microorganisms by fluorescence in situ hybridization. Curr Opin Biotechnol 2001, 12:231–236. doi:10.1016/S0958-1669(00)00204-4
- DeLong EF, Wickham GS, Pace NR: Phylogenetic stains: ribosomal RNA-based probes for the identification of single microbial cells. Science 1989, 243:1360–3.
- Wagner M, Horn M, Daims H: Fluorescence in situ hybridisation for the identification and characterisation of prokaryotes. Current Opinion in Microbiology 2003, 6:302–309. doi:10.1016/S1369-5274(03)00054-7
- Busti E, Bordoni R, Castiglioni B, Monciardini P, Sosio M, Donadio S, Consolandi C, Rossi Bernardi L, Battaglia C, De Bellis G: Bacterial discrimination by means of a universal array approach mediated by LDR (ligase detection reaction). BMC Microbiol 2002, 2:27. doi:10.1186/1471-2180-2-27
- Uchida T, Bonen L, Schaup HW, Lewis BJ, Zablen L, Woese C: The use of ribonuclease U2 in RNA sequence determination. Some corrections in the catalog of oligomers produced by ribonuclease T1 digestion of Escherichia coli 16S ribosomal RNA. J Mol Evol 1974, 3(1):63–77. doi:10.1007/BF01795977
- Chandler DP, Newton GJ, Small JA, Daly DS: Sequence versus Structure for the Direct Detection of 16S rRNA on Planar Oligonucleotide Microarrays. Appl Environ Microbiol 2003, 69:2950–8.
- Applied Biosystems I: MicroSeq 16S rRNA Gene Kit.
- Applied Biosystems I: MicroSeq Full Gene 16S rDNA Bacterial Identification Kit.
- Kwon Y, Tang K, Cantor C, Koster H, Kang C: DNA sequencing and genotyping by transcriptional synthesis of chain-terminated RNA ladders and MALDI-TOF mass spectrometry. Nucleic Acids Res 2001, 29(3):E11. doi:10.1093/nar/29.3.e11
- Roskey MT, Juhasz P, Smirnov IP, Takach EJ, Martin SA, Haff LA: DNA sequencing by delayed extraction-matrix-assisted laser desorption/ionization time of flight mass spectrometry. Proc Natl Acad Sci U S A 1996, 93(10):4724–4729. doi:10.1073/pnas.93.10.4724
- Spottke B, Gross J, Galla HJ, Hillenkamp F: Reverse Sanger sequencing of RNA by MALDI-TOF mass spectrometry after solid phase purification. Nucleic Acids Res 2004, 32(12):e97. doi:10.1093/nar/gnh089
- Koster: DNA diagnostic (sic) based on mass spectrometry. United States patent 5,605,798 and continuations 1997.
- Koster: DNA diagnostics based on mass spectrometry. United States patent 6,043,031 and continuations 2000.
- Ecker DJ, Sampath R, Blyn LB, Eshoo MW, Ivy C, Ecker JA, Libby B, Samant V, Sannes-Lowery KA, Melton RE, Russell K, Freed N, Barrozo C, Wu J, Rudnick K, Desai A, Moradi E, Knize DJ, Robbins DW, Hannis JC, Harrell PM, Massire C, Hall TA, Jiang Y, Ranken R, Drader JJ, White N, McNeil JA, Crooke ST, Hofstadler SA: Rapid identification and strain-typing of respiratory pathogens for epidemic surveillance. Proc Natl Acad Sci U S A 2005, 102(22):8012–8017. doi:10.1073/pnas.0409920102
- Hofstadler SA, Sampath R, Blyn LB, Eshoo MW, Hall TA, Jiang Y, Drader JJ, Hannis JC, Sannes-Lowery KA, Cummins LL: TIGER: the universal biosensor. International Journal of Mass Spectrometry 2005, 242(1):23–41. doi:10.1016/j.ijms.2004.09.014
- Fox GE, Pechman KR, Woese CR: Comparative cataloguing of 16S ribosomal RNA: molecular approach to prokaryotic systematics. International Journal of Systemic Bacteriology 1977, 27: 44–57.View ArticleGoogle Scholar
- Fox GE, Stackebrandt E, Hespell RB, Gibson J, Maniloff J, Dyer TA, Wolfe RS, Balch WE, Tanner RS, Magrum LJ, Zablen LB, Blakemore R, Gupta R, Bonen L, Lewis BJ, Stahl DA, Luehrsen KR, Chen KN, Woese CR: The phylogeny of prokaryotes. Science 1980, 209(4455):457–463.View ArticlePubMedGoogle Scholar
- Woese CR, Fox GE: Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc Natl Acad Sci U S A 1977, 74(11):5088–5090. 10.1073/pnas.74.11.5088PubMed CentralView ArticlePubMedGoogle Scholar
- Woese CR, Fox GE, Zablen L, Uchida T, Bonen L, Pechman K, Lewis BJ, Stahl D: Conservation of primary structure in 16S ribosomal RNA. Nature 1975, 254(5495):83–86. 10.1038/254083a0View ArticlePubMedGoogle Scholar
- Woese CR, Stackebrandt E, Macke TJ, Fox GE: A phylogenetic definition of the major eubacterial taxa. Syst Appl Microbiol 1985, 6: 143–151.View ArticlePubMedGoogle Scholar
- Hahner S, Ludemann HC, Kirpekar F, Nordhoff E, Roepstorff P, Galla HJ, Hillenkamp F: Matrix-assisted laser desorption/ionization mass spectrometry (MALDI) of endonuclease digests of RNA. Nucleic Acids Res 1997, 25(10):1957–1964. 10.1093/nar/25.10.1957PubMed CentralView ArticlePubMedGoogle Scholar
- von Wintzingerode F, Bocker S, Schlotelburg C, Chiu NH, Storm N, Jurinke C, Cantor CR, Gobel UB, van den Boom D: Base-specific fragmentation of amplified 16S rRNA genes analyzed by mass spectrometry: a tool for rapid bacterial identification. Proc Natl Acad Sci U S A 2002, 99(10):7039–7044. 10.1073/pnas.102165899PubMed CentralView ArticlePubMedGoogle Scholar
- Krebs S, Medugorac I, Seichter D, Forster M: RNaseCut: a MALDI mass spectrometry-based method for SNP discovery. Nucleic Acids Res 2003, 31(7):e37. 10.1093/nar/gng037PubMed CentralView ArticlePubMedGoogle Scholar
- Hartmer R, Storm N, Boecker S, Rodi CP, Hillenkamp F, Jurinke C, van den Boom D: RNase T1 mediated base-specific cleavage and MALDI-TOF MS for high-throughput comparative sequence analysis. Nucleic Acids Res 2003, 31(9):e47. 10.1093/nar/gng047PubMed CentralView ArticlePubMedGoogle Scholar
- Lefmann M, Honisch C, Bocker S, Storm N, von Wintzingerode F, Schlotelburg C, Moter A, van den Boom D, Gobel UB: Novel mass spectrometry-based tool for genotypic identification of mycobacteria. J Clin Microbiol 2004, 42(1):339–346. 10.1128/JCM.42.1.339-346.2004PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang Z, Jackson GW, Fox GE, Willson RC: Microbial Identification by Mass Cataloging. BMC Bioinformatics 2006., 7(117):Google Scholar
- Saitou N, Nei M: The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 1987, 4: 406–425.PubMedGoogle Scholar
- Weisburg WG, Barns SM, Pelletier DA, Lane DJ: 16S ribosomal DNA amplification for phylogenetic study. J Bacteriol 1991, 173(2):697–703.PubMed CentralPubMedGoogle Scholar
- Lane DJ, Pace B, Olsen GJ, Stahl DA, Sogin ML, Pace NR: Rapid determination of 16S ribosomal RNA sequences for phylogenetic analyses. Proc Natl Acad Sci U S A 1985, 82(20):6955–6959. 10.1073/pnas.82.20.6955PubMed CentralView ArticlePubMedGoogle Scholar
- NIAID Category A, B, & C Priority Pathogens[http://www3.niaid.nih.gov/biodefense/bandc_priority.htm]
- NIAID biodefense research agenda for CDC category A agents.[http://www3.niaid.nih.gov/biodefense/research/biotresearchagenda.pdf]
- NIAID strategic plan for biodefense research. NIH publication No 03–5306 2002.Google Scholar
- Kumar S, Tamura K, Nei M: MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Briefings in Bioinformatics 2004, 5: 150–163. 10.1093/bib/5.2.150View ArticlePubMedGoogle Scholar
- Pierce JR: An Introduction to Information Theory: Symbols, Signals and Noise. 2nd edition. New York, Dover Publications, Inc.; 1980.Google Scholar
- Schneider TD, Stephens RM: Sequence Logos: A new Way to Display Consensus Sequences. Nucleic Acids Res 1990, 18: 6097–6100.PubMed CentralView ArticlePubMedGoogle Scholar
- Santos SR, Ochman H: Identification and phylogenetic sorting of bacterial lineages with universally conserved genes and proteins. Environmental Microbiology 2004, 6(7):754–759. 10.1111/j.1462-2920.2004.00617.xView ArticlePubMedGoogle Scholar
- Koomen JM, Russell WK, Tichy SE, Russell DH: Accurate mass measurement of DNA oligonucleotide ions using high-resolution time-of-flight mass spectrometry. J Mass Spectrom 2002, 37(4):357–371. 10.1002/jms.312View ArticlePubMedGoogle Scholar
- Rozenski J, Crain PF, McCloskey JA: The RNA Modification Database: 1999 update. Nucleic Acids Res 1999, 27(1):196–197. 10.1093/nar/27.1.196PubMed CentralView ArticlePubMedGoogle Scholar
- Stanssens P, Zabeau M, Meersseman G, Remes G, Gansemans Y, Storm N, Hartmer R, Honisch C, Rodi CP, Bocker S, van den Boom D: High-throughput MALDI-TOF discovery of genomic sequence polymorphisms. Genome Res 2004, 14(1):126–133. 10.1101/gr.1692304PubMed CentralView ArticlePubMedGoogle Scholar
- Hall TA: BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acids Symp Ser 1999, 41: 95–98.Google Scholar
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22(22):4673–4680.PubMed CentralView ArticlePubMedGoogle Scholar
- Nei M, Kumar S: Molecular Evolution and Phylogenetics. New York, Oxford University Press; 2000.Google Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.