Mining locus tags in PubMed Central to improve microbial gene annotation
© Stubben and Challacombe; licensee BioMed Central Ltd. 2014
Received: 6 June 2013
Accepted: 18 January 2014
Published: 5 February 2014
The scientific literature contains millions of microbial gene identifiers within the full text and tables, but these annotations rarely get incorporated into public sequence databases. We propose to utilize the Open Access (OA) subset of PubMed Central (PMC) as a gene annotation database and have developed an R package called pmcXML to automatically mine and extract locus tags from full text, tables and supplements.
We mined locus tags from 1835 OA publications in ten microbial genomes and extracted tags mentioned in 30,891 sentences in main text and 20,489 rows in tables. We identified locus tag pairs marking the start and end of a region such as an operon or genomic island and expanded these ranges to add another 13,043 tags. We also searched for locus tags in supplementary tables and publications outside the OA subset in Burkholderia pseudomallei K96243 for comparison. There were 168 publications containing 48,470 locus tags and 83% of mentions were from supplementary materials and 9% from publications outside the OA subset.
B. pseudomallei locus tags within the full text and tables of OA publications represent only a small fraction of the total mentions in the literature. For microbial genomes with very few functionally characterized proteins, the locus tags mentioned in supplementary tables and within ranges like genomic islands contain the majority of locus tags. Significantly, the functions in the R package provide access to additional resources in the OA subset that are not currently indexed or returned by searching PMC.
The rapid growth of next generation sequencing and transcriptomic studies, particularly on the causative agents of infectious diseases, requires accurate genome annotations to confidently analyze the sequencing data and identify and compare functions, pathways and networks. There are many resources available for genome annotation and most rely on transferring annotations from model organism or protein family databases that vary greatly in content and quality [1]. For microbial genomes, there are very few model organism databases containing manual annotations based on experimental evidence in the current literature. Therefore, when microbial genomes are reannotated or new gene functions are identified by subsequent experiments, the updates are rarely incorporated into public sequence databases.
Since the manual annotation of genomes using controlled vocabularies and evidence codes is a time-consuming task [2], text mining solutions that link evidence in the literature to annotations in genome databases are needed [3, 4]. One recent example is text2genome, which extracts DNA sequences from PubMed Central (PMC) and maps them to model organism databases [5]. Significantly, this study was the first to mine text in supplementary files in the Open Access (OA) subset. The authors found DNA sequences in 20% of the OA articles and then requested permission to mine the full text from over 40 publisher websites (their progress and efforts over the last three years are documented on the UCSC Genocoding website at http://text.soe.ucsc.edu). A related project called pubmed2ensembl links millions of articles to thousands of genes from 50 eukaryotic species using six data sources containing gene to literature links [6].
Many other projects have shown that text mining improves the links between literature and biological databases such as the Protein Data Bank and Gene Expression Omnibus [7] or UniProt and the European Nucleotide Archive [8]. In this latter study, the authors noted the existence of accession number ranges but did not attempt to expand or quantify the regions. Many other tools have been developed to extract information from biological texts and are reviewed in [9–12]. Most of the text mining applications discussed in these reviews focus on innovative efforts to extract genes, functions and interactions from model eukaryotic organisms.
For microbial genomes with very few functionally characterized proteins, locus tags are often associated with structural and functional annotations in the literature. Structural annotations may include revised gene starts, novel genes or mobile regions based on either computational or experimental evidence. Functional annotations may include the assignment of new definitions, gene names and functions. A typical role filled by model organism databases is to update annotations by linking genes to experimental evidence in the literature, and text mining tools are often used to assist in the process of manual curation. Another option for curators is to use a full text database like PMC to search for articles citing a specific gene or locus tag in the full text. However, finding the locus tag within the article requires searching through the entire text and linked tables. To facilitate these types of searches, we developed an R package to mine locus tags from text, tables and supplements in the OA subset.
We demonstrate the capabilities of these tools by mining locus tags from ten microbial genomes. Our main objectives are to (1) improve access to structured data in order to extract locus tags from rows that are linked to column names, captions and subheadings, (2) identify locus tag pairs marking the start and end of a region and then list genes mentioned indirectly within the range, and (3) search for all locus tags in supplementary tables and publications outside the OA subset in Burkholderia pseudomallei K96243 for comparison. This comprehensive set of locus tags in B. pseudomallei is used to highlight deficiencies in current annotations and suggest future microbial gene mining efforts.
Searching for a single locus tag in PMC is straightforward: for example, entering “Rv3874” in the search box returns 135 articles (accessed Nov 5, 2013). These full text articles belong to two groups in PMC: an Open Access subset that is available for text mining and another set that is merely free to read. The OA subset is available as XML files that can be downloaded using automated queries to either the FTP site or the Open Archives Initiative service, and articles may be searched by adding the Open Access filter (Rv3874 AND open access[FILTER] returns 47 results).
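The same two searches can also be issued programmatically. The following is a minimal Python sketch rather than part of the pmcXML package (which is written in R): it only builds the request URLs and does not contact the server. The NCBI E-utilities esearch endpoint and the helper name pmc_search_url are assumptions introduced here for illustration, and the article counts quoted above will have changed since 2013.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumption: the NCBI E-utilities esearch endpoint (not named in the article).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pmc_search_url(term, open_access_only=False, retmax=100):
    """Build an esearch URL that queries the PMC database for `term`."""
    if open_access_only:
        # Same filter syntax as typed into the PMC search box.
        term = f"{term} AND open access[FILTER]"
    params = {"db": "pmc", "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


all_articles = pmc_search_url("Rv3874")
oa_articles = pmc_search_url("Rv3874", open_access_only=True)

# Decode the query string again to show the search term that will be sent.
print(parse_qs(urlparse(all_articles).query)["term"][0])
print(parse_qs(urlparse(oa_articles).query)["term"][0])
```

Keeping URL construction separate from the network call makes the query easy to inspect before any request is sent.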
Table 1. Reference genome codes, strains and locus tag prefixes used for searching PubMed Central
Burkholderia pseudomallei K96243
Campylobacter jejuni subsp. jejuni NCTC 11168
Chlamydia trachomatis D/UW-3/CX
Francisella tularensis subsp. tularensis SCHU S4
Helicobacter pylori 26695
Listeria monocytogenes EGD-e
Mycobacterium tuberculosis H37Rv
Pseudomonas aeruginosa PAO1
Vibrio cholerae O1 biovar El Tor str. N16961
Yersinia pestis CO92
We used the locus tag prefix and first digit from the GFF3 file to build wildcard searches and find PMC articles with a matching locus tag (Additional file 1: Table S1). We also restricted the number of spurious matches by limiting the results to articles with the genus name in the title or abstract. For example, this query was used for Yersinia pestis CO92: (YPO0* OR YPO1* OR YPO2* OR YPO3* OR YPO4*) AND (Yersinia[ABSTRACT] OR Yersinia[TITLE]) AND open access[FILTER]. In some cases, the search returned a warning that the wildcard search used only the first 600 variations, so we lengthened the root word to include two digits.
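The query construction described above can be sketched in a few lines of Python. This is an illustration only, not the code used in the study; the function names are hypothetical, and the short list of tags stands in for the locus tags read from a GFF3 file.

```python
def leading_digits(tags, prefix, n=1):
    """Collect the distinct first n digits that follow the prefix in a list of tags."""
    start = len(prefix)
    return sorted({t[start:start + n] for t in tags if t.startswith(prefix)})


def wildcard_query(prefix, digits, genus, open_access_only=True):
    """Join prefix+digit wildcards and restrict matches to the genus in title or abstract."""
    terms = " OR ".join(f"{prefix}{d}*" for d in digits)
    query = f"({terms}) AND ({genus}[ABSTRACT] OR {genus}[TITLE])"
    if open_access_only:
        query += " AND open access[FILTER]"
    return query


# Toy subset of tags; a real run would read every locus tag from the GFF3 file.
tags = ["YPO0001", "YPO1234", "YPO2500", "YPO3999", "YPO4100"]
print(wildcard_query("YPO", leading_digits(tags, "YPO"), "Yersinia"))

# If the search warns about too many wildcard variations, lengthen the root word.
print(leading_digits(tags, "YPO", n=2))
```

Passing n=2 to leading_digits corresponds to lengthening the root word to two digits, which produces more but narrower wildcard terms.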
For each publication, we passed the PMC id to the Open Archives Initiative service and downloaded the XML version of the full text article. We parsed the XML into text by splitting the document into main sections, and each section was further divided into complete sentences. We parsed XML tables using rowspan and colspan attributes to correctly position and repeat cell values and then joined column names and cell values into a single delimited list to preserve the table structure in a single row. We extracted tags from both main text and tables by matching a prefix followed by four digits (or three digits in Chlamydia) and optional suffixes. We then expanded locus tag pairs marking the start and end of a region such as an operon or genomic island using the ordered list of tags in the GFF3 file. We saved the PMC id, locus tag, section title or table caption, full sentence or table row, and a flag indicating if the tag was mentioned indirectly within a range. We manually checked all ranges with ten or more locus tags to ensure valid range expansions. Finally, we searched for additional locus tags in B. pseudomallei from supplementary tables and from full text articles outside the OA subset.
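To make the table step concrete, the sketch below shows one way to expand rowspan and colspan attributes, join column names to cell values, and then match locus tags in each flattened row. It is written in Python for illustration and does not reproduce the R functions in pmcXML. The table content is invented, the single-letter suffix in the pattern is a simplifying assumption, and only the last header row is used for column names.

```python
import re
import xml.etree.ElementTree as ET


def expand_rows(rows):
    """Expand <tr> elements into a grid, repeating values of spanned cells."""
    grid, pending = [], {}  # pending: column index -> (rows left, text)
    for tr in rows:
        cells, out, col, i = list(tr), [], 0, 0
        while i < len(cells) or col in pending:
            if col in pending:  # cell carried down by an earlier rowspan
                left, text = pending.pop(col)
                if left > 1:
                    pending[col] = (left - 1, text)
                out.append(text)
                col += 1
                continue
            cell = cells[i]
            i += 1
            text = "".join(cell.itertext()).strip()
            rowspan = int(cell.get("rowspan", 1))
            for _ in range(int(cell.get("colspan", 1))):
                out.append(text)
                if rowspan > 1:
                    pending[col] = (rowspan - 1, text)
                col += 1
        grid.append(out)
    return grid


def flatten_table(table_xml, sep="; "):
    """Return one 'column=value' delimited string per body row."""
    table = ET.fromstring(table_xml.strip())
    header = expand_rows(table.findall("./thead/tr"))
    names = header[-1] if header else []
    body = expand_rows(table.findall("./tbody/tr"))
    return [sep.join(f"{n}={v}" for n, v in zip(names, row)) for row in body]


def extract_tags(text, prefixes, digits=4):
    """Match a prefix, a fixed number of digits and an optional one-letter suffix."""
    alt = "|".join(re.escape(p) for p in prefixes)
    return re.findall(rf"\b(?:{alt})\d{{{digits}}}[A-Za-z]?\b", text)


# Invented example table: the first cell spans two rows.
TABLE = """
<table>
  <thead><tr><th>Region</th><th>Locus tag</th><th>Note</th></tr></thead>
  <tbody>
    <tr><td rowspan="2">Island A</td><td>BPSL0001</td><td>note 1</td></tr>
    <tr><td>BPSL0002</td><td>note 2</td></tr>
  </tbody>
</table>
"""
for row in flatten_table(TABLE):
    print(extract_tags(row, ["BPSL", "BPSS"]), row)
```

Repeating the spanned value in every row means each extracted tag keeps its full context (here the region name) even when the row is stored on its own.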
A complete description of the R functions listed in Figure 1 is available in the supplementary text in Additional file 2. The R code is also available on GitHub (https://github.com/cstubben/pmcXML) for further community development.
Locus tags in reference genomes
We searched for articles from ten microbial genomes (Table 1) with locus tag mentions by using the locus tag prefix as a wildcard pattern and found 3011 total articles in PubMed and 9282 articles in PMC, ranging from 123 articles in Burkholderia pseudomallei to 3569 publications in Pseudomonas aeruginosa (Additional file 1: Table S2). In order to find the most relevant articles in PMC, we also matched the genus name in the title or abstract and limited the results to the OA subset, which ranged from 35 articles in Yersinia pestis to 693 in Mycobacterium tuberculosis.
Table 2. Total number of open access articles with locus tag mentions by source
We identified 30,891 locus tags in main text and 20,489 tags in tables (Table 2). We expanded locus tag ranges and identified another 13,043 tags mentioned indirectly within a range. The complete list of all 1835 publications and 64,423 tag mentions is available by genome in Additional file 3: Table S3-S13. The majority of tags in B. pseudomallei were part of ranges (59%), while the number of tags mentioned indirectly within other genomes included 5% from Francisella, 8% from Helicobacter, 14% from Chlamydia and Listeria, and 20-23% from the other five genomes.
We corrected 21 matches to locus tag pairs that were not part of a valid region (Additional file 4: Table S14). Most matches were to interaction pairs such as Rv2158c-Rv0631c from PMC2649132: “Some edges in the SOS response (e.g. Rv2158c-Rv0631c) were common to paths from cell wall proteins and gyrase”. Other matches included network paths, primer names, comparisons and ranges spanning the origin of replication such as Rv3913-Rv0017c. In this case, the parser returned 4082 tags between Rv0017c and Rv3913 instead of the 25 tags between Rv3913 and Rv0017c. We also corrected six large range expansions that were the result of typographical errors in the published articles. We noted a few cases where ranges should be expanded but the current parser did not detect them automatically; for example, some tables list the start and end of a region in different columns.
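The origin-spanning case can be illustrated with a short Python sketch of range expansion over an ordered tag list. It is not the parser used in the study: the function names, the XX prefix and the 20-gene toy genome are all invented here, and the circular option reflects one possible fix, namely reading a pair in the order it was written and wrapping past the end of the list.

```python
import re


def find_pairs(text, tag_pattern):
    """Return (start, end) tuples for two tags joined by a hyphen or en dash."""
    pair = re.compile(rf"({tag_pattern})\s*[-\u2013]\s*({tag_pattern})")
    return pair.findall(text)


def expand_range(start, end, ordered, circular=True):
    """List every tag from start to end using the genome order in `ordered`."""
    i, j = ordered.index(start), ordered.index(end)
    if i <= j:
        return ordered[i:j + 1]
    if circular:
        # The pair was written across the origin: wrap around the list end.
        return ordered[i:] + ordered[:j + 1]
    # Sorting the pair first gives the long way round (the error described above).
    return ordered[j:i + 1]


def needs_review(tags, threshold=10):
    """Flag large expansions for manual checking."""
    return len(tags) >= threshold


# Toy genome of 20 ordered tags; a real run would take the order from the GFF3 file.
ORDER = [f"XX{n:04d}" for n in range(1, 21)]
text = "the island XX0003-XX0006 and the origin-spanning region XX0019-XX0002"

for start, end in find_pairs(text, r"XX\d{4}c?"):
    wrapped = expand_range(start, end, ORDER)
    unwrapped = expand_range(start, end, ORDER, circular=False)
    print(start, end, len(wrapped), len(unwrapped), needs_review(unwrapped))
```

A pattern match alone cannot tell a genomic region from an interaction pair or a primer name, so a size threshold and a manual check remain useful whichever direction is chosen.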
Table 3. Total number of unique locus tags in RefSeq, PMC and in both databases and the source of the unique locus tag
Locus tags in Burkholderia pseudomallei
In order to better estimate the fraction of locus tags indexed by OA publications, we also checked supplementary tables and other publications in B. pseudomallei (Additional file 5: Table S15-S19). There were 53 Open Access articles in PMC containing 1514 direct mentions and 2161 tags within ranges (3675 total). There were another 53 free articles in PMC with 1514 direct mentions and 1304 tags within ranges (2818 total). There were 16 articles in PMC matching the locus tag and the genus name Burkholderia anywhere in the full text. These articles included very few mentions as expected (52 total) and over half the tags were from two tables listing type VI secretion system homologs in B. pseudomallei. We also identified seven PMC articles not found in the search results, including five with tags in supplementary tables only. For example, the study by Schell et al. [13] lists 653 virulence genes in a zipped document file in the supplement, so these virulence genes are not even available using web searches.
The 63 supplementary tables contained the majority of all locus tags (83%) and included 40,122 total mentions from 30 publications. The supplements included 21 Word tables, 19 PDF tables, 14 Excel files, four HTML tables, three zipped files and two GenBank files (Additional file 6: Table S20). The pmcSupp function in the R package was used to read all file types directly into data frames in R, except for PDF tables that were loaded as a vector of text and required additional code to reformat the table structure. We included the GenBank files from the genome reannotation [14] since these 6263 locus tags included operon groups, novel proteins and revised start coordinates for 1579 proteins.
Finally, we searched PubMed for any articles not included in PMC. We identified 14 PubMed articles matching the B. pseudomallei locus tag prefix in the abstract. Nine of these articles have the full text available from the publisher and we extracted 390 mentions. We found another 25 PubMed articles containing 1382 total mentions from our own reference collection, although there are likely many other publications in this group that have not been identified. Overall, we retrieved 168 total articles and extracted 48,470 total mentions (Additional file 5: Table S15-S19).
Table 4. Total number of articles citing a B. pseudomallei locus tag and the number of times each tag was mentioned directly within the text or indirectly within a range
Molecular chaperone GroEL
Membrane-anchored cell surface protein
Cell invasion protein
Heat shock protein 20
ATP/GTP binding protein
Type III secretion system protein
Outer membrane protein a
Lipopolysaccharide biosynthesis protein
Cell invasion protein
AraC family transcriptional regulator
Surface presentation of antigens protein
Type III secretion system protein
Since the number of OA publications is increasing rapidly (Figure 2), tools that automatically link gene identifiers to recent articles in full text databases could improve microbial gene annotation in many ways. Within the document, the locus tags could be highlighted and linked to protein databases. Within a protein database, the links to publications containing the locus tag and the specific sentence or table row could be provided, along with the context of the mention such as a section title or table caption (see Additional file 3: Table S3-S13 and Additional file 5: Table S15-S19 for all mentions containing locus tags). The mentions could also be viewed as tracks in genome browsers or processed further to summarize structural and functional annotations.
Structural annotations include revised gene starts, novel genes, doubtful coding regions and mobile regions. In B. pseudomallei, over half of RefSeq genes have alternate starts in the Genemark, Glimmer or Prodigal predictions available in the same genomes FTP directory at NCBI. Therefore, finding verified start coordinates in the primary literature based on either experimental or computational evidence would be very useful. Prodigal has been recommended for Burkholderia genomes due to their high GC content, and Dunbar et al. [23] includes gene start revisions for 994 inconsistent ortholog sets. However, the locus tags and coordinates reported for B. pseudomallei are from strain 1710b. Since many of these protein sequences have 100% similarity to the corresponding protein sequences of K96243, locus tags in closely related strains would be another valuable resource to improve annotations. In addition, the B. pseudomallei genome reannotation by Nandi et al. [14] included 1579 RefSeq proteins with new start coordinates.
Other useful sources of structural annotations in the primary literature include novel genes and doubtful coding regions. The reannotation by Nandi et al. identified 283 novel genes and 120 doubtful CDSs in the supplementary tables [14]. One of the novel genes was BPSL1057F1, and the protein reportedly increased actin stress fiber formation in transfected cells. Since these novel genes are only found in the literature, they are often missed by tagging systems based solely on dictionary lookups. We extracted locus tags based on pattern searches, which returned 731 additional tags not found in the RefSeq GFF3 files (Table 3).
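The difference between a dictionary lookup and a pattern search can be shown with a small Python sketch. The sentence and the two-entry dictionary are invented for illustration, and the pattern (prefix, four digits and any trailing letters or digits) is an assumption chosen so that a tag such as BPSL1057F1 is captured whole.

```python
import re

# Assumed pattern: BPSL or BPSS, four digits, then any trailing letters or digits.
TAG = r"\bBPS[LS]\d{4}[A-Za-z0-9]*\b"


def dictionary_hits(text, known_tags):
    """Return only the tokens that appear in a fixed list of known tags."""
    tokens = set(re.findall(r"\w+", text))
    return sorted(tokens & set(known_tags))


def pattern_hits(text, pattern=TAG):
    """Return every token that looks like a locus tag, known or not."""
    return sorted(set(re.findall(pattern, text)))


def novel_tags(text, known_tags, pattern=TAG):
    """Return pattern matches that are absent from the known tag list."""
    return sorted(set(pattern_hits(text, pattern)) - set(known_tags))


# Toy dictionary standing in for the locus tags listed in a RefSeq GFF3 file.
known = {"BPSL1057", "BPSL1058"}
sentence = "Expression of BPSL1057 and the novel gene BPSL1057F1 was compared."

print(dictionary_hits(sentence, known))
print(pattern_hits(sentence))
print(novel_tags(sentence, known))
```

The set difference in novel_tags is the step that surfaces identifiers found only in the literature, which can then be reviewed as candidate novel genes or as typographical errors.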
Functional annotations include gene names, definitions and, less often, terms from controlled vocabularies describing functions and other characteristics. Protein definitions and gene names are critical for comparative analyses since they are the most commonly used source of information transfer. Public sequence databases and annotation service providers have not kept up with the increasing number of publications, and as illustrated in Table 4, many commonly cited locus tags are still listed as hypothetical proteins. For example, BPSS1492 is mentioned in 22 different publications and was first identified as a Burkholderia intracellular motility A protein (BimA) by Stevens et al. in 2005 [15]. There are also 47 papers in PMC matching BimA and Burkholderia; however, the gene name bimA is not included in any public sequence database for strain K96243 including NCBI, UniProt and Ensembl. This gene name is also missing from annotations provided by IMG [24], RAST [25], and specialized databases such as PATRIC [26] or Burkholderia.com [27], as well as the reannotation by Nandi et al. [14]. In 2004, the National Institute of Allergy and Infectious Diseases funded eight Bioinformatic Resource Centers to provide access to pathogen genomes [28]. As part of this effort, curated Burkholderia annotations were available from the now defunct Pathema database at JCVI [29] and BimA was correctly identified in this database. However, these annotations were not propagated to the other databases.
In this study, we focused only on locus tags. However, there are many other gene identifiers that should be extracted. In fact, many tables and text sources list gene names by default, and only use a locus tag if a gene name was not assigned by RefSeq or another annotation source. For example, this sentence in Bartpho et al. [30] is typical: “Further confirmation of the presence of some selected virulence genes; FliC, bsaQ, rpoS, BPSL2800, BPSS0120, BPSL1705 and BPSS2053 was also performed”. The gene names fliC, bsaQ, and rpoS correspond to BPSL3319, BPSS1543, and BPSL1505 respectively in the RefSeq GFF3 file; therefore, extracting gene names from OA publications should also improve microbial genome annotations. In an effort to obtain the most accurate annotations for B. pseudomallei genomes, we are continuing to develop R scripts to extract these gene names.
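As a rough illustration of how gene names in a sentence could be mapped back to locus tags, the Python sketch below builds a case-insensitive lookup from GFF3 gene features. It is not the R code under development. The attribute keys gene and locus_tag are assumed to be present in the ninth GFF3 column, the two GFF3 lines are hand-written with placeholder sequence names and coordinates, and only the fliC and rpoS pairings are taken from the text above.

```python
import re


def gene_name_index(gff3_lines):
    """Map lower-cased gene names to locus tags using GFF3 gene features."""
    index = {}
    for line in gff3_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        # Assumed attribute keys: 'gene' and 'locus_tag'.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "gene" in attrs and "locus_tag" in attrs:
            index[attrs["gene"].lower()] = attrs["locus_tag"]
    return index


def map_gene_names(sentence, index):
    """Return {word as written: locus tag} for words that match a gene name."""
    words = re.findall(r"[A-Za-z][A-Za-z0-9]+", sentence)
    return {w: index[w.lower()] for w in words if w.lower() in index}


# Hand-written GFF3 lines; sequence names and coordinates are placeholders.
GFF3 = [
    "##gff-version 3",
    "chr1\tRefSeq\tgene\t1\t100\t.\t+\t.\tID=gene1;gene=fliC;locus_tag=BPSL3319",
    "chr1\tRefSeq\tgene\t200\t300\t.\t-\t.\tID=gene2;gene=rpoS;locus_tag=BPSL1505",
]
sentence = ("Further confirmation of the presence of some selected virulence genes; "
            "FliC, bsaQ, rpoS, BPSL2800, BPSS0120, BPSL1705 and BPSS2053 was also performed")

print(map_gene_names(sentence, gene_name_index(GFF3)))
```

Matching is case-insensitive because authors often capitalise gene names as if they were proteins. Short gene names can also collide with ordinary words, so in practice the matches would need filtering or manual review.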
There are many challenges in extracting gene identifiers from the literature, and some groups like the UCSC Genocoding project are actively trying to mine articles outside the OA subset to expand access to human gene and sequence mentions [31]. At least for microbial genomes, and B. pseudomallei in particular, we believe that freely available supplementary materials and locus tags mentioned indirectly within ranges are important sources for acquiring gene annotations. Other sources including gene names, accession numbers and coordinates should also be collected before proceeding with future efforts to summarize functions, interactions and pathways.
Only 1514 B. pseudomallei locus tags are mentioned directly in the main text and tables of the OA subset and are indexed and available for searching in PMC. This represents 3% of the total number of B. pseudomallei locus tags mentioned in the literature, since most locus tags are available in supplementary tables or within ranges. Both of these are valuable annotation sources and we developed queries and tools in the pmcXML package to improve access to these data sources.
Due to the rapid growth of OA submissions, extracting gene and locus tags from the literature would benefit efforts to improve microbial genome annotation. The next challenge will involve developing the data mining algorithms needed to automatically summarize the gene mentions to identify names and functions of experimentally characterized proteins such as virulence factors and antibiotic resistance genes directly from the literature database. If successful, this would help to convert a full text database into a functional gene annotation database as first envisioned by Bourne [32], and would provide a valuable reference for most microbial genomes that do not have recent or updated annotations available in public sequence databases.
Availability and requirements
Project name: pmcXML
Project home page: https://github.com/cstubben/pmcXML
Operating system(s): Platform independent; however, loading supplementary tables requires several Unix dependencies (unzip, unoconv, pdftotext) to read zip files, Word tables and PDF files
Programming language: R
Other requirements: Package dependencies include stringr and gdata from CRAN, genomes from Bioconductor and genomes2 from GitHub
Any restrictions to use by non-academics: none
This work was supported in part through DTRA Grant CBS119924543-7049-BASIC to JFC. Jian Song provided helpful comments on earlier drafts.
1. Klimke W, O'Donovan C, White O, Brister JR, Clark K, Fedorov B, Mizrachi I, Pruitt KD, Tatusova T: Solving the problem: genome annotation standards before the data deluge. Stand Genomic Sci. 2011, 5: 168-193. 10.4056/sigs.2084864.
2. Howe D, Costanzo M, Fey P, Gojobori T, Hannick L, Hide W, Hill DP, Kania R, Schaeffer M, St Pierre S, et al: The future of biocuration. Nature. 2008, 455: 47-50. 10.1038/455047a.
3. Kersey P, Apweiler R: Linking publication, gene and protein data. Nature Cell Biology. 2006, 8: 1183-1189. 10.1038/ncb1495.
4. Altman RB, Bergman CM, Blake J, Blaschke C, Cohen A, Gannon F, Grivell L, Hahn U, Hersh W, Hirschman L: Text mining for biology-the way forward: opinions from leading scientists. Genome Biol. 2008, 9 (Suppl 2): S7. 10.1186/gb-2008-9-s2-s7.
5. Haeussler M, Gerner M, Bergman CM: Annotating genes and genomes with DNA sequences extracted from biomedical articles. Bioinformatics. 2011, 27: 980-986. 10.1093/bioinformatics/btr043.
6. Baran J, Gerner M, Haeussler M, Nenadic G, Bergman CM: pubmed2ensembl: a resource for mining the biological literature on genes. PLoS One. 2011, 6: e24716. 10.1371/journal.pone.0024716.
7. Neveol A, Wilbur WJ, Lu Z: Improving links between literature and biological data with text mining: a case study with GEO, PDB and MEDLINE. Database. 2012, 2012: bas026.
8. Kafkas S, Kim JH, McEntyre JR: Database citation in full text biomedical articles. PLoS One. 2013, 8: e63184. 10.1371/journal.pone.0063184.
9. Lu Z: PubMed and beyond: a survey of web tools for searching biomedical literature. Database. 2011, 2011: baq036.
10. Manconi A, Vargiu E, Armano G, Milanesi L: Literature retrieval and mining in bioinformatics: state of the art and challenges. Advances in Bioinformatics. 2012, 2012: 1-10.
11. Rebholz-Schuhmann D, Oellrich A, Hoehndorf R: Text-mining solutions for biomedical research: enabling integrative biology. Nat Rev Genet. 2012, 13: 829-839. 10.1038/nrg3337.
12. Krallinger M, Leitner F, Vazquez M, Salgado D, Marcelle C, Tyers M, Valencia A, Chatr-Aryamontri A: How to link ontologies and protein-protein interactions to literature: text-mining approaches and the BioCreative experience. Database. 2012, 2012: bas017.
13. Schell MA, Lipscomb L, DeShazer D: Comparative genomics and an insect model rapidly identify novel virulence genes of Burkholderia mallei. J Bacteriol. 2008, 190: 2306-2313. 10.1128/JB.01735-07.
14. Nandi T, Ong C, Singh AP, Boddey J, Atkins T, Sarkar-Tyson M, Essex-Lopresti AE, Chua HH, Pearson T, Kreisberg JF, et al: A genomic survey of positive selection in Burkholderia pseudomallei provides insights into the evolution of accidental virulence. PLoS Path. 2010, 6: e1000845. 10.1371/journal.ppat.1000845.
15. Stevens MP, Stevens JM, Jeng RL, Taylor LA, Wood MW, Hawes P, Monaghan P, Welch MD, Galyov EE: Identification of a bacterial factor required for actin-based motility of Burkholderia pseudomallei. Mol Microbiol. 2005, 56: 40-53. 10.1111/j.1365-2958.2004.04528.x.
16. Cruz-Migoni A, Hautbergue GM, Artymiuk PJ, Baker PJ, Bokori-Brown M, Chang CT, Dickman MJ, Essex-Lopresti A, Harding SV, Mahadi NM, et al: A Burkholderia pseudomallei toxin inhibits helicase activity of translation factor eIF4A. Science. 2011, 334: 821-824. 10.1126/science.1211915.
17. Balder R, Lipski S, Lazarus JJ, Grose W, Wooten RM, Hogan RJ, Woods DE, Lafontaine ER: Identification of Burkholderia mallei and Burkholderia pseudomallei adhesins for human respiratory epithelial cells. BMC Microbiol. 2010, 10: 250. 10.1186/1471-2180-10-250.
18. Edwards TE, Phan I, Abendroth J, Dieterich SH, Masoudi A, Guo W, Hewitt SN, Kelley A, Leibly D, Brittnacher MJ: Structure of a Burkholderia pseudomallei trimeric autotransporter adhesin head. PLoS One. 2010, 5: e12803. 10.1371/journal.pone.0012803.
19. Lazar Adler N, Stevens J, Stevens M, Galyov E: Autotransporters and their role in the virulence of Burkholderia pseudomallei and Burkholderia mallei. Front Microbiol. 2011, 2: 151.
20. Jubelin G, Chavez CV, Taieb F, Banfield MJ, Samba-Louaka A, Nobe R, Nougayrede JP, Zumbihl R, Givaudan A, Escoubas JM, et al: Cycle inhibiting factors (CIFs) are a growing family of functional cyclomodulins present in invertebrate and mammal bacterial pathogens. PLoS One. 2009, 4: e4855. 10.1371/journal.pone.0004855.
21. Schell MA, Ulrich RL, Ribot WJ, Brueggemann EE, Hines HB, Chen D, Lipscomb L, Kim HS, Mrazek J, Nierman WC, et al: Type VI secretion is a major virulence determinant in Burkholderia mallei. Mol Microbiol. 2007, 64: 1466-1485. 10.1111/j.1365-2958.2007.05734.x.
22. Shalom G, Shaw JG, Thomas MS: In vivo expression technology identifies a type VI secretion system locus in Burkholderia pseudomallei that is induced upon invasion of macrophages. Microbiology. 2007, 153: 2689-2699. 10.1099/mic.0.2007/006585-0.
23. Dunbar J, Cohn JD, Wall ME: Consistency of gene starts among Burkholderia genomes. BMC Genomics. 2011, 12: 125. 10.1186/1471-2164-12-125.
24. Markowitz VM, Chen IMA, Palaniappan K, Chu K, Szeto E, Grechkin Y, Ratner A, Anderson I, Lykidis A, Mavromatis K, et al: The Integrated Microbial Genomes (IMG) system: an expanding comparative analysis resource. Nucleic Acids Res. 2009, 38: D382-D390.
25. Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, et al: The RAST server: rapid annotations using subsystems technology. BMC Genomics. 2008, 9: 75. 10.1186/1471-2164-9-75.
26. Gillespie JJ, Wattam AR, Cammer SA, Gabbard JL, Shukla MP, Dalay O, Driscoll T, Hix D, Mane SP, Mao C, et al: PATRIC: the comprehensive bacterial bioinformatics resource with a focus on human pathogenic species. Infect Immun. 2011, 79: 4286-4298. 10.1128/IAI.00207-11.
27. Winsor GL, Khaira B, Van Rossum T, Lo R, Whiteside MD, Brinkman FSL: The Burkholderia genome database: facilitating flexible queries and comparative analyses. Bioinformatics. 2008, 24: 2803-2804. 10.1093/bioinformatics/btn524.
28. Greene JM, Collins F, Lefkowitz EJ, Roos D, Scheuermann RH, Sobral B, Stevens R, White O, Di Francesco V: National Institute of Allergy and Infectious Diseases bioinformatics resource centers: new assets for pathogen informatics. Infect Immun. 2007, 75: 3212-3219. 10.1128/IAI.00105-07.
29. Brinkac LM, Davidsen T, Beck E, Ganapathy A, Caler E, Dodson RJ, Durkin AS, Harkins DM, Lorenzi H, Madupu R, et al: Pathema: a clade-specific bioinformatics resource center for pathogen research. Nucleic Acids Res. 2010, 38: D408-D414. 10.1093/nar/gkp850.
30. Bartpho T, Wongsurawat T, Wongratanacheewin S, Talaat AM, Karoonuthaisiri N, Sermswan RW: Genomic islands as a marker to differentiate between clinical and environmental Burkholderia pseudomallei. PLoS One. 2012, 7: e37762. 10.1371/journal.pone.0037762.
31. Van Noorden R: Trouble at the text mine. Nature. 2012, 483: 134-135. 10.1038/483134a.
32. Bourne P: Will a biological database be different from a biological journal?. PLoS Comp Biol. 2005, 1: e34. 10.1371/journal.pcbi.0010034.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.