Genome Project Statistic[http://www.ncbi.nlm.nih.gov/genomes/static/gpstat.html]
DeLong EF: Microbial community genomics in the ocean. Nat Rev Microbiol 2005, 3(6):459–469. 10.1038/nrmicro1158
Article
CAS
PubMed
Google Scholar
Eisen JA: Environmental Shotgun Sequencing: Its Potential and Challenges for Studying the Hidden World of Microbes. PLoS Biol 2007, 5(3):e82. 10.1371/journal.pbio.0050082
Article
PubMed Central
PubMed
Google Scholar
Tringe SG, von Mering C, Kobayashi A, Salamov AA, Chen K, Chang HW, Podar M, Short JM, Mathur EJ, Detter JC, Bork P, Hugenholtz P, Rubin EM: Comparative metagenomics of microbial communities. Science 2005, 308(5721):554–557. 10.1126/science.1107851
Article
CAS
PubMed
Google Scholar
Tyson GW, Chapman J, Hugenholtz P, Allen EE, Ram RJ, Richardson PM, Solovyev VV, Rubin EM, Rokhsar DS, Banfield JF: Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 2004, 428(6978):37–43. 10.1038/nature02340
Article
CAS
PubMed
Google Scholar
Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers YH, Smith HO: Environmental genome shotgun sequencing of the Sargasso Sea. Science 2004, 304(5667):66–74. 10.1126/science.1093857
Article
CAS
PubMed
Google Scholar
DeLong EF, Preston CM, Mincer T, Rich V, Hallam SJ, Frigaard NU, Martinez A, Sullivan MB, Edwards R, Brito BR, Chisholm SW, Karl DM: Community genomics among stratified microbial assemblages in the ocean's interior. Science 2006, 311(5760):496–503. 10.1126/science.1120250
Article
CAS
PubMed
Google Scholar
Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S, Wu D, Eisen JA, Hoffman JM, Remington K, Beeson K, Tran B, Smith H, Baden-Tillson H, Stewart C, Thorpe J, Freeman J, Andrews-Pfannkoch C, Venter JE, Li K, Kravitz S, Heidelberg JF, Utterback T, Rogers YH, Falcon LI, Souza V, Bonilla-Rosso G, Eguiarte LE, Karl DM, Sathyendranath S, Platt T, Bermingham E, Gallardo V, Tamayo-Castillo G, Ferrari MR, Strausberg RL, Nealson K, Friedman R, Frazier M, Venter JC: The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biol 2007, 5(3):e77. 10.1371/journal.pbio.0050077
Article
PubMed Central
PubMed
Google Scholar
Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K, Eisen JA, Heidelberg KB, Manning G, Li W, Jaroszewski L, Cieplak P, Miller CS, Li H, Mashiyama ST, Joachimiak MP, van Belle C, Chandonia JM, Soergel DA, Zhai Y, Natarajan K, Lee S, Raphael BJ, Bafna V, Friedman R, Brenner SE, Godzik A, Eisenberg D, Dixon JE, Taylor SS, Strausberg RL, Frazier M, Venter JC: The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families. PLoS Biol 2007, 5(3):e16. 10.1371/journal.pbio.0050016
Article
PubMed Central
PubMed
Google Scholar
Chen K, Pachter L: Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comput Biol 2005, 1(2):106–112. 10.1371/journal.pcbi.0010024
CAS
PubMed
Google Scholar
Mavromatis K, Ivanova N, Barry K, Shapiro H, Goltsman E, McHardy AC, Rigoutsos I, Salamov A, Korzeniewski F, Land M, Lapidus A, Grigoriev I, Richardson P, Hugenholtz P, Kyrpides NC: Use of simulated data sets to evaluate the fidelity of metagenomic processing methods. Nat Methods 2007, 4(6):495–500. 10.1038/nmeth1043
Article
CAS
PubMed
Google Scholar
Gill SR, Pop M, Deboy RT, Eckburg PB, Turnbaugh PJ, Samuel BS, Gordon JI, Relman DA, Fraser-Liggett CM, Nelson KE: Metagenomic analysis of the human distal gut microbiome. Science 2006, 312(5778):1355–1359. 10.1126/science.1124234
Article
PubMed Central
CAS
PubMed
Google Scholar
McHardy AC, Martin HG, Tsirigos A, Hugenholtz P, Rigoutsos I: Accurate phylogenetic classification of variable-length DNA fragments. Nat Methods 2007, 4(1):63–72. 10.1038/nmeth976
Article
CAS
PubMed
Google Scholar
Krause L, Diaz NN, Bartels D, Edwards RA, Puhler A, Rohwer F, Meyer F, Stoye J: Finding novel genes in bacterial communities isolated from the environment. Bioinformatics 2006, 22(14):e281–9. 10.1093/bioinformatics/btl247
Article
CAS
PubMed
Google Scholar
Noguchi H, Park J, Takagi T: MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. Nucleic Acids Res 2006, 34(19):5623–5630. 10.1093/nar/gkl723
Article
PubMed Central
CAS
PubMed
Google Scholar
Harrington ED, Singh AH, Doerks T, Letunic I, von Mering C, Jensen LJ, Raes J, Bork P: Quantitative assessment of protein function prediction from metagenomics shotgun sequences. Proc Natl Acad Sci U S A 2007, 104(35):13913–13918. 10.1073/pnas.0702636104
Article
PubMed Central
CAS
PubMed
Google Scholar
Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247(4):536–540. 10.1006/jmbi.1995.0159
CAS
PubMed
Google Scholar
Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM: CATH--a hierarchic classification of protein domain structures. Structure 1997, 5(8):1093–1108. 10.1016/S0969-2126(97)00260-8
Article
CAS
PubMed
Google Scholar
Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR: The Pfam protein families database. Nucleic Acids Res 2004, 32(Database issue):D138–41. 10.1093/nar/gkh121
Article
PubMed Central
CAS
PubMed
Google Scholar
Corpet F, Gouzy J, Kahn D: The ProDom database of protein domain families. Nucleic Acids Res 1998, 26(1):323–326. 10.1093/nar/26.1.323
Article
PubMed Central
CAS
PubMed
Google Scholar
Haft DH, Selengut JD, White O: The TIGRFAMs database of protein families. Nucleic Acids Res 2003, 31(1):371–373. 10.1093/nar/gkg128
Article
PubMed Central
CAS
PubMed
Google Scholar
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215(3):403–410.
Article
CAS
PubMed
Google Scholar
Seshadri R, Kravitz SA, Smarr L, Gilna P, Frazier M: CAMERA: A Community Resource for Metagenomics. PLoS Biol 2007, 5(3):e75. 10.1371/journal.pbio.0050075
Article
PubMed Central
PubMed
Google Scholar
Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, Geer LY, Kapustin Y, Khovayko O, Landsman D, Lipman DJ, Madden TL, Maglott DR, Ostell J, Miller V, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Tatusov RL, Tatusova TA, Wagner L, Yaschenko E: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2007, 35(Database issue):D5–12. 10.1093/nar/gkl1031
Article
PubMed Central
CAS
PubMed
Google Scholar
Birney E, Andrews D, Caccamo M, Chen Y, Clarke L, Coates G, Cox T, Cunningham F, Curwen V, Cutts T, Down T, Durbin R, Fernandez-Suarez XM, Flicek P, Graf S, Hammond M, Herrero J, Howe K, Iyer V, Jekosch K, Kahari A, Kasprzyk A, Keefe D, Kokocinski F, Kulesha E, London D, Longden I, Melsopp C, Meidl P, Overduin B, Parker A, Proctor G, Prlic A, Rae M, Rios D, Redmond S, Schuster M, Sealy I, Searle S, Severin J, Slater G, Smedley D, Smith J, Stabenau A, Stalker J, Trevanion S, Ureta-Vidal A, Vogel J, White S, Woodwark C, Hubbard TJ: Ensembl 2006. Nucleic Acids Res 2006, 34(Database issue):D556–61. 10.1093/nar/gkj133
Article
PubMed Central
CAS
PubMed
Google Scholar
Quackenbush J, Liang F, Holt I, Pertea G, Upton J: The TIGR gene indices: reconstruction and representation of expressed gene sequences. Nucleic Acids Res 2000, 28(1):141–145. 10.1093/nar/28.1.141
Article
PubMed Central
CAS
PubMed
Google Scholar
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389–3402. 10.1093/nar/25.17.3389
Article
PubMed Central
CAS
PubMed
Google Scholar
Rychlewski L, Jaroszewski L, Li W, Godzik A: Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci 2000, 9(2):232–241.
Article
PubMed Central
CAS
PubMed
Google Scholar
Ochman H: Distinguishing the ORFs from the ELFs: short bacterial genes and the annotation of genomes. Trends Genet 2002, 18(7):335–337. 10.1016/S0168-9525(02)02668-9
Article
CAS
PubMed
Google Scholar
Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Computer Applications in BioSciences 1997, 13(5):555–556.
CAS
Google Scholar
Li W, Jaroszewski L, Godzik A: Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 2001, 17(3):282–283. 10.1093/bioinformatics/17.3.282
Article
CAS
PubMed
Google Scholar
Li W, Jaroszewski L, Godzik A: Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics 2002, 18(1):77–82. 10.1093/bioinformatics/18.1.77
Article
CAS
PubMed
Google Scholar
Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006, 22(13):1658–1659. 10.1093/bioinformatics/btl158
Article
CAS
PubMed
Google Scholar
Nekrutenko A, Makova KD, Li WH: The K(A)/K(S) ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study. Genome Res 2002, 12(1):198–202. 10.1101/gr.200901
Article
PubMed Central
CAS
PubMed
Google Scholar
Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004, 32(5):1792–1797. 10.1093/nar/gkh340
Article
PubMed Central
CAS
PubMed
Google Scholar
Yang Z, Nielsen R, Goldman N, Pedersen AM: Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 2000, 155(1):431–449.
PubMed Central
CAS
PubMed
Google Scholar
CAMERA[http://camera.calit2.net/]
PANDA[ftp://ftp.tigr.org/pub/software/PANDA/]
Sanger F, Nicklen S, Coulson AR: DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A 1977, 74(12):5463–5467. 10.1073/pnas.74.12.5463
Article
PubMed Central
CAS
PubMed
Google Scholar
Giovannoni SJ, Tripp HJ, Givan S, Podar M, Vergin KL, Baptista D, Bibbs L, Eads J, Richardson TH, Noordewier M, Rappe MS, Short JM, Carrington JC, Mathur EJ: Genome streamlining in a cosmopolitan oceanic bacterium. Science 2005, 309(5738):1242–1245. 10.1126/science.1114057
Article
CAS
PubMed
Google Scholar
Sait M, Hugenholtz P, Janssen PH: Cultivation of globally distributed soil bacteria from phylogenetic lineages previously only detected in cultivation-independent surveys. Environ Microbiol 2002, 4(11):654–666. 10.1046/j.1462-2920.2002.00352.x
Article
CAS
PubMed
Google Scholar
Fischer D, Eisenberg D: Finding families for genomic ORFans. Bioinformatics 1999, 15(9):759–762. 10.1093/bioinformatics/15.9.759
Article
CAS
PubMed
Google Scholar
Siew N, Fischer D: Unravelling the ORFan puzzle. Comparative and Functional Genomics 2003, 4: 432–441. 10.1002/cfg.311
Article
PubMed Central
CAS
PubMed
Google Scholar
Unger R, Uliel S, Havlin S: Scaling law in sizes of protein sequence families: from super-families to orphan genes. Proteins 2003, 51(4):569–576. 10.1002/prot.10347
Article
CAS
PubMed
Google Scholar
Microbial Genome Sequencing Project - Gordon and Betty Moore Foundation[http://www.moore.org/microgenome/]
Vega FMDL, Marth GT, Sutton G: Computational Tools For Next-Generation Sequencing Applications. 2008, 13: 87–89.
Google Scholar
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM: Genome sequencing in microfabricated high-density picolitre reactors. Nature 2005, 437(7057):376–380.
PubMed Central
CAS
PubMed
Google Scholar
Besemer J, Borodovsky M: Heuristic approach to deriving models for gene finding. Nucleic Acids Res 1999, 27(19):3911–3920. 10.1093/nar/27.19.3911
Article
PubMed Central
CAS
PubMed
Google Scholar