Krallinger M, Valencia A, Hirschman L. Linking genes to literature: text mining, information extraction, and retrieval applications for biology. Genome Biol. 2008;9(Suppl 2):S8.
Burkhardt K, Schneider B, Ory J. A biocurator perspective: annotation at the research collaboratory for structural bioinformatics protein data bank. PLoS Comput Biol. 2006;2(10):e99.
Baumgartner WA Jr, Cohen KB, Fox LM, Acquaah-Mensah G, Hunter L. Manual curation is not sufficient for annotation of genomic databases. Bioinformatics. 2007;23(13):i41–8.
Burge S, Attwood TK, Bateman A, Berardini TZ, Cherry M, O'Donovan C, Xenarios I, Gaudet P. Biocurators and biocuration: surveying the 21st century challenges. Database (Oxford). 2012;2012:bar059.
Bourne PE, Lorsch JR, Green ED. Perspective: sustaining the big-data ecosystem. Nature. 2015;527:S16–7.
Wikipedia article on Biocurator. https://en.wikipedia.org/wiki/Biocurator.
Van Auken K, Fey P, Berardini TZ, Dodson R, Cooper L, Li D, Chan J, Li Y, Basu S, Müller H-M, Chisholm R, Huala E, Sternberg PW, WormBase Consortium. Text mining in the biocuration workflow: application for literature curation at WormBase, dictyBase, and TAIR. Database (Oxford). 2012 Nov 17;2012:bas040.
Hirschman L, Burns GA, Krallinger M, Arighi C, Cohen KB, Valencia A, Wu CH, Chatr-Aryamontri A, Dowell KG, Huala E, Lourenço A, Nash R, Veuthey AL, Wiegers T, Winter AG. Text mining for the biocuration workflow. Database (Oxford). 2012 Apr 18;2012:bas020. https://doi.org/10.1093/database/bas020.
Lu Z, Hirschman L. Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II. Database (Oxford). 2012 Nov 17;2012:bas043. https://doi.org/10.1093/database/bas043.
Singhal A, Leaman R, Catlett N, Lemberger T, McEntyre J, Polson S, Xenarios I, Arighi C, Lu Z. Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges. Database (Oxford). 2016 Dec 26;2016:baw161. https://doi.org/10.1093/database/baw161.
Textpresso. http://www.textpresso.org.
Müller H-M, Kenny E, Sternberg PW. Textpresso: an ontology-based information retrieval system for the biological literature. PLoS Biol. 2004;2(11):e309.
Van Auken K, Jaffery J, Chan J, Müller H-M, Sternberg PW. Semi-automated curation of protein subcellular localization: a text mining-based approach to gene ontology (GO) cellular component curation. BMC Bioinformatics. 2009;10:228.
Chatr-Aryamontri A, Oughtred R, Boucher L, Rust J, Chang C, Kolas NK, O'Donnell L, Oster S, Theesfeld C, Sellam A, Stark C, Breitkreutz BJ, Dolinski K, Tyers M. The BioGRID interaction database: 2017 update. Nucleic Acids Res. 2017 Jan 4;45(D1):D369–79. https://doi.org/10.1093/nar/gkw1102. Epub 2016 Dec 14.
Druzinsky RE, Balhoff JP, Crompton AW, Done J, German RZ, Haendel MA, Herrel A, Herring SW, Lapp H, Mabee PM, Muller HM, Mungall CJ, Sternberg PW, Van Auken K, Vinyard CJ, Williams SH, Wall CE. Muscle logic: new knowledge resource for anatomy enables comprehensive searches of the literature on the feeding muscles of mammals. PLoS One. 2016 Feb 12;11(2):e0149102.
McQuilton P, FlyBase Consortium. Opportunities for text mining in the FlyBase genetic literature curation workflow. Database (Oxford). 2012 Nov 17;2012:bas039. https://doi.org/10.1093/database/bas039.
Li D, Berardini TZ, Muller RJ, Huala E. Building an efficient curation workflow for the Arabidopsis literature corpus. Database (Oxford). 2012 Dec 6;2012:bas047. https://doi.org/10.1093/database/bas047.
Szostak J, Ansari S, Madan S, Fluck J, Talikka M, Iskandar A, De Leon H, Hofmann-Apitius M, Peitsch MC, Hoeng J. Construction of biological networks from unstructured information based on a semi-automated curation workflow. Database (Oxford). 2015;2015:bav057. https://doi.org/10.1093/database/bav057.
Szostak J, Martin F, Talikka M, Peitsch MC, Hoeng J. Semi-automated curation allows causal network model building for the quantification of age-dependent plaque progression in ApoE−/− mouse. Gene Regul Syst Bio. 2016;10:95–103. eCollection 2016.
Jorge P, Pérez-Pérez M, Pérez Rodríguez G, Fdez-Riverola F, Pereira MO, Lourenço A. Construction of antimicrobial peptide-drug combination networks from scientific literature based on a semi-automated curation workflow. Database (Oxford). 2016;2016:baw143. https://doi.org/10.1093/database/baw143.
Rinaldi F, Lithgow O, Gama-Castro S, Solano H, Lopez A, Muñiz Rascado LJ, Ishida-Gutiérrez C, Méndez-Cruz CF, Collado-Vides J. Strategies towards digital and semi-automated curation in RegulonDB. Database (Oxford). 2017;2017:bax012. https://doi.org/10.1093/database/bax012.
Arighi CN, Carterette B, Cohen KB, Krallinger M, Wilbur WJ, Fey P, Dodson R, Cooper L, Van Slyke CE, Dahdul W, Mabee P, Li D, Harris B, Gillespie M, Jimenez S, Roberts P, Matthews L, Becker K, Drabkin H, Bello S, Licata L, Chatr-Aryamontri A, Schaeffer ML, Park J, Haendel M, Van Auken K, Li Y, Chan J, Muller H-M, Cui H, Balhoff JP, Chi-Yang Wu J, Lu Z, Wei CH, Tudor CO, Raja K, Subramani S, Natarajan J, Cejuela JM, Dubey P, Wu C. An overview of the BioCreative 2012 Workshop track III: interactive text mining task. Database (Oxford). 2013;2013:bas056. https://doi.org/10.1093/database/bas056.
Arighi CN, Roberts PM, Agarwal S, Bhattacharya S, Cesareni G, Chatr-Aryamontri A, Clematide S, Gaudet P, Giglio MG, Harrow I, Huala E, Krallinger M, Leser U, Li D, Liu F, Lu Z, Maltais LJ, Okazaki N, Perfetto L, Rinaldi F, Sætre R, Salgado D, Srinivasan P, Thomas PE, Toldo L, Hirschman L, Wu CH. BioCreative III interactive task: an overview. BMC Bioinformatics. 2011;12(Suppl 8):S4. https://doi.org/10.1186/1471-2105-12-S8-S4.
Kim S, Islamaj Doğan R, Chatr-Aryamontri A, Chang CS, Oughtred R, Rust J, Batista-Navarro R, Carter J, Ananiadou S, Matos S, Santos A, Campos D, Oliveira JL, Singh O, Jonnagaddala J, Dai HJ, Su EC, Chang YC, Su YC, Chu CH, Chen CC, Hsu WL, Peng Y, Arighi C, Wu CH, Vijay-Shanker K, Aydın F, Hüsünbeyi ZM, Özgür A, Shin SY, Kwon D, Dolinski K, Tyers M, Wilbur WJ, Comeau DC. BioCreative V BioC track overview: collaborative biocurator assistant task for BioGRID. Database (Oxford). 2016;2016:baw121. https://doi.org/10.1093/database/baw121.
Wang Q, S Abdul S, Almeida L, Ananiadou S, Balderas-Martínez YI, Batista-Navarro R, Campos D, Chilton L, Chou HJ, Contreras G, Cooper L, Dai HJ, Ferrell B, Fluck J, Gama-Castro S, George N, Gkoutos G, Irin AK, Jensen LJ, Jimenez S, Jue TR, Keseler I, Madan S, Matos S, McQuilton P, Milacic M, Mort M, Natarajan J, Pafilis E, Pereira E, Rao S, Rinaldi F, Rothfels K, Salgado D, Silva RM, Singh O, Stefancsik R, Su CH, Subramani S, Tadepally HD, Tsaprouni L, Vasilevsky N, Wang X, Chatr-Aryamontri A, Laulederkind SJ, Matis-Mitchell S, McEntyre J, Orchard S, Pundir S, Rodriguez-Esteban R, Van Auken K, Lu Z, Schaeffer M, Wu CH, Hirschman L, Arighi CN. Overview of the interactive task in BioCreative V. Database (Oxford). 2016 Sep 1;2016:baw119. https://doi.org/10.1093/database/baw119.
The Gene Ontology Consortium. Expansion of the gene ontology knowledgebase and resources. Nucleic Acids Res. 2017 Jan 4;45(D1):D331–8. https://doi.org/10.1093/nar/gkw1108. Epub 2016 Nov 29.
Ferrucci D, Lally A, Gruhl D, Epstein E, Schor M, Murdock JW, Frenkiel A, Brown EW, Hampp T, Doganata Y, Welty C, Amini K, Kofman G, Kozakov L, Mass Y. Towards an interoperability standard for text and multi-modal analytics. IBM Research Report RC 24122. Yorktown Heights, NY: IBM.
Unstructured Information Management Architecture. http://uima.apache.org.
Kano Y, Miwa M, Cohen KB, Hunter LE, Ananiadou S, Tsujii J. U-compare: a modular NLP workflow construction and evaluation system. IBM J Res and Dev. 2011;55(3):11.
Lucene. https://lucene.apache.org/.
LucenePlusPlus. https://github.com/luceneplusplus/LucenePlusPlus.
Wt, a C++ Web Tool Kit. https://www.webtoolkit.eu/wt.
Journal Article Tag Suite. https://jats.nlm.nih.gov/.
PMC OA subset. http://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/.
Gene Ontology. http://geneontology.org.
Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M. The sequence ontology: a tool for the unification of genome annotations. Genome Biol. 2005;6(5):R44. Epub 2005 Apr 29.
Sequence Ontology. http://www.sequenceontology.org.
Chemical Entities of Biological Interest (ChEBI). https://www.ebi.ac.uk/chebi/.
Hastings J, de Matos P, Dekker A, Ennis M, Harsha B, Kale N, Muthukrishnan V, Owen G, Turner S, Williams M, Steinbeck C. The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Res. 2013 Jan;41(Database issue):D456–63. https://doi.org/10.1093/nar/gks1146. Epub 2012 Nov 24.
Phenotype and Trait Ontology (PATO). http://www.obofoundry.org/ontology/pato.html.
Gkoutos GV, Green EC, Mallon AM, Hancock JM, Davidson D. Using ontologies to describe mouse phenotypes. Genome Biol. 2005;6(1):R8. Epub 2004 Dec 20.
Uberon. http://uberon.github.io/.
Mungall CJ, Torniai C, Gkoutos GV, Lewis SE, Haendel MA. Uberon, an integrative multi-species anatomy ontology. Genome Biol. 2012;13(1):R5. https://doi.org/10.1186/gb-2012-13-1-r5.
Protein Ontology (PRO). http://pir.georgetown.edu/pro/.
Natale DA, Arighi CN, Blake JA, Bona J, Chen C, Chen SC, Christie KR, Cowart J, D'Eustachio P, Diehl AD, Drabkin HJ, Duncan WD, Huang H, Ren J, Ross K, Ruttenberg A, Shamovsky V, Smith B, Wang Q, Zhang J, El-Sayed A, Wu CH. Protein ontology (PRO): enhancing and scaling up the representation of protein entities. Nucleic Acids Res. 2017 Jan 4;45(D1):D339–46. https://doi.org/10.1093/nar/gkw1075. Epub 2016 Nov 28.
Lee RY, Sternberg PW. Building a cell and anatomy ontology of Caenorhabditis elegans. Comp Funct Genomics. 2003;4(1):121–6. https://doi.org/10.1002/cfg.248.
Lucene Analysis. https://www.tutorialspoint.com/lucene/lucene_analysis.htm.
Noctua. http://noctua.geneontology.org.
O’Connell KF, Caron C, Kopish KR, Hurd DD, Kemphues KJ, Li Y, White JG. The C. elegans zyg-1 gene encodes a regulator of centrosome duplication with distinct maternal and paternal roles in the embryo. Cell. 2001;105(4):547–58.
Kitagawa D, Busso C, Flückiger I, Gönczy P. Phosphorylation of SAS-6 by ZYG-1 is critical for centriole formation in C. elegans embryos. Dev Cell. 2009 Dec;17(6):900–7. https://doi.org/10.1016/j.devcel.2009.11.002.
Relations Ontology. https://github.com/oborel/obo-relations.
Fang R, Schindelman G, Van Auken K, Fernandes J, Chen W, Wang X, Davis P, Tuli MA, Marygold SJ, Millburn G, Matthews B, Zhang H, Brown N, Gelbart WM, Sternberg PW. Automatic categorization of diverse experimental information in the bioscience literature. BMC Bioinformatics. 2012 Jan 26;13:16.
Comeau DC, Islamaj Doğan R, Ciccarese P, Cohen KB, Krallinger M, Leitner F, Lu Z, Peng Y, Rinaldi F, Torii M, Valencia A, Verspoor K, Wiegers TC, Wu CH, Wilbur WJ. BioC: a minimalist approach to interoperability for biomedical text processing. Database (Oxford). 2013 Sep 18;2013:bat064.
Cohen KB, Johnson HL, Verspoor K, Roeder C, Hunter LE. The structural and content aspects of abstracts versus bodies of full text journal articles are different. BMC Bioinformatics. 2010 Sep 29;11:492. https://doi.org/10.1186/1471-2105-11-492.
Verspoor K, Cohen KB, Lanfranchi A, Warner C, Johnson HL, Roeder C, Choi JD, Funk C, Malenkiy Y, Eckert M, Xue N, Baumgartner WA Jr, Bada M, Palmer M, Hunter LE. A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools. BMC Bioinformatics. 2012 Aug 17;13:207. https://doi.org/10.1186/1471-2105-13-207.
Lin J. Is searching full text more effective than searching abstracts? BMC Bioinformatics. 2009 Feb 3;10:46. https://doi.org/10.1186/1471-2105-10-46.
Islamaj Doğan R, Kim S, Chatr-Aryamontri A, Chang CS, Oughtred R, Rust J, Wilbur WJ, Comeau DC, Dolinski K, Tyers M. The BioC-BioGRID corpus: full text articles annotated for curation of protein-protein and genetic interactions. Database (Oxford). 2017;2017:baw147. https://doi.org/10.1093/database/baw147.
Van Auken K, Schaeffer ML, McQuilton P, Laulederkind SJ, Li D, Wang SJ, Hayman GT, Tweedie S, Arighi CN, Done J, Müller H-M, Sternberg PW, Mao Y, Wei CH, Lu Z. BC4GO: a full-text corpus for the BioCreative IV GO task. Database (Oxford). 2014;2014:bau074. https://doi.org/10.1093/database/bau074.