From: Semantic annotation of morphological descriptions: an overall strategy
Methods | Handmade prerequisites and their reusability | Annotation Level | Results and their reusability | Scope of evaluation | Performance (*) |
---|---|---|---|---|---|
Syntactic parsing: 1. Abascal & Sanchez (1999) 2. Taylor (1995) | Lexicon & grammar rules: Not good for another taxon group/collection. | 1. Paragraph 2. Character | 1. Style clues: Less reusable. 2. Organ names & character states: Reusable. | 1. FNA v. 19 2. Flora of New South Wales, Flora of Australia. | 1. Not reported 2. Roughly estimated recall: 60%-80% |
Supervised machine learning--text classification: Cui & al. (2002) | Training examples: Not good for another taxon group. | Paragraph | Classification models: Less reusable. | 1500+ descriptions from FNA | Recall: 94% Precision: 97% |
Ontology-based extraction: 1. Diederich, Fortuner & Milton (1999) 2. Wood & al. (2003) | Dictionaries, ontology, & checklists: Not good for another taxon group. | Character | Organ names & character states: Reusable. | 1. 16 descriptions 2. 18 species descriptions from six Floras. | 1. Accuracy on 1 sample: 76% 2. Recall: 66% Precision: 74% |
Supervised machine learning--extraction patterns: Tang & Heidorn (2007) | Extraction template & training examples: Not good for another taxon group. | Character, limited to these character states: leaf shape, size, color; fruit type. | Extraction patterns: Sensitive to text variations, less reusable. Character states: Reusable. | 1600 FNA species descriptions. | Recall: 33%-80% Precision: 75%-100% |
Supervised machine learning--association rules: Cui (2008a) | Annotation template & training examples: Not good for another taxon group. | Clause | Association rules: Reusable only within the same taxon group. | 16,000 descriptions from FNA, FOC, and FNCT | Recall and precision: 80%-95% |
Unsupervised learning: Cui (2008b) | No prerequisites | 1. Clause 2. Character | Organ names & character states: Reusable. | FNA, FOC, & Treatises Part H | Precision: 88%-95% Recall: 50%-75% |
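
The Performance column reports recall and precision. As a reminder of how these two measures are conventionally computed, here is a minimal Python sketch. It assumes annotations can be compared as exact-match (organ, character, state) tuples. The function name `precision_recall` and the sample data are hypothetical and are not taken from any of the cited studies, whose matching criteria may differ.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of annotations.

    Assumption: an annotation counts as correct only if it matches a
    gold-standard annotation exactly.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: fraction of extracted annotations that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold-standard annotations that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold-standard and system output for one description.
gold = {
    ("leaf", "shape", "ovate"),
    ("leaf", "color", "green"),
    ("fruit", "type", "capsule"),
    ("petal", "count", "5"),
}
predicted = {
    ("leaf", "shape", "ovate"),
    ("leaf", "color", "green"),
    ("fruit", "type", "berry"),
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the three extracted annotations are correct and two of the four gold annotations are found, which illustrates how a system can have noticeably higher precision than recall, as in the last row of the table.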