Table 1 Local and global features of various input components

From: Structured learning for spatial information extraction from biomedical text: bacteria biotopes

| Type | Feature name | Description |
| --- | --- | --- |
| Word features | Lexical-form | Word surface form as it appears in the text |
| | Bio-lemma | Word lemma produced by a biomedical-domain lemmatizer that uses additional lexical resources [24] |
| | POS-tag | Part-of-speech tag of the word, used to exploit syntactic information during training |
| | Dprl | Dependency relation of the word to its syntactic head, which gives clues to semantic relationships |
| | Cocoa | Word tag assigned by Cocoa, an external resource of biological concepts |
| | Capital | Whether the word starts with a capital letter |
| | Stop-word | Whether the word belongs to a list of stop words |
| Phrase features | Head-features | The features of the word that is the syntactic head of the phrase |
| | nHead-features | The features of the other words contained in the phrase |
| | Lexical-surface | Concatenation of the lexical forms of the words in the phrase |
| | Phrasal-POS | The phrasal part-of-speech tag: the parse tree tag of the common parent of the words in the phrase |
| | NCBI-sim | Comparison of the phrase against the list of bacterium names in NCBI |
| | Ontobio-sim | Comparison of the phrase against the habitat classes in OntoBiotope |
| Phrase-pair features | Same-par | Whether the two phrases occur in the same paragraph |
| | Same-sen | Whether the two phrases occur in the same sentence |
| | inTitle | Whether the bacterium candidate occurs in the title |
| | Verb | The verb between the two phrases, if they are in the same sentence |
| | Preposition | The preposition between the two phrases, if they are in the same sentence |
| | Parse-Dis | The distance between the two phrases in the parse tree |
| | Parse-Path | The path between the two phrases in the parse tree |
| | Heads-Lem | The concatenation of the lemmas of the two heads |
| | Heads-POS | The concatenation of the POS tags of the two heads |
| | Dep-Path | The dependency path between the two heads |
| Relation-pair features | Same-B | Whether the two relations have exactly the same bacterium candidate |
| | Sim-BH | Similarity of the two relations, based on the similarity of their bacterium and habitat candidates |
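
To make a few of the simpler entries in the table concrete, the following Python sketch shows one way the Capital, Stop-word, Lexical-surface, and Same-B features could be computed. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `Relation` class, the tiny `STOP_WORDS` list, and the use of a space as the concatenation separator are all hypothetical choices made here.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical, deliberately tiny stop-word list (illustration only);
# the source does not say which list is used.
STOP_WORDS = frozenset({"a", "an", "the", "of", "in", "and", "to"})


def capital(word: str) -> bool:
    """Capital: whether the word starts with a capital letter."""
    # Slicing keeps this safe for the empty string.
    return word[:1].isupper()


def stop_word(word: str) -> bool:
    """Stop-word: whether the word belongs to the stop-word list."""
    return word.lower() in STOP_WORDS


def lexical_surface(phrase_words: Sequence[str]) -> str:
    """Lexical-surface: concatenation of the lexical forms in a phrase.

    Joining with a single space is an assumption made for this sketch.
    """
    return " ".join(phrase_words)


@dataclass(frozen=True)
class Relation:
    """Hypothetical container for one candidate relation."""
    bacterium: str
    habitat: str


def same_b(r1: Relation, r2: Relation) -> bool:
    """Same-B: whether two relations share exactly the same bacterium candidate."""
    return r1.bacterium == r2.bacterium


if __name__ == "__main__":
    phrase = ["Bacillus", "subtilis"]
    print(capital(phrase[0]), stop_word("the"), lexical_surface(phrase))
    print(same_b(Relation("Bacillus subtilis", "soil"),
                 Relation("Bacillus subtilis", "human gut")))
```

The remaining features (for example Parse-Path, Dep-Path, or Sim-BH) depend on a parser or on similarity measures that the table does not specify, so they are left out of this sketch.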