Fig. 1 | BMC Bioinformatics

Fig. 1

From: A corpus for plant-chemical relationships in the biomedical domain

The workflow of corpus construction. The corpus was constructed as follows: (i) we collected PubMed abstracts from the PubTator database; (ii) we applied NER tools, including LingPipe and ChemSpot, to pre-annotate plant and chemical names; (iii) we extracted co-occurrence sentences that contain at least one plant name and at least one chemical name; (iv) we randomly selected candidate sentences, keeping the numbers of positive and negative sentences approximately equal, and split them into corpus units; (v) we manually annotated the candidate corpus units according to our guidelines and, after the annotators had finished their tasks, conducted a further round of annotation to resolve disagreements; and (vi) we converted the annotated corpus units to the BioC XML format.
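
Steps (iii) and (iv) of the workflow can be illustrated with a minimal Python sketch. It is not the authors' code: the `Sentence` record, its field names, the provisional `positive` flag, and the three helper functions are hypothetical, and the caption does not say how sentences were provisionally labelled positive or negative before manual annotation. The sketch only shows one plausible way to filter co-occurrence sentences, draw a class-balanced random sample, and split it into fixed-size units.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    """Hypothetical record for one pre-annotated sentence."""
    text: str
    plants: tuple      # plant mentions found by the NER step (assumed field)
    chemicals: tuple   # chemical mentions found by the NER step (assumed field)
    positive: bool     # provisional label; its origin is an assumption


def cooccurrence_sentences(sentences):
    """Keep sentences with at least one plant and at least one chemical mention."""
    return [s for s in sentences if s.plants and s.chemicals]


def balanced_sample(sentences, n_per_class, seed=0):
    """Randomly pick the same number of positive and negative sentences."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    positives = [s for s in sentences if s.positive]
    negatives = [s for s in sentences if not s.positive]
    # Never ask for more sentences than the smaller class can supply.
    k = min(n_per_class, len(positives), len(negatives))
    picked = rng.sample(positives, k) + rng.sample(negatives, k)
    rng.shuffle(picked)
    return picked


def split_into_units(items, unit_size):
    """Split a list into consecutive chunks of at most unit_size items."""
    return [items[i:i + unit_size] for i in range(0, len(items), unit_size)]
```

With a list of `Sentence` records, the three helpers would be chained as `split_into_units(balanced_sample(cooccurrence_sentences(sentences), n_per_class), unit_size)`, where `n_per_class` and `unit_size` are placeholders rather than values reported in the paper.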
