
Table 3 Feature set description

From: Disorder recognition in clinical texts using multi-label structured SVM

Feature

Description

Bag of Words

The words within a 5-word window.

Part of speech

Part-of-speech tags within a 7-word window.

Capitalization

All alphabetic characters of each word are converted to uppercase [31]. The window size is 5.

Case pattern

Patterns are generated as follows. As in [32], every uppercase alphabetic character is replaced by “A” and every lowercase one by “a”. Likewise, any number is replaced by “0”. The window size is 3.

Word representation

We use word2vec to derive 700 word clusters from unlabeled clinical narratives and assign each cluster a unique serial number. The serial number of the cluster a word belongs to is then used as a feature. The window size is 3.
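
As an illustration of the windowed features above, the following is a minimal Python sketch of the case-pattern and capitalization features, not the authors' implementation. It rests on several assumptions that the table does not state: windows are symmetric around the current token, positions outside the sentence are filled with a hypothetical `<PAD>` symbol, each digit is mapped to “0” individually, and characters that are neither letters nor digits are kept unchanged. The function names and feature keys are invented for this example.

```python
from typing import Dict, List

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def case_pattern(token: str) -> str:
    """Map uppercase letters to 'A', lowercase letters to 'a', digits to '0'.

    Assumption: any other character (hyphen, slash, ...) is left as-is.
    """
    out = []
    for ch in token:
        if ch.isalpha() and ch.isupper():
            out.append("A")
        elif ch.isalpha() and ch.islower():
            out.append("a")
        elif ch.isdigit():
            out.append("0")
        else:
            out.append(ch)
    return "".join(out)


def window(tokens: List[str], i: int, size: int) -> List[str]:
    """Return the `size` tokens centered on position i (size assumed odd)."""
    half = size // 2
    return [
        tokens[j] if 0 <= j < len(tokens) else PAD
        for j in range(i - half, i + half + 1)
    ]


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Build capitalization (window 5) and case-pattern (window 3) features."""
    feats: Dict[str, str] = {}
    # Capitalization: uppercased word at each offset of a 5-word window.
    for off, w in zip(range(-2, 3), window(tokens, i, 5)):
        feats[f"upper[{off:+d}]"] = w if w == PAD else w.upper()
    # Case pattern: shape of the word at each offset of a 3-word window.
    for off, w in zip(range(-1, 2), window(tokens, i, 3)):
        feats[f"case[{off:+d}]"] = w if w == PAD else case_pattern(w)
    return feats


if __name__ == "__main__":
    sentence = ["Patient", "denies", "CHF", "in", "2012"]
    for key, value in token_features(sentence, 1).items():
        print(key, value)
```

The bag-of-words, part-of-speech, and word-cluster features could be produced the same way by replacing the per-token transformation with the raw word, its tag, or its cluster serial number, and by changing the window size accordingly.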