Table 2 Features extracted from each token in a reference

From: A structural SVM approach for reference parsing

1. Author Name Feature

Is the word in Author Name dictionary?

2. Article Title Feature

Is the word in Article Title dictionary?

3. Journal Title Feature

Is the word in Journal Title dictionary?

4. Pagination Pattern

Does the word match a pagination format, e.g., 200-5, H100-H105?

5. Name Initial Pattern

Does the word match a name-initial pattern, e.g., J.Z., J.-Z.?

6. Four Digit Year Pattern

Is the word a four-digit year, e.g., 2005? The year must be no earlier than 1500 and no later than the current year.

7. et, al

Is the word “et”, “al”, “et.”, or “al.”?

8. pp., p.

Is the word “pp.”, “p.”, “pp”, or “p”?

9. Ended With “.”

Does the word end with “.”?

10. Upper Case First Char

Is the first character of the word upper case?

11. Letter Only

Does the word contain letters only?

12. Digit Only

Does the word contain digits only?

13. Digit and Letter

Does the word contain both digits and letters?

14. Digit and Letter Only

Does the word contain digits and letters only?

15. Normalized Position

The position of the word normalized by the total number of words in the reference.
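
The fifteen features above could be computed per token roughly as in the following Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `token_features`, the regular expressions, the lowercased dictionary lookup, the reading of feature 14 as "every character is a letter or digit", and the 1-based position divided by the word count are all hypothetical choices, since the table gives only examples and one-line descriptions.

```python
import re
from datetime import date

# Hypothetical patterns inferred from the table's examples (200-5, H100-H105, J.Z., J.-Z.).
PAGINATION_RE = re.compile(r"[A-Za-z]?[0-9]+-[A-Za-z]?[0-9]+")
NAME_INITIAL_RE = re.compile(r"[A-Z]\.(-?[A-Z]\.)*")
YEAR_RE = re.compile(r"[0-9]{4}")


def token_features(words, index, author_dict, title_dict, journal_dict,
                   current_year=None):
    """Return the 15 features for words[index] as a dict.

    The three dictionaries are assumed to be sets of lowercased words.
    """
    word = words[index]
    if current_year is None:
        current_year = date.today().year

    key = word.lower()  # assumption: dictionary lookup is case-insensitive
    has_digit = any(c.isdigit() for c in word)
    has_letter = any(c.isalpha() for c in word)
    is_year = (YEAR_RE.fullmatch(word) is not None
               and 1500 <= int(word) <= current_year)

    return {
        "author_name": key in author_dict,                              # 1
        "article_title": key in title_dict,                             # 2
        "journal_title": key in journal_dict,                           # 3
        "pagination_pattern": PAGINATION_RE.fullmatch(word) is not None,      # 4
        "name_initial_pattern": NAME_INITIAL_RE.fullmatch(word) is not None,  # 5
        "four_digit_year": is_year,                                     # 6
        "et_al": word in {"et", "al", "et.", "al."},                    # 7
        "pp_p": word in {"pp.", "p.", "pp", "p"},                       # 8
        "ends_with_period": word.endswith("."),                         # 9
        "upper_first": word[:1].isupper(),                              # 10
        "letter_only": word.isalpha(),                                  # 11
        "digit_only": word.isdigit(),                                   # 12
        "digit_and_letter": has_digit and has_letter,                   # 13
        "digit_and_letter_only": word.isalnum(),                        # 14 (assumed reading)
        # 15: assumed to be the 1-based position over the number of words
        "normalized_position": (index + 1) / len(words),
    }


words = ["Zhang", "J.-Z.", "et", "al.", "2005", "H100-H105"]
features = [token_features(words, i, {"zhang"}, set(), set(), current_year=2024)
            for i in range(len(words))]
```

Each token yields fourteen binary values and one real value in (0, 1]. Passing `current_year` explicitly keeps the year check reproducible; when it is omitted, the sketch falls back to the system date.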