Table 1 Incidence of syntactic/semantic phenomena

From: The textual characteristics of traditional and Open Access scientific journals are similar

Measure                            CRAFT   TraJour  Reference  BioReference
Document count                        97        99      2,500           163
Sentence count                    43,694    35,997     53,107        32,895
Avg. Sentences per document          450       364         21           202
Token count                      717,166   598,331  1,096,976       654,493
Type count                        41,574    49,394     40,139        38,801
Stopword count                   238,542   193,905    453,264       238,077
Stopword %                         33.3%     32.4%      41.3%         36.4%
Avg. Document length (tokens)      7,393     6,044        439         4,015
Avg. Sentence length                22.5      24.7       26.4          27.8
Types/Tokens                        5.8%      8.3%       3.7%          5.9%
Tokens/Types                        17.3      12.1       27.3          16.9
Negatives                          3,273     2,587      7,605         2,961
Negatives %                        0.46%     0.43%      0.69%         0.45%
Coordination                      25,237    23,706     26,019        25,059
Coordination %                     3.52%     3.96%      2.37%         3.83%
Pronouns                          18,874    15,603     57,406        20,699
Pronouns %                         2.63%     2.61%      5.23%         3.16%
Passives                           2,783     2,587      2,661         3,172
Passives %                         0.39%     0.43%      0.24%         0.48%
This table reports the counts of linguistic phenomena in our four document sets: CRAFT (open access), TraJour (traditional journals), Reference (Wall Street Journal), and BioReference (full-text biomedical publications). Rows marked % give each count as a share of that set's token count.
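
The derived rows of the table can be recomputed from the raw counts. The following is a minimal sketch, not the authors' code: the dictionary layout and the function names `derived_rows` and `pct_of_tokens` are assumptions made for illustration. It assumes the percentage rows are shares of the token count and the two document-level averages are per-document means, which matches the published figures. "Avg. Sentence length" is left out because the table does not state how it was computed.

```python
# Raw counts copied from Table 1. The key names are illustrative, not from the paper.
COUNTS = {
    "CRAFT": {"documents": 97, "sentences": 43_694, "tokens": 717_166,
              "types": 41_574, "stopwords": 238_542, "negatives": 3_273},
    "TraJour": {"documents": 99, "sentences": 35_997, "tokens": 598_331,
                "types": 49_394, "stopwords": 193_905, "negatives": 2_587},
    "Reference": {"documents": 2_500, "sentences": 53_107, "tokens": 1_096_976,
                  "types": 40_139, "stopwords": 453_264, "negatives": 7_605},
    "BioReference": {"documents": 163, "sentences": 32_895, "tokens": 654_493,
                     "types": 38_801, "stopwords": 238_077, "negatives": 2_961},
}


def pct_of_tokens(count, tokens, digits=2):
    """Express a raw count as a percentage of the token count."""
    return round(100 * count / tokens, digits)


def derived_rows(c):
    """Recompute the derived rows of the table for one document set."""
    return {
        "avg_sentences_per_document": round(c["sentences"] / c["documents"]),
        "avg_document_length": round(c["tokens"] / c["documents"]),
        "stopword_pct": pct_of_tokens(c["stopwords"], c["tokens"], digits=1),
        "types_per_token_pct": pct_of_tokens(c["types"], c["tokens"], digits=1),
        "tokens_per_type": round(c["tokens"] / c["types"], 1),
        "negatives_pct": pct_of_tokens(c["negatives"], c["tokens"]),
    }


if __name__ == "__main__":
    for name, c in COUNTS.items():
        print(name, derived_rows(c))
```

The same `pct_of_tokens` helper applies to the coordination, pronoun, and passive counts. Dividing by tokens rather than sentences keeps the four sets comparable despite their very different document lengths.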