Table 1 Incidence of syntactic/semantic phenomena

From: The textual characteristics of traditional and Open Access scientific journals are similar

|                     | CRAFT   | TraJour | Reference | BioReference |
|---------------------|---------|---------|-----------|--------------|
| Document count      | 97      | 99      | 2,500     | 163          |
| Sentence count      | 43,694  | 35,997  | 53,107    | 32,895       |
| Avg. Sentence count | 450     | 364     | 21        | 202          |
| Token count         | 717,166 | 598,331 | 1,096,976 | 654,493      |
| Type count          | 41,574  | 49,394  | 40,139    | 38,801       |
| Stopword count      | 238,542 | 193,905 | 453,264   | 238,077      |
| Stopword %          | 33.3%   | 32.4%   | 41.3%     | 36.4%        |
| Avg. Document length| 7,393   | 6,044   | 439       | 4,015        |
| Avg. Sentence length| 22.5    | 24.7    | 26.4      | 27.8         |
| Types/Tokens        | 5.8%    | 8.3%    | 3.7%      | 5.9%         |
| Tokens/Types        | 17.3    | 12.1    | 27.3      | 16.9         |
| Negatives           | 3,273   | 2,587   | 7,605     | 2,961        |
| Negatives %         | 0.46%   | 0.43%   | 0.69%     | 0.45%        |
| Coordination        | 25,237  | 23,706  | 26,019    | 25,059       |
| Coordination %      | 3.52%   | 3.96%   | 2.37%     | 3.83%        |
| Pronouns            | 18,874  | 15,603  | 57,406    | 20,699       |
| Pronouns %          | 2.63%   | 2.61%   | 5.23%     | 3.16%        |
| Passives            | 2,783   | 2,587   | 2,661     | 3,172        |
| Passives %          | 0.39%   | 0.43%   | 0.24%     | 0.48%        |

  1. This table reports the counts of linguistic phenomena in our four document sets: CRAFT (open access), TraJour (traditional journals), Reference (Wall Street Journal), and BioReference (full-text biomedical publications).
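
Several rows of the table (the averages per document, the percentages, and the type/token ratios) appear to follow from the raw counts by simple division. The sketch below recomputes them for the CRAFT column only. It assumes that each percentage is taken relative to the token count; that is an inference from the reported values, not something the table states. The dictionary keys and the `pct` helper are illustrative names, not part of the original work.

```python
# Raw counts copied from the CRAFT column of Table 1.
craft = {
    "documents": 97,
    "sentences": 43_694,
    "tokens": 717_166,
    "types": 41_574,
    "stopwords": 238_542,
    "negatives": 3_273,
}


def pct(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Assumption: percentages are relative to the total token count.
derived = {
    "avg_sentences_per_doc": craft["sentences"] / craft["documents"],
    "avg_doc_length": craft["tokens"] / craft["documents"],
    "stopword_pct": pct(craft["stopwords"], craft["tokens"]),
    "types_per_token_pct": pct(craft["types"], craft["tokens"]),
    "tokens_per_type": craft["tokens"] / craft["types"],
    "negatives_pct": pct(craft["negatives"], craft["tokens"]),
}

for name, value in derived.items():
    print(f"{name}: {value:.2f}")
```

Under that assumption, the recomputed values round to the CRAFT entries in the corresponding rows (for example, about 33.3% stopwords and about 17.3 tokens per type).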