Table 8 TF*IDF-ranked terms in the corpora

From: The textual characteristics of traditional and Open Access scientific journals are similar

CRAFT term   TF*IDF   TraJour term   TF*IDF   Reference term   TF*IDF   BioReference term   TF*IDF
mice 0.435821989 cells 0.336961638 mr 0.121256579 cells 0.320612568
cells 0.270086285 mice 0.23486562 says 0.118389148 fig 0.205308437
expression 0.216037704 expression 0.23159711 that 0.118092658 cell 0.20214811
mouse 0.178144406 fig 0.220320662 he 0.102566064 abstract 0.190193446
cell 0.172290914 protein 0.195781243 market 0.091669921 medline 0.188483175
gene 0.163204203 cell 0.187501368 's 0.088505453 protein 0.177454065
embryos 0.151251462 mouse 0.167860719 million 0.08479812 fulltext 0.140623811
protein 0.14510293 biol 0.135330482 is 0.083961293 expression 0.138198756
figure 0.12789903 et 0.120455574 as 0.081560405 orderarticle... 0.119943839
doi 0.122095859 gene 0.117032939 his 0.081293607 genes 0.109091523
genes 0.120878869 al 0.1166477 on 0.079143554 proteins 0.106029999
mutant 0.119701823 embryos 0.113119754 stock 0.078988492 gene 0.098232094
null 0.097585044 proteins 0.110513285 they 0.078407133 were 0.096137225
type 0.093527187 domain 0.093587827 at 0.075765418 binding 0.094229295
wild 0.085050648 binding 0.093519941 but 0.075548004 window 0.085695981
differentiation 0.078946407 mutant 0.086691778 billion 0.073818895 induced 0.085566187
analysis 0.076546987 receptor 0.085026763 have 0.073662149 biol 0.085231028
receptor 0.075227992 pp 0.081740644 are 0.072364352 ml 0.083458459
dna 0.073316012 mutants 0.077608416 be 0.071025302 min 0.083015317
pcr 0.073162003 abstract 0.077359299 with 0.068584195 al 0.078391977
biol 0.072197936 antibody 0.076735615 it 0.067830211 et 0.077346227
fig 0.070585942 cdna 0.076317095 was 0.067707989 analysis 0.076920249
were 0.070224935 genes 0.07615871 't 0.066475036 mm 0.073266187
allele 0.069948445 membrane 0.075929698 in 0.065890951 mice 0.072205176
al 0.067582502 transcription 0.073863584 trading 0.065657748 shown 0.070980579
mutants 0.066734887 type 0.073860554 would 0.06509097 data 0.06877046
embryonic 0.0644353 were 0.072302222 said 0.064915624 ph 0.067550112
et 0.06344436 sequence 0.070485632 to 0.064419151 activation 0.06720991
staining 0.061271839 kinase 0.070118752 has 0.064175458 receptor 0.066788269
neurons 0.059343704 pcr 0.070118752 by 0.063766297 sequence 0.066026426
proteins 0.058555579 shown 0.069422034 shares 0.063615252 antibody 0.065025557
mm 0.057094213 X1 0.068956563 company 0.063043995 human 0.064973329
olfactory 0.056987095 activation 0.065599127 their 0.062731731 using 0.064071093
transcription 0.056130146 wild 0.065140388 for 0.062641744 dna 0.063146568
signaling 0.055582376 analysis 0.06314115 bonds 0.061745073 crossref 0.062926201
phenotype 0.052916588 wt 0.062499956 will 0.061422042 activity 0.058794757
observed 0.05206838 dna 0.060635915 year 0.061329696 rna 0.058294133
e2 0.051952521 chem 0.060606544 new 0.060716109 observed 0.05785548
shown 0.050532729 pdf 0.060433841 were 0.06062604 with 0.057545637
homozygous 0.050131504 mrna 0.060175577 or 0.060257745 these 0.057379774
function 0.049871842 rna 0.059552487 an 0.060255469 study 0.056368432
muscle 0.049628485 ca 0.055251655 from 0.059225401 free 0.056039813
data 0.049494253 differentiation 0.055139424 we 0.059174038 mediated 0.055983639
antibody 0.048131217 insulin 0.05440163 index 0.059103846 serum 0.05494964
chromosome 0.048033587 activity 0.053267608 some 0.058883875 actin 0.054506498
we 0.047444291 expressed 0.053108054 one 0.058690763 kinase 0.053029357
sequence 0.047236181 embryonic 0.052726312 more 0.058586253 °c 0.052586215
transgenic 0.046764407 signaling 0.052369358 stocks 0.058457121 we 0.051671671
using 0.046658651 molecular 0.052281469 sales 0.058224908 figure 0.051378087
pgc 0.045739642 amino 0.052137849 this 0.05791668 amino 0.050216424
  1. These are the top 50 terms in each corpus, ranked by TF*IDF (Term Frequency * Inverse Document Frequency). Terms highlighted in bold in the CRAFT and TraJour columns are shared between these two corpora within the top 50 terms of each; terms highlighted in bold in the BioReference column are shared among all three biomedical corpora (CRAFT, TraJour, and BioReference) within their top 50 terms. CRAFT and TraJour overlap substantially: as listed here, 30 of the top 50 terms appear in both columns, most of them content-bearing terms such as mice, cells, expression, gene, and protein.
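
The ranking and overlap described in the footnote can be illustrated with a minimal Python sketch. It is not the authors' implementation: the weighting is an assumption (TF as a term's relative frequency over the whole corpus, IDF as log(N / document frequency)), and the paper may use a different TF*IDF variant. The function names `tfidf_scores`, `top_terms`, and `shared_terms` and the toy documents are hypothetical; the two ten-term lists are copied from the first ten rows of the CRAFT and TraJour columns above.

```python
import math
from collections import Counter


def tfidf_scores(documents):
    """Return {term: TF*IDF} for a corpus given as a list of token lists.

    Assumed weighting (the paper may differ):
      TF  = term count / total tokens in the corpus
      IDF = log(number of documents / documents containing the term)
    """
    n_docs = len(documents)
    term_counts = Counter()
    doc_freq = Counter()
    for tokens in documents:
        term_counts.update(tokens)      # raw occurrences across the corpus
        doc_freq.update(set(tokens))    # each document counted once per term
    total = sum(term_counts.values())
    return {
        term: (count / total) * math.log(n_docs / doc_freq[term])
        for term, count in term_counts.items()
    }


def top_terms(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def shared_terms(terms_a, terms_b):
    """Return the set of terms that occur in both ranked lists."""
    return set(terms_a) & set(terms_b)


# Toy corpus (hypothetical data, only to exercise the functions).
toy_docs = [
    ["mice", "cells", "mice", "the"],
    ["cells", "expression", "the"],
    ["mice", "gene", "the"],
]
toy_scores = tfidf_scores(toy_docs)
print(top_terms(toy_scores, 2))

# First ten terms of the CRAFT and TraJour columns, copied from the table.
craft_top10 = ["mice", "cells", "expression", "mouse", "cell",
               "gene", "embryos", "protein", "figure", "doi"]
trajour_top10 = ["cells", "mice", "expression", "fig", "protein",
                 "cell", "mouse", "biol", "et", "gene"]
print(sorted(shared_terms(craft_top10, trajour_top10)))
```

Under this weighting a term that occurs in every document (here "the") scores zero, which is why function words drop out of a TF*IDF ranking when the IDF is computed over a sufficiently varied document set.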