
Table 8 TF*IDF-ranked terms in the corpora

From: The textual characteristics of traditional and Open Access scientific journals are similar

| CRAFT | TF*IDF | TraJour | TF*IDF | Reference | TF*IDF | BioReference | TF*IDF |
|---|---|---|---|---|---|---|---|
| mice | 0.435821989 | cells | 0.336961638 | mr | 0.121256579 | cells | 0.320612568 |
| cells | 0.270086285 | mice | 0.23486562 | says | 0.118389148 | fig | 0.205308437 |
| expression | 0.216037704 | expression | 0.23159711 | that | 0.118092658 | cell | 0.20214811 |
| mouse | 0.178144406 | fig | 0.220320662 | he | 0.102566064 | abstract | 0.190193446 |
| cell | 0.172290914 | protein | 0.195781243 | market | 0.091669921 | medline | 0.188483175 |
| gene | 0.163204203 | cell | 0.187501368 | 's | 0.088505453 | protein | 0.177454065 |
| embryos | 0.151251462 | mouse | 0.167860719 | million | 0.08479812 | fulltext | 0.140623811 |
| protein | 0.14510293 | biol | 0.135330482 | is | 0.083961293 | expression | 0.138198756 |
| figure | 0.12789903 | et | 0.120455574 | as | 0.081560405 | orderarticle... | 0.119943839 |
| doi | 0.122095859 | gene | 0.117032939 | his | 0.081293607 | genes | 0.109091523 |
| genes | 0.120878869 | al | 0.1166477 | on | 0.079143554 | proteins | 0.106029999 |
| mutant | 0.119701823 | embryos | 0.113119754 | stock | 0.078988492 | gene | 0.098232094 |
| null | 0.097585044 | proteins | 0.110513285 | they | 0.078407133 | were | 0.096137225 |
| type | 0.093527187 | domain | 0.093587827 | at | 0.075765418 | binding | 0.094229295 |
| wild | 0.085050648 | binding | 0.093519941 | but | 0.075548004 | window | 0.085695981 |
| differentiation | 0.078946407 | mutant | 0.086691778 | billion | 0.073818895 | induced | 0.085566187 |
| analysis | 0.076546987 | receptor | 0.085026763 | have | 0.073662149 | biol | 0.085231028 |
| receptor | 0.075227992 | pp | 0.081740644 | are | 0.072364352 | ml | 0.083458459 |
| dna | 0.073316012 | mutants | 0.077608416 | be | 0.071025302 | min | 0.083015317 |
| pcr | 0.073162003 | abstract | 0.077359299 | with | 0.068584195 | al | 0.078391977 |
| biol | 0.072197936 | antibody | 0.076735615 | it | 0.067830211 | et | 0.077346227 |
| fig | 0.070585942 | cdna | 0.076317095 | was | 0.067707989 | analysis | 0.076920249 |
| were | 0.070224935 | genes | 0.07615871 | 't | 0.066475036 | mm | 0.073266187 |
| allele | 0.069948445 | membrane | 0.075929698 | in | 0.065890951 | mice | 0.072205176 |
| al | 0.067582502 | transcription | 0.073863584 | trading | 0.065657748 | shown | 0.070980579 |
| mutants | 0.066734887 | type | 0.073860554 | would | 0.06509097 | data | 0.06877046 |
| embryonic | 0.0644353 | were | 0.072302222 | said | 0.064915624 | ph | 0.067550112 |
| et | 0.06344436 | sequence | 0.070485632 | to | 0.064419151 | activation | 0.06720991 |
| staining | 0.061271839 | kinase | 0.070118752 | has | 0.064175458 | receptor | 0.066788269 |
| neurons | 0.059343704 | pcr | 0.070118752 | by | 0.063766297 | sequence | 0.066026426 |
| proteins | 0.058555579 | shown | 0.069422034 | shares | 0.063615252 | antibody | 0.065025557 |
| mm | 0.057094213 | X1 | 0.068956563 | company | 0.063043995 | human | 0.064973329 |
| olfactory | 0.056987095 | activation | 0.065599127 | their | 0.062731731 | using | 0.064071093 |
| transcription | 0.056130146 | wild | 0.065140388 | for | 0.062641744 | dna | 0.063146568 |
| signaling | 0.055582376 | analysis | 0.06314115 | bonds | 0.061745073 | crossref | 0.062926201 |
| phenotype | 0.052916588 | wt | 0.062499956 | will | 0.061422042 | activity | 0.058794757 |
| observed | 0.05206838 | dna | 0.060635915 | year | 0.061329696 | rna | 0.058294133 |
| e2 | 0.051952521 | chem | 0.060606544 | new | 0.060716109 | observed | 0.05785548 |
| shown | 0.050532729 | pdf | 0.060433841 | were | 0.06062604 | with | 0.057545637 |
| homozygous | 0.050131504 | mrna | 0.060175577 | or | 0.060257745 | these | 0.057379774 |
| function | 0.049871842 | rna | 0.059552487 | an | 0.060255469 | study | 0.056368432 |
| muscle | 0.049628485 | ca | 0.055251655 | from | 0.059225401 | free | 0.056039813 |
| data | 0.049494253 | differentiation | 0.055139424 | we | 0.059174038 | mediated | 0.055983639 |
| antibody | 0.048131217 | insulin | 0.05440163 | index | 0.059103846 | serum | 0.05494964 |
| chromosome | 0.048033587 | activity | 0.053267608 | some | 0.058883875 | actin | 0.054506498 |
| we | 0.047444291 | expressed | 0.053108054 | one | 0.058690763 | kinase | 0.053029357 |
| sequence | 0.047236181 | embryonic | 0.052726312 | more | 0.058586253 | ?c | 0.052586215 |
| transgenic | 0.046764407 | signaling | 0.052369358 | stocks | 0.058457121 | we | 0.051671671 |
| using | 0.046658651 | molecular | 0.052281469 | sales | 0.058224908 | figure | 0.051378087 |
| pgc | 0.045739642 | amino | 0.052137849 | this | 0.05791668 | amino | 0.050216424 |

  1. These are the top 50 terms in each corpus, ranked by TF*IDF (Term Frequency * Inverse Document Frequency). In the CRAFT and TraJour columns, bold marks terms that appear in the top 50 of both of these corpora; in the BioReference column, bold marks terms that appear in the top 50 of all three corpora. CRAFT and TraJour overlap substantially in their content-bearing terms.
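
As a rough illustration of how a ranking like this and the overlap between two top-term lists could be computed, here is a minimal Python sketch. The exact TF*IDF weighting behind the table is not stated here, so the sketch assumes a common variant (term frequency normalized by document length, IDF as log(N / df), scores summed over a corpus's documents). The function names `tfidf_top_terms` and `shared_terms` and the toy corpora are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def tfidf_top_terms(docs, k=50):
    """Return the k highest-scoring (term, score) pairs for one corpus.

    docs: list of documents, each a list of lowercase tokens.
    Assumed weighting (not necessarily the paper's):
    tf = count / document length, idf = log(N / df),
    and a term's corpus score is the sum of tf * idf over documents.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    scores = Counter()
    for tokens in docs:
        if not tokens:
            continue  # skip empty documents to avoid division by zero
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[term])
            scores[term] += tf * idf

    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


def shared_terms(top_a, top_b):
    """Terms that occur in both top-k lists."""
    return {term for term, _ in top_a} & {term for term, _ in top_b}


# Toy corpora standing in for two real document collections.
corpus_a = [["mice", "mice", "cells"], ["cells", "gene"], ["protein", "gene"]]
corpus_b = [["cells", "cells", "mice"], ["mice", "market"], ["stock", "market"]]

top_a = tfidf_top_terms(corpus_a, k=4)
top_b = tfidf_top_terms(corpus_b, k=4)
print(sorted(shared_terms(top_a, top_b)))
```

With this weighting a term that occurs in every document of a corpus scores zero, so function words would not rank highly; the Reference column above suggests the table was produced with a different IDF definition.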