
Table 1 Sequence-sequence comparison F-measure for clustered sequences

From: Evaluation and improvements of clustering algorithms for detecting remote homologous protein families

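| Level | Dataset | TransClust F-measure | TransClust Clusters | TransClust Precision | TransClust Recall | HiFix F-measure | HiFix Clusters | HiFix Precision | HiFix Recall | MCL F-measure | MCL Clusters | MCL Precision | MCL Recall | SCPS F-measure | SCPS Clusters | SCPS Precision | SCPS Recall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Family | A-10 | 0.494 | 1757 | 0.834 | 0.409 | 0.467 | 2780 | 0.463 | 0.692 | 0.352 | 2310 | 0.923 | 0.389 | - | - | - | - |
| Family | A-20 | 0.573 | 2013 | 0.885 | 0.494 | 0.491 | 3270 | 0.556 | 0.732 | 0.398 | 4125 | 0.999 | 0.278 | - | - | - | - |
| Family | A-30 | 0.675 | 2561 | 0.912 | 0.628 | 0.583 | 3749 | 0.561 | 0.885 | 0.415 | 1827 | 0.351 | 0.773 | - | - | - | - |
| Family | A-50 | 0.721 | 3221 | 0.903 | 0.709 | 0.608 | 4861 | 0.562 | 0.945 | 0.457 | 1912 | 0.702 | 0.445 | - | - | - | - |
| Family | A-70 | 0.739 | 3486 | 0.904 | 0.733 | 0.630 | 4921 | 0.616 | 0.873 | 0.474 | 2323 | 0.752 | 0.482 | - | - | - | - |
| Family | A-90 | 0.758 | 3630 | 0.913 | 0.753 | 0.653 | 4973 | 0.625 | 0.895 | 0.511 | 2824 | 0.815 | 0.512 | - | - | - | - |
| Family | A-95 | 0.766 | 3715 | 0.916 | 0.765 | 0.654 | 4992 | 0.629 | 0.907 | 0.527 | 2873 | 0.527 | 0.813 | - | - | - | - |
| Family | GOLD | 0.914 | 96 | 0.905 | 0.968 | 0.902 | 99 | 0.960 | 0.895 | 0.880 | 56 | 0.808 | 0.942 | - | - | - | - |
| Super-family | A-10 | 0.377 | 1757 | 0.917 | 0.281 | 0.337 | 2780 | 0.993 | 0.274 | 0.270 | 3270 | 0.997 | 0.180 | 0.297 | 658 | 0.387 | 0.221 |
| Super-family | A-20 | 0.450 | 2013 | 0.954 | 0.347 | 0.362 | 3270 | 0.993 | 0.293 | 0.282 | 4024 | 0.999 | 0.191 | 0.352 | 701 | 0.400 | 0.323 |
| Super-family | A-30 | 0.551 | 2561 | 0.551 | 0.440 | 0.473 | 3749 | 0.994 | 0.414 | 0.333 | 3745 | 0.998 | 0.235 | 0.473 | 792 | 0.494 | 0.364 |
| Super-family | A-50 | 0.609 | 3221 | 0.995 | 0.499 | 0.507 | 4861 | 0.992 | 0.457 | 0.351 | 3048 | 0.847 | 0.310 | 0.557 | 753 | 0.618 | 0.546 |
| Super-family | A-70 | 0.631 | 3486 | 0.997 | 0.519 | 0.539 | 4921 | 0.990 | 0.495 | 0.377 | 2086 | 0.875 | 0.335 | 0.581 | 493 | 0.649 | 0.518 |
| Super-family | A-90 | 0.654 | 3630 | 0.996 | 0.544 | 0.560 | 4973 | 0.989 | 0.528 | 0.426 | 2549 | 0.922 | 0.364 | 0.607 | 633 | 0.680 | 0.531 |
| Super-family | A-95 | 0.659 | 3715 | 0.996 | 0.552 | 0.563 | 4986 | 0.990 | 0.542 | 0.435 | 2616 | 0.912 | 0.378 | 0.615 | 940 | 0.686 | 0.542 |
| Super-family | GOLD | 0.865 | 23 | 1 | 0.765 | 0.915 | 13 | 0.998 | 0.852 | 0.827 | 24 | 1 | 0.712 | 0.904 | 4 | 0.864 | 0.983 |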

The number of clusters found and the weighted mean precision and recall values are shown for each clustering algorithm. Best values are shown in bold.
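
For readers unfamiliar with sequence-sequence (pairwise) scoring, the following is a minimal sketch of how precision, recall, and F-measure can be computed for a clustering against reference families by counting co-clustered sequence pairs. It is a generic pair-counting illustration, not the weighting scheme used to produce the table: the function names `same_group_pairs` and `pairwise_scores`, the toy sequence IDs, and the labels are all hypothetical, and the F-measure here is the plain harmonic mean of precision and recall.

```python
from itertools import combinations


def same_group_pairs(assignment):
    """Return the set of unordered sequence pairs that share a group label.

    `assignment` maps a sequence ID to its cluster (or family) label.
    """
    groups = {}
    for seq_id, label in assignment.items():
        groups.setdefault(label, []).append(seq_id)
    pairs = set()
    for members in groups.values():
        # Sorting makes each pair canonical, so (a, b) and (b, a) coincide.
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def pairwise_scores(predicted, reference):
    """Pair-counting precision, recall, and F-measure (hypothetical helper)."""
    pred_pairs = same_group_pairs(predicted)
    ref_pairs = same_group_pairs(reference)
    true_pos = len(pred_pairs & ref_pairs)
    precision = true_pos / len(pred_pairs) if pred_pairs else 0.0
    recall = true_pos / len(ref_pairs) if ref_pairs else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy data (assumed for illustration): five sequences, two reference families.
reference = {"s1": "famA", "s2": "famA", "s3": "famA", "s4": "famB", "s5": "famB"}
# The clustering splits famA, so it keeps precision but loses recall.
predicted = {"s1": "c1", "s2": "c1", "s3": "c2", "s4": "c3", "s5": "c3"}

precision, recall, f_measure = pairwise_scores(predicted, reference)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_measure:.3f}")
```

In this toy case the clustering over-splits one family, which gives high precision and low recall. Several super-family rows in the table show the same direction of imbalance.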