
Table 2 Compression of entire PDB/mmCIF files benchmark

From: Image-centric compression of protein structures improves space savings

 

PIC and Foldcomp columns use lossy precision.

| Protein | PDB/CIF size (KB) | BCIF size (KB) | MMTF size (KB) | MMTF Cα-lossy (KB) | PIC coord (KB) | PIC MMTF-meta (KB) | PIC total (KB) | PIC RMSD (Å) | Foldcomp (KB) |
|---|---|---|---|---|---|---|---|---|---|
| 2ja9 | 163 | 24 | 13 | 3 | 9 | 4 | 14 | 0.030680 | segfault |
| 2jan | 1101 | 108 | 98 | 15 | 61 | 22 | 84 | 0.047302 | 26 |
| 2jbp | 2397 | 224 | 214 | 30 | 108 | 52 | 161 | 0.043140 | runtime > 1 day |
| 2ja8 | 2831 | 299 | 249 | 42 | 138 | 59 | 197 | 0.043373 | runtime > 1 day |
| 2ign | 3579 | 329 | 321 | 41 | 147 | 71 | 219 | 0.068555 | runtime > 1 day |
| 2jd8 | 4457 | 382 | 384 | 49 | 196 | 88 | 285 | 0.056100 | runtime > 1 day |
| 2ja7 | 5605 | 534 | 488 | 74 | 258 | 110 | 369 | 0.055422 | runtime > 1 day |
| 2fug | 6386 | 580 | 566 | 82 | 283 | 128 | 412 | 0.060134 | runtime > 1 day |
| 2b9v | 6817 | 594 | 608 | 73 | 288 | 128 | 417 | 0.072982 | runtime > 1 day |
| 2j28 | 8152 | 738 | 586 | 70 | 346 | 50 | 396 | 0.055343 | runtime > 1 day |
| 6hif | 12726 | 877 | 894 | 167 | 372 | 194 | 566 | 0.061988 | runtime > 1 day |
| 3j7q | 16027 | 1211 | 1027 | 108 | 475 | 234 | 710 | 0.058150 | runtime > 1 day |
| 3j9m | 17995 | 1339 | 1198 | 160 | 525 | 282 | 807 | 0.069300 | runtime > 1 day |
| 6gaw | 20825 | 1584 | 1359 | 184 | 587 | 324 | 911 | 0.070872 | runtime > 1 day |
| 5t2a | 22787 | 1628 | 1396 | 151 | 651 | 262 | 914 | 0.068460 | segfault |
| 4ug0 | 24906 | 1827 | 1606 | 177 | 707 | 364 | 1072 | 0.069050 | runtime > 1 day |
| 4v60 | 24377 | 1509 | 1618 | 242 | 730 | 191 | 922 | 0.119788 | 491 |
| 4wro | 35661 | 2336 | 1902 | 171 | 848 | 447 | 1296 | 0.085947 | runtime > 1 day |
| 6fxc | 31328 | 1961 | 1678 | 181 | 917 | 103 | 1020 | 0.099960 | segfault |
| 4wq1 | 40130 | 2646 | 2212 | 216 | 968 | 523 | 1492 | 0.086623 | runtime > 1 day |

  1. Actual compressed file sizes. We compare the PIC, MMTF, Foldcomp, and BCIF formats. Because PIC does not compress atomic metadata, we compressed only the metadata with MMTF and added that size to the PNG sizes from PIC. Every MMTF file was additionally compressed with standard Gzip, as is conventional, whereas BCIF is already a fully compressed file format. All sizes are in kilobytes (1 KB = 1000 bytes). We also report the RMSD of the PIC coordinates. Unfortunately, most of the proteins chosen for this benchmark were discontinuous or had other quirks, so Foldcomp 0.0.5 could not compress them: it either crashed with a segmentation fault or had not finished after running for 24 h. However, on both 2jan and 4v60, Foldcomp produces substantially smaller files than every other method except MMTF-reduced (Cα-lossy), which not only decreases precision but also keeps only the alpha carbons.
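The size accounting described in this footnote can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' benchmarking code. The helper names (`kilobytes`, `gzip_size_kb`, `pic_total_kb`, `rmsd`, `savings_factor`) are hypothetical. Python's standard `gzip` module stands in for "standard Gzip compression". The RMSD assumes that the original and decoded coordinates are already in the same atom order and reference frame, so no superposition step is applied. Only the standard library is used; no PIC, MMTF, or Foldcomp API is called.

```python
import gzip
import math

# Hypothetical helpers illustrating the table's size accounting; not the authors' code.


def kilobytes(n_bytes: int) -> float:
    """Convert bytes to kilobytes using the table's convention (1 KB = 1000 bytes)."""
    return n_bytes / 1000


def gzip_size_kb(raw: bytes) -> float:
    """Size in KB after Gzip compression, as applied to every MMTF file in the table."""
    return kilobytes(len(gzip.compress(raw)))


def pic_total_kb(png_bytes: bytes, mmtf_meta_bytes: bytes) -> float:
    """PIC total = PNG coordinate images + Gzip-compressed MMTF metadata.

    The PNG bytes are counted as-is because PNG is already a compressed format.
    """
    return kilobytes(len(png_bytes)) + gzip_size_kb(mmtf_meta_bytes)


def rmsd(original, decoded) -> float:
    """RMSD between matching atoms, in the coordinates' units (Å).

    Assumption: both lists hold (x, y, z) tuples in the same atom order and frame,
    so no superposition or alignment is performed.
    """
    if not original or len(original) != len(decoded):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(original, decoded)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(original))


def savings_factor(cif_kb: float, compressed_kb: float) -> float:
    """How many times smaller the compressed file is than the PDB/mmCIF file."""
    return cif_kb / compressed_kb


if __name__ == "__main__":
    # (PDB/CIF size, PIC total) in KB, copied from two rows of the table above.
    rows = {"2jan": (1101, 84), "4wq1": (40130, 1492)}
    for pdb_id, (cif_kb, pic_kb) in rows.items():
        print(pdb_id, round(savings_factor(cif_kb, pic_kb), 1))
```

With the 2jan row, 1101 KB of mmCIF against an 84 KB PIC total is roughly a 13.1-fold reduction. For 4wq1 (40130 KB against 1492 KB) it is roughly 26.9-fold.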