
Table 1 Homologous pairs in the training and test datasets, with their sequence identities and recognition sites

From: iPNHOT: a knowledge-based approach for identifying protein-nucleic acid interaction hot spots

| Protein 1 (dataset)^a | Protein 2 (dataset) | Sequence identity (%)^b | Recognition site 1^c | Recognition site 2^c |
|---|---|---|---|---|
| 4GZNC (train) | 1AAYA (train) | 33 | E182 | R118,D120,E121,R124 |
| 4GZNC (train) | 4M9EA (train) | 34 | E182 | E446 |
| 4GZNC (train) | 5VMVA (test) | 27 | E182 | E535 |
| 5EXHC (train) | 3QMGA (train) | 34 | T80,H81,Q82,K88 | Q201,R213,Y216 |
| 4ALPA (train) | 5UDZA (train) | 74 | F77 | Y140,H148,H162 |
| 5EIMA (train) | 5DNOA (train) | 100 | R349,K436,T437,N477 | N336,R338 |
| 5DFFA (train) | 4B5FA (train) | 35 | R181 | N207,R208 |
| 4RCJA (train) | 4R3IA (train) | 29 | Y397 | R475 |
| 3WPCA (train) | 5ZLNA (test) | 72 | W47,F108,W96 | F375,F402,Y537 |
| 5GXHA (train) | 5H1KA (train) | 99 | F381,E197,Y474 | N13,W14,Y15,R33,M357,R359 |
| 5U2RA (train) | 2BPFA (train) | 96 | R283 | K280,N294 |
| 5U2RA | 5U8GA (train) | 100 | R283 | M236 |
| 5U2RA | 1BPXA (test) | 100 | R283 | R283 |
| 5U2RA | 4X5VA (train) | 35 | R283 | N513 |
| 5U2RA | 5TWPA (train) | 26 | R283 | W434,H329 |
| 5U2RA | 4XQ8B (train) | 34 | R283 | Y505 |
| 5U2RA | 5IIIA (train) | 34 | R283 | E529 |
| 5HO4A (test) | 2ERRA (train) | 28 | Q19,F66,E92,D49,F24,H108 | H120,F160,F158,F126 |
| 5HO4A | 2KXNB (test) | 28 | Q19,F66,E92,D49,F24,H108 | I195,T196,P199,S194,R111 |
| 5HO4A | 4CIOA | 32 | Q19,F66,E92,D49,F24,H108 | N106,Y44,N108 |
| 4L5RC (train) | 3RN2A (train) | 40 | N236 | K160,R244,K335,R311,K251,K198,K309,K204 |
| 4HN5A (train) | 1HCQA (test) | 45 | K442 | S15,H18,Y19,E25,K32 |
| 4HT8A (train) | 3QSUA (train) | 31 | Y25 | K33 |
| 4HT8A (train) | 4QVCD (train) | 100 | Y25 | N48,N28,K31 |
| 3SPDA (train) | 3SZQA (train) | 98 | H138,S142 | F65 |
| 3OSGA (train) | 1MSEC (train) | 39 | K49,R84,N139,K138,R87,F52,K51 | S187 |
| 3EQTA (test) | 5JBJA (train) | 43 | E573 | H406 |
| 1QRVA (train) | 1J5NA (train) | 38 | V32,L97 | K53,Y81,N33,R23,R36,Y28,K67,R40,Y88,K60,M29,F48,K78,K22,K85 |
| 3OD8A (train) | 3ODCA (train) | 32 | F44,V48 | R122,L151,R138,I154 |
| 5FD3A (train) | 4RKGA (train) | 31 | Y610,Y536 | R526,R543 |
| 2I05A (train) | 1ECRA (train) | 100 | R198 | Q250,K89 |
| 2I05A (train) | 4XR0A (train) | 100 | R198 | H144 |

  1. ^a The first four characters are the PDB code and the fifth character is the chain ID. The label in parentheses is the dataset to which the protein-nucleic acid complex belongs
  2. ^b Homologous pairs are defined using a sequence identity cutoff of 25%
  3. ^c The first letter is the one-letter residue name, and the number after it is the residue sequence number in the protein
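
The identifier conventions described in the footnotes can be sketched in Python for readers who want to process the table programmatically. This is only an illustrative sketch, not part of the iPNHOT method: the names `ChainEntry`, `Residue`, `parse_entry`, and `parse_sites` are hypothetical, and the code assumes that every protein cell is a four-character PDB code followed by a one-character chain ID with an optional dataset label in parentheses, and that every recognition site is a comma-separated list of one-letter residue names followed by positive sequence numbers.

```python
import re
from typing import List, NamedTuple, Optional


class ChainEntry(NamedTuple):
    """A protein cell such as '4GZNC (train)' (hypothetical helper type)."""
    pdb_code: str           # first four characters, e.g. '4GZN'
    chain_id: str           # fifth character, e.g. 'C'
    dataset: Optional[str]  # 'train' or 'test'; None when the cell has no label


class Residue(NamedTuple):
    """A recognition-site residue such as 'E182' (hypothetical helper type)."""
    name: str    # one-letter residue name
    number: int  # residue sequence number in the protein


# Assumed cell layout: 4-character PDB code, 1-character chain ID,
# then an optional '(dataset)' label with or without a preceding space.
_ENTRY_RE = re.compile(r"^\s*([0-9A-Za-z]{4})([0-9A-Za-z])\s*(?:\((\w+)\))?\s*$")
_RESIDUE_RE = re.compile(r"^([A-Z])(\d+)$")


def parse_entry(cell: str) -> ChainEntry:
    """Split a protein cell into PDB code, chain ID, and dataset label."""
    match = _ENTRY_RE.match(cell)
    if match is None:
        raise ValueError(f"unrecognized protein cell: {cell!r}")
    pdb_code, chain_id, dataset = match.groups()
    return ChainEntry(pdb_code.upper(), chain_id, dataset)


def parse_sites(cell: str) -> List[Residue]:
    """Split a recognition-site cell into (residue name, number) pairs."""
    residues = []
    for token in cell.split(","):
        token = token.strip()  # tolerate stray spaces such as 'W434, H329'
        match = _RESIDUE_RE.match(token)
        if match is None:
            raise ValueError(f"unrecognized residue: {token!r}")
        residues.append(Residue(match.group(1), int(match.group(2))))
    return residues
```

For example, `parse_entry("4GZNC (train)")` yields the PDB code `4GZN`, chain `C`, and dataset `train`, while a cell without a label, such as `5U2RA`, yields a dataset of `None` rather than a guessed value.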