iPNHOT: a knowledge-based approach for identifying protein-nucleic acid interaction hot spots

Zhu, Xiaolei; Liu, Ling; He, Jingjing; Fang, Ting; Xiong, Yi; Mitchell, Julie C.

doi:10.1186/s12859-020-03636-w

BMC Bioinformatics

Table 1 Homologous pairs in both the training and test datasets, with their sequence identity and recognition sites


Protein 1 (dataset)^a	Protein 2 (dataset)	Sequence identity (%)^b	Recognition site 1^c	Recognition site 2^c
4GZNC (train)	1AAYA (train)	33	E182	R118,D120,E121,R124
4GZNC (train)	4M9EA (train)	34	E182	E446
4GZNC (train)	5VMVA (test)	27	E182	E535
5EXHC (train)	3QMGA (train)	34	T80,H81,Q82,K88	Q201,R213,Y216
4ALPA (train)	5UDZA (train)	74	F77	Y140,H148,H162
5EIMA (train)	5DNOA (train)	100	R349,K436,T437,N477	N336,R338
5DFFA (train)	4B5FA (train)	35	R181	N207,R208
4RCJA (train)	4R3IA (train)	29	Y397	R475
3WPCA (train)	5ZLNA (test)	72	W47,F108,W96	F375,F402,Y537
5GXHA (train)	5H1KA (train)	99	F381,E197,Y474	N13,W14,Y15,R33,M357,R359
5U2RA (train)	2BPFA (train)	96	R283	K280,N294
5U2RA (train)	5U8GA (train)	100	R283	M236
5U2RA (train)	1BPXA (test)	100	R283	R283
5U2RA (train)	4X5VA (train)	35	R283	N513
5U2RA (train)	5TWPA (train)	26	R283	W434,H329
5U2RA (train)	4XQ8B (train)	34	R283	Y505
5U2RA (train)	5IIIA (train)	34	R283	E529
5HO4A (test)	2ERRA (train)	28	Q19,F66,E92,D49,F24,H108	H120,F160,F158,F126
5HO4A (test)	2KXNB (test)	28	Q19,F66,E92,D49,F24,H108	I195,T196,P199,S194,R111
5HO4A (test)	4CIOA	32	Q19,F66,E92,D49,F24,H108	N106,Y44,N108
4L5RC (train)	3RN2A (train)	40	N236	K160,R244,K335,R311,K251,K198,K309,K204
4HN5A (train)	1HCQA (test)	45	K442	S15,H18,Y19,E25,K32
4HT8A (train)	3QSUA (train)	31	Y25	K33
4HT8A (train)	4QVCD (train)	100	Y25	N48,N28,K31
3SPDA (train)	3SZQA (train)	98	H138,S142	F65
3OSGA (train)	1MSEC (train)	39	K49,R84,N139,K138,R87,F52,K51	S187
3EQTA (test)	5JBJA (train)	43	E573	H406
1QRVA (train)	1J5NA (train)	38	V32,L97	K53,Y81,N33,R23,R36,Y28,K67,R40,Y88,K60,M29,F48,K78,K22,K85
3OD8A (train)	3ODCA (train)	32	F44,V48	R122,L151,R138,I154
5FD3A (train)	4RKGA (train)	31	Y610,Y536	R526,R543
2I05A (train)	1ECRA (train)	100	R198	Q250,K89
2I05A (train)	4XR0A (train)	100	R198	H144

^aThe first four characters are the PDB code and the fifth character is the chain ID. The label in parentheses indicates the dataset to which the protein-nucleic acid complex belongs.
^bHomologous pairs are defined using a sequence identity cutoff of 25%.
^cThe letter is the one-letter residue name, and the number after it is the residue sequence number in the protein.
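The notation defined in the footnotes can be read mechanically. The sketch below parses one tab-separated row of the table into structured fields: it splits a protein label into PDB code, chain ID, and optional dataset (footnote a), converts recognition-site strings such as "R118,D120" into (residue, number) pairs (footnote c), and applies the 25% identity cutoff (footnote b). All names here (`ProteinLabel`, `parse_row`, `IDENTITY_CUTOFF`, and so on) are hypothetical illustrations, not part of iPNHOT. Treating the cutoff as inclusive and allowing only the 20 standard one-letter amino-acid codes are also assumptions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Footnote a: 4-character PDB code + 1-character chain ID, optionally followed
# by the dataset label in parentheses, e.g. "4GZNC (train)" or "5ZLNA(test)".
PROTEIN_RE = re.compile(r"^([0-9][0-9A-Z]{3})([A-Za-z0-9])\s*(?:\((train|test)\))?$")

# Footnote c: one-letter residue name followed by the residue sequence number.
# Assumption: only the 20 standard amino-acid letters occur.
RESIDUE_RE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)$")

# Footnote b: 25% sequence identity cutoff (treated as inclusive; an assumption).
IDENTITY_CUTOFF = 25.0


@dataclass(frozen=True)
class ProteinLabel:
    pdb_code: str
    chain: str
    dataset: Optional[str]  # None when the table cell gives no dataset


@dataclass(frozen=True)
class HomologousPair:
    protein1: ProteinLabel
    protein2: ProteinLabel
    identity: float
    sites1: List[Tuple[str, int]]
    sites2: List[Tuple[str, int]]


def parse_protein(cell: str) -> ProteinLabel:
    """Split a label such as '4GZNC (train)' into PDB code, chain, and dataset."""
    m = PROTEIN_RE.match(cell.strip())
    if m is None:
        raise ValueError(f"unrecognized protein label: {cell!r}")
    return ProteinLabel(m.group(1), m.group(2), m.group(3))


def parse_sites(cell: str) -> List[Tuple[str, int]]:
    """Turn 'R118,D120' (spaces tolerated) into [('R', 118), ('D', 120)]."""
    sites = []
    for token in cell.split(","):
        m = RESIDUE_RE.match(token.strip())
        if m is None:
            raise ValueError(f"unrecognized residue token: {token!r}")
        sites.append((m.group(1), int(m.group(2))))
    return sites


def is_homologous(identity: float, cutoff: float = IDENTITY_CUTOFF) -> bool:
    """Report whether a pair's identity meets the homology cutoff."""
    return identity >= cutoff


def parse_row(line: str) -> HomologousPair:
    # Columns are tab-separated, in the order of the table header.
    p1, p2, identity, s1, s2 = line.split("\t")
    return HomologousPair(parse_protein(p1), parse_protein(p2),
                          float(identity), parse_sites(s1), parse_sites(s2))


row = parse_row("5U2RA\t5TWPA (train)\t26\tR283\tW434, H329")
print(row.protein2, row.sites2, is_homologous(row.identity))
```

Returning `None` for a missing dataset, rather than raising an error, lets the parser accept rows where the dataset column is left blank.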


ISSN: 1471-2105
