Natural language processing in text mining for structural modeling of protein complexes

BMC Bioinformatics

Table 1 Overall text-mining performance with residue filtering based on the semantic similarity between words in a residue-containing sentence and a generic concept in the WordNet vocabulary. For comparison, results with basic residue filtering are also shown

Query	Similarity measure	L_tot^a	L_int^b	Coverage (%)^c	Success (%)^d	Accuracy (%)^e	ΔN(0)^f	ΔN(1)^f
AND	–	128	108	22.1	18.7	84.4	–	–
OR	–	328	273	56.6	47.2	83.2	–	–
OR	Lesk [39, 40]	319	267	55.1	46.1	83.7	−3	−1
OR	Lin [41]	251	184	43.4	31.8	73.3	+8	−8
OR	Path [42, 43]	316	265	54.6	45.8	83.9	−3	+1

^aNumber of complexes for which the TM protocol found at least one abstract containing residues
^bNumber of complexes with at least one interface residue found in abstracts
^cRatio of L_tot to the total number of complexes
^dRatio of L_int to the total number of complexes
^eRatio of L_int to L_tot
^fCalculated by Eq. (2)
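Footnotes c to e define the three percentages directly from L_tot, L_int and the total number of complexes. The short sketch below recomputes them for every row of Table 1. The total number of complexes is not stated in this excerpt; the value 579 used here is an assumption, back-calculated because it reproduces every percentage in the table. The function name `tm_metrics` is illustrative only. ΔN(0) and ΔN(1) are not recomputed, since Eq. (2) is not part of this excerpt.

```python
# Assumption: the excerpt does not give the total number of complexes;
# 579 is back-calculated because it reproduces every percentage in Table 1.
N_TOTAL = 579

# (L_tot, L_int) per row of Table 1, keyed by query and similarity measure.
ROWS = {
    ("AND", "-"): (128, 108),
    ("OR", "-"): (328, 273),
    ("OR", "Lesk"): (319, 267),
    ("OR", "Lin"): (251, 184),
    ("OR", "Path"): (316, 265),
}


def tm_metrics(l_tot, l_int, n_total=N_TOTAL):
    """Return (coverage, success, accuracy) in percent, rounded to 0.1,
    following footnotes c, d and e of Table 1."""
    coverage = 100.0 * l_tot / n_total  # c: L_tot / total complexes
    success = 100.0 * l_int / n_total   # d: L_int / total complexes
    accuracy = 100.0 * l_int / l_tot    # e: L_int / L_tot
    return round(coverage, 1), round(success, 1), round(accuracy, 1)


if __name__ == "__main__":
    for (query, measure), (l_tot, l_int) in ROWS.items():
        print(query, measure, *tm_metrics(l_tot, l_int))
```

Because L_int/N = (L_tot/N)(L_int/L_tot), success is simply coverage multiplied by accuracy (divided by 100), so a filter can raise accuracy and still lower success if it removes too many complexes from L_tot.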

ISSN: 1471-2105
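The caption of Table 1 describes filtering residue-containing sentences by the WordNet similarity of their words to a generic concept. A minimal sketch of that idea follows, using the Path measure [42, 43], which scores two concepts as 1 / (1 + length of the shortest is-a path between them). Everything concrete here is hypothetical: the tiny hand-made taxonomy stands in for WordNet, and the concept names, the threshold and the `keep_sentence` rule (keep a sentence when its best-matching word reaches the threshold) are assumptions. This excerpt does not specify the actual generic concept, the threshold, or whether high similarity keeps or rejects a sentence.

```python
from collections import deque

# Hypothetical toy is-a taxonomy mapping each concept to its parent
# (hypernym). Real WordNet synsets are far richer; this is illustration only.
HYPERNYM = {
    "residue": "part",
    "interface": "surface",
    "surface": "part",
    "part": "entity",
    "mutation": "change",
    "change": "event",
    "event": "entity",
}


def neighbors(concept):
    """Concepts one is-a edge away: the parent plus any children."""
    result = [child for child, parent in HYPERNYM.items() if parent == concept]
    if concept in HYPERNYM:
        result.append(HYPERNYM[concept])
    return result


def path_similarity(a, b):
    """Path measure: 1 / (1 + shortest is-a path length), 0.0 if unconnected.
    Breadth-first search treats hypernym links as undirected edges."""
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return 1.0 / (1 + dist[node])
        for nxt in neighbors(node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return 0.0


def keep_sentence(words, generic_concept, threshold):
    """Hypothetical rule: keep the sentence if its most similar word
    reaches the threshold; words outside the taxonomy score 0.0."""
    best = max((path_similarity(w, generic_concept) for w in words), default=0.0)
    return best >= threshold
```

For example, with "part" as the generic concept and a threshold of 0.4, the tokens ["the", "residue", "mutation"] are kept (residue is one edge from part, score 0.5), while ["the", "mutation"] are dropped (four edges, score 0.2). Tokenization and lower-casing are assumed to happen beforehand.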