A robust approach to optimizing multi-source information for enhancing genomics retrieval performance

BMC Bioinformatics

Table 6 Performance of the Fusion Approach on Okapi 2007 and 2006

Components	Okapi 2007			Okapi 2006
	document	aspect	passage2	document	aspect
word	0.2108	0.1080	0.0364	0.3140	0.1237
sentence	0.1805	0.0970	0.0350	0.3030	0.1206
paragraph	0.1588	0.0616	0.0333	0.3109	0.1410
reciprocal	0.2219 (5.29%)	0.1237 (14.51%)	0.0478 (31.40%)	0.3168 (1.07%)	0.1449 (12.25%)
CombMNZ-with-normalization	0.1703 (-19.20%)	0.0643 (-40.43%)	0.0270 (-25.92%)	0.2352 (-26.55%)	0.0498 (-61.46%)
CombMNZ-with-assigned-weights	0.1777 (-15.72%)	0.0701 (-35.12%)	0.0273 (-24.88%)	0.2441 (-23.78%)	0.0524 (-59.43%)
CombMNZ-with-multiple	0.1730 (-17.93%)	0.0651 (-39.73%)	0.0277 (-24.01%)	0.2375 (-25.85%)	0.0508 (-60.62%)
CombSUM	0.1818 (-13.76%)	0.0718 (-33.56%)	0.0297 (-18.43%)	0.2559 (-20.10%)	0.0719 (-44.32%)

We examine the proposed robust approach on a single model, Okapi BM25. First, the baselines come from three different indices under the same IR model, BM25, rather than from three kinds of IR models. Second, the three indices are built on the 2007 and 2006 genomics data sets according to three passage extraction methods [11, 12]. Here “word” stands for “word-based”, “sentence” for “sentence-based” and “paragraph” for “paragraph-based”. Third, the Okapi tuning parameters of the selected runs are (k₁, b) = (0.5, 1.3). The values in parentheses are the relative improvements over the best baseline result.
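The parenthesized values can be illustrated with a minimal sketch, assuming "relative improvement" means the percent change of a fused run's score with respect to the highest of the three baseline scores. The function name `relative_improvement` and the variable names are hypothetical and do not come from the paper. Because the scores in the table are rounded to four decimals, recomputing from them may not reproduce the printed percentages exactly.

```python
def relative_improvement(score, baselines):
    """Percent change of `score` relative to the best (highest) baseline."""
    best = max(baselines)
    return (score - best) / best * 100.0


# Okapi 2007 document-level scores copied from Table 6.
baselines_2007_document = [0.2108, 0.1805, 0.1588]  # word, sentence, paragraph
reciprocal = 0.2219
comb_sum = 0.1818

# Positive values mean the fused run beats the best baseline,
# negative values mean it falls below it.
print(f"reciprocal: {relative_improvement(reciprocal, baselines_2007_document):+.2f}%")
print(f"CombSUM:    {relative_improvement(comb_sum, baselines_2007_document):+.2f}%")
```

Under this assumption the sign of each value shows directly whether a fusion method helps or hurts relative to the strongest single index.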

ISSN: 1471-2105