In this section, we first define a baseline combination problem formally. Then, we introduce the three combination methods used: reciprocal rank fusion, CombMNZ and CombSUM. After that, we give a brief review of three IR models: DFR, BM25 and the language model. Finally, we present the related work in detail.

### Problem definition

In this paper we focus on exploring a multi-source fusion approach for a metasearch system, where the metasearch approach has access to multiple IR systems that retrieve and rank documents/passages with their own models. We are interested in a scenario in which the proposed approach concerns only the baselines retrieved by the IR models and then re-ranks the results as the output for evaluation.

For simplicity, throughout this paper we assume that the proposed approach works on three kinds of baselines: 1) a DFR baseline, *B*_{1}; 2) a BM25 baseline, *B*_{2}; and 3) a language model baseline, *B*_{3}. Furthermore, we select these baselines from the official submissions of the TREC 2007 Genomics Track. In addition, considering the performance range and effectiveness of the baselines, we try to choose more than one base run per model, covering both higher- and lower-performing runs. Since DFR is often used in fusion as one of the components, only one run, “UniNE1” from University of Neuchatel [15], used DFR as a single model without combining many other models. Hence, we choose “UniNE1” as the DFR seed *B*_{1} in the proposed metasearch system. For BM25, we choose two baselines: “MuMshFd”, *B*_{21}, from University of Melbourne [16] and “york07ga2”, *B*_{22}, from York University [17]. We also choose two language model baselines: “UBexp1”, *B*_{31}, from University at Buffalo [18] and “kyoto1”, *B*_{32}, from Kyoto University [19]. Hence, given a query *q*, we collect all documents retrieved by the three baselines *B*_{1}, *B*_{2i} and *B*_{3j} (where *i*, *j* = 1, 2) into a set *D*, and the corresponding weights of the documents into a set *R*. Based on the combination methods (reciprocal rank fusion, CombMNZ and CombSUM), the proposed approach re-ranks the documents/passages as the new output.

### Reciprocal

Our intuition in choosing the reciprocal method, given as Equation 1, derives from the behaviour of an exponentially decaying function: highly ranked documents are more important than lower ranked documents. Reciprocal rank fusion simply sorts the documents according to a naive scoring formula. Given a set *D* of documents to be ranked and a set of rankings *R*, for each permutation on 1..*|D|*, we compute

$$RRFscore(d \in D) = \sum_{r \in R} \frac{1}{k + r(d)} \qquad (1)$$

where *r*(*d*) stands for the weight of the document, and the constant *k* mitigates the impact of high weights. We fixed *k* = 60 [20] during a pilot investigation and did not alter it during subsequent validation; we omit the details because of space limitations.
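The reciprocal scoring above can be sketched in a few lines of Python. This is a minimal sketch (the function name and the input format, one ranked list of document IDs per baseline, are our own assumptions):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each document scores the sum of
    1/(k + rank) over every ranking in which it appears (k = 60,
    as fixed in the text)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Sort documents by fused score, best first
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked highly by two baselines ends up ahead of one ranked highly by only one of them, which matches the intuition that agreement between rankings is evidence of relevance.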

### CombMNZ

Fox and Shaw [10] introduced several combination methods, namely CombMAX, CombMIN, CombSUM, CombANZ, CombMNZ and CombMED, and found CombSUM to be the best performing combination method. Lee [9] conducted extensive experiments with the Fox and Shaw combination methods on the TREC data, and found that CombMNZ emerged as the best combination method. In this paper, we apply CombMNZ as part of the proposed fusion framework.

CombMNZ requires for each *r* a corresponding scoring function *s*_{r} : *D* → ℝ and a cutoff rank *c*, which all contribute to the CombMNZ score:

$$CombMNZscore(d \in D) = |\{r \in R \mid r(d) \le c\}| \cdot \sum_{r \in R,\ r(d) \le c} s_r(d)$$
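In code, CombMNZ multiplies the summed scores by the number of baselines that retrieved the document. This is a minimal sketch; we assume each baseline is represented as a dict of normalized scores for its documents within the cutoff rank *c*:

```python
def comb_mnz(score_lists):
    """CombMNZ: sum of scores multiplied by the number of baselines
    that retrieved the document. We assume the cutoff rank c was
    already applied when building each score dict."""
    total, hits = {}, {}
    for scores in score_lists:
        for doc, s in scores.items():
            total[doc] = total.get(doc, 0.0) + s
            hits[doc] = hits.get(doc, 0) + 1
    return {doc: total[doc] * hits[doc] for doc in total}
```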

### CombSUM

As one of the famous combination methods proposed by Fox and Shaw [10], CombSUM is defined as the summation of the set of similarity values or, equivalently (up to a constant factor), the numerical mean of the set of similarity values. In [10], the CombSUM method achieved significant improvements over all the baselines, and CombSUM was claimed to perform better than the other methods, such as CombMIN and CombANZ, on the TREC-2 data set. In the image retrieval domain, Chatzichristofis et al. [21] also showed that the CombSUM method improved image retrieval performance. In this paper, we employ the CombSUM method to evaluate its effectiveness in the genomics domain.
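CombSUM differs from CombMNZ only in dropping the multiplication by the number of contributing baselines. A minimal sketch under the same assumptions (one dict of normalized scores per baseline):

```python
def comb_sum(score_lists):
    """CombSUM: the fused score is the plain sum of each baseline's
    (normalized) similarity value for the document."""
    total = {}
    for scores in score_lists:
        for doc, s in scores.items():
            total[doc] = total.get(doc, 0.0) + s
    return total
```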

### IR Systems

In this section, we give a brief review of three well-known weighting models: Okapi BM25 [22], the language model [23, 24], and DFR [25].

#### Divergence from randomness

In the DFR framework, the weight of a query term *t* in a document *d* is the product of three components:

$$w(t, d) = qtw \cdot IG \cdot (-\log_2 Prob(tf))$$

where *IG* is the information gain, which is given by a conditional probability of success of encountering a further token of a given word in a given document on the basis of the statistics of the retrieved set. *Prob*(*tf*) is the probability of observing the document *d* given *tf* occurrences of the query term *t*, so *−log*_{2} *Prob*(*tf*) measures the amount of information that term *t* carries in *d*. *qtw* is the query term weight component. Similarly to the query model in language modeling [24], *qtw* measures the importance of individual query terms. In the DFR framework, the query term weight is given by:

$$qtw = \frac{qtf(t)}{qtf_{max}}$$

where *qtf*(*t*) is the query term frequency of *t*, namely the number of occurrences of *t* in the query, and *qtf*_{max} is the maximum query term frequency in the query.

The other two components, namely the information gain (*IG*) and the information amount (*−log*_{2} *Prob*(*tf*)), can be approximated by different statistics, so that various instantiations of DFR are implemented.

#### Okapi BM25

$$w = \log \frac{(r + 0.5)/(R - r + 0.5)}{(n - r + 0.5)/(N - n - R + r + 0.5)} \cdot \frac{(k_1 + 1)\, tf}{K + tf} \cdot \frac{(k_3 + 1)\, qtf}{k_3 + qtf} + k_2 \cdot nq \cdot \frac{avdl - dl}{avdl + dl}$$

where *w* is the weight of a query term, *N* is the number of indexed documents in the collection, *n* is the number of documents containing the term, *R* is the number of documents known to be relevant to a specific topic, *r* is the number of relevant documents containing the term, *tf* is the within-document term frequency, *qtf* is the within-query term frequency, *dl* is the length of the document, *avdl* is the average document length, *nq* is the number of query terms, the *k*_{i} are tuning constants (which depend on the database and possibly on the nature of the queries, and are determined empirically), and *K* equals *k*_{1} · ((1 − *b*) + *b* · *dl*/*avdl*).
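With no relevance information (*R* = *r* = 0), the relevance component of the weight reduces to a familiar idf-like form. A minimal sketch with commonly used default constants (the function name and defaults are our own assumptions):

```python
import math

def bm25_weight(tf, qtf, n, N, dl, avdl,
                k1=1.2, k2=0.0, k3=8.0, b=0.75, nq=1):
    """Okapi BM25 term weight assuming no relevance feedback
    (R = r = 0), so the relevance component reduces to
    log((N - n + 0.5) / (n + 0.5))."""
    K = k1 * ((1 - b) + b * dl / avdl)
    idf = math.log((N - n + 0.5) / (n + 0.5))
    term_part = (k1 + 1) * tf / (K + tf)
    query_part = (k3 + 1) * qtf / (k3 + qtf)
    # The k2 document-length correction is usually dropped (k2 = 0)
    correction = k2 * nq * (avdl - dl) / (avdl + dl)
    return idf * term_part * query_part + correction
```

The term-frequency component saturates as *tf* grows, so repeated occurrences of a term add progressively less to the score.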

#### Language model

$$w = \log \frac{tf + mu \cdot FreqTotColl / F_t}{l + mu}$$

where *w* is the weight of a query term, *tf* is the within-document term frequency, *FreqTotColl* is the within-collection term frequency, *l* is the document length, *F*_{t} is the length of the whole collection, and *mu* is a tuning constant.
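This smoothing scheme corresponds to Dirichlet-prior smoothing of the document language model with the collection model. A minimal sketch (function and argument names are ours, with *mu* = 2500 as a commonly used default):

```python
import math

def lm_dirichlet_weight(tf, l, coll_tf, coll_len, mu=2500):
    """Dirichlet-smoothed language model term weight:
    log((tf + mu * P(t|C)) / (l + mu)), where the collection model
    is P(t|C) = coll_tf / coll_len."""
    p_coll = coll_tf / coll_len
    return math.log((tf + mu * p_coll) / (l + mu))
```

The collection model keeps the weight finite for query terms absent from the document (*tf* = 0), while terms frequent in the document still score higher.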

### Related work

A lot of previous work has been done on result combination. In the TREC 2007 Genomics Track, more than seven teams utilized result combination to improve their final submissions, out of a total of 66 runs by 27 teams. “NLMFusion”, submitted by the team of the National Library of Medicine [8], was the top-scoring automatic run on all three metrics (passage2-level, aspect-level and document-level), suggesting that combining results from different IR models may improve the final score. “NLMFusion” is an automatic run obtained by applying fusion to a LHNCBC run, a Terrier run, an NCBI Themes run, an INDRI run and an easyIR run. However, not all teams using fusion/combination achieved successful improvements. The teams from University of Neuchatel [15], the European Bioinformatics Institute [26], Kyoto University [19] and others showed slight declines in performance relative to their non-fusion/non-combination runs. Nevertheless, the different methods each team used for fusing its individual runs may have contributed to the differences in performance.

Divergence from randomness (DFR) [3], one of the five individual runs used in “NLMFusion”, was reported to be the highest scoring subcomponent run in the TREC 2007 Genomics Track. “UniNE3” [15], the fusion run submitted by University of Neuchatel, also reported success in using it. Although DFR was often used in fusion as one of the components, appearing in 49 automatic submissions in 2007, only one run, “UniNE1” from University of Neuchatel [15], used DFR as a single model without combining many other models.

Okapi BM25, one of the best-known probabilistic weighting functions, was very popular in the TREC Genomics Tracks. “MuMshFd”, the run submitted by University of Melbourne [16], obtained the highest passage2-level, aspect-level and document-level scores among all the BM25 submissions. Other teams who applied the Okapi BM25 model, such as those from York University [17] and University of Illinois at Chicago [27], obtained performance around the mean MAP on all the evaluation measures. However, “DUTgen3”, submitted by Dalian University of Technology [28], which also used the Okapi BM25 model, only reached about the median MAP.

The language model, one of the most well-known statistical models, was also popularly employed by many teams. “AIDrun3”, submitted by Arizona State University [14], “DUTgen1” and “DUTgen2”, submitted by Dalian University of Technology [28], “UBexp1” from University at Buffalo [18] and “kyoto1” from Kyoto University [19] achieved better average performance than the Okapi runs, although no individual run was as good as the Okapi BM25 run “MuMshFd” submitted by University of Melbourne.