Table 1 Experimental setup for evaluating CRFref

From: Identification of highly related references about gene-disease association

Item

Setting

Experimental Data

(1) Gene-disease pairs and candidate references:

  (A) Genes, diseases, and their aliases: All diseases (and their aliases) listed in GHR as well as their associated genes (and their aliases) are downloaded.

  (B) Candidate references: Abstracts of references for each of the gene-disease pairs are collected by querying PubMed.

  (C) Target gene-disease pairs: Target gene-disease pairs are obtained from GHR.

(2) Target references of each target gene-disease pair:

For each target gene-disease pair, all candidate references that GHR curators employed to develop a summary for the pair are the target references for the pair.

Baselines

(1) Vector Space Model (VSM) and term weighting techniques: two techniques, Lucene and BM25;

(2) Proximity-based techniques: two techniques, PRE and PLM, that enhance BM25 with the proximity of the disease and the gene in the references;

(3) Position-and-frequency-based technique: a technique, PosFreq, that ranks candidate references by considering the positions and the frequencies of the disease and the gene in the references;

(4) Integrative techniques: several rankers developed by combining the above techniques with SVMrank.

Evaluation criteria

(1) Mean Average Precision (MAP);

(2) Average Precision at top-X (average P@X), with X set to 1, 2, and 3.
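
The two evaluation criteria in the table can be illustrated with a minimal Python sketch. It uses the textbook definitions of average precision and P@X, which may differ in detail from the formulas in the paper; in particular, average precision is normalised here by the number of target references, which is an assumption. The function names, the `runs` structure, and the toy reference identifiers are hypothetical and serve only as illustration.

```python
from typing import Hashable, List, Sequence, Set, Tuple

# One run = (ranked candidate references, set of target references)
# for a single gene-disease pair. This structure is an assumption.
Run = Tuple[Sequence[Hashable], Set[Hashable]]


def precision_at_k(ranked: Sequence[Hashable], targets: Set[Hashable], k: int) -> float:
    """Fraction of the top-k ranked references that are target references."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for ref in ranked[:k] if ref in targets) / k


def average_precision(ranked: Sequence[Hashable], targets: Set[Hashable]) -> float:
    """Sum of the precision at each rank holding a target, divided by the number of targets."""
    if not targets:
        return 0.0
    hits = 0
    total = 0.0
    for rank, ref in enumerate(ranked, start=1):
        if ref in targets:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(targets)  # assumed normalisation: number of targets


def mean_average_precision(runs: List[Run]) -> float:
    """MAP: average precision averaged over all gene-disease pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, t) for r, t in runs) / len(runs)


def mean_precision_at_k(runs: List[Run], k: int) -> float:
    """Average P@X: P@k averaged over all gene-disease pairs."""
    if not runs:
        return 0.0
    return sum(precision_at_k(r, t, k) for r, t in runs) / len(runs)


if __name__ == "__main__":
    # Toy rankings with made-up reference identifiers.
    runs = [
        (["r1", "r2", "r3", "r4"], {"r1", "r3"}),
        (["a", "b", "c"], {"b"}),
    ]
    print("MAP:", mean_average_precision(runs))
    for x in (1, 2, 3):
        print(f"average P@{x}:", mean_precision_at_k(runs, x))
```

In this sketch a ranker is rewarded by MAP for placing all target references of a pair near the top of the list, while average P@X looks only at the first X positions, which matches the setting X = 1, 2, and 3 in the table.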