From: Identification of highly related references about gene-disease association
| Item | Setting |
|---|---|
| Experimental data | (1) Gene-disease pairs and candidate references: |
| | (A) Genes, diseases, and their aliases: All diseases (and their aliases) listed in GHR, as well as their associated genes (and their aliases), are downloaded. |
| | (B) Candidate references: Abstracts of references for each gene-disease pair are collected by querying PubMed. |
| | (C) Target gene-disease pairs: Target gene-disease pairs are obtained from GHR. |
| | (2) Target references of each target gene-disease pair: For each target gene-disease pair, the target references are all candidate references that GHR curators used to develop a summary for the pair. |
| Baselines | (1) Vector Space Model (VSM) and term weighting techniques: Two techniques, Lucene and BM25; |
| | (2) Proximity-based techniques: Two techniques, PRE and PLM, that enhance BM25 with the proximity of the disease and the gene in the references; |
| | (3) Position-and-frequency-based technique: One technique, PosFreq, that ranks candidate references by considering the positions and the frequencies of the disease and the gene in the references; |
| | (4) Integrative techniques: Several rankers developed by combining the above techniques with SVMrank. |
| Evaluation criteria | (1) Mean Average Precision (MAP); |
| | (2) Average Precision at top-X (average P@X), with X set to 1, 2, and 3. |
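
The two evaluation criteria can be illustrated with a minimal sketch. It assumes the standard information-retrieval definitions of average precision and P@X, and that every target reference of a pair appears somewhere in that pair's ranked candidate list; the source does not spell out its exact formulas, so these details are assumptions. The function names, the pair labels, and the reference identifiers below are hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, List, Set

Ranking = Dict[Hashable, List[str]]   # pair -> candidate references, best first
Targets = Dict[Hashable, Set[str]]    # pair -> target references for that pair


def precision_at_x(ranked: List[str], targets: Set[str], x: int) -> float:
    """Fraction of the top-x ranked references that are target references."""
    return sum(1 for ref in ranked[:x] if ref in targets) / x


def average_precision(ranked: List[str], targets: Set[str]) -> float:
    """Mean of the precision values measured at the rank of each target reference."""
    if not targets:
        return 0.0
    hits, total = 0, 0.0
    for rank, ref in enumerate(ranked, start=1):
        if ref in targets:
            hits += 1
            total += hits / rank
    # Assumption: normalise by the number of target references for the pair.
    return total / len(targets)


def mean_average_precision(rankings: Ranking, targets: Targets) -> float:
    """MAP: average precision averaged over all target gene-disease pairs."""
    return sum(average_precision(rankings[p], targets[p]) for p in rankings) / len(rankings)


def average_p_at_x(rankings: Ranking, targets: Targets, x: int) -> float:
    """Average P@X: P@X averaged over all target gene-disease pairs."""
    return sum(precision_at_x(rankings[p], targets[p], x) for p in rankings) / len(rankings)


# Made-up toy data: two gene-disease pairs with hypothetical reference IDs.
example_rankings: Ranking = {
    ("gene A", "disease A"): ["p1", "p2", "p3", "p4"],
    ("gene B", "disease B"): ["q1", "q2", "q3"],
}
example_targets: Targets = {
    ("gene A", "disease A"): {"p1", "p3"},
    ("gene B", "disease B"): {"q2"},
}

print("MAP:", mean_average_precision(example_rankings, example_targets))
for x in (1, 2, 3):
    print(f"average P@{x}:", average_p_at_x(example_rankings, example_targets, x))
```

In this toy case the first pair has average precision (1/1 + 2/3) / 2 = 5/6 and the second has 1/2, so MAP is 2/3. P@X with X = 1, 2, 3 mirrors the settings listed in the table and rewards rankers that place target references at the very top of the list.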