
Table 1 Summary results of different methods on the four simulated datasets

From: A new method to measure the semantic similarity from query phenotypic abnormalities to diseases based on the human phenotype ontology

Dataset 1 (Noise: -, Imprecision: -)
        Resnik  Lin   JC    Rel   IC    GraphIC  Wang  RBP
Top 1   1027    1016  1029  1018  1021  1029     1023  1031
Top 5   1087    1071  1082  1071  1075  1079     1078  1091
Top 10  1089    1077  1088  1077  1079  1081     1081  1095
Top 20  1092    1078  1092  1078  1080  1083     1081  1096

Dataset 2 (Noise: +, Imprecision: -)
        Resnik  Lin   JC    Rel   IC    GraphIC  Wang  RBP
Top 1   992     997   1036  996   1006  1031     1001  1030
Top 5   1074    1059  1081  1063  1070  1077     1071  1089
Top 10  1081    1069  1086  1071  1077  1080     1078  1094
Top 20  1087    1074  1089  1076  1078  1083     1079  1095

Dataset 3 (Noise: -, Imprecision: +)
        Resnik  Lin   JC    Rel   IC    GraphIC  Wang  RBP
Top 1   434     243   104   302   336   120      172   438
Top 5   767     502   261   583   603   341      446   765
Top 10  866     613   342   685   707   482      604   863
Top 20  926     714   440   785   797   620      725   926

Dataset 4 (Noise: +, Imprecision: +)
        Resnik  Lin   JC    Rel   IC    GraphIC  Wang  RBP
Top 1   183     130   97    143   162   73       77    370
Top 5   453     327   239   383   406   252      263   694
Top 10  579     452   319   509   533   393      384   786
Top 20  703     570   420   640   657   540      535   860

  1. Resnik: the Resnik measure; Lin: the Lin measure; JC: the Jiang-Conrath measure; Rel: the Relevance measure; IC: the information coefficient measure; GraphIC: the graph IC measure; Wang: the Wang measure; RBP: the RelativeBestPair method.
  2. The seven existing measures are all implemented with a one-sided search algorithm. Each entry is the number of patients, out of 1100 cases, whose true disease is ranked within the top 1, top 5, top 10 or top 20.
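
As an illustration of how the entries in Table 1 can be read, the following minimal Python sketch counts, for each cutoff, how many patients have their true disease ranked within that cutoff, and converts a count into a fraction of the 1100 cases. The function names and the six example ranks are hypothetical and are not taken from the paper; only the cutoffs (1, 5, 10, 20), the total of 1100 cases, and the RBP Top 1 count for Dataset 4 (370) come from the table.

```python
from typing import Dict, Iterable, Sequence


def top_k_counts(
    true_disease_ranks: Sequence[int],
    cutoffs: Iterable[int] = (1, 5, 10, 20),
) -> Dict[int, int]:
    """Count patients whose true disease is ranked within each cutoff.

    Ranks are assumed to be 1-based (rank 1 = best match).
    """
    return {k: sum(1 for r in true_disease_ranks if r <= k) for k in cutoffs}


def hit_rate(count: int, n_cases: int = 1100) -> float:
    """Convert a top-k count into a fraction of all simulated cases."""
    return count / n_cases


# Hypothetical ranks of the true disease for six simulated patients.
example_ranks = [1, 3, 7, 1, 25, 12]
print(top_k_counts(example_ranks))  # {1: 2, 5: 3, 10: 4, 20: 5}

# RBP, Dataset 4, Top 1: 370 of 1100 cases.
print(round(hit_rate(370), 3))  # 0.336
```

Because the counts are cumulative, each column of the table can only stay the same or grow from Top 1 to Top 20.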