Abstract
The shortest path between two concepts in a taxonomic ontology is commonly used to represent their semantic distance in edge-based semantic similarity measures. Edge counting, which is simple, intuitive, and computationally cheap, has traditionally been the default method for path computation. However, a large lexical taxonomy such as WordNet covers a broad domain and therefore has irregular link densities between concepts, and edge counting-based path computation cannot account for this non-uniformity. In this paper, we argue that path computation can be separated from edge-based similarity measures and formulated as a family of general computing models. To address the non-uniformity of concept density in a large taxonomic ontology, we propose a new path computing model that compensates for the local area density of concepts, defined as the number of direct hyponyms of the subsumers of the concepts on the shortest path. Following information theory, the model treats the local area density of concepts as an extension of the edge counting-based path. It is a general path computing model and can be applied in various edge-based similarity approaches. The experimental results show that the proposed path model improves the average optimal correlation between edge-based measures and human judgments from less than 0.79 to more than 0.86 on the Miller and Charles benchmark for WordNet, and from less than 0.75 to more than 0.82 on the Pedersen et al. benchmark (average of both Physician and Coder) for SNOMED-CT. It is also considerably more efficient than information content computation in a dynamic ontology, so the improved edge-based similarity measure combines high performance with high efficiency.
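The two quantities named in the abstract, the edge-counting path and the local area density, can be illustrated on a small example. The following is a minimal sketch only: the toy taxonomy, the function names, and the decision to sum the direct-hyponym counts over every subsumer on the shortest path are assumptions made for illustration, and the way the paper combines density with the edge count is not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy taxonomy (child -> parent); a tree with single inheritance.
PARENT = {
    "animal": "entity", "artifact": "entity",
    "bird": "animal", "mammal": "animal", "fish": "animal",
    "cock": "bird", "crane": "bird", "sparrow": "bird",
    "dog": "mammal",
    "tool": "artifact",
}

# Direct hyponyms (children) of every concept.
CHILDREN = defaultdict(list)
for child, parent in PARENT.items():
    CHILDREN[parent].append(child)


def chain_to_root(concept):
    """Return the concept followed by all of its subsumers up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def shortest_path(a, b):
    """Return (lcs, nodes from a up to lcs, nodes from b up to lcs)."""
    up_a, up_b = chain_to_root(a), chain_to_root(b)
    in_b = set(up_b)
    lcs = next(c for c in up_a if c in in_b)  # least common subsumer
    return lcs, up_a[: up_a.index(lcs) + 1], up_b[: up_b.index(lcs) + 1]


def edge_count(a, b):
    """Classic edge-counting path length through the least common subsumer."""
    _, side_a, side_b = shortest_path(a, b)
    return (len(side_a) - 1) + (len(side_b) - 1)


def area_density(a, b):
    """Assumed reading: sum of direct-hyponym counts of all subsumers on the path."""
    _, side_a, side_b = shortest_path(a, b)
    subsumers = set(side_a[1:]) | set(side_b[1:])
    return sum(len(CHILDREN[s]) for s in subsumers)


if __name__ == "__main__":
    for pair in [("cock", "crane"), ("dog", "cock"), ("cock", "tool")]:
        print(pair, edge_count(*pair), area_density(*pair))
```

In this sketch two pairs with the same edge count can still differ in area density, which is the kind of non-uniformity that pure edge counting ignores.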
References
Srihari RK, Zhang ZF, Rao A (2000) Intelligent indexing and semantic retrieval of multimodal documents. Inf Retr 2(2–3):245–275
Patwardhan S, Banerjee S, Pedersen T (2003) Using measures of semantic relatedness for word sense disambiguation. In: Proceedings of computational linguistics and intelligent text, pp 241–257
Sánchez D, Moreno A (2008) Learning non-taxonomic relationships from web documents for domain ontology construction. Data Knowl Eng 64(3):600–623
Budanitsky A, Hirst G (2001) Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In: Workshop on WordNet and other lexical resources, Second meeting of the North American chapter of the association for computational linguistics, vol 2, issue 12, pp 29–34
Liu X, Zhou Y, Zheng R (2007) Measuring semantic similarity in WordNet. In: Proceedings of machine learning and cybernetics, pp 3431–3435
Kozima H (1994) Computing lexical cohesion as a tool for text analysis. Ph.D. thesis, Computer Science and Information Mathematics, Graduate School of Electro-Communications, University of Electro-Communications
Wu Z, Palmer M (1994) Verbs semantics and lexical selection. In: Proceedings of association for computational linguistics, pp 133–138
Seco N, Veale T, Hayes J (2004) An intrinsic information content metric for semantic similarity in WordNet. In: Proceedings of artificial intelligence, pp 1089–1090
Rodríguez MA, Egenhofer MJ (2003) Determining semantic similarity among entity classes from different ontologies. IEEE Trans Knowl Data Eng 15(2):442–456
Zhou Z, Wang Y, Gu J (2008) New model of semantic similarity measuring in WordNet. In: Proceedings of intelligent system and knowledge engineering, pp 256–261
Hao D, Zuo WL, Peng T (2011) An approach for calculating semantic similarity between words using WordNet. In: Proceedings of digital manufacturing and automation, pp 177–180
Jiang JJ, Conrath DW (1997) Semantic similarity based on corpus statistics and lexical taxonomy. In: Proceedings of research in computational linguistics, pp 19–33
Li Y, Bandar Z, McLean D (2003) An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans Knowl Data Eng 15(4):871–882
Hadj Taieb MA, Aouicha MB, Hamadou AB (2014) Ontology-based approach for measuring semantic similarity. Eng Appl Artif Intell 36(8):238–261
Rada R, Mili H, Bicknell E et al (1989) Development and application of a metric on semantic nets. IEEE Trans Syst Man Cybern 19(1):17–30
Resnik P (1999) Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J Artif Intell Res 11(4):95–130
Borgida A, Walsh T, Hirsh H (2005) Towards measuring similarity in description logics. In: 2005 international workshop on description logics, pp 286–294
Claudia D (2007) Similarity-based learning methods for the semantic web. Ph.D. thesis, Department of Computer Science, University of Bari, Italy
Claudia D, Steffen S, Nicola F (2008) On the influence of description logics ontologies on conceptual similarity. In: Proceedings of knowledge engineering: practice and patterns, pp 48–63
Jan R (2002) Clustering and instance based learning in first order logic. Ph.D. thesis, Department of Computer Science, Leuven, Belgium
Hirst G, St-Onge D (1998) Lexical chains as representations of context for the detection and correction of malapropisms. MIT Press, Cambridge, pp 305–322
Leacock C, Chodorow M (1998) Combining local context and WordNet similarity for word sense identification. MIT Press, Cambridge, pp 265–283
Lin D (1998) An information-theoretic definition of similarity. In: Proceedings of machine learning, pp 296–304
Resnik P (1995) Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of artificial intelligence, pp 448–453
Meng L, Gu J, Zhou Z (2012) A new model of information content based on concept’s topology for measuring semantic similarity in WordNet. Int J Grid Distrib Comput 5(3):81–94
Devitt A, Vogel C (2004) The topology of WordNet: some metrics. In: Proceedings of global WordNet conference, pp 106–111
Spackman KA (2004) SNOMED CT milestones: endorsements are added to already impressive standards credentials. Healthc Inform Bus Mag Inf Commun Syst 21(9):54–56
Harispe S, Ranwez S, Janaqi S et al (2015) Semantic similarity from natural language and ontology analysis. Synth Lect Hum Lang Technol 8(1):254
Miller GA, Charles WG (1991) Contextual correlates of semantic similarity. Lang Cognit Process 6(1):1–28
Kipper KS (2006) VerbNet: a broad-coverage comprehensive verb lexicon. http://repository.upenn.edu/dissertations/AAI3179808
Baker CF, Fillmore CJ, Lowe JB (1998) The Berkeley FrameNet project. In: Proceedings of the 36th annual meeting of the association for computational linguistics and 17th international conference on computational linguistics, pp 86–90
Richardson SD, Dolan WB, Vanderwende L (1998) MindNet: acquiring and structuring semantic information from text. In: Proceedings of the 36th annual meeting of the association for computational linguistics and 17th international conference on computational linguistics, pp 1098–1102
Hadj Taieb MA, Aouicha MB, Hamadou AB (2014) A new semantic relatedness measurement using WordNet features. Knowl Inf Syst 41(2):467–497
Yang D, Powers D (2006) Verb similarity on the taxonomy of WordNet. In: Proceedings of global WordNet conference, pp 177–178
Sussna M (1993) Word sense disambiguation for free-text indexing using a massive semantic network. In: Proceedings of information and knowledge management, pp 67–74
Sánchez D, Batet M (2011) Ontology-based information content computation. Knowl Based Syst 24(2):297–303
Patwardhan S, Pedersen T (2006) Using WordNet-based context vectors to estimate the semantic relatedness of concepts. In: Proceedings of EACL 2006 workshop on making sense of sense: bringing computational linguistics and psycholinguistics together, pp 1–8
Petrakis E, Varelas G, Hliaoutakis A et al (2006) X-similarity: computing semantic similarity between concepts from different ontologies. J Digit Inf Manag 4(4):233–237
Rubenstein H, Goodenough JB (1965) Contextual correlates of synonymy. Commun Assoc Comput Mach 8(10):627–633
Pedersen T, Pakhomov SVS, Patwardhan S, Chute CG (2007) Measures of semantic similarity and relatedness in the biomedical domain. J Biomed Inform 40(3):288–299
Princeton University (2014) The MIT Java Wordnet interface. http://projects.csail.mit.edu/jwi/
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Contract Numbers 61363036 and 61462010, and by the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.
Appendices
Appendix A: Pedersen et al. clinical term dataset
Term 1 | ConceptId1 | Term 2 | ConceptId2 | Physician | Coder | Both (average) |
---|---|---|---|---|---|---|
Renal failure | 42399005 | Kidney failure | 42399005 | 4 | 4 | 4 |
Heart | 80891009 | Myocardium | 74281007 | 3.3 | 3 | 3.15 |
Stroke | 427296003 | Infarct | 427296003 | 3 | 2.8 | 2.9 |
Abortion | 17369002 | Miscarriage | 17369002 | 3 | 3.3 | 3.15 |
Delusion | 48500005 | Schizophrenia | 58214004 | 3 | 2.2 | 2.6 |
Congestive heart failure | 42343007 | Pulmonary edema | 19242006 | 3 | 1.4 | 2.2 |
Metastasis | 128462008 | Adenocarcinoma | 443961001 | 2.7 | 1.8 | 2.25 |
Calcification | 125369001 | Stenosis | 415582006 | 2.7 | 2 | 2.35 |
Diarrhea | 62315008 | Stomach cramps | 51197009 | 2.3 | 1.3 | 1.8 |
Mitral stenosis | 79619009 | Atrial fibrillation | 49436004 | 2.3 | 1.3 | 1.8 |
Chronic obstructive pulmonary disease | 313297008 | Lung infiltrates | 19242006 | 2.3 | 1.9 | 2.1 |
Rheumatoid arthritis | 69896004 | Lupus | 200936003 | 2 | 1.1 | 1.55 |
Brain tumor | 254935002 | Intracranial hemorrhage | 1386000 | 2 | 1.3 | 1.65 |
Carpal tunnel syndrome | 57406009 | Osteoarthritis | 396275006 | 2 | 1.1 | 1.55 |
Diabetes mellitus | 73211009 | Hypertension | 38341003 | 2 | 1 | 1.5 |
Acne | 13101006 | Syringes | 228399008 | 2 | 1 | 1.5 |
Antibiotic | 255631004 | Allergy | 705097000 | 1.7 | 1.2 | 1.45 |
Cortisone | 32498003 | Total knee replacement | 179344006 | 1.7 | 1 | 1.35 |
Pulmonary embolus | 194883006 | Myocardial infarction | 22298006 | 1.7 | 1.2 | 1.45 |
Pulmonary fibrosis | 51615001 | Lung cancer | 363358000 | 1.7 | 1.4 | 1.55 |
Cholangiocarcinoma | 70179006 | Colonoscopy | 73761001 | 1.3 | 1 | 1.15 |
Lymphoid hyperplasia | 128863005 | Laryngeal cancer | 363429002 | 1.3 | 1 | 1.15 |
Multiple sclerosis | 24700007 | Psychosis | 69322001 | 1 | 1 | 1 |
Appendicitis | 74400008 | Osteoporosis | 64859006 | 1 | 1 | 1 |
Rectal polyp | 39772007 | Aorta | 15825003 | 1 | 1 | 1 |
Xerostomia | 87715008 | Alcoholic cirrhosis | 420054005 | 1 | 1 | 1 |
Peptic ulcer disease | 13200003 | Myopia | 57190000 | 1 | 1 | 1 |
Depression | 35489007 | Cellulitis | 128045006 | 1 | 1 | 1 |
Varicose vein | 12856003 | Entire knee meniscus | 244568009 | 1 | 1 | 1 |
Hyperlipidemia | 55822004 | Metastasis | 363346000 | 1 | 1 | 1 |
Appendix B: Topological parameters on SNOMED-CT (2016)
Liu-1 on Pedersen30 (\(\lambda =0.3\))
ConceptId1 | ConceptId2 | LCSDepth | PathLen | AreaDensity | AreaDepth | Value |
---|---|---|---|---|---|---|
42399005 | 42399005 | 11 | 0 | 0 | 11 | 1.0 |
80891009 | 74281007 | 13 | 3 | 29 | 14 | 0.7654 |
427296003 | 427296003 | 10 | 0 | 0 | 10 | 1.0 |
17369002 | 17369002 | 7 | 0 | 0 | 7 | 1.0 |
48500005 | 58214004 | 3 | 3 | 99 | 4 | 0.2074 |
42343007 | 19242006 | 7 | 6 | 307 | 9 | 0.2816 |
128462008 | 443961001 | 5 | 2 | 154 | 5.67 | 0.3093 |
125369001 | 415582006 | 4 | 6 | 151 | 6 | 0.2116 |
62315008 | 51197009 | 2 | 10 | 321 | 5.3 | 0.0606 |
79619009 | 49436004 | 9 | 6 | 192 | 11 | 0.4213 |
313297008 | 19242006 | 8 | 3 | 118 | 9 | 0.5119 |
69896004 | 200936003 | 3 | 2 | 56 | 3.67 | 0.2931 |
254935002 | 1386000 | 6 | 3 | 125 | 7 | 0.3949 |
57406009 | 396275006 | 2 | 9 | 380 | 5 | 0.0541 |
73211009 | 38341003 | 4 | 4 | 140 | 5.33 | 0.2344 |
13101006 | 228399008 | 0 | 10 | 266 | 4 | 0.0 |
255631004 | 705097000 | 0 | 12 | 630 | 4 | 0.0 |
32498003 | 179344006 | 0 | 18 | 871 | 6 | 0.0 |
194883006 | 22298006 | 5 | 6 | 174 | 7 | 0.2525 |
51615001 | 363358000 | 8 | 3 | 146 | 9 | 0.4804 |
70179006 | 73761001 | 0 | 19 | 1430 | 6.33 | 0.0 |
128863005 | 363429002 | 3 | 8 | 270 | 5.67 | 0.1090 |
24700007 | 69322001 | 2 | 5 | 407 | 3.67 | 0.0454 |
74400008 | 64859006 | 3 | 8 | 455 | 5.67 | 0.0784 |
39772007 | 15825003 | 0 | 6 | 203 | 7 | 0.0 |
87715008 | 420054005 | 6 | 8 | 148 | 8.67 | 0.2936 |
13200003 | 57190000 | 4 | 10 | 149 | 7.33 | 0.1843 |
35489007 | 128045006 | 2 | 5 | 228 | 3.67 | 0.0714 |
12856003 | 244568009 | 1 | 14 | 201 | 5.67 | 0.0356 |
55822004 | 363346000 | 2 | 7 | 261 | 4.33 | 0.0676 |
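The abstract evaluates a measure by its correlation with human judgments. As a rough illustration, the sketch below computes a Pearson correlation between the "Both (average)" ratings of Appendix A and the Liu-1 values in the table above, paired in table order. The `pearson` helper and the `PAIRS` name are introduced here for illustration; the choice of Pearson correlation and of the averaged rating column is an assumption, and the result at \(\lambda =0.3\) need not match the optimal correlations quoted in the abstract.

```python
from math import sqrt

# (human rating "Both (average)" from Appendix A, Liu-1 value from Appendix B),
# one tuple per concept pair, in the same row order as the two tables.
PAIRS = [
    (4.0, 1.0), (3.15, 0.7654), (2.9, 1.0), (3.15, 1.0), (2.6, 0.2074),
    (2.2, 0.2816), (2.25, 0.3093), (2.35, 0.2116), (1.8, 0.0606), (1.8, 0.4213),
    (2.1, 0.5119), (1.55, 0.2931), (1.65, 0.3949), (1.55, 0.0541), (1.5, 0.2344),
    (1.5, 0.0), (1.45, 0.0), (1.35, 0.0), (1.45, 0.2525), (1.55, 0.4804),
    (1.15, 0.0), (1.15, 0.1090), (1.0, 0.0454), (1.0, 0.0784), (1.0, 0.0),
    (1.0, 0.2936), (1.0, 0.1843), (1.0, 0.0714), (1.0, 0.0356), (1.0, 0.0676),
]


def pearson(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


human = [h for h, _ in PAIRS]
measure = [v for _, v in PAIRS]
r = pearson(human, measure)

if __name__ == "__main__":
    print(f"Pearson r over {len(PAIRS)} pairs: {r:.4f}")
```

Replacing the first element of each tuple with the Physician or Coder column would give the per-group correlations in the same way.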
Appendix C: Topological parameters on WordNet 3.0
Liu-1 on MC30 (\(\lambda =0.3\))
Word1 | Word2 | LCSDepth | PathLen | AreaDensity | AreaDepth | Value |
---|---|---|---|---|---|---|
Car | Automobile | 10 | 0 | 31 | 10.0 | 1.0 |
Gem | Jewel | 8 | 0 | 7 | 8.0 | 1.0 |
Journey | Voyage | 9 | 1 | 16 | 9.3333 | 0.8438 |
Boy | Lad | 8 | 1 | 11 | 8.3333 | 0.8390 |
Coast | Shore | 4 | 1 | 3 | 4.3333 | 0.7507 |
Asylum | Madhouse | 9 | 1 | 1 | 9.3333 | 0.8880 |
Magician | Wizard | 5 | 0 | 5 | 5.0 | 1.0 |
Midday | Noon | 9 | 0 | 0 | 9.0 | 1.0 |
Furnace | Stove | 4 | 12 | 212 | 8.0 | 0.1542 |
Food | Fruit | 2 | 9 | 121 | 5.0 | 0.1006 |
Bird | Cock | 9 | 1 | 26 | 9.3333 | 0.8167 |
Bird | Crane | 9 | 3 | 49 | 10.0 | 0.6467 |
Tool | Implement | 6 | 1 | 30 | 6.3333 | 0.6926 |
Brother | Monk | 9 | 1 | 3 | 9.3333 | 0.8818 |
Crane | Implement | 5 | 4 | 145 | 6.3333 | 0.2949 |
Lad | Brother | 6 | 4 | 426 | 7.3333 | 0.2029 |
Journey | Car | 0 | 18 | 455 | 6.0 | 0.0 |
Monk | Oracle | 6 | 7 | 475 | 8.3333 | 0.1846 |
Cemetery | Woodland | 2 | 8 | 179 | 4.6667 | 0.0853 |
Food | Rooster | 1 | 15 | 239 | 6.0 | 0.0326 |
Coast | Hill | 3 | 4 | 37 | 4.3333 | 0.2936 |
Forest | Graveyard | 2 | 8 | 179 | 4.6667 | 0.0853 |
Shore | Woodland | 2 | 4 | 82 | 3.3333 | 0.1378 |
Monk | Slave | 6 | 4 | 440 | 7.3333 | 0.1987 |
Coast | Forest | 2 | 5 | 85 | 3.6667 | 0.1320 |
Lad | Wizard | 6 | 4 | 417 | 7.3333 | 0.2057 |
Chord | Smile | 2 | 10 | 122 | 5.3333 | 0.0973 |
Glass | Magician | 3 | 11 | 534 | 6.6667 | 0.0722 |
Noon | String | 1 | 13 | 124 | 5.3333 | 0.0435 |
Rooster | Voyage | 0 | 23 | 400 | 7.6667 | 0.0 |
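The effect of density compensation can be read directly from the table above: several word pairs share the same LCSDepth, PathLen, and AreaDepth and differ only in AreaDensity. The sketch below copies those rows and checks that, within each such group, the Liu-1 value falls as AreaDensity rises. The helper name `value_falls_with_density` is introduced here for illustration, and the check is an observation about these rows, not a statement of the model's formula.

```python
from collections import defaultdict

# (Word1, Word2, LCSDepth, PathLen, AreaDensity, Value), copied from Appendix C.
ROWS = [
    ("Journey", "Voyage", 9, 1, 16, 0.8438),
    ("Asylum", "Madhouse", 9, 1, 1, 0.8880),
    ("Bird", "Cock", 9, 1, 26, 0.8167),
    ("Brother", "Monk", 9, 1, 3, 0.8818),
    ("Lad", "Brother", 6, 4, 426, 0.2029),
    ("Monk", "Slave", 6, 4, 440, 0.1987),
    ("Lad", "Wizard", 6, 4, 417, 0.2057),
]

# Group rows that an edge-counting path cannot tell apart (same depth and length).
groups = defaultdict(list)
for w1, w2, depth, path_len, density, value in ROWS:
    groups[(depth, path_len)].append((density, value, f"{w1}-{w2}"))


def value_falls_with_density(rows):
    """True if the value strictly decreases when rows are sorted by density."""
    values = [value for _, value, _ in sorted(rows)]
    return all(a > b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    for key, rows in groups.items():
        print(key, sorted(rows), value_falls_with_density(rows))
```

With edge counting alone, every pair in a group would receive the same score; the density term is what separates them.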
Appendix D: Liu-1 on RG65 (\(\lambda =0.3\))
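Word1 | Word2 | Value | Word1 | Word2 | Value | Word1 | Word2 | Value |
---|---|---|---|---|---|---|---|---|
Cord | Smile | 0.0438 | Mound | Shore | 0.2914 | Brother | Monk | 0.8818 |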
Rooster | Voyage | 0.0 | Lad | Wizard | 0.2057 | Asylum | Madhouse | 0.8880 |
Noon | String | 0.0435 | Forest | Graveyard | 0.0853 | Furnace | Stove | 0.1542 |
Fruit | Furnace | 0.1774 | Food | Rooster | 0.0326 | Magician | Wizard | 1.0 |
Autograph | Shore | 0.0 | Cemetery | Woodland | 0.0853 | Hill | Mound | 1.0 |
Automobile | Wizard | 0.0687 | Shore | Voyage | 0.0 | Cord | String | 0.7097 |
Mound | Stove | 0.2271 | Bird | Woodland | 0.0880 | Glass | Tumbler | 0.8018 |
Grin | Implement | 0.0 | Coast | Hill | 0.2936 | Grin | Smile | 1.0 |
Asylum | Fruit | 0.2147 | Furnace | Implement | 0.1870 | Serf | Slave | 0.6560 |
Asylum | Monk | 0.0648 | Crane | Rooster | 0.4751 | Journey | Voyage | 0.8438 |
Graveyard | Madhouse | 0.0585 | Hill | Woodland | 0.1297 | Autograph | Signature | 0.8152 |
Glass | Magician | 0.0722 | Car | Journey | 0.0 | Coast | Shore | 0.7507 |
Boy | Rooster | 0.1282 | Cemetery | Mound | 0.0788 | Forest | Woodland | 1.0 |
Cushion | Jewel | 0.2318 | Glass | Jewel | 0.1947 | Implement | Tool | 0.6926 |
Monk | Slave | 0.1987 | Magician | Oracle | 0.1949 | Cock | Rooster | 1.0 |
Asylum | Cemetery | 0.0645 | Crane | Implement | 0.2949 | Boy | Lad | 0.8390 |
Coast | Forest | 0.1320 | Brother | Lad | 0.2029 | Cushion | Pillow | 0.7982 |
Grin | Lad | 0.0 | Sage | Wizard | 0.2002 | Cemetery | Graveyard | 1.0 |
Shore | Woodland | 0.1378 | Oracle | Sage | 0.5047 | Automobile | Car | 1.0 |
Monk | Oracle | 0.1846 | Bird | Crane | 0.6467 | Midday | Noon | 1.0 |
Boy | Sage | 0.1982 | Bird | Cock | 0.8167 | Gem | Jewel | 1.0 |
Automobile | Cushion | 0.2137 | Food | Fruit | 0.1006 |
Cite this article
Zhu, X., Li, F., Chen, H. et al. An efficient path computing model for measuring semantic similarity using edge and density. Knowl Inf Syst 55, 79–111 (2018). https://doi.org/10.1007/s10115-017-1078-5
DOI: https://doi.org/10.1007/s10115-017-1078-5