An efficient path computing model for measuring semantic similarity using edge and density

Zhu, Xinhua; Li, Fei; Chen, Hongchao; Peng, Qi

doi:10.1007/s10115-017-1078-5

Regular Paper
Published: 28 June 2017

Volume 55, pages 79–111 (2018)
Knowledge and Information Systems

Abstract

The shortest path between two concepts in a taxonomic ontology is commonly used to represent the semantic distance between concepts in edge-based semantic similarity measures. In the past, edge counting, which is simple, intuitive, and computationally cheap, was considered the default method for path computation. However, a large lexical taxonomy such as WordNet has irregular link densities between concepts because of its broad domain, and edge counting-based path computation cannot account for this non-uniformity. In this paper, we argue that path computation can be separated from edge-based similarity measures and formulated as a general computing model in its own right. To address the non-uniformity of concept density in a large taxonomic ontology, we propose a new path computing model based on compensating for the local area density of concepts, which is equal to the number of direct hyponyms of the subsumers of the concepts in the shortest path. Following information theory, the model treats the local area density of concepts as an extension of the edge counting-based path. It is a general path computing model and can be applied in various edge-based similarity approaches. The experimental results show that the proposed path model improves the average optimal correlation between edge-based measures and human judgments from less than 0.79 to more than 0.86 on the Miller and Charles benchmark for WordNet, and from less than 0.75 to more than 0.82 on the Pedersen et al. benchmark (average of both Physician and Coder) for SNOMED-CT. It is also considerably more efficient than information content computation in a dynamic ontology, making the edge-based similarity measure both accurate and efficient.
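The two quantities the abstract combines, the edge count of the shortest path and the local area density along it, can be illustrated on a toy taxonomy. The following is a minimal sketch, not the paper's implementation: the taxonomy, the concept names, and the function names are hypothetical, and the density is assumed here to be the sum of direct-hyponym counts over the subsumers that lie on the path. How the paper turns the density into a compensated path length is not stated in the abstract and is deliberately left out.

```python
# Toy is-a taxonomy (child -> parent). All concept names are made up.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "bird": "animal",
    "fish": "animal",
    "mammal": "animal",
    "crane": "bird",
    "cock": "bird",
    "dog": "mammal",
}


def ancestors(concept):
    """Return [concept, parent, grandparent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def direct_hyponyms(concept):
    """Number of direct children of a concept in the toy taxonomy."""
    return sum(1 for parent in PARENT.values() if parent == concept)


def shortest_path(a, b):
    """Nodes on the is-a path from a to b through their lowest common subsumer."""
    up_a, up_b = ancestors(a), ancestors(b)
    lcs = next(node for node in up_a if node in up_b)
    left = up_a[: up_a.index(lcs) + 1]   # a ... lcs
    right = up_b[: up_b.index(lcs)]      # b ... (just below lcs)
    return left + right[::-1], lcs


def path_parameters(a, b):
    """Return (edge_count, area_density) for the pair (a, b)."""
    path, lcs = shortest_path(a, b)
    edge_count = len(path) - 1
    # Assumption: the subsumers on the path are every path node except the
    # two end concepts, unless an end concept is itself the common subsumer.
    subsumers = [n for n in path if n not in (a, b) or n == lcs]
    area_density = sum(direct_hyponyms(n) for n in subsumers)
    return edge_count, area_density


print(path_parameters("crane", "cock"))  # siblings under "bird"
print(path_parameters("crane", "dog"))   # path climbs to "animal"
```

Plain edge counting would rank the two pairs by the first number alone; the second number is the extra signal that a density-aware path model can use to tell a path through a sparsely populated region from one through a crowded region.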

References

Srihari RK, Zhang ZF, Rao A (2000) Intelligent indexing and semantic retrieval of multimodal documents. Inf Retr 2(2–3):245–275
Patwardhan S, Banerjee S, Pedersen T (2003) Using measures of semantic relatedness for word sense disambiguation. In: Proceedings of computational linguistics and intelligent text, pp 241–257
Sánchez D, Moreno A (2008) Learning non-taxonomic relationships from web documents for domain ontology construction. Data Knowl Eng 64(3):600–623
Budanitsky A, Hirst G (2001) Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In: Workshop on WordNet and other lexical resources, Second meeting of the North American chapter of the association for computational linguistics, vol 2, issue 12, pp 29–34
Liu X, Zhou Y, Zheng R (2007) Measuring semantic similarity in WordNet. In: Proceedings of machine learning and cybernetics, pp 3431–3435
Kozima H (1994) Computing lexical cohesion as a tool for text analysis. Ph.D. thesis, Computer Science and Information Mathematics, Graduate School of Electro-Communications, University of Electro-Communications
Wu Z, Palmer M (1994) Verbs semantics and lexical selection. In: Proceedings of on association for computational linguistics, pp 133–138
Seco N, Veale T, Hayes J (2004) An intrinsic information content metric for semantic similarity in WordNet. In: Proceedings of t artificial intelligence, pp 1089–1090
Rodríguez MA, Egenhofer MJ (2003) Determining semantic similarity among entity classes from different ontologies. IEEE Trans Knowl Data Eng 15(2):442–456
Zhou Z, Wang Y, Gu J (2008) New model of semantic similarity measuring in WordNet. In: Proceedings of intelligent system and knowledge engineering, pp 256–261
Hao D, Zuo WL, Peng T (2011) An approach for calculating semantic similarity between words using WordNet. In: Proceeding of digital manufacturing and automation, pp 177–180
Jiang JJ, Conrath DW (1997) Semantic similarity based on corpus statistics and lexical taxonomy. In: Proceedings of research in computational linguistics, pp 19–33
Li Y, Bandar Z, McLean D (2003) An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans Knowl Data Eng 15(4):871–882
Hadj Taieb MA, Aouicha MB, Hamadou AB (2014) Ontology-based approach for measuring semantic similarity. Eng Appl Artif Intell 36(8):238–261
Rada R, Mili H, Bicknell E et al (1989) Development and application of a metric on semantic nets. IEEE Trans Syst Man Cybern 19(1):17–30
Resnik P (1999) Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J Artif Intell Res 11(4):95–130
Borgida A, Walsh T, Hirsh H (2005) Towards measuring similarity in description logics. In: 2005 international workshop on description logics, pp 286–294
Claudia D (2007) Similarity-based learning methods for the semantic web. Ph.D. thesis, Department of Computer Science, University of Bari, Italy
Claudia D, Steffen S, Nicola F (2008) On the influence of description logics ontologies on conceptual similarity. In: Proceeding of knowledge engineering: practice and patterns, pp 48–63
Jan R (2002) Clustering and instance based learning in first order logic. Ph.D. thesis, Department of Computer Science, Leuven, Belgium
Hirst G, St-Onge D (1998) Lexical chains as representations of context for the detection and correction of malapropisms. MIT Press, Cambridge, pp 305–322
Leacock C, Chodorow M (1998) Combining local context and WordNet similarity for word sense identification. MIT Press, Cambridge, pp 265–283
Lin D (1998) An information-theoretic definition of similarity. In: Proceeding of machine learning, pp 296–304
Resnik P (1995) Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of artificial intelligence, pp 448–453
Meng L, Gu J, Zhou Z (2012) A new model of information content based on concept’s topology for measuring semantic similarity in WordNet. Int J Grid Distrib Comput 5(3):81–94
Devitt A, Vogel C (2004) The topology of WordNet: some metrics. In: Proceeding of global Wordnet conference, pp 106–111
Spackman KA (2004) SNOMED CT milestones: endorsements are added to already impressive standards credentials. Healthc Inform Bus Mag Inf Commun Syst 21(9):54–56
Harispe S, Ranwez S, Janaqi S et al (2015) Semantic similarity from natural language and ontology analysis. Synth Lect Hum Lang Technol 8(1):254
Miller GA, Charles WG (1991) Contextual correlates of semantic similarity. Lang Cognit Process 6(1):1–28
Kipper KS (2006) VerbNet: a broad-coverage comprehensive verb lexicon. http://repository.upenn.edu/dissertations/AAI3179808
Baker CF, Fillmore CJ, Lowe JB (1998) The Berkeley FrameNet project. In: Proceeding of the 36th annual meeting of the association for computational linguistics and 17th international conference on computational linguistics, pp 86–90
Richardson SD, Dolan WB, Vanderwende L (1998) MindNet: acquiring and structuring semantic information from text. In: the 36th annual meeting of the association for computational linguistics and 17th international conference on computational linguistics, pp 1098–1102
Hadj Taieb MA, Aouicha MB, Hamadou AB (2014) A new semantic relatedness measurement using WordNet features. Knowl Inf Syst 41(2):467–497
Yang D, Powers D (2006) Verb similarity on the taxonomy of WordNet. In: Proceeding of global WordNet conference, pp 177–178
Sussna M (1993) Word sense disambiguation for free-text indexing using a massive semantic network. In: Proceedings of information and knowledge management, pp 67–74
Sánchez D, Batet M (2011) Ontology-based information content computation. Knowl Based Syst 24(2):297–303
Patwardhan S, Pedersen T (2006) Using WordNet-based context vectors to estimate the semantic relatedness of concepts. In: Proceedings of EACL 2006 workshop on making sense of sense: bringing computational linguistics and psycholinguistics together, pp 1–8
Petrakis E, Varelas G, Hliaoutakis A et al (2006) X-similarity: computing semantic similarity between concepts from different ontologies. J Digit Inf Manag 4(4):233–237
Rubenstein H, Goodenough JB (1965) Contextual correlates of synonymy. Commun Assoc Comput Mach 8(10):627–633
Pedersen T, Pakhomov SVS, Patwardhan S, Chute CG (2007) Measures of semantic similarity and relatedness in the biomedical domain. J Biomed Inform 40(3):288–299
Princeton University (2014) The MIT Java Wordnet interface. http://projects.csail.mit.edu/jwi/

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Contract Numbers 61363036 and 61462010, and by the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.

Author information

Authors and Affiliations

Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, 541004, China
Xinhua Zhu, Fei Li, Hongchao Chen & Qi Peng

Corresponding author

Correspondence to Fei Li.

Appendices

Appendix A: Pedersen et al. clinical term dataset

Term 1	ConceptId1	Term 2	ConceptId2	Physician	Coder	Both (average)
Renal failure	42399005	Kidney failure	42399005	4	4	4
Heart	80891009	Myocardium	74281007	3.3	3	3.15
Stroke	427296003	Infarct	427296003	3	2.8	2.9
Abortion	17369002	Miscarriage	17369002	3	3.3	3.15
Delusion	48500005	Schizophrenia	58214004	3	2.2	2.6
Congestive heart failure	42343007	Pulmonary edema	19242006	3	1.4	2.2
Metastasis	128462008	Adenocarcinoma	443961001	2.7	1.8	2.25
Calcification	125369001	Stenosis	415582006	2.7	2	2.35
Diarrhea	62315008	Stomach cramps	51197009	2.3	1.3	1.8
Mitral stenosis	79619009	Atrial fibrillation	49436004	2.3	1.3	1.8
Chronic obstructive pulmonary disease	313297008	Lung infiltrates	19242006	2.3	1.9	2.1
Rheumatoid arthritis	69896004	Lupus	200936003	2	1.1	1.55
Brain tumor	254935002	Intracranial hemorrhage	1386000	2	1.3	1.65
Carpal tunnel syndrome	57406009	Osteoarthritis	396275006	2	1.1	1.55
Diabetes mellitus	73211009	Hypertension	38341003	2	1	1.5
Acne	13101006	Syringes	228399008	2	1	1.5
Antibiotic	255631004	Allergy	705097000	1.7	1.2	1.45
Cortisone	32498003	Total knee replacement	179344006	1.7	1	1.35
Pulmonary embolus	194883006	Myocardial infarction	22298006	1.7	1.2	1.45
Pulmonary fibrosis	51615001	Lung cancer	363358000	1.7	1.4	1.55
Cholangiocarcinoma	70179006	Colonoscopy	73761001	1.3	1	1.15
Lymphoid hyperplasia	128863005	Laryngeal cancer	363429002	1.3	1	1.15
Multiple sclerosis	24700007	Psychosis	69322001	1	1	1
Appendicitis	74400008	Osteoporosis	64859006	1	1	1
Rectal polyp	39772007	Aorta	15825003	1	1	1
Xerostomia	87715008	Alcoholic cirrhosis	420054005	1	1	1
Peptic ulcer disease	13200003	Myopia	57190000	1	1	1
Depression	35489007	Cellulitis	128045006	1	1	1
Varicose vein	12856003	Entire knee meniscus	244568009	1	1	1
Hyperlipidemia	55822004	Metastasis	363346000	1	1	1

Appendix B: Topological parameters on SNOMED-CT (2016)

Liu-1 on Pedersen30 (\(\lambda =0.3\))

ConceptId1	ConceptId2	LCSDepth	PathLen	AreaDensity	AreaDepth	Value
42399005	42399005	11	0	0	11	1.0
80891009	74281007	13	3	29	14	0.7654
427296003	427296003	10	0	0	10	1.0
17369002	17369002	7	0	0	7	1.0
48500005	58214004	3	3	99	4	0.2074
42343007	19242006	7	6	307	9	0.2816
128462008	443961001	5	2	154	5.67	0.3093
125369001	415582006	4	6	151	6	0.2116
62315008	51197009	2	10	321	5.3	0.0606
79619009	49436004	9	6	192	11	0.4213
313297008	19242006	8	3	118	9	0.5119
69896004	200936003	3	2	56	3.67	0.2931
254935002	1386000	6	3	125	7	0.3949
57406009	396275006	2	9	380	5	0.0541
73211009	38341003	4	4	140	5.33	0.2344
13101006	228399008	0	10	266	4	0.0
255631004	705097000	0	12	630	4	0.0
32498003	179344006	0	18	871	6	0.0
194883006	22298006	5	6	174	7	0.2525
51615001	363358000	8	3	146	9	0.4804
70179006	73761001	0	19	1430	6.33	0.0
128863005	363429002	3	8	270	5.67	0.1090
24700007	69322001	2	5	407	3.67	0.0454
74400008	64859006	3	8	455	5.67	0.0784
39772007	15825003	0	6	203	7	0.0
87715008	420054005	6	8	148	8.67	0.2936
13200003	57190000	4	10	149	7.33	0.1843
35489007	128045006	2	5	228	3.67	0.0714
12856003	244568009	1	14	201	5.67	0.0356
55822004	363346000	2	7	261	4.33	0.0676
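As a rough check of how the values above relate to the human ratings in Appendix A, the Pearson correlation between the two columns can be computed directly. This is a minimal sketch under stated assumptions: it uses only the "Both (average)" column of Appendix A and the "Value" column of this table (Liu-1, \(\lambda =0.3\)), matched by row order, and the `pearson` helper is written here for illustration. The figures quoted in the abstract are averages of optimal correlations over several measures, so the number printed here is not expected to reproduce them exactly.

```python
from math import sqrt

# "Both (average)" column of Appendix A, in row order.
human = [
    4.0, 3.15, 2.9, 3.15, 2.6, 2.2, 2.25, 2.35, 1.8, 1.8,
    2.1, 1.55, 1.65, 1.55, 1.5, 1.5, 1.45, 1.35, 1.45, 1.55,
    1.15, 1.15, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
]

# "Value" column of Appendix B (Liu-1, lambda = 0.3), same row order.
liu1 = [
    1.0, 0.7654, 1.0, 1.0, 0.2074, 0.2816, 0.3093, 0.2116, 0.0606, 0.4213,
    0.5119, 0.2931, 0.3949, 0.0541, 0.2344, 0.0, 0.0, 0.0, 0.2525, 0.4804,
    0.0, 0.1090, 0.0454, 0.0784, 0.0, 0.2936, 0.1843, 0.0714, 0.0356, 0.0676,
]


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(human, liu1)
print(f"Pearson r (Liu-1 vs. human average, 30 pairs) = {r:.4f}")
```

The same helper can be reused with the Physician or Coder column alone to see how much the two rater groups differ.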

Appendix C: Topological parameters on WordNet 3.0

Liu-1 on MC30 (\(\lambda =0.3\))

Word1	Word2	LCSDepth	PathLen	AreaDensity	AreaDepth	Value
Car	Automobile	10	0	31	10.0	1.0
Gem	Jewel	8	0	7	8.0	1.0
Journey	Voyage	9	1	16	9.3333	0.8438
Boy	Lad	8	1	11	8.3333	0.8390
Coast	Shore	4	1	3	4.3333	0.7507
Asylum	Madhouse	9	1	1	9.3333	0.8880
Magician	Wizard	5	0	5	5.0	1.0
Midday	Noon	9	0	0	9.0	1.0
Furnace	Stove	4	12	212	8.0	0.1542
Food	Fruit	2	9	121	5.0	0.1006
Bird	Cock	9	1	26	9.3333	0.8167
Bird	Crane	9	3	49	10.0	0.6467
Tool	Implement	6	1	30	6.3333	0.6926
Brother	Monk	9	1	3	9.3333	0.8818
Crane	Implement	5	4	145	6.3333	0.2949
Lad	Brother	6	4	426	7.3333	0.2029
Journey	Car	0	18	455	6.0	0.0
Monk	Oracle	6	7	475	8.3333	0.1846
Cemetery	Woodland	2	8	179	4.6667	0.0853
Food	Rooster	1	15	239	6.0	0.0326
Coast	Hill	3	4	37	4.3333	0.2936
Forest	Graveyard	2	8	179	4.6667	0.0853
Shore	Woodland	2	4	82	3.3333	0.1378
Monk	Slave	6	4	440	7.3333	0.1987
Coast	Forest	2	5	85	3.6667	0.1320
Lad	Wizard	6	4	417	7.3333	0.2057
Chord	Smile	2	10	122	5.3333	0.0973
Glass	Magician	3	11	534	6.6667	0.0722
Noon	String	1	13	124	5.3333	0.0435
Rooster	Voyage	0	23	400	7.6667	0.0
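The AreaDepth column in the table above appears to follow a simple pattern: in every row it equals LCSDepth plus one third of PathLen, up to rounding. This is an observation about the tabulated numbers only, not a definition taken from the paper, and the sketch below merely verifies it; the tuple layout and the tolerance are choices made here for illustration.

```python
# (LCSDepth, PathLen, AreaDepth) for the 30 MC30 rows of Appendix C, in order.
ROWS = [
    (10, 0, 10.0), (8, 0, 8.0), (9, 1, 9.3333), (8, 1, 8.3333),
    (4, 1, 4.3333), (9, 1, 9.3333), (5, 0, 5.0), (9, 0, 9.0),
    (4, 12, 8.0), (2, 9, 5.0), (9, 1, 9.3333), (9, 3, 10.0),
    (6, 1, 6.3333), (9, 1, 9.3333), (5, 4, 6.3333), (6, 4, 7.3333),
    (0, 18, 6.0), (6, 7, 8.3333), (2, 8, 4.6667), (1, 15, 6.0),
    (3, 4, 4.3333), (2, 8, 4.6667), (2, 4, 3.3333), (6, 4, 7.3333),
    (2, 5, 3.6667), (6, 4, 7.3333), (2, 10, 5.3333), (3, 11, 6.6667),
    (1, 13, 5.3333), (0, 23, 7.6667),
]

# Values are tabulated to four decimals, so allow a small rounding tolerance.
TOLERANCE = 1e-3

mismatches = [
    (lcs_depth, path_len, area_depth)
    for lcs_depth, path_len, area_depth in ROWS
    if abs(lcs_depth + path_len / 3 - area_depth) > TOLERANCE
]

print(f"{len(ROWS) - len(mismatches)} of {len(ROWS)} rows match "
      "LCSDepth + PathLen / 3")
```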

Appendix D: Liu-1 on RG65 (\(\lambda =0.3\))

Word1	Word2	Value	Word1	Word2	Value	Word1	Word2	Value
Cord	Smile	0.0438	Mound	Shore	0.2914	Brother	Monk	0.8818
Rooster	Voyage	0.0	Lad	Wizard	0.2057	Asylum	Madhouse	0.8880
Noon	String	0.0435	Forest	Graveyard	0.0853	Furnace	Stove	0.1542
Fruit	Furnace	0.1774	Food	Rooster	0.0326	Magician	Wizard	1.0
Autograph	Shore	0.0	Cemetery	Woodland	0.0853	Hill	Mound	1.0
Automobile	Wizard	0.0687	Shore	Voyage	0.0	Cord	String	0.7097
Mound	Stove	0.2271	Bird	Woodland	0.0880	Glass	Tumbler	0.8018
Grin	Implement	0.0	Coast	Hill	0.2936	Grin	Smile	1.0
Asylum	Fruit	0.2147	Furnace	Implement	0.1870	Serf	Slave	0.6560
Asylum	Monk	0.0648	Crane	Rooster	0.4751	Journey	Voyage	0.8438
Graveyard	Madhouse	0.0585	Hill	Woodland	0.1297	Autograph	Signature	0.8152
Glass	Magician	0.0722	Car	Journey	0.0	Coast	Shore	0.7507
Boy	Rooster	0.1282	Cemetery	Mound	0.0788	Forest	Woodland	1.0
Cushion	Jewel	0.2318	Glass	Jewel	0.1947	Implement	Tool	0.6926
Monk	Slave	0.1987	Magician	Oracle	0.1949	Cock	Rooster	1.0
Asylum	Cemetery	0.0645	Crane	Implement	0.2949	Boy	Lad	0.8390
Coast	Forest	0.1320	Brother	Lad	0.2029	Cushion	Pillow	0.7982
Grin	Lad	0.0	Sage	Wizard	0.2002	Cemetery	Graveyard	1.0
Shore	Woodland	0.1378	Oracle	Sage	0.5047	Automobile	Car	1.0
Monk	Oracle	0.1846	Bird	Crane	0.6467	Midday	Noon	1.0
Boy	Sage	0.1982	Bird	Cock	0.8167	Gem	Jewel	1.0
Automobile	Cushion	0.2137	Food	Fruit	0.1006

About this article

Cite this article

Zhu, X., Li, F., Chen, H. et al. An efficient path computing model for measuring semantic similarity using edge and density. Knowl Inf Syst 55, 79–111 (2018). https://doi.org/10.1007/s10115-017-1078-5

Received: 29 January 2016
Revised: 06 June 2017
Accepted: 16 June 2017
Published: 28 June 2017
Issue Date: April 2018
DOI: https://doi.org/10.1007/s10115-017-1078-5
