An efficient path computing model for measuring semantic similarity using edge and density

  • Regular Paper
  • Published in Knowledge and Information Systems

Abstract

The shortest path between two concepts in a taxonomic ontology is commonly used to represent their semantic distance in edge-based semantic similarity measures. Edge counting, which is simple, intuitive, and computationally cheap, has long been the default method for path computation. However, a large lexical taxonomy such as WordNet covers a broad domain and therefore has irregular link densities between concepts, and edge counting-based path computation cannot account for this non-uniformity. In this paper, we argue that path computation can be separated from edge-based similarity measures and can form general computing models in its own right. To address the non-uniformity of concept density in a large taxonomic ontology, we propose a new path computing model based on compensation by the local area density of concepts, which is equal to the number of direct hyponyms of the subsumers of the concepts on the shortest path. Following information theory, the model treats the local area density of concepts as an extension of the edge counting-based path. It is a general path computing model and can be applied in various edge-based similarity approaches. The experimental results show that the proposed path model improves the average optimal correlation between edge-based measures and human judgments from less than 0.79 to more than 0.86 on the Miller and Charles benchmark for WordNet, and from less than 0.75 to more than 0.82 on the Pedersen et al. benchmark (average of both Physician and Coder) for SNOMED-CT. It is also considerably more efficient than information content computation in a dynamic ontology, which makes the edge-based similarity measure both accurate and efficient.
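To make the two quantities named in the abstract concrete, the following minimal sketch computes an edge-counting path length and a local area density on a toy taxonomy. The taxonomy, the names `PARENT`, `chain_to_root`, and `path_and_density`, and the choice of which path nodes count as subsumers are illustrative assumptions. The sketch is one possible reading of the abstract's definition, not the authors' implementation, and it does not attempt to reproduce how the model combines the two quantities.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "bird": "animal",
    "mammal": "animal",
    "crane": "bird",
    "cock": "bird",
    "dog": "mammal",
    "cat": "mammal",
    "tool": "artifact",
}

# Number of direct hyponyms (children) of each concept; leaves count as 0.
CHILD_COUNT = Counter(p for p in PARENT.values() if p is not None)


def chain_to_root(concept):
    """Return the list [concept, parent, ..., root]."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def path_and_density(c1, c2):
    """Return (edge count of the shortest path, summed direct-hyponym
    count of the subsumers on that path) for two concepts in the tree."""
    up1, up2 = chain_to_root(c1), chain_to_root(c2)
    ancestors2 = set(up2)
    # Least common subsumer: first node on c1's chain that also subsumes c2.
    lcs = next(node for node in up1 if node in ancestors2)
    left = up1[:up1.index(lcs)]    # c1 and the nodes strictly below the LCS
    right = up2[:up2.index(lcs)]   # c2 and the nodes strictly below the LCS
    path_len = len(left) + len(right)
    # Assumption: subsumers are all path nodes except the two endpoints,
    # plus the LCS itself (which may coincide with an endpoint).
    subsumers = set(left[1:]) | set(right[1:]) | {lcs}
    density = sum(CHILD_COUNT[node] for node in subsumers)
    return path_len, density


if __name__ == "__main__":
    print(path_and_density("crane", "dog"))
    print(path_and_density("crane", "cock"))
```

On this toy tree, crane and dog are four edges apart, and the three subsumers on their path (bird, animal, mammal) each have two direct hyponyms. Two pairs with the same edge count can therefore differ in density, which is the non-uniformity that plain edge counting ignores.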

Notes

  1. http://wordnet.princeton.edu/wordnet/download/current-version/.

  2. http://www.nlm.nih.gov/research/umls/licensedcontent/snomedctfiles.html.

  3. http://pythonhosted.org/PyMedTermino/index.html.

  4. http://en.wikipedia.org/wiki/.

References

  1. Srihari RK, Zhang ZF, Rao A (2000) Intelligent indexing and semantic retrieval of multimodal documents. Inf Retr 2(2–3):245–275

  2. Patwardhan S, Banerjee S, Pedersen T (2003) Using measures of semantic relatedness for word sense disambiguation. In: Proceedings of computational linguistics and intelligent text, pp 241–257

  3. Sánchez D, Moreno A (2008) Learning non-taxonomic relationships from web documents for domain ontology construction. Data Knowl Eng 64(3):600–623

  4. Budanitsky A, Hirst G (2001) Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In: Workshop on WordNet and other lexical resources, Second meeting of the North American chapter of the association for computational linguistics, vol 2, issue 12, pp 29–34

  5. Liu X, Zhou Y, Zheng R (2007) Measuring semantic similarity in WordNet. In: Proceedings of machine learning and cybernetics, pp 3431–3435

  6. Kozima H (1994) Computing lexical cohesion as a tool for text analysis. Ph.D. thesis, Computer Science and Information Mathematics, Graduate School of Electro-Communications, University of Electro-Communications

  7. Wu Z, Palmer M (1994) Verbs semantics and lexical selection. In: Proceedings of the association for computational linguistics, pp 133–138

  8. Seco N, Veale T, Hayes J (2004) An intrinsic information content metric for semantic similarity in WordNet. In: Proceedings of artificial intelligence, pp 1089–1090

  9. Rodríguez MA, Egenhofer MJ (2003) Determining semantic similarity among entity classes from different ontologies. IEEE Trans Knowl Data Eng 15(2):442–456

  10. Zhou Z, Wang Y, Gu J (2008) New model of semantic similarity measuring in WordNet. In: Proceedings of intelligent system and knowledge engineering, pp 256–261

  11. Hao D, Zuo WL, Peng T (2011) An approach for calculating semantic similarity between words using WordNet. In: Proceedings of digital manufacturing and automation, pp 177–180

  12. Jiang JJ, Conrath DW (1997) Semantic similarity based on corpus statistics and lexical taxonomy. In: Proceedings of research in computational linguistics, pp 19–33

  13. Li Y, Bandar Z, McLean D (2003) An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans Knowl Data Eng 15(4):871–882

  14. Hadj Taieb MA, Aouicha MB, Hamadou AB (2014) Ontology-based approach for measuring semantic similarity. Eng Appl Artif Intell 36(8):238–261

  15. Rada R, Mili H, Bicknell E et al (1989) Development and application of a metric on semantic nets. IEEE Trans Syst Man Cybern 19(1):17–30

  16. Resnik P (1999) Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J Artif Intell Res 11(4):95–130

  17. Borgida A, Walsh T, Hirsh H (2005) Towards measuring similarity in description logics. In: 2005 international workshop on description logics, pp 286–294

  18. Claudia D (2007) Similarity-based learning methods for the semantic web. Ph.D. thesis, Department of Computer Science, University of Bari, Italy

  19. Claudia D, Steffen S, Nicola F (2008) On the influence of description logics ontologies on conceptual similarity. In: Proceedings of knowledge engineering: practice and patterns, pp 48–63

  20. Jan R (2002) Clustering and instance based learning in first order logic. Ph.D. thesis, Department of Computer Science, Leuven, Belgium

  21. Hirst G, St-Onge D (1998) Lexical chains as representations of context for the detection and correction of malapropisms. MIT Press, Cambridge, pp 305–322

  22. Leacock C, Chodorow M (1998) Combining local context and WordNet similarity for word sense identification. MIT Press, Cambridge, pp 265–283

  23. Lin D (1998) An information-theoretic definition of similarity. In: Proceedings of machine learning, pp 296–304

  24. Resnik P (1995) Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of artificial intelligence, pp 448–453

  25. Meng L, Gu J, Zhou Z (2012) A new model of information content based on concept’s topology for measuring semantic similarity in WordNet. Int J Grid Distrib Comput 5(3):81–94

  26. Devitt A, Vogel C (2004) The topology of WordNet: some metrics. In: Proceedings of global WordNet conference, pp 106–111

  27. Spackman KA (2004) SNOMED CT milestones: endorsements are added to already impressive standards credentials. Healthc Inform Bus Mag Inf Commun Syst 21(9):54–56

  28. Harispe S, Ranwez S, Janaqi S et al (2015) Semantic similarity from natural language and ontology analysis. Synth Lect Hum Lang Technol 8(1):254

  29. Miller GA, Charles WG (1991) Contextual correlates of semantic similarity. Lang Cognit Process 6(1):1–28

  30. Kipper KS (2006) VerbNet: a broad-coverage, comprehensive verb lexicon. http://repository.upenn.edu/dissertations/AAI3179808

  31. Baker CF, Fillmore CJ, Lowe JB (1998) The Berkeley FrameNet project. In: Proceedings of the 36th annual meeting of the association for computational linguistics and 17th international conference on computational linguistics, pp 86–90

  32. Richardson SD, Dolan WB, Vanderwende L (1998) MindNet: acquiring and structuring semantic information from text. In: Proceedings of the 36th annual meeting of the association for computational linguistics and 17th international conference on computational linguistics, pp 1098–1102

  33. Hadj Taieb MA, Aouicha MB, Hamadou AB (2014) A new semantic relatedness measurement using WordNet features. Knowl Inf Syst 41(2):467–497

  34. Yang D, Powers D (2006) Verb similarity on the taxonomy of WordNet. In: Proceedings of global WordNet conference, pp 177–178

  35. Sussna M (1993) Word sense disambiguation for free-text indexing using a massive semantic network. In: Proceedings of information and knowledge management, pp 67–74

  36. Sánchez D, Batet M (2011) Ontology-based information content computation. Knowl Based Syst 24(2):297–303

  37. Patwardhan S, Pedersen T (2006) Using WordNet-based context vectors to estimate the semantic relatedness of concepts. In: Proceedings of EACL 2006 workshop on making sense of sense: bringing computational linguistics and psycholinguistics together, pp 1–8

  38. Petrakis E, Varelas G, Hliaoutakis A et al (2006) X-similarity: computing semantic similarity between concepts from different ontologies. J Digit Inf Manag 4(4):233–237

  39. Rubenstein H, Goodenough JB (1965) Contextual correlates of synonymy. Commun Assoc Comput Mach 8(10):627–633

  40. Pedersen T, Pakhomov SVS, Patwardhan S, Chute CG (2007) Measures of semantic similarity and relatedness in the biomedical domain. J Biomed Inform 40(3):288–299

  41. Princeton University (2014) The MIT Java Wordnet interface. http://projects.csail.mit.edu/jwi/

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Contract Numbers 61363036 and 61462010, and by the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.

Author information

Corresponding author

Correspondence to Fei Li.

Appendices

Appendix A: Pedersen et al. clinical term dataset

| Term 1 | ConceptId1 | Term 2 | ConceptId2 | Physician | Coder | Both (average) |
|---|---|---|---|---|---|---|
| Renal failure | 42399005 | Kidney failure | 42399005 | 4 | 4 | 4 |
| Heart | 80891009 | Myocardium | 74281007 | 3.3 | 3 | 3.15 |
| Stroke | 427296003 | Infarct | 427296003 | 3 | 2.8 | 2.9 |
| Abortion | 17369002 | Miscarriage | 17369002 | 3 | 3.3 | 3.15 |
| Delusion | 48500005 | Schizophrenia | 58214004 | 3 | 2.2 | 2.6 |
| Congestive heart failure | 42343007 | Pulmonary edema | 19242006 | 3 | 1.4 | 2.2 |
| Metastasis | 128462008 | Adenocarcinoma | 443961001 | 2.7 | 1.8 | 2.25 |
| Calcification | 125369001 | Stenosis | 415582006 | 2.7 | 2 | 2.35 |
| Diarrhea | 62315008 | Stomach cramps | 51197009 | 2.3 | 1.3 | 1.8 |
| Mitral stenosis | 79619009 | Atrial fibrillation | 49436004 | 2.3 | 1.3 | 1.8 |
| Chronic obstructive pulmonary disease | 313297008 | Lung infiltrates | 19242006 | 2.3 | 1.9 | 2.1 |
| Rheumatoid arthritis | 69896004 | Lupus | 200936003 | 2 | 1.1 | 1.55 |
| Brain tumor | 254935002 | Intracranial hemorrhage | 1386000 | 2 | 1.3 | 1.65 |
| Carpel tunnel syndrome | 57406009 | Osteoarthritis | 396275006 | 2 | 1.1 | 1.55 |
| Diabetes mellitus | 73211009 | Hypertension | 38341003 | 2 | 1 | 1.5 |
| Acne | 13101006 | Syringes | 228399008 | 2 | 1 | 1.5 |
| Antibiotic | 255631004 | Allergy | 705097000 | 1.7 | 1.2 | 1.45 |
| Cortisone | 32498003 | Total knee replacement | 179344006 | 1.7 | 1 | 1.35 |
| Pulmonary embolus | 194883006 | Myocardial infarction | 22298006 | 1.7 | 1.2 | 1.45 |
| Pulmonary fibrosis | 51615001 | Lung cancer | 363358000 | 1.7 | 1.4 | 1.55 |
| Cholangiocarcinoma | 70179006 | Colonoscopy | 73761001 | 1.3 | 1 | 1.15 |
| Lymphoid hyperplasia | 128863005 | Laryngeal cancer | 363429002 | 1.3 | 1 | 1.15 |
| Multiple sclerosis | 24700007 | Psychosis | 69322001 | 1 | 1 | 1 |
| Appendicitis | 74400008 | Osteoporosis | 64859006 | 1 | 1 | 1 |
| Rectal polyp | 39772007 | Aorta | 15825003 | 1 | 1 | 1 |
| Xerostomia | 87715008 | Alcoholic cirrhosis | 420054005 | 1 | 1 | 1 |
| Peptic ulcer disease | 13200003 | Myopia | 57190000 | 1 | 1 | 1 |
| Depression | 35489007 | Cellulites | 128045006 | 1 | 1 | 1 |
| Varicose vein | 12856003 | Entire knee meniscus | 244568009 | 1 | 1 | 1 |
| Hyperlipidemia | 55822004 | Metastasis | 363346000 | 1 | 1 | 1 |
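The Both (average) column in Appendix A is the arithmetic mean of the Physician and Coder ratings. A minimal sketch that checks this for a few rows copied from the table is given below; the names `ROWS`, `mean_rating`, and `check_rows` are illustrative and not part of the article.

```python
# (physician, coder, both) ratings copied from four rows of Appendix A.
ROWS = {
    "Heart / Myocardium": (3.3, 3.0, 3.15),
    "Stroke / Infarct": (3.0, 2.8, 2.9),
    "Metastasis / Adenocarcinoma": (2.7, 1.8, 2.25),
    "Rheumatoid arthritis / Lupus": (2.0, 1.1, 1.55),
}


def mean_rating(physician, coder):
    """Average of the two rater groups for one term pair."""
    return (physician + coder) / 2


def check_rows(rows, tol=1e-9):
    """Map each pair to True if its tabulated average matches the mean."""
    return {
        pair: abs(mean_rating(p, c) - both) < tol
        for pair, (p, c, both) in rows.items()
    }


if __name__ == "__main__":
    for pair, ok in check_rows(ROWS).items():
        print(pair, "consistent" if ok else "mismatch")
```

The comparison uses a small tolerance rather than exact equality because values such as 3.15 are not exactly representable in binary floating point.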

Appendix B: Topological parameters on SNOMED-CT (2016)

Liu-1 on Pedersen30 (\(\lambda =0.3\))

| ConceptId1 | ConceptId2 | LCSDepth | PathLen | AreaDensity | AreaDepth | Value |
|---|---|---|---|---|---|---|
| 42399005 | 42399005 | 11 | 0 | 0 | 11 | 1.0 |
| 80891009 | 74281007 | 13 | 3 | 29 | 14 | 0.7654 |
| 427296003 | 427296003 | 10 | 0 | 0 | 10 | 1.0 |
| 17369002 | 17369002 | 7 | 0 | 0 | 7 | 1.0 |
| 48500005 | 58214004 | 3 | 3 | 99 | 4 | 0.2074 |
| 42343007 | 19242006 | 7 | 6 | 307 | 9 | 0.2816 |
| 128462008 | 443961001 | 5 | 2 | 154 | 5.67 | 0.3093 |
| 125369001 | 415582006 | 4 | 6 | 151 | 6 | 0.2116 |
| 62315008 | 51197009 | 2 | 10 | 321 | 5.3 | 0.0606 |
| 79619009 | 49436004 | 9 | 6 | 192 | 11 | 0.4213 |
| 313297008 | 19242006 | 8 | 3 | 118 | 9 | 0.5119 |
| 69896004 | 200936003 | 3 | 2 | 56 | 3.67 | 0.2931 |
| 254935002 | 1386000 | 6 | 3 | 125 | 7 | 0.3949 |
| 57406009 | 396275006 | 2 | 9 | 380 | 5 | 0.0541 |
| 73211009 | 38341003 | 4 | 4 | 140 | 5.33 | 0.2344 |
| 13101006 | 228399008 | 0 | 10 | 266 | 4 | 0.0 |
| 255631004 | 705097000 | 0 | 12 | 630 | 4 | 0.0 |
| 32498003 | 179344006 | 0 | 18 | 871 | 6 | 0.0 |
| 194883006 | 22298006 | 5 | 6 | 174 | 7 | 0.2525 |
| 51615001 | 363358000 | 8 | 3 | 146 | 9 | 0.4804 |
| 70179006 | 73761001 | 0 | 19 | 1430 | 6.33 | 0.0 |
| 128863005 | 363429002 | 3 | 8 | 270 | 5.67 | 0.1090 |
| 24700007 | 69322001 | 2 | 5 | 407 | 3.67 | 0.0454 |
| 74400008 | 64859006 | 3 | 8 | 455 | 5.67 | 0.0784 |
| 39772007 | 15825003 | 0 | 6 | 203 | 7 | 0.0 |
| 87715008 | 420054005 | 6 | 8 | 148 | 8.67 | 0.2936 |
| 13200003 | 57190000 | 4 | 10 | 149 | 7.33 | 0.1843 |
| 35489007 | 128045006 | 2 | 5 | 228 | 3.67 | 0.0714 |
| 12856003 | 244568009 | 1 | 14 | 201 | 5.67 | 0.0356 |
| 55822004 | 363346000 | 2 | 7 | 261 | 4.33 | 0.0676 |
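The evaluation described in the abstract compares similarity values with human judgments by correlation. As a minimal sketch, and assuming a plain Pearson coefficient is the statistic of interest, the code below pairs the Both (average) ratings of Appendix A with the Liu-1 Value column of Appendix B in row order. The names `PAIRS` and `pearson` are illustrative, and the result for this single \(\lambda =0.3\) setting should not be read as the paper's reported optimum.

```python
import math

# (Both (average) rating from Appendix A, Liu-1 Value from Appendix B),
# listed in the row order shared by the two tables.
PAIRS = [
    (4.0, 1.0), (3.15, 0.7654), (2.9, 1.0), (3.15, 1.0), (2.6, 0.2074),
    (2.2, 0.2816), (2.25, 0.3093), (2.35, 0.2116), (1.8, 0.0606),
    (1.8, 0.4213), (2.1, 0.5119), (1.55, 0.2931), (1.65, 0.3949),
    (1.55, 0.0541), (1.5, 0.2344), (1.5, 0.0), (1.45, 0.0), (1.35, 0.0),
    (1.45, 0.2525), (1.55, 0.4804), (1.15, 0.0), (1.15, 0.1090),
    (1.0, 0.0454), (1.0, 0.0784), (1.0, 0.0), (1.0, 0.2936),
    (1.0, 0.1843), (1.0, 0.0714), (1.0, 0.0356), (1.0, 0.0676),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    human = [h for h, _ in PAIRS]
    measure = [v for _, v in PAIRS]
    print(f"Pearson r = {pearson(human, measure):.4f}")
```

The coefficient is computed by hand here so that the sketch runs on any Python 3 version; a library routine could be substituted without changing the result.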

Appendix C: Topological parameters on WordNet 3.0

Liu-1 on MC30 (\(\lambda =0.3\))

| Word1 | Word2 | LCSDepth | PathLen | AreaDensity | AreaDepth | Value |
|---|---|---|---|---|---|---|
| Car | Automobile | 10 | 0 | 31 | 10.0 | 1.0 |
| Gem | Jewel | 8 | 0 | 7 | 8.0 | 1.0 |
| Journey | Voyage | 9 | 1 | 16 | 9.3333 | 0.8438 |
| Boy | Lad | 8 | 1 | 11 | 8.3333 | 0.8390 |
| Coast | Shore | 4 | 1 | 3 | 4.3333 | 0.7507 |
| Asylum | Madhouse | 9 | 1 | 1 | 9.3333 | 0.8880 |
| Magician | Wizard | 5 | 0 | 5 | 5.0 | 1.0 |
| Midday | Noon | 9 | 0 | 0 | 9.0 | 1.0 |
| Furnace | Stove | 4 | 12 | 212 | 8.0 | 0.1542 |
| Food | Fruit | 2 | 9 | 121 | 5.0 | 0.1006 |
| Bird | Cock | 9 | 1 | 26 | 9.3333 | 0.8167 |
| Bird | Crane | 9 | 3 | 49 | 10.0 | 0.6467 |
| Tool | Implement | 6 | 1 | 30 | 6.3333 | 0.6926 |
| Brother | Monk | 9 | 1 | 3 | 9.3333 | 0.8818 |
| Crane | Implement | 5 | 4 | 145 | 6.3333 | 0.2949 |
| Lad | Brother | 6 | 4 | 426 | 7.3333 | 0.2029 |
| Journey | Car | 0 | 18 | 455 | 6.0 | 0.0 |
| Monk | Oracle | 6 | 7 | 475 | 8.3333 | 0.1846 |
| Cemetery | Woodland | 2 | 8 | 179 | 4.6667 | 0.0853 |
| Food | Rooster | 1 | 15 | 239 | 6.0 | 0.0326 |
| Coast | Hill | 3 | 4 | 37 | 4.3333 | 0.2936 |
| Forest | Graveyard | 2 | 8 | 179 | 4.6667 | 0.0853 |
| Shore | Woodland | 2 | 4 | 82 | 3.3333 | 0.1378 |
| Monk | Slave | 6 | 4 | 440 | 7.3333 | 0.1987 |
| Coast | Forest | 2 | 5 | 85 | 3.6667 | 0.1320 |
| Lad | Wizard | 6 | 4 | 417 | 7.3333 | 0.2057 |
| Chord | Smile | 2 | 10 | 122 | 5.3333 | 0.0973 |
| Glass | Magician | 3 | 11 | 534 | 6.6667 | 0.0722 |
| Noon | String | 1 | 13 | 124 | 5.3333 | 0.0435 |
| Rooster | Voyage | 0 | 23 | 400 | 7.6667 | 0.0 |

Appendix D: Liu-1 on RG65 (\(\lambda =0.3\))

| Word1 | Word2 | Value |
|---|---|---|
| Cord | Smile | 0.0438 |
| Mound | Shore | 0.2914 |
| Brother | Monk | 0.8818 |
| Rooster | Voyage | 0.0 |
| Lad | Wizard | 0.2057 |
| Asylum | Madhouse | 0.8880 |
| Noon | String | 0.0435 |
| Forest | Graveyard | 0.0853 |
| Furnace | Stove | 0.1542 |
| Fruit | Furnace | 0.1774 |
| Food | Rooster | 0.0326 |
| Magician | Wizard | 1.0 |
| Autograph | Shore | 0.0 |
| Cemetery | Woodland | 0.0853 |
| Hill | Mound | 1.0 |
| Automobile | Wizard | 0.0687 |
| Shore | Voyage | 0.0 |
| Cord | String | 0.7097 |
| Mound | Stove | 0.2271 |
| Bird | Woodland | 0.0880 |
| Glass | Tumbler | 0.8018 |
| Grin | Implement | 0.0 |
| Coast | Hill | 0.2936 |
| Grin | Smile | 1.0 |
| Asylum | Fruit | 0.2147 |
| Furnace | Implement | 0.1870 |
| Serf | Slave | 0.6560 |
| Asylum | Monk | 0.0648 |
| Crane | Rooster | 0.4751 |
| Journey | Voyage | 0.8438 |
| Graveyard | Madhouse | 0.0585 |
| Hill | Woodland | 0.1297 |
| Autograph | Signature | 0.8152 |
| Glass | Magician | 0.0722 |
| Car | Journey | 0.0 |
| Coast | Shore | 0.7507 |
| Boy | Rooster | 0.1282 |
| Cemetery | Mound | 0.0788 |
| Forest | Woodland | 1.0 |
| Cushion | Jewel | 0.2318 |
| Glass | Jewel | 0.1947 |
| Implement | Tool | 0.6926 |
| Monk | Slave | 0.1987 |
| Magician | Oracle | 0.1949 |
| Cock | Rooster | 1.0 |
| Asylum | Cemetery | 0.0645 |
| Crane | Implement | 0.2949 |
| Boy | Lad | 0.8390 |
| Coast | Forest | 0.1320 |
| Brother | Lad | 0.2029 |
| Cushion | Pillow | 0.7982 |
| Grin | Lad | 0.0 |
| Sage | Wizard | 0.2002 |
| Cemetery | Graveyard | 1.0 |
| Shore | Woodland | 0.1378 |
| Oracle | Sage | 0.5047 |
| Automobile | Car | 1.0 |
| Monk | Oracle | 0.1846 |
| Bird | Crane | 0.6467 |
| Midday | Noon | 1.0 |
| Boy | Sage | 0.1982 |
| Bird | Cock | 0.8167 |
| Gem | Jewel | 1.0 |
| Automobile | Cushion | 0.2137 |
| Food | Fruit | 0.1006 |

About this article

Cite this article

Zhu, X., Li, F., Chen, H. et al. An efficient path computing model for measuring semantic similarity using edge and density. Knowl Inf Syst 55, 79–111 (2018). https://doi.org/10.1007/s10115-017-1078-5

  • DOI: https://doi.org/10.1007/s10115-017-1078-5
