Ontology-Based Similarity Computation of Two Sentences Using Word-Net Database

New Generation Computing

Abstract

Sentence similarity is used in various fields, such as text mining, web information retrieval, and dialogue-based systems. This research focuses on computing the similarity between very short, sentence-length texts. It presents a method that takes into account the implicit word order and the contextual relations within the sentences. The similarity between a pair of sentences is computed from a combination of corpus statistics and a hierarchical lexical database. Because it relies on a lexical database, the technique can model human common-sense knowledge, and because it incorporates corpus statistics, it can be adapted to other domains. The approach can be used in the many applications that involve representing and discovering knowledge in text. Experiments on two sets of selected sentence pairs show that the proposed approach yields a similarity measure that correlates significantly with human intuition.
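To make the general idea concrete, the following is a minimal sketch of how a semantic score from a word hierarchy can be weighted against a word-order score. It is not the authors' implementation: the toy taxonomy `PARENT` stands in for a lexical database such as WordNet, and the path-based `word_sim`, the `threshold` of 0.3, and the weight `delta` of 0.85 are all assumptions chosen for illustration. The corpus-statistics weighting mentioned in the abstract is omitted, and both sentences are assumed to be non-empty.

```python
import math

# Hypothetical toy taxonomy (child -> parent) standing in for a lexical
# database such as WordNet; the entries are illustrative only.
PARENT = {
    "dog": "animal", "cat": "animal", "animal": "entity",
    "car": "vehicle", "vehicle": "entity",
}

def ancestors(word):
    """Return the chain [word, parent, grandparent, ...] up to the root."""
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def word_sim(a, b):
    """Path-based similarity: 1.0 for identical words, 0.0 if unrelated."""
    if a == b:
        return 1.0
    chain_a, chain_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(chain_a):
        if node in chain_b:
            # Shorter paths through the common ancestor score higher.
            return 1.0 / (1.0 + steps_a + chain_b.index(node))
    return 0.0

def semantic_vector(words, joint):
    """Best similarity of each joint-set word to any word of the sentence."""
    return [max(word_sim(j, w) for w in words) for j in joint]

def order_vector(words, joint, threshold=0.3):
    """1-based position of the best match for each joint word, 0 if too weak."""
    vec = []
    for j in joint:
        best = max(range(len(words)), key=lambda i: word_sim(j, words[i]))
        vec.append(best + 1 if word_sim(j, words[best]) >= threshold else 0)
    return vec

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def sentence_similarity(s1, s2, delta=0.85):
    """Weighted sum of a semantic score and a word-order score."""
    w1, w2 = s1.lower().split(), s2.lower().split()
    joint = list(dict.fromkeys(w1 + w2))  # unique words, first-seen order
    semantic = cosine(semantic_vector(w1, joint), semantic_vector(w2, joint))
    r1, r2 = order_vector(w1, joint), order_vector(w2, joint)
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    order = 1.0 - diff / total if total else 1.0
    return delta * semantic + (1.0 - delta) * order

if __name__ == "__main__":
    print(sentence_similarity("a dog", "a cat"))
    print(sentence_similarity("a dog", "a car"))
```

In this sketch, "a dog" scores closer to "a cat" than to "a car" because `dog` and `cat` share the nearer ancestor `animal` in the toy taxonomy. With a real lexical database the hand-written `PARENT` table would be replaced by lookups into its hypernym hierarchy.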

Funding

There are currently no funding sources in the list.

Author information

Corresponding author

Correspondence to Atul Gupta.

Ethics declarations

Conflict of Interest

All authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gupta, A., Sharma, K. & Goyal, K.K. Ontology-Based Similarity Computation of Two Sentences Using Word-Net Database. New Gener. Comput. 41, 723–737 (2023). https://doi.org/10.1007/s00354-023-00228-z

  • DOI: https://doi.org/10.1007/s00354-023-00228-z
