Research article
DOI: 10.1145/2791405.2791426

Improving the Accuracy of Document Similarity Approach using Word Sense Disambiguation

Published: 10 August 2015

Abstract

Techniques from artificial intelligence and statistics, such as text mining and data mining, can provide solutions in the area of concept mining. They give useful insight into the meaning and similarity of documents, but without exploiting the semantics of the terms or phrases in a document. Our work determines document similarity using semantic processing, namely Word Sense Disambiguation (WSD): the words in each document are annotated with their senses, and the traditional PageRank algorithm is then run over them. The algorithm ranks the possible senses of each word and selects the correct sense according to the context. This paper proposes a method for disambiguating ambiguous words in order to determine document similarity. To assess its accuracy, the method is compared with the cosine similarity approach, which is frequently used to measure the similarity between two documents.
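
The cosine similarity baseline named in the abstract, and the effect that sense annotation can have on it, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy documents and the sense labels (`bank#finance`, `bank#shore`) are hypothetical and hand-assigned, whereas the paper obtains senses by ranking candidates with PageRank.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two term-frequency vectors."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    # Dot product over the terms the two documents share.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


# Surface forms: both documents contain the ambiguous word "bank".
doc1 = ["deposit", "money", "bank"]
doc2 = ["river", "bank", "flood"]

# Hypothetical sense-annotated versions (labels assigned by hand here;
# a real system would pick them with a WSD step).
doc1_senses = ["deposit", "money", "bank#finance"]
doc2_senses = ["river", "bank#shore", "flood"]

surface_score = cosine_similarity(doc1, doc2)
sense_score = cosine_similarity(doc1_senses, doc2_senses)

print(f"surface forms:   {surface_score:.3f}")
print(f"sense-annotated: {sense_score:.3f}")
```

On surface forms the shared token "bank" makes the two documents look related; once each occurrence carries its own sense label, the overlap disappears. Real documents would need tokenization, stop-word removal, and term weighting that this sketch leaves out.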


Published In

WCI '15: Proceedings of the Third International Symposium on Women in Computing and Informatics
August 2015
763 pages
ISBN: 9781450333610
DOI: 10.1145/2791405

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Artificial Intelligence
  2. Concept Mining
  3. Cosine Similarity
  4. Document Similarity
  5. Natural Language Processing
  6. Text Mining
  7. Word Sense Disambiguation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WCI '15

Acceptance Rates

WCI '15 paper acceptance rate: 98 of 452 submissions (22%)
Overall acceptance rate: 98 of 452 submissions (22%)


Cited By

  • (2024) An Exploration of Code Similarity and Code Replication in Computational Programs. 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1-5. DOI: 10.1109/ICCCNT61001.2024.10725205. Online publication date: 24-Jun-2024
  • (2023) Network Analysis of Research Base Papers: Metrics and Potential use. 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1-9. DOI: 10.1109/ICCCNT56998.2023.10307301. Online publication date: 6-Jul-2023
  • (2019) A Graph-Based Model for Keyword Extraction and Tagging of Research Documents. 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), pp. 942-946. DOI: 10.1109/ICICICT46008.2019.8993142. Online publication date: Jul-2019
  • (2019) An Analysis on Different Document Keyword Extraction Methods. 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), pp. 933-937. DOI: 10.1109/ICCMC.2019.8819819. Online publication date: Mar-2019
  • (2019) A Graph based Approach for Keyword Extraction from Documents. 2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP), pp. 1-4. DOI: 10.1109/ICACCP.2019.8882946. Online publication date: Feb-2019
  • (2018) A Combined Approach Using Semantic Role Labelling and Word Sense Disambiguation for Question Generation and Answer Extraction. 2018 Second International Conference on Advances in Electronics, Computers and Communications (ICAECC), pp. 1-6. DOI: 10.1109/ICAECC.2018.8479468. Online publication date: Feb-2018
  • (2018) Tagging of Research Publications based on Author and Year Extraction. 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 892-896. DOI: 10.1109/ICACCI.2018.8554655. Online publication date: Sep-2018
  • (2018) Named Entity Recognition in Text Documents Using a Modified Conditional Random Field. Recent Findings in Intelligent Computing Techniques, pp. 31-41. DOI: 10.1007/978-981-10-8633-5_4. Online publication date: 4-Nov-2018
  • (2017) Detecting contextual word polarity using aspect based sentiment analysis and logistic regression. 2017 IEEE International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM), pp. 102-107. DOI: 10.1109/ICSTM.2017.8089134. Online publication date: Aug-2017
  • (2017) A model for auto-tagging of research papers based on keyphrase extraction methods. 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 1695-1700. DOI: 10.1109/ICACCI.2017.8126087. Online publication date: Sep-2017
