research-article

Implementation of ontology-based on Word2Vec and DBSCAN for part-of-speech

Authors:
Parmonangan R. Togatorop, Institut Teknologi Del, Laguboti, Indonesia
Rosa Siagian, Institut Teknologi Del, Laguboti, Indonesia
Yolanda Nainggolan, Institut Teknologi Del, Laguboti, Indonesia
Kaleb Simanungkalit, Institut Teknologi Del, Laguboti, Indonesia

SIET '20: Proceedings of the 5th International Conference on Sustainable Information Engineering and Technology, November 2020, Pages 51–56
https://doi.org/10.1145/3427423.3427431

Published: 28 December 2020

ABSTRACT

Part-of-speech (POS) tagging is the process of assigning each word in a text to an appropriate word class based on the word's definition and its relationships with other words. Several POS tagging approaches have been applied to Bahasa Indonesia, namely rule-based, stochastic, and neural approaches. An ontology-based approach has been applied to English but not yet to Bahasa Indonesia, so this study implements an ontology to perform POS tagging in Bahasa Indonesia. The ontology was constructed using Word2Vec and the DBSCAN clustering method. The Word2Vec model represents each word as a vector based on its context, and DBSCAN groups words into word classes based on the vectors produced by Word2Vec. POS tagging with the ontology is carried out in several stages: data collection by web scraping online news articles from Kompas.com and Detik.com, text preprocessing, Word2Vec feature building, clustering with DBSCAN, ontology construction, and evaluation. The experiments in this study aimed to choose the optimal DBSCAN parameter values for forming the word clusters used in ontology construction. Overall, the ontology built with Word2Vec and DBSCAN performs POS tagging with a highest accuracy of 0.62, a highest precision of 0.79, a highest recall of 0.62, and a highest F1-score of 0.67.
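The clustering stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional vectors below are hand-made stand-ins for Word2Vec output, the word list is hypothetical, and the parameter values `eps=0.3` and `min_samples=2` are assumptions chosen only to make the toy data separate cleanly. The sketch shows how DBSCAN groups nearby word vectors into clusters and marks isolated words as noise.

```python
import math

# Hypothetical 2-D vectors standing in for Word2Vec embeddings.
# Real embeddings would come from a trained model and have many more dimensions.
word_vectors = {
    "makan": (0.90, 0.10),
    "minum": (1.00, 0.20),
    "tidur": (0.95, 0.00),
    "rumah": (0.10, 0.90),
    "buku": (0.00, 1.00),
    "meja": (0.15, 1.05),
    "dan": (5.00, 5.00),  # far from everything else
}


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = noise  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


def cluster_words(vectors, eps=0.3, min_samples=2):
    """Map each cluster label to the list of words assigned to it."""
    words = list(vectors)
    labels = dbscan([vectors[w] for w in words], eps, min_samples)
    clusters = {}
    for word, label in zip(words, labels):
        clusters.setdefault(label, []).append(word)
    return clusters


clusters = cluster_words(word_vectors)
for label, members in sorted(clusters.items()):
    print(label, members)
```

On this toy data the verb-like and noun-like words fall into two separate clusters and the isolated word is labeled noise. Turning clusters into word classes still requires naming each cluster, a step the abstract assigns to ontology construction and does not detail.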

Index Terms

Computing methodologies → Artificial intelligence → Natural language processing

Published in
SIET '20: Proceedings of the 5th International Conference on Sustainable Information Engineering and Technology
November 2020
277 pages
ISBN: 9781450376051
DOI: 10.1145/3427423
General Chairs: Agung Setia Budi and Sigit Adinugroho, Universitas Brawijaya, Indonesia
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
DBSCAN
POS tagging
Word2Vec
bahasa indonesia
ontology
Acceptance Rates
SIET '20 Paper Acceptance Rate: 45 of 57 submissions, 79%
Overall Acceptance Rate: 45 of 57 submissions, 79%