An Unsupervised Entity Resolution Framework for English and Arabic Datasets

Abdelkrim OUHAB, Mimoun MALKI, Djamel BERRABAH, Faouzi BOUFARES
Copyright: © 2017 | Volume: 8 | Issue: 4 | Pages: 14
ISSN: 1947-3095 | EISSN: 1947-3109 | EISBN13: 9781522514008 | DOI: 10.4018/IJSITA.2017100102
Cite Article

MLA

Abdelkrim OUHAB, et al. "An Unsupervised Entity Resolution Framework for English and Arabic Datasets." IJSITA, vol. 8, no. 4, 2017, pp. 16-29. http://doi.org/10.4018/IJSITA.2017100102

APA

Abdelkrim OUHAB, Mimoun MALKI, Djamel BERRABAH, & Faouzi BOUFARES. (2017). An Unsupervised Entity Resolution Framework for English and Arabic Datasets. International Journal of Strategic Information Technology and Applications (IJSITA), 8(4), 16-29. http://doi.org/10.4018/IJSITA.2017100102

Chicago

Abdelkrim OUHAB, et al. "An Unsupervised Entity Resolution Framework for English and Arabic Datasets," International Journal of Strategic Information Technology and Applications (IJSITA) 8, no. 4 (2017): 16-29. http://doi.org/10.4018/IJSITA.2017100102


Abstract

Entity resolution (ER) is an important step in data integration and in many data mining projects; its goal is to identify records that refer to the same real-world entity. Most existing ER frameworks focus on datasets in Latin-script languages and do not support the Arabic language. In this article, the authors present an unsupervised ER framework that supports both English and Arabic datasets. Rather than relying on matching rules written by an expert or on manually labeled training examples, the proposed framework automatically generates its own training set. This generated training set is then used to train a classifier and learn a classification model, which is finally applied to perform ER. The framework was implemented and tested on three Arabic datasets and four English datasets. Experimental results show that it is competitive with supervised approaches and outperforms recently proposed unsupervised approaches in terms of F-measure.
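
The abstract does not specify how the training set is generated, which similarity measures are used, or which classifier is trained. The Python sketch below only illustrates the general pipeline it describes (automatic training-set generation, then classifier training, then ER) under our own assumptions, and should not be read as the authors' method. Everything here is hypothetical: the helper names (normalize, features, generate_training_set, train_logistic, resolve), the two similarity features (token Jaccard and difflib's SequenceMatcher ratio), the confidence thresholds hi=0.8 and lo=0.3, the Arabic normalization rules (dropping diacritics, unifying alef/ta marbuta/alef maksura, removing tatweel), and the use of a plain logistic regression as a stand-in classifier.

```python
import math
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations

# Assumed Arabic letter unification (not taken from the article):
# ta marbuta -> ha, alef maksura -> ya, tatweel (kashida) removed.
_ARABIC_MAP = str.maketrans({"\u0629": "\u0647", "\u0649": "\u064a", "\u0640": None})


def normalize(text: str) -> str:
    """Language-agnostic cleanup for English and Arabic strings (assumed rules)."""
    # NFKD splits hamza/madda alef variants and accented Latin letters into
    # base letter + combining mark; dropping combining marks also removes tashkeel.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Unify Arabic letter variants, lowercase Latin text, collapse whitespace.
    return " ".join(stripped.translate(_ARABIC_MAP).lower().split())


def features(a: str, b: str) -> list[float]:
    """Similarity vector for a record pair: token Jaccard and edit-based ratio."""
    na, nb = normalize(a), normalize(b)
    ta, tb = set(na.split()), set(nb.split())
    union = ta | tb
    jaccard = len(ta & tb) / len(union) if union else 0.0
    return [jaccard, SequenceMatcher(None, na, nb).ratio()]


def generate_training_set(pairs, hi=0.8, lo=0.3):
    """Auto-label only confident pairs: mean similarity >= hi -> 1, <= lo -> 0."""
    X, y = [], []
    for a, b in pairs:
        f = features(a, b)
        score = sum(f) / len(f)
        if score >= hi or score <= lo:
            X.append(f)
            y.append(1 if score >= hi else 0)
    return X, y


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Stochastic-gradient logistic regression (stand-in for any classifier)."""
    w, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = _sigmoid(bias + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            bias -= lr * g
    return w, bias


def resolve(records, hi=0.8, lo=0.3):
    """Generate seeds, train the classifier, then label every candidate pair."""
    pairs = list(combinations(records, 2))
    X, y = generate_training_set(pairs, hi, lo)
    if len(set(y)) < 2:
        raise ValueError("seed selection produced one class; adjust hi/lo")
    w, bias = train_logistic(X, y)
    return [
        (a, b) for a, b in pairs
        if _sigmoid(bias + sum(wj * xj for wj, xj in zip(w, features(a, b)))) >= 0.5
    ]


RECORDS = ["Mohamed Ali", "MOHAMED  ALI", "محمد علي", "مُحَمَّد عَلِي",
           "Fatima Zahra", "Fatima Zahraa"]

for pair in resolve(RECORDS):
    print(pair)
```

In this toy run, only pairs whose mean similarity falls at either extreme become automatically labeled training examples. A borderline pair such as ("Fatima Zahra", "Fatima Zahraa") is left to the learned model rather than to a fixed rule, which is the point of training a classifier instead of thresholding directly. Comparing all pairs is quadratic in the number of records, so a practical system would normally add a blocking step before this stage.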
