Abstract
This paper proposes a Deep Neural Network (DNN) based approach for entity resolution in databases. The approach relies on a record linkage process that aims to detect records referring to the same entity. First, each record pair is represented by its word embedding, computed with an N-gram based embedding method. Then, the pair is classified as matching or non-matching by a DNN model. Three DNN architectures are investigated and compared for this purpose: Multi-Layer Perceptron (MLP), Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNN). The approach is evaluated on two databases, where it exceeds \(97\%\) recall and \(96\%\) precision. Compared with approaches based on similarity measures and classical classifiers, it yields significantly better results on both databases.
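The pipeline summarized above (embed each record from its character N-grams, classify the pair with a DNN, report precision and recall) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses an MLP, the simplest of the three architectures compared. Records are plain strings. Each character trigram gets a fixed pseudo-random vector seeded by its CRC32, which stands in for learned fastText-style subword vectors. A pair is represented by the element-wise absolute difference and product of its two record embeddings. The names `DIM`, `N`, `embed`, `pair_features`, `MLP`, `train` and `PAIRS`, the hidden-layer size, learning rate, epoch count and toy records are all hypothetical.

```python
import math
import random
import zlib

DIM = 32  # embedding size (hypothetical choice)
N = 3     # character n-gram length (assumption, not from the paper)

def ngrams(text, n=N):
    """Character n-grams of a lower-cased string padded with '<' and '>'."""
    s = f"<{text.lower()}>"
    return [s[i:i + n] for i in range(max(1, len(s) - n + 1))]

def embed(text, dim=DIM):
    # Average of per-n-gram vectors (fastText-style subword averaging). Each
    # n-gram gets a fixed +/-1 vector seeded by its CRC32 instead of a learned one.
    grams = ngrams(text)
    vec = [0.0] * dim
    for g in grams:
        rng = random.Random(zlib.crc32(g.encode("utf-8")))
        for k in range(dim):
            vec[k] += rng.choice((-1.0, 1.0))
    return [v / len(grams) for v in vec]

def pair_features(r1, r2):
    """Pair representation: element-wise |a - b| followed by a * b."""
    a, b = embed(r1), embed(r2)
    return [abs(x - y) for x, y in zip(a, b)] + [x * y for x, y in zip(a, b)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class MLP:
    """One ReLU hidden layer and a sigmoid output giving P(match)."""
    def __init__(self, n_in, n_hidden=16, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, n_in ** -0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, n_hidden ** -0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, sigmoid(z)

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, y, lr):
        """One SGD step on the binary cross-entropy loss."""
        h, p = self.forward(x)
        dz = p - y  # dLoss/dz for a sigmoid output with cross-entropy
        for j, hj in enumerate(h):
            dh = dz * self.w2[j] if hj > 0 else 0.0  # backprop through ReLU
            self.w2[j] -= lr * dz * hj
            self.b1[j] -= lr * dh
            row = self.w1[j]
            for i, xi in enumerate(x):
                row[i] -= lr * dh * xi
        self.b2 -= lr * dz

def train(pairs, epochs=100, lr=0.05, seed=0):
    data = [(pair_features(a, b), y) for a, b, y in pairs]
    model = MLP(len(data[0][0]), seed=seed)
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            model.train_step(x, y, lr)
    return model

def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labelled record pairs: 1 = same entity, 0 = different.
PAIRS = [
    ("Jean Dupont 12 rue de Paris", "Jean Dupond 12 rue de Paris", 1),
    ("Marie Curie 5 av. Foch", "Marie Curie 5 avenue Foch", 1),
    ("ACME Corp Lyon", "ACME Corporation Lyon", 1),
    ("Jean Dupont 12 rue de Paris", "Paul Martin 3 bd Voltaire", 0),
    ("Marie Curie 5 av. Foch", "ACME Corp Lyon", 0),
    ("Paul Martin 3 bd Voltaire", "Anne Leroy 8 place Bellecour", 0),
]

if __name__ == "__main__":
    model = train(PAIRS)
    preds = [model.predict(pair_features(a, b)) >= 0.5 for a, b, _ in PAIRS]
    print(precision_recall([y for _, _, y in PAIRS], preds))
```

Precision is TP / (TP + FP) and recall is TP / (TP + FN), the two measures reported in the abstract. In this sketch only the classifier would change for the other architectures: an LSTM or CNN would typically consume the sequence of n-gram vectors rather than their average.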
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Kooli, N., Allesiardo, R., Pigneul, E. (2018). Deep Learning Based Approach for Entity Resolution in Databases. In: Nguyen, N., Hoang, D., Hong, TP., Pham, H., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2018. Lecture Notes in Computer Science, vol 10752. Springer, Cham. https://doi.org/10.1007/978-3-319-75420-8_1
DOI: https://doi.org/10.1007/978-3-319-75420-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75419-2
Online ISBN: 978-3-319-75420-8
eBook Packages: Computer Science, Computer Science (R0)