Abstract
Automatic co-text free name matching has many important real-world applications, ranging from fiscal compliance to border control. Name matching systems use several engines to compare two names for similarity, one of the most critical being phonetic name similarity. In this work, we re-frame existing work on neural sequence-to-sequence transliteration so that it can be applied to name matching. For performance reasons, we then build upon this work to use an alternative, non-recurrent neural encoder module. This ultimately yields a model that is 63% faster while still maintaining a 16% improvement in averaged precision over our baseline model.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Blair, P., Eliav, C., Hasanaj, F., Bar, K. (2021). Balancing Speed and Accuracy in Neural-Enhanced Phonetic Name Matching. In: Dong, Y., Kourtellis, N., Hammer, B., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol 12979. Springer, Cham. https://doi.org/10.1007/978-3-030-86517-7_17
DOI: https://doi.org/10.1007/978-3-030-86517-7_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86516-0
Online ISBN: 978-3-030-86517-7
eBook Packages: Computer Science, Computer Science (R0)