Abstract
Matching addresses is a critical task for companies and post offices involved in processing and delivering packages. Delivering a package to the wrong recipient has numerous consequences, ranging from damage to the company's reputation to economic and environmental costs. This research introduces a deep learning-based model designed to improve the efficiency of address matching for Portuguese addresses. The model comprises two parts: (i) a bi-encoder, fine-tuned to produce meaningful embeddings of Portuguese postal addresses, which retrieves the 10 most likely matches for an un-normalized target address from a database of normalized addresses; and (ii) a cross-encoder, fine-tuned to accurately re-rank those 10 candidates. Tested on a real-world set of Portuguese addresses, the model achieves an accuracy above 95% at the door level. When run on a GPU, its inference is about 4.5 times faster than that of traditional approaches such as BM25. Deploying this system in a real-world setting would substantially increase the effectiveness of the distribution process; such a deployment is currently under investigation.
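The two-stage retrieve-and-re-rank design described above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the hypothetical `embed` function (a bag of character trigrams) stands in for the fine-tuned bi-encoder, the hypothetical `cross_score` function (token-level Jaccard similarity over the pair) stands in for the fine-tuned cross-encoder, and the addresses are made-up examples rather than records from the paper's dataset.

```python
import math
import re
import unicodedata
from collections import Counter


def _normalize(text: str) -> str:
    """Lowercase and strip diacritics (toy preprocessing, an assumption)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def embed(text: str) -> Counter:
    """Hypothetical stand-in for the fine-tuned bi-encoder: character-trigram counts."""
    padded = f"  {_normalize(text)}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cross_score(query: str, candidate: str) -> float:
    """Hypothetical stand-in for the fine-tuned cross-encoder: token Jaccard over the pair."""
    q = set(re.findall(r"[a-z0-9]+", _normalize(query)))
    c = set(re.findall(r"[a-z0-9]+", _normalize(candidate)))
    return len(q & c) / len(q | c) if q | c else 0.0


def match(query: str, database: list[str], index: list[Counter], k: int = 10) -> str:
    # Stage 1 (bi-encoder stand-in): encode the query once, rank precomputed embeddings, keep top k.
    q = embed(query)
    ranked = sorted(range(len(database)), key=lambda i: cosine(q, index[i]), reverse=True)
    shortlist = [database[i] for i in ranked[:k]]
    # Stage 2 (cross-encoder stand-in): score each (query, candidate) pair and return the best one.
    return max(shortlist, key=lambda cand: cross_score(query, cand))


# Made-up, normalized-style addresses for illustration only.
database = [
    "Rua Augusta 10, 1100-053 Lisboa",
    "Rua Augusta 12, 1100-053 Lisboa",
    "Avenida da Liberdade 110, 1250-146 Lisboa",
    "Rua de Santa Catarina 5, 4000-447 Porto",
]
index = [embed(address) for address in database]  # computed once, offline

print(match("r augusta n 12 lisboa", database, index, k=3))  # Rua Augusta 12, 1100-053 Lisboa
```

The sketch mirrors the cost structure of the design: database embeddings are computed once offline, so each query needs one encoding plus a similarity scan, while the more expensive pairwise scorer runs only on the k shortlisted candidates (k = 10 in the paper).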
Notes
- 1. Code available at: https://github.com/avduarte333/adress-matching.
- 2.
- 3.
- 4. Although the model under study is BM25+CE, the cross-encoder is not used when evaluating retrieval capabilities; therefore, for notational simplicity, the model is referred to as BM25.
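For readers unfamiliar with the BM25 baseline that serves as the retrieval stage of BM25+CE, a minimal Okapi BM25 retriever can be sketched as below. It is an illustration, not the authors' baseline code: the word tokenizer, the parameter values k1 = 1.5 and b = 0.75 (common defaults), the Lucene-style non-negative IDF, and the sample addresses are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simple lowercase word tokenizer (an assumption; the paper's preprocessing may differ).
    return re.findall(r"\w+", text.lower())


class BM25:
    """Okapi BM25 scoring with the non-negative IDF variant used by Lucene."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]       # term frequencies per document
        self.lengths = [sum(tf.values()) for tf in self.tfs]  # document lengths in tokens
        self.avgdl = sum(self.lengths) / len(self.tfs)
        n_docs = len(self.tfs)
        df = Counter(term for tf in self.tfs for term in tf)  # number of documents containing each term
        self.idf = {t: math.log(1 + (n_docs - n + 0.5) / (n + 0.5)) for t, n in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue  # terms absent from the document contribute nothing
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int = 10) -> list[int]:
        # Retrieval stage only: indices of the k highest-scoring documents.
        ranked = sorted(range(len(self.tfs)), key=lambda i: self.score(query, i), reverse=True)
        return ranked[:k]


# Made-up addresses for illustration only.
addresses = [
    "Rua Augusta 10, 1100-053 Lisboa",
    "Rua Augusta 12, 1100-053 Lisboa",
    "Avenida da Liberdade 110, 1250-146 Lisboa",
    "Rua de Santa Catarina 5, 4000-447 Porto",
]
bm25 = BM25(addresses)
print(bm25.top_k("rua augusta 12 lisboa", k=2))  # [1, 0]
```

In a BM25+CE pipeline, the indices returned by this stage would then be passed to the cross-encoder for re-ranking; a retrieval-only evaluation, as in the note above, stops after the ranking step.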
Acknowledgements
The authors would like to acknowledge the support of Dr. Egídio Moutinho, Dra. Marília Rosado, Dr. Rúben Rocha, Dr. André Esteves, Dr. Paulo Silva, Dr. Gonçalo Ribeiro Enes and Dr. Diogo Freitas Oliveira in the development of this project. We also gratefully acknowledge the financial support provided by the Recovery and Resilience Fund to the Center for Responsible AI project (Ref. C628696807-00454142) and the multiannual funding provided by the Foundation for Science and Technology (FCT) to INESC-ID (Ref. UIDB/50021/2020).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Duarte, A.V., Oliveira, A.L. (2023). Improving Address Matching Using Siamese Transformer Networks. In: Moniz, N., Vale, Z., Cascalho, J., Silva, C., Sebastião, R. (eds) Progress in Artificial Intelligence. EPIA 2023. Lecture Notes in Computer Science, vol 14116. Springer, Cham. https://doi.org/10.1007/978-3-031-49011-8_33
DOI: https://doi.org/10.1007/978-3-031-49011-8_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49010-1
Online ISBN: 978-3-031-49011-8
eBook Packages: Computer Science; Computer Science (R0)