Improving Address Matching Using Siamese Transformer Networks

  • Conference paper
  • Progress in Artificial Intelligence (EPIA 2023)

Abstract

Matching addresses is a critical task for companies and post offices involved in the processing and delivery of packages. The ramifications of incorrectly delivering a package to the wrong recipient are numerous, ranging from harm to the company’s reputation to economic and environmental costs. This research introduces a deep learning-based model designed to increase the efficiency of address matching for Portuguese addresses. The model comprises two parts: (i) a bi-encoder, which is fine-tuned to create meaningful embeddings of Portuguese postal addresses, utilized to retrieve the top 10 likely matches of the un-normalized target address from a normalized database, and (ii) a cross-encoder, which is fine-tuned to accurately re-rank the 10 addresses obtained by the bi-encoder. The model has been tested on a real-case scenario of Portuguese addresses and exhibits a high degree of accuracy, exceeding 95% at the door level. When utilized with GPU computations, the inference speed is about 4.5 times quicker than other traditional approaches such as BM25. An implementation of this system in a real-world scenario would substantially increase the effectiveness of the distribution process. Such an implementation is currently under investigation.
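The retrieve-then-rerank pipeline described above can be sketched in a few lines. This is a minimal illustration of the two-stage structure only: the paper's components are fine-tuned transformer models (a bi-encoder and a cross-encoder), whereas here a bag-of-tokens cosine and a character-level `difflib` ratio act as stand-in scorers, and the sample addresses are invented for demonstration.

```python
# Two-stage address matching sketch: stage 1 retrieves the top-k
# candidates cheaply; stage 2 re-scores each (query, candidate) pair
# with a finer-grained joint scorer. The scorers below are simple
# placeholders for the fine-tuned bi-encoder and cross-encoder.
from collections import Counter
from difflib import SequenceMatcher
import math

def embed(address: str) -> Counter:
    # Stand-in for the bi-encoder: a bag-of-tokens vector.
    return Counter(address.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cross_score(query: str, candidate: str) -> float:
    # Stand-in for the cross-encoder: similarity computed jointly
    # over the pair, not from independent embeddings.
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()

def match(query: str, database: list[str], k: int = 10) -> str:
    # Stage 1: retrieve the k most similar normalized addresses.
    q = embed(query)
    top_k = sorted(database, key=lambda addr: cosine(q, embed(addr)),
                   reverse=True)[:k]
    # Stage 2: re-rank the k candidates with the pairwise scorer.
    return max(top_k, key=lambda addr: cross_score(query, addr))

db = [
    "Rua Augusta 100, 1100-053 Lisboa",
    "Avenida da Liberdade 2, 1250-144 Lisboa",
    "Rua de Santa Catarina 45, 4000-447 Porto",
]
print(match("r. augusta n 100 lisboa", db))
```

The split mirrors the paper's design choice: the bi-encoder makes retrieval over a large normalized database cheap (candidate embeddings can be precomputed), while the more expensive cross-encoder is only applied to the ten retrieved candidates.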

Notes

  1. Code available at: https://github.com/avduarte333/adress-matching.

  2. https://github.com/UKPLab/sentence-transformers.

  3. https://github.com/seatgeek/fuzzywuzzy.

  4. Although the model under study is BM25+CE, the cross-encoder is not used when evaluating retrieval capabilities; for notational simplicity, the model is therefore referred to as BM25.


Acknowledgements

The authors would like to acknowledge the support of Dr. Egídio Moutinho, Dra. Marília Rosado, Dr. Rúben Rocha, Dr. André Esteves, Dr. Paulo Silva, Dr. Gonçalo Ribeiro Enes and Dr. Diogo Freitas Oliveira in the development of this project. We also gratefully acknowledge the financial support provided by the Recovery and Resilience Fund towards the Center for Responsible AI project (Ref. C628696807-00454142) and the multiannual financing of the Foundation for Science and Technology (FCT) for INESC-ID (Ref. UIDB/50021/2020).

Author information

Corresponding author: André V. Duarte.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Duarte, A.V., Oliveira, A.L. (2023). Improving Address Matching Using Siamese Transformer Networks. In: Moniz, N., Vale, Z., Cascalho, J., Silva, C., Sebastião, R. (eds) Progress in Artificial Intelligence. EPIA 2023. Lecture Notes in Computer Science, vol 14116. Springer, Cham. https://doi.org/10.1007/978-3-031-49011-8_33

  • DOI: https://doi.org/10.1007/978-3-031-49011-8_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-49010-1

  • Online ISBN: 978-3-031-49011-8

  • eBook Packages: Computer Science, Computer Science (R0)
