
Geographical address representation learning for address matching

World Wide Web

Abstract

Address matching is a crucial task in location-based businesses such as take-out services and express delivery; it aims to identify addresses in address databases that refer to the same location. The task is challenging because a single location can be expressed in many ways, especially in Chinese. Traditional address matching approaches, which rely on string similarity or learned matching rules, can hardly handle addresses with redundant, incomplete, or unusual expressions. In this paper, to learn geographical semantic representations for address strings, we propose to obtain rich contexts for addresses from the Web through Web search engines, which substantially enriches the semantic meaning that can be learned for each address. We further propose a two-stage geographical address representation learning model for address matching. In the first stage, we use an encoder-decoder architecture to learn a semantic vector representation for each address string, applying an up-sampling and sub-sampling strategy to handle address redundancy and incompleteness. An attention mechanism is also applied to highlight the important features of addresses in their semantic representations. In the second stage, we construct a single large graph from the corpus, with address elements and addresses as nodes and edges built from word co-occurrence information, and learn embedding representations for all nodes on the graph. Our empirical study on two real-world address datasets demonstrates that our approach improves on state-of-the-art methods in both precision (by up to 8%) and recall (by up to 12%).
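To make the second stage more concrete, the following is a minimal sketch of how a single graph over addresses and address elements might be assembled. It is an illustration under stated assumptions, not the authors' implementation: the function name `build_address_graph`, the toy addresses, and the weighting scheme (raw occurrence and co-occurrence counts within an address) are all hypothetical, and the abstract does not specify how edge weights are actually computed.

```python
from collections import Counter
from itertools import combinations


def build_address_graph(addresses):
    """Build a weighted undirected graph over addresses and their elements.

    `addresses` maps an address id to its list of segmented elements.
    Returns a Counter mapping a (node_a, node_b) edge to its weight.
    Assumption: weights are plain counts; the paper may weight edges differently.
    """
    edges = Counter()
    for addr_id, elements in addresses.items():
        addr_node = ("addr", addr_id)
        unique = sorted(set(elements))
        # Address-element edge: the element occurs in this address.
        for elem in unique:
            edges[(addr_node, ("elem", elem))] += 1
        # Element-element edge: the two elements co-occur in one address.
        for a, b in combinations(unique, 2):
            edges[(("elem", a), ("elem", b))] += 1
    return edges


# Hypothetical, pre-segmented toy addresses (not taken from the paper's datasets).
toy = {
    "a1": ["City A", "Road X", "No. 1"],
    "a2": ["Province P", "City A", "Road X", "No. 1"],
    "a3": ["City A", "Road Y", "No. 8"],
}
graph = build_address_graph(toy)
```

In this toy graph, `a1` and `a2` share all of `a1`'s elements, so a node embedding method run on the graph would have structural evidence that the two addresses are related even though `a2` carries an extra province element.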




Notes

  1. www.poi86.com

  2. www.dianping.com

  3. www.meituan.com

  4. www.qichacha.com


Acknowledgments

This research is partially supported by Natural Science Foundation of Jiangsu Province (No. BK20191420), National Natural Science Foundation of China (Grant No. 61632016, 61572336, 61572335, 61772356), Natural Science Research Project of Jiangsu Higher Education Institution (No. 17KJA520003, 18KJA520010), and the Open Program of Neusoft Corporation (No. SKLSAOP1801).

Author information

Corresponding author

Correspondence to Zhixu Li.

Additional information

This article belongs to the Topical Collection: Special Issue on Web and Big Data 2019

Guest Editors: Jie Shao, Man Lung Yiu, and Toyoda Masashi

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Shan, S., Li, Z., Yang, Q. et al. Geographical address representation learning for address matching. World Wide Web 23, 2005–2022 (2020). https://doi.org/10.1007/s11280-020-00782-2


  • DOI: https://doi.org/10.1007/s11280-020-00782-2
