DOI: 10.1145/3581807.3581902
research-article

An Active Transfer Learning Method Combining Uncertainty with Diversity for Chinese Address Resolution

Published: 22 May 2023

Abstract

Chinese address resolution (CAR) is a key step in geocoding technology, and the resolution results directly affect the service quality of address-based applications. Deep learning models have been widely used for the CAR task, but they require abundant annotated address data to achieve satisfactory performance. In this paper, an active transfer learning method combining uncertainty with diversity is proposed for CAR. Its main goals are to reduce the annotation required for unlabeled addresses in the target region and to improve the utilization of labeled data from the source region. Considering the correlation among Chinese addresses, we propose a method for clustering unlabeled addresses on the basis of feature words, which are mined from the address data with a latent Dirichlet allocation (LDA) model, so that the clusters reflect the distribution of the addresses. A metric, the comprehensive sample strategy combining uncertainty with diversity (CSSCUD), is constructed to select training samples from the target region. In each batch it obtains highly valuable samples by jointly considering their informativeness and their distribution in the feature-word space. Experiments on address datasets from two different regions show that the comprehensive active transfer learning method achieves higher resolution accuracy than several baselines when using the same number of labeled training samples, which indicates that the proposed method is effective and practical for CAR.
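
The abstract does not give the CSSCUD formula, so the following is only a minimal sketch of the general idea of batch selection that mixes uncertainty with diversity. It assumes each unlabeled address has a single predicted class distribution (a simplification; a sequence labeler would produce one per token) and a cluster id, for example from feature-word clustering. The function names, the `alpha` weight, and the scoring rule are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_batch(probs_per_sample, cluster_ids, batch_size, alpha=0.5):
    """Greedily pick samples that are uncertain AND spread over clusters.

    Hypothetical scoring rule (not the paper's CSSCUD metric):
        score = alpha * normalised_entropy
              + (1 - alpha) / (1 + samples already picked from that cluster)
    Assumes at least two classes so the entropy can be normalised.
    """
    max_entropy = math.log(len(probs_per_sample[0]))
    uncertainty = [entropy(p) / max_entropy for p in probs_per_sample]

    picked_per_cluster = defaultdict(int)
    remaining = set(range(len(probs_per_sample)))
    selected = []

    while remaining and len(selected) < batch_size:
        def score(i):
            # Diversity term shrinks as a cluster gets represented in the batch.
            diversity = 1.0 / (1 + picked_per_cluster[cluster_ids[i]])
            return alpha * uncertainty[i] + (1 - alpha) * diversity

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
        picked_per_cluster[cluster_ids[best]] += 1

    return selected


if __name__ == "__main__":
    # Toy data: samples 0 and 1 are the most uncertain but share a cluster.
    probs = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1], [0.6, 0.4]]
    clusters = [0, 0, 1, 1]
    print(select_batch(probs, clusters, batch_size=2))             # mixed score
    print(select_batch(probs, clusters, batch_size=2, alpha=1.0))  # uncertainty only
```

With `alpha=1.0` the sketch reduces to plain uncertainty sampling and may pick several near-duplicate addresses from one cluster; lowering `alpha` trades some informativeness for coverage of the clusters.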

Published In

ICCPR '22: Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition
November 2022
683 pages
ISBN: 9781450397056
DOI: 10.1145/3581807

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Active learning
  2. Chinese address resolution
  3. Diversity
  4. Feature words
  5. LDA
  6. Transfer learning
  7. Uncertainty

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICCPR 2022
