GD-PTCF: Prompt-Tuning Based Classification Framework for Government Data

Mao, Ming; Zhang, Duo; Xia, Chao; Guo, Yunchuan; Zhang, Dunmin; Li, Xiaolin

doi:10.1007/978-981-97-5666-7_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14876)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

Government Data (GD), crucial for fostering social and economic growth, must adhere to specific classification standards and formats to ensure public accessibility and usability. Despite its potential, GD is currently hindered by a scarcity of high-quality classified samples and by the labor-intensive process of manual classification. To overcome these obstacles, our study introduces a Prompt-Tuning Classification Framework for Government Data (GD-PTCF) for automated classification. First, we used web crawling to collect an extensive dataset of Chinese government data. Next, we introduced a Classification Prompting Pattern (CPP) and used a BERT-based neural network, the RoBERTa Encoder (RE-coder), to perform few-shot prompt-tuning. This approach achieves high classification accuracy with minimal training data. To further reduce reliance on manual effort, we developed a clustering mapping (CLM) strategy, which transforms encoded labeled embeddings into clustered embeddings and classifies them by their proximity to predefined classification centers. Our experimental results show that GD-PTCF significantly outperforms other pre-trained models in classification accuracy, even with a limited volume of training data.
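The abstract describes the clustering mapping (CLM) step only at a high level: labeled embeddings are grouped, and a text is assigned to the nearest classification center. A minimal sketch of that idea follows. It assumes that each center is the mean of a class's labeled embeddings and that proximity is measured by cosine similarity; both choices, the function names (build_centres, classify, cosine), the toy 3-dimensional vectors and the category labels "transport" and "health" are hypothetical and are not taken from the paper. The RoBERTa encoder itself is not shown; the vectors stand in for its outputs.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

# Hypothetical nearest-center sketch of a "clustering mapping" step.
# Assumption: texts are already encoded into fixed-length vectors by a
# RoBERTa-style encoder; toy 3-d vectors stand in for those embeddings.


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def build_centres(embeddings: List[Sequence[float]],
                  labels: List[str]) -> Dict[str, List[float]]:
    """Group labeled embeddings by class; return one mean center per class."""
    groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        groups[label].append(vec)
    return {label: mean_vector(vecs) for label, vecs in groups.items()}


def classify(vec: Sequence[float], centres: Dict[str, List[float]]) -> str:
    """Assign the label whose center has the highest cosine similarity."""
    return max(centres, key=lambda label: cosine(vec, centres[label]))


# Toy usage with hypothetical embeddings and category names.
train = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]
labels = ["transport", "transport", "health", "health"]
centres = build_centres(train, labels)
print(classify([0.85, 0.15, 0.05], centres))  # prints: transport
```

Mean centers with cosine similarity are a deliberately simple choice: adding a new category only requires a few labeled examples to form a new center, which fits the few-shot setting the abstract emphasizes. The paper's actual CLM may compute or place its centers differently.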

Similar content being viewed by others

An Efficient Framework for Crime Prediction Using Feature Engineering and Machine Learning

Hybrid Machine Learning Models of Classifying Residential Requests for Smart Dispatching

CATI: An Extensible Platform Supporting Assisted Classification of Large Datasets

Acknowledgement

This work was supported by the Fundamental Research Funds for the Central Universities (Grant Number: 3282023017).

Author information

Authors and Affiliations

Beijing Electronic Science and Technology Institute, Beijing, China
Ming Mao, Duo Zhang, Chao Xia, Dunmin Zhang & Xiaolin Li
The School of Cyberspace Security, University of Science and Technology of China, Hefei, China
Ming Mao & Duo Zhang
The Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Yunchuan Guo

Corresponding author

Correspondence to Xiaolin Li.

Editor information

Editors and Affiliations

Eastern Institute of Technology, Ningbo, China
De-Shuang Huang
Tianjin University of Science and Technology, Tianjin, China
Chuanlei Zhang
Eastern Institute of Technology, Ningbo, China
Yijie Pan

About this paper

Cite this paper

Mao, M., Zhang, D., Xia, C., Guo, Y., Zhang, D., Li, X. (2024). GD-PTCF: Prompt-Tuning Based Classification Framework for Government Data. In: Huang, DS., Zhang, C., Pan, Y. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol 14876. Springer, Singapore. https://doi.org/10.1007/978-981-97-5666-7_18

DOI: https://doi.org/10.1007/978-981-97-5666-7_18
Published: 01 August 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5665-0
Online ISBN: 978-981-97-5666-7
eBook Packages: Computer Science, Computer Science (R0)
