Abstract
Government Data (GD) is crucial for social and economic growth, and it must follow specific classification standards and formats to be accessible and usable by the public. In practice, GD classification is hindered by a scarcity of high-quality classified samples and by the labor-intensive nature of manual classification. To address these obstacles, this study introduces a Prompt-Tuning Classification Framework for Government Data (GD-PTCF) for automated classification. We first used web crawling to collect an extensive dataset of Chinese government data. We then designed a Classification Prompting Pattern (CPP) and used a BERT-based neural network, named the RoBERTa Encoder (RE-coder), for few-shot prompt-tuning. This approach achieves high classification accuracy with minimal training data. To further reduce reliance on manual effort, we developed a clustering mapping (CLM) strategy, which transforms encoded labeled embeddings into clustered embeddings that are then classified by their proximity to predefined classification centers. Experimental results show that GD-PTCF significantly outperforms other pre-trained models in classification accuracy, even with a limited volume of training data.
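The final step of the CLM strategy, classifying an embedding by its proximity to predefined classification centers, can be illustrated with a minimal sketch. The abstract does not say how the centers are computed or which distance is used, so this sketch assumes each center is the mean of the labeled embeddings for that class and that proximity means Euclidean distance. The function names, the labels "finance" and "health", and the 2-D toy vectors are hypothetical and stand in for real encoder outputs.

```python
import math


def centroid(vectors):
    """Return the component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_centers(labeled):
    """Map each label to the centroid of its labeled embeddings.

    Assumption: a classification center is the mean embedding of a class.
    """
    groups = {}
    for label, vector in labeled:
        groups.setdefault(label, []).append(vector)
    return {label: centroid(vectors) for label, vectors in groups.items()}


def classify(embedding, centers):
    """Return the label whose center is nearest in Euclidean distance."""
    return min(centers, key=lambda label: math.dist(embedding, centers[label]))


# Toy 2-D embeddings standing in for encoder outputs (hypothetical values).
labeled = [
    ("finance", [0.9, 0.1]),
    ("finance", [1.1, -0.1]),
    ("health", [-1.0, 0.2]),
    ("health", [-0.8, -0.2]),
]
centers = build_centers(labeled)
prediction = classify([0.7, 0.0], centers)
print(prediction)
```

With real data, the toy vectors would be replaced by the embeddings the encoder produces for a few labeled samples per class, and a different metric such as cosine distance could be substituted in `classify` if it suits the embedding space better.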
Acknowledgement
This work was supported by the Fundamental Research Funds for the Central Universities (Grant Number: 3282023017).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Mao, M., Zhang, D., Xia, C., Guo, Y., Zhang, D., Li, X. (2024). GD-PTCF: Prompt-Tuning Based Classification Framework for Government Data. In: Huang, DS., Zhang, C., Pan, Y. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol 14876. Springer, Singapore. https://doi.org/10.1007/978-981-97-5666-7_18
DOI: https://doi.org/10.1007/978-981-97-5666-7_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5665-0
Online ISBN: 978-981-97-5666-7
eBook Packages: Computer Science, Computer Science (R0)