DOI: 10.1145/3672919.3672949
research-article

Text Classification Network Based on Token Dynamic Decay

Published: 24 July 2024

Abstract

Existing pre-trained models have made significant strides in reducing computation time, and many of them speed up inference by removing less influential words from the text. However, these models tend to overlook the generalization effect that simple samples exert on high-level model parameters. Moreover, directly deleting words can cause a notable number of mis-deletions, damaging the original semantic information of the text. To address these issues, this paper introduces a model combining Sample Influence Adaptation Training with Token Dynamic Attenuation Inference. Specifically, during the training phase, we dynamically adjust text masking according to the semantic-analysis difficulty of the text, thereby curbing the generalization effect of simple samples on the higher layers of the model. In the inference phase, a double-layer superimposed influence-score method dynamically decays and removes unimportant tokens: information from several textual dimensions is extracted to compute an influence score for each token, and tokens with low influence are ultimately eliminated. Extensive experiments on text-classification datasets demonstrate that our approach achieves inference speeds comparable to other models while maintaining higher accuracy across multiple datasets.
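
The abstract describes the influence-scoring and token-pruning idea only at a high level. As a rough illustrative sketch (not the authors' implementation), the following PyTorch snippet shows the general pattern of dropping low-influence tokens between encoder layers: attention received over two consecutive layers is superimposed as a stand-in influence score, since the paper's exact double-layer scoring and decay schedule are not given here. The function names, the attention-based scoring proxy, and the keep ratio are all assumptions for illustration.

    # Illustrative sketch only: prune low-influence tokens between encoder layers.
    # The per-token "influence score" here is approximated by the attention each
    # token receives, summed over two consecutive layers; the paper's actual
    # double-layer superimposed score and decay schedule may differ.
    import torch

    def influence_scores(attn_a: torch.Tensor, attn_b: torch.Tensor) -> torch.Tensor:
        """attn_a, attn_b: (batch, heads, seq, seq) attention maps from two layers.
        Returns a (batch, seq) influence score per token."""
        recv_a = attn_a.mean(dim=1).sum(dim=1)   # attention received per token in layer A
        recv_b = attn_b.mean(dim=1).sum(dim=1)   # attention received per token in layer B
        return recv_a + recv_b                   # superimpose the two layers' scores

    def prune_tokens(hidden: torch.Tensor, scores: torch.Tensor, keep_ratio: float):
        """Keep only the highest-scoring tokens (the [CLS] token is assumed at index 0)."""
        batch, seq, dim = hidden.shape
        k = max(1, int(seq * keep_ratio))
        scores = scores.clone()
        scores[:, 0] = float("inf")              # never drop the classification token
        keep_idx = scores.topk(k, dim=1).indices.sort(dim=1).values  # preserve token order
        kept = hidden.gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, dim))
        return kept, keep_idx

    # Toy usage with random tensors standing in for real encoder outputs.
    if __name__ == "__main__":
        hidden = torch.randn(2, 16, 64)                        # (batch, seq, dim)
        attn_a = torch.softmax(torch.randn(2, 4, 16, 16), -1)  # layer A attention
        attn_b = torch.softmax(torch.randn(2, 4, 16, 16), -1)  # layer B attention
        pruned, kept = prune_tokens(hidden, influence_scores(attn_a, attn_b), keep_ratio=0.5)
        print(pruned.shape, kept.shape)                        # (2, 8, 64), (2, 8)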

    Published In

    CSAIDE '24: Proceedings of the 2024 3rd International Conference on Cyber Security, Artificial Intelligence and Digital Economy
    March 2024
    676 pages
    ISBN:9798400718212
    DOI:10.1145/3672919
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CSAIDE 2024
