DOI: 10.1145/3672919.3672949
research-article

Text Classification Network Based on Token Dynamic Decay

Published: 24 July 2024

Abstract

Existing pre-trained models have made significant strides in reducing computation time, and many of them speed up inference by removing less influential words from the text. However, these models tend to overlook the generalization effect that simple samples exert on high-level model parameters. Moreover, directly deleting words can cause a notable number of mis-deletions, damaging the original semantic information of the text. To address these issues, this paper introduces a model combining Sample Influence Adaptation Training with Token Dynamic Attenuation Inference. Specifically, during the training phase, we dynamically adjust text masking according to the semantic-analysis difficulty of the text, thereby curbing the generalization effect of simple samples on the higher layers of the model. In the inference phase, a double-layer superimposed influence-score method dynamically decays and removes unimportant tokens: information from several textual dimensions is extracted to compute an influence score for each token, and tokens with low influence are ultimately eliminated. Extensive experiments on text-classification datasets demonstrate that our approach achieves inference speeds comparable to other models while maintaining higher accuracy across multiple datasets.
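
The abstract describes the influence-scoring and token-pruning idea only at a high level. As a rough illustrative sketch (not the authors' implementation), the following PyTorch snippet shows the general pattern of dropping low-influence tokens between encoder layers: attention received over two consecutive layers is superimposed as a stand-in influence score, since the paper's exact double-layer scoring and decay schedule are not given here. The function names, the attention-based scoring proxy, and the keep ratio are all assumptions for illustration.

    # Illustrative sketch only: prune low-influence tokens between encoder layers.
    # The per-token "influence score" here is approximated by the attention each
    # token receives, summed over two consecutive layers; the paper's actual
    # double-layer superimposed score and decay schedule may differ.
    import torch

    def influence_scores(attn_a: torch.Tensor, attn_b: torch.Tensor) -> torch.Tensor:
        """attn_a, attn_b: (batch, heads, seq, seq) attention maps from two layers.
        Returns a (batch, seq) influence score per token."""
        recv_a = attn_a.mean(dim=1).sum(dim=1)   # attention received per token in layer A
        recv_b = attn_b.mean(dim=1).sum(dim=1)   # attention received per token in layer B
        return recv_a + recv_b                   # superimpose the two layers' scores

    def prune_tokens(hidden: torch.Tensor, scores: torch.Tensor, keep_ratio: float):
        """Keep only the highest-scoring tokens (the [CLS] token is assumed at index 0)."""
        batch, seq, dim = hidden.shape
        k = max(1, int(seq * keep_ratio))
        scores = scores.clone()
        scores[:, 0] = float("inf")              # never drop the classification token
        keep_idx = scores.topk(k, dim=1).indices.sort(dim=1).values  # preserve token order
        kept = hidden.gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, dim))
        return kept, keep_idx

    # Toy usage with random tensors standing in for real encoder outputs.
    if __name__ == "__main__":
        hidden = torch.randn(2, 16, 64)                        # (batch, seq, dim)
        attn_a = torch.softmax(torch.randn(2, 4, 16, 16), -1)  # layer A attention
        attn_b = torch.softmax(torch.randn(2, 4, 16, 16), -1)  # layer B attention
        pruned, kept = prune_tokens(hidden, influence_scores(attn_a, attn_b), keep_ratio=0.5)
        print(pruned.shape, kept.shape)                        # (2, 8, 64), (2, 8)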

    Published In

    CSAIDE '24: Proceedings of the 2024 3rd International Conference on Cyber Security, Artificial Intelligence and Digital Economy
    March 2024
    676 pages
    ISBN:9798400718212
    DOI:10.1145/3672919
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CSAIDE 2024
