
OdeBERT: One-stage Deep-supervised Early-exiting BERT for Fast Inference in User Intent Classification

Published: 09 May 2023

Abstract

User intent classification is a vital task for identifying users' essential requirements from their input queries in information retrieval systems, question answering systems, and dialogue systems. The pre-trained language model Bidirectional Encoder Representations from Transformers (BERT) has been widely applied to user intent classification. However, BERT is compute-intensive and time-consuming during inference, which often causes latency in real-time applications. To improve the inference efficiency of BERT for user intent classification, this article proposes a new network, the one-stage deep-supervised early-exiting BERT (OdeBERT). A deep supervision strategy equips the network with internal classifiers that are trained jointly with the backbone in a single stage, improving the classifiers' learning by extracting discriminative category features. Experiments are conducted on publicly available datasets, including ECDT, SNIPS, and FDQuestion. The results show that OdeBERT speeds up the original BERT by up to 12 times while preserving its performance, outperforming state-of-the-art baseline methods.
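The early-exit mechanism described in the abstract can be made concrete with a short sketch. The snippet below is illustrative only: it assumes a generic Transformer backbone, a [CLS]-style sentence representation, and an entropy-based exit criterion; the class name, threshold value, and layer interface are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn


class EarlyExitEncoder(nn.Module):
    """Multi-exit Transformer encoder sketch: an internal classifier is
    attached after every layer, so easy inputs can exit before the top."""

    def __init__(self, layers, hidden_size, num_classes, entropy_threshold=0.4):
        super().__init__()
        self.layers = nn.ModuleList(layers)          # the backbone's Transformer layers
        self.exits = nn.ModuleList(                  # one internal classifier per layer
            [nn.Linear(hidden_size, num_classes) for _ in layers]
        )
        self.entropy_threshold = entropy_threshold   # lower value = higher confidence required

    def forward(self, hidden_states):
        """Training: return logits from every exit so a single joint loss
        (e.g., the sum of per-exit cross-entropy) supervises all classifiers at once.
        Inference: assume batch size 1 and stop at the first confident exit."""
        all_logits = []
        for layer, head in zip(self.layers, self.exits):
            hidden_states = layer(hidden_states)
            logits = head(hidden_states[:, 0])       # classify on the [CLS] position
            all_logits.append(logits)
            if not self.training:
                probs = torch.softmax(logits, dim=-1)
                entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
                if entropy.item() < self.entropy_threshold:
                    return logits                    # confident enough: exit early
        return all_logits if self.training else all_logits[-1]
```

In a setup of this kind, training optimizes a single joint objective over `all_logits`, which corresponds to one-stage deep supervision of every internal classifier, while at inference simple queries exit at a shallow layer and harder ones propagate further up the stack, which is where the reported speed-up would come from.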





    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 22, Issue 5
    May 2023
    653 pages
    ISSN:2375-4699
    EISSN:2375-4702
    DOI:10.1145/3596451

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 May 2023
    Online AM: 13 March 2023
    Accepted: 07 March 2023
    Revised: 18 January 2023
    Received: 10 May 2022
    Published in TALLIP Volume 22, Issue 5


    Author Tags

    1. OdeBERT
    2. user intent classification
    3. inference
    4. BERT
    5. deep supervision

    Qualifiers

    • Research-article

    Funding Sources

    • Research Grants Council of the Hong Kong Special Administrative Region, China


    Cited By

    • (2024) Early-Exit Deep Neural Network: A Comprehensive Survey. ACM Computing Surveys 57(3), 1–37. DOI: 10.1145/3698767. Online publication date: 22-Nov-2024.
    • (2024) Self-adaptive Education Resource Allocation Using BERT Model. Proceedings of the 2024 International Conference on Machine Intelligence and Digital Applications, 517–522. DOI: 10.1145/3662739.3672178. Online publication date: 30-May-2024.
    • (2024) Improving Open Intent Detection via Triplet-Contrastive Learning and Adaptive Boundary. IEEE Transactions on Consumer Electronics 70(1), 2806–2816. DOI: 10.1109/TCE.2024.3363896. Online publication date: 19-Feb-2024.
    • (2024) Risk Early Warning of a Dynamic Ideological and Political Education System Based on LSTM-MLP: Online Education Data Processing and Optimization. Mobile Networks and Applications 29(2). DOI: 10.1007/s11036-024-02439-0. Online publication date: 1-Apr-2024.
    • (2024) The CHIP 2023 Shared Task 6: Chinese Diabetes Question Classification. Health Information Processing: Evaluation Track Papers, 197–204. DOI: 10.1007/978-981-97-1717-0_18. Online publication date: 20-Mar-2024.
    • (2023) Shared Task 1 on NCAA 2023: Chinese Diabetes Question Classification. International Conference on Neural Computing for Advanced Applications, 591–596. DOI: 10.1007/978-981-99-5847-4_42. Online publication date: 30-Aug-2023.
    • (2023) A Triplet-Contrastive Representation Learning Strategy for Open Intent Detection. International Conference on Neural Computing for Advanced Applications, 229–244. DOI: 10.1007/978-981-99-5847-4_17. Online publication date: 30-Aug-2023.
