DOI: 10.1145/3511808.3557582
Open access

Discriminative Language Model via Self-Teaching for Dense Retrieval

Published: 17 October 2022

Abstract

Dense retrieval (DR) has shown promising results in many information retrieval (IR) tasks, and its foundation is high-quality text representations for effective search. Using pre-trained language models (PLMs) as text encoders has become a popular choice in DR. However, the representations learned by these PLMs often lose discriminative power and thus hurt recall, in particular because PLMs attend to too much of the input text. In this work, we therefore propose to pre-train a discriminative language representation model, called DiscBERT, for DR. The key idea is that a good text representation should automatically retain the discriminative features that distinguish different texts from each other in the semantic space. Specifically, inspired by knowledge distillation, we employ a simple yet effective training method, called self-teaching, which distills the knowledge the model builds when trained on the sampled representative tokens of a text sequence into its knowledge of the entire text sequence. After further fine-tuning on publicly available retrieval benchmark datasets, DiscBERT can outperform state-of-the-art retrieval methods.
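
The self-teaching objective described in the abstract lends itself to a short illustration. The sketch below is a minimal, hypothetical rendering of the idea rather than the authors' implementation: a toy Transformer encoder stands in for a BERT-style PLM, a random token subset stands in for the paper's sampled representative tokens, and a KL-divergence term distills the in-batch similarity structure of the sampled-token view into the full-sequence view. All names (ToyEncoder, sample_representative) and the choice of in-batch similarity targets are assumptions made for this example.

```python
# Minimal sketch of the self-teaching idea (hypothetical, not the authors' code):
# the same encoder produces a "teacher" view from sampled representative tokens
# and a "student" view from the full sequence, and a KL term aligns the two.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

class ToyEncoder(torch.nn.Module):
    """Stand-in for a BERT-style text encoder; pools the first token's state."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab_size, dim)
        layer = torch.nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = torch.nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_ids):
        hidden = self.encoder(self.emb(token_ids))
        return hidden[:, 0]  # [CLS]-like pooled representation

def sample_representative(token_ids, keep_ratio=0.5):
    """Assumed sampler: keep the first token plus a random half of the rest."""
    batch, length = token_ids.shape
    keep = max(1, int((length - 1) * keep_ratio))
    rest = torch.stack([torch.randperm(length - 1)[:keep] + 1 for _ in range(batch)])
    idx = torch.cat([torch.zeros(batch, 1, dtype=torch.long),
                     rest.sort(dim=1).values], dim=1)
    return torch.gather(token_ids, 1, idx)

encoder = ToyEncoder()
docs = torch.randint(1, 1000, (8, 32))        # toy batch: 8 texts, 32 token ids each

student = encoder(docs)                       # full-sequence representations
with torch.no_grad():                         # teacher view is not back-propagated
    teacher = encoder(sample_representative(docs))

# Distill the in-batch similarity distribution of the sampled-token view
# into the distribution produced by the full-sequence view.
loss = F.kl_div(F.log_softmax(student @ student.t(), dim=-1),
                F.softmax(teacher @ teacher.t(), dim=-1),
                reduction="batchmean")
loss.backward()
print(f"self-teaching loss: {loss.item():.4f}")
```

In the actual method, the representative-token selection and the exact distillation target follow the paper; the point of the sketch is only how a single encoder can act as both teacher (on sampled tokens) and student (on the full sequence) before retrieval fine-tuning.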

      Published In

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
October 2022, 5274 pages
ISBN: 9781450392365
DOI: 10.1145/3511808
General Chairs: Mohammad Al Hasan, Li Xiong

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. dense document retrieval
      2. discriminative representation
      3. self-teaching

      Qualifiers

      • Short-paper

      Funding Sources

      • the Youth Innovation Promotion Association CAS
      • the Lenovo-CAS Joint Lab Youth Scientist Project
      • the National Natural Science Foundation of China (NSFC)
      • the Young Elite Scientist Sponsorship Program by CAST

      Conference

      CIKM '22

      Acceptance Rates

CIKM '22 paper acceptance rate: 621 of 2,257 submissions (28%)
Overall acceptance rate: 1,861 of 8,427 submissions (22%)
