Category-Highlighting Transformer Network for Question Retrieval

Ma, Denghao; Chong, Li; Chen, Yueguo; Shen, Liang

doi:10.1007/978-3-031-30675-4_33


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13945)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications


Abstract

Question retrieval aims to find, for a user question, the semantically equivalent questions in question archives. Recently, Transformer-based models have significantly advanced question retrieval; these models mainly focus on capturing the content-based semantic relations between two questions. However, they cannot adequately capture the category-based semantic relations between two questions, even though question categories are very important for identifying whether two questions are semantically equivalent. To capture both content-based and category-based semantic relations, we study how to improve the Transformer by highlighting and incorporating category information. To this end, we propose the Category-Highlighting Transformer Network (CHT). Because questions are not equipped with explicit categories, CHT first uses a category identification unit to construct category-based semantic representations for the question and its constituent words. Second, to “deeply” capture the category-based and content-based semantic relations, we develop the category-highlighting Transformer by enhancing the self-attention unit with the category-based representations. Cascaded category-highlighting Transformers are used to model the “individual” semantics of a question and the “joint” semantics of two questions. Extensive experiments on three public datasets show that the Category-Highlighting Transformer Network outperforms state-of-the-art solutions.
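The abstract does not specify how the category-based representations enter self-attention. Purely as an illustration, the sketch below assumes (hypothetically) that each word is softly assigned to K latent category prototypes, that the probability-weighted mixture of prototypes serves as the word's category vector, and that these vectors add a category-agreement term to ordinary scaled dot-product attention scores. The function names, the prototype matrix, the weight `lam`, and the identity Q/K/V projections are all assumptions chosen for brevity; this is not the CHT implementation described in the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def category_representations(words: Matrix, categories: Matrix) -> Matrix:
    """Hypothetical category identification unit: softly assign each word
    to K latent category prototypes and return the probability-weighted
    mixture of prototypes as that word's category representation."""
    d = len(categories[0])
    reps = []
    for x in words:
        probs = softmax([dot(x, c) / math.sqrt(d) for c in categories])
        reps.append([sum(p * c[j] for p, c in zip(probs, categories)) for j in range(d)])
    return reps


def question_category(word_cats: Matrix) -> Vector:
    # Hypothetical question-level category representation: mean over its words.
    n = len(word_cats)
    return [sum(col) / n for col in zip(*word_cats)]


def category_highlighting_attention(
    words: Matrix, word_cats: Matrix, lam: float = 1.0
) -> Tuple[Matrix, Matrix]:
    """Single-head self-attention whose score for (i, j) is
    (<x_i, x_j> + lam * <c_i, c_j>) / sqrt(d).
    Identity Q/K/V projections are an assumption made for brevity."""
    d = len(words[0])
    scale = math.sqrt(d)
    n = len(words)
    outputs, weights = [], []
    for i, xi in enumerate(words):
        scores = [
            (dot(xi, xj) + lam * dot(word_cats[i], word_cats[j])) / scale
            for j, xj in enumerate(words)
        ]
        a = softmax(scores)
        weights.append(a)
        # Output for word i: attention-weighted sum of all word vectors.
        outputs.append([sum(a[j] * words[j][k] for j in range(n)) for k in range(d)])
    return outputs, weights
```

Setting `lam=0.0` reduces this to plain self-attention without learned projections, so in this sketch the category term acts as an additive bias that raises attention between words assigned to similar latent categories.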

D. Ma and L. Chong contributed equally.


Notes

1.
https://data.quora.com/First-Quora-Dataset-ReleaseQuestion-Pairs.

Acknowledgements

This work is supported by the National Key Research and Development Program (No. 2020YFB1710004) and by the National Science Foundation of China under grant 62272466.

Author information

Authors and Affiliations

Meituan, Beijing, China
Denghao Ma & Liang Shen
DEKE Lab, Renmin University of China, Beijing, China
Li Chong & Yueguo Chen

Corresponding author

Correspondence to Yueguo Chen.

Editor information

Editors and Affiliations

Tianjin University, Tianjin, China
Xin Wang
University of Torino, Turin, Italy
Maria Luisa Sapino
POSTECH, Pohang, Korea (Republic of)
Wook-Shin Han
University of California Santa Barbara, Santa Barbara, CA, USA
Amr El Abbadi
University of Auckland, Auckland, New Zealand
Gill Dobbie
Tianjin University, Tianjin, China
Zhiyong Feng
Beijing University of Posts and Telecommunications, Beijing, China
Yingxiao Shao
The University of Queensland, Brisbane, QLD, Australia
Hongzhi Yin

About this paper

Cite this paper

Ma, D., Chong, L., Chen, Y., Shen, L. (2023). Category-Highlighting Transformer Network for Question Retrieval. In: Wang, X., et al. Database Systems for Advanced Applications. DASFAA 2023. Lecture Notes in Computer Science, vol 13945. Springer, Cham. https://doi.org/10.1007/978-3-031-30675-4_33

DOI: https://doi.org/10.1007/978-3-031-30675-4_33
Published: 15 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30674-7
Online ISBN: 978-3-031-30675-4
eBook Packages: Computer Science, Computer Science (R0)