Duplicate question detection in community-based platforms via interaction networks

Gao, Wang; Yang, Baoping; Xiao, Yue; Zeng, Peng; Hu, Xi; Zhu, Xun

doi:10.1007/s11042-023-15974-x

Published: 24 June 2023

Volume 83, pages 10881–10898, (2024)

Multimedia Tools and Applications

Wang Gao (ORCID: orcid.org/0000-0001-9671-489X)^1,2^na1, Baoping Yang^3^na1, Yue Xiao^1,2^na1, Peng Zeng^1,2, Xi Hu^1,2 & Xun Zhu^1,2

Abstract

Community-based Question and Answering (CQA) platforms have a huge number of users, resulting in numerous duplicate questions with similar intent from different users. Effectively detecting duplicate questions can improve the findability of platforms, and enhance the user experience of viewers and writers. Existing state-of-the-art methods focus on designing the structure of multi-layer interaction networks, ignoring the problems of error propagation and loss of low-level semantics. In this paper, we propose a novel Interaction-based Siamese Network (ISN) to address these issues, which utilizes a siamese structure to learn the original semantics of questions and captures interaction information with question interactive units. During the interaction, each interactive unit takes the original semantic representation of another question as an input, thus effectively mitigating the effect of error propagation. Furthermore, we propose an aggregation strategy to propagate low-level interaction features to high-level to preserve low-level semantic information, and introduce self-attention to enhance the model’s global interaction information learning ability. Experimental results on a real-world CQA dataset show that ISN outperforms state-of-the-art models for duplicate question detection.
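
The interaction scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (encode, interactive_unit, isn_features) are hypothetical, the embedding-lookup encoder, averaging fusion, and mean pooling are simplifying assumptions, and the self-attention component is omitted. The sketch only shows the two structural ideas: each interactive unit reads the other question's original encoding rather than its updated state, and features from every level are aggregated into the output.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(queries, keys):
    """Cross-attention: each query vector becomes a weighted mix of the keys."""
    out = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        out.append([sum(w * k[d] for w, k in zip(weights, keys))
                    for d in range(len(q))])
    return out

def interactive_unit(state, other_original):
    """Fuse the current state with attention over the OTHER question's
    original encoding (assumed fusion: element-wise average)."""
    attended = attend(state, other_original)
    return [[(s + a) / 2 for s, a in zip(sv, av)]
            for sv, av in zip(state, attended)]

def mean_pool(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def encode(tokens, embeddings):
    # Siamese part: both questions share the same (assumed) lookup encoder.
    return [embeddings[t] for t in tokens]

def isn_features(q1, q2, embeddings, num_layers=2):
    orig1, orig2 = encode(q1, embeddings), encode(q2, embeddings)
    s1, s2 = orig1, orig2
    levels = []
    for _ in range(num_layers):
        # Each unit sees the other question's original encoding, so an error
        # in one side's updated state is not passed on to the other side.
        s1, s2 = interactive_unit(s1, orig2), interactive_unit(s2, orig1)
        levels.append(mean_pool(s1) + mean_pool(s2))
    # Aggregation: concatenate all levels so low-level features are preserved.
    return [x for level in levels for x in level]
```

Under these assumptions, calling isn_features(["how", "install"], ["install"], embeddings) with a dictionary of 2-dimensional word vectors yields one feature vector of length num_layers × 4, which a downstream classifier could score as duplicate or not.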

Availability of data and material

Data will be made available on reasonable request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC, No. 62276196), the Key Research and Development Program of Hubei Province (No. 2022BAD064), the Industry-University-Research Project of Wuhan Education Bureau (No. CXY202208), and the Special Research Fund for Discipline Characteristics of Jianghan University (No. 2022XKZK10).

Author information

Wang Gao, Baoping Yang, and Yue Xiao contributed equally to this work.

Authors and Affiliations

School of Artificial Intelligence, Jianghan University, Wuhan, 430056, China
Wang Gao, Yue Xiao, Peng Zeng, Xi Hu & Xun Zhu
Engineering Research Center for Intelligent Decision and Information Processing, Jianghan University, Wuhan, 430056, China
Wang Gao, Yue Xiao, Peng Zeng, Xi Hu & Xun Zhu
Physics and Telecommunications College of Engineering, Huanggang Normal University, Huanggang, 438000, China
Baoping Yang

Corresponding author

Correspondence to Wang Gao.

Ethics declarations

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gao, W., Yang, B., Xiao, Y. et al. Duplicate question detection in community-based platforms via interaction networks. Multimed Tools Appl 83, 10881–10898 (2024). https://doi.org/10.1007/s11042-023-15974-x

Received: 29 September 2022
Revised: 17 April 2023
Accepted: 29 May 2023
Published: 24 June 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s11042-023-15974-x
