Duplicate question detection in community-based platforms via interaction networks

Multimedia Tools and Applications

Abstract

Community-based Question Answering (CQA) platforms have very large user bases, so different users frequently post duplicate questions with the same intent. Detecting these duplicates effectively makes content on the platform easier to find and improves the experience of both viewers and writers. Existing state-of-the-art methods focus on the design of multi-layer interaction networks but overlook two problems: error propagation and the loss of low-level semantics. In this paper, we propose a novel Interaction-based Siamese Network (ISN) to address these issues. ISN uses a siamese structure to learn the original semantics of each question and captures interaction information with question interactive units. During the interaction, each interactive unit takes the original semantic representation of the other question as an input, which mitigates the effect of error propagation. Furthermore, we propose an aggregation strategy that propagates low-level interaction features to higher levels to preserve low-level semantic information, and we introduce self-attention to strengthen the model’s ability to learn global interaction information. Experimental results on a real-world CQA dataset show that ISN outperforms state-of-the-art models for duplicate question detection.
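
The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no equations, so the encoder, the form of the interactive unit, the mean-over-layers aggregation, the cosine scoring, and all function names (`encode`, `interact`, `duplicate_score`) are assumptions chosen only to show the idea that every interaction layer reads the original encoding of the other question.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys_values):
    """Scaled dot-product attention: each query row becomes a weighted
    mix of the rows of keys_values."""
    scores = queries @ keys_values.T / np.sqrt(queries.shape[-1])
    return softmax(scores) @ keys_values


def encode(tokens, w_enc):
    """Siamese encoder (assumed form): the same weights encode both questions."""
    return np.tanh(tokens @ w_enc)


def interact(own_orig, other_orig, num_layers=3):
    """Hypothetical stack of interactive units for one question."""
    state = own_orig
    layer_outputs = []
    for _ in range(num_layers):
        # Each unit attends over the ORIGINAL encoding of the other question,
        # not over that question's previous layer output, so an error made in
        # one branch is not fed into the other branch layer after layer.
        cross = attend(state, other_orig)
        state = np.tanh(state + cross)
        layer_outputs.append(state)
    # Assumed aggregation: average all layers so low-level interaction
    # features still reach the final representation.
    aggregated = np.mean(layer_outputs, axis=0)
    # Self-attention over the aggregated features for global context.
    return attend(aggregated, aggregated)


def duplicate_score(q1_tokens, q2_tokens, w_enc):
    """Cosine similarity of the two pooled question vectors (assumed scoring)."""
    h1, h2 = encode(q1_tokens, w_enc), encode(q2_tokens, w_enc)
    v1 = interact(h1, h2).mean(axis=0)
    v2 = interact(h2, h1).mean(axis=0)
    denom = np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9
    return float(v1 @ v2 / denom)


# Toy usage with random "token embeddings" (5 and 7 tokens, dimension 8).
rng = np.random.default_rng(0)
w_enc = rng.normal(size=(8, 8))
q1 = rng.normal(size=(5, 8))
q2 = rng.normal(size=(7, 8))
print(duplicate_score(q1, q2, w_enc))
```

In a trained model the weights would be learned and the score passed to a classifier; here the random inputs only demonstrate shapes and the direction of information flow.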

Availability of data and material

Data will be made available on reasonable request.

Notes

  1. https://stackoverflow.com

  2. https://github.com/gaowjhun/12huazhong

  3. https://github.com/google-research/bert

  4. https://github.com/stanfordnlp/GloVe

Acknowledgements

This work is supported by the National Natural Science Foundation of China (NSFC, No. 62276196), the Key Research and Development Program of Hubei Province (No. 2022BAD064), the Industry-University-Research Project of Wuhan Education Bureau (No. CXY202208), and the Special Research Fund for Discipline Characteristics of Jianghan University (No. 2022XKZK10).

Author information

Corresponding author

Correspondence to Wang Gao.

Ethics declarations

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gao, W., Yang, B., Xiao, Y. et al. Duplicate question detection in community-based platforms via interaction networks. Multimed Tools Appl 83, 10881–10898 (2024). https://doi.org/10.1007/s11042-023-15974-x

  • DOI: https://doi.org/10.1007/s11042-023-15974-x
