DOI: 10.1145/3209978.3210110
short-paper

Related or Duplicate: Distinguishing Similar CQA Questions via Convolutional Neural Networks

Published: 27 June 2018

Abstract

Many studies target automatic duplicate detection in Community Question Answering (CQA) systems and frame the task as a supervised learning problem over question pairs. However, these methods rely on handcrafted features, which makes it difficult to distinguish related questions from duplicate ones because the two are often textually similar. To tackle this issue, we propose leveraging a neural network architecture to extract "deep" features that identify whether a question pair is duplicate or related. In particular, we construct question correlation matrices, which capture the word-wise similarities between questions. The constructed matrices are the input to our proposed convolutional neural network (CNN), in which the convolution operation moves along both dimensions of the matrices. Empirical studies on a range of real-world CQA datasets confirm the effectiveness of the proposed correlation matrices and the CNN. Our method outperforms state-of-the-art methods in classification performance.
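
The two ideas in the abstract, a word-wise correlation matrix and a convolution that slides over both of its dimensions, can be illustrated with a minimal sketch. This is not the paper's implementation: the cosine similarity measure, the random toy embeddings, the example questions, the 3x3 averaging kernel, and the helper names `cosine`, `correlation_matrix`, and `conv2d_valid` are all assumptions made for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def correlation_matrix(words_a, words_b, embed):
    """Entry (i, j) is the similarity of word i in A to word j in B."""
    return [[cosine(embed[wa], embed[wb]) for wb in words_b] for wa in words_a]


def conv2d_valid(matrix, kernel):
    """Slide the kernel over both dimensions (stride 1, no padding).

    As in most deep learning libraries, the kernel is not flipped,
    so this is strictly a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(matrix) - kh + 1
    out_w = len(matrix[0]) - kw + 1
    return [
        [
            sum(
                matrix[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical question pair; real inputs would be CQA question texts.
q1 = "how to sort a list in python".split()
q2 = "how to order a python array".split()

# Toy embeddings: random 8-dimensional vectors stand in for trained ones.
rng = random.Random(0)
embed = {w: [rng.gauss(0, 1) for _ in range(8)] for w in sorted(set(q1 + q2))}

corr = correlation_matrix(q1, q2, embed)      # 7 rows x 6 columns
kernel = [[1 / 9] * 3 for _ in range(3)]      # fixed averaging filter
feature_map = conv2d_valid(corr, kernel)      # 5 rows x 4 columns
```

In a trained model the kernel weights would be learned and followed by pooling and a classifier; the fixed averaging kernel here only shows how one filter summarizes a local patch of word-to-word similarities.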


        Published In

        SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
        June 2018
        1509 pages
        ISBN:9781450356572
        DOI:10.1145/3209978

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. convolutional neural networks
        2. question answering
        3. search quality

        Qualifiers

        • Short-paper

        Conference

        SIGIR '18
        Acceptance Rates

        SIGIR '18 Paper Acceptance Rate 86 of 409 submissions, 21%;
        Overall Acceptance Rate 792 of 3,983 submissions, 20%

        Cited By

        • (2025) Detecting question relatedness in programming Q&A communities via bimodal feature fusion. Automated Software Engineering 32:1. DOI: 10.1007/s10515-024-00482-5. Online publication date: 4-Jan-2025
        • (2022) Exploring Topic Supervision with BERT for Text Matching. 2022 International Joint Conference on Neural Networks (IJCNN), 1-7. DOI: 10.1109/IJCNN55064.2022.9892023. Online publication date: 18-Jul-2022
        • (2022) TAQS: An Arabic Question Similarity System Using Transfer Learning of BERT With BiLSTM. IEEE Access 10, 91509-91523. DOI: 10.1109/ACCESS.2022.3198955. Online publication date: 2022
        • (2022) Analyzing Techniques for Duplicate Question Detection on Q&A Websites for Game Developers. Empirical Software Engineering 28:1. DOI: 10.1007/s10664-022-10256-w. Online publication date: 8-Dec-2022
        • (2020) iLinker: a novel approach for issue knowledge acquisition in GitHub projects. World Wide Web. DOI: 10.1007/s11280-019-00770-1. Online publication date: 27-Jan-2020
        • (2019) Research on the Quality Prediction of Online Chinese Question Answering Community Answers Based on Comments. Proceedings of the 2nd International Conference on Big Data Technologies, 114-120. DOI: 10.1145/3358528.3358592. Online publication date: 28-Aug-2019
        • (2019) Multi-Level Matching Networks for Text Matching. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 949-952. DOI: 10.1145/3331184.3331276. Online publication date: 18-Jul-2019
