
Efficient question classification and retrieval using category information and word embedding on cQA services

Journal of Intelligent Information Systems

Abstract

Question classification, the task of automatically assigning unlabeled questions to predefined categories (or topics), and the effective retrieval of similar questions are both crucial to an effective community question answering (cQA) service. For classification, we first address the problems associated with estimating the distribution of words in each category and using it to compute word weights. We then apply an automatic expansion-word generation technique, based on our proposed weighting method and pseudo relevance feedback, to question classification. Second, to address the lexical gap problem in question retrieval, we first define the case frame of a sentence from its extracted components and then derive a similarity measure, based on the case frame and word embedding, that determines the similarity between two sentences. These similarities are used to reorder the results of the first retrieval model. The proposed methods significantly improve the performance of question classification and retrieval.
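The reordering step can be illustrated with a minimal sketch. It is not the authors' implementation: the case frame is omitted entirely, and a plain average of word vectors stands in for the paper's similarity measure. The names `sentence_vector`, `rerank`, the interpolation weight `alpha`, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is empty or zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def sentence_vector(tokens, embeddings):
    """Average the vectors of known tokens (a simplification, not the case-frame measure)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return []
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def rerank(query_tokens, candidates, embeddings, alpha=0.5):
    """Reorder first-stage results by mixing their score with embedding similarity.

    candidates: list of (tokens, first_stage_score) pairs.
    alpha: hypothetical interpolation weight between the two scores.
    """
    q = sentence_vector(query_tokens, embeddings)
    scored = []
    for tokens, base_score in candidates:
        sim = cosine(q, sentence_vector(tokens, embeddings))
        scored.append((alpha * base_score + (1 - alpha) * sim, tokens))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tokens for _, tokens in scored]

# Toy embeddings (invented values): near-synonyms point in similar directions.
EMB = {
    "laptop": [1.0, 0.0], "notebook": [0.9, 0.1],
    "fix": [0.0, 1.0], "repair": [0.1, 0.9],
    "pasta": [-1.0, 0.0], "cook": [0.0, -1.0],
}

# The first-stage model ranks the unrelated question higher (0.6 vs 0.5).
first_stage = [(["cook", "pasta"], 0.6), (["repair", "notebook"], 0.5)]
ranked = rerank(["fix", "laptop"], first_stage, EMB)
```

In this toy setup the query shares no words with either candidate, so a purely lexical model cannot prefer the paraphrase; the embedding term is what moves "repair notebook" ahead of "cook pasta", which is the lexical gap effect the abstract describes.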

Notes

  1. URL https://kin.naver.com/index.nhn

  2. URL https://answers.yahoo.com/

  3. URL http://answers.google.com/answers/

  4. URL https://zhidao.baidu.com/

  5. It is similar to the spacing words

  6. URL http://kin.naver.com/index.nhn

  7. URL http://webscope.sandbox.yahoo.com/#datasets

Acknowledgments

This work was supported by an Institute for Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2013-2-00131, Development of Knowledge Evolutionary WiseQA Platform Technology for Human Knowledge Augmented Services).

Author information

Corresponding author

Correspondence to Youngjoong Ko.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bae, K., Ko, Y. Efficient question classification and retrieval using category information and word embedding on cQA services. J Intell Inf Syst 53, 27–49 (2019). https://doi.org/10.1007/s10844-019-00556-x

  • DOI: https://doi.org/10.1007/s10844-019-00556-x
