Efficient question classification and retrieval using category information and word embedding on cQA services

Bae, Kyoungman; Ko, Youngjoong

doi:10.1007/s10844-019-00556-x

Published: 11 April 2019

Volume 53, pages 27–49, (2019)

Journal of Intelligent Information Systems

Kyoungman Bae¹ &
Youngjoong Ko²

458 Accesses
6 Citations

Abstract

Question classification, the task of automatically assigning unlabeled questions to predefined categories (or topics), and the effective retrieval of similar questions are crucial aspects of an effective community question answering (cQA) service. First, for question classification, we address the problems associated with estimating the distribution of words in each category and using it for word weighting. We then apply to question classification an automatic expansion word generation technique that is based on our proposed weighting method and pseudo relevance feedback. Second, to address the lexical gap problem in question retrieval, the case frame of a sentence is defined using the components extracted from that sentence, and a similarity measure based on the case frame and word embedding is derived to determine the similarity between two sentences. These similarities are then used to reorder the results of the first retrieval model. The proposed methods significantly improve the performance of question classification and retrieval.
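The reordering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the case frame is omitted, a sentence is represented simply as the average of its word vectors, and the vectors in `TOY_VECTORS` are made-up three-dimensional values rather than trained embeddings. The names `sentence_vector`, `cosine`, and `rerank` are hypothetical.

```python
import math

# Hypothetical toy word vectors; real systems would load trained embeddings.
TOY_VECTORS = {
    "laptop": [0.9, 0.1, 0.0],
    "notebook": [0.8, 0.2, 0.0],
    "battery": [0.1, 0.9, 0.0],
    "charge": [0.2, 0.8, 0.1],
    "recipe": [0.0, 0.1, 0.9],
    "pasta": [0.0, 0.0, 1.0],
}


def sentence_vector(text):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(query, candidates):
    """Reorder first-stage candidates by embedding similarity to the query."""
    query_vector = sentence_vector(query)
    return sorted(
        candidates,
        key=lambda c: cosine(query_vector, sentence_vector(c)),
        reverse=True,
    )


# Assumed output of a first retrieval model, in its original order.
first_stage = ["pasta recipe", "notebook charge", "laptop battery"]
reranked = rerank("laptop battery", first_stage)
```

In this toy setting, "notebook charge" shares no words with the query "laptop battery" yet is ranked above "pasta recipe", which is the kind of lexical gap the abstract refers to.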

Notes

URL https://kin.naver.com/index.nhn
URL https://answers.yahoo.com/
URL http://answers.google.com/answers/
URL https://zhidao.baidu.com/
It is similar to the spacing words
URL http://kin.naver.com/index.nhn
URL http://webscope.sandbox.yahoo.com/#datasets

Acknowledgments

This work was supported by an Institute for Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2013-2-00131, Development of Knowledge Evolutionary WiseQA Platform Technology for Human Knowledge Augmented Services).

Author information

Authors and Affiliations

Language Intelligence Research Group, Electronics and Telecommunications Research Institute, 218 Gajeong-ro, Yuseong-gu, Daejeon, 305-700, Republic of Korea
Kyoungman Bae
Department of Computer Engineering, Dong-A University 840, Hadan 2-dong, Saha-gu, Busan, 604-714, Republic of Korea
Youngjoong Ko

Corresponding author

Correspondence to Youngjoong Ko.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bae, K., Ko, Y. Efficient question classification and retrieval using category information and word embedding on cQA services. J Intell Inf Syst 53, 27–49 (2019). https://doi.org/10.1007/s10844-019-00556-x

Received: 04 November 2015
Revised: 18 March 2019
Accepted: 19 March 2019
Published: 11 April 2019
Issue Date: 15 August 2019
DOI: https://doi.org/10.1007/s10844-019-00556-x
