When, Where, Who, What or Why? A Hybrid Model to Question Answering Systems

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2018)

Abstract

Question Answering is a field of Information Retrieval and Natural Language Processing concerned with systems that automatically answer questions posed by humans in natural language. One of the main steps of these systems is Question Classification, where the system tries to identify the type of question (e.g., whether it relates to a person, a time, or a location) to facilitate the generation of a precise answer. Machine learning techniques are commonly employed in this task, where the text is represented as a vector of features such as bag-of-words, Term Frequency-Inverse Document Frequency (TF-IDF), or word embeddings. However, the quality of the results produced by supervised algorithms depends on the existence of a large, domain-dependent training dataset, which is sometimes unavailable because manual annotation of datasets is labor-intensive. Normally, word embeddings perform relatively better on small training sets, while bag-of-words and TF-IDF perform better on large training sets. In this work, we propose a hybrid model that combines TF-IDF and word embeddings in order to provide the answer type of text questions using both small and large training sets. Our experiments on the Portuguese language, using several different training set sizes, showed that the proposed hybrid model statistically outperforms the bag-of-words, TF-IDF, and word embedding approaches.
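
To make the idea of combining the two representations concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how TF-IDF and word embeddings are combined, so the sketch assumes simple concatenation of the two L2-normalized vectors, followed by a 1-nearest-neighbour classifier. The training questions, the labels, the 3-dimensional vectors in `EMB`, and all function names are hypothetical and exist only for illustration; a real system would load pretrained Portuguese embeddings.

```python
import math
from collections import Counter

# Hypothetical toy training set: (question, answer type).
TRAIN = [
    ("quem descobriu o brasil", "PERSON"),
    ("quem escreveu dom casmurro", "PERSON"),
    ("onde fica lisboa", "LOCATION"),
    ("onde nasceu pessoa", "LOCATION"),
    ("quando terminou a guerra", "TIME"),
    ("quando nasceu pessoa", "TIME"),
]

# Hypothetical hand-made 3-d "embeddings" (assumption, not pretrained vectors).
# "aonde" never occurs in TRAIN but lies close to "onde" in this toy space.
EMB = {
    "quem": [1.0, 0.0, 0.0],
    "onde": [0.0, 1.0, 0.0],
    "quando": [0.0, 0.0, 1.0],
    "aonde": [0.1, 0.9, 0.0],
}
DIM = 3


def tokenize(text):
    return text.lower().split()


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


# Fit vocabulary and smoothed inverse document frequencies on the training set.
DOCS = [tokenize(q) for q, _ in TRAIN]
VOCAB = sorted({w for doc in DOCS for w in doc})
DF = Counter(w for doc in DOCS for w in set(doc))
IDF = {w: math.log((1 + len(DOCS)) / (1 + DF[w])) + 1.0 for w in VOCAB}


def tfidf_vector(tokens):
    # Term frequency times IDF over the training vocabulary, L2-normalized.
    tf = Counter(tokens)
    return l2_normalize([tf[w] * IDF[w] for w in VOCAB])


def embedding_vector(tokens):
    # Mean of the embeddings of known words, L2-normalized (zeros if none).
    known = [EMB[w] for w in tokens if w in EMB]
    if not known:
        return [0.0] * DIM
    return l2_normalize([sum(col) / len(known) for col in zip(*known)])


def hybrid_vector(text):
    # Assumed combination: concatenate the sparse and dense representations.
    tokens = tokenize(text)
    return tfidf_vector(tokens) + embedding_vector(tokens)


TRAIN_VECS = [(hybrid_vector(q), label) for q, label in TRAIN]


def predict(text):
    # 1-nearest neighbour using the dot product of hybrid vectors.
    query = hybrid_vector(text)
    best = max(TRAIN_VECS, key=lambda item: sum(a * b for a, b in zip(query, item[0])))
    return best[1]


print(predict("quem escreveu os lusiadas"))
print(predict("aonde vai ele"))
```

In this toy setup the second query shares no word with the training questions, so its TF-IDF half is all zeros and only the embedding half contributes to the similarity. This mirrors, at a very small scale, the abstract's point that embeddings help when training data is scarce.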

Author information

Corresponding author

Correspondence to Eduardo G. Cortes.

Copyright information

© 2018 Springer Nature Switzerland AG

Cite this paper

Cortes, E.G., Woloszyn, V., Barone, D.A.C. (2018). When, Where, Who, What or Why? A Hybrid Model to Question Answering Systems. In: Villavicencio, A., et al. Computational Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Computer Science, vol 11122. Springer, Cham. https://doi.org/10.1007/978-3-319-99722-3_14

  • DOI: https://doi.org/10.1007/978-3-319-99722-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99721-6

  • Online ISBN: 978-3-319-99722-3

  • eBook Packages: Computer Science, Computer Science (R0)
