When, Where, Who, What or Why? A Hybrid Model to Question Answering Systems

Cortes, Eduardo G.; Woloszyn, Vinicius; Barone, Dante A. C.

doi:10.1007/978-3-319-99722-3_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11122)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

Question Answering is a field of Information Retrieval and Natural Language Processing concerned with automatically answering questions posed by humans in natural language. One of the main steps of a Question Answering system is Question Classification, where the system tries to identify the type of question (e.g., whether it relates to a person, a time or a location) to facilitate the generation of a precise answer. Machine learning techniques are commonly employed in such tasks, where the text is represented as a vector of features such as bag-of-words, Term Frequency-Inverse Document Frequency (TF-IDF) or word embeddings. However, the quality of the results produced by supervised algorithms depends on the existence of a large, domain-dependent training dataset, which is sometimes unavailable because manual annotation of datasets is labor-intensive. Normally, word embeddings perform relatively better on small training sets, while bag-of-words and TF-IDF perform better on large training sets. In this work, we propose a hybrid model that combines TF-IDF and word embeddings to predict the answer type of text questions using both small and large training sets. Our experiments on the Portuguese language, with training sets of several different sizes, showed that the proposed hybrid model statistically outperforms the bag-of-words, TF-IDF, and word embedding approaches.
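The abstract does not say how the two representations are combined, so the following is only a minimal sketch of one plausible reading: the TF-IDF vector of a question is concatenated with the average of its word embeddings, and the result is fed to a classifier. Everything here is an assumption made for illustration: the toy Portuguese questions, the hand-made 3-dimensional vectors in `EMB`, the helper names, and the nearest-centroid classifier, which stands in for whatever learner the authors actually used.

```python
import math
from collections import Counter

# Toy training questions with answer-type labels (illustrative only).
TRAIN = [
    ("quem descobriu o brasil", "PERSON"),
    ("quem escreveu dom casmurro", "PERSON"),
    ("quando começou a guerra", "TIME"),
    ("quando nasceu pelé", "TIME"),
    ("onde fica lisboa", "LOCATION"),
    ("onde nasceu pelé", "LOCATION"),
]

# Hypothetical 3-d word embeddings; a real system would load pretrained vectors.
# "ano" never occurs in TRAIN, so only the embedding part can use it.
EMB = {
    "quem": [1.0, 0.0, 0.0],
    "quando": [0.0, 1.0, 0.0],
    "onde": [0.0, 0.0, 1.0],
    "ano": [0.0, 1.0, 0.0],
}
DIM = 3


def tokenize(text):
    return text.lower().split()


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fit_idf(docs):
    """Smoothed IDF: log((1 + n) / (1 + df)) + 1 for each training token."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf_vector(tokens, idf, vocab):
    tf = Counter(tokens)  # tokens outside the vocabulary are ignored
    return l2_normalize([tf[t] * idf[t] for t in vocab])


def mean_embedding(tokens, emb, dim):
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim
    return l2_normalize([sum(col) / len(known) for col in zip(*known)])


docs = [tokenize(q) for q, _ in TRAIN]
idf = fit_idf(docs)
vocab = sorted(idf)


def featurize(text):
    """Hybrid feature vector: TF-IDF part followed by the embedding part."""
    tokens = tokenize(text)
    return tfidf_vector(tokens, idf, vocab) + mean_embedding(tokens, EMB, DIM)


def train_centroids(examples):
    """Average the hybrid vectors of each class (stand-in classifier)."""
    sums, counts = {}, Counter()
    for text, label in examples:
        vec = featurize(text)
        prev = sums.get(label, [0.0] * len(vec))
        sums[label] = [a + b for a, b in zip(prev, vec)]
        counts[label] += 1
    return {lab: [x / counts[lab] for x in s] for lab, s in sums.items()}


def predict(text, centroids):
    vec = featurize(text)
    score = lambda lab: sum(a * b for a, b in zip(vec, centroids[lab]))
    return max(centroids, key=score)


centroids = train_centroids(TRAIN)
print(predict("quem pintou a mona lisa", centroids))
print(predict("em que ano nasceu pelé", centroids))
```

In the second query the TF-IDF part only sees "nasceu" and "pelé", which occur equally in the TIME and LOCATION training questions; the embedding of the unseen word "ano" is what separates the two classes. This is meant to illustrate the intuition stated in the abstract, not to reproduce the paper's model or results.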

Author information

Authors and Affiliations

PPGC, Institute of Informatics, Federal University of Rio Grande do Sul (UFRGS), Caixa Postal 15.064, Porto Alegre, RS, 91.501-970, Brazil
Eduardo G. Cortes, Vinicius Woloszyn & Dante A. C. Barone

Authors

Eduardo G. Cortes
Vinicius Woloszyn
Dante A. C. Barone

Corresponding author

Correspondence to Eduardo G. Cortes.

Editor information

Editors and Affiliations

Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
Aline Villavicencio
Instituto de Informática - UFRGS, Porto Alegre, Brazil
Viviane Moreira
INESC-ID, Lisbon, Portugal
Alberto Abad
UFSCAR, Sao Carlos, Brazil
Helena Caseli
Centro Singular de Investigación en Tecnoloxías, Universidade de Santiago de Compostela, Santiago de Compostela, La Coruña, Spain
Pablo Gamallo
Université de Toulon, Parc Scientifique Technologique Luminy, Marseille, France
Carlos Ramisch
Centro de Informática e Sistemas, Universidade de Coimbra, Coimbra, Portugal
Hugo Gonçalo Oliveira
Federal University of Technology, Dois Vizinhos, Paraná, Brazil
Gustavo Henrique Paetzold

About this paper

Cite this paper

Cortes, E.G., Woloszyn, V., Barone, D.A.C. (2018). When, Where, Who, What or Why? A Hybrid Model to Question Answering Systems. In: Villavicencio, A., et al. (eds.) Computational Processing of the Portuguese Language. PROPOR 2018. Lecture Notes in Computer Science, vol. 11122. Springer, Cham. https://doi.org/10.1007/978-3-319-99722-3_14

DOI: https://doi.org/10.1007/978-3-319-99722-3_14
Published: 26 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99721-6
Online ISBN: 978-3-319-99722-3
eBook Packages: Computer Science, Computer Science (R0)
