Identification of Multi-Focal Questions in Question and Answer Reports

Zaki Ali, Mona Mohamed; Nenadic, Goran; Theodoulidis, Babis

doi:10.1007/978-3-319-07983-7_17

Mona Mohamed Zaki Ali,
Goran Nenadic &
Babis Theodoulidis

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8455)

Included in the following conference series:

International Conference on Applications of Natural Language to Data Bases/Information Systems

Abstract

A significant amount of business and scientific data is collected via question and answer reports. However, these reports often suffer from data quality issues. In many cases, questionnaires contain questions that require multiple answers, which we argue is a potential source of poor-quality answers. This paper introduces multi-focal questions and proposes a model for identifying them. The model consists of three phases: question pre-processing, feature engineering and question classification. We use six types of features: lexical/surface features; part-of-speech features; readability features; question structure, wording and placement features; question response type and format features; and question focus. A comparative study of three machine learning algorithms (Bayes Net, Decision Tree and Support Vector Machine) is performed on a dataset of 150 questions obtained from the Carbon Disclosure Project, achieving an accuracy of 91%.
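
The three phases named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the individual features or how the classifiers were configured, so the cue list `WH_WORDS`, the `Features` record, the function names and the hand-written rule in `classify` are all assumptions made for illustration. The rule stands in for a trained classifier such as the Bayes Net, Decision Tree or Support Vector Machine mentioned above, and the example questions are invented rather than taken from the Carbon Disclosure Project data.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical cue list, chosen for illustration only (not from the paper).
WH_WORDS = {"what", "which", "who", "whom", "whose", "when", "where", "why", "how"}


def preprocess(question: str) -> list[str]:
    """Phase 1: lowercase the question and split it into word and '?' tokens."""
    return re.findall(r"[a-z0-9']+|\?", question.lower())


@dataclass
class Features:
    n_tokens: int          # surface feature: number of word tokens
    n_question_marks: int  # surface feature: number of '?' tokens
    n_wh_words: int        # rough proxy for the number of question foci


def extract_features(tokens: list[str]) -> Features:
    """Phase 2: turn a token list into a small feature record."""
    return Features(
        n_tokens=sum(1 for t in tokens if t != "?"),
        n_question_marks=tokens.count("?"),
        n_wh_words=sum(1 for t in tokens if t in WH_WORDS),
    )


def classify(features: Features) -> str:
    """Phase 3: a hand-written rule standing in for a trained classifier.

    A real model would learn its decision from labelled questions and could
    use every feature; this rule only looks at two of them.
    """
    if features.n_question_marks > 1 or features.n_wh_words > 1:
        return "multi-focal"
    return "single-focal"


def label(question: str) -> str:
    """Run the three phases in order on one question."""
    return classify(extract_features(preprocess(question)))


if __name__ == "__main__":
    # Invented example questions, not drawn from the paper's dataset.
    print(label("What is your total annual energy consumption?"))
    print(label("What are your emission targets and how do you track progress?"))
```

Keeping each phase in its own function means the rule in `classify` could later be replaced by a learned model without touching the pre-processing or feature code.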


Author information

Authors and Affiliations

School of Computer Science, The University of Manchester, Manchester, UK
Mona Mohamed Zaki Ali & Goran Nenadic
Faculty of Computers and Informatics, Suez Canal University, Ismailia, Egypt
Mona Mohamed Zaki Ali
Manchester Business School, The University of Manchester, Manchester, UK
Babis Theodoulidis

Editor information

Editors and Affiliations

Conservatoire National des Arts et Métiers, Computer Science, 2 rue Conté, 75003, Paris, France
Elisabeth Métais
Cirad, TETIS, 500 rue J.F. Breton, 34093, Montpellier Cedex 5, France
Mathieu Roche
Irstea, TETIS, 500 rue J.F. Breton, 34093, Montpellier Cedex 5, France
Maguelonne Teisseire

About this paper

Cite this paper

Zaki Ali, M.M., Nenadic, G., Theodoulidis, B. (2014). Identification of Multi-Focal Questions in Question and Answer Reports. In: Métais, E., Roche, M., Teisseire, M. (eds) Natural Language Processing and Information Systems. NLDB 2014. Lecture Notes in Computer Science, vol 8455. Springer, Cham. https://doi.org/10.1007/978-3-319-07983-7_17

DOI: https://doi.org/10.1007/978-3-319-07983-7_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07982-0
Online ISBN: 978-3-319-07983-7
eBook Packages: Computer Science, Computer Science (R0)
