
Identification of Multi-Focal Questions in Question and Answer Reports

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8455)

Abstract

A significant amount of business and scientific data is collected via question and answer reports. However, these reports often suffer from data quality issues. In many cases, questionnaires contain questions that require multiple answers, which we argue is a potential source of poor-quality answers. This paper introduces multi-focal questions and proposes a model for identifying them. The model consists of three phases: question pre-processing, feature engineering, and question classification. We use six types of features: lexical/surface features; part-of-speech features; readability features; question structure, wording, and placement features; question response type and format features; and question focus. A comparative study of three machine learning algorithms (Bayes Net, Decision Tree, and Support Vector Machine) is performed on a dataset of 150 questions obtained from the Carbon Disclosure Project, achieving an accuracy of 91%.
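To make the described pipeline concrete, the sketch below shows how such a question classifier might be assembled with scikit-learn. It is not the authors' implementation: the feature set is reduced to lexical/surface cues plus a few hand-crafted counts, the file name questions.csv and the columns question and is_multi_focal are hypothetical, and the SVM and decision-tree choices simply mirror two of the three compared algorithms.

```python
# Minimal sketch (not the authors' implementation) of a multi-focal question
# classifier: pre-processing, a reduced feature set, and a classifier, roughly
# following the paper's three phases. File and column names are hypothetical.

import re
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score


class SurfaceFeatures(BaseEstimator, TransformerMixin):
    """Crude surface features: token count, question-mark count, and
    counts of coordinating words that often signal multiple foci."""
    MULTI_CUES = re.compile(r"\b(and|or|as well as|also|respectively)\b", re.I)

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        rows = []
        for q in X:
            tokens = q.split()
            rows.append([
                len(tokens),                      # question length
                q.count("?"),                     # number of question marks
                len(self.MULTI_CUES.findall(q)),  # coordination cues
            ])
        return np.array(rows)


pipeline = Pipeline([
    ("features", FeatureUnion([
        ("lexical", TfidfVectorizer(ngram_range=(1, 2), lowercase=True)),
        ("surface", SurfaceFeatures()),
    ])),
    ("clf", SVC(kernel="linear")),  # swap in DecisionTreeClassifier() to compare
])

# Hypothetical data layout: one question per row with a binary multi-focal label.
data = pd.read_csv("questions.csv")  # columns: question, is_multi_focal
scores = cross_val_score(pipeline, data["question"], data["is_multi_focal"], cv=10)
print(f"10-fold CV accuracy: {scores.mean():.2f}")
```

The cross-validation call here only gives a quick accuracy estimate for the sketch; the paper's own feature set and evaluation protocol are richer than what is shown.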




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Zaki Ali, M.M., Nenadic, G., Theodoulidis, B. (2014). Identification of Multi-Focal Questions in Question and Answer Reports. In: Métais, E., Roche, M., Teisseire, M. (eds) Natural Language Processing and Information Systems. NLDB 2014. Lecture Notes in Computer Science, vol 8455. Springer, Cham. https://doi.org/10.1007/978-3-319-07983-7_17

  • DOI: https://doi.org/10.1007/978-3-319-07983-7_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07982-0

  • Online ISBN: 978-3-319-07983-7

  • eBook Packages: Computer Science (R0)
