Towards Domain Independent Why Text Segment Classification Based on Bag of Function Words

Tanaka, Katsuyuki; Takiguchi, Tetsuya; Ariki, Yasuo

doi:10.1007/978-3-642-35101-3_40

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7691)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

Question answering (QA) technology has attracted increasing attention as a next-generation search technology because it makes acquiring information from the web easier. However, little research has been conducted on non-factoid QA, and on Why Question Answering (Why-QA) in particular. In this paper, we introduce a machine learning approach that uses a support vector machine (SVM) to automatically construct a classifier, with function words as features, for Why Text Segment classification (WTS classification). WTS classification is the process of detecting text segments that describe “reasons-causes”; it is a subtask of Why-QA, related mainly to the answer extraction stage. We argue that function words are strong discriminators for WTS classification. Furthermore, because function words appear in almost all text segments regardless of topic domain, using them as features also makes it possible to construct a domain-independent classifier. The experimental results show a significant improvement over state-of-the-art results, both in WTS classification accuracy and in domain independence.
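
The idea of classifying text segments from function-word counts alone can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the English function-word list `FUNCTION_WORDS`, the toy segments in `TRAIN`, the whitespace tokenisation, and the helper names `bag_of_function_words`, `train_linear_svm`, and `predict`. The trainer is a plain full-batch subgradient descent on the regularised hinge loss of a linear SVM, swapped in so that the example needs only the standard library; the abstract does not say which SVM implementation or kernel the authors used.

```python
from collections import Counter

# Hypothetical English function-word vocabulary, chosen only for illustration.
FUNCTION_WORDS = ["because", "since", "therefore", "the", "is", "and", "in"]

# Toy training segments (assumed data): +1 = states a reason or cause, -1 = does not.
TRAIN = [
    ("the match stopped because rain fell", 1),
    ("the shop closed since sales dropped", 1),
    ("rain fell therefore the match stopped", 1),
    ("the match is tomorrow", -1),
    ("the shop sells bread and milk", -1),
    ("rain fell in the morning", -1),
]


def bag_of_function_words(segment):
    """Count each function word in a lower-cased, whitespace-tokenised segment."""
    counts = Counter(segment.lower().split())
    return [float(counts[word]) for word in FUNCTION_WORDS]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # this segment violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (why text segment) or -1 (other) for a feature vector."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    X = [bag_of_function_words(text) for text, _ in TRAIN]
    y = [label for _, label in TRAIN]
    w, b = train_linear_svm(X, y)
    # Held-out segments from a different topic (cooking) than the training data.
    for text in ["the sauce burned because nobody stirred", "the sauce is spicy"]:
        print(text, "->", predict(w, b, bag_of_function_words(text)))
```

Because content words such as "match" or "sauce" never enter the feature vector, the held-out cooking segments are represented exactly like training segments from other topics. This is the intuition behind the domain-independence argument, shown here on toy data only and not as a reproduction of the reported experiments.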

Author information

Authors and Affiliations

Kobe University, 1-1 Rokkodai, Nada, Kobe, 657-8501, Japan
Katsuyuki Tanaka, Tetsuya Takiguchi & Yasuo Ariki

Editor information

Editors and Affiliations

School of Computer Science and Engineering, University of New South Wales, 2052, Sydney, NSW, Australia
Michael Thielscher
School of Computing and Mathematics, University of Western Sydney, 1797, Penrith South DC, NSW, Australia
Dongmo Zhang

Cite this paper

Tanaka, K., Takiguchi, T., Ariki, Y. (2012). Towards Domain Independent Why Text Segment Classification Based on Bag of Function Words. In: Thielscher, M., Zhang, D. (eds) AI 2012: Advances in Artificial Intelligence. AI 2012. Lecture Notes in Computer Science, vol 7691. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35101-3_40

DOI: https://doi.org/10.1007/978-3-642-35101-3_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35100-6
Online ISBN: 978-3-642-35101-3
eBook Packages: Computer Science, Computer Science (R0)
