Why Text Segment Classification Based on Part of Speech Feature Selection

Nagy, Iulia; Tanaka, Katsuyuki; Ariki, Yasuo

doi:10.1007/978-3-642-16184-1_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6332)

Included in the following conference series:

International Conference on Discovery Science

Abstract

The aim of our research is to develop a scalable automatic why-question answering system for English, based on a supervised method that uses part-of-speech analysis. The prior approach consisted of building a why-classifier using function words. This paper investigates the performance of combining supervised data mining methods with various feature selection strategies in order to obtain a more accurate why-classifier. Feature selection was performed a priori on the dataset to extract representative verbs and/or nouns and avoid the curse of dimensionality. LogitBoost and SVM were used for the classification process. Three methods of extending the initial "function words only" approach, to handle context-dependent features, are proposed and experimentally evaluated on various datasets. The first considers function words and context-independent adverbs; the second incorporates selected lemmatized verbs; the third contains selected lemmatized verbs and nouns. Experiments on web-extracted datasets showed that all methods performed better than the baseline, with slightly more reliable results for the third one.
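As a rough illustration of the feature sets described in the abstract, the following Python sketch turns a text segment into count vectors over a restricted vocabulary. Everything specific here is an assumption made for illustration: the word lists, the toy lemma table standing in for a real lemmatizer, the names `vectorize` and `FEATURE_SETS`, and the reading that each method extends the previous one. The abstract does not give the actual vocabularies or the selection criterion, and the classifiers themselves (LogitBoost and SVM) are not shown.

```python
from collections import Counter

# Hypothetical toy vocabularies; the paper's actual lists are not in the abstract.
FUNCTION_WORDS = ["because", "since", "so", "the", "of", "to"]
ADVERBS = ["therefore", "hence"]
SELECTED_VERBS = ["cause", "lead"]
SELECTED_NOUNS = ["reason", "result"]

# Toy lemma table standing in for a real lemmatizer (assumption).
LEMMAS = {
    "causes": "cause", "caused": "cause",
    "leads": "lead", "led": "lead",
    "reasons": "reason", "results": "result",
}


def tokenize(text):
    """Lowercase, strip commas and periods, split on whitespace, map to lemmas."""
    cleaned = text.lower().replace(",", " ").replace(".", " ")
    return [LEMMAS.get(tok, tok) for tok in cleaned.split()]


def vectorize(text, vocabulary):
    """Count how often each vocabulary word occurs in the segment."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Assumed cumulative reading of the baseline and the three proposed methods.
FEATURE_SETS = {
    "baseline": FUNCTION_WORDS,
    "method1": FUNCTION_WORDS + ADVERBS,
    "method2": FUNCTION_WORDS + ADVERBS + SELECTED_VERBS,
    "method3": FUNCTION_WORDS + ADVERBS + SELECTED_VERBS + SELECTED_NOUNS,
}

segment = "Smoking causes cancer because of the tar."
for name, vocab in FEATURE_SETS.items():
    print(name, vectorize(segment, vocab))
```

Restricting the vocabulary before vectorizing keeps the feature dimension fixed and small regardless of corpus size, which is the motivation the abstract gives for performing selection a priori. The resulting vectors could then be passed to any supervised classifier.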

Author information

Authors and Affiliations

Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe, 657-8501, Japan
Iulia Nagy, Katsuyuki Tanaka & Yasuo Ariki

Editor information

Editors and Affiliations

Department of Computer Science, University of Waikato, Hamilton, New Zealand
Bernhard Pfahringer
Department of Computer Science, The University of Waikato, Private Bag 3105, 3240, Hamilton, New Zealand
Geoff Holmes
School of Computer Science and Engineering, The University of New South Wales, 2052, Sydney, Australia
Achim Hoffmann

About this paper

Cite this paper

Nagy, I., Tanaka, K., Ariki, Y. (2010). Why Text Segment Classification Based on Part of Speech Feature Selection. In: Pfahringer, B., Holmes, G., Hoffmann, A. (eds) Discovery Science. DS 2010. Lecture Notes in Computer Science, vol 6332. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16184-1_7

DOI: https://doi.org/10.1007/978-3-642-16184-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16183-4
Online ISBN: 978-3-642-16184-1
eBook Packages: Computer Science, Computer Science (R0)