Why Text Segment Classification Based on Part of Speech Feature Selection

  • Conference paper
Discovery Science (DS 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6332)

Abstract

The aim of our research is to develop a scalable automatic why-question answering system for English based on a supervised method that uses part-of-speech analysis. The prior approach consisted of building a why-classifier using function words. This paper investigates the performance of combining supervised data mining methods with various feature selection strategies in order to obtain a more accurate why-classifier. Feature selection was performed a priori on the dataset to extract representative verbs and/or nouns and avoid the curse of dimensionality. LogitBoost and SVM were used for the classification process. Three methods of extending the initial "function words only" approach to handle context-dependent features are proposed and experimentally evaluated on various datasets. The first considers function words and context-independent adverbs; the second incorporates selected lemmatized verbs; the third contains selected lemmatized verbs and nouns. Experiments on web-extracted datasets showed that all methods performed better than the baseline, with slightly more reliable results for the third one.
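As a rough illustration of how the baseline and the three extended feature sets differ, the following minimal sketch builds count vectors over growing vocabularies. All word lists here are hypothetical placeholders, since the abstract does not give the actual function words, adverbs, verbs, or nouns. Treating the three sets as cumulative is also an assumption, and the classifier itself (LogitBoost or SVM) is not shown.

```python
from collections import Counter

# Hypothetical, tiny vocabularies for illustration only; the actual lists
# used in the paper are not stated in the abstract.
FUNCTION_WORDS = ["because", "since", "so", "the", "of", "to"]
ADVERBS = ["therefore", "hence"]          # stand-ins for context-independent adverbs
SELECTED_VERBS = ["cause", "lead"]        # stand-ins for selected lemmatized verbs
SELECTED_NOUNS = ["reason", "result"]     # stand-ins for selected lemmatized nouns

# Baseline plus three extensions (assumed cumulative here).
BASELINE = FUNCTION_WORDS
METHOD_1 = BASELINE + ADVERBS
METHOD_2 = METHOD_1 + SELECTED_VERBS
METHOD_3 = METHOD_2 + SELECTED_NOUNS


def featurize(tokens, vocabulary):
    """Count how often each vocabulary word occurs in a list of lemmatized tokens."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# A pre-tokenized, lowercased, lemmatized text segment (made up for this sketch).
segment = "the delay be cause by the storm so the reason be clear".split()

baseline_vector = featurize(segment, BASELINE)
method_3_vector = featurize(segment, METHOD_3)
print(baseline_vector)
print(method_3_vector)
```

Restricting the vocabulary to a small selected set keeps the vector length fixed and small, which is the sense in which a priori feature selection limits dimensionality.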

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nagy, I., Tanaka, K., Ariki, Y. (2010). Why Text Segment Classification Based on Part of Speech Feature Selection. In: Pfahringer, B., Holmes, G., Hoffmann, A. (eds) Discovery Science. DS 2010. Lecture Notes in Computer Science, vol 6332. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16184-1_7

  • DOI: https://doi.org/10.1007/978-3-642-16184-1_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16183-4

  • Online ISBN: 978-3-642-16184-1

  • eBook Packages: Computer Science, Computer Science (R0)
