
Automatic Discovery of Attribute Words from Web Documents

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Abstract

We propose a method for acquiring attribute words for a wide range of objects from Japanese Web documents. The method is simple and unsupervised, and it draws on three sources of evidence: word statistics, lexico-syntactic patterns, and HTML tags. To evaluate the acquired attribute words, we also establish criteria and a procedure based on the question-answerability of each candidate word.
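As a rough illustration of how the three kinds of evidence named in the abstract could be combined, the following is a minimal sketch and not the authors' implementation. It uses English text in place of Japanese, and everything specific in it is an assumption made for the example: the pattern "the X of (the) O", the use of `<th>` cells as the HTML-tag cue, the additive score, and the names `HeaderCellCollector` and `score_candidates`.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class HeaderCellCollector(HTMLParser):
    """Collect the text of <th> cells (a hypothetical HTML-tag cue)."""

    def __init__(self):
        super().__init__()
        self._in_th = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self._in_th = True

    def handle_endtag(self, tag):
        if tag == "th":
            self._in_th = False

    def handle_data(self, data):
        text = data.strip().lower()
        if self._in_th and text:
            self.cells.append(text)


def score_candidates(object_word, documents):
    """Score candidate attribute words for object_word.

    Assumed scoring: (matches of the lexico-syntactic pattern)
    + (occurrences inside <th> cells). This is illustrative only.
    """
    # Assumed English stand-in pattern: "the <candidate> of [the|this|a] <object>"
    pattern = re.compile(
        r"\bthe (\w+) of (?:the |this |a )?" + re.escape(object_word) + r"\b",
        re.IGNORECASE,
    )
    pattern_counts = Counter()
    tag_counts = Counter()
    for html in documents:
        collector = HeaderCellCollector()
        collector.feed(html)
        tag_counts.update(collector.cells)
        # Strip tags crudely so the pattern runs over plain text.
        plain = re.sub(r"<[^>]+>", " ", html)
        pattern_counts.update(m.lower() for m in pattern.findall(plain))

    scores = Counter()
    for word in set(pattern_counts) | set(tag_counts):
        scores[word] = pattern_counts[word] + tag_counts[word]
    return scores


if __name__ == "__main__":
    docs = [
        "<table><tr><th>Price</th><th>Color</th></tr></table>"
        "<p>The price of the car is high.</p>",
        "<p>We compared the color of this car and the price of a car.</p>",
    ]
    for word, score in score_candidates("car", docs).most_common():
        print(word, score)
```

Summing the two counts is the simplest possible combination; a real system would likely weight or normalize the cues, and the abstract does not say how the paper does this.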




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tokunaga, K., Kazama, J., Torisawa, K. (2005). Automatic Discovery of Attribute Words from Web Documents. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_10

  • DOI: https://doi.org/10.1007/11562214_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29172-5

  • Online ISBN: 978-3-540-31724-1

  • eBook Packages: Computer Science, Computer Science (R0)
