Automatic Discovery of Attribute Words from Web Documents

Tokunaga, Kosuke; Kazama, Jun’ichi; Torisawa, Kentaro

doi:10.1007/11562214_10


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Abstract

We propose a method for acquiring attribute words for a wide range of objects from Japanese Web documents. It is a simple unsupervised method that uses statistics of words, lexico-syntactic patterns, and HTML tags. To evaluate the acquired attribute words, we also establish criteria and a procedure based on the question-answerability of each candidate word.
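As a rough illustration of the "lexico-syntactic pattern" statistics mentioned in the abstract, the sketch below counts how often a word B follows a target object word A in the Japanese pattern "A の B" (roughly "the B of A"). It then ranks each B by that count, multiplied by an IDF-like weight that down-weights words appearing after many different objects. The pattern, the weighting, and the function name `rank_attribute_candidates` are assumptions made for illustration. It is not claimed to be the scoring function used in the paper, and it ignores both the HTML-tag statistics and the question-answerability evaluation. Input is assumed to be already segmented into words, for example by a morphological analyzer such as JUMAN.

```python
import math
from collections import Counter, defaultdict


def rank_attribute_candidates(sentences, target, particle="の"):
    """Rank candidate attribute words B seen in the pattern `target の B`.

    Hypothetical scoring (illustration only):
        score(B) = count(target の B) * (log(N / n_B) + 1)
    where N is the number of distinct objects X seen in any "X の Y",
    and n_B is the number of distinct objects X seen in "X の B".
    """
    pair_counts = Counter()               # (X, B) -> frequency of "X の B"
    objects_per_word = defaultdict(set)   # B -> distinct X seen in "X の B"
    for tokens in sentences:
        for i in range(len(tokens) - 2):
            x, mid, b = tokens[i], tokens[i + 1], tokens[i + 2]
            if mid == particle:
                pair_counts[(x, b)] += 1
                objects_per_word[b].add(x)

    n_objects = len({x for (x, _) in pair_counts})
    scores = {}
    for (x, b), count in pair_counts.items():
        if x != target:
            continue
        # IDF-like weight; the +1 keeps words shared by every object above zero.
        idf = math.log(n_objects / len(objects_per_word[b])) + 1.0
        scores[b] = count * idf
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy pre-segmented corpus (camera = カメラ, PC = パソコン,
# price = 価格, pixel count = 画素数).
example_corpus = [
    ["カメラ", "の", "価格", "は", "高い"],
    ["カメラ", "の", "画素数", "を", "確認"],
    ["カメラ", "の", "価格", "が", "下がる"],
    ["パソコン", "の", "価格", "が", "安い"],
]

print(rank_attribute_candidates(example_corpus, "カメラ"))
```

In this toy corpus, 価格 (price) still ranks first for カメラ (camera) because of its raw frequency, even though it also follows パソコン (PC). A real system would need far larger corpora and extra evidence, such as the HTML-tag statistics the abstract mentions, to separate genuine attributes from generic nouns.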



Author information

Authors and Affiliations

Japan Advanced Institute of Science and Technology (JAIST), Asahidai 1-1, Nomi, Ishikawa, 923-1292, Japan
Kosuke Tokunaga, Jun’ichi Kazama & Kentaro Torisawa


Editor information

Editors and Affiliations

Center for Language Technology, Macquarie University, 2019, Sydney, NSW, Australia
Robert Dale
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
Institute for Infocomm Research, 21, Heng Mui Keng Terrace, 119613, Singapore
Jian Su
Language Information Sciences Research Centre, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Oi Yee Kwong


About this paper

Cite this paper

Tokunaga, K., Kazama, J., Torisawa, K. (2005). Automatic Discovery of Attribute Words from Web Documents. In: Dale, R., Wong, K.-F., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_10


DOI: https://doi.org/10.1007/11562214_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)
