Discovering Relations Between Named Entities from a Large Raw Corpus Using Tree Similarity-Based Clustering

  • Conference paper
Natural Language Processing – IJCNLP 2005 (IJCNLP 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Abstract

We propose a tree-similarity-based unsupervised learning method to extract relations between Named Entities from a large raw corpus. Our method treats relation extraction as a clustering problem on shallow parse trees. First, we modify previous tree kernels for relation extraction to estimate the similarity between parse trees more efficiently. Then, the similarity between parse trees is used in a hierarchical clustering algorithm to group entity pairs into clusters. Finally, each cluster is labeled with an indicative word, and unreliable clusters are pruned. Evaluation on the New York Times (1995) corpus shows that our method outperforms the only previous work by 5 in F-measure. It also shows that our method performs well on both high-frequency and less frequent entity pairs. To the best of our knowledge, this is the first work to use a tree similarity metric in relation clustering.
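The pipeline outlined in the abstract (tree similarity, then hierarchical clustering of entity pairs) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `(label, children)` tree encoding, the toy `tree_sim` function, the average-linkage merge rule, and the stopping threshold are hypothetical stand-ins, not the paper's modified tree kernel or its clustering settings. Cluster labeling and pruning are omitted.

```python
from itertools import combinations

# Assumed encoding: a tree is (label, [children]); a leaf has no children.

def tree_sim(a, b):
    """Toy similarity: 0 if the root labels differ, otherwise 1 plus the
    similarity of children aligned position by position."""
    if a[0] != b[0]:
        return 0.0
    return 1.0 + sum(tree_sim(x, y) for x, y in zip(a[1], b[1]))

def normalized_sim(a, b):
    """Scale the similarity into [0, 1] so trees of different sizes compare fairly."""
    return tree_sim(a, b) / (tree_sim(a, a) * tree_sim(b, b)) ** 0.5

def cluster(trees, threshold):
    """Average-linkage agglomerative clustering over tree indices.
    Merging stops once the best pair of clusters falls below `threshold`."""
    sim = {(i, j): normalized_sim(trees[i], trees[j])
           for i, j in combinations(range(len(trees)), 2)}

    def linkage(c1, c2):
        total = sum(sim[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(trees))]
    while len(clusters) > 1:
        x, y = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[x], clusters[y]) < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]  # x < y, so deleting y is safe
        del clusters[y]
    return clusters

# Hypothetical shallow parse trees, one per entity-pair context.
t1 = ("S", [("NP", [("PERSON", [])]), ("VP", [("V", []), ("NP", [("ORG", [])])])])
t2 = ("S", [("NP", [("PERSON", [])]), ("VP", [("V", []), ("NP", [("ORG", [])])])])
t3 = ("S", [("NP", [("ORG", [])]), ("VP", [("V", []), ("NP", [("LOC", [])])])])
t4 = ("NP", [("PERSON", []), ("ORG", [])])
trees = [t1, t2, t3, t4]

strict = cluster(trees, threshold=0.8)
loose = cluster(trees, threshold=0.5)
```

With the stricter threshold only the two identical trees are grouped, while the looser one also absorbs the structurally similar third tree; the fourth tree shares no root label with the others and stays alone. In a fuller version, each resulting cluster would then be labeled and small or incoherent clusters discarded.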

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, M., Su, J., Wang, D., Zhou, G., Tan, C.L. (2005). Discovering Relations Between Named Entities from a Large Raw Corpus Using Tree Similarity-Based Clustering. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_34

  • DOI: https://doi.org/10.1007/11562214_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29172-5

  • Online ISBN: 978-3-540-31724-1

  • eBook Packages: Computer Science, Computer Science (R0)
