research-article

Crosslingual distant supervision for extracting relations of different complexity

Authors:

Andre Blessing,

Hinrich Schütze

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management

Pages 1123 - 1132

https://doi.org/10.1145/2396761.2398411

Published: 29 October 2012

Abstract

We propose crosslingual distant supervision (crosslingual DS) for relation extraction, an approach that automatically extracts labels from a pivot language for labeling one or more target languages. The approach has two benefits compared to standard DS: (i) increased coverage if target language labels are not available; and (ii) higher accuracy of automatically generated labels because noisy labels are eliminated in crosslingual filtering. An evaluation for two relations of different complexity shows that crosslingual DS increases the accuracy of relation extraction. Our approach is language independent; we successfully apply it to four different languages: Chinese, English, French and German.
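
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how labels are projected or how crosslingual filtering is carried out. The sketch assumes that facts are (subject, relation, object) triples, that a hypothetical `entity_map` links pivot-language entity names to target-language names, and that filtering means keeping only projected facts that a target-language source also asserts. The function names `project_labels` and `label_sentences` and the example relation `located_in` are invented for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)


def project_labels(
    pivot_facts: Set[Fact],
    entity_map: Dict[str, str],
    target_facts: Optional[Set[Fact]] = None,
) -> Set[Fact]:
    """Project pivot-language facts onto target-language entity names.

    If target_facts is given, keep only projected facts that the target
    language also asserts (an ASSUMED form of crosslingual filtering).
    If it is None, all projected facts are kept, which covers the case
    where no target-language labels are available.
    """
    projected: Set[Fact] = set()
    for subj, rel, obj in pivot_facts:
        # Skip facts whose entities have no known target-language counterpart.
        if subj not in entity_map or obj not in entity_map:
            continue
        projected.add((entity_map[subj], rel, entity_map[obj]))
    if target_facts is not None:
        projected &= target_facts
    return projected


def label_sentences(sentences: List[str], facts: Set[Fact]) -> List[Tuple[str, Fact]]:
    """Distant-supervision heuristic: a sentence that mentions both
    entities of a fact is labeled with that fact."""
    labeled = []
    for sentence in sentences:
        for fact in sorted(facts):
            subj, _, obj = fact
            if subj in sentence and obj in sentence:
                labeled.append((sentence, fact))
    return labeled


if __name__ == "__main__":
    # Hypothetical toy data: English as pivot, German as target.
    pivot = {("Munich", "located_in", "Bavaria"), ("Cologne", "located_in", "Bavaria")}
    mapping = {"Munich": "München", "Cologne": "Köln", "Bavaria": "Bayern"}
    german = {("München", "located_in", "Bayern")}
    kept = project_labels(pivot, mapping, german)
    print(label_sentences(["München ist die Hauptstadt von Bayern."], kept))
```

Making `target_facts` optional mirrors the two benefits named in the abstract: without it the pivot labels simply extend coverage to the target language, and with it disagreeing labels are dropped before training data is generated.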

Cited By

Li R, Yang C, Li T, Su S (2022). MiDTD: A Simple and Effective Distillation Framework for Distantly Supervised Relation Extraction. ACM Transactions on Information Systems 40:4 (1-32). Online publication date: 11-Jan-2022.
https://dl.acm.org/doi/10.1145/3503917
Gutiererz F, Dou D, Fickas S, Griffiths G (2014). Online Reasoning for Ontology-Based Error Detection in Text. On the Move to Meaningful Internet Systems: OTM 2014 Conferences (562-579). Online publication date: 2014.
https://doi.org/10.1007/978-3-662-45563-0_34

Index Terms

Crosslingual distant supervision for extracting relations of different complexity
1. Computing methodologies
  1. Machine learning
    1. Learning settings

Published In

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management

October 2012

2840 pages

ISBN:9781450311564

DOI:10.1145/2396761

General Chair: Xuewen Chen (Wayne State University, USA)

Program Chairs: Guy Lebanon (Georgia Institute of Technology), Haixun Wang (Microsoft Research Asia), Mohammed J. Zaki (Rensselaer Polytechnic Institute)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM'12: 21st ACM International Conference on Information and Knowledge Management

October 29 - November 2, 2012

Maui, Hawaii, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
