Crosslingual distant supervision for extracting relations of different complexity

Published: 29 October 2012

Abstract

We propose crosslingual distant supervision (crosslingual DS) for relation extraction, an approach that automatically extracts labels from a pivot language for labeling one or more target languages. The approach has two benefits compared to standard DS: (i) increased coverage if target language labels are not available; and (ii) higher accuracy of automatically generated labels because noisy labels are eliminated in crosslingual filtering. An evaluation for two relations of different complexity shows that crosslingual DS increases the accuracy of relation extraction. Our approach is language independent; we successfully apply it to four different languages: Chinese, English, French and German.
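
The two ideas in the abstract, transferring labels from a pivot language and discarding labels that do not hold up across languages, can be illustrated with a minimal sketch. The abstract does not say how labels are transferred or how the filter decides, so everything below is an assumption made for illustration: the `entity_links` lookup table, the function names `project_labels` and `crosslingual_filter`, the `min_support` threshold, and the toy data are all hypothetical and are not the authors' method.

```python
from collections import defaultdict


def project_labels(pivot_facts, entity_links, target_lang):
    """Carry entity pairs labeled in the pivot language over to a target language.

    pivot_facts:  set of (subject, object) pairs known to stand in the relation
                  in the pivot language (hypothetical input).
    entity_links: dict mapping (pivot_entity, language) -> target-language name
                  (hypothetical lookup table; assumed, not from the paper).
    """
    projected = set()
    for subj, obj in pivot_facts:
        t_subj = entity_links.get((subj, target_lang))
        t_obj = entity_links.get((obj, target_lang))
        # A pair is usable only if both entities have a target-language name.
        if t_subj is not None and t_obj is not None:
            projected.add((t_subj, t_obj))
    return projected


def crosslingual_filter(labels_by_lang, min_support=2):
    """Keep a pair only if at least `min_support` languages propose it.

    labels_by_lang maps a language code to a set of language-independent
    entity-ID pairs. Agreement voting is an assumed filtering rule.
    """
    support = defaultdict(set)
    for lang, pairs in labels_by_lang.items():
        for pair in pairs:
            support[pair].add(lang)
    return {pair for pair, langs in support.items() if len(langs) >= min_support}


# Toy data, invented for this sketch.
pivot_facts = {("Germany", "Berlin"), ("France", "Paris")}
entity_links = {
    ("Germany", "de"): "Deutschland",
    ("Berlin", "de"): "Berlin",
    ("France", "de"): "Frankreich",  # no German entry for "Paris" in this toy table
}
german_labels = project_labels(pivot_facts, entity_links, "de")

candidates = {
    "en": {("E1", "E2"), ("E3", "E4")},
    "de": {("E1", "E2")},
}
filtered = crosslingual_filter(candidates, min_support=2)
```

In this toy run only the Germany/Berlin pair is projected into German, because the other pair lacks a complete mapping, and only the pair proposed by both languages survives the filter. A real system would then use the surviving pairs to label sentences in the target-language corpus, a step this sketch leaves out.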

Cited By

  • (2022) MiDTD: A Simple and Effective Distillation Framework for Distantly Supervised Relation Extraction. ACM Transactions on Information Systems 40:4 (1-32). DOI: 10.1145/3503917. Online publication date: 11-Jan-2022
  • (2014) Online Reasoning for Ontology-Based Error Detection in Text. On the Move to Meaningful Internet Systems: OTM 2014 Conferences (562-579). DOI: 10.1007/978-3-662-45563-0_34. Online publication date: 2014

    Published In

    CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
    October 2012
    2840 pages
    ISBN:9781450311564
    DOI:10.1145/2396761

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. crosslingual distant supervision
    2. relation extraction

    Qualifiers

    • Research-article

    Conference

    CIKM'12

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
