Poster
DOI: 10.1145/2509558.2509571

A survey of noise reduction methods for distant supervision

Published: 27 October 2013

Abstract

We survey recent approaches to noise reduction in distant supervision learning for relation extraction. We group them by the principle they are based on: at-least-one constraints, topic-based models, and pattern correlations. Besides describing each group, we highlight the fundamental differences between them and attempt to give an outlook on potentially fruitful further research. In addition, we identify related work in sentiment analysis that could benefit from noise reduction approaches.
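To make the setting concrete, here is a minimal Python sketch (not taken from the paper) of how distant supervision produces noisy labels and how the simplest form of an at-least-one constraint can reduce that noise. The toy knowledge base `KB`, the sentences, the trigger-word list `TRIGGERS`, and the functions `distant_label`, `score`, and `at_least_one` are all hypothetical. The distant supervision heuristic labels every sentence that mentions both entities of a known pair with that pair's relation. The at-least-one step instead assumes only that some sentence in each entity-pair bag expresses the relation, and keeps just the highest-scoring one. The models covered by the survey learn the mention scorer jointly with the extractor; the fixed trigger list here is only a stand-in for that learned component.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (subject, object) -> relation.
KB = {("Obama", "Hawaii"): "born_in"}

# Hypothetical unlabeled corpus, pre-tokenized on whitespace.
SENTENCES = [
    "Obama was born in Hawaii .",
    "Obama visited Hawaii last week .",
    "Obama spoke in Chicago .",
]

# Hypothetical trigger words standing in for a learned mention classifier.
TRIGGERS = {"born_in": {"born", "birthplace", "native"}}


def distant_label(sentences, kb):
    """Distant supervision heuristic: every sentence that mentions both
    entities of a KB pair is labeled with that pair's relation.
    Sentences are grouped into one bag per (subject, object, relation)."""
    bags = defaultdict(list)
    for sentence in sentences:
        tokens = set(sentence.split())
        for (subj, obj), relation in kb.items():
            if subj in tokens and obj in tokens:
                bags[(subj, obj, relation)].append(sentence)
    return dict(bags)


def score(sentence, relation):
    """Count the relation's trigger words in the sentence (hypothetical scorer)."""
    tokens = set(sentence.lower().split())
    return len(tokens & TRIGGERS.get(relation, set()))


def at_least_one(bags):
    """At-least-one constraint: assume at least one mention per bag
    expresses the relation and keep only the highest-scoring mention
    as a positive training example, instead of the whole bag."""
    return {
        key: max(mentions, key=lambda m: score(m, key[2]))
        for key, mentions in bags.items()
    }


if __name__ == "__main__":
    bags = distant_label(SENTENCES, KB)
    for key, mentions in bags.items():
        print(key, "naive positives:", mentions)
    for key, mention in at_least_one(bags).items():
        print(key, "selected:", mention)
```

With this toy data, the heuristic labels both Hawaii sentences as `born_in`, although "Obama visited Hawaii last week ." does not express that relation; the at-least-one step keeps only "Obama was born in Hawaii .".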




    Published In

    AKBC '13: Proceedings of the 2013 workshop on Automated knowledge base construction
    October 2013
    124 pages
    ISBN:9781450324113
    DOI:10.1145/2509558


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. distant supervision
    2. machine learning
    3. relation extraction

    Qualifiers

    • Poster

    Conference

    CIKM'13

    Acceptance Rates

    AKBC '13 paper acceptance rate: 9 of 19 submissions (47%)
    Overall acceptance rate: 9 of 19 submissions (47%)


    Cited By

    • (2024) Few-Shot Relation Extraction With Dual Graph Neural Network Interaction. IEEE Transactions on Neural Networks and Learning Systems 35:10, pp. 14396-14408. DOI: 10.1109/TNNLS.2023.3278938. Online publication date: Oct-2024.
    • (2024) Triple-D: Denoising Distant Supervision for High-Quality Data Creation. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 1874-1887. DOI: 10.1109/ICDE60146.2024.00151. Online publication date: 13-May-2024.
    • (2023) Web Mining. Machine Learning for Data Science Handbook, pp. 447-467. DOI: 10.1007/978-3-031-24628-9_20. Online publication date: 26-Feb-2023.
    • (2022) An event-based automatic annotation method for datasets of interpersonal relation extraction. Applied Intelligence 53:3, pp. 2629-2639. DOI: 10.1007/s10489-022-03547-8. Online publication date: 11-May-2022.
    • (2021) A Survey on Event Extraction for Natural Language Understanding: Riding the Biomedical Literature Wave. IEEE Access 9, pp. 160721-160757. DOI: 10.1109/ACCESS.2021.3130956. Online publication date: 2021.
    • (2021) Distant Supervised Construction and Evaluation of a Novel Dataset of Emotion-Tagged Social Media Comments in Spanish. Cognitive Computation. DOI: 10.1007/s12559-020-09800-x. Online publication date: 18-Jan-2021.
    • (2021) Improving Open Information Extraction with Distant Supervision Learning. Neural Processing Letters. DOI: 10.1007/s11063-021-10548-0. Online publication date: 4-Jun-2021.
    • (2019) Using distant supervision to augment manually annotated data for relation extraction. PLOS ONE 14:7, e0216913. DOI: 10.1371/journal.pone.0216913. Online publication date: 30-Jul-2019.
    • (2018) Relation Extraction Using Distant Supervision. ACM Computing Surveys 51:5, pp. 1-35. DOI: 10.1145/3241741. Online publication date: 19-Nov-2018.
    • (2018) Exploring the Efficiency of Batch Active Learning for Human-in-the-Loop Relation Extraction. Companion Proceedings of the The Web Conference 2018, pp. 1131-1138. DOI: 10.1145/3184558.3191546. Online publication date: 23-Apr-2018.
