Abstract
Distant supervision (DS) aligns relations between named entities in a knowledge base (KB) with free text, automatically annotating a training corpus with relation mentions. A major challenge of DS is that the heuristically generated relation labels tend to be noisy, particularly when a pair of entities has multiple relations in the KB or when the KB's relations for that pair are incomplete. This paper proposes two ranking-based methods to reduce noise and select effective training data for multi-instance multi-label learning (MIML), one of the most popular learning paradigms for distantly supervised relation extraction. With the proposed methods, low-quality training groups are excluded from the training data according to different ranking strategies. Experimental evaluation on the KBP dataset using state-of-the-art MIML algorithms showed that the proposed methods improved performance significantly.
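The general idea of group selection can be illustrated with a minimal sketch. It assumes a toy KB and corpus, naive string matching for entity alignment, and a placeholder scoring function; none of these come from the paper, and the abstract does not specify the two ranking strategies. The names `build_groups`, `select_groups`, and `mentions_per_label` are hypothetical.

```python
from collections import defaultdict

# Toy knowledge base (illustrative only): entity pair -> set of relation labels.
KB = {
    ("Barack Obama", "Hawaii"): {"born_in"},
    ("Barack Obama", "United States"): {"president_of", "born_in"},
}

# Toy corpus (illustrative only).
SENTENCES = [
    "Barack Obama was born in Hawaii .",
    "Barack Obama visited Hawaii last week .",  # noisy: does not express born_in
    "Barack Obama was elected president of the United States .",
]


def build_groups(kb, sentences):
    """Distant supervision heuristic: every sentence that mentions both
    entities of a KB pair joins that pair's group and inherits its labels."""
    mentions = defaultdict(list)
    for pair in kb:
        for sentence in sentences:
            if pair[0] in sentence and pair[1] in sentence:
                mentions[pair].append(sentence)
    return {p: {"mentions": m, "labels": kb[p]} for p, m in mentions.items()}


def select_groups(groups, score_fn, keep_ratio):
    """Rank groups by score_fn (higher is better) and keep the top fraction."""
    ranked = sorted(groups, key=lambda p: score_fn(groups[p]), reverse=True)
    n_keep = max(1, int(len(ranked) * keep_ratio))
    return {p: groups[p] for p in ranked[:n_keep]}


def mentions_per_label(group):
    """Placeholder quality score (an assumption, not the paper's strategy):
    number of mentions available per relation label in the group."""
    return len(group["mentions"]) / len(group["labels"])


groups = build_groups(KB, SENTENCES)
selected = select_groups(groups, mentions_per_label, keep_ratio=0.5)
```

The second sentence shows where the noise comes from: it is labeled `born_in` only because both entities co-occur in it. A real system would replace `mentions_per_label` with a ranking strategy suited to the data and pass the selected groups to an MIML learner.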
Acknowledgement
This work was supported in part by the National 863 Program of China (2015AA015405) and the National Natural Science Foundation of China (NSFC) (61402128, 61473101, 61173075, and 61272383).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Xiang, Y., Wang, X., Zhang, Y., Qin, Y., Fan, S. (2015). Distant Supervision for Relation Extraction via Group Selection. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol 9490. Springer, Cham. https://doi.org/10.1007/978-3-319-26535-3_29
DOI: https://doi.org/10.1007/978-3-319-26535-3_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26534-6
Online ISBN: 978-3-319-26535-3
eBook Packages: Computer Science, Computer Science (R0)