Abstract
Distant supervision is an active topic in relation extraction research. Instead of relying on annotated text, distant supervision uses a knowledge base as the source of supervision. For each pair of entities that appears in some relation of the knowledge base, this approach finds all sentences containing those entities in a large unlabeled corpus and extracts textual features to train a relation classifier. The automatic labeling provides a large amount of data, but the data have a serious problem: most features appear only a few times in the training data, and such sparse evidence makes these features very susceptible to noise, which leads to a flawed classifier. In this paper, we propose a method to improve the performance of few-occurrence features in distant supervision relation extraction. We present a novel model for calculating the similarity between a feature and an entity pair, and then adjust the entity pair's features according to their similarity. Experiments show that our method improves relation extraction performance.
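The labeling step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy knowledge base, the toy corpus, the helper names (`between_feature`, `distant_label`), and the choice of "text between the two entities" as the only feature are all assumptions made for illustration. The similarity model proposed in the paper is not specified in the abstract and is not reproduced here.

```python
from collections import Counter, defaultdict

# Toy knowledge base: (entity1, entity2) -> relation. Illustrative values only.
KB = {
    ("Barack Obama", "Hawaii"): "born_in",
    ("Steve Jobs", "Apple"): "founder_of",
}

# Toy unlabeled corpus (hypothetical sentences).
CORPUS = [
    "Barack Obama was born in Hawaii in 1961.",
    "Barack Obama visited Hawaii last winter.",
    "Steve Jobs co-founded Apple in 1976.",
]


def between_feature(sentence, e1, e2):
    """Return the text between e1 and e2, or None if they do not occur in that order."""
    i = sentence.find(e1)
    if i < 0:
        return None
    j = sentence.find(e2, i + len(e1))
    if j < 0:
        return None
    return sentence[i + len(e1):j].strip()


def distant_label(kb, corpus):
    """Attach the KB relation to every sentence that mentions both entities."""
    examples = defaultdict(list)
    for (e1, e2), relation in kb.items():
        for sentence in corpus:
            feat = between_feature(sentence, e1, e2)
            if feat is not None:
                examples[(e1, e2, relation)].append(feat)
    return dict(examples)


examples = distant_label(KB, CORPUS)

# Count how often each feature occurs across all automatically labeled examples.
feature_counts = Counter(f for feats in examples.values() for f in feats)

# Features seen fewer than twice: the sparse, noise-prone case the abstract targets.
rare = sorted(f for f, c in feature_counts.items() if c < 2)
```

In this toy run every feature occurs exactly once, and the feature "visited" is attached to the `born_in` relation even though the sentence does not express it. That combination of sparsity and wrong labels is the problem the abstract describes.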
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, H., Zhao, Y. (2013). Improving Few Occurrence Feature Performance in Distant Supervision for Relation Extraction. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_37
DOI: https://doi.org/10.1007/978-3-642-53917-6_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53916-9
Online ISBN: 978-3-642-53917-6
eBook Packages: Computer Science (R0)