Abstract
One of the challenges in information extraction is the requirement of human-annotated examples, commonly called gold-standard examples. Many successful approaches alleviate this problem by employing some form of distant supervision, i.e., using knowledge bases such as Freebase as a source of supervision to create more examples. While this is perfectly reasonable, most distant supervision methods rely on a given set of propositions as a source of supervision. We propose a different approach: we infer weakly supervised examples for relations from statistical relational models learned using knowledge outside the natural language task. We argue that this deep distant supervision creates more robust examples that are particularly useful when learning the entire model (the structure and parameters). We demonstrate on several domains that this form of weak supervision yields superior results when learning structure compared to using distant supervision labels or a smaller set of labels.
Notes
- 1.
- 2. The ratio is actually a log-odds of their weights. We refer the reader to the book for more details [4].
- 3. With probabilistic training examples, it can be shown that minimizing the KL-divergence between the examples and the current model gives \((\text{true probability} - \text{predicted probability})\) as the gradient. This has the similar effect of pushing the predicted probabilities closer to the true probabilities.
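The gradient in note 3 can be checked directly. A minimal sketch (not from the chapter; function names are illustrative): if the model's predicted probability is \(\sigma(\psi)\) for a regression value \(\psi\), then the derivative of \(-\mathrm{KL}(p \,\|\, \sigma(\psi))\) with respect to \(\psi\) is exactly \(p - \sigma(\psi)\), i.e., true probability minus predicted probability. The finite-difference check below confirms this numerically.

```python
import math

def kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def functional_gradient(p, psi):
    """Gradient of -KL(p || sigmoid(psi)) w.r.t. psi: true minus predicted."""
    return p - sigmoid(psi)

# Numerical check: finite-difference derivative of -KL matches p - sigmoid(psi)
p, psi, h = 0.8, 0.3, 1e-6
numeric = -(kl(p, sigmoid(psi + h)) - kl(p, sigmoid(psi - h))) / (2 * h)
analytic = functional_gradient(p, psi)
assert abs(numeric - analytic) < 1e-6
```

With hard labels (p equal to 0 or 1) this reduces to the usual functional-gradient residual used in boosting; probabilistic examples simply shrink the residual toward the label's confidence.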
References
Bell, B., Koren, Y., Volinsky, C.: The BellKor solution to the Netflix Grand Prize (2009)
Craven, M., Kumlien, J.: Constructing biological knowledge bases by extracting information from text sources. In: ISMB (1999)
Devlin, S., Kudenko, D., Grzes, M.: An empirical study of potential-based reward shaping and advice in complex, multi-agent systems. Adv. Complex Syst. 14(2), 251–278 (2011)
Domingos, P., Lowd, D.: Markov Logic: An Interface Layer for AI. Morgan & Claypool, San Rafael (2009)
Hoffmann, R., Zhang, C., Ling, X., Zettlemoyer, L., Weld, D.S.: Knowledge-based weak supervision for information extraction of overlapping relations. In: ACL (2011)
Kersting, K., Driessens, K.: Non-parametric policy gradients: a unified treatment of propositional and relational domains. In: ICML (2008)
Khot, T., Natarajan, S., Kersting, K., Shavlik, J.: Learning Markov logic networks via functional gradient boosting. In: ICDM (2011)
Kim, J., Ohta, T., Pyysalo, S., Kano, Y., Tsujii, J.: Overview of BioNLP’09 shared task on event extraction. In: BioNLP Workshop Companion Volume for Shared Task (2009)
Kuhlmann, G., Stone, P., Mooney, R.J., Shavlik, J.W.: Guiding a reinforcement learner with natural language advice: initial results in robocup soccer. In: AAAI Workshop on Supervisory Control of Learning and Adaptive Systems (2004)
Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. In: ACL and AFNLP (2009)
Natarajan, S., Kersting, K., Khot, T., Shavlik, J.: Boosted Statistical Relational Learners: From Benchmarks to Data-Driven Medicine. SpringerBriefs in Computer Science. Springer, Heidelberg (2015)
Natarajan, S., Khot, T., Kersting, K., Guttmann, B., Shavlik, J.: Gradient-based boosting for statistical relational learning: the relational dependency network case. Mach. Learn. 86(1), 25–56 (2012)
Natarajan, S., Picado, J., Khot, T., Kersting, K., Re, C., Shavlik, J.: Effectively creating weakly labeled training examples via approximate domain knowledge. In: Davis, J., Ramon, J. (eds.) ILP 2014. LNCS, vol. 9046, pp. 92–107. Springer, Heidelberg (2015). doi:10.1007/978-3-319-23708-4_7
Neville, J., Jensen, D.: Relational dependency networks. In: Getoor, L., Taskar, B. (eds.) Introduction to Statistical Relational Learning, pp. 653–692. MIT Press, Cambridge (2007)
Niu, F., Ré, C., Doan, A., Shavlik, J.W.: Tuffy: scaling up statistical inference in Markov logic networks using an RDBMS. PVLDB 4(6), 373–384 (2011)
Poon, H., Vanderwende, L.: Joint inference for knowledge extraction from biomedical literature. In: NAACL (2010)
Raghavan, S., Mooney, R.: Online inference-rule learning from natural-language extractions. In: International Workshop on Statistical Relational AI (2013)
Riedel, S., Chun, H., Takagi, T., Tsujii, J.: A Markov logic approach to bio-molecular event extraction. In: BioNLP (2009)
Riedel, S., Yao, L., McCallum, A.: Modeling relations and their mentions without labeled text. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML PKDD 2010, Part III. LNCS, vol. 6323, pp. 148–163. Springer, Heidelberg (2010)
Sorower, S., Dietterich, T., Doppa, J., Orr, W., Tadepalli, P., Fern, X.: Inverting Grice’s maxims to learn rules from natural language extractions. In: NIPS, pp. 1053–1061 (2011)
Surdeanu, M., Ciaramita, M.: Robust information extraction with perceptrons. In: NIST ACE (2007)
Surdeanu, M., Tibshirani, J., Nallapati, R., Manning, C.: Multi-instance multi-label learning for relation extraction. In: EMNLP-CoNLL (2012)
Takamatsu, S., Sato, I., Nakagawa, H.: Reducing wrong labels in distant supervision for relation extraction. In: ACL (2012)
Torrey, L., Shavlik, J., Walker, T., Maclin, R.: Transfer learning via advice taking. In: Koronacki, J., Raś, Z.W., Wierzchoń, S.T., Kacprzyk, J. (eds.) Advances in Machine Learning I. SCI, vol. 262, pp. 147–170. Springer, Heidelberg (2010)
Verhagen, M., Gaizauskas, R., Schilder, F., Hepple, M., Katz, G., Pustejovsky, J.: SemEval-2007 task 15: TempEval temporal relation identification. In: SemEval (2007)
Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: CVPR (2001)
Yoshikawa, K., Riedel, S., Asahara, M., Matsumoto, Y.: Jointly identifying temporal relations with Markov logic. In: ACL and AFNLP (2009)
Zhou, G., Su, J., Zhang, J., Zhang, M.: Exploring various knowledge in relation extraction. In: ACL (2005)
Acknowledgements
Sriraam Natarajan, Anurag Wazalwar and Dileep Viswanathan gratefully acknowledge the support of the DARPA Machine Reading Program and DEFT Program under the Air Force Research Laboratory (AFRL) prime contract nos. FA8750-09-C-0181 and FA8750-13-2-0039, respectively. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA, AFRL, or the US government. Kristian Kersting was supported by the Fraunhofer ATTRACT fellowship STREAM and by the European Commission under contract number FP7-248258-First-MM.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this chapter
Natarajan, S., Soni, A., Wazalwar, A., Viswanathan, D., Kersting, K. (2016). Deep Distant Supervision: Learning Statistical Relational Models for Weak Supervision in Natural Language Extraction. In: Michaelis, S., Piatkowski, N., Stolpe, M. (eds) Solving Large Scale Learning Tasks. Challenges and Algorithms. Lecture Notes in Computer Science(), vol 9580. Springer, Cham. https://doi.org/10.1007/978-3-319-41706-6_18
DOI: https://doi.org/10.1007/978-3-319-41706-6_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41705-9
Online ISBN: 978-3-319-41706-6