Abstract
Supervised machine learning methods are widely used in relation extraction to identify the relation between two named entities in a sentence. However, they have two disadvantages: constructing the training data set is costly and time-consuming, and the resulting system depends on the specific domain of the training data. To overcome these disadvantages, we propose a two-step relation extraction model with distant supervision. The two-step model consists of a one-class model and a multi-class model: the one-class model selects positive sentences from the input sentences, and the multi-class model classifies the positive sentences into specific classes. In the experiments, the proposed model achieved F1-measures of 62.9% on the auto-labeled test data and 63.8% on the gold-labeled test data, although it does not use any human-labeled training data.
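The two-step idea can be illustrated with a minimal sketch. The abstract does not specify the features or classifiers, so this is not the authors' implementation: both steps are replaced here by simple bag-of-words centroid similarity, and the knowledge base `KB`, the `threshold` value, and all function and class names are hypothetical choices made for illustration only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical knowledge base: (entity1, entity2) -> relation.
KB = {("Obama", "Hawaii"): "birthPlace", ("Jobs", "Apple"): "founder"}

def auto_label(sentences, kb):
    """Distant supervision: a sentence that mentions both entities of a
    KB pair is automatically labeled with that pair's relation."""
    labeled = []
    for sent in sentences:
        tokens = sent.split()
        for (e1, e2), rel in kb.items():
            if e1 in tokens and e2 in tokens:
                labeled.append((sent, rel))
    return labeled

def vectorize(sentence):
    """Lowercased bag-of-words counts (an assumed, very simple feature set)."""
    return Counter(sentence.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

class TwoStepExtractor:
    """Step 1 (stand-in for a one-class model): accept a sentence only if it
    is similar enough to the pooled positive examples.
    Step 2 (stand-in for a multi-class model): assign the nearest relation."""

    def __init__(self, threshold=0.1):  # threshold is an assumed value
        self.threshold = threshold

    def fit(self, labeled):
        by_rel = defaultdict(list)
        for sent, rel in labeled:
            by_rel[rel].append(vectorize(sent))
        self.class_centroids = {r: centroid(vs) for r, vs in by_rel.items()}
        self.positive_centroid = centroid(
            [v for vs in by_rel.values() for v in vs]
        )
        return self

    def predict(self, sentence):
        v = vectorize(sentence)
        # Step 1: filter out sentences that do not look like positives.
        if cosine(v, self.positive_centroid) < self.threshold:
            return None
        # Step 2: pick the most similar relation class.
        return max(
            self.class_centroids,
            key=lambda r: cosine(v, self.class_centroids[r]),
        )

corpus = [
    "Obama was born in Hawaii",
    "Jobs founded Apple with Wozniak",
    "The weather is nice today",
]
training = auto_label(corpus, KB)  # no human labels are used
model = TwoStepExtractor().fit(training)
print(model.predict("Einstein was born in Ulm"))
print(model.predict("The weather is nice today"))
```

Separating the filtering step from the classification step means the second model only sees sentences judged likely to express some relation, which is the division of labor the abstract describes. A real system would use richer features and trained classifiers in place of the centroid stand-ins.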
Acknowledgments
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (2013R1A1A4A01005074). This research was also supported by LG Electronics.
Cite this article
Choi, M., Lee, Hg. & Kim, H. Relation extraction based on two-step classification with distant supervision. J Supercomput 72, 2609–2622 (2016). https://doi.org/10.1007/s11227-015-1535-4
DOI: https://doi.org/10.1007/s11227-015-1535-4