Abstract
A dominant problem in Named Entity Recognition (NER) is that a system trained on one domain frequently suffers a substantial drop in performance when applied to a different domain. In this paper, we apply active learning strategies to domain adaptation for NER systems and show that adaptive learning, which combines the source and target domains, is more effective than non-adaptive learning directly from the target domain. Active learning aims to minimize labeling effort by selecting the most informative instances to label. We investigate several sample selection techniques, such as Maximum Entropy and Smallest Margin, and apply them to the ACE corpus. Our results show that the labeling cost can be reduced by over 92% without degrading performance.
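The abstract names the two selection criteria without defining them. As a rough illustration only, the sketch below uses their common textbook forms: entropy of the predicted label distribution (pick the highest) and the gap between the two most probable labels (pick the smallest). The function names, the toy pool, and the per-instance scoring are assumptions made for this example, not the paper's implementation, which may score whole sequences or combine criteria differently.

```python
import math


def entropy(probs):
    """Shannon entropy of a label distribution; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def margin(probs):
    """Gap between the two most probable labels; smaller means more uncertain."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_batch(pool, k, strategy="entropy"):
    """Return the ids of the k most informative unlabeled instances.

    pool maps an instance id to the model's predicted label distribution.
    (Hypothetical helper for illustration.)
    """
    if strategy == "entropy":
        ranked = sorted(pool, key=lambda i: entropy(pool[i]), reverse=True)
    elif strategy == "margin":
        ranked = sorted(pool, key=lambda i: margin(pool[i]))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return ranked[:k]


# Toy pool: made-up predicted distributions over three entity labels.
pool = {
    "a": [0.98, 0.01, 0.01],  # confident prediction
    "b": [0.40, 0.35, 0.25],  # probability spread over all labels
    "c": [0.50, 0.49, 0.01],  # near tie between two labels
}

print(select_batch(pool, 1, "entropy"))
print(select_batch(pool, 1, "margin"))
```

On this toy pool the two criteria disagree: entropy favors the instance whose probability mass is spread over all labels, while the margin criterion favors the near tie between the top two labels.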
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Sun, H., Grishman, R., Wang, Y. (2016). Domain Adaptation with Active Learning for Named Entity Recognition. In: Sun, X., Liu, A., Chao, HC., Bertino, E. (eds) Cloud Computing and Security. ICCCS 2016. Lecture Notes in Computer Science, vol 10040. Springer, Cham. https://doi.org/10.1007/978-3-319-48674-1_54
DOI: https://doi.org/10.1007/978-3-319-48674-1_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48673-4
Online ISBN: 978-3-319-48674-1
eBook Packages: Computer Science (R0)