Domain Adaptation with Active Learning for Named Entity Recognition

  • Conference paper
Cloud Computing and Security (ICCCS 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10040)

Abstract

A dominant problem in named entity recognition (NER) is that a system trained on one domain often suffers a substantial drop in performance when applied to a different domain. In this paper, we apply active learning strategies to domain adaptation for NER systems and show that adaptive learning, which combines the source and target domains, is more effective than non-adaptive learning directly from the target domain. Active learning aims to minimize labeling effort by selecting the most informative instances to label. We investigate several sample selection techniques, such as Maximum Entropy and Smallest Margin, and apply them to the ACE corpus. Our results show that the labeling cost can be reduced by over 92% without degrading performance.
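
The two selection criteria named in the abstract can be illustrated with a minimal sketch. It assumes the tagger outputs a probability distribution over labels for each unlabeled instance; the function names, the toy `pool`, and the per-instance scoring are hypothetical and are not taken from the paper, which may score whole sequences differently.

```python
import math


def entropy(probs):
    """Shannon entropy of a label distribution; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def margin(probs):
    """Gap between the two most probable labels; smaller means more uncertain."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def select_batch(pool, k, strategy="entropy"):
    """Return indices of the k most informative instances in `pool`.

    `pool` is a list of label distributions (hypothetical model output).
    """
    indices = range(len(pool))
    if strategy == "entropy":
        # Maximum Entropy: prefer the highest-entropy predictions.
        ranked = sorted(indices, key=lambda i: entropy(pool[i]), reverse=True)
    elif strategy == "margin":
        # Smallest Margin: prefer the closest top-two label probabilities.
        ranked = sorted(indices, key=lambda i: margin(pool[i]))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return ranked[:k]


# Toy pool of three instances, each with a distribution over three labels.
pool = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # spread across all labels
    [0.50, 0.49, 0.01],  # near tie between two labels
]

print(select_batch(pool, 1, "entropy"))
print(select_batch(pool, 1, "margin"))
```

On this toy pool the two criteria disagree: entropy favors the instance whose probability mass is spread over all labels, while margin favors the near tie between the top two labels. In an active learning loop, the selected instances would be sent to an annotator and added to the training set before retraining.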

Author information

Corresponding author

Correspondence to Huiyu Sun.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Sun, H., Grishman, R., Wang, Y. (2016). Domain Adaptation with Active Learning for Named Entity Recognition. In: Sun, X., Liu, A., Chao, HC., Bertino, E. (eds) Cloud Computing and Security. ICCCS 2016. Lecture Notes in Computer Science, vol 10040. Springer, Cham. https://doi.org/10.1007/978-3-319-48674-1_54

  • DOI: https://doi.org/10.1007/978-3-319-48674-1_54

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48673-4

  • Online ISBN: 978-3-319-48674-1

  • eBook Packages: Computer Science (R0)
