
Transfer learning for fine-grained entity typing

Regular Paper · Knowledge and Information Systems

Abstract

Fine-grained entity typing (FGET) is the task of classifying entity mentions into hierarchical fine-grained semantic types. Existing FGET approaches face two main issues. First, training corpora for FGET are typically labeled automatically, which inevitably introduces noise. Existing approaches either tweak noisy labels directly via heuristics or algorithmically retreat to parent types, both of which yield coarse-grained type labels instead of fine-grained ones. Second, existing approaches usually use recurrent neural networks to generate feature representations of mention phrases and their contexts, which perform relatively poorly on long contexts and out-of-vocabulary (OOV) words. In this paper, we propose a transfer learning-based approach to extract more effective feature representations and offset label noise. More precisely, we adopt three transfer learning schemes: (i) transferring sub-word embeddings to generate more effective embeddings for OOV words; (ii) using a pre-trained language model to generate more effective context features; and (iii) using a pre-trained topic model to transfer topic-type relatedness through topic anchors and to select among confusing fine-grained types at inference time. The pre-trained topic model can offset label noise without retreating to coarse-grained types. The experimental results demonstrate the effectiveness of our transfer learning approach for FGET.
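Scheme (i) follows the spirit of subword-aware embeddings such as fastText: an OOV word can still be embedded by composing the vectors of its character n-grams. The sketch below is a generic, minimal illustration of that idea, not the paper's exact pipeline; the table size, dimensionality, and function names are assumptions.

```python
# A generic, minimal sketch of fastText-style subword composition for
# out-of-vocabulary (OOV) words: an OOV word vector is the average of
# its character n-gram vectors. Sizes and names here are illustrative
# assumptions, not the paper's actual configuration.
import numpy as np

DIM = 100              # embedding dimensionality (assumed; fastText uses 300)
NUM_BUCKETS = 100_000  # hash buckets for n-gram vectors (fastText uses 2M)

rng = np.random.default_rng(0)
# Stand-in for a pre-trained n-gram table; in a real system this would be
# transferred from a model such as fastText, not randomly initialized.
ngram_table = rng.normal(scale=0.1, size=(NUM_BUCKETS, DIM)).astype("float32")

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of the word wrapped in boundary markers < and >."""
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def embed_oov(word):
    """Compose an OOV word vector as the mean of its n-gram vectors."""
    # Python's built-in hash stands in for fastText's FNV hashing.
    idx = [hash(g) % NUM_BUCKETS for g in char_ngrams(word)]
    return ngram_table[idx].mean(axis=0)

print(embed_oov("mentionless").shape)  # (100,) even for an unseen word
```

Because every character n-gram of an unseen word is likely to have been observed during pre-training, the composed vector lands near morphologically related in-vocabulary words, which is what makes the transfer useful for OOV mentions.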


Notes

  1. In this paper, we use capitalized italics to denote level-1 type labels (e.g., /Person) and non-capitalized italics to denote level-2 and level-3 types (e.g., /Person/artist). The symbol “/” represents the hierarchical relationship.

  2. https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits (a usage sketch is given after these notes).

  3. Some other datasets are not publicly available.

  4. https://github.com/shimaokasonse/NFGEC.

  5. https://www.tensorflow.org/guide/estimators.
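Note 2 points to the loss function used for training. Since a mention can carry multiple type labels along the hierarchy, typing is naturally cast as multi-label classification with an independent sigmoid per type. The following is a minimal sketch of that loss using the TensorFlow API referenced above; the type-set size and tensor contents are illustrative assumptions.

```python
# Minimal sketch of the multi-label typing loss referenced in note 2:
# each fine-grained type is an independent binary decision, so training
# uses sigmoid cross-entropy over the type vocabulary. The type-set size
# and tensors below are illustrative assumptions.
import tensorflow as tf

NUM_TYPES = 89  # e.g., the OntoNotes type-set size (assumed here)

# One mention whose gold types are, say, /Person and /Person/artist,
# encoded as a multi-hot vector over the type vocabulary.
labels = tf.constant([[1.0, 1.0] + [0.0] * (NUM_TYPES - 2)])

# Unnormalized type scores from the typing model (random stand-in).
logits = tf.random.normal([1, NUM_TYPES])

# Element-wise binary cross-entropy computed from raw logits, then averaged.
per_type = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
loss = tf.reduce_mean(per_type)
print(float(loss))
```

At inference time, the types whose sigmoid probability exceeds a threshold (commonly 0.5) would be predicted, with hierarchical consistency of the predicted set handled as a separate step.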


Acknowledgements

We thank the anonymous reviewers for their thoughtful comments and suggestions.

Author information

Corresponding author

Correspondence to Ruili Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hou, F., Wang, R. & Zhou, Y. Transfer learning for fine-grained entity typing. Knowl Inf Syst 63, 845–866 (2021). https://doi.org/10.1007/s10115-021-01549-5
