A continuous learning method for recognizing named entities by integrating domain contextual relevance measurement and Web farming mode of Web intelligence

World Wide Web

Abstract

Web farming can turn computational social science into a never-ending learning process, in which social phenomena are understood dynamically and scientifically from data that are continuously produced, updated and expired in the connected hyper world. Named entity recognition is a core task of Web farming. However, existing named entity recognition methods depend mainly on complete, high-quality and well-labelled data sets, and therefore cannot meet the requirements of real-world applications. This paper proposes a continuous learning method for recognizing named entities that introduces the Web farming mode of Web Intelligence into the recognition process. In the on-line stage, the domain contextual relevance of candidate entities is calculated using the domain discrimination degree and the domain dependence function, and is used to recognize the target entities. In the off-line stage, an active learning approach continuously improves the target corpus set by combining density-based clustering with semantic distance measurement. Experimental results show that the proposed method effectively improves the accuracy of entity recognition and is better suited to real-world applications.
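To make the on-line stage more concrete, the following is a minimal sketch of scoring candidate entities by how strongly they are tied to a domain corpus. The abstract does not give the definitions of the domain discrimination degree or the domain dependence function, so the score below is only an illustrative stand-in (a smoothed frequency ratio between a domain corpus and a background corpus). The function names, the smoothing and the threshold are all assumptions, not the authors' formulas.

```python
from collections import Counter


def relative_freq(term, counts, total):
    """Smoothed relative frequency of a term (add-one smoothing, an assumption)."""
    return (counts.get(term, 0) + 1) / (total + len(counts) + 1)


def discrimination(term, domain_counts, background_counts):
    """Hypothetical stand-in score in (0, 1), NOT the paper's definition.

    It is the share of the term's smoothed relative frequency that
    comes from the domain corpus rather than the background corpus.
    """
    d = relative_freq(term, domain_counts, sum(domain_counts.values()))
    b = relative_freq(term, background_counts, sum(background_counts.values()))
    return d / (d + b)


def rank_candidates(candidates, domain_tokens, background_tokens, threshold=0.5):
    """Keep candidates whose score exceeds the threshold, best first."""
    domain_counts = Counter(domain_tokens)
    background_counts = Counter(background_tokens)
    scored = [
        (c, discrimination(c, domain_counts, background_counts))
        for c in candidates
    ]
    kept = [(c, s) for c, s in scored if s > threshold]
    return sorted(kept, key=lambda pair: -pair[1])


if __name__ == "__main__":
    # Toy corpora, invented purely for illustration.
    domain = ["hippocampus", "fmri", "the", "cortex", "fmri"]
    background = ["the", "a", "the", "market", "the"]
    for term, score in rank_candidates(["fmri", "the", "market"], domain, background):
        print(term, round(score, 3))
```

In this toy setting, a term that is frequent in the domain corpus and absent from the background corpus scores above the threshold, while common function words and out-of-domain terms are filtered out. A real system would replace the stand-in score with the measures defined in the full paper.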

Acknowledgments

This work was supported by the Science and Technology Project of Beijing Municipal Commission of Education (No. KM201710005026), the National Basic Research Program of China (No. 2014CB744600), the Beijing Natural Science Foundation (No. 4182005), and the Beijing Key Laboratory of MRI and Brain Informatics.

Author information

Corresponding author

Correspondence to Jianhui Chen.

Additional information

This article belongs to the Topical Collection: Computational Social Science as the Ultimate Web Intelligence

Guest Editors: Xiaohui Tao, Juan D. Velasquez, Jiming Liu, and Ning Zhong

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lin, S., Gao, J., Zhang, S. et al. A continuous learning method for recognizing named entities by integrating domain contextual relevance measurement and Web farming mode of Web intelligence. World Wide Web 23, 1769–1790 (2020). https://doi.org/10.1007/s11280-019-00758-x

  • DOI: https://doi.org/10.1007/s11280-019-00758-x
