Abstract
Web farming can advance computational social science into a never-ending learning process, in which social phenomena are understood dynamically and scientifically from data that are continuously produced, updated and expired in the connected hyper world. Named entity recognition is a basic, core task of Web farming. However, existing named entity recognition methods depend mainly on complete, high-quality, well-labelled data sets and therefore cannot meet the requirements of real-world applications. This paper proposes a continuous learning method for recognizing named entities that introduces the Web farming mode of Web Intelligence into the recognition process. In the on-line stage, target entities are recognized by calculating the domain contextual relevance of candidate entities from a domain discrimination degree and a domain dependence function. In the off-line stage, an active learning approach continuously improves the target corpus set by combining density-based clustering with semantic distance measurement. Experimental results show that the proposed method effectively improves the accuracy of entity recognition and is better suited to real-world applications.
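The abstract does not spell out how the off-line stage combines density-based clustering with semantic distance. One common pattern in active learning is to cluster the unlabelled candidates and send one representative per dense cluster to an annotator. The Python sketch below illustrates that pattern under stated assumptions: candidate entities are already represented as embedding vectors (for example, word2vec vectors), semantic distance is taken to be cosine distance, the clustering is plain DBSCAN, and the function names, the `eps` and `min_pts` values and the "densest member" selection rule are hypothetical choices, not the authors' algorithm.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Semantic distance taken as 1 - cosine similarity (an assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat zero vectors as unrelated
    return 1.0 - dot / (norm_u * norm_v)


def dbscan(vectors: List[Vector], eps: float, min_pts: int) -> Tuple[List[int], List[List[int]]]:
    """Plain DBSCAN under cosine distance.

    Returns one cluster id per point (-1 = noise) and each point's
    eps-neighbourhood (the point itself included).
    """
    n = len(vectors)
    neighbours = [
        [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    UNSEEN, NOISE = -2, -1
    labels = [UNSEEN] * n
    cluster = -1
    for i in range(n):
        if labels[i] != UNSEEN:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but does not expand it
                continue
            if labels[j] != UNSEEN:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # core point: keep expanding
                queue.extend(neighbours[j])
    return labels, neighbours


def select_queries(candidates: Dict[str, Vector], eps: float = 0.3, min_pts: int = 2) -> List[str]:
    """Pick one representative candidate per dense cluster for annotation.

    Hypothetical selection rule: the member with the largest eps-neighbourhood
    (the densest point) represents its cluster; noise points are skipped.
    """
    names = list(candidates)
    labels, neighbours = dbscan([candidates[name] for name in names], eps, min_pts)
    best: Dict[int, Tuple[int, int]] = {}  # cluster id -> (density, index)
    for idx, label in enumerate(labels):
        if label < 0:
            continue
        density = len(neighbours[idx])
        if label not in best or density > best[label][0]:
            best[label] = (density, idx)
    return [names[idx] for _, (_, idx) in sorted(best.items())]


if __name__ == "__main__":
    # Toy 2-D vectors standing in for word embeddings (illustrative only).
    toy = {
        "hippocampus": [1.0, 0.1],
        "amygdala": [0.95, 0.15],
        "thalamus": [0.9, 0.2],
        "fMRI": [0.1, 1.0],
        "EEG": [0.15, 0.95],
        "banana": [-1.0, 0.0],
    }
    print(select_queries(toy))  # prints ['hippocampus', 'fMRI']
```

In this sketch, isolated candidates (DBSCAN noise, such as `banana` in the toy data) are never queried; whether the paper handles such outliers differently is not stated in the abstract.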
Acknowledgments
This work is supported by the Science and Technology Project of the Beijing Municipal Commission of Education (No. KM201710005026), the National Basic Research Program of China (No. 2014CB744600), the Beijing Natural Science Foundation (No. 4182005), and the Beijing Key Laboratory of MRI and Brain Informatics.
Additional information
This article belongs to the Topical Collection: Computational Social Science as the Ultimate Web Intelligence
Guest Editors: Xiaohui Tao, Juan D. Velasquez, Jiming Liu, and Ning Zhong
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Lin, S., Gao, J., Zhang, S. et al. A continuous learning method for recognizing named entities by integrating domain contextual relevance measurement and Web farming mode of Web intelligence. World Wide Web 23, 1769–1790 (2020). https://doi.org/10.1007/s11280-019-00758-x
DOI: https://doi.org/10.1007/s11280-019-00758-x