Abstract
Events are important activities that people take part in in real life, and information about events helps people understand and keep abreast of key developments in important social and individual subjects. In the big data era, event detection methods can help people extract specific information from massive amounts of Web data quickly and efficiently. However, existing methods usually feed the content of an entire Web page into the model, so the abundant noise and irrelevant information on Web pages seriously degrade their event detection performance. Existing methods also mostly use static models, which fail to account for the dynamics of information on the Web. To improve the performance of event detection and classification, we propose a new method that partitions Web pages into multiple text blocks and uses a Bi-LSTM with an attention mechanism for fine-grained event detection from Chinese Web pages. We also propose a dynamic method that updates both the data and the model regularly and incrementally, making the model more adaptive to ongoing changes in Web page data. The experimental results show that our model outperforms existing methods in detection performance, computational overhead, and the ability to deal with evolving Web page information.
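As a rough illustration of two ideas named in the abstract, the sketch below splits a page into text blocks and applies attention pooling over a sequence of hidden-state vectors. It is not the authors' implementation: the abstract does not say how pages are partitioned or how attention is scored, so the tag set `BLOCK_TAGS`, the helper names `split_text_blocks` and `attention_pool`, and the dot-product scoring against a `context` vector are all assumptions made here. The hidden states would normally come from a Bi-LSTM; here they are plain lists of floats.

```python
import math
from html.parser import HTMLParser

# Tags treated as block boundaries (an assumption for this sketch).
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "article", "section", "br"}
# Tags whose content is never visible text.
SKIP_TAGS = {"script", "style"}


class BlockSplitter(HTMLParser):
    """Collects visible text and cuts it into blocks at block-level tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = []
        self._skip_depth = 0

    def flush(self):
        # Join buffered text, normalize whitespace, keep non-empty blocks.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.blocks.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buffer.append(data)


def split_text_blocks(html):
    """Return the visible text of an HTML string as a list of blocks."""
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text after the last tag
    return parser.blocks


def attention_pool(states, context):
    """Dot-product attention: softmax weights and the weighted sum of states."""
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context)) for h in states]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]
    return weights, pooled
```

In a pipeline of this shape, each text block would be encoded and classified separately, so that navigation bars or advertisements in one block do not dilute the signal from the block that actually describes an event.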
This research is supported by the Natural Science Foundation of China (No. 62172372), Zhejiang Provincial Natural Science Foundation (No. LZ21F030001) and Zhejiang Lab (No. 2022KG0AN01).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, Y. et al. (2022). Event Detection from Web Data in Chinese Based on Bi-LSTM with Attention. In: Chen, W., Yao, L., Cai, T., Pan, S., Shen, T., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2022. Lecture Notes in Computer Science, vol 13725. Springer, Cham. https://doi.org/10.1007/978-3-031-22064-7_8
DOI: https://doi.org/10.1007/978-3-031-22064-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22063-0
Online ISBN: 978-3-031-22064-7
eBook Packages: Computer Science, Computer Science (R0)