Abstract
The problem of finding event announcement pages for any given website is called event source page discovery. In this paper, we present a policy-based deep reinforcement learning (RL) model for an event source page discovery agent. We train the agent in two stages: pre-training and fine-tuning. In the pre-training stage, the model is trained on limited labeled data, and each episode has a fixed number of steps. In the fine-tuning stage, the agent is trained on unlabeled data with a reward system based on an event source page classifier. The agent learns whether to continue or stop exploring through an adaptive threshold. The proposed agent achieves 74% precision at a unit cost of 1.28 (the average number of clicks for each event source page) on a real-world data set.
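The two reported figures can be illustrated with a minimal sketch. It assumes precision is the fraction of reported pages that are true event source pages, and that unit cost divides the total number of clicks by the number of correctly found event source pages; the abstract does not spell out the denominator, so this reading is an assumption. The `Episode` record and its field names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Episode:
    """Outcome of one crawl on one website (hypothetical record layout)."""
    clicks: int     # links the agent followed before stopping
    predicted: int  # pages the agent reported as event source pages
    correct: int    # reported pages that really are event source pages


def precision(episodes: Iterable[Episode]) -> float:
    """Fraction of reported pages that are true event source pages."""
    episodes = list(episodes)
    predicted = sum(e.predicted for e in episodes)
    correct = sum(e.correct for e in episodes)
    return correct / predicted if predicted else 0.0


def unit_cost(episodes: Iterable[Episode]) -> float:
    """Average number of clicks spent per correctly found page (assumed reading)."""
    episodes = list(episodes)
    clicks = sum(e.clicks for e in episodes)
    found = sum(e.correct for e in episodes)
    return clicks / found if found else float("inf")


# Toy data for illustration only; these are not results from the paper.
toy = [Episode(clicks=3, predicted=2, correct=2),
       Episode(clicks=2, predicted=2, correct=1),
       Episode(clicks=4, predicted=1, correct=1)]
print(precision(toy), unit_cost(toy))
```

Under this reading, a unit cost close to 1 means the agent reaches each event source page in roughly one click, so the two figures together describe both the accuracy and the crawling effort.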
This project is supported by the Ministry of Science and Technology, Taiwan, under grant MOST-109-2221-E-008-060-MY3.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chang, CH., Liao, YC., Yeh, T. (2022). Event Source Page Discovery via Policy-Based RL with Multi-task Neural Sequence Model. In: Chbeir, R., Huang, H., Silvestri, F., Manolopoulos, Y., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2022. WISE 2022. Lecture Notes in Computer Science, vol 13724. Springer, Cham. https://doi.org/10.1007/978-3-031-20891-1_42
DOI: https://doi.org/10.1007/978-3-031-20891-1_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20890-4
Online ISBN: 978-3-031-20891-1
eBook Packages: Computer Science, Computer Science (R0)