Event Source Page Discovery via Policy-Based RL with Multi-task Neural Sequence Model

Chang, Chia-Hui; Liao, Yu-Ching; Yeh, Ting

doi:10.1007/978-3-031-20891-1_42

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13724)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

The problem of finding event announcement pages for any given website is called event source page discovery. In this paper, we present a policy-based deep reinforcement learning (RL) model for an event source page discovery agent. We train the agent in two stages: pre-training and fine-tuning. In the pre-training phase, the model is trained with limited labeled data, where each episode has a fixed number of steps. In the fine-tuning phase, the agent is trained using unlabeled data and a reward system based on an event source page classifier. The agent learns whether to continue or stop exploring through an adaptive threshold. The proposed agent achieves 74% precision with a unit cost of 1.28 (the average number of clicks for each event source page) on a real-world data set.
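The two figures reported in the abstract can be illustrated with a minimal sketch. It assumes that precision is the fraction of reported pages that are true event source pages, and that unit cost divides the total number of clicks by the number of correctly found event source pages; the abstract does not state the exact denominator, so this is an assumption. The `Episode` record, its fields, and the sample numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical per-website crawl summary (not from the paper)."""
    clicks: int     # links the agent followed on this website
    predicted: int  # pages the agent reported as event source pages
    correct: int    # reported pages that really are event source pages


def precision(episodes):
    """Fraction of reported pages that are true event source pages."""
    predicted = sum(e.predicted for e in episodes)
    correct = sum(e.correct for e in episodes)
    return correct / predicted if predicted else 0.0


def unit_cost(episodes):
    """Average clicks per correctly found event source page (assumed definition)."""
    clicks = sum(e.clicks for e in episodes)
    correct = sum(e.correct for e in episodes)
    return clicks / correct if correct else float("inf")


# Made-up example data for three websites.
episodes = [Episode(5, 4, 3), Episode(3, 2, 2), Episode(4, 4, 3)]
print(f"precision = {precision(episodes):.2f}")
print(f"unit cost = {unit_cost(episodes):.2f}")
```

Under this reading, a lower unit cost means the agent spends fewer clicks for each event source page it finds, so the two metrics together capture both accuracy and crawling effort.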

This project is supported by the Ministry of Science and Technology, Taiwan, under grant MOST-109-2221-E-008-060-MY3.

Author information

Authors and Affiliations

National Central University, Taoyuan, Taiwan
Chia-Hui Chang, Yu-Ching Liao & Ting Yeh

Corresponding author

Correspondence to Chia-Hui Chang.

Editor information

Editors and Affiliations

University of Pau and Pays de l'Adour, Anglet, France
Richard Chbeir
The University of Queensland, Brisbane, QLD, Australia
Helen Huang
Sapienza Università di Roma, Rome, Italy
Fabrizio Silvestri
Open University of Cyprus, Nicosia, Cyprus
Yannis Manolopoulos
The New Cyber Research Department, Peng Cheng Laboratory, Shenzhen, China
Yanchun Zhang

About this paper

Cite this paper

Chang, CH., Liao, YC., Yeh, T. (2022). Event Source Page Discovery via Policy-Based RL with Multi-task Neural Sequence Model. In: Chbeir, R., Huang, H., Silvestri, F., Manolopoulos, Y., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2022. WISE 2022. Lecture Notes in Computer Science, vol 13724. Springer, Cham. https://doi.org/10.1007/978-3-031-20891-1_42

DOI: https://doi.org/10.1007/978-3-031-20891-1_42
Published: 07 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20890-4
Online ISBN: 978-3-031-20891-1
eBook Packages: Computer Science (R0)
