Event Source Page Discovery via Policy-Based RL with Multi-task Neural Sequence Model

  • Conference paper
Web Information Systems Engineering – WISE 2022 (WISE 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13724))

Abstract

The problem of finding event announcement pages for any given website is called event source page discovery. In this paper, we present a policy-based deep reinforcement learning (RL) model for an event source page discovery agent. We train the agent in two stages: pre-training and fine-tuning. In the pre-training phase, the model is trained with limited labeled data, where each episode has a fixed number of steps. In the fine-tuning phase, the agent is trained using unlabeled data and a reward system based on an event source page classifier. The agent learns whether to continue or stop exploring through an adaptive threshold. The proposed agent achieves 74% precision with a unit cost of 1.28 (the average number of clicks for each event source page) on a real-world data set.
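The two reported metrics can be illustrated with a minimal sketch. The abstract defines unit cost only as the average number of clicks per event source page, so this sketch assumes it is total clicks divided by the number of correctly discovered pages; that reading, the `Episode` record, and the sample numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One crawl of a single website (hypothetical record layout)."""
    clicks: int     # links the agent followed on this site
    predicted: int  # pages the agent reported as event source pages
    correct: int    # reported pages that really are event source pages


def precision(episodes):
    """Fraction of reported pages that are true event source pages."""
    predicted = sum(e.predicted for e in episodes)
    correct = sum(e.correct for e in episodes)
    return correct / predicted if predicted else 0.0


def unit_cost(episodes):
    """Average clicks spent per correctly discovered page (assumed definition)."""
    clicks = sum(e.clicks for e in episodes)
    correct = sum(e.correct for e in episodes)
    return clicks / correct if correct else float("inf")


# Made-up example data, for illustration only.
episodes = [Episode(5, 4, 3), Episode(3, 2, 2), Episode(4, 4, 3)]
print(f"precision = {precision(episodes):.2f}")  # 8 correct out of 10 reported
print(f"unit cost = {unit_cost(episodes):.2f}")  # 12 clicks over 8 correct pages
```

Under this reading, a lower unit cost means the agent spends fewer clicks per page it finds, so the two numbers together describe accuracy and crawling effort.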

This project is supported by the Ministry of Science and Technology, Taiwan, under grant MOST-109-2221-E-008-060-MY3.

Corresponding author

Correspondence to Chia-Hui Chang.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chang, CH., Liao, YC., Yeh, T. (2022). Event Source Page Discovery via Policy-Based RL with Multi-task Neural Sequence Model. In: Chbeir, R., Huang, H., Silvestri, F., Manolopoulos, Y., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2022. WISE 2022. Lecture Notes in Computer Science, vol 13724. Springer, Cham. https://doi.org/10.1007/978-3-031-20891-1_42

  • DOI: https://doi.org/10.1007/978-3-031-20891-1_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20890-4

  • Online ISBN: 978-3-031-20891-1

  • eBook Packages: Computer Science, Computer Science (R0)
