Optimization of Automatic Navigation to Hidden Web Pages by Ranking-Based Browser Preloading

Hidalgo, Justo; Losada, José; Álvarez, Manuel; Pan, Alberto

doi:10.1007/11780397_4

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4055)

Included in the following conference series:

International Workshop on Data Engineering Issues in E-Commerce and Services

Abstract

Web applications have become an invaluable source of information for many vertical solutions, but their complex navigation and semistructured format make that information difficult to retrieve. Web automation and extraction systems can follow web links and fill in web forms automatically to reach information that is not directly accessible through a URL. In these systems, the main optimization parameter is the time required to navigate through the intermediate pages that lead to the desired final pages. This paper proposes a series of techniques and algorithms that improve this parameter by storing historical information from previous queries and using it to make the browser manager preload an adequate subset of the whole navigational sequence on a specific browser before the next query is executed. These techniques also track which sequences are the most common, so that those sequences are preloaded most often.
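
The abstract does not give the paper's actual algorithms, so the following is only a minimal sketch of the general idea, under stated assumptions: past queries are logged as navigation sequences, sequences are ranked by how often they occurred, and idle browsers are assigned a prefix of the most frequent ones to preload. The names `SequenceRanker`, `plan_preloads`, `free_browsers`, and `prefix_len` are hypothetical and do not come from the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A navigation sequence is an ordered tuple of steps (page or form identifiers).
Sequence = Tuple[str, ...]


class SequenceRanker:
    """Hypothetical history store: counts how often each sequence was used."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record_query(self, sequence: Sequence) -> None:
        # Called after each query with the sequence it navigated.
        self._counts[sequence] += 1

    def top_sequences(self, k: int) -> List[Sequence]:
        # Most frequent first; ties keep first-seen order.
        return [seq for seq, _ in self._counts.most_common(k)]


def plan_preloads(
    ranker: SequenceRanker, free_browsers: List[str], prefix_len: int
) -> Dict[str, Sequence]:
    """Assign each idle browser a prefix of one of the top-ranked sequences.

    Assumption: preloading only the first `prefix_len` steps stands in for
    the "adequate subset of the whole navigational sequence".
    """
    ranked = ranker.top_sequences(len(free_browsers))
    return {b: seq[:prefix_len] for b, seq in zip(free_browsers, ranked)}


if __name__ == "__main__":
    ranker = SequenceRanker()
    history = [
        ("home", "login", "search"),
        ("home", "catalog"),
        ("home", "login", "search"),
    ]
    for seq in history:
        ranker.record_query(seq)
    print(plan_preloads(ranker, ["browser-1", "browser-2"], prefix_len=2))
```

Ranking by raw frequency is the simplest possible choice here; a real system might also weigh recency or navigation cost, which this sketch leaves out.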

Author information

Authors and Affiliations

Denodo Technologies, Inc., Madrid, Spain
Justo Hidalgo & José Losada
Department of Information and Communications Technologies, University of A Coruña, Spain
Manuel Álvarez & Alberto Pan

Editor information

Editors and Affiliations

IBM Watson Research Center, Hawthorne, New York
Juhnyoung Lee
Dept of Computer Science, Sookmyung Women’s University, Korea
Junho Shim
School of Computer Science & Engineering, Seoul National University, Korea
Sang-goo Lee
Cisco Systems, Inc., 95134, San Jose, CA, USA
Christoph Bussler
San Jose State University, USA
Simon Shim

About this paper

Cite this paper

Hidalgo, J., Losada, J., Álvarez, M., Pan, A. (2006). Optimization of Automatic Navigation to Hidden Web Pages by Ranking-Based Browser Preloading. In: Lee, J., Shim, J., Lee, Sg., Bussler, C., Shim, S. (eds) Data Engineering Issues in E-Commerce and Services. DEECS 2006. Lecture Notes in Computer Science, vol 4055. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780397_4

DOI: https://doi.org/10.1007/11780397_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35440-6
Online ISBN: 978-3-540-35441-3
eBook Packages: Computer Science, Computer Science (R0)