Abstract
Web applications have become an invaluable source of information for many different vertical solutions, but their complex navigation and semistructured format make their information difficult to retrieve. Web Automation and Extraction systems are able to navigate through web links and fill in web forms automatically in order to get information that is not directly accessible by a URL. In these systems, the main optimization parameter is the time required to navigate through the intermediate pages which lead to the desired final pages. This paper proposes a series of techniques and algorithms that improve this parameter by storing historical information from previous queries and using it to make the browser manager preload an adequate subset of the whole navigational sequence on a specific browser before the next query is executed. These techniques also track which sequences are the most common, so that those sequences are the ones preloaded most often.
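The idea of ranking navigation sequences by past use and preloading the most common ones can be illustrated with a minimal sketch. It is not the algorithm from the paper, whose details the abstract does not give. It assumes that a navigation sequence is a list of step labels, that the ranking is a plain frequency count over previous queries, and that preloading means assigning a fixed-length prefix of a sequence to an idle browser. The names `SequenceRanking`, `plan_preloads`, and `prefix_len` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: one navigation step is a label such as
# "login" or "search_form"; a sequence is an ordered tuple of steps.
Step = str
NavSequence = Tuple[Step, ...]


class SequenceRanking:
    """Keeps a frequency count of the navigation sequences used by past queries."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, sequence: Sequence[Step]) -> None:
        """Register one executed query that followed `sequence`."""
        self._counts[tuple(sequence)] += 1

    def top(self, k: int) -> List[NavSequence]:
        """Return the k most frequently used sequences, most common first."""
        return [seq for seq, _ in self._counts.most_common(k)]


def plan_preloads(
    ranking: SequenceRanking, idle_browsers: Sequence[str], prefix_len: int
) -> Dict[str, NavSequence]:
    """Assign to each idle browser the first `prefix_len` steps of one of
    the most common sequences (assumption: one sequence per browser)."""
    plan: Dict[str, NavSequence] = {}
    for browser, seq in zip(idle_browsers, ranking.top(len(idle_browsers))):
        plan[browser] = seq[:prefix_len]
    return plan


if __name__ == "__main__":
    ranking = SequenceRanking()
    for _ in range(3):
        ranking.record(["login", "search_form", "results"])
    for _ in range(2):
        ranking.record(["login", "account", "orders"])
    ranking.record(["home", "catalog"])
    print(plan_preloads(ranking, ["browser-1", "browser-2"], prefix_len=2))
```

In this sketch the two idle browsers are positioned partway along the two most frequent sequences, so a later query that follows either one only has to execute the remaining steps. A real system would also need to handle session expiry and queries that match no preloaded prefix, which this example leaves out.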
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hidalgo, J., Losada, J., Álvarez, M., Pan, A. (2006). Optimization of Automatic Navigation to Hidden Web Pages by Ranking-Based Browser Preloading. In: Lee, J., Shim, J., Lee, Sg., Bussler, C., Shim, S. (eds) Data Engineering Issues in E-Commerce and Services. DEECS 2006. Lecture Notes in Computer Science, vol 4055. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780397_4
DOI: https://doi.org/10.1007/11780397_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35440-6
Online ISBN: 978-3-540-35441-3
eBook Packages: Computer Science, Computer Science (R0)