VR-Tree: A novel tree-based approach for modeling Web Query Interfaces

Journal of Intelligent Information Systems

Abstract

Web Query Interfaces (WQIs) play an important role in retrieving Deep Web content. They allow users to query domain-specific databases for information of interest in diverse domains such as car rentals, hotels, and airfare. As the number of WQIs on the web grows rapidly, some research efforts focus on building a single, unified WQI that lets users query and integrate the information available in different web databases for a specific domain. A key task in this integration process is the extraction, modeling, and understanding of the semantic content of WQIs. This task is challenging because WQIs are highly heterogeneous in design. This paper presents a novel tree-based approach for modeling and understanding WQIs. A tree schema called the Visual Reduced Tree (VR-Tree) is built from the tree produced by a web browser’s render engine by applying a set of well-defined functions, guided by a set of heuristic rules, to identify the main components of a WQI and their relationships. The proposed strategy was evaluated through a collection of experiments over the Tel-8 and ICQ datasets from the UIUC repository. The results show that WQIs can be modeled automatically with a high degree of precision compared with previous approaches, and that the VR-Tree schema proposed in this work simplifies the modeling task by considering only the visual and spatial properties of WQI components.
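The abstract does not spell out the functions or heuristic rules used to build the VR-Tree, so the following Python fragment is only an illustrative sketch of the general idea of reducing a render tree using visual and spatial properties. Everything in it is an assumption made for illustration: the `RenderNode` structure, the `reduce_tree` pruning rule (drop invisible or empty subtrees, collapse single-child containers), and the `label_fields` proximity rule (pair each field with the nearest text located to its left or above it) are hypothetical and are not the functions or rules defined in the paper.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional, Tuple

# Hypothetical stand-in for a node of a browser render tree.
@dataclass
class RenderNode:
    tag: str
    x: float = 0.0
    y: float = 0.0
    width: float = 0.0
    height: float = 0.0
    visible: bool = True
    text: str = ""
    children: List["RenderNode"] = field(default_factory=list)

FIELD_TAGS = {"input", "select", "textarea"}  # assumed set of field tags

def is_relevant(node: RenderNode) -> bool:
    """A node matters if it is a visible field or a visible non-empty text."""
    if not node.visible or node.width <= 0 or node.height <= 0:
        return False
    return node.tag in FIELD_TAGS or (node.tag == "#text" and node.text.strip() != "")

def reduce_tree(node: RenderNode) -> Optional[RenderNode]:
    """Return a reduced copy of the tree, or None if nothing relevant remains."""
    if not node.visible:
        return None  # an invisible node hides its whole subtree
    kept = [r for r in (reduce_tree(c) for c in node.children) if r is not None]
    if is_relevant(node):
        return replace(node, children=kept)
    if not kept:
        return None  # container with no relevant descendants
    if len(kept) == 1:
        return kept[0]  # collapse a container that wraps a single subtree
    return replace(node, children=kept)

def leaves(node: RenderNode) -> List[RenderNode]:
    """Collect leaf nodes in depth-first order."""
    if not node.children:
        return [node]
    out: List[RenderNode] = []
    for child in node.children:
        out.extend(leaves(child))
    return out

def label_fields(root: RenderNode) -> List[Tuple[Optional[str], str]]:
    """Pair each field with the closest text that lies to its left or above it."""
    nodes = leaves(root)
    texts = [n for n in nodes if n.tag == "#text"]
    pairs = []
    for f in (n for n in nodes if n.tag in FIELD_TAGS):
        candidates = [t for t in texts if t.x <= f.x and t.y <= f.y]
        best = min(candidates, key=lambda t: abs(f.x - t.x) + abs(f.y - t.y), default=None)
        pairs.append((best.text if best else None, f.tag))
    return pairs

def sample_form() -> RenderNode:
    """A made-up two-row form with one hidden field and one empty wrapper."""
    hidden = RenderNode("div", visible=False,
                        children=[RenderNode("input", 0, 0, 10, 10)])
    row1 = RenderNode("div", 0, 0, 400, 30, children=[
        RenderNode("#text", 0, 5, 50, 20, text="Make"),
        RenderNode("select", 60, 5, 100, 20)])
    row2 = RenderNode("div", 0, 40, 400, 30, children=[
        RenderNode("#text", 0, 45, 70, 20, text="Max price"),
        RenderNode("input", 80, 45, 100, 20)])
    empty = RenderNode("div", 0, 80, 400, 10, children=[
        RenderNode("div", 0, 80, 400, 10, children=[
            RenderNode("#text", 0, 80, 5, 10, text="   ")])])
    return RenderNode("form", 0, 0, 400, 100, children=[hidden, row1, row2, empty])
```

Calling `label_fields(reduce_tree(sample_form()))` on this made-up form pairs the `select` with the text "Make" and the `input` with "Max price", while the hidden field and the empty wrapper are discarded during reduction. A real system would take positions from the render engine rather than from hand-written coordinates.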





Corresponding author

Correspondence to Heidy M. Marin-Castro.


Cite this article

Marin-Castro, H.M., Sosa Sosa, V.J. VR-Tree: A novel tree-based approach for modeling Web Query Interfaces. J Intell Inf Syst 49, 367–390 (2017). https://doi.org/10.1007/s10844-017-0449-4
