Abstract
Query interface schema extraction is an important problem in Deep Web data acquisition and integration. Obtaining the query interface schema first requires correctly associating the elements and labels of a Deep Web query interface. Because a query interface on an HTML page can be parsed into a well-structured DOM, we propose an effective algorithm for associating elements and labels of Deep Web query interfaces based on the hierarchical DOM. Our algorithm mainly uses the nearest-neighbor distance, together with two other heuristic rules, to find the label most closely related to a given control element. Experimental results on real query interfaces show that the proposed algorithm is highly effective.
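The nearest-neighbor idea can be illustrated with a minimal sketch. The abstract does not define the distance measure or the two additional heuristic rules, so everything below is an assumption made for illustration: distance is taken to be the number of tree edges between a control and a text node (through their lowest common ancestor), ties are broken in favour of the label that precedes the control in document order, and text inside `select`, `script`, or `style` is not treated as a label. The names `TreeBuilder`, `tree_distance`, `associate`, and `SAMPLE_FORM` are hypothetical and do not come from the paper.

```python
from html.parser import HTMLParser

CONTROLS = {"input", "select", "textarea"}
VOID = {"input", "br", "img", "hr", "meta", "link"}   # tags with no children
SKIP_TEXT_IN = {"select", "script", "style"}          # text here is not a label
IGNORED_TYPES = {"hidden", "submit", "button", "reset"}


class Node:
    def __init__(self, tag, parent, attrs=None, text="", order=0):
        self.tag, self.parent = tag, parent
        self.attrs = dict(attrs or [])
        self.text, self.order = text, order


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM: element nodes plus non-empty text nodes."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", None)
        self.current = self.root
        self.nodes = []  # all nodes in document order

    def _add(self, tag, attrs=None, text=""):
        node = Node(tag, self.current, attrs, text, order=len(self.nodes))
        self.nodes.append(node)
        return node

    def handle_starttag(self, tag, attrs):
        node = self._add(tag, attrs)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, if there is one.
        n = self.current
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.current = n.parent

    def handle_data(self, data):
        if data.strip():
            self._add("#text", text=data.strip())


def path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def tree_distance(a, b):
    """Number of edges between a and b via their lowest common ancestor."""
    depth_from_b = {id(n): i for i, n in enumerate(path_to_root(b))}
    for i, n in enumerate(path_to_root(a)):
        if id(n) in depth_from_b:
            return i + depth_from_b[id(n)]
    raise ValueError("nodes are not in the same tree")


def associate(html):
    """Maps each control's name to the text of its nearest label candidate."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    labels = [n for n in builder.nodes if n.tag == "#text"
              and not any(a.tag in SKIP_TEXT_IN for a in path_to_root(n))]
    result = {}
    for ctrl in builder.nodes:
        if ctrl.tag not in CONTROLS or ctrl.attrs.get("type") in IGNORED_TYPES:
            continue
        if not labels:
            continue
        # Assumed tie-break: prefer a preceding label, then the closest in order.
        best = min(labels, key=lambda l: (tree_distance(ctrl, l),
                                          l.order > ctrl.order,
                                          abs(l.order - ctrl.order)))
        result[ctrl.attrs.get("name", ctrl.tag)] = best.text
    return result


SAMPLE_FORM = """
<form><table>
<tr><td>Title</td><td><input type="text" name="title"></td></tr>
<tr><td>Author</td><td><input type="text" name="author"></td></tr>
<tr><td>Format</td><td><select name="format">
  <option>Hardcover</option><option>Paperback</option></select></td></tr>
</table><input type="submit" value="Search"></form>
"""

if __name__ == "__main__":
    print(associate(SAMPLE_FORM))
```

In this sketch each control in a table row is four edges away from the text in the same row and six edges away from text in neighbouring rows, so the same-row label is chosen. Real query interfaces are less regular, which is presumably where the paper's additional heuristic rules come in.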
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Qiang, B., Shi, L., Wu, C., He, Q., Shen, C. (2012). Associating Labels and Elements of Deep Web Query Interface Based on DOM. In: Wang, F.L., Lei, J., Gong, Z., Luo, X. (eds) Web Information Systems and Mining. WISM 2012. Lecture Notes in Computer Science, vol 7529. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33469-6_81
DOI: https://doi.org/10.1007/978-3-642-33469-6_81
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33468-9
Online ISBN: 978-3-642-33469-6
eBook Packages: Computer Science (R0)