Abstract
Current wrapper approaches break down when extracting data from Web pages that are differently structured and change frequently. To address this problem, this paper defines a domain-specific ontology, automatically captures the semantic hierarchy of Web pages by exploiting both their structural information and common formatting information, and recognizes and extracts data through ontology-based semantic matching without relying on page-specific formatting. The approach therefore adapts to differently structured and frequently changing Web pages within a domain of interest.
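As a rough illustration of why ontology-based matching tolerates layout changes, here is a minimal Python sketch. Several pieces are assumptions and do not come from the paper: the three-concept book ontology (`ONTOLOGY`), the helpers `match_concept` and `html_to_xml`, and the use of simple synonym lookup plus regular-expression value checks in place of the paper's semantic matching. It is not the authors' method. It flattens each page into a sequence of text nodes and skips the semantic-hierarchy step entirely.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical domain ontology (assumption): concept -> label synonyms and a value pattern.
ONTOLOGY = {
    "title": {"synonyms": ["title", "book title"], "pattern": r".+"},
    "author": {"synonyms": ["author", "written by"], "pattern": r"[A-Za-z .,'-]+"},
    "price": {"synonyms": ["price", "cost", "our price"], "pattern": r"\$?\d+(?:\.\d{2})?"},
}


class TextCollector(HTMLParser):
    """Collect non-empty text nodes in document order, ignoring the tags around them."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)


def match_concept(label):
    """Return the ontology concept whose synonym list contains this label, if any."""
    key = label.strip().rstrip(":").strip().lower()
    for concept, spec in ONTOLOGY.items():
        if key in spec["synonyms"]:
            return concept
    return None


def html_to_xml(html, root_tag="record"):
    """Hypothetical converter: label/value pairs matched via the ontology become XML elements."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    texts = parser.texts
    root = ET.Element(root_tag)
    i = 0
    while i < len(texts):
        text = texts[i]
        # Case 1: "Label: value" inside a single text node.
        if ":" in text:
            label, _, value = text.partition(":")
            concept = match_concept(label)
            value = value.strip()
            if concept and value and re.fullmatch(ONTOLOGY[concept]["pattern"], value):
                ET.SubElement(root, concept).text = value
                i += 1
                continue
        # Case 2: label in one node, value in the next (e.g. adjacent table cells).
        concept = match_concept(text)
        if concept and i + 1 < len(texts):
            value = texts[i + 1]
            if re.fullmatch(ONTOLOGY[concept]["pattern"], value):
                ET.SubElement(root, concept).text = value
                i += 2
                continue
        i += 1
    return root


# Two hypothetical pages with different layouts for the same data.
page_a = "<table><tr><td>Title</td><td>Web Data Mining</td></tr><tr><td>Price</td><td>$49.95</td></tr></table>"
page_b = "<div><p><b>Book title:</b> Web Data Mining</p><p>Cost: $49.95</p></div>"
print(ET.tostring(html_to_xml(page_a), encoding="unicode"))
print(ET.tostring(html_to_xml(page_b), encoding="unicode"))
```

Both pages, one a table and one a paragraph layout with different label wording, yield the same XML record, because extraction keys on ontology concepts rather than on the tags or positions of a particular page.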
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, S., Ou, W., Yu, J. (2005). Ontology-Based HTML to XML Conversion. In: Fan, W., Wu, Z., Yang, J. (eds) Advances in Web-Age Information Management. WAIM 2005. Lecture Notes in Computer Science, vol 3739. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563952_98
DOI: https://doi.org/10.1007/11563952_98
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29227-2
Online ISBN: 978-3-540-32087-6
eBook Packages: Computer Science, Computer Science (R0)