Abstract
Most documents available on the web conform to the HTML specification. Such documents are hierarchically structured by nature. Existing graph-based and tree-based data models for the web provide only a very low-level representation of this hierarchical structure. In this paper, we introduce a conceptual model for the web that represents the complex hierarchical structure within web documents at a high level, close to how humans conceptualize and visualize the documents. We also describe how to convert HTML documents based on this conceptual model. Using the conceptual model and the conversion method, we can capture the essence (i.e., the semistructure) of HTML documents in a natural and simple way.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, M., Ling, T.W. (2000). A Conceptual Model for the Web. In: Laender, A.H.F., Liddle, S.W., Storey, V.C. (eds) Conceptual Modeling — ER 2000. ER 2000. Lecture Notes in Computer Science, vol 1920. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45393-8_17
DOI: https://doi.org/10.1007/3-540-45393-8_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41072-0
Online ISBN: 978-3-540-45393-2
eBook Packages: Springer Book Archive