Abstract
One of the main limitations when accessing the web is the lack of explicit structure, whose presence may help in understanding data semantics. Here, an approach to extracting a logical schema from web pages is presented. It defines a page model in which the contents of a page are divided into “logical” sections, i.e. parts of a page that each collect related information. The model aims to take into account both traditional static HTML pages and dynamic page content.
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Carchiolo, V., Longheu, A., Malgeri, M. (2002). Extraction of Hidden Semantics from Web Pages. In: Yin, H., Allinson, N., Freeman, R., Keane, J., Hubbard, S. (eds) Intelligent Data Engineering and Automated Learning — IDEAL 2002. IDEAL 2002. Lecture Notes in Computer Science, vol 2412. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45675-9_20
DOI: https://doi.org/10.1007/3-540-45675-9_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44025-3
Online ISBN: 978-3-540-45675-9
eBook Packages: Springer Book Archive