Extraction of Hidden Semantics from Web Pages

Carchiolo, Vincenza; Longheu, Alessandro; Malgeri, Michele

doi:10.1007/3-540-45675-9_20

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2412)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

One of the main limitations when accessing the web is the lack of explicit structure, whose presence may help in understanding data semantics. Here, an approach to extracting a logical schema from web pages is presented, defining a page model in which a page's contents are divided into “logical” sections, i.e. parts of a page that each collect related information. This model aims to take into account both traditional, static HTML pages and dynamic page content.

Author information

Authors and Affiliations

Dipartimento di Ingegneria Informatica e delle Telecomunicazioni, Facoltà di Ingegneria, Università di Catania, V.le A. Doria 6, Catania, Italy
Vincenza Carchiolo, Alessandro Longheu & Michele Malgeri

Editor information

Editors and Affiliations

Department of Electrical Engineering and Electronics, UMIST, Manchester, M60 1QD, UK
Hujun Yin, Nigel Allinson & Richard Freeman
Department of Computation, UMIST, Manchester, M60 1QD, UK
John Keane
Department of Biomolecular Science, UMIST, Manchester, M60 1QD, UK
Simon Hubbard

Cite this paper

Carchiolo, V., Longheu, A., Malgeri, M. (2002). Extraction of Hidden Semantics from Web Pages. In: Yin, H., Allinson, N., Freeman, R., Keane, J., Hubbard, S. (eds) Intelligent Data Engineering and Automated Learning — IDEAL 2002. IDEAL 2002. Lecture Notes in Computer Science, vol 2412. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45675-9_20

DOI: https://doi.org/10.1007/3-540-45675-9_20
Published: 20 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44025-3
Online ISBN: 978-3-540-45675-9
eBook Packages: Springer Book Archive
