Extracting Logical Schema from the Web

Carchiolo, Vincenza; Longheu, Alessandro; Malgeri, Michele

doi:10.1023/A:1023206322783

Published: May 2003

Applied Intelligence, Volume 18, pages 341–355 (2003)

Vincenza Carchiolo¹,
Alessandro Longheu¹ &
Michele Malgeri¹


Abstract

One of the main limitations when accessing the web is the lack of explicit structure, whose presence may help in understanding data semantics. Schemas for web data can be constructed at different levels, structuring a single page, a whole site, or a group of sites. Here we present an approach to give a logical schema to a web site. We first define a model for a single page, in which its content is divided into “logical” sections, i.e. parts of a page each collecting related information. Then, we introduce a site model in which both physical and logical links among different page sections are represented: physical links are existing hyperlinks, while logical links connect sections containing semantically related information. We show how such links can be found and classified according to their relevance, and how the schema is used in a structure-aware browser to improve both browsing and searching.
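
The two models described above can be pictured with a small data-structure sketch. It is only an illustration under stated assumptions, not the method of the article: the names `Section`, `Link`, `terms`, `similarity` and `build_site_links` are hypothetical, and plain term overlap (Jaccard similarity) with an arbitrary threshold stands in for whatever relevance measure the authors actually use, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """A "logical" section: a part of a page collecting related information."""
    page_url: str
    name: str
    text: str
    hrefs: tuple = ()  # ids ("page#section") of sections this one hyperlinks to

    @property
    def id(self):
        return f"{self.page_url}#{self.name}"


@dataclass
class Link:
    source: Section
    target: Section
    kind: str         # "physical" (existing hyperlink) or "logical" (related content)
    relevance: float  # 1.0 for physical links, similarity score for logical ones


def terms(text):
    """Crude tokenization (an assumption): lowercase words longer than 3 characters."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return {w for w in words if len(w) > 3}


def similarity(a, b):
    """Jaccard overlap of the two sections' term sets (stand-in relevance measure)."""
    ta, tb = terms(a.text), terms(b.text)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_site_links(sections, threshold=0.2):
    """Collect physical and logical links, ordered from most to least relevant."""
    by_id = {s.id: s for s in sections}
    links = []
    # Physical links: hyperlinks that already exist in the pages.
    for src in sections:
        for href in src.hrefs:
            if href in by_id:
                links.append(Link(src, by_id[href], "physical", 1.0))
    # Logical links: pairs of distinct sections whose content is similar enough.
    for i, a in enumerate(sections):
        for b in sections[i + 1:]:
            score = similarity(a, b)
            if score >= threshold:
                links.append(Link(a, b, "logical", score))
    return sorted(links, key=lambda l: l.relevance, reverse=True)


site = [
    Section("a.html", "news", "Latest football results and league tables",
            hrefs=("b.html#scores",)),
    Section("b.html", "scores", "Football league results from this weekend"),
    Section("c.html", "contact", "Postal address and telephone number"),
]
for link in build_site_links(site):
    print(link.kind, link.source.id, "->", link.target.id, round(link.relevance, 3))
```

Sorting by score is one simple way to rank links by relevance; a structure-aware browser could then offer the highest-ranked logical links next to the ordinary hyperlinks of the section being read.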



Author information

Authors and Affiliations

Dipartimento di Ingegneria Informatica e delle Telecomunicazioni, Facoltà di Ingegneria, Università di Catania, V.le A. Doria 6, I-95125 Catania
Vincenza Carchiolo, Alessandro Longheu & Michele Malgeri



About this article

Cite this article

Carchiolo, V., Longheu, A. & Malgeri, M. Extracting Logical Schema from the Web. Applied Intelligence 18, 341–355 (2003). https://doi.org/10.1023/A:1023206322783


Issue Date: May 2003
DOI: https://doi.org/10.1023/A:1023206322783
