Extracting Logical Schema from the Web

Applied Intelligence

Abstract

One of the main limitations when accessing the web is the lack of explicit structure, whose presence may help in understanding data semantics. A schema for web data can be constructed at different levels, structuring a single page, a whole site, or a group of sites. Here we present an approach to give a logical schema to a web site. We first define a model for a single page, in which its content is divided into “logical” sections, i.e. parts of a page each collecting related information. Then, we introduce a site model in which both physical and logical links among different page sections are represented: physical links are existing hyperlinks, while logical links are links between sections containing semantically related information. We show how such links can be found and classified according to their relevance, and how the schema is used in a structure-aware browser to improve both browsing and searching.
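
As an illustration only, the page and site models described above could be sketched roughly as follows. All names here (`Section`, `Link`, `build_links`, the `terms` and `hrefs` fields, the `threshold` parameter) are hypothetical, and the Jaccard term overlap used to score logical links is an assumption made for the sketch, not the relevance measure of the paper, which the abstract does not specify. Physical links are simplified to connect a section to every section of the page it points to.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Section:
    """A 'logical' section: a part of a page collecting related information."""
    page_url: str
    name: str
    terms: frozenset                 # hypothetical keyword set describing the section
    hrefs: frozenset = frozenset()   # URLs of pages this section hyperlinks to


@dataclass(frozen=True)
class Link:
    source: Section
    target: Section
    kind: str          # "physical" (existing hyperlink) or "logical" (related content)
    relevance: float


def jaccard(a, b):
    """Term overlap in [0, 1]; an assumed stand-in for a relevance measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_links(sections, threshold=0.3):
    """Collect physical and logical links, ranked by decreasing relevance."""
    links = []
    # Physical links: the source section contains a hyperlink to the target's page.
    for s in sections:
        for t in sections:
            if s.page_url != t.page_url and t.page_url in s.hrefs:
                links.append(Link(s, t, "physical", 1.0))
    # Logical links: sections whose term overlap reaches the (assumed) threshold.
    for s, t in combinations(sections, 2):
        score = jaccard(s.terms, t.terms)
        if score >= threshold:
            links.append(Link(s, t, "logical", score))
    return sorted(links, key=lambda link: link.relevance, reverse=True)
```

A structure-aware browser could then offer the ranked logical links of the section being read as additional navigation targets alongside the ordinary hyperlinks.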

Cite this article

Carchiolo, V., Longheu, A. & Malgeri, M. Extracting Logical Schema from the Web. Applied Intelligence 18, 341–355 (2003). https://doi.org/10.1023/A:1023206322783

  • DOI: https://doi.org/10.1023/A:1023206322783
