
Extraction of Hidden Semantics from Web Pages

  • Conference paper
  • First Online:
Intelligent Data Engineering and Automated Learning — IDEAL 2002 (IDEAL 2002)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2412))


Abstract

One of the main limitations when accessing the web is the lack of explicit structure, which, if present, could help in understanding the semantics of the data. Here, an approach to extracting a logical schema from web pages is presented. It defines a page model in which the page content is divided into "logical" sections, i.e., parts of a page that each collect related information. The model aims to account for both traditional, static HTML pages and dynamic page content.
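The abstract does not say how logical sections are detected. The following is a purely illustrative sketch that assumes HTML heading tags (`<h1>` to `<h6>`) delimit logical sections. It segments a page with Python's standard `html.parser` module. The `LogicalSection`, `HeadingSegmenter`, and `segment_page` names are hypothetical. This is not the authors' method, and it handles only static HTML, not dynamic page content.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class LogicalSection:
    """Hypothetical 'logical' section: a title plus the related text it collects."""
    title: str
    content: list[str] = field(default_factory=list)


class HeadingSegmenter(HTMLParser):
    """Naive segmenter (an assumption): every <h1>..<h6> tag opens a new section."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        # Placeholder section collects any text that precedes the first heading.
        self.sections: list[LogicalSection] = [LogicalSection(title="")]
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            # A heading starts a new logical section.
            self.sections.append(LogicalSection(title=""))
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        current = self.sections[-1]
        if self._in_heading:
            # Heading text becomes the section title.
            current.title = (current.title + " " + text).strip()
        else:
            # Other text is treated as related content of the current section.
            current.content.append(text)


def segment_page(html: str) -> list[LogicalSection]:
    """Split an HTML string into heading-delimited sections, dropping empty ones."""
    parser = HeadingSegmenter()
    parser.feed(html)
    parser.close()
    return [s for s in parser.sections if s.title or s.content]
```

For example, `segment_page("<h1>News</h1><p>Item A</p><p>Item B</p><h2>Contacts</h2><p>Phone</p>")` yields two sections, "News" with `["Item A", "Item B"]` and "Contacts" with `["Phone"]`. Headings are a deliberately simple boundary signal. A real extractor would likely need richer cues (layout, tag nesting, repeated patterns) to group related information.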




Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Carchiolo, V., Longheu, A., Malgeri, M. (2002). Extraction of Hidden Semantics from Web Pages. In: Yin, H., Allinson, N., Freeman, R., Keane, J., Hubbard, S. (eds) Intelligent Data Engineering and Automated Learning — IDEAL 2002. IDEAL 2002. Lecture Notes in Computer Science, vol 2412. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45675-9_20


  • DOI: https://doi.org/10.1007/3-540-45675-9_20

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44025-3

  • Online ISBN: 978-3-540-45675-9

  • eBook Packages: Springer Book Archive
