Wiccap Data Model: Mapping Physical Websites to Logical Views

  • Conference paper
Conceptual Modeling — ER 2002 (ER 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2503)

Abstract

Information sources on the WWW contain a large amount of data organized according to different interests and values. It is therefore important to provide facilities that let users extract information of interest in a simple and effective manner. To do this, information from Web sources needs to be extracted automatically according to users’ interests. However, extracting information requires in-depth knowledge of the relevant technologies, and the extraction process is slow, tedious and difficult for ordinary users. We propose the Wiccap Data Model, an XML data model that maps Web information sources into commonly perceived logical models. Based on this data model, ordinary users are able to extract information easily and efficiently. To accelerate the creation of data models, we also define a formal process for creating such data models and have implemented a software tool to facilitate and automate the process of producing Wiccap Data Models.
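The abstract does not give the schema of the Wiccap Data Model, so the following is only a minimal sketch of the general idea of mapping a physical page onto a logical XML view. Everything in it is an assumption made for illustration: the element names (`site`, `section`, `item`), the `begin`/`end` delimiter attributes, the sample page, and the `extract` helper are hypothetical and are not taken from the paper.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical logical view of a news site. The element and attribute
# names are illustrative only; they are NOT the schema from the paper.
LOGICAL_VIEW = """
<site name="ExampleNews">
  <section name="World">
    <item name="headline" begin="&lt;h2&gt;" end="&lt;/h2&gt;"/>
  </section>
</site>
""".strip()

# Hypothetical physical page that the logical view is mapped onto.
PAGE = (
    "<html><body><h2>First story</h2><p>text</p>"
    "<h2>Second story</h2></body></html>"
)


def extract(view_xml: str, page: str) -> dict:
    """Return the text found between each item's begin/end delimiters.

    Keys have the form "<section name>/<item name>".
    """
    root = ET.fromstring(view_xml)
    results = {}
    for section in root.iter("section"):
        for item in section.iter("item"):
            # Build a non-greedy pattern from the literal delimiters.
            pattern = (
                re.escape(item.get("begin"))
                + "(.*?)"
                + re.escape(item.get("end"))
            )
            key = f'{section.get("name")}/{item.get("name")}'
            results[key] = re.findall(pattern, page, flags=re.DOTALL)
    return results


result = extract(LOGICAL_VIEW, PAGE)
print(result)
```

In this sketch the user only sees the logical names ("World", "headline"); the delimiters that tie those names to the page markup stay inside the view definition, which is the kind of separation the abstract describes.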

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, Z., Li, F., Ng, W.K. (2002). Wiccap Data Model: Mapping Physical Websites to Logical Views. In: Spaccapietra, S., March, S.T., Kambayashi, Y. (eds) Conceptual Modeling — ER 2002. ER 2002. Lecture Notes in Computer Science, vol 2503. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45816-6_19

  • DOI: https://doi.org/10.1007/3-540-45816-6_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44277-6

  • Online ISBN: 978-3-540-45816-6

  • eBook Packages: Springer Book Archive
