Integrating Semi-structured Data into Business Applications: A Web Intelligence Example

  • Conference paper
Professional Knowledge Management (WM 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3782)

Abstract

The World Wide Web, a vast universe of knowledge, provides public-domain information about market developments and competitor activities. This information is increasingly a critical success factor for enterprises and can be retrieved, for example, from Web sites or online shops. Extraction from these semi-structured information sources is mostly done manually and is very time-consuming. Powerful and user-friendly tools are therefore needed for extracting and integrating information from diverse Web sources or, more generally, from heterogeneous semi-structured data sources. In this paper we describe how data from public information sources, in particular from the World Wide Web, can be retrieved and normalized to structured data formats automatically. We also illustrate how this data can then be integrated automatically into Web Intelligence applications, which are often complex.
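To make the idea of normalizing semi-structured Web data concrete, the following is a minimal sketch in Python and not the method described in the paper. It assumes a hypothetical online-shop page whose product names and prices sit in elements with the class attributes `name` and `price`. The sample markup, the class `ProductWrapper`, and the functions `extract` and `to_xml` are all illustrative assumptions.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical shop markup; real pages would need their own extraction rules.
SAMPLE_HTML = """
<ul>
  <li><span class="name">Tyre A</span> <span class="price">EUR 59.90</span></li>
  <li><span class="name">Tyre B</span> <span class="price">EUR 72.50</span></li>
</ul>
"""


class ProductWrapper(HTMLParser):
    """Collects text from elements whose class is 'name' or 'price'."""

    FIELDS = {"name", "price"}

    def __init__(self):
        super().__init__()
        self.records = []    # completed raw records
        self._field = None   # field currently being read, if any
        self._current = {}   # record under construction

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in self.FIELDS:
            self._field = cls

    def handle_data(self, data):
        if self._field and data.strip():
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if self._field is None:
            return
        self._field = None
        # Emit the record once every expected field has been seen.
        if self.FIELDS.issubset(self._current):
            self.records.append(self._current)
            self._current = {}


def extract(html):
    """Parse the page and normalize each price into currency and amount."""
    wrapper = ProductWrapper()
    wrapper.feed(html)
    wrapper.close()
    rows = []
    for rec in wrapper.records:
        currency, amount = rec["price"].split()
        rows.append({"name": rec["name"], "currency": currency,
                     "amount": float(amount)})
    return rows


def to_xml(rows):
    """Serialize the normalized rows as a small XML document."""
    root = ET.Element("products")
    for row in rows:
        item = ET.SubElement(root, "product", currency=row["currency"])
        ET.SubElement(item, "name").text = row["name"]
        ET.SubElement(item, "amount").text = f"{row['amount']:.2f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(extract(SAMPLE_HTML)))
```

Emitting XML here is only one possible choice of structured target format. A downstream application could just as well consume the list of dictionaries returned by `extract` directly.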




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Baumgartner, R., Frölich, O., Gottlob, G., Herzog, M., Lehmann, P. (2005). Integrating Semi-structured Data into Business Applications: A Web Intelligence Example. In: Althoff, KD., Dengel, A., Bergmann, R., Nick, M., Roth-Berghofer, T. (eds) Professional Knowledge Management. WM 2005. Lecture Notes in Computer Science, vol 3782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11590019_54

  • DOI: https://doi.org/10.1007/11590019_54

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30465-4

  • Online ISBN: 978-3-540-31620-6

  • eBook Packages: Computer Science, Computer Science (R0)
