Integrating Semi-structured Data into Business Applications: A Web Intelligence Example

Baumgartner, Robert; Frölich, Oliver; Gottlob, Georg; Herzog, Marcus; Lehmann, Peter

doi:10.1007/11590019_54

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3782)

Included in the following conference series:

Biennial Conference on Professional Knowledge Management/Wissensmanagement

Abstract

The World Wide Web, a vast universe of knowledge, provides public-domain information about market developments and competitor activities. This information is increasingly a critical success factor for enterprises and can be retrieved, for example, from Web sites or online shops. Extraction from these semi-structured information sources is mostly done manually and is very time-consuming. Powerful and user-friendly tools are therefore needed for extracting and integrating information from different Web sources or, more generally, from heterogeneous semi-structured data sources. In this paper we describe a solution for automatically retrieving data from public information sources, in particular the World Wide Web, and normalizing it to structured data formats. We also illustrate how this data can then be integrated automatically into Web Intelligence applications, which are often complex.

Author information

Authors and Affiliations

DBAI, Institute for Information Systems, Vienna Technical University, Favoritenstr. 9, A-1040, Vienna, Austria
Robert Baumgartner, Oliver Frölich, Georg Gottlob & Marcus Herzog
Department of Information and Communication, Hochschule der Medien, Fachhochschule Stuttgart, Wolframstr. 32, D-70191, Stuttgart, Germany
Peter Lehmann

Editor information

Editors and Affiliations

Institute of Computer Science, University of Hildesheim, Marienburger Platz 22, 31141, Hildesheim, Germany
Klaus-Dieter Althoff
Knowledge Management Department, German Research Center for Artificial Intelligence (DFKI) GmbH, Kaiserslautern, Germany
Andreas Dengel
Department of Business Information Systems II, University of Trier, Trier, Germany
Ralph Bergmann
Department for Experience Management, Fraunhofer Institute for Experimental Software Engineering, Kaiserslautern
Markus Nick
Knowledge Management Department, German Research Center for Artificial Intelligence (DFKI) GmbH, Trippstadter Straße 122, 67663, Kaiserslautern, Germany
Thomas Roth-Berghofer

About this paper

Cite this paper

Baumgartner, R., Frölich, O., Gottlob, G., Herzog, M., Lehmann, P. (2005). Integrating Semi-structured Data into Business Applications: A Web Intelligence Example. In: Althoff, KD., Dengel, A., Bergmann, R., Nick, M., Roth-Berghofer, T. (eds) Professional Knowledge Management. WM 2005. Lecture Notes in Computer Science, vol 3782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11590019_54

DOI: https://doi.org/10.1007/11590019_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30465-4
Online ISBN: 978-3-540-31620-6
eBook Packages: Computer Science, Computer Science (R0)
