Abstract
The World Wide Web, a vast universe of knowledge, offers public-domain information about market developments and competitor activities. This information, which can be retrieved from Web sites or online shops, for example, is increasingly a critical success factor for enterprises. Extraction from these semi-structured information sources is mostly done manually and is very time-consuming. Powerful, user-friendly tools are therefore needed for extracting and integrating information from different Web sources or, more generally, from heterogeneous semi-structured data sources. In this paper we describe how data from public information sources, in particular the World Wide Web, can be retrieved and normalized to structured data formats automatically. We also illustrate how this data can then be integrated automatically into Web Intelligence applications, which are often complex.
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baumgartner, R., Frölich, O., Gottlob, G., Herzog, M., Lehmann, P. (2005). Integrating Semi-structured Data into Business Applications: A Web Intelligence Example. In: Althoff, KD., Dengel, A., Bergmann, R., Nick, M., Roth-Berghofer, T. (eds) Professional Knowledge Management. WM 2005. Lecture Notes in Computer Science, vol 3782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11590019_54
DOI: https://doi.org/10.1007/11590019_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30465-4
Online ISBN: 978-3-540-31620-6
eBook Packages: Computer Science, Computer Science (R0)