Abstract
Information integration over web data sources is increasingly common, and XML has become a favored information exchange format. New applications raise the problem that massive amounts of information are transmitted over the network and must be processed in a limited buffer in the mediator. To process queries on massive data from web data sources effectively, we present a method of XML compression based on edit distance for information transmission in information integration. By compressing XML, this method reduces both the transmission time and the buffer space required. Two different strategies of XML compression for transmission and processing in the mediator are designed, and the optimization of their combination is discussed. We also propose query execution algorithms on compressed XML data in the mediator's buffer. We focus on the main operators applied in the mediator to data from wrappers, namely sort, union, join, and aggregation, and describe the implementation of these operators on compressed data using two different methods.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, H., Li, J., He, Z., Luo, J. (2003). Web Information Integration Based on Compressed XML. In: Bianchi-Berthouze, N. (eds) Databases in Networked Information Systems. DNIS 2003. Lecture Notes in Computer Science, vol 2822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39845-5_11
DOI: https://doi.org/10.1007/978-3-540-39845-5_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20111-3
Online ISBN: 978-3-540-39845-5
eBook Packages: Springer Book Archive