Abstract
With the growing popularity of the Internet, more and more data are generated online. Because Extensible Markup Language (XML) is easy to use, a growing share of these data is organized as XML documents. XML is also flexible, so the same data can be organized in many different structures, which complicates data management. Problems arise in particular when large-scale operations such as data integration or model change are performed on XML data. One current approach is to use data exchange to carry out these operations. Previous work mainly analyzes the characteristics of schema mappings on XML and establishes data exchange rules. These rules consider only the integrity and reliability of the data, not the quality of the data after conversion. This paper proposes the concept of a quality assurance mechanism. We first discuss a new model with quality assurance and provide a suitable method for it. We then propose a strategy for the convergence of weak branches based on the schema. Finally, theoretical analysis and experimental results show that the method is correct and feasible.
This research is partially supported by the National Science Foundation of China (No. 61003046), the NSFC-RGC of China (No. 60831160525), the National Grant of High Technology 863 Program of China (No. 2009AA01Z149), the Key Program of the National Natural Science Foundation of China (No. 60933001), the National Postdoctoral Foundation of China (No. 20090450126, No. 201003447), the Doctoral Fund of Ministry of Education of China (No. 20102302120054), the Postdoctoral Foundation of Heilongjiang Province (No. LBH-Z09109), and the Development Program for Outstanding Young Teachers in Harbin Institute of Technology (No. HITQNJS.2009.052).
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bian, X., Wang, H., Gao, H. (2011). Schema Mapping with Quality Assurance for Data Integration. In: Du, X., Fan, W., Wang, J., Peng, Z., Sharaf, M.A. (eds) Web Technologies and Applications. APWeb 2011. Lecture Notes in Computer Science, vol 6612. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20291-9_53
DOI: https://doi.org/10.1007/978-3-642-20291-9_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20290-2
Online ISBN: 978-3-642-20291-9
eBook Packages: Computer Science, Computer Science (R0)