Schema Mapping with Quality Assurance for Data Integration

Bian, Xu; Wang, Hongzhi; Gao, Hong

doi:10.1007/978-3-642-20291-9_53

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6612)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

As the internet has grown in popularity, more and more data are generated online. Because the Extensible Markup Language (XML) is easy to use, a growing share of these data is stored as XML documents. XML is also flexible, so the same data can be organized in many different formats, which complicates data management. The problems are most pronounced for large-scale operations on XML data, such as data integration and model change. One current approach is to carry out these operations through data exchange. Previous work has mainly analyzed the characteristics of schema mappings over XML and formulated data exchange rules. These rules consider only the integrity and reliability of the data, not the quality of the data after conversion. This paper proposes the concept of a quality assurance mechanism. First, we present a new model with quality assurance and provide a suitable method for it. Then we propose a strategy for the convergence of weak branches based on the schema. Finally, theoretical analysis and experimental results show that the method is correct and feasible.

This research is partially supported by the National Science Foundation of China (No. 61003046), the NSFC-RGC of China (No. 60831160525), the National Grant of High Technology 863 Program of China (No. 2009AA01Z149), the Key Program of the National Natural Science Foundation of China (No. 60933001), the National Postdoctoral Foundation of China (No. 20090450126, No. 201003447), the Doctoral Fund of the Ministry of Education of China (No. 20102302120054), the Postdoctoral Foundation of Heilongjiang Province (No. LBH-Z09109), and the Development Program for Outstanding Young Teachers in Harbin Institute of Technology (No. HITQNJS.2009.052).

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, China
Xu Bian, Hongzhi Wang & Hong Gao

Editor information

Editors and Affiliations

School of Information, Renmin University of China, 100872, Beijing, China
Xiaoyong Du
LFCS, School of Informatics, University of Edinburgh, 10 Crichton Street, EH8 9AB, Edinburgh, Scotland, UK
Wenfei Fan
School of Software, Tsinghua University, Room 819, Main Building, 100084, Beijing, China
Jianmin Wang
Computer School, Wuhan University, Luojiashan Road, 430072, Wuhan, Hubei, China
Zhiyong Peng
School of Information Technology and Electrical Engineering, The University of Queensland, QLD 4072, St. Lucia, Australia
Mohamed A. Sharaf

Cite this paper

Bian, X., Wang, H., Gao, H. (2011). Schema Mapping with Quality Assurance for Data Integration. In: Du, X., Fan, W., Wang, J., Peng, Z., Sharaf, M.A. (eds) Web Technologies and Applications. APWeb 2011. Lecture Notes in Computer Science, vol 6612. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20291-9_53

DOI: https://doi.org/10.1007/978-3-642-20291-9_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20290-2
Online ISBN: 978-3-642-20291-9
eBook Packages: Computer Science, Computer Science (R0)
