Schema Mapping with Quality Assurance for Data Integration

  • Conference paper
Web Technologies and Applications (APWeb 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6612)

Abstract

With the growing popularity of the Internet, more and more data are generated online. Because Extensible Markup Language (XML) is easy to use, much of this data is organized as XML documents. Because XML is flexible, however, XML data come in a variety of organizational formats, which makes data management inconvenient. In particular, large-scale operations on XML data, such as data integration and model change, raise many problems. One current approach is to use Data Exchange to carry out these operations. Previous work has mainly analyzed the characteristics of Schema Mapping on XML and established Data Exchange rules. These rules consider only the integrity and reliability of the data, not the quality of the data after conversion. This paper proposes the concept of quality assurance mechanisms. First, we discuss a new model with quality assurance and provide a suitable method for this model. Then we propose a strategy of weak branch convergence on the basis of the schema. Finally, theoretical analysis and experimental results show that the method is correct and feasible.

This research is partially supported by the National Science Foundation of China (No. 61003046), the NSFC-RGC of China (No. 60831160525), the National Grant of High Technology 863 Program of China (No. 2009AA01Z149), the Key Program of the National Natural Science Foundation of China (No. 60933001), the National Postdoctoral Foundation of China (No. 20090450126, No. 201003447), the Doctoral Fund of Ministry of Education of China (No. 20102302120054), the Postdoctoral Foundation of Heilongjiang Province (No. LBH-Z09109), and the Development Program for Outstanding Young Teachers in Harbin Institute of Technology (No. HITQNJS.2009.052).

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bian, X., Wang, H., Gao, H. (2011). Schema Mapping with Quality Assurance for Data Integration. In: Du, X., Fan, W., Wang, J., Peng, Z., Sharaf, M.A. (eds) Web Technologies and Applications. APWeb 2011. Lecture Notes in Computer Science, vol 6612. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20291-9_53

  • DOI: https://doi.org/10.1007/978-3-642-20291-9_53

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20290-2

  • Online ISBN: 978-3-642-20291-9

  • eBook Packages: Computer Science, Computer Science (R0)
