A Pattern-Based Framework for Addressing Data Representational Inconsistency

  • Conference paper
Databases Theory and Applications (ADC 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9877)

Abstract

Data representational inconsistency, where the same kind of data appears in diverse formats or structures, is a crucial data quality problem. Existing approaches to fixing it either target a specific domain or require extensive input from users. In this work, we propose a user-friendly pattern-based framework for addressing data representational inconsistency. Our framework consists of three modules: pattern design, pattern detection, and pattern unification. We identify several challenges across all three tasks that must be met to handle an inconsistent dataset both accurately and efficiently. We propose various techniques to tackle these challenges, and our experimental results on real-life datasets demonstrate that our proposals outperform existing methods.
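
As a rough illustration of the three modules named above, the following sketch uses ordinary regular expressions to stand in for patterns. It is not the authors' method: the phone-number patterns, the target format, and the function names `detect` and `unify` are all assumptions chosen only to make the idea concrete.

```python
import re

# Pattern design (hypothetical): each pattern describes one representation
# of the same attribute. These phone-number formats are illustrative only.
PATTERNS = {
    "dashed": re.compile(r"(\d{3})-(\d{3})-(\d{4})"),
    "dotted": re.compile(r"(\d{3})\.(\d{3})\.(\d{4})"),
    "parenthesized": re.compile(r"\((\d{3})\) ?(\d{3})-(\d{4})"),
}

# Assumed target representation that all matched values are rewritten into.
TARGET = "{0}-{1}-{2}"


def detect(value):
    """Pattern detection: return (pattern name, captured parts) for the
    first pattern that matches the whole value, or None if none match."""
    for name, regex in PATTERNS.items():
        match = regex.fullmatch(value.strip())
        if match:
            return name, match.groups()
    return None


def unify(value):
    """Pattern unification: rewrite a value into the target representation.
    Values that match no known pattern are returned unchanged."""
    hit = detect(value)
    if hit is None:
        return value
    _, parts = hit
    return TARGET.format(*parts)


if __name__ == "__main__":
    for raw in ["(617) 555-0123", "617.555.0123", "617-555-0123", "n/a"]:
        print(raw, "->", unify(raw))
```

Leaving unmatched values untouched is a deliberate choice in this sketch: it avoids silently corrupting data that the designed patterns do not cover.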

Acknowledgment

This work was supported by the grant DP140103171 (Declaration, Exploration, Enhancement and Provenance: The DEEP Approach to Data Quality Management Systems) from the Australian Research Council.

Author information

Corresponding author

Correspondence to Bingyu Yi.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Yi, B., Hua, W., Sadiq, S. (2016). A Pattern-Based Framework for Addressing Data Representational Inconsistency. In: Cheema, M., Zhang, W., Chang, L. (eds) Databases Theory and Applications. ADC 2016. Lecture Notes in Computer Science, vol 9877. Springer, Cham. https://doi.org/10.1007/978-3-319-46922-5_31

  • DOI: https://doi.org/10.1007/978-3-319-46922-5_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46921-8

  • Online ISBN: 978-3-319-46922-5

  • eBook Packages: Computer Science, Computer Science (R0)
