Data Quality Aware Queries in Collaborative Information Systems

  • Conference paper
Advances in Data and Web Management (APWeb 2009, WAIM 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5446)

Abstract

The issue of data quality is gaining importance as individuals and corporations increasingly rely on multiple, often external, sources of data to make decisions. Traditional query systems do not factor data quality considerations into their responses. Studies of the diverse interpretations of data quality indicate that fitness for use is a fundamental criterion in the evaluation of data quality. In this paper, we present a four-step methodology that incorporates user preferences for data quality into the responses to queries over multiple sources. User preferences are modelled using the notion of preference hierarchies. We have developed an SQL extension to facilitate the specification of preference hierarchies. Further, we demonstrate through experimentation how our approach improves query results.
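
To make the idea of a preference hierarchy concrete, the following is a minimal sketch in Python. It is not the SQL extension described in the paper, whose syntax is not shown on this page. The attribute names, quality dimensions, weights, source names and the weighted-sum scoring rule are all assumptions chosen for illustration. The sketch flattens a two-level hierarchy (attributes, then quality dimensions per attribute) into global weights and ranks candidate sources by their weighted quality scores.

```python
# Hypothetical illustration only: names, weights and the weighted-sum rule
# are assumptions, not the method or syntax used in the paper.

def leaf_weights(node, scale=1.0, prefix=""):
    """Flatten a nested preference hierarchy into normalised leaf weights.

    A node maps a name either to a number (a leaf weight) or to a
    (weight, children) tuple, where children is another node.
    """
    total = sum(v if isinstance(v, (int, float)) else v[0] for v in node.values())
    flat = {}
    for name, value in node.items():
        path = prefix + name
        if isinstance(value, tuple):
            weight, children = value
            flat.update(leaf_weights(children, scale * weight / total, path + "."))
        else:
            flat[path] = scale * value / total
    return flat


def rank_sources(sources, hierarchy):
    """Order sources by the weighted sum of their quality metrics, best first."""
    weights = leaf_weights(hierarchy)
    scores = {
        name: sum(w * metrics.get(key, 0.0) for key, w in weights.items())
        for name, metrics in sources.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# The user cares three times as much about "price" as about "title";
# within "price", accuracy matters twice as much as completeness.
hierarchy = {
    "price": (3, {"accuracy": 2, "completeness": 1}),
    "title": (1, {"completeness": 1}),
}

# Assumed quality metrics in [0, 1] for two hypothetical sources.
sources = {
    "source_a": {"price.accuracy": 0.9, "price.completeness": 0.6, "title.completeness": 0.5},
    "source_b": {"price.accuracy": 0.6, "price.completeness": 1.0, "title.completeness": 1.0},
}

weights = leaf_weights(hierarchy)
ranking = rank_sources(sources, hierarchy)
```

Under these assumed numbers, source_b ranks ahead of source_a even though source_a has the more accurate prices, because completeness across both attributes outweighs the accuracy gap. A different hierarchy would change the ordering, which is the sense in which the result depends on the user's fitness-for-use preferences.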

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yeganeh, N.K., Sadiq, S., Deng, K., Zhou, X. (2009). Data Quality Aware Queries in Collaborative Information Systems. In: Li, Q., Feng, L., Pei, J., Wang, S.X., Zhou, X., Zhu, QM. (eds) Advances in Data and Web Management. APWeb WAIM 2009 2009. Lecture Notes in Computer Science, vol 5446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00672-2_6

  • DOI: https://doi.org/10.1007/978-3-642-00672-2_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00671-5

  • Online ISBN: 978-3-642-00672-2

  • eBook Packages: Computer Science, Computer Science (R0)
