Abstract
Data quality is gaining importance as individuals and corporations increasingly rely on multiple, often external, sources of data to make decisions. Traditional query systems do not factor data quality considerations into their responses. Studies of the diverse interpretations of data quality indicate that fitness for use is a fundamental criterion in its evaluation. In this paper, we present a four-step methodology that incorporates user preferences for data quality into the responses to queries over multiple sources. User preferences are modelled using the notion of preference hierarchies. We have developed an SQL extension to facilitate the specification of preference hierarchies. Further, we demonstrate through experimentation how our approach produces improved query results.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yeganeh, N.K., Sadiq, S., Deng, K., Zhou, X. (2009). Data Quality Aware Queries in Collaborative Information Systems. In: Li, Q., Feng, L., Pei, J., Wang, S.X., Zhou, X., Zhu, QM. (eds) Advances in Data and Web Management. APWeb WAIM 2009. Lecture Notes in Computer Science, vol 5446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00672-2_6
DOI: https://doi.org/10.1007/978-3-642-00672-2_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00671-5
Online ISBN: 978-3-642-00672-2
eBook Packages: Computer Science, Computer Science (R0)