Abstract
Integrating data in a heterogeneous or federated database environment creates a corresponding need to estimate the accuracy of the integrated data as a function of the accuracy of the originating data sources. Even within a single database system, different base relations often have dissimilar levels of accuracy, yet no technique exists for defining the accuracy of that system in terms of the accuracy of its base relations. The need is greater still in federated environments involving multiple heterogeneous databases. To address it, this paper proposes a generalized method for estimating overall data accuracy in terms of the accuracy of the relevant base relations and the actual database query. The query is examined in terms of its underlying set of base operators, and a rigorous theoretical framework covering all of these base operators is presented using the relational model. The accuracy estimates assume a uniform distribution of errors; the implications of non-uniform error distributions are also examined in theoretical terms. Finally, a running example highlights the practical implications of the proposed framework.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
Cite this paper
Reddy, M.P., Wang, R.Y. (1995). Estimating data accuracy in a federated database environment. In: Bhalla, S. (ed.) Information Systems and Data Management. CISMOD 1995. Lecture Notes in Computer Science, vol 1006. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60584-3_27
DOI: https://doi.org/10.1007/3-540-60584-3_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60584-3
Online ISBN: 978-3-540-47799-0
eBook Packages: Springer Book Archive