Estimating data accuracy in a federated database environment

  • Distributed Systems
  • Conference paper
Information Systems and Data Management (CISMOD 1995)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1006))

Abstract

The need for integration of data in a heterogeneous or federated database environment creates a corresponding need for estimating the accuracy of the integrated data as a function of the accuracy of the originating data sources. Even in a single database system, different base relations are frequently characterized by dissimilar levels of accuracy; however, no technique exists for defining the accuracy of this single database system in terms of the accuracy of the base relations. This need is further heightened in the case of federated environments involving multiple heterogeneous databases. To address this need, a generalized method is proposed for estimating the overall data accuracy in terms of the accuracy of the relevant base relations and the actual database query. The query is examined in terms of its underlying set of base operators. A rigorous theoretical framework encompassing all of these base operators is presented in this paper using the relational model. While the accuracy estimates are derived under the assumption of a uniform error distribution, the implications of non-uniform error distributions are also examined in theoretical terms. Finally, a running example is used to highlight the practical implications of the proposed theoretical framework.
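As a rough illustration of the idea of propagating accuracy through base operators, the following minimal sketch tags each relation with an accuracy value and combines those values for two operators. It is not the paper's framework: the `Relation` class, the `select` and `product` helpers, the sample relation names, and both propagation rules (accuracy preserved under selection when errors are uniformly spread, accuracy multiplied under a product when errors are independent) are assumptions made here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A relation tagged with an assumed accuracy: the fraction of correct tuples."""
    name: str
    accuracy: float

    def __post_init__(self):
        if not 0.0 <= self.accuracy <= 1.0:
            raise ValueError("accuracy must lie in [0, 1]")


def select(r: Relation) -> Relation:
    # Assumption: errors are uniformly spread over the tuples, so any
    # selected subset has the same expected accuracy as its input.
    return Relation(f"select({r.name})", r.accuracy)


def product(r: Relation, s: Relation) -> Relation:
    # Assumption: errors in r and s are independent, so a combined tuple
    # is correct only when both of its component tuples are correct.
    return Relation(f"({r.name} x {s.name})", r.accuracy * s.accuracy)


# Hypothetical base relations with made-up accuracy levels.
orders = Relation("orders", 0.95)
customers = Relation("customers", 0.90)

# A query decomposed into base operators: a selection over a product.
result = select(product(orders, customers))
print(result.name, round(result.accuracy, 3))
```

Under a non-uniform error distribution the selection rule above would no longer hold in general, since the selected tuples could be more or less error-prone than the relation as a whole.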

Editor information

Subhash Bhalla

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Reddy, M.P., Wang, R.Y. (1995). Estimating data accuracy in a federated database environment. In: Bhalla, S. (ed.) Information Systems and Data Management. CISMOD 1995. Lecture Notes in Computer Science, vol 1006. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60584-3_27

  • DOI: https://doi.org/10.1007/3-540-60584-3_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60584-3

  • Online ISBN: 978-3-540-47799-0

  • eBook Packages: Springer Book Archive
