Integrating semantically heterogeneous aggregate views of distributed databases

McClean, Sally; Scotney, Bryan; Morrow, Philip; Greer, Kieran

doi:10.1007/s10619-008-7031-6

Published: 02 October 2008

Volume 24, pages 73–94, (2008)

Distributed and Parallel Databases

Abstract

In statistical databases and data warehousing applications it is commonly the case that aggregate views are maintained as an underlying mechanism for summarising information. Where the databases or applications are distributed, or arise from independent data collections or system developments, there may be incompatibility, heterogeneity, and data inconsistency. These challenges need to be overcome if federations of aggregated databases are to be successfully incorporated into systems for database management, querying, retrieval, and knowledge discovery.

In this paper we address the issue of integrating aggregate views that have semantically heterogeneous classification schemes. In previous work we have developed a methodology that is efficient but that cannot easily handle data inconsistencies. Our previous approach is therefore not particularly well-suited to very large databases or federations of large numbers of databases. We now address these scalability issues by introducing a methodology for heterogeneous aggregate view integration that constructs a dynamic shared ontology to which each of the aggregate views can be explicitly related. A maximum likelihood technique, implemented using the EM (Expectation-Maximisation) algorithm, is used to inherently handle data inconsistencies in the computation of integrated aggregates that are described in terms of the dynamic shared ontology.
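
As an illustration of the maximum likelihood idea described above, the following is a minimal sketch and not the authors' implementation. It assumes that the shared ontology is a flat list of fine-grained categories, that each class in an aggregate view is a union of those categories, and that each view reports a count per class. The names `em_integrate`, `View`, `categories`, and `views` are hypothetical. The E-step shares each class count among the categories it covers in proportion to the current estimates, and the M-step renormalises the expected counts.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: an aggregate view is a list of (class, count)
# pairs, where a class is the set of shared-ontology categories it covers.
View = List[Tuple[FrozenSet[str], float]]


def em_integrate(views: List[View], categories: List[str],
                 max_iters: int = 500, tol: float = 1e-10) -> Dict[str, float]:
    """Estimate shared-category proportions from coarsened counts via EM."""
    pi = {c: 1.0 / len(categories) for c in categories}  # uniform start
    total = sum(count for view in views for _, count in view)
    for _ in range(max_iters):
        expected = {c: 0.0 for c in categories}
        # E-step: share each class count among its categories in
        # proportion to the current estimates.
        for view in views:
            for cls, count in view:
                mass = sum(pi[c] for c in cls)
                if mass == 0.0:
                    continue  # class currently carries no probability mass
                for c in cls:
                    expected[c] += count * pi[c] / mass
        # M-step: renormalise the expected counts into proportions.
        new_pi = {c: expected[c] / total for c in categories}
        change = max(abs(new_pi[c] - pi[c]) for c in categories)
        pi = new_pi
        if change < tol:
            break
    return pi


# Two views of the same population with different classification schemes
# (illustrative numbers, not data from the paper).
categories = ["a", "b", "c"]
views = [
    [(frozenset({"a"}), 30.0), (frozenset({"b", "c"}), 70.0)],
    [(frozenset({"a", "b"}), 50.0), (frozenset({"c"}), 50.0)],
]
result = em_integrate(views, categories)
```

For these two mutually consistent views the estimates should approach a = 0.3, b = 0.2, c = 0.5. When the views disagree, the same iteration still returns a compromise set of proportions, which is the sense in which a likelihood-based approach can absorb inconsistencies.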

Author information

Authors and Affiliations

School of Computing and Information Engineering, University of Ulster, Cromore Road, Coleraine, BT52 1SA, Northern Ireland
Sally McClean, Bryan Scotney, Philip Morrow & Kieran Greer

Corresponding author

Correspondence to Philip Morrow.

Additional information

Recommended by: Ahmed K. Elmagarmid.

About this article

Cite this article

McClean, S., Scotney, B., Morrow, P. et al. Integrating semantically heterogeneous aggregate views of distributed databases. Distrib Parallel Databases 24, 73–94 (2008). https://doi.org/10.1007/s10619-008-7031-6

Issue Date: December 2008
