research-article

Bridging the data integration gap: from theory to implementation

Published: 05 May 2011

Abstract

The integration of multiple autonomous and heterogeneous data sources, both across the web and within company intranets, has received much attention over the years, largely because of its many applications in the fields of Artificial Intelligence and medical research data sharing. Data integration systems embody this work and have advanced considerably over the past twenty years. Designing such systems raises a number of issues that are interesting from a theoretical point of view, among them answering queries using logical views, query containment and completeness, and the automatic integration of existing data sources via schema mapping tools. In this work we discuss these issues, compare and contrast various proposed solutions (federated database systems and data warehouses), and finally propose a novel extension of the web-based MVC (model, view, controller) framework that allows for the rapid development and implementation of data integration systems suitable for use on the web.

Published In

ACM SIGSOFT Software Engineering Notes  Volume 36, Issue 3
May 2011
89 pages
ISSN:0163-5948
DOI:10.1145/1968587

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data integration
  2. data sharing
  3. data warehousing
  4. federated databases

Qualifiers

  • Research-article
