Abstract
Intelligent integration of information has challenged database research for over 35 years. While data integration processes of all kinds are now reasonably well understood and widely used in practice, the growth and heterogeneity of data require much higher degrees of automation to limit the need for human specialist work. This requires deeper insights into data-centric approaches to Enterprise Information Integration that focus on the semantics of information integration. Recent formalizations and algorithms enable significant improvements both in schema integration and in its automated transformation to efficient data-level integration, in a wide variety of architectural settings such as data warehouses or peer-to-peer databases. In addition to giving a short overview of developments in this field over the past 20 years, this paper focuses particularly on the challenges posed by heterogeneity in data models.








Notes
Floating car data (FCD) is transmitted using a Car-to-Car (C2C) or Car-to-Infrastructure (C2I) communication service (Stubing et al. 2010). C2C and C2I communication are subsumed under the term C2X (Car-to-X) communication.
References
Abiteboul, S., Hull, R., Vianu, V. (1995). Foundations of databases. Addison-Wesley.
Arenas, M., Barceló, P., Libkin, L., Murlak, F. (2010). Relational and XML data exchange. Synthesis lectures on data management. Morgan & Claypool Publishers.
Arens, Y., Knoblock, C. A., & Shen, W.-M. (1996). Query reformulation for dynamic information integration. Journal of Intelligent Information Systems, 6(2–3), 99–130.
Atzeni, P., & Torlone, R. (1996). Management of multiple models in an extensible database design tool. In P. M. G. Apers, M. Bouzeghoub, & G. Gardarin (Eds.), Proc. 5th international conference on extending database technology (EDBT) (Lecture Notes in Computer Science, Vol. 1057, pp. 79–95). Avignon: Springer.
Atzeni, P., Cappellari, P., & Bernstein, P. A. (2005). A multilevel dictionary for model management. In L. M. L. Delcambre, C. Kop, H. C. Mayr, J. Mylopoulos, & O. Pastor (Eds.), Proc. 24th international conference on conceptual modeling (ER) (Lecture Notes in Computer Science, Vol. 3716, pp. 160–175). Klagenfurt: Springer.
Atzeni, P., Cappellari, P., & Bernstein, P. A. (2006). Model-independent schema and data translation. In Y. E. Ioannidis, M. H. Scholl, J. W. Schmidt, F. Matthes, M. Hatzopoulos, K. Böhm, A. Kemper, T. Grust, & C. Böhm (Eds.), Proc. 10th international conference on extending database technology (EDBT) (Lecture Notes in Computer Science, Vol. 3896, pp. 368–385). Munich: Springer.
Atzeni, P., Bellomarini, L., Bugiotti, F., & Gianforme, G. (2009). MISM: A platform for model-independent solutions to model management problems. Journal of Data Semantics, 14, 133–161.
Aumueller, D., Do, H. H., Massmann, S., & Rahm, E. (2005). Schema and ontology matching with COMA++. In F. Ozcan (Ed.), Proceedings of the ACM SIGMOD international conference on management of data (pp. 906–908). Baltimore: ACM.
Bachman, C.W., Daya, M. (1977). The role concept in data models. In: Proceedings of the Third International Conference on Very Large Data Bases (VLDB), pp. 464–476. IEEE-CS and ACM, Tokyo, Japan.
Batini, C., Lenzerini, M., & Navathe, S. B. (1986). A comparative analysis of methodologies for database schema integration. ACM Computing Surveys, 18(4), 323–364.
Baumeister, M., & Jarke, M. (1999). Compaction of large class hierarchies in databases for chemical engineering. In Proc. 8. GI-Fachtagung für Datenbanksysteme in Büro, Technik und Wissenschaft (BTW) (pp. 343–361). Freiburg: Springer.
Beeri, C., & Vardi, M. Y. (1984). A proof procedure for data dependencies. Journal of the ACM, 31(4), 718–741.
Bergamaschi, S., Castano, S., Vincini, M., & Beneventano, D. (2001). Semantic integration of heterogeneous information sources. Data & Knowledge Engineering, 36(3), 215–249.
Bernstein, P. A., & Haas, L. M. (2008). Information integration in the enterprise. Communications of the ACM, 51(9), 72–79.
Bernstein, P. A., & Melnik, S. (2007). Model management 2.0: Manipulating richer mappings. In L. Zhou, T. W. Ling, & B. C. Ooi (Eds.), Proc. ACM SIGMOD intl. conf. on management of data (pp. 1–12). Beijing: ACM Press. doi:10.1145/1247480.1247482.
Bernstein, P. A., Halevy, A. Y., & Pottinger, R. (2000). A vision for management of complex models. SIGMOD Record, 29(4), 55–63.
Bernstein, P. A., Melnik, S., Petropoulos, M., & Quix, C. (2004). Industrial-strength schema matching. SIGMOD Record, 33(4), 38–43.
Bernstein, P.A., Green, T.J., Melnik, S., Nash, A. (2006). Implementing mapping composition. In: U. Dayal, K.Y. Whang, D.B. Lomet, G. Alonso, G.M. Lohman, M.L. Kersten, S.K. Cha, Y.K. Kim (eds.) Proc. 32nd Intl. Conference on Very Large Data Bases (VLDB), pp. 55–66. ACM Press.
Biskup, J., & Convent, B. (1986). A formal view integration method. In C. Zaniolo (Ed.), Proc. ACM SIGMOD intl. conf. on management of data (pp. 398–407). Washington: ACM Press.
Brandt, S. C., Morbach, J., Miatidis, M., Theißen, M., Jarke, M., & Marquardt, W. (2008). An ontology-based approach to knowledge management in design processes. Computers & Chemical Engineering, 32(1–2), 320–342.
Brodie, M.L. (2010). Data integration at scale: From relational data integration to information ecosystems. In: Proc. 24th IEEE Intl. Conf. on Advanced Information Networking and Applications (AINA), pp. 2–3. IEEE Computer Society, Perth, Australia.
Calì, A., Calvanese, D., Giacomo, G. D., & Lenzerini, M. (2004). Data integration under integrity constraints. Information Systems, 29(2), 147–163. doi:10.1016/S0306-4379(03)00050-4.
Calvanese, D., Giacomo, G. D., Lenzerini, M., Nardi, D., & Rosati, R. (1998). Description logic framework for information integration. In A. G. Cohn, L. K. Schubert, & S. C. Shapiro (Eds.), Proceedings of the sixth international conference on principles of knowledge representation and reasoning (KR’98) (pp. 2–13). Trento: Morgan Kaufmann.
Calvanese, D., Giacomo, G. D., Lenzerini, M., Nardi, D., & Rosati, R. (2001). Data Integration in Data Warehousing. International Journal of Cooperative Information Systems (IJCIS), 10(3), 237–271.
Casanova, M.A., Vidal, V.M.P. (1983). Towards a sound view integration methodology. In: Proc. 2nd ACM Symposium on Principles of Database Systems (PODS), pp. 36–47. ACM, Atlanta, GA.
Cattell, R. (2010). Scalable SQL and NoSQL data stores. SIGMOD Record, 39(4), 12–27.
Ceri, S., Pelagatti, G. (1984). Distributed databases: principles and systems. McGraw-Hill Book Company.
Collet, C., Huhns, M. N., & Shen, W. M. (1991). Resource integration using a large knowledge base in Carnot. IEEE Computer, 24(12), 55–62.
d’Aquin, M., & Motta, E. (2011). Watson, more than a semantic web search engine. Semantic Web Journal, 2(1), 55–63. http://www.semantic-web-journal.net/content/new-submission-watson-more-semantic-web-search-engine.
Ding, L., Finin, T., Joshi, A., Pan, R., Cost, R., Peng, Y., Reddivari, P. et al. (2004). Swoogle: a search and metadata engine for the semantic web. In: Proc. CIKM.
Do, H.H., Rahm, E. (2002). COMA - a system for flexible combination of schema matching approaches. In: Proc. 28th Intl. Conference on Very Large Data Bases (VLDB), pp. 610–621. Morgan Kaufmann, Hong Kong, China.
Dolk, D.R. (1988). Model management and structured modeling: the role of an information resource dictionary system. Communications of the ACM 31(6).
Duchateau, F., Coletta, R., Bellahsene, Z., & Miller, R. J. (2009). (Not) yet another matcher. In D. W. L. Cheung, I. Y. Song, W. W. Chu, X. Hu, & J. J. Lin (Eds.), Proc. 18th ACM conference on information and knowledge management (CIKM) (pp. 1537–1540). Hong Kong: ACM.
Euzenat, J., Meilicke, C., Stuckenschmidt, H., Shvaiko, P., & dos Santos, C. T. (2011). Ontology alignment evaluation initiative: 6 years of experience. Journal on Data Semantics, 15, 158–192.
Fagin, R., Kolaitis, P., Miller, R. J., & Popa, L. (2005a). Data exchange: Semantics and query answering. Theoretical Computer Science, 336, 89–124.
Fagin, R., Kolaitis, P. G., Popa, L., & Tan, W. C. (2005b). Composing schema mappings: Second-order dependencies to the rescue. ACM Transactions on Database Systems, 30(4), 994–1055.
Fagin, R., Haas, L.M., Hernández, M.A., Miller, R.J., Popa, L., Velegrakis, Y. (2009). Clio: Schema mapping creation and data exchange. In: A. Borgida, V.K. Chaudhri, P. Giorgini, E.S.K. Yu (eds.) Conceptual Modeling: Foundations and Applications, Lecture Notes in Computer Science, vol. 5600, pp. 198–236. Springer.
Fuxman, A., Hernández, M.A., Ho, C.T.H., Miller, R.J., Papotti, P., Popa, L. (2006). Nested mappings: Schema mapping reloaded. In: U. Dayal, K.Y. Whang, D.B. Lomet, G. Alonso, G.M. Lohman, M.L. Kersten, S.K. Cha, Y.K. Kim (eds.) Proc. 32nd Intl. Conference on Very Large Data Bases (VLDB), pp. 67–78. ACM Press.
Garcia-Molina, H., Papakonstantinou, Y., Quass, D., Rajaraman, A., Sagiv, Y., Ullman, J. D., et al. (1997). The TSIMMIS approach to mediation: Data models and languages. Journal of Intelligent Information Systems, 8(2), 117–132.
Geisler, S., Quix, C., Schiffer, S., & Jarke, M. (2012). An evaluation framework for traffic information systems based on data streams. Transportation Research Part C, 23, 29–55.
Genesereth, M. R., Keller, A. M., & Duschka, O. M. (1997). Infomaster: An information integration system. In J. Peckham (Ed.), Proceedings of the ACM SIGMOD international conference on management of data (pp. 539–542). Tucson: ACM Press.
Haas, L. M. (2007). Beauty and the beast: The theory and practice of information integration. In T. Schwentick & D. Suciu (Eds.), ICDT, lecture notes in computer science (Vol. 4353, pp. 28–43). Barcelona: Springer.
Haas, L. M., Hernández, M. A., Ho, H., Popa, L., & Roth, M. (2005). Clio grows up: From research prototype to industrial tool. In F. Ozcan (Ed.), Proceedings of the ACM SIGMOD international conference on management of data (pp. 805–810). Baltimore: ACM.
Halevy, A. Y. (2001). Answering queries using views: A survey. VLDB Journal, 10(4), 270–294.
Halevy, A. Y., Ives, Z. G., Madhavan, J., Mork, P., Suciu, D., & Tatarinov, I. (2004). The Piazza peer data management system. IEEE Transactions on Knowledge and Data Engineering, 16(7), 787–798. doi:10.1109/TKDE.2004.1318562.
Haslhofer, B., Klas, W. (2010). A survey of techniques for achieving metadata interoperability. ACM Comput. Surv. 42(2).
Hernández, M.A., Miller, R.J., Haas, L.M. (2001). Clio: A semi-automatic tool for schema mapping. In: Proc. ACM SIGMOD Intl. Conference on the Management of Data, p. 607. ACM Press, Santa Barbara, CA.
ISO/IEC (1990). Information technology—Information Resource Dictionary System (IRDS) framework. International Standard ISO/IEC 10027:1990, ISO International Organization for Standardization.
ISO/IEC (2005). Information technology - Meta Object Facility (MOF). International Standard ISO/IEC 19502:2005, ISO International Organization for Standardization.
Jarke, M., Gallersdörfer, R., Jeusfeld, M. A., & Staudt, M. (1995). ConceptBase - a deductive object base for meta data management. Journal of Intelligent Information Systems, 4(2), 167–192.
Jarke, M., Jeusfeld, M. A., Quix, C., & Vassiliadis, P. (1999). Architecture and Quality in Data Warehouses: An Extended Repository Approach. Information Systems, 24(3), 229–253.
Jarke, M., Lenzerini, M., Vassiliou, Y., Vassiliadis, P. (eds.) (2003). Fundamentals of data warehouses, 2 edn. Springer-Verlag.
Jarke, M., Jeusfeld, M., Nissen, H., Quix, C., Staudt, M. (2009). Metamodelling with Datalog and classes: ConceptBase at the age of 21. In: Proc. 2nd Intl. Conf. Object Databases (ICOODB 09), pp. 95–112. Springer-Verlag.
Jean-Mary, Y. R., Shironoshita, E. P., & Kabuka, M. R. (2009). Ontology matching with semantic verification. Journal of Web Semantics, 7(3), 235–251.
Jeusfeld, M.A. (1992). Änderungskontrolle in deduktiven Objektbanken. PhD thesis, Universität Passau.
Jeusfeld, M. A., & Johnen, U. A. (1995). An executable meta model for reengineering of database schemas. Intl. Journal of Cooperative Information Systems, 4(2–3), 237–258.
Kensche, D., Quix, C., Chatti, M.A., Jarke, M. (2007). GeRoMe: A generic role based metamodel for model management. Journal on Data Semantics VIII, 82–117.
Kensche, D., Quix, C., Li, X., Li, Y. (2007). GeRoMeSuite: A system for holistic generic model management. In: C. Koch, J. Gehrke, M.N. Garofalakis, D. Srivastava, K. Aberer, A. Deshpande, D. Florescu, C.Y. Chan, V. Ganti, C.C. Kanne, W. Klas, E.J. Neuhold (eds.) Proceedings 33rd Intl. Conf. on Very Large Data Bases (VLDB), pp. 1322–1325. Vienna, Austria.
Kensche, D., Quix, C., Li, X., Li, Y., & Jarke, M. (2009). Generic schema mappings for composition and query answering. Data & Knowledge Engineering, 68(7), 599–621. doi:10.1016/j.datak.2009.02.006.
Kerner, B., Demir, C., Herrtwich, R., Klenov, S., Rehborn, H., Aleksic, M. et al. (2005). Traffic state detection with floating car data in road networks. In: Proceedings of the 8th International IEEE Conference on Intelligent Transportation Systems, pp. 700–705. Daimler Chrysler AG.
Kirk, T., Levy, A.Y., Sagiv, Y., Srivastava, D. (1995). The Information Manifold. In: Proceedings of the AAAI 1995 Spring Symposium on Information Gathering from Heterogeneous, Distributed Environments, pp. 85–91.
Lenzerini, M. (2002). Data integration: A theoretical perspective. In L. Popa (Ed.), Proc. 21st ACM symposium on principles of database systems (PODS) (pp. 233–246). Madison: ACM Press. doi:10.1145/543613.543644.
Li, X., & Quix, C. (2011). Merging relational views: A minimization approach. In M. A. Jeusfeld, L. M. L. Delcambre, & T. W. Ling (Eds.), Proc. 30th intl. conference on conceptual modeling (ER 2011) (Lecture Notes in Computer Science, Vol. 6998, pp. 379–392). Brussels: Springer.
Li, X., Quix, C., Kensche, D., & Geisler, S. (2010). Automatic schema merging using mapping constraints among incomplete sources. In J. Huang, N. Koudas, G. J. F. Jones, X. Wu, K. Collins-Thompson, & A. An (Eds.), Proc. 19th ACM conf. on information and knowledge management (CIKM) (pp. 299–308). Toronto: ACM.
Li, X., Quix, C., Kensche, D., Geisler, S., & Guo, L. (2011). Automatic generation of mediated schemas through reasoning over data dependencies. In S. Abiteboul, K. Böhm, C. Koch, & K. L. Tan (Eds.), Proc. 27th intl. conf. on data engineering (ICDE) (pp. 1280–1283). Hannover: IEEE Computer Society.
Madhavan, J., & Halevy, A. Y. (2003). Composing mappings among data sources. In J. C. Freytag, P. C. Lockemann, S. Abiteboul, M. J. Carey, P. G. Selinger, & A. Heuer (Eds.), Proc. of 29th intl. conference on very large data bases (VLDB) (pp. 572–583). Berlin: Morgan Kaufmann.
Melnik, S., Rahm, E., & Bernstein, P. A. (2003a). Developing metadata-intensive applications with Rondo. Journal of Web Semantics, 1(1), 47–74.
Melnik, S., Rahm, E., Bernstein, P.A. (2003b). Rondo: A programming platform for generic model management. In: Proc. ACM SIGMOD Intl. Conference on Management of Data, pp. 193–204. ACM, San Diego, CA.
Melnik, S., Bernstein, P. A., Halevy, A. Y., & Rahm, E. (2005). Supporting executable mappings in model management. In F. Ozcan (Ed.), Proceedings of the ACM SIGMOD international conference on management of data (pp. 167–178). Baltimore: ACM.
Miller, R. J., Haas, L. M., & Hernández, M. A. (2000). Schema mapping as query discovery. In A. E. Abbadi, M. L. Brodie, S. Chakravarthy, U. Dayal, N. Kamel, G. Schlageter, & K. Y. Whang (Eds.), Proc. 26th intl. conference on very large data bases (VLDB) (pp. 77–88). Cairo: Morgan Kaufmann.
Mork, P., Bernstein, P.A., Melnik, S. (2007). Teaching a schema translator to produce o/r views. In: Proc. 26th Intl. Conf. on Conceptual Modeling (ER’07), LNCS, vol. 4801, pp. 102–119. Springer.
Mylopoulos, J., Borgida, A., Jarke, M., & Koubarakis, M. (1990). Telos: Representing Knowledge About Information Systems. ACM Transactions on Information Systems, 8(4), 325–362.
Nardi, D., Brachman, R.J. (2003). An introduction to description logics. In: F. Baader, D. Calvanese, D.L. McGuinness, D. Nardi, P.F. Patel-Schneider (eds.) Description Logic Handbook. Cambridge University Press.
Nissen, H. W., & Jarke, M. (1999). Repository Support for Multi-Perspective Requirements Engineering. Information Systems, 24(2), 131–158.
Parent, C., & Spaccapietra, S. (1998). Issues and approaches of database integration. Communications of the ACM, 41(5), 166–178.
Pottinger, R., & Bernstein, P. A. (2003). Merging models based on given correspondences. In J. C. Freytag, P. C. Lockemann, S. Abiteboul, M. J. Carey, P. G. Selinger, & A. Heuer (Eds.), Proc. of 29th intl. conference on very large data bases (VLDB) (pp. 826–873). Berlin: Morgan Kaufmann.
Pottinger, R., & Halevy, A. Y. (2001). MiniCon: A scalable algorithm for answering queries using views. VLDB Journal, 10(2–3), 182–198.
Quix, C. (2009). Meta data repository. In: L. Liu, M.T. Ozsu (eds.) Encyclopedia of Database Systems, pp. 1718–1722. Springer.
Quix, C., Kensche, D., & Li, X. (2007). Matching of ontologies with XML schemas using a generic metamodel. In R. Meersman & Z. Tari (Eds.), Proc. OTM confederated international conf. CoopIS/DOA/ODBASE/GADA/IS (Lecture Notes in Computer Science, Vol. 4803, pp. 1081–1098). Vilamoura: Springer.
Quix, C., Geisler, S., Kensche, D., & Li, X. (2008). Results of GeRoMeSuite for OAEI 2008. In: Proc. 3rd Intl. Workshop on Ontology Matching (OM2008). URL http://data.semanticweb.org/workshop/om/2008/paper/main/13.
Quix, C., Geisler, S., Kensche, D., & Li, X. (2009). Results of GeRoMeSuite for OAEI 2009. In P. Shvaiko, J. Euzenat, F. Giunchiglia, H. Stuckenschmidt, N. F. Noy, & A. Rosenthal (Eds.), Proc. 4th intl. workshop on ontology matching (CEUR Workshop Proceedings, Vol. 551). Chantilly: CEUR-WS.org.
Quix, C., Roy, P., Kensche, D. (2011). Automatic selection of background knowledge for ontology matching. In: Proc. Intl. Workshop on Semantic Web Information Management (SWIM), pp. 5:1–5:7. ACM, New York, NY, USA.
Rahm, E., & Bernstein, P. A. (2001). A survey of approaches to automatic schema matching. VLDB Journal, 10(4), 334–350.
Ramesh, B., & Jarke, M. (2001). Toward Reference Models of Requirements Traceability. IEEE Transactions on Software Engineering, 27(1), 58–93.
Richardson, J., Schwarz, P. (1991). Aspects: extending objects to support multiple, independent roles. In: Proc. ACM SIGMOD Intl. Conference on Management of Data, pp. 298–307. Denver, CO.
Shmueli, O. (1993). Equivalence of Datalog queries is undecidable. Journal of Logic Programming, 15(3), 231–241.
Shu, N. C., Housel, B. C., Taylor, R. W., Ghosh, S. P., & Lum, V. Y. (1977). EXPRESS: A Data EXtraction, Processing, and REStructuring System. ACM Transactions on Database Systems, 2(2), 134–174.
Shvaiko, P., Euzenat, J. (2005). A survey of schema-based matching approaches. Journal on Data Semantics IV, 146–171. LNCS 3730.
Shvaiko, P., Euzenat, J. (2012). Ontology matching: State of the art and future challenges. IEEE Transactions on Knowledge and Data Engineering. To appear, preprint available at http://www.dit.unitn.it/~p2p/RelatedWork/Matching/SurveyOMtkde_SE.pdf.
Singh, M. P., Cannata, P. E., Jacobs, N., Ksiezyk, T., Ong, K., Sheth, A. P., et al. (1997). The Carnot heterogeneous database project: Implemented applications. Distributed and Parallel Databases, 5(2), 207–225.
Smith, M. (2007). Toward enterprise information integration. Software Magazine. URL http://www.softwaremag.com/content/ContentCT.aspP=3034.
Spaccapietra, S., & Parent, C. (1994). View integration: A step forward in solving structural conflicts. IEEE Transactions on Knowledge and Data Engineering, 6(2), 258–274.
Staudt, M., & Jarke, M. (2000). View Management Support in Advanced Knowledge Base Servers. Journal Intelligent Information Systems, 15(3), 253–285.
Stonebraker, M. (2010). SQL databases v. NoSQL databases. Communications of the ACM, 53(4), 10–11.
Stubing, H., Bechler, M., Heussner, D., May, T., Radusch, I., Rechner, H., et al. (2010). simTD: A car-to-X system architecture for field operational tests. IEEE Communications Magazine, 48(5), 148–154.
Vogels, W. (2007). Data access patterns in the amazon.com technology platform. In: C. Koch, J. Gehrke, M.N. Garofalakis, D. Srivastava, K. Aberer, A. Deshpande, D. Florescu, C.Y. Chan, V. Ganti, C.C. Kanne, W. Klas, E.J. Neuhold (eds.) Proceedings 33rd Intl. Conf. on Very Large Data Bases (VLDB), p. 1. Vienna, Austria.
Wiederhold, G. (1992). Mediators in the architecture of future information systems. IEEE Computer, 25(3), 38–49.
Wiederhold, G. (Ed.). (1996). Special Issue on Intelligent Integration of Information. Journal of Intelligent Information Systems, 6(2–3), 93–291.
Wong, R. K., Chau, H. L., & Lochovsky, F. H. (1997). A data model and semantics of objects with dynamic roles. In A. Gray & P. A. Larson (Eds.), Proceedings of the 13th international conference on data engineering (ICDE) (pp. 402–411). Birmingham: IEEE Computer Society.
Zhou, G., Hull, R., & King, R. (1996). Generating data integration mediators that use materialization. Journal of Intelligent Information Systems, 6(2–3), 199–221.
Cite this article
Jarke, M., Jeusfeld, M. & Quix, C. Data-centric intelligent information integration—from concepts to automation. J Intell Inf Syst 43, 437–462 (2014). https://doi.org/10.1007/s10844-014-0340-5
DOI: https://doi.org/10.1007/s10844-014-0340-5