Abstract
The move toward automatic data integration from autonomous and heterogeneous sources is viewed as a transition from a closed system to an open system, which is in essence an adaptive information processing system. Data definition languages from computing eras spanning almost 50 years are examined to assess whether they have moved from a closed-systems paradigm to an open-systems paradigm. Measured by Variety, Tension and Entropy, three characteristics of complex adaptive systems (CAS), contemporary data definition languages are shown to be indistinguishable from older ones. The conclusion is that even contemporary data definition languages designed for such integration exhibit closed-systems characteristics and have only open-systems aspirations. Good will alone does not make them more suitable for automatic data integration than their oldest predecessors. Together with a previous report, these findings set the stage for developing and proposing a mathematically sound data definition language based on CAS, one potentially better suited to automatic data integration from autonomous heterogeneous sources.
Cite this article
Rohn, E. Generational analysis of tension and entropy in data structures: impact on automatic data integration and on the semantic web. Knowl Inf Syst 28, 175–196 (2011). https://doi.org/10.1007/s10115-010-0314-z
DOI: https://doi.org/10.1007/s10115-010-0314-z