
Generational analysis of tension and entropy in data structures: impact on automatic data integration and on the semantic web

  • Regular Paper
Knowledge and Information Systems

Abstract

The move toward automatic data integration from autonomous and heterogeneous sources is viewed as a transition from a closed to an open system, which is in essence an adaptive information processing system. Data definition languages from computing eras spanning almost 50 years are examined to assess whether they have moved from a closed-systems to an open-systems paradigm. Using measurements of Variety, Tension and Entropy, three characteristics of complex adaptive systems (CAS), the study proves that contemporary data definition languages are indistinguishable from older ones. The conclusion is that even contemporary data definition languages designed for such integration exhibit closed-system characteristics, with open-system aspirations only. Good will alone is insufficient to make them more suitable for automatic data integration than their oldest predecessors. A previous report and these new findings set the stage for the development and proposal of a mathematically sound data definition language based on CAS, one potentially better suited for automatic data integration from autonomous heterogeneous sources.



Author information

Corresponding author

Correspondence to Eli Rohn.

About this article

Cite this article

Rohn, E. Generational analysis of tension and entropy in data structures: impact on automatic data integration and on the semantic web. Knowl Inf Syst 28, 175–196 (2011). https://doi.org/10.1007/s10115-010-0314-z

  • DOI: https://doi.org/10.1007/s10115-010-0314-z
