
A metadata approach to statistical query processing

Statistics and Computing 6, 11–29 (1996)

Abstract

This paper investigates a formal metadata approach to the integration of census and survey data from different sources, a task carried out by supranational statistical agencies. The approach supports data integration and table processing simultaneously. To this end, a metadata model is devised such that statistical query processing is accomplished by symbolic reasoning on machine-readable, operative metadata. As in databases, statistical queries are stated as formal expressions that specify declaratively what the intended output is; the operations needed to retrieve the appropriate source data and to aggregate them into the requested macrodata are derived mechanically. Using simple mathematics, the paper focuses on the metadata model devised to harmonize semantically related data sources and on the table model that provides the principal data structure of the proposed system. The general design of a statistical information system based on this metadata model is only outlined, and the state of development is summarized briefly.
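The idea of deriving retrieval and aggregation steps mechanically from metadata can be illustrated with a small sketch. This is not the metadata model or table model of the paper, which the abstract does not specify; the source names, the two-level classification (district, region), the figures, and the functions `plan` and `execute` are all hypothetical and chosen only to show a declarative query being answered by reasoning over metadata.

```python
from collections import defaultdict

# Hypothetical machine-readable metadata: each source declares what it
# measures and at which classification level its figures are reported.
SOURCES = {
    "census_A": {"measure": "population", "level": "district",
                 "data": {"D1": 120, "D2": 80, "D3": 200}},
    "survey_B": {"measure": "population", "level": "region",
                 "data": {"R2": 150}},
}

# Hypothetical classification hierarchy mapping district codes to regions.
ROLLUP = {("district", "region"): {"D1": "R1", "D2": "R1", "D3": "R2"}}


def plan(query):
    """Derive, from metadata alone, which sources can answer the query
    and which roll-up (if any) each source needs."""
    steps = []
    for name, meta in SOURCES.items():
        if meta["measure"] != query["measure"]:
            continue  # source measures something else
        if meta["level"] == query["level"]:
            steps.append((name, None))  # usable as stored
        elif (meta["level"], query["level"]) in ROLLUP:
            steps.append((name, (meta["level"], query["level"])))
    return steps


def execute(query):
    """Run the derived plan and return one macrodata table per source."""
    result = {}
    for name, rollup_key in plan(query):
        data = SOURCES[name]["data"]
        if rollup_key is None:
            result[name] = dict(data)
            continue
        mapping = ROLLUP[rollup_key]
        table = defaultdict(int)
        for code, value in data.items():
            table[mapping[code]] += value  # aggregate to the coarser level
        result[name] = dict(table)
    return result


# The query states only what is wanted, not how to compute it.
query = {"measure": "population", "level": "region"}
tables = execute(query)
```

Here the query names only a measure and a target level; whether a source is used directly or aggregated first follows from the metadata entries, which is the division of labour the abstract describes in general terms.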



About this article

Cite this article

Froeschl, K.A. A metadata approach to statistical query processing. Stat Comput 6, 11–29 (1996). https://doi.org/10.1007/BF00161570


  • DOI: https://doi.org/10.1007/BF00161570
