Perspective on Data Mining from Statistical Viewpoints

  • Conference paper
Knowledge Discovery and Data Mining. Current Issues and New Applications (PAKDD 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1805)

Abstract

Statistical data analysis has a long history, going back to the 1920s. Many fundamental concepts of multivariate statistical data analysis, especially the purely theoretical notions, had been established by the 1950s. Since the 1960s, practical applications of multivariate statistical data analysis have become feasible with the progress of computers, and these applications have in turn influenced theoretical considerations.

The basic process of data analysis is given as follows:

  p1) An objective of the data analysis is given.

  p2) The data that seem to be closely connected with the objective are observed (data sampling).

  p3) A model (or a set of models) is constructed to explain the variation of the data.

  p4) The original data are preprocessed (or transformed) to make the input data consistent with the model.

  p5) The model is identified based on the observed (input) data.

  p6) The goodness of fit is evaluated. If it is insufficient, return to p2) or p3); otherwise go on to the next step.

  p7) The result is interpreted and its validity is investigated.
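As a rough illustration of how steps p3) to p6) form a loop, the following Python sketch tries polynomial models of increasing degree until the fit is judged adequate. The choice of polynomial models, the R^2 measure, the threshold value, and the function name `fit_until_adequate` are all assumptions made for this example; the paper does not prescribe any of them.

```python
import numpy as np

def fit_until_adequate(x, y, max_degree=5, r2_threshold=0.95):
    """Hypothetical sketch of the p3)-p6) loop with polynomial models.

    Returns (degree, coefficients, r2) for the first adequate model,
    or None if no model up to max_degree reaches the threshold.
    """
    # p4) preprocess: standardize the input so it suits the polynomial fit
    x_std = (x - x.mean()) / x.std()
    ss_tot = np.sum((y - y.mean()) ** 2)

    for degree in range(1, max_degree + 1):        # p3) pick a candidate model
        coeffs = np.polyfit(x_std, y, degree)      # p5) identify the model
        residuals = y - np.polyval(coeffs, x_std)
        r2 = 1.0 - np.sum(residuals ** 2) / ss_tot # p6) evaluate goodness of fit
        if r2 >= r2_threshold:
            return degree, coeffs, r2              # adequate: go on to p7)
    return None                                    # inadequate: revisit p2) or p3)

# Usage with synthetic data that follow an exact quadratic
x = np.linspace(-1.0, 1.0, 50)
y = 2.0 * x ** 2 + 1.0
result = fit_until_adequate(x, y)
print(result[0] if result is not None else "no adequate model")
```

Returning `None` when no candidate is adequate mirrors the branch in p6) that sends the analyst back to collect other data or reconsider the model family.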

The main difference between “data mining” and statistical data analysis seems to lie in the concept of “data”. In data mining, the data are given in advance as a database. In statistical data analysis, by contrast, the data are observed according to the objective of the analysis.

On the other hand, the objective of “data mining” is to find effective (or valuable) information in the data. In the framework of statistical data analysis above, the main processes of data mining are p3), p4) and p5). However, the concept of “effective information” in data mining differs from the main part of the data variation in statistical data analysis. For instance, in principal component analysis, the main part of the data variation is captured by the first principal component, which accounts for the largest proportion of the variance. In data mining, however, the major variation of the data is of no interest, because the knowledge obtained from it is trivial. Data mining therefore seems to be interested in the principal components with small proportions, in order to obtain unusual but valuable information. Hence, statistical data analysis of residual data, obtained by removing the main part of the data variation from the original data, will be useful for data mining.
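The idea of analysing residual data can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the principal components are computed with a singular value decomposition of the centered data, the synthetic data set and the planted unusual record are invented for the example, and the function name `pca_residual` is hypothetical. It is not the author's procedure.

```python
import numpy as np

def pca_residual(X, n_main=1):
    """Remove the first n_main principal components from X.

    Returns the residual data and the proportion of variance
    accounted for by each principal component.
    """
    Xc = X - X.mean(axis=0)                          # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    proportion = s ** 2 / np.sum(s ** 2)             # variance share per component
    main_part = (U[:, :n_main] * s[:n_main]) @ Vt[:n_main]
    return Xc - main_part, proportion

# Synthetic data (an assumption for illustration): one dominant direction
# plus small noise, with one record deliberately moved off that direction.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t, -t]) + 0.05 * rng.normal(size=(200, 3))
X[17] += np.array([1.0, -1.0, 1.0])                  # hypothetical unusual record

residual, proportion = pca_residual(X, n_main=1)
scores = np.linalg.norm(residual, axis=1)            # size of each record's residual
print("proportion of first component:", proportion[0])
print("record with largest residual:", int(np.argmax(scores)))
```

In this sketch the first component carries almost all of the variance, so records that look ordinary along it can still stand out once it is removed, which is the kind of unusual information the abstract points to.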

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sato, Y. (2000). Perspective on Data Mining from Statistical Viewpoints. In: Terano, T., Liu, H., Chen, A.L.P. (eds) Knowledge Discovery and Data Mining. Current Issues and New Applications. PAKDD 2000. Lecture Notes in Computer Science, vol 1805. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45571-X_1

  • DOI: https://doi.org/10.1007/3-540-45571-X_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67382-8

  • Online ISBN: 978-3-540-45571-4

  • eBook Packages: Springer Book Archive
