In recent decades, technological innovation has made it simple and cheap to collect large, sometimes huge, amounts of information on a phenomenon of interest. There are two main reasons for this: the development of automatic methods of data acquisition, and progress in storage technology, which has driven down the related costs. This new environment affects all areas of human endeavor.
Every month a supermarket chain issues millions of receipts, one for each shopping trolley that checks out. The contents of each trolley summarize the needs, propensities and economic behavior of the customer who filled it. Taken together, these shopping lists form an important information base on which the supermarket can decide its sales and purchasing policies. Such an analysis becomes even more interesting when each shopping list is linked to the customer's loyalty card, which makes it possible to follow the behavior of an individual client by recording the purchases...
© 2011 Springer-Verlag Berlin Heidelberg
Scarpa, B. (2011). Data Mining. In: Lovric, M. (eds) International Encyclopedia of Statistical Science. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04898-2_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04897-5
Online ISBN: 978-3-642-04898-2
eBook Packages: Mathematics and Statistics; Reference Module Computer Science and Engineering