Pattern Detection and Discovery

Hand, David J.

doi:10.1007/3-540-45728-3_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2447)

Abstract

Data mining comprises two subdisciplines. One of these is based on statistical modelling, though the large data sets associated with data mining lead to new problems for traditional modelling methodology. The other, which we term pattern detection, is a new science. Pattern detection is concerned with defining and detecting local anomalies within large data sets, and tools and methods have been developed in parallel by several applications communities, typically with no awareness of developments elsewhere. Most of the work to date has focussed on the development of practical methodology, with little attention being paid to the development of an underlying theoretical base to parallel the theoretical base developed over the last century to underpin modelling approaches. We suggest that the time is now right for the development of a theoretical base, so that important common aspects of the work can be identified, so that key directions for future research can be characterised, and so that the various different application domains can benefit from the work in other areas. We attempt to describe a unified approach to the subject, and also attempt to provide a theoretical base on which future developments can stand.

Author information

Authors and Affiliations

Department of Mathematics, Imperial College, 180 Queen’s Gate, SW7 2BZ, London, UK
David J. Hand

Editor information

Editors and Affiliations

Department of Mathematics, Imperial College of Science, Technology and Medicine, Huxley Building, 180 Queen’s Gate, SW7 2BZ, London, UK
David J. Hand, Niall M. Adams & Richard J. Bolton

About this paper

Cite this paper

Hand, D.J. (2002). Pattern Detection and Discovery. In: Hand, D.J., Adams, N.M., Bolton, R.J. (eds) Pattern Detection and Discovery. Lecture Notes in Computer Science, vol 2447. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45728-3_1

DOI: https://doi.org/10.1007/3-540-45728-3_1
Published: 02 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44148-9
Online ISBN: 978-3-540-45728-2
eBook Packages: Springer Book Archive
