Abstract
An introduction to the approaches used to discretise continuous database features is given, together with a discussion of the potential benefits of such techniques. These benefits are investigated by applying discretisation algorithms to two large commercial databases; the resulting discretisations are then evaluated using a simulated annealing based data mining algorithm. The results suggest that dramatic reductions in problem size may be achieved, yielding improvements in the speed of the data mining algorithm. However, it is also demonstrated that, under certain circumstances, the discretisation produced may increase problem size or allow the data mining algorithm to overfit. Such cases, in which often only a small proportion of the database belongs to the class of interest, highlight the need both for caution when producing discretisations and for the development of more robust discretisation algorithms.
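The abstract does not specify which discretisation algorithms were applied; as a point of reference, the sketch below shows unsupervised equal-width binning, one of the simplest and most common baselines for converting a continuous feature into a small set of intervals. The function name, bin count, and sample data are illustrative assumptions, not details taken from the paper.

```python
def equal_width_bins(values, k):
    """Map each continuous value to one of k equal-width intervals (0..k-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # guard against a constant feature
    # Clamp so the maximum value falls in the last bin rather than bin k.
    return [min(int((v - lo) / width), k - 1) for v in values]

# Hypothetical continuous feature (e.g. customer age) reduced to 3 intervals.
ages = [23.0, 31.5, 38.2, 44.9, 51.0, 67.3]
print(equal_width_bins(ages, 3))  # → [0, 0, 1, 1, 1, 2]
```

Replacing a continuous attribute with a handful of interval labels like this is what shrinks the search space for the data mining algorithm, at the risk, noted in the abstract, of choosing interval boundaries that enlarge the problem or invite overfitting when the class of interest is rare.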
Cite this article
Debuse, J.C., Rayward-Smith, V.J. Discretisation of Continuous Commercial Database Features for a Simulated Annealing Data Mining Algorithm. Applied Intelligence 11, 285–295 (1999). https://doi.org/10.1023/A:1008339026836