SLIQ: A fast scalable classifier for data mining

  • Data Mining
  • Conference paper
Advances in Database Technology — EDBT '96 (EDBT 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1057)

Abstract

Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most of the classification algorithms are designed only for memory-resident data, thus limiting their suitability for data mining large data sets. This paper discusses issues in building a scalable classifier and presents the design of SLIQ, a new classifier. SLIQ is a decision tree classifier that can handle both numeric and categorical attributes. It uses a novel pre-sorting technique in the tree-growth phase. This sorting procedure is integrated with a breadth-first tree growing strategy to enable classification of disk-resident datasets. SLIQ also uses a new tree-pruning algorithm that is inexpensive, and results in compact and accurate trees. The combination of these techniques enables SLIQ to scale for large data sets and classify data sets irrespective of the number of classes, attributes, and examples (records), thus making it an attractive tool for data mining.
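The pre-sorting idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the split criterion or data layout, so everything here is an assumption made for illustration: the Gini index as the impurity measure, midpoints between adjacent values as candidate thresholds, and the hypothetical helper names `presort` and `best_split`. The sketch sorts a numeric attribute once and then finds the best binary split for a single node in one linear scan; it does not attempt the breadth-first, disk-resident machinery or the pruning step.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity of a class histogram (assumed split criterion)."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def presort(records, attr_index):
    """Sort one numeric attribute once, keeping each value's record id."""
    return sorted((rec[attr_index], rid) for rid, rec in enumerate(records))


def best_split(attr_list, labels):
    """Scan a pre-sorted attribute list once.

    Returns (threshold, weighted_gini) for the best split "value <= threshold",
    or (None, inf) if all values are equal.
    """
    n = len(attr_list)
    below = Counter()  # class histogram of records at or before the scan position
    above = Counter(labels[rid] for _, rid in attr_list)  # remaining records
    best = (None, float("inf"))
    for i in range(n - 1):
        value, rid = attr_list[i]
        below[labels[rid]] += 1
        above[labels[rid]] -= 1
        next_value = attr_list[i + 1][0]
        if value == next_value:
            continue  # no threshold can separate equal values
        n_below = i + 1
        score = (n_below * gini(below, n_below)
                 + (n - n_below) * gini(above, n - n_below)) / n
        if score < best[1]:
            best = ((value + next_value) / 2, score)
    return best


# Toy data: one numeric attribute, two classes.
records = [(30,), (10,), (40,), (20,)]
labels = ["b", "a", "b", "a"]
attr_list = presort(records, 0)
print(best_split(attr_list, labels))
```

Because the attribute is sorted only once, the class histograms can be updated incrementally during the scan instead of re-sorting the data at every tree node, which is the cost a pre-sorting scheme is meant to avoid.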

Editor information

Peter Apers, Mokrane Bouzeghoub, Georges Gardarin

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mehta, M., Agrawal, R., Rissanen, J. (1996). SLIQ: A fast scalable classifier for data mining. In: Apers, P., Bouzeghoub, M., Gardarin, G. (eds) Advances in Database Technology — EDBT '96. EDBT 1996. Lecture Notes in Computer Science, vol 1057. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0014141

  • DOI: https://doi.org/10.1007/BFb0014141

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61057-1

  • Online ISBN: 978-3-540-49943-5

  • eBook Packages: Springer Book Archive
