Constrained Predictive Clustering

Inductive Databases and Constraint-Based Data Mining

Abstract

In this chapter, we extend predictive clustering by introducing constraints on the clusters and predictive models. A domain expert is usually not only interested in the most compact clusters or the most accurate model; other factors, such as model size and prediction cost, may also be important. We will see how such factors can be controlled by means of constraints. In predictive clustering trees, constraints can be imposed both from the clustering and the prediction point of view. We present an overview of various constraint types and look into algorithms for enforcing them.

Author information

Corresponding author

Correspondence to Jan Struyf.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Struyf, J., Džeroski, S. (2010). Constrained Predictive Clustering. In: Džeroski, S., Goethals, B., Panov, P. (eds) Inductive Databases and Constraint-Based Data Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7738-0_7

  • DOI: https://doi.org/10.1007/978-1-4419-7738-0_7

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-7737-3

  • Online ISBN: 978-1-4419-7738-0

  • eBook Packages: Computer Science, Computer Science (R0)
