Abstract
Contrast sets have been shown to be a useful mechanism for describing differences between groups. A contrast set is a conjunction of attribute-value pairs whose distribution differs significantly across groups. These groups are defined by a selected property that distinguishes one from another (e.g., customers who default on their mortgage versus those who do not). In this paper, we propose a new search algorithm that uses a vertical approach for mining maximal contrast sets on categorical and quantitative data. We utilize a novel yet simple discretization technique, akin to simple binning, for continuous-valued attributes. Our experiments on real datasets demonstrate that our approach is more efficient than two previously proposed algorithms, and more effective in filtering interesting contrast sets.
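To make the definition concrete, the following is a minimal sketch of how the support of a candidate contrast set might be compared across groups using a vertical data layout, in which each attribute-value pair maps to the set of row ids that contain it. It is not the algorithm proposed in the paper: the function names, the toy mortgage data, and the use of a raw support gap (rather than a statistical significance test) are all assumptions made for illustration, and the discretization of quantitative attributes is not shown.

```python
from collections import defaultdict


def build_vertical(rows):
    """Map each (attribute, value) pair to the set of row ids containing it."""
    tidsets = defaultdict(set)
    for tid, row in enumerate(rows):
        for attr, value in row.items():
            tidsets[(attr, value)].add(tid)
    return tidsets


def group_supports(itemset, tidsets, group_tids):
    """Fraction of each group's rows that match every pair in `itemset`."""
    # A conjunction of attribute-value pairs is an intersection of row-id sets.
    matching = set.intersection(*(tidsets[item] for item in itemset))
    return {g: len(matching & tids) / len(tids) for g, tids in group_tids.items()}


# Hypothetical toy data: six customers, split into two groups by row id.
rows = [
    {"income": "low", "owns_home": "no"},
    {"income": "low", "owns_home": "no"},
    {"income": "high", "owns_home": "yes"},
    {"income": "low", "owns_home": "yes"},
    {"income": "high", "owns_home": "yes"},
    {"income": "high", "owns_home": "no"},
]
group_tids = {"default": {0, 1, 3}, "no_default": {2, 4, 5}}

tidsets = build_vertical(rows)
candidate = [("income", "low"), ("owns_home", "no")]
supports = group_supports(candidate, tidsets, group_tids)

# Absolute difference in support between the two groups.
support_gap = abs(supports["default"] - supports["no_default"])
print(supports, support_gap)
```

In this toy example the candidate matches two of the three rows in the default group and none in the other group. A real miner would additionally test whether such a gap is statistically significant before reporting the candidate as a contrast set.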
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Simeon, M., Hilderman, R. (2011). COSINE: A Vertical Group Difference Approach to Contrast Set Mining. In: Butz, C., Lingras, P. (eds) Advances in Artificial Intelligence. Canadian AI 2011. Lecture Notes in Computer Science, vol 6657. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21043-3_43
DOI: https://doi.org/10.1007/978-3-642-21043-3_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21042-6
Online ISBN: 978-3-642-21043-3
eBook Packages: Computer Science (R0)