Abstract
Contrast set mining is a data mining task that aims to discern differences across groups. These groups can be patients, organizations, molecules, or even timelines. A valid correlated contrast set is a conjunction of attribute-value pairs that are highly correlated with each other and whose distribution differs significantly across groups. Although the search for valid correlated contrast sets produces fewer results than the search for valid contrast sets, these results must still be filtered further before a domain expert can examine them and act on them. In this paper, we apply the minimum support ratio threshold, which is based on the ratio of maximum to minimum support across groups. We propose a contrast set mining technique that uses the minimum support ratio threshold to discover maximal valid correlated contrast sets. We also demonstrate how four probability-based objective measures developed for association rules can be used to rank contrast sets. Our experiments on real datasets demonstrate the efficiency and effectiveness of our approach.
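The support ratio mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the support of a contrast set in a group is the fraction of that group's records containing every attribute-value pair, and that the ratio divides the largest group support by the smallest. The function names, the toy data, and the handling of zero support are hypothetical choices made for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

Item = Tuple[str, str]      # one (attribute, value) pair
Record = Dict[str, str]     # one row of a group


def support(contrast_set: Set[Item], rows: List[Record]) -> float:
    """Fraction of rows in one group that contain every pair in the set."""
    if not rows:
        return 0.0
    hits = sum(all(row.get(a) == v for a, v in contrast_set) for row in rows)
    return hits / len(rows)


def support_ratio(contrast_set: Set[Item],
                  groups: Dict[str, List[Record]]) -> float:
    """Ratio of the maximum to the minimum support across groups."""
    supports = [support(contrast_set, rows) for rows in groups.values()]
    lowest, highest = min(supports), max(supports)
    if lowest == 0.0:
        # Assumed convention: infinite ratio if the set occurs in some group
        # but not in another, zero if it occurs nowhere.
        return float("inf") if highest > 0.0 else 0.0
    return highest / lowest


# Toy data (hypothetical): two groups of four records each.
groups = {
    "A": [{"smoker": "yes"}, {"smoker": "yes"},
          {"smoker": "yes"}, {"smoker": "no"}],
    "B": [{"smoker": "yes"}, {"smoker": "no"},
          {"smoker": "no"}, {"smoker": "no"}],
}
candidate = {("smoker", "yes")}
ratio = support_ratio(candidate, groups)  # supports are 0.75 and 0.25

MIN_SUPPORT_RATIO = 2.0  # hypothetical threshold value
keep = ratio >= MIN_SUPPORT_RATIO
```

Under these assumptions, a candidate is kept only when its support ratio reaches the chosen threshold, so sets whose support is similar in every group are filtered out before ranking.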
Copyright information
© 2012 Springer-Verlag London
About this paper
Cite this paper
Simeon, M., Hilderman, R.J., Hamilton, H.J. (2012). Mining Interesting Correlated Contrast Sets. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXIX. SGAI 2012. Springer, London. https://doi.org/10.1007/978-1-4471-4739-8_4
DOI: https://doi.org/10.1007/978-1-4471-4739-8_4
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4738-1
Online ISBN: 978-1-4471-4739-8
eBook Packages: Computer Science, Computer Science (R0)