Abstract
Recently, several constraint programming (CP) and propositional satisfiability (SAT) based encodings have been proposed for various data mining problems, including itemset and sequence mining. This line of research makes it possible to model data mining problems declaratively while exploiting efficient, generic solving techniques. In practice, for large datasets, these encodings usually lead to constraint networks or Boolean formulas of huge size. Space complexity is clearly identified as the main bottleneck limiting the competitiveness of these new declarative and flexible models w.r.t. specialized data mining approaches. In this paper, we address this issue by considering SAT-based encodings of itemset mining problems. By partitioning the transaction database, we propose a new encoding framework for SAT-based itemset mining problems. Experimental results on several known datasets show significant improvements, up to several orders of magnitude.
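To make the size concern concrete, the following is a minimal sketch of the cover constraint commonly used in SAT-based itemset mining: a variable p_a for each item a, a variable q_i for each transaction T_i, and q_i true exactly when no selected item is absent from T_i. It is not the decomposition-based encoding proposed in this paper, whose details the abstract does not give. The function names, the literal representation and the toy database are assumptions made for illustration, and the frequency (cardinality) constraint over the q_i is omitted.

```python
def cover_clauses(transactions, items):
    """CNF clauses for q_i <-> AND over items a not in T_i of (not p_a).

    A literal is a (variable, polarity) pair; p_a is ("p", a), q_i is ("q", i).
    Hypothetical helper, not taken from the paper.
    """
    clauses = []
    for i, t in enumerate(transactions):
        missing = [a for a in items if a not in t]
        # q_i -> (not p_a) for every item a absent from transaction i
        for a in missing:
            clauses.append([(("q", i), False), (("p", a), False)])
        # if no absent item is selected, then q_i must hold
        clauses.append([(("q", i), True)] + [(("p", a), True) for a in missing])
    return clauses


def assignment_for(itemset, transactions, items):
    """Truth assignment that selects `itemset` and marks the transactions covering it."""
    assignment = {("p", a): (a in itemset) for a in items}
    for i, t in enumerate(transactions):
        assignment[("q", i)] = set(itemset) <= set(t)
    return assignment


def satisfied(clauses, assignment):
    """True if every clause contains at least one literal made true by the assignment."""
    return all(any(assignment[var] == pol for var, pol in clause) for clause in clauses)


# Toy transaction database (assumed data, for illustration only).
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
items = sorted(set().union(*transactions))
clauses = cover_clauses(transactions, items)

# One clause per (transaction, absent item) pair plus one clause per transaction.
print(len(clauses))
print(satisfied(clauses, assignment_for({"a", "b"}, transactions, items)))
```

In this sketch the number of clauses is the number of (transaction, absent item) pairs plus the number of transactions, so on a sparse database it grows roughly with the product of the transaction count and the item count. That growth is the kind of space cost the abstract refers to.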
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Jabbour, S., Sais, L., Salhi, Y. (2015). Decomposition Based SAT Encodings for Itemset Mining Problems. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_52
DOI: https://doi.org/10.1007/978-3-319-18032-8_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18031-1
Online ISBN: 978-3-319-18032-8
eBook Packages: Computer Science, Computer Science (R0)