Skip to main content

A Comparative Study of SAT-Based Itemsets Mining

  • Conference paper
  • First Online:
Research and Development in Intelligent Systems XXXIII (SGAI 2016)

Abstract

Mining frequent itemsets from transactional datasets is a well known problem. Thus, various methods have been studied to deal with this issue. Recently, original proposals have emerged from the cross-fertilization between data mining and artificial intelligence. In these declarative approaches, the itemset mining problem is modeled either as a constraint network or a propositional formula whose models correspond to the patterns of interest. In this paper, we focus on the propositional satisfiability based itemset mining framework. Our main goal is to enhance the efficiency of SAT model enumeration algorithms. This issue is particularly crucial for the scalability and competitiveness of such declarative itemset mining approaches. In this context, we deeply analyse the effect of the different SAT solver components on the efficiency of the model enumeration problem. Our analysis includes the main components of modern SAT solvers such as restarts, activity based variable ordering heuristics and clauses learning mechanism. Through extensive experiments, we show that these classical components play an essential role in such procedure to improve the performance by pushing forward the efficiency of SAT solvers. More precisely, our experimental evaluation includes a comparative study in enumerating all the models corresponding to the closed frequent itemsets. Additionally, our experimental analysis is extended to include the Top-k itemset mining problem.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    FIMI: http://fimi.ua.ac.be/data/.

  2. 2.

    CP4IM: http://dtai.cs.kuleuven.be/CP4IM/datasets/.

References

  1. Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. SIGMOD ’93, pp. 207–216. ACM, New York, NY, USA (1993)

    Google Scholar 

  2. Asín, R., Nieuwenhuis, R., Oliveras, A., Rodríguez-Carbonell, E.: Cardinality networks: a theoretical and empirical study. Constraints 16(2), 195–221 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  3. Bailleux, O., Boufkhad, Y.: Efficient CNF encoding of boolean cardinality constraints. In: CP, pp. 108–122 (2003)

    Google Scholar 

  4. Biere, A., Heule, M.J.H., van Maaren, H., Walsh, T. (eds.): Handbook of Satisfiability, Frontiers in AI and Applications, vol. 185. IOS Press (2009)

    Google Scholar 

  5. Chauhan, P., Clarke, E.M., Kroening, D.: Using sat based image computation for reachability analysis. Tech. rep., Technical Report CMU-CS-03-151 (2003)

    Google Scholar 

  6. Coquery, E., Jabbour, S., Saïs, L., Salhi, Y.: A SAT-based approach for discovering frequent, closed and maximal patterns in a sequence. In: Proceedings of the 20th European Conference on Artificial Intelligence (ECAI’12), pp. 258–263 (2012)

    Google Scholar 

  7. Davis, M., Logemann, G., Loveland, D.W.: A machine program for theorem-proving. Commun. ACM 5(7), 394–397 (1962)

    Article  MathSciNet  MATH  Google Scholar 

  8. Fu, A.W.C., Kwong, R.W.W., Tang, J.: Mining n-most interesting itemsets. In: Proceedings of the 12th International Symposium on Methodologies for Intelligent Systems (ISMIS 2000), Lecture Notes in Computer Science, pp. 59–67. Springer (2000)

    Google Scholar 

  9. Gebser, M., Kaufmann, B., Neumann, A., Schaub, T.: Conflict-driven answer set enumeration. In: Baral, C., Brewka, G., Schlipf, J. (eds.) Logic Programming and Nonmonotonic Reasoning. Lecture notes in computer science, vol. 4483, pp. 136–148. Springer, Berlin (2007)

    Chapter  Google Scholar 

  10. Guns, T., Nijssen, S., Raedt, L.D.: Itemset mining: a constraint programming perspective. Artif. Intell. 175(12–13), 1951–1983 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  11. Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. SIGMOD Rec. 29, 1–12 (2000)

    Article  Google Scholar 

  12. Han, J., Wang, J., Lu, Y., Tzvetkov, P.: Mining top-k frequent closed patterns without minimum support. In: Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), 9–12 December 2002, Maebashi City, Japan, pp. 211–218 (2002)

    Google Scholar 

  13. Jabbour, S., Lonlac, J., Sais, L., Salhi, Y.: Extending modern SAT solvers for models enumeration. In: Proceedings of the 15th IEEE International Conference on Information Reuse and Integration, IRI 2014, Redwood City, CA, USA, August 13–15, 2014, pp. 803–810 (2014)

    Google Scholar 

  14. Jabbour, S., Sais, L., Salhi, Y.: Boolean satisfiability for sequence mining. In: 22nd ACM International Conference on Information and Knowledge Management (CIKM’13), pp. 649–658. ACM (2013)

    Google Scholar 

  15. Jabbour, S., Sais, L., Salhi, Y.: A pigeon-hole based encoding of cardinality constraints. TPLP 13(4-5-Online-Supplement) (2013)

    Google Scholar 

  16. Jabbour, S., Sais, L., Salhi, Y.: The top-k frequent closed itemset mining using top-k sat problem. In: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD’03), pp. 403–418 (2013)

    Google Scholar 

  17. Jeroslow, R.G., Wang, J.: Solving propositional satisfiability problems. Ann. Math. Artif. Intell. 1, 167–187 (1990)

    Article  MATH  Google Scholar 

  18. Jin, H., Han, H., Somenzi, F.: Efficient conflict analysis for finding all satisfying assignments of a boolean circuit. In: In TACAS’05, LNCS 3440, pp. 287–300. Springer (2005)

    Google Scholar 

  19. Marques-Silva, J.P., Sakallah, K.A.: GRASP—A new search algorithm for satisfiability. In: Proceedings of IEEE/ACM CAD, pp. 220–227 (1996)

    Google Scholar 

  20. McMillan, K.L.: Applying sat methods in unbounded symbolic model checking. In: Proceedings of the 14th International Conference on Computer Aided Verification (CAV’02), pp. 250–264 (2002)

    Google Scholar 

  21. Morgado, A.R., Marques-Silva, J.A.P.: Good Learning and Implicit Model Enumeration. In: International Conference on Tools with Artificial Intelligence (ICTAI’2005), pp. 131–136. IEEE (2005)

    Google Scholar 

  22. Pei, J., Han, J., Lu, H., Nishio, S., Tang, S., Yang, D.: H-mine: hyper-structure mining of frequent patterns in large databases. In: Proceedings IEEE International Conference on Data Mining, 2001. ICDM 2001, pp. 441–448 (2001)

    Google Scholar 

  23. Raedt, L.D., Guns, T., Nijssen, S.: Constraint programming for itemset mining. In: ACM SIGKDD, pp. 204–212 (2008)

    Google Scholar 

  24. Silva, J.P.M., Lynce, I.: Towards robust cnf encodings of cardinality constraints. In: CP, pp. 483–497 (2007)

    Google Scholar 

  25. Sinz, C.: Towards an optimal cnf encoding of boolean cardinality constraints. In: CP’05, pp. 827–831 (2005)

    Google Scholar 

  26. Tiwari, A., Gupta, R., Agrawal, D.: A survey on frequent pattern mining: current status and challenging issues. Inf. Technol. J 9, 1278–1293 (2010)

    Article  Google Scholar 

  27. Tseitin, G.: On the complexity of derivations in the propositional calculus. In: H. Slesenko (ed.) Structures in Constructives Mathematics and Mathematical Logic, Part II, pp. 115–125 (1968)

    Google Scholar 

  28. Uno, T., Kiyomi, M., Arimura, H.: LCM ver. 2: Efficient mining algorithms for frequent/closed/maximal itemsets. In: FIMI ’04, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, Brighton, UK, November 1, 2004 (2004)

    Google Scholar 

  29. Zhang, L., Madigan, C.F., Moskewicz, M.W., Malik, S.: Efficient conflict driven learning in Boolean satisfiability solver. In: IEEE/ACM CAD’2001, pp. 279–285 (2001)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Said Jabbour .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Dlala, I.O., Jabbour, S., Sais, L., Yaghlane, B.B. (2016). A Comparative Study of SAT-Based Itemsets Mining. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXXIII. SGAI 2016. Springer, Cham. https://doi.org/10.1007/978-3-319-47175-4_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-47175-4_3

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47174-7

  • Online ISBN: 978-3-319-47175-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics