Abstract
A periodic high-utility sequential pattern (PHUSP) is a pattern that not only yields a high-utility (e.g. high profit) but also appears regularly in a sequence database. Finding PHUSPs is useful for several applications such as market basket analysis, where it can reveal recurring and profitable customer behavior. Although discovering PHUSPs is desirable, it is computationally difficult. To discover PHUSPs efficiently, this paper proposes a structure for periodic high-utility sequential pattern mining (PHUSPM) named PUSP. Furthermore, to reduce the search space and speed up PHUSPM, a pruning strategy is developed. This results in an efficient algorithm called periodic high-utility sequential pattern optimal miner (PUSOM). An experimental evaluation was performed on both synthetic and real-life datasets to compare the performance of PUSOM with state-of-the-art PHUSPM algorithms in terms of execution time, memory usage and scalability. Experimental results show that the PUSOM algorithm can efficiently discover the complete set of PHUSPs. Moreover, it outperforms the other four algorithms as the former can prune many unpromising patterns using its designed structure and pruning strategy.
Similar content being viewed by others
References
Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proceedings of the 11th international conference on data engineering, 1995, IEEE, pp 3–14. https://doi.org/10.1109/icde.1995.380415
Ahmed CF, Tanbeer SK, Jeong B-S (2010) A novel approach for mining high-utility sequential patterns in sequence databases. ETRI J 32(5):676–686. https://doi.org/10.4218/etrij.10.1510.0066
Amphawan K, Lenca P, Surarerks A (2009) Mining top-k periodic-frequent pattern from transactional databases without support threshold. In: Advances in information technology, pp 18–29. https://doi.org/10.1007/978-3-642-10392-6_3
Dinh T, Huynh V-N, Le B (2017) Mining periodic high utility sequential patterns. In: Asian conference on intelligent information and database systems. Springer, Berlin, pp 545–555. https://doi.org/10.1007/978-3-319-54472-4_51
Dinh T, Quang MN, Le B (2015). In: Proceedings of the sixth international symposium on information and communication technology. ACM, A novel approach for hiding high utility sequential patterns, pp 121–128. https://doi.org/10.1145/2833258.2833271
Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu C-W, Tseng VS et al (2014) SPMF: a Java open-source pattern mining library. J Mach Learn Res 15(1):3389–3393. https://doi.org/10.1007/978-3-319-46131-1_8
Fournier-Viger J, Lin C-W, Dinh T, Le HB (2016) Mining correlated high-utility itemsets using the bond measure. In: International conference on hybrid artificial intelligence systems. Springer, pp 53–65. https://doi.org/10.1007/978-3-319-32034-2_5
Fournier-Viger P, Lin JC-W, Duong Q-H, Dam T-L (2016) Phm: mining periodic high-utility itemsets. In: Industrial Conference on Data Mining. Springer, pp 64–79. https://doi.org/10.1007/978-3-319-41561-1_6
Fournier-Viger P, Lin JC-W, Kiran RU, Koh YS, Thomas R (2017) A survey of sequential pattern mining. Data Sci Pattern Recognit 1(1):54–77
Fournier-Viger P, Zhang Y, Lin JC-W, Dinh D-T, Le HB (2018) Mining correlated high-utility itemsets using various measures. Logic J IGPL
Fradkin D, Mörchen F (2015) Mining sequential patterns for classification. Knowl Inf Syst 45(3):731–749. https://doi.org/10.1007/s10115-014-0817-0
Gan W, Lin JC-W, Fournier-Viger P, Chao H-C, Hong T-P, Fujita H (2018) A survey of incremental high-utility itemset mining. Wiley Interdiscip Rev: Data Min Knowl Discov 8(2):e1242. https://doi.org/10.1002/widm.1242
Gan W, Lin JC-W, Fournier-Viger P, Chao H-C, Tseng VS, Yu PS (2018) A survey of utility-oriented pattern mining. arXiv:1805.10511
Gan W, Lin JC-W, Fournier-Viger P, Chao H-C, Yu PS (2018) A Survey of Parallel Sequential Pattern Mining. arXiv:1805.10515
Han J, Dong G, Yin Y (1999) Efficient mining of partial periodic patterns in time series database. In: 15th international conference on data engineering, 1999. Proceedings. IEEE, pp 106–115. https://doi.org/10.1109/icde.1999.754913
Kiran RU, Kitsuregawa M, Reddy PK (2016) Efficient discovery of periodic-frequent patterns in very large databases. J Syst Softw 112:110–121. https://doi.org/10.1016/j.jss.2015.10.035
Kiran RU, Venkatesh JN, Toyoda M, Kitsuregawa M, Reddy PK (2017) Discovering partial periodic-frequent patterns in a transactional database. J Syst Softw 125:170–182. https://doi.org/10.1016/j.jss.2016.11.035
Lan G-C, Hong T-P, Tseng VS, Wang S-L (2014) Applying the maximum utility measure in high utility sequential pattern mining. Exp Syst Appl 41(11):5071–5081. https://doi.org/10.1016/j.eswa.2014.02.022
Le B, Dinh D-T, Huynh V-N, Nguyen Q-M, Fournier-Viger P (2018) An efficient algorithm for hiding high utility sequential patterns. Int J Approx Reason. https://doi.org/10.1016/j.ijar.2018.01.005
Le B, Huynh U, Dinh D-T (2018) A pure array structure and parallel strategy for high-utility sequential pattern mining. Exp Syst Appl 104:107–120. https://doi.org/10.1016/j.eswa.2018.03.019
Lin JC-W, Zhang J, Fournier-Viger P (2017) High-utility sequential pattern mining with multiple minimum utility thresholds. In: Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) joint conference on web and big data. Springer, pp 215–229. https://doi.org/10.1007/978-3-319-63579-8_17
Lin JC-W, Zhang J, Fournier-Viger P (2017) High-utility sequential pattern mining with multiple minimum utility thresholds. In: Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) joint conference on web and big data. Springer, pp 215–229. https://doi.org/10.1109/smc.2016.7844710
Lin JC-W, Zhang J, Fournier-Viger P, Hong T-P, Zhang J (2017) A two-phase approach to mine short-period high-utility itemsets in transactional databases. Adv Eng Inform 33:29–43. https://doi.org/10.1016/j.aei.2017.04.007
Perera D, Kay J, Koprinska I, Yacef K, Zaïane OR (2009) Clustering and sequential pattern mining of online collaborative learning data. IEEE Trans Knowl Data Eng 21(6):759–772. https://doi.org/10.1109/tkde.2008.138
Quang MN, Dinh T, Huynh U, Le B (2016) Mhhusp: an integrated algorithm for mining and hiding high utility sequential patterns. In: 2016 Eighth international conference on knowledge and systems engineering (KSE). IEEE, pp 13–18. https://doi.org/10.1109/kse.2016.7758022
Quang MN, Huynh U, Dinh T, Le NH, Le B (2016) An approach to decrease execution time and difference for hiding high utility sequential patterns. In: Integrated uncertainty in knowledge modelling and decision making: 5th international symposium, IUKM 2016, Da Nang, Vietnam, November 30–December 2, 2016, Proceedings 5. Springer, pp 435–446. https://doi.org/10.1007/978-3-319-49046-5_37
Shie B-E, Cheng J-H, Chuang K-T, Tseng VS (2012) A one-phase method for mining high utility mobile sequential patterns in mobile commerce environments. In: Proceedings of 25th international conference on industrial engineering and other applications of applied intelligent systems. Springer, pp 616–626. https://doi.org/10.1007/978-3-642-31087-4_63
Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Advances in Database Technology–EDBT’96, pp 1–17. https://doi.org/10.1007/bfb0014140
Surana A, Kiran RU, Reddy PK (2012) An efficient approach to mine periodic-frequent patterns in transactional databases. Springer, Berlin, pp 254–266. https://doi.org/10.1007/978-3-642-28320-8_22
Tanbeer SK, Ahmed CF, Jeong B-S, Lee Y-K (2009) Discovering periodic-frequent patterns in transactional databases. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, pp 242–253. https://doi.org/10.1007/978-3-642-01307-2_24
Wang J-Z, Huang J-L, Chen Y-C (2016) On efficiently mining high utility sequential patterns. Knowl Inf Syst 49(2):597–627. https://doi.org/10.1007/s10115-015-0914-8
Wright AP, Wright AT, McCoy AB, Sittig DF (2015) The use of sequential pattern mining to predict next prescribed medications. J Biomed Inform 53:73–80. https://doi.org/10.1016/j.jbi.2014.09.003
Wu Y, Wang L, Ren J, Ding W, Wu X (2014) Mining sequential patterns with periodic wildcard gaps. Appl Intell 41(1):99–116. https://doi.org/10.1007/s10489-013-0499-4
Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: Proceedings of the 2004 SIAM international conference on data mining. SIAM, pp 482–486. https://doi.org/10.1137/1.9781611972740.51
Yin J, Zheng Z, Cao L (2012) USpan: an efficient algorithm for mining high utility sequential patterns. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 660–668. https://doi.org/10.1145/2339530.2339636
Yin J, Zheng Z, Cao L, Song Y, Wei W (2013) Efficiently mining top-k high utility sequential patterns. In: 2013 IEEE 13th international conference on data mining (ICDM). IEEE, pp 1259–1264. https://doi.org/10.1109/icdm.2013.148
Yu X, Zhang Z, Yu H, Jiang F, Ji W (2015) An asynchronous periodic sequential pattern mining algorithm with multiple minimum item supports for ad hoc networking. J Sens. https://doi.org/10.1155/2015/461659
Yun U, Kim D, Yoon E, Fujita H (2017) Damped window based high average utility pattern mining over data streams. Knowl-Based Syst https://doi.org/10.1016/j.knosys.2017.12.029
Yun U, Ryang H, Lee G, Fujita H (2017) An efficient algorithm for mining high utility patterns from incremental databases with one database scan. Knowledge-Based Syst 124:188–206. https://doi.org/10.1016/j.knosys.2017.03.016
Zihayat M, Chen Y, An A (2017) Memory-adaptive high utility sequential pattern mining over data streams. Mach Learn 106(6):799–836. https://doi.org/10.1007/s10994-016-5617-1
Acknowledgments
This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2015.07.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Dinh, DT., Le, B., Fournier-Viger, P. et al. An efficient algorithm for mining periodic high-utility sequential patterns. Appl Intell 48, 4694–4714 (2018). https://doi.org/10.1007/s10489-018-1227-x
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-018-1227-x