DOI: 10.1145/3543873.3587586
research-article

Weighted Statistically Significant Pattern Mining

Published: 30 April 2023

ABSTRACT

Pattern discovery (also known as pattern mining) is a fundamental task in data science. Statistically significant pattern mining (SSPM) is the task of finding patterns that occur statistically significantly more often in one class of a database than in another. Existing SSPM methods do not consider the weight of each item, although in the real world different items and objects differ in significance. Therefore, in this paper, we introduce the Weighted Statistically Significant Pattern Mining (WSSPM) problem and propose a novel algorithm, WSSpm, to solve it. We present a new framework that effectively mines weighted statistically significant patterns by combining a weighted upper-bound model with multiple hypothesis testing. We also propose a new weighted support threshold that meets the requirements of WSSPM, and we prove its correctness and completeness. In addition, our weighted support threshold and modified weighted upper bound effectively shrink the search space. Finally, experimental results on several real datasets show that the WSSpm algorithm performs well in terms of execution time and memory usage.
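
To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the WSSpm algorithm itself. It assumes one common definition of weighted support (average item weight times relative frequency), uses a one-sided Fisher's exact test for the class comparison, and applies a plain Bonferroni correction; all three choices, along with the function names and toy data, are illustrative assumptions, since this page does not give the paper's definitions.

```python
from math import comb


def weighted_support(pattern, transactions, weights):
    """Assumed definition: mean item weight of the pattern times its relative frequency."""
    avg_weight = sum(weights[item] for item in pattern) / len(pattern)
    frequency = sum(1 for t in transactions if pattern <= t) / len(transactions)
    return avg_weight * frequency


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table
    [[a, b], [c, d]] = [[present & positive, present & negative],
                        [absent & positive,  absent & negative]].
    Returns P(X >= a) under the hypergeometric null hypothesis."""
    n_present = a + b       # transactions containing the pattern
    n_positive = a + c      # transactions in the positive class
    n_total = a + b + c + d
    upper = min(n_present, n_positive)
    tail = sum(
        comb(n_positive, k) * comb(n_total - n_positive, n_present - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n_total, n_present)


# Toy labelled database (hypothetical data).
positive = [{"a", "b"}, {"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}]
negative = [{"c"}, {"a"}, {"b", "d"}, {"c", "d"}]
weights = {"a": 0.9, "b": 0.5, "c": 0.2, "d": 0.4}
pattern = {"a", "b"}

# Build the contingency table for the pattern.
a = sum(1 for t in positive if pattern <= t)
b = sum(1 for t in negative if pattern <= t)
c = len(positive) - a
d = len(negative) - b

ws = weighted_support(pattern, positive + negative, weights)
p_value = fisher_one_sided(a, b, c, d)

# Bonferroni correction: divide the significance level by the number of tested patterns.
alpha, num_tested = 0.05, 3
is_significant = p_value <= alpha / num_tested
```

On this toy data the pattern appears in every positive transaction and in no negative one, so the p-value is 1/70 and the pattern passes the corrected threshold of 0.05/3. A real miner would additionally use the weighted support threshold to prune candidates before testing, which reduces the number of hypotheses that need correction.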


Published in

WWW '23 Companion: Companion Proceedings of the ACM Web Conference 2023
April 2023
1567 pages
ISBN: 9781450394192
DOI: 10.1145/3543873

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
