Skip to main content

Hui2Vec: Learning Transaction Embedding Through High Utility Itemsets

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13773))

Abstract

Mining frequent itemsets (FIs) in transaction databases is a very popular task in data mining. It helps create meaningful and effective representations for customer transactions which is a key step in the process of transaction classification and clustering. To improve the quality of these representations, previous studies have adapted vector embedding methods to learn transaction embeddings from items and FIs. However, FIs are still a simple pattern type that ignores important information about transactions such as the purchase quantities of items and their unit profits. To address this issue, we propose to learn transaction embeddings from items and high-utility itemsets (HUIs), a more general pattern type. Since HUIs were shown to be more appropriate than FIs for a wide range of applications, we take for hypothesis that transaction embeddings learned from HUIs will be more representative and meaningful. We introduce an unsupervised method, named Hui2Vec, to learn transaction embeddings by combining both singleton items and HUIs. We demonstrate the superior quality of the embedding achieved with the proposed method compared to the embeddings learned from items and FIs on four datasets.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   54.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   69.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Cheng, H., Yan, X., Han, J., Hsu, C.-W.: Discriminative frequent pattern analysis for effective classification. In: ICDE 2007, pp. 716–725 (2007)

    Google Scholar 

  2. Fournier-Viger, P., Lin, J.C.-W., Vo, B., Chi, T.T., Zhang, J., Le, H.B.: A survey of itemset mining. Wiley Interdiscip. Data Min. Knowl. Discov. 7(4), e1207 (2017)

    Google Scholar 

  3. He, Z., Feiyang, G., Zhao, C., Liu, X., Jun, W., Wang, J.: Conditional discriminative pattern mining: concepts and algorithms. Inf. Sci. 375, 1–15 (2017)

    Article  Google Scholar 

  4. Kameya, Y., Sato, T.: RP-growth, Top-k mining of relevant patterns with minimum support raising. In: SIAM International Conference on Data Mining 2012, pp. 816–827 (2012)

    Google Scholar 

  5. Nguyen, D., Nguyen, T.D., Luo, W., Venkatesh, S.: Trans2Vec: learning transaction embedding via items and frequent itemsets. In: Phung, D., Tseng, V.S., Webb, G.I., Ho, B., Ganji, M., Rashidi, L. (eds.) PAKDD 2018. LNCS (LNAI), vol. 10939, pp. 361–372. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93040-4_29

    Chapter  Google Scholar 

  6. Zida, S., Fournier-Viger, P., Chun-Wei Lin, J., Wu, C.W., Tseng, V.S.: EFIM: a fast and memory efficient algorithm for high-utility itemset mining. Knowl. Inf. Syst. 51(2), 595–625 (2017)

    Article  Google Scholar 

  7. Ahmed, C.F., Tanbeer, S.K., Jeong, B.S., Lee, Y.K.: Efficient tree structures for high-utility pattern mining in incremental databases. IEEE Trans. Knowl. Data Eng. 21(12), 1708–1721 (2009)

    Article  Google Scholar 

  8. Fournier-Viger, P., Wu, C.-W., Tseng, V.S.: Novel concise representations of high utility itemsets using generator patterns. In: Luo, X., Yu, J.X., Li, Z. (eds.) ADMA 2014. LNCS (LNAI), vol. 8933, pp. 30–43. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-14717-8_3

    Chapter  Google Scholar 

  9. Krishnamoorthy, S.: Pruning strategies for mining high utility itemsets. Expert Syst. Appl. 42(5), 2371–2381 (2015)

    Article  Google Scholar 

  10. Tseng, V.S., Shie, B.E., Wu, C.W., Yu, P.S.: Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans. Knowl. Data Eng. 25(8), 1772–1786 (2013)

    Article  Google Scholar 

  11. Thilagu, M., Nadarajan, R.: Effciently mining of effective web traversal patterns with average utility. In: Proceedings of the International Conference on Communication, Computing, and Security, pp. 444–451. CRC Press (2012)

    Google Scholar 

  12. Fournier-Viger, P., Lin, J.C.-W., Nkambou, R., Vo, B., Tseng, V.S. (eds.): High-Utility Pattern Mining. SBD, vol. 51. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-04921-8

    Book  Google Scholar 

  13. Liu, Y., Cheng, C., Tseng, V.S.: Mining differential top-k co-expression patterns from time course comparative gene expression datasets. BMC Bioinform. 14(230) (2013)

    Google Scholar 

  14. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS 2013, pp. 3111–3119 (2013)

    Google Scholar 

  15. Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: ICML 2014, pp. 1188–1196 (2014)

    Google Scholar 

  16. Chen, M.: Efficient vector representation for documents through corruption. In: ICLR 2017 (2017)

    Google Scholar 

  17. Fournier-Viger, P., Gomariz, A., Gueniche, T., Soltani, A., Wu, C.W., Tseng, V.S.: SPMF: a java open-source pattern mining library. J. Mach. Learn. Res. 15, 3389–3393 (2014)

    MATH  Google Scholar 

  18. Lan, G.C., Hong, T.P., Tseng, V.S.: An efficient projection-based indexing approach for mining high utility itemsets. Knowl. Inf. Syst. 38(1), 85–107 (2014)

    Article  Google Scholar 

  19. Liu, J., Wang, K., Fung, B.: Direct discovery of high utility itemsets without candidate generation. In: Proceedings of the 12th IEEE International Conference on Data Mining, IEEE, Brussels, Belgium, December 2012, p. 984989 (2012)

    Google Scholar 

  20. Liu, Y., Liao, W., Choudhary, A.: A two-phase algorithm for fast discovery of high utility itemsets. In: Ho, T.B., Cheung, D., Liu, H. (eds.) PAKDD 2005. LNCS (LNAI), vol. 3518, pp. 689–695. Springer, Heidelberg (2005). https://doi.org/10.1007/11430919_79

    Chapter  Google Scholar 

  21. Song, W., Liu, Y., Li, J.: BAHUI: fast and memory efficient mining of high utility itemsets based on bitmap. Proc. Int. J. Data Warehous. Min. 10(1), 1–15 (2014)

    Article  Google Scholar 

  22. Yun, U., Ryang, H., Ryu, K.H.: High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates. Expert Syst. Appl. 41(8), 3861–3878 (2014)

    Article  Google Scholar 

  23. Grohe, M.: Word2vec, Node2vec, Graph2vec, X2vec: towards a theory of vector embeddings of structured data. In: ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2020, pp. 1–16 (2020)

    Google Scholar 

  24. Luo, J., Xiao, S., Jiang, S.: Ripple2Vec: node embedding with ripple distance of structures. Data Sci. Eng. 7, 156–174 (2022)

    Article  Google Scholar 

  25. Cao, S., Lu, W., Xu, Q.: GraRep. Learning graph representations with global structural information. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 891–900 (2015)

    Google Scholar 

  26. Ou, M., Cui, P., Pei, J., Zhang, Z., Zhu, W.: Asymmetric transitivity preserving graph embedding. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1105–1114 (2016)

    Google Scholar 

  27. Perozzi, B., Al-Rfou, R., Skiena, S.: Deepwalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710 (2014)

    Google Scholar 

  28. Grover, A., Leskovec., J.: Node2Vec: scalable feature learning for networks. In: Krishnapuram, B.B., Shah, M., Smola, A.J., Aggarwal, C.C., Shen, D., Rastogi, R. (eds.), Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864 (2016)

    Google Scholar 

  29. Narayanan, A., Chandramohan, M., Venkatesan, R., Chen, L., Liu, Y., Jaiswal., S.: Graph2Vec: learning distributed representations of graphs. ArXiv (CoRR), arXiv:1707.05005 [cs.AI] (2017)

  30. Pan, S., Hu, R., Long, G., Jiang, J., Yao, L., Zhang, C.: Adversarially regularized graph autoencoder for graph embedding. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018. pp. 2609–2615 (2018)

    Google Scholar 

  31. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 1–27 (2011)

    Article  Google Scholar 

  32. Rousseau, F., Kiagias, E., Vazirgiannis, M.: Text categorization as a graph classification problem. In: ACL 2015, pp. 1702–1712 (2015)

    Google Scholar 

Download references

Acknowledgment

Authors would like to thank the authors of Trans2Vec [5] for providing their source code. This work is partially supported by NARD Intelligence\(^{1}\).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Khaled Belghith .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Belghith, K., Fournier-Viger, P., Jawadi, J. (2022). Hui2Vec: Learning Transaction Embedding Through High Utility Itemsets. In: Roy, P.P., Agarwal, A., Li, T., Krishna Reddy, P., Uday Kiran, R. (eds) Big Data Analytics. BDA 2022. Lecture Notes in Computer Science, vol 13773. Springer, Cham. https://doi.org/10.1007/978-3-031-24094-2_15

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-24094-2_15

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24093-5

  • Online ISBN: 978-3-031-24094-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics