A multi-valued and sequential-labeled decision tree method for recommending sequential patterns in cold-start situations

Applied Intelligence

Abstract

We aim to recommend suitable initial single-itemed sequences, such as a flight itinerary, to each cold-start user, based on a preference pattern in the form of a personalized sequential pattern. However, sequential pattern mining has never treated a conventional sequential pattern as a personalized pattern. Moreover, because a cold-start user lacks a personalized sequential pattern, collaborative filtering cannot recommend any single-itemed sequences to that user. We therefore first design such a preference pattern, the representative sequential pattern, which reflects a user's main, frequently recurring buying behavior as mined from that user's item-sequences over a time period. After sampling a training set from non-cold-start users who prefer similar items, we propose an auxiliary algorithm that mines the representative sequential pattern to serve as the sequential class label of each training instance. A multi-label classifier could then, in principle, be trained to predict the sequential label of each cold-start user from that user's features. However, most multi-label classification methods are designed for data whose class labels are non-sequential, and in the real world some predictor attributes are multi-valued. To handle such data, we have developed a novel algorithm named MSDT (Multi-valued and Sequential-labeled Decision Tree). Experimental results indicate that it outperforms all the baseline multi-label algorithms in accuracy, even though three of them are deep learning algorithms.
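To make the two data properties named in the abstract concrete, here is a minimal Python sketch of what a training instance with a multi-valued predictor attribute and a sequential class label might look like. It is an illustration only, not the MSDT algorithm: the `Instance` class, the `hobbies` attribute, the `as_label_set` helper, and the airport codes are all hypothetical names chosen for the example. It shows why a non-sequential multi-label view cannot tell apart two labels that contain the same items in a different order.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Instance:
    """Hypothetical training instance (names are assumptions, not from the paper)."""
    # Multi-valued predictor attribute: one user may hold several values at once.
    hobbies: FrozenSet[str]
    # Sequential class label: an ordered tuple of single items.
    label: Tuple[str, ...]


def as_label_set(label: Tuple[str, ...]) -> FrozenSet[str]:
    """View a sequential label the way a non-sequential multi-label method would."""
    return frozenset(label)


# Two users whose itineraries visit the same places in a different order.
a = Instance(frozenset({"hiking", "diving"}), ("TPE", "NRT", "LAX"))
b = Instance(frozenset({"hiking"}), ("TPE", "LAX", "NRT"))

# As unordered label sets the two labels collapse into one class ...
same_as_sets = as_label_set(a.label) == as_label_set(b.label)
# ... but as sequences they are distinct, which is what a sequential label preserves.
same_as_sequences = a.label == b.label

print(same_as_sets, same_as_sequences)
```

Under this assumed representation, a classifier that predicts label sets would treat both users as belonging to the same class, whereas a sequential-label classifier must keep them apart.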

Acknowledgements

I am very grateful to Professor Ray-I Chang and Huichen Huang for refining the writing of this paper, and to the following authors for sharing their APIs or open-source code: Jonathan Liang and Oliver Mannion, the authors of the CasperDataSets API; Philippe Fournier-Viger, the author of the SPAM algorithm; Philippe Fournier-Viger and Antonio Gomariz, the authors of the VMSP algorithm; and all the authors of the Python and Java APIs used to code the baseline algorithms.

Author information

Corresponding author

Correspondence to Chang-Ling Hsu.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hsu, CL. A multi-valued and sequential-labeled decision tree method for recommending sequential patterns in cold-start situations. Appl Intell 51, 506–526 (2021). https://doi.org/10.1007/s10489-020-01806-0

  • DOI: https://doi.org/10.1007/s10489-020-01806-0
