Customer Activity Sequence Classification for Debt Prevention in Social Security

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

From a data mining perspective, sequence classification builds a classifier from frequent sequential patterns. However, mining a complete set of sequential patterns on a large dataset can be extremely time-consuming, and the large number of patterns discovered makes pattern selection and classifier building very time-consuming as well. In sequence classification, discovering discriminative patterns matters far more than obtaining a complete pattern set. In this paper, we propose a novel hierarchical algorithm that builds sequential classifiers from discriminative sequential patterns. First, we mine the sequential patterns that are most strongly correlated with each target class, using an aggressive strategy to select a small set of sequential patterns. Second, pattern pruning and a serial coverage test are applied to the mined patterns; the patterns that pass the serial test are used to build the sub-classifier at the first level of the final classifier. Third, the training samples that cannot be covered are fed back to the sequential pattern mining stage with updated parameters. This process continues until predefined interestingness measure thresholds are reached or all samples are covered. The patterns generated in each loop form the sub-classifier at the corresponding level of the final classifier. Within this framework, the search space can be reduced dramatically while good classification performance is still achieved. The proposed algorithm is tested in a real-world business application for debt prevention in the social security area, where it is shown to be effective and efficient at predicting debt occurrences from customer activity sequence data.
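The mine, prune, and feed-back loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the correlation measure, the pruning rule, or the parameter schedule, so rule confidence stands in for class correlation, a top-k cut stands in for the aggressive selection strategy, candidates are enumerated by brute force up to length 2, and the thresholds are relaxed by fixed factors. All function names, parameters, and the toy activity codes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if the pattern's items occur in the sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def candidate_patterns(sequences, max_len=2):
    """Brute-force enumeration of ordered subsequences (assumption: toy scale only)."""
    found = set()
    for seq in sequences:
        for n in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), n):
                found.add(tuple(seq[i] for i in idx))
    return found


def mine_level(samples, min_sup, min_conf, top_k):
    """Return up to top_k (pattern, label) rules per class, best first."""
    scored = []
    for pat in candidate_patterns([s for s, _ in samples]):
        labels = [y for s, y in samples if is_subsequence(pat, s)]
        if len(labels) / len(samples) < min_sup:
            continue
        label, hits = Counter(labels).most_common(1)[0]
        conf = hits / len(labels)  # confidence used as a stand-in correlation score
        if conf >= min_conf:
            scored.append((-conf, -len(labels), pat, label))
    scored.sort()  # highest confidence, then highest support, then pattern order
    kept, per_class = [], Counter()
    for _, _, pat, label in scored:
        if per_class[label] < top_k:  # aggressive cut: few patterns per class
            per_class[label] += 1
            kept.append((pat, label))
    return kept


def serial_coverage(rules, samples):
    """Keep a rule only if it correctly classifies a still-uncovered sample."""
    kept, remaining = [], list(samples)
    for pat, label in rules:
        if any(y == label and is_subsequence(pat, s) for s, y in remaining):
            kept.append((pat, label))
            # Assumption: every sample the rule matches counts as covered.
            remaining = [(s, y) for s, y in remaining if not is_subsequence(pat, s)]
    return kept, remaining


def build_classifier(samples, min_sup=0.4, min_conf=0.9,
                     floor_sup=0.05, floor_conf=0.5, top_k=3):
    """Build one sub-classifier per loop until all samples are covered
    or the relaxed thresholds fall below their floors."""
    levels, remaining = [], list(samples)
    while remaining and min_sup >= floor_sup and min_conf >= floor_conf:
        rules = mine_level(remaining, min_sup, min_conf, top_k)
        rules, remaining = serial_coverage(rules, remaining)
        if rules:
            levels.append(rules)
        min_sup, min_conf = min_sup * 0.5, min_conf * 0.9  # hypothetical schedule
    return levels


def predict(levels, sequence, default=None):
    """Apply the levels in order; the first matching rule decides the label."""
    for rules in levels:
        for pat, label in rules:
            if is_subsequence(pat, sequence):
                return label
    return default


# Toy data with made-up activity codes and labels, for illustration only.
train = [(["x", "y"], "debt"), (["x", "y", "z"], "debt"),
         (["y", "x"], "ok"), (["z", "y", "x"], "ok"), (["w"], "debt")]
levels = build_classifier(train)
print(len(levels), predict(levels, ["x", "q", "y"]))
```

On this toy input the first level covers the four frequent samples, and the remaining sample is only picked up once the thresholds have been relaxed. A realistic version would replace the brute-force enumeration with a dedicated sequential pattern miner and the confidence score with whichever interestingness measure the full paper uses.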

Author information

Corresponding author

Correspondence to Huaifeng Zhang.

Additional information

This work is supported by the Australian Research Council Linkage Project under Grant No. LP0775041 and by the Early Career Researcher Grant under Grant No. 2007002448 from the University of Technology, Sydney, Australia.

About this article

Cite this article

Zhang, H., Zhao, Y., Cao, L. et al. Customer Activity Sequence Classification for Debt Prevention in Social Security. J. Comput. Sci. Technol. 24, 1000–1009 (2009). https://doi.org/10.1007/s11390-009-9288-2

  • DOI: https://doi.org/10.1007/s11390-009-9288-2
