Customer Activity Sequence Classification for Debt Prevention in Social Security

Zhang, Huaifeng; Zhao, Yanchang; Cao, Longbing; Zhang, Chengqi; Bohlscheid, Hans

doi:10.1007/s11390-009-9288-2

Regular Paper
Published: 06 November 2009

Journal of Computer Science and Technology, Volume 24, pages 1000–1009 (2009)

Huaifeng Zhang¹, Yanchang Zhao², Longbing Cao², Chengqi Zhang² & Hans Bohlscheid¹

Abstract

From a data mining perspective, sequence classification builds a classifier from frequent sequential patterns. However, mining a complete set of sequential patterns on a large dataset can be extremely time-consuming, and the large number of patterns discovered also makes pattern selection and classifier building very time-consuming. In sequence classification, it is much more important to discover discriminative patterns than a complete pattern set. In this paper, we propose a novel hierarchical algorithm that builds sequential classifiers from discriminative sequential patterns. First, we mine the sequential patterns that are most strongly correlated with each target class. In this step, an aggressive strategy is employed to select a small set of sequential patterns. Second, pattern pruning and a serial coverage test are applied to the mined patterns. The patterns that pass the serial test are used to build the sub-classifier at the first level of the final classifier. Third, the training samples that cannot be covered are fed back to the sequential pattern mining stage with updated parameters. This process continues until predefined interestingness measure thresholds are reached or all samples are covered. The patterns generated in each loop form the sub-classifier at the corresponding level of the final classifier. Within this framework, the search space can be reduced dramatically while good classification performance is achieved. The proposed algorithm is tested in a real-world business application for debt prevention in the social security area. The algorithm is shown to be effective and efficient at predicting debt occurrences from customer activity sequence data.
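The three-step loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: candidate patterns are enumerated by brute force (subsequences of length at most 2) rather than by a dedicated sequential pattern miner, support and confidence stand in for the interestingness measure, the "updated parameters" are a fixed schedule of relaxed thresholds, and the activity codes (ADDR, EARN, REVIEW, DOC) and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def is_subsequence(pattern, seq):
    """True if pattern occurs in seq in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)

def candidate_patterns(sequences, max_len=2):
    """Brute-force candidates: all subsequences up to max_len (assumption)."""
    cands = set()
    for seq in sequences:
        for n in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), n):
                cands.add(tuple(seq[i] for i in idx))
    return cands

def mine_rules(samples, min_sup, min_conf):
    """Step 1: keep only patterns strongly tied to one class."""
    rules = []
    for pat in candidate_patterns([s for s, _ in samples]):
        labels = Counter(lab for s, lab in samples if is_subsequence(pat, s))
        label, hits = labels.most_common(1)[0]
        sup = hits / len(samples)
        conf = hits / sum(labels.values())
        if sup >= min_sup and conf >= min_conf:
            rules.append((pat, label, conf, sup))
    # Strongest rules first; the pattern itself breaks remaining ties.
    rules.sort(key=lambda r: (-r[2], -r[3], len(r[0]), r[0]))
    return rules

def serial_coverage(rules, samples):
    """Step 2: keep a rule only if it correctly covers a still-uncovered sample."""
    remaining, kept = list(samples), []
    for pat, label, _conf, _sup in rules:
        matched = [x for x in remaining if is_subsequence(pat, x[0])]
        if any(lab == label for _, lab in matched):
            kept.append((pat, label))
            remaining = [x for x in remaining if x not in matched]
    return kept, remaining

def build_hierarchical(samples, schedule=((0.4, 1.0), (0.2, 0.8), (0.1, 0.6))):
    """Step 3: feed uncovered samples back with relaxed thresholds."""
    levels, remaining = [], list(samples)
    for min_sup, min_conf in schedule:
        if not remaining:
            break
        kept, remaining = serial_coverage(
            mine_rules(remaining, min_sup, min_conf), remaining)
        if kept:
            levels.append(kept)
    default = Counter(lab for _, lab in samples).most_common(1)[0][0]
    return levels, default

def predict(levels, default, seq):
    """Apply sub-classifiers level by level; fall back to the default class."""
    for level in levels:
        for pat, label in level:
            if is_subsequence(pat, seq):
                return label
    return default

# Toy, made-up activity sequences labelled by whether a debt followed.
samples = [
    (("ADDR", "EARN", "REVIEW"), "debt"),
    (("EARN", "REVIEW"), "debt"),
    (("ADDR", "DOC"), "no_debt"),
    (("DOC", "ADDR"), "no_debt"),
    (("EARN", "DOC"), "debt"),
    (("DOC",), "no_debt"),
]
levels, default = build_hierarchical(samples)
print(levels)
print(predict(levels, default, ("ADDR", "EARN")))
```

On this toy data the first level covers every debt sequence with a single pattern, so only the uncovered sequences are passed to the second round of mining. That is the mechanism by which the abstract argues the search space shrinks.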

Author information

Authors and Affiliations

Payment Reviews Branch, Business Integrity Division, Centrelink, Canberra, Australia
Huaifeng Zhang (Member, IEEE) & Hans Bohlscheid
Centre for Quantum Computation and Intelligent Systems (QCIS), University of Technology, Sydney, Australia
Yanchang Zhao (Member, IEEE), Longbing Cao (Senior Member, IEEE) & Chengqi Zhang (Senior Member, IEEE)

Corresponding author

Correspondence to Huaifeng Zhang.

Additional information

This work is supported by the Australian Research Council Linkage Project under Grant No. LP0775041 and by the Early Career Researcher Grant under Grant No. 2007002448 from the University of Technology, Sydney, Australia.

About this article

Cite this article

Zhang, H., Zhao, Y., Cao, L. et al. Customer Activity Sequence Classification for Debt Prevention in Social Security. J. Comput. Sci. Technol. 24, 1000–1009 (2009). https://doi.org/10.1007/s11390-009-9288-2

Received: 28 February 2009
Revised: 16 July 2009
Published: 06 November 2009
Issue Date: November 2009
DOI: https://doi.org/10.1007/s11390-009-9288-2
