An effective contrast sequential pattern mining approach to taxpayer behavior analysis

World Wide Web

Abstract

Data mining for client behavior analysis has become increasingly important in business; however, further analysis of transactions and sequential behaviors would be of even greater value, especially in the financial services industry (e.g., banking and insurance) and in government. In a real-world business application, taxation debt collection, we need to find the contrast sequential behavior patterns between compliant and non-compliant taxpayers in order to understand the relationship between taxpayers’ sequential behaviors (payment, lodgment and actions) and their compliance with their debt obligations. Contrast Patterns (CP) are defined as itemsets that show the difference or discrimination between two classes or datasets (Dong and Li, 1999). However, existing CP mining methods, which can only mine itemset patterns, are not suitable for mining sequential patterns such as the time-ordered transactions in taxpayer sequential behaviors. Little work has been conducted on Contrast Sequential Pattern (CSP) mining so far. To address this issue, we develop a CSP mining approach, eCSP, based on an effective CSP-tree structure that improves on the PrefixSpan tree (Pei et al., 2001) for mining contrast patterns. We propose several heuristics and interestingness filtering criteria and integrate them seamlessly into the CSP-tree, both to reduce the search space and to find business-interesting patterns. The performance of the proposed approach is evaluated on three real-world datasets. In addition, a case study shows how to apply the approach to analyze taxpayer behavior. The results show promising performance and convincing business value.
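To make the contrast idea concrete, here is a minimal Python sketch. It is not the authors' eCSP algorithm or CSP-tree. It runs a plain PrefixSpan-style prefix projection over one class, then keeps the patterns whose growth rate (support in the target class divided by support in the other class, the emerging-pattern measure of Dong and Li) reaches a threshold. It assumes each sequence element is a single event; the event names, the thresholds `min_sup` and `min_gr`, and every function name are hypothetical. The approach described above integrates its pruning into the CSP-tree to reduce the search space, whereas this sketch simply filters afterwards.

```python
from collections import Counter
from math import ceil, inf
from typing import List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains(seq: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if the items of `pattern` appear in `seq` in order (gaps allowed)."""
    it = iter(seq)
    # `item in it` advances the iterator past the first match, preserving order.
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], db: List[Sequence[str]]) -> float:
    """Fraction of sequences in `db` that contain `pattern`."""
    if not db:
        return 0.0
    return sum(contains(s, pattern) for s in db) / len(db)


def growth_rate(pattern, target, background) -> float:
    """Support in the target class divided by support in the background class."""
    s_t, s_b = support(pattern, target), support(pattern, background)
    if s_b == 0:
        return inf if s_t > 0 else 0.0
    return s_t / s_b


def prefixspan(db: List[Sequence[str]], min_count: int) -> List[Pattern]:
    """Plain PrefixSpan over sequences of single events; returns frequent patterns."""
    results: List[Pattern] = []

    def grow(prefix: Pattern, projected: List[Pattern]) -> None:
        # Count each event at most once per projected suffix.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item in sorted(counts):
            if counts[item] < min_count:
                continue
            new_prefix = prefix + (item,)
            results.append(new_prefix)
            # Project: keep only what follows the first occurrence of `item`.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            grow(new_prefix, new_projected)

    grow((), [tuple(s) for s in db])
    return results


def mine_contrast(target, background, min_sup: float, min_gr: float):
    """Frequent patterns of `target` whose growth rate over `background` is >= min_gr."""
    min_count = max(1, ceil(min_sup * len(target)))
    out = []
    for p in prefixspan(target, min_count):
        gr = growth_rate(p, target, background)
        if gr >= min_gr:
            out.append((p, support(p, target), support(p, background), gr))
    return out


# Hypothetical toy event sequences, not data from the paper.
non_compliant = [("remind", "lodge", "remind"),
                 ("remind", "remind", "pay"),
                 ("lodge", "remind", "remind")]
compliant = [("lodge", "pay"), ("remind", "pay"), ("lodge", "pay", "pay")]

if __name__ == "__main__":
    for p, s_t, s_b, gr in mine_contrast(non_compliant, compliant, 0.6, 2.0):
        print(p, round(s_t, 2), round(s_b, 2), gr)
```

On this toy data the script keeps ("lodge", "remind"), ("remind",) and ("remind", "remind"), while ("lodge",) is dropped: its support is 2/3 in both classes, so its growth rate is 1.0.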

Figure 1
Figure 2
Figure 3
Figure 4
Figure 5

References

  1. Agichtein, E., Zheng, Z.: Identifying best bet web search results by mining past user behavior. In: KDD 2006, pp. 902–908. ACM (2006)

  2. Agrawal, R., Srikant, R.: Mining sequential patterns. In: ICDE, pp. 3–14 (1995)

  3. Attenberg, J., Pandey, S., Suel, T.: Modeling and predicting user behavior in sponsored search. In: KDD 2009, pp. 1067–1076. ACM (2009)

  4. Ayres, J., Flannick, J., Gehrke, J., Yiu, T.: Sequential Pattern Mining Using a Bitmap Representation. In: KDD 2002, pp. 429–435 (2002)

  5. Bailey, J., Manoukian, T., Ramamohanarao, K.: Fast algorithms for mining emerging patterns. In: Principles of Data Mining and Knowledge Discovery, LNCS 2431, pp. 187–208 (2002)

  6. Bayardo, R.J.: Efficiently Mining Long Patterns from Databases. In: SIGMOD (1998)

  7. Chan, S., Kao, B., Yip, C., Tang, M.: Mining emerging substrings. In: DASFAA 2003, pp. 119–126 (2003)

  8. Cao, L.: Behavior informatics and analytics: Let behavior talk. In: ICDM 2008 Workshops, pp. 87–96 (2008)

  9. Cao, L., Zhang, H., Zhao, Y., Luo, D., Zhang, C.: Combined mining: Discovering informative knowledge in complex data. IEEE Trans. Syst. Man. Cybern. B. Cybern. 41(3), 699–712 (2011)

  10. Dong, G., Li, J.: Efficient mining of emerging patterns: discovering trends and differences. In: KDD 1999, pp. 43–52 (1999)

  11. Dong, G., Li, J., Zhang, X.: Discovering Jumping Emerging Patterns and Experiments on Real Datasets. In: IDC99 (1999)

  12. Dong, G., Zhang, X., Wong, L., Li, J.: CAEP: Classification by aggregating emerging patterns. In: Discovery Science, vol. 1721, pp. 737–737 (1999)

  13. Fan, H., Ramamohanarao, K.: Efficiently mining interesting emerging patterns. In: WAIM 2003, pp. 189–201 (2003)

  14. Fan, H., Ramamohanarao, K.: Fast discovery and the generalization of strong jumping emerging patterns for building compact and accurate classifiers. TKDE 18(6), 721–737 (2006)

  15. Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M.-C.: FreeSpan: Frequent Pattern-projected Sequential Pattern Mining. In: KDD, pp. 355–359 (2000)

  16. Ji, X., Bailey, J., Dong, G.: Mining minimal distinguishing subsequence patterns with gap constraints. Knowl. Inf. Syst. 11, 259–286 (2007)

  17. Li, W., Han, J., Pei, J.: CMAR: Accurate and efficient classification based on multiple class-association rules. In: ICDM 2001, pp. 369–376 (2001)

  18. Loekito, E., Bailey, J.: Fast mining of high dimensional expressive contrast patterns using binary decision diagrams. In: SIGKDD 2006, pp. 307–316 (2006)

  19. Mannila, H., Toivonen, H.: Levelwise Search and Borders of Theories in Knowledge Discovery. Data Min. Knowl. Disc. 1(3), 41 (1997)

  20. Mozer, M., Wolniewicz, R., Grimes, D., Johnson, E., Kaushansky, H.: Predicting subscriber dissatisfaction and improving retention in the wireless telecommunications industry. IEEE Trans. Neural Netw. 11(3), 690–696 (2000)

  21. Pasquier, N., Bastide, R., Taouil, R., Lakhal, L.: Efficient Mining of Association Rules using Closed Itemset Lattices. Inf. Syst. 24(1) (1999)

  22. Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.-C.: PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. In: ICDE, pp. 215–226 (2001)

  23. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, Los Altos (1993)

  24. Ramamohanarao, K., Bailey, J.: Emerging patterns: mining and applications. In: ICISIP 2004, pp. 409–414 (2004)

  25. Wang, X., Duan, L., Dong, G., Yu, Z., Tang, C.: Efficient Mining of Density-Aware Distinguishing Sequential Patterns with Gap Constraints. In: DASFAA 2014, pp. 372–387 (2014)

  26. Zaki, M.J.: SPADE: An efficient algorithm for mining frequent sequences. Mach. Learn. 42, 31–60 (2001)

  27. Zhao, Y., Zhang, H., Cao, L., Zhang, C., Bohlscheid, H.: Combined Pattern Mining: From Learned Rules to Actionable Knowledge. In: AI 2008, pp. 393–403 (2008)

Author information

Correspondence to Zhigang Zheng.

About this article

Cite this article

Zheng, Z., Wei, W., Liu, C. et al. An effective contrast sequential pattern mining approach to taxpayer behavior analysis. World Wide Web 19, 633–651 (2016). https://doi.org/10.1007/s11280-015-0350-4

  • DOI: https://doi.org/10.1007/s11280-015-0350-4