Abstract
Data mining for client behavior analysis has become increasingly important in business. However, further analysis of transactions and sequential behaviors would be of even greater value, especially in the financial services industry (e.g., banking and insurance) and in government. In a real-world business application of taxation debt collection, we need to find the contrast sequential behavior patterns between compliant and non-compliant taxpayers in order to understand the internal relationship between taxpayers' sequential behaviors (payments, lodgments and actions) and their compliance with their debt obligations. Contrast Patterns (CP) are defined as itemsets that show the difference (discrimination) between two classes or datasets (Dong and Li, 1999). However, existing CP mining methods, which can mine only itemset patterns, are not suitable for mining sequential patterns, such as the time-ordered transactions in taxpayer sequential behaviors. Little work has been conducted on Contrast Sequential Pattern (CSP) mining so far. To address this issue, we develop a CSP mining approach, eCSP, that uses an effective CSP-tree structure, which improves on the PrefixSpan tree (Pei et al., 2001) for mining contrast patterns. We propose several heuristics and interestingness filtering criteria and integrate them seamlessly into the CSP-tree, both to reduce the search space and to find business-interesting patterns. The performance of the proposed approach is evaluated on three real-world datasets. In addition, we use a case study to show how to apply the approach to analyze taxpayer behavior. The results show very promising performance and convincing business value.
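The basic idea of contrast sequential pattern mining can be illustrated with a minimal sketch. It is not the eCSP algorithm or the CSP-tree, and it omits the paper's heuristics and interestingness filters. Instead, it uses a plain PrefixSpan-style prefix projection (Pei et al., 2001) that counts each candidate's support separately in a target class and a contrasting class. It keeps patterns whose growth rate, the ratio of the two supports as used for emerging patterns (Dong and Li, 1999), reaches a threshold. The sketch assumes each sequence element is a single event, whereas real behavior sequences may contain itemsets. The function name `mine_contrast_sequences`, its thresholds and the event codes (A = action, L = lodgment, P = payment) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def mine_contrast_sequences(
    pos: Sequence[Sequence[str]],
    neg: Sequence[Sequence[str]],
    min_sup: float,
    min_growth: float,
    max_len: int = 4,
) -> Dict[Pattern, Tuple[float, float, float]]:
    """Hypothetical helper: return {pattern: (sup_pos, sup_neg, growth_rate)}.

    pos is the target class and neg the contrasting class. Supports are relative:
    the fraction of sequences in each class that contain the pattern in order
    (gaps allowed). Single-event elements are assumed, not itemsets.
    """
    if not pos:
        raise ValueError("the target class must contain at least one sequence")
    db = [(s, True) for s in pos] + [(s, False) for s in neg]
    n_pos, n_neg = len(pos), len(neg)
    results: Dict[Pattern, Tuple[float, float, float]] = {}

    def grow(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # projected holds (sequence index, start offset) pairs: the suffix of each
        # sequence after the leftmost match of `prefix`, as in PrefixSpan.
        first_pos: Dict[str, Dict[int, int]] = defaultdict(dict)
        for idx, start in projected:
            seq = db[idx][0]
            for j in range(start, len(seq)):
                first_pos[seq[j]].setdefault(idx, j)  # keep leftmost occurrence
        for item in sorted(first_pos):
            occ = first_pos[item]
            cnt_pos = sum(1 for i in occ if db[i][1])
            sup_pos = cnt_pos / n_pos
            # Target-class support is anti-monotone: extensions of this pattern
            # can only have equal or lower support, so the branch is pruned.
            if sup_pos < min_sup:
                continue
            sup_neg = (len(occ) - cnt_pos) / n_neg if n_neg else 0.0
            # Growth rate as for emerging patterns; infinite when absent from neg.
            growth = float("inf") if sup_neg == 0 else sup_pos / sup_neg
            pattern = prefix + (item,)
            if growth >= min_growth:
                results[pattern] = (sup_pos, sup_neg, growth)
            if len(pattern) < max_len:
                grow(pattern, [(i, j + 1) for i, j in occ.items()])

    grow((), [(i, 0) for i in range(len(db))])
    return results


if __name__ == "__main__":
    # Hypothetical event sequences: A = action, L = lodgment, P = payment.
    target = [["A", "L", "P"], ["A", "P"], ["L", "A", "P"]]
    contrast = [["L", "P"], ["P", "L"], ["A", "L"]]
    found = mine_contrast_sequences(target, contrast, min_sup=0.6, min_growth=2.0, max_len=3)
    for pat, (sp, sn, gr) in sorted(found.items()):
        print(" -> ".join(pat), f"sup_pos={sp:.2f} sup_neg={sn:.2f} growth={gr}")
```

In this toy data, the pattern A followed by P occurs in every target-class sequence and in none of the contrasting ones. Its growth rate is therefore infinite, which makes it a jumping pattern in emerging-pattern terms. Pruning on target-class support keeps the search space small without losing any pattern that meets `min_sup`.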
References
Agichtein, E., Zheng, Z.: Identifying best bet web search results by mining past user behavior. In: KDD 2006, pp. 902–908. ACM (2006)
Agrawal, R., Srikant, R.: Mining sequential patterns. In: ICDE 1995, pp. 3–14 (1995)
Attenberg, J., Pandey, S., Suel, T.: Modeling and predicting user behavior in sponsored search. In: KDD 2009, pp. 1067–1076. ACM (2009)
Ayres, J., Flannick, J., Gehrke, J., Yiu, T.: Sequential pattern mining using a bitmap representation. In: KDD 2002, pp. 429–435 (2002)
Bailey, J., Manoukian, T., Ramamohanarao, K.: Fast algorithms for mining emerging patterns. In: Principles of Data Mining and Knowledge Discovery, vol. 2431, pp. 187–208 (2002)
Bayardo, R.J.: Efficiently mining long patterns from databases. In: SIGMOD (1998)
Chan, S., Kao, B., Yip, C., Tang, M.: Mining emerging substrings. In: DASFAA 2003, pp. 119–126 (2003)
Cao, L.: Behavior informatics and analytics: Let behavior talk. In: ICDM 2008 Workshops, pp. 87–96 (2008)
Cao, L., Zhang, H., Zhao, Y., Luo, D., Zhang, C.: Combined mining: Discovering informative knowledge in complex data. IEEE Trans. Syst. Man. Cybern. B. Cybern. 41(3), 699–712 (2011)
Dong, G., Li, J.: Efficient mining of emerging patterns: discovering trends and differences. In: KDD 1999, pp. 43–52 (1999)
Dong, G., Li, J., Zhang, X.: Discovering jumping emerging patterns and experiments on real datasets. In: IDC99 (1999)
Dong, G., Zhang, X., Wong, L., Li, J.: CAEP: Classification by aggregating emerging patterns. In: Discovery Science, vol. 1721, pp. 737–737 (1999)
Fan, H., Ramamohanarao, K.: Efficiently mining interesting emerging patterns. In: WAIM 2003, pp. 189–201 (2003)
Fan, H., Ramamohanarao, K.: Fast discovery and the generalization of strong jumping emerging patterns for building compact and accurate classifiers. IEEE Trans. Knowl. Data Eng. 18(6), 721–737 (2006)
Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M.-C.: FreeSpan: Frequent pattern-projected sequential pattern mining. In: KDD 2000, pp. 355–359 (2000)
Ji, X., Bailey, J., Dong, G.: Mining minimal distinguishing subsequence patterns with gap constraints. Knowl. Inf. Syst. 11, 259–286 (2007)
Li, W., Han, J., Pei, J.: CMAR: Accurate and efficient classification based on multiple class-association rules. In: ICDM 2001, pp. 369–376 (2001)
Loekito, E., Bailey, J.: Fast mining of high dimensional expressive contrast patterns using binary decision diagrams. In: KDD 2006, pp. 307–316 (2006)
Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Min. Knowl. Disc. 1(3), 41 (1997)
Mozer, M., Wolniewicz, R., Grimes, D., Johnson, E., Kaushansky, H.: Predicting subscriber dissatisfaction and improving retention in the wireless telecommunications industry. IEEE Trans. Neural Netw. 11(3), 690–696 (2000)
Pasquier, N., Bastide, R., Taouil, R., Lakhal, L.: Efficient mining of association rules using closed itemset lattices. Inf. Syst. 24(1) (1999)
Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.-C.: PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In: ICDE 2001, pp. 215–226 (2001)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, Los Altos (1993)
Ramamohanarao, K., Bailey, J.: Emerging patterns: mining and applications. In: ICISIP 2004, pp. 409–414 (2004)
Wang, X., Duan, L., Dong, G., Yu, Z., Tang, C.: Efficient mining of density-aware distinguishing sequential patterns with gap constraints. In: DASFAA 2014, pp. 372–387 (2014)
Zaki, M.J.: SPADE: An efficient algorithm for mining frequent sequences. Mach. Learn. 42, 31–60 (2001)
Zhao, Y., Zhang, H., Cao, L., Zhang, C., Bohlscheid, H.: Combined pattern mining: From learned rules to actionable knowledge. In: AI 2008, pp. 393–403 (2008)
Cite this article
Zheng, Z., Wei, W., Liu, C. et al. An effective contrast sequential pattern mining approach to taxpayer behavior analysis. World Wide Web 19, 633–651 (2016). https://doi.org/10.1007/s11280-015-0350-4