Abstract
Mining robust frequent itemsets has attracted much attention because of its wide applicability to noisy data. In this paper, we study the problem of mining proportional fault-tolerant frequent itemsets in a large transactional database. A fault-tolerant frequent itemset allows a small number of errors in each item and in each supporting transaction. The problem is challenging because the anti-monotone property does not hold for candidate generation and fault-tolerant support counting is known to be NP-hard. We propose techniques that substantially speed up the state-of-the-art algorithm for the problem. We also develop an efficient heuristic method that solves an approximate version of the problem. Our experimental results show that the proposed speedup techniques are effective. In addition, our heuristic algorithm is much faster than the exact algorithms, with an acceptable error.
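To make the notion of error tolerance concrete, the following is a minimal sketch rather than the algorithm proposed in the paper. It assumes the usual proportional formulation: a set of transactions supports an itemset if each transaction misses at most a fraction `row_eps` of the items and each item is absent from at most a fraction `col_eps` of the transactions. The names `row_eps`, `col_eps`, `is_fault_tolerant`, `ft_support`, and the toy database are hypothetical, and the support routine is a brute-force search that is only usable on tiny inputs.

```python
from itertools import combinations

def is_fault_tolerant(db, itemset, tids, row_eps, col_eps):
    """Return True if transactions `tids` support `itemset` within both error budgets."""
    if not itemset or not tids:
        return False
    # Row constraint: each transaction may miss at most row_eps * |itemset| items.
    for t in tids:
        missing = sum(1 for i in itemset if i not in db[t])
        if missing > row_eps * len(itemset):
            return False
    # Column constraint: each item may be absent from at most col_eps * |tids| transactions.
    for i in itemset:
        missing = sum(1 for t in tids if i not in db[t])
        if missing > col_eps * len(tids):
            return False
    return True

def ft_support(db, itemset, row_eps, col_eps):
    """Size of the largest supporting transaction set (exhaustive, exponential time)."""
    for k in range(len(db), 0, -1):
        for tids in combinations(range(len(db)), k):
            if is_fault_tolerant(db, itemset, tids, row_eps, col_eps):
                return k
    return 0

# Toy database (an assumption for illustration): one set of items per transaction.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"d"}]

exact = ft_support(db, ("a", "b", "c"), 0, 0)          # no errors allowed
relaxed = ft_support(db, ("a", "b", "c"), 0.5, 0.5)    # up to half may be missing
single = ft_support(db, ("a",), 0.5, 0.5)
pair = ft_support(db, ("a", "b"), 0.5, 0.5)
print(exact, relaxed, single, pair)  # prints: 1 4 3 4
```

On this toy data the itemset {a, b} has fault-tolerant support 4 while its subset {a} has support 3, which illustrates why the anti-monotone property cannot be relied on for pruning. The exhaustive loop in `ft_support` also hints at why exact support counting is expensive.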
This work was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China [Project No. CityU 122512].
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Liu, S., Poon, C.K. (2014). On Mining Proportional Fault-Tolerant Frequent Itemsets. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8421. Springer, Cham. https://doi.org/10.1007/978-3-319-05810-8_23
DOI: https://doi.org/10.1007/978-3-319-05810-8_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05809-2
Online ISBN: 978-3-319-05810-8
eBook Packages: Computer Science, Computer Science (R0)