On Mining Proportional Fault-Tolerant Frequent Itemsets

Liu, Shengxin; Poon, Chung Keung

doi:10.1007/978-3-319-05810-8_23

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8421)

Abstract

Mining robust frequent itemsets has attracted much attention because of its wide applicability to noisy data. In this paper, we study the problem of mining proportional fault-tolerant frequent itemsets in a large transactional database. A fault-tolerant frequent itemset allows a small number of errors in each item and each supporting transaction. This problem is challenging because the anti-monotone property does not hold for candidate generation and fault-tolerant support counting is known to be NP-hard. We propose techniques that substantially speed up the state-of-the-art algorithm for the problem. We also develop an efficient heuristic method to solve an approximation version of the problem. Our experimental results show that the proposed speedup techniques are effective. In addition, our heuristic algorithm is much faster than the exact algorithms, while its error remains acceptable.
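
The abstract does not spell out the formal definition, so the following is only a minimal sketch under an assumed one: a set of transactions supports an itemset if every transaction misses at most a fraction `row_tol` of the items and every item is missing from at most a fraction `col_tol` of the transactions. The function names, the tolerance parameters, and the toy database are hypothetical and are not taken from the paper. The brute-force support count is exponential in the number of transactions and is meant only to illustrate, on a tiny example, why the anti-monotone property can fail.

```python
from itertools import combinations


def is_ft_supporting_set(itemset, transactions, row_tol, col_tol):
    """Check the assumed proportional fault-tolerance conditions.

    Assumption (not from the paper): each transaction may miss at most
    row_tol * |itemset| items, and each item may be missing from at most
    col_tol * |transactions| transactions.
    """
    items = list(itemset)
    if not items or not transactions:
        return False
    # Row constraint: bounded number of missing items per transaction.
    for t in transactions:
        missing = sum(1 for i in items if i not in t)
        if missing > row_tol * len(items):
            return False
    # Column constraint: bounded number of missing transactions per item.
    for i in items:
        missing = sum(1 for t in transactions if i not in t)
        if missing > col_tol * len(transactions):
            return False
    return True


def ft_support(itemset, database, row_tol, col_tol):
    """Size of the largest supporting set, found by exhaustive search.

    Exponential time; usable only on toy inputs.
    """
    for k in range(len(database), 0, -1):
        for subset in combinations(database, k):
            if is_ft_supporting_set(itemset, subset, row_tol, col_tol):
                return k
    return 0


# Hypothetical toy database: each transaction lacks exactly one of a, b, c, d.
db = [
    frozenset("abc"),
    frozenset("abd"),
    frozenset("acd"),
    frozenset("bcd"),
]

support_abcd = ft_support(frozenset("abcd"), db, 0.25, 0.25)
support_abc = ft_support(frozenset("abc"), db, 0.25, 0.25)
print(support_abcd, support_abc)
```

With a 25% tolerance, the four-item set tolerates one missing item per transaction, while its three-item subset tolerates none (0.25 × 3 < 1). In this toy database the superset therefore ends up with a larger support than the subset, which is the kind of behaviour that rules out standard Apriori-style pruning under the assumed definition.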

This work was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China [Project No. CityU 122512].

Author information

Authors and Affiliations

Department of Computer Science, City University of Hong Kong, Hong Kong, China
Shengxin Liu
Department of Computer Science and Center for Excellence, Caritas Institute of Higher Education, Hong Kong, China
Chung Keung Poon

Editor information

Editors and Affiliations

School of Computer Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore, Singapore
Sourav S. Bhowmick
Department of Computer Science, Utah State University, Old Main Hill, 4205, 84322-4205, Logan, UT, USA
Curtis E. Dyreson
Department of Computer Science, Aalborg University, Selma Lagerløfs Vej 300, 9220, Aalborg Øst, Denmark
Christian S. Jensen
Department of Computer Science, National University of Singapore, 13 Computing Drive, 117417, Singapore, Singapore
Mong Li Lee
Department of Computer Science, Udayana University, Jl. Kampus Unud Jimbaran Bali, 80364, Badung, Bali, Indonesia
Agus Muliantara
Information Systems Engineering, Christian-Albrechts-Universität zu Kiel, Olshausenstrasse 40, 24098, Kiel, Germany
Bernhard Thalheim

About this paper

Cite this paper

Liu, S., Poon, C.K. (2014). On Mining Proportional Fault-Tolerant Frequent Itemsets. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8421. Springer, Cham. https://doi.org/10.1007/978-3-319-05810-8_23

DOI: https://doi.org/10.1007/978-3-319-05810-8_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05809-2
Online ISBN: 978-3-319-05810-8
eBook Packages: Computer Science, Computer Science (R0)
