Summarizing transactional databases with overlapped hyperrectangles

Xiang, Yang; Jin, Ruoming; Fuhry, David; Dragan, Feodor F.

doi:10.1007/s10618-010-0203-9

Summarizing transactional databases with overlapped hyperrectangles

Published: 24 October 2010

Volume 23, pages 215–251, (2011)
Cite this article

Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Yang Xiang¹,
Ruoming Jin³,
David Fuhry² &
…
Feodor F. Dragan³

413 Accesses
46 Citations
Explore all metrics

Abstract

Transactional data are ubiquitous. Several methods, including frequent itemset mining and co-clustering, have been proposed to analyze transactional databases. In this work, we propose a new research problem to succinctly summarize transactional databases. Solving this problem requires linking the high level structure of the database to a potentially huge number of frequent itemsets. We formulate this problem as a set covering problem using overlapped hyperrectangles (a concept generally regarded as tile according to some existing papers); we then prove that this problem and its several variations are NP-hard, and we further reveal its relationship with the compact representation of a directed bipartite graph. We develop an approximation algorithm Hyper which can achieve a logarithmic approximation ratio in polynomial time. We propose a pruning strategy that can significantly speed up the processing of our algorithm, and we also propose an efficient algorithm Hyper+ to further summarize the set of hyperrectangles by allowing false positive conditions. Additionally, we show that hyperrectangles generated by our algorithms can be properly visualized. A detailed study using both real and synthetic datasets shows the effectiveness and efficiency of our approaches in summarizing transactional databases.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Mining Frequent and Homogeneous Closed Itemsets

GridWall: A Novel Condensed Representation of Frequent Itemsets

Transactional Tree Mining

References

Afrati FN, Gionis A, Mannila H (2004) Approximating a collection of frequent sets. In: KDD, pp 12–19
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: VLDB, pp 487–499
Agrawal R, Borgida A, Jagadish HV (1989) Efficient management of transitive relationships in large data, knowledge bases. In: SIGMOD Conference, pp 253–262
Agrawal R, Imielinski T, Swami AN (1993) Mining association rules between sets of items in large databases. In: SIGMOD Conference, pp 207–216
Agrawal A, Mannila H, Srikant R, Toivonen H, Verkamo A (1996) Fast discovery of association rules. Adv Knowl Discov Data Min 307–308
Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: SIGMOD Conference, pp 94–105
Besson J, Robardet C, De Raedt L, Boulicaut J-F (2006) Mining bi-sets in numerical data. In: KDID, pp 11–23
Boulicaut J-F, Bykowski A, Rigotti C (2003) Free-sets: a condensed representation of boolean data for the approximation of frequency queries. Data Min Knowl Discov 7(1): 5–22
Article MathSciNet Google Scholar
Burdick D, Calimlim M, Flannick J, Gehrke J, Yiu T (2005) Mafia: a maximal frequent itemset algorithm. IEEE Trans Knowl Data Eng 17(11): 1490–1504
Article Google Scholar
Calders T, Goethals B (2007) Non-derivable itemset mining. Data Min Knowl Discov 14(1): 171–206
Article MathSciNet Google Scholar
Chandola V, Kumar V (2007) Summarization—compressing data into an informative representation. Knowl Inf Syst 12(3): 355–378
Article Google Scholar
Chvátal V. (1979) A greedy heuristic for the set-covering problem. Math Oper Res 4: 233–235
Article MATH MathSciNet Google Scholar
Faloutsos C, Megalooikonomou V (2007) On data mining, compression, kolmogorov complexity. Data Min Knowl Discov 15(1): 3–20
Article MathSciNet Google Scholar
Gao Byron J, Ester M (2006) Turning clusters into patterns: rectangle-based discriminative data description. In: ICDM, pp 200–211
Gao Byron J, Ester M, Cai JY, Schulte O, Xiong H (2007) The minimum consistent subset cover problem, its applications in data mining. In: KDD, pp 310–319
Geerts F, Goethals B, Mielikäinen T (2004) Tiling databases. In: Discovery science, pp 278–289
Gionis A, Mannila H, Seppänen JK (2004) Geometric, combinatorial tiles in 0-1 data. In: PKDD, pp 173–184
Han J, Kamber M (2006) Data mining: concepts, techniques, second edition. Morgan Kaufmann, San Francisco
Google Scholar
Harper LH (1964) Optimal assignments of numbers to vertices. J Soc Ind Appl Math 12(1): 131–135
Article MATH MathSciNet Google Scholar
Hartigan JA (1972) Direct clustering of a data matrix. J Am Stat Assoc 67(337): 123–129
Article Google Scholar
Jin R, Xiang Y, Fuhry D, Dragan FF (2008) Overlapping matrix pattern visualization: a hypergraph approach. In: ICDM, pp 313–322 (© IEEE, 2008. doi:10.1109/ICDM.2008.102)
Jin R, Xiang Y, Liu L (2009) Cartesian contour: a concise representation for a collection of frequent sets. In: KDD, pp 417–426
Johnson D, Krishnan S, Chhugani J, Kumar S, Venkatasubramanian S (2004) Compressing large boolean matrices using reordering techniques. In: VLDB, pp 13–23
Kellerer H, Pferschy U, Pisinger D (2004) Knapsack problems. Springer Verlag, New York
MATH Google Scholar
Lakshmanan LVS, Ng RT, Wang CX, Zhou X, Johnson TJ (2002) The generalized mdl approach for summarization. In: VLDB, pp 766–777
Li T (2005) A general model for clustering binary data. In: KDD, pp 188–197
Madeira SC, Oliveira AL (2004) Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans Comput Biol Bioinf 1(1): 24–45
Article Google Scholar
Minoux M (1977) Accelerated greedy algorithms for maximizing submodular set functions. In: the 8th IFIP Conference on Optimization Techniques
Mirkin B (1996) Mathematical classification and clustering. Kluwer Academic Publishers, Boston
MATH Google Scholar
Peeters R (2003) The maximum edge biclique problem is np-complete. Discret Appl Math 131(3): 651–654
Article MATH MathSciNet Google Scholar
Pei J, Dong G, Zou W, Han J (2004) Mining condensed frequent-pattern bases. Knowl Inf Syst 6(5): 570–594
Article Google Scholar
Richardson TJ, Urbanke RL (2008) Modern coding theory. Cambridge University Press, Cambridge
Book MATH Google Scholar
Safro I, Ron D, Brandt A (2006) Graph minimum linear arrangement by multilevel weighted edge contractions. J Algorithms 60(1): 24–41
Article MATH MathSciNet Google Scholar
Siebes A, Vreeken J, van Leeuwen M (2006) Item sets that compress. In: SDM
Steinbach M, Tan P-N, Kumar V (2004) Support envelopes: a technique for exploring the structure of association patterns. In: KDD, pp 296–305
van Leeuwen M, Vreeken J, Siebes A (2006) Compression picks item sets that matter. In: PKDD, pp 585–592
Vreeken J, van Leeuwen M, Siebes A (2007) Characterising the difference. In: KDD, pp 765–774
Wang J, Karypis G (2006) On efficiently summarizing categorical databases. Knowl Inf Syst 9(1): 19–37
Article Google Scholar
Xiang Y, Jin R, Fuhry D, Dragan FF (2008) Succinct summarization of transactional databases: an overlapped hyperrectangle scheme. In: KDD, pp 758–766 (© ACM, 2008. http://doi.acm.org/10.1145/1401890.1401981)
Xin D, Han J, Yan X, Cheng H (2005) Mining compressed frequent-pattern sets. In: VLDB, pp 709–720

Download references

Author information

Authors and Affiliations

Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
Yang Xiang
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210, USA
David Fuhry
Department of Computer Science, Kent State University, Kent, OH, 44242, USA
Ruoming Jin & Feodor F. Dragan

Authors

Yang Xiang
View author publications
You can also search for this author in PubMed Google Scholar
Ruoming Jin
View author publications
You can also search for this author in PubMed Google Scholar
David Fuhry
View author publications
You can also search for this author in PubMed Google Scholar
Feodor F. Dragan
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yang Xiang.

Additional information

Responsible editor: Bart Goethals.

A preliminary version of this article appeared in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Xiang et al. 2008). This submission has substantially extended the previous paper and contains new and major-value added contribution in comparison with the conference publication.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Xiang, Y., Jin, R., Fuhry, D. et al. Summarizing transactional databases with overlapped hyperrectangles. Data Min Knowl Disc 23, 215–251 (2011). https://doi.org/10.1007/s10618-010-0203-9

Download citation

Received: 13 January 2009
Accepted: 20 September 2010
Published: 24 October 2010
Issue Date: September 2011
DOI: https://doi.org/10.1007/s10618-010-0203-9

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Summarizing transactional databases with overlapped hyperrectangles

Abstract

Access this article

Similar content being viewed by others

Mining Frequent and Homogeneous Closed Itemsets

GridWall: A Novel Condensed Representation of Frequent Itemsets

Transactional Tree Mining

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Summarizing transactional databases with overlapped hyperrectangles

Abstract

Access this article

Similar content being viewed by others

Mining Frequent and Homogeneous Closed Itemsets

GridWall: A Novel Condensed Representation of Frequent Itemsets

Transactional Tree Mining

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation