SILVERBACK+: scalable association mining via fast list intersection for columnar social data

Xie, Yusheng; Chen, Zhengzhang; Palsetia, Diana; Trajcevski, Goce; Agrawal, Ankit; Choudhary, Alok

doi:10.1007/s10115-016-0962-8

Regular Paper
Published: 04 July 2016

Volume 50, pages 969–997, (2017)
Knowledge and Information Systems

Yusheng Xie¹,³,
Zhengzhang Chen²,
Diana Palsetia¹,
Goce Trajcevski¹,
Ankit Agrawal¹ &
…
Alok Choudhary¹

Abstract

We present Silverback+, a scalable probabilistic framework for accurate association rule and frequent item-set mining of large-scale social behavioral data. Silverback+ tackles the problem of efficient storage utilization and management via (1) a probabilistic columnar infrastructure and (2) Bloom filters and sampling techniques. In addition, probabilistic pruning techniques based on the Apriori method are developed to accelerate the mining of frequent item-sets. The proposed target-driven techniques yield a significant reduction in the number of frequent item-set candidates, as well as in the required number of repetitive membership checks, through a novel list intersection algorithm. Extensive experimental evaluations demonstrate the benefits of this context-aware consideration and incorporation of the infrastructure limitations when utilizing the corresponding research techniques. When compared to the traditional Hadoop-based approach of improving scalability by straightforwardly adding more hosts, Silverback+ exhibits much better runtime performance, with negligible loss of accuracy.
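To make the terms in the abstract concrete, the following is a minimal sketch of generic Apriori-style frequent item-set mining over a columnar (item to transaction-id list) layout, where support counting is done by intersecting sorted lists. It is a textbook illustration only and is not the list intersection algorithm, Bloom filter scheme, or probabilistic pruning proposed in the paper; the function names `intersect_sorted` and `frequent_itemsets` and the toy data are assumptions made for this example.

```python
from itertools import combinations


def intersect_sorted(a, b):
    """Intersect two ascending lists of transaction ids with a two-pointer merge."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def frequent_itemsets(transactions, min_support):
    """Return {item-set (sorted tuple): support} for every frequent item-set.

    Hypothetical helper for illustration; exact counting, no sampling.
    """
    # Columnar (vertical) layout: 1-item-set -> ascending list of transaction ids.
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            tidlists.setdefault((item,), []).append(tid)

    level = {k: v for k, v in tidlists.items() if len(v) >= min_support}
    result = {k: len(v) for k, v in level.items()}

    while level:
        next_level = {}
        for x, y in combinations(sorted(level), 2):
            # Join two k-item-sets only if they share their first k-1 items.
            if x[:-1] != y[:-1]:
                continue
            candidate = x + (y[-1],)
            # Apriori pruning: every k-item subset must already be frequent.
            if any(sub not in level for sub in combinations(candidate, len(x))):
                continue
            # Support of the candidate is the size of the intersected id lists.
            tids = intersect_sorted(level[x], level[y])
            if len(tids) >= min_support:
                next_level[candidate] = tids
        result.update({k: len(v) for k, v in next_level.items()})
        level = next_level
    return result


transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c"],
]
result = frequent_itemsets(transactions, min_support=3)
```

In this toy run every pair is frequent, while the triple is generated as a candidate and then rejected because its intersected list is too short. The sketch also shows why candidate reduction matters: each surviving candidate costs one list intersection, so fewer candidates means fewer membership checks.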

Notes

http://www.mongodb.org.

Acknowledgments

This work is supported in part by the following Grants: NSF awards CCF-1029166, IIS-1343639, CCF-1409601, CNS-0910952 and III 1213038; DOE awards DE-SC0007456, DE-SC0014330; ONR Grant N00014-14-1-0215.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, USA
Yusheng Xie, Diana Palsetia, Goce Trajcevski, Ankit Agrawal & Alok Choudhary
NEC Laboratories America, Princeton, NJ, USA
Zhengzhang Chen
Baidu Research, Sunnyvale, CA, USA
Yusheng Xie

Corresponding author

Correspondence to Yusheng Xie.

About this article

Cite this article

Xie, Y., Chen, Z., Palsetia, D. et al. SILVERBACK+: scalable association mining via fast list intersection for columnar social data. Knowl Inf Syst 50, 969–997 (2017). https://doi.org/10.1007/s10115-016-0962-8

Received: 22 December 2014
Revised: 14 April 2016
Accepted: 27 May 2016
Published: 04 July 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s10115-016-0962-8
