Skip to main content
Log in

Correlation range query for effective recommendations

  • Published:
World Wide Web Aims and scope Submit manuscript

Abstract

Efficient correlation computation has been an active research area of data mining. Given a large dataset and a specified query item, we are interested in finding items in the dataset that are within certain range of correlation with the query item. Such a problem, known as the correlation range query, has been a common task in many application domains. In this paper, we identify piecewise monotone properties of the upper and lower bounds of the ϕ coefficient, and propose an efficient correlation range query algorithm, called CORAQ. The CORAQ algorithm effectively prunes many items without computing their actual correlation coefficients with the query item. CORAQ also attains completeness and correctness of the query results. Experiments with large benchmark datasets show that this algorithm is much faster than its brute-force alternative and scales well with large datasets. As case studies, real-world datasets from recommendation applications are analyzed to demonstrate that CORAQ can help effectively identify interesting items to recommend to users.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. In: ACM SIGMOD Record, vol. 22, pp. 207–216. ACM (1993)

  2. Bell, R.M., Koren, Y.: Improved neighborhood-based collaborative filtering. In: Proceedings of KDD-Cup Workshop at the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2007)

  3. Bennett, J., Lanning, S.: The netflix prize. In: Proceedings of KDD-Cup Workshop at the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2007)

  4. Brin, S., Motwani, R., Silverstein, C.: Beyond market baskets: generalizing association rules to correlations. In: ACM SIGMOD Record, vol. 26, pp. 265–276. ACM (1997)

  5. Feng, J., Bhargava, H.K., Pennock, D.M.: Implementing sponsored search in web search engines: computational evaluation of alternative mechanisms. INFORMS J. Comput. 19(1), 137–148 (2007)

    Article  Google Scholar 

  6. Hillard, D., Manavoglu, E., Raghavan, H., Leggetter, C., Cantú-Paz, E., Iyer, R.: The sum of its parts: reducing sparsity in click estimation with query segments. Inf. Retr. 14(3), 315–336 (2011)

    Article  Google Scholar 

  7. Ilyas, I.F., Markl, V., Haas, P.J., Brown, P., Aboulnaga, A.: CORDS: automatic discovery of correlations and soft functional dependencies. In: ACM SIGMOD International Conference on Management of Data, pp. 647–658 (2004)

  8. Jermaine, C.: The computational complexity of high-dimensional correlation search. In: Proceedings IEEE International Conference on Data Mining, 2001. ICDM 2001, pp. 249–256. IEEE (2001)

  9. Jermaine, C.: Playing hide-and-seek with correlations. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 559–564. ACM (2003)

  10. Kamakura, W.A., Wedel, M., de Rosa, F., Mazzon, J.A.: Cross-selling through database marketing: a mixed data factor analyzer for data augmentation and prediction. Int. J. Res. Mark. 20, 45–65 (2003)

    Article  Google Scholar 

  11. Li, W.: Random texts exhibit zipf’s-law-like word frequency distribution. IEEE Trans. Inf. Theory 38(6), 1842–1845 (1992)

    Article  Google Scholar 

  12. Pellegrini, M., Marcotte, E.M., Thompson, M.J., Eisenberg, D., Yeates, T.O.: Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc. Natl. Acad. Sci 96(8), 4285–4288 (1999)

    Article  Google Scholar 

  13. Seller, M., Gray, P.: A survey of database marketing. Tech. rep., I.T. in Business, Center for Research on Information Technology and Organizations, UC Irvine (1999)

  14. Tan, P.N., Steinbach, M., Kumar, V.: Introduction to Data Mining. Addison-Wesley (2006)

  15. Xiong, H., Shekhar, S., Ning Tan, P., Kumar, V.: Exploiting a support-based upper bound of pearson’s correlation coefficient for efficiently identifying strongly correlated pairs. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 334–343 (2004)

  16. Xiong, H., He, X., Ding, C., Zhang, Y., Kumar, V., Holbrook, S.R.: Identification of functional modules in protein complexes via hyperclique pattern discovery. In: Proceedings of the Pacific Symposium on Biocomputing, pp. 221–232 (2005)

  17. Xiong, H., Brodie, M., Ma, S.: TOP-COP: mining top-k strongly correlated pairs in large databases. In: Proceedings of the 2006 IEEE International Conference on Data Mining, pp. 1162–1166 (2006)

  18. Xiong, H., Shekhar, S., Tan, P.N., Kumar, V.: TAPER: a two-step approach for all-strong-pairs correlation query in large databases. IEEE Trans. Knowl. Data Eng. 18(4), 493–508 (2006)

    Article  Google Scholar 

  19. Xiong, H., Zhou, W., Brodie, M., Ma, S.: Top-k ϕ correlation computation. INFORMS J. Comput. 20(4), 539–552 (2008)

    Article  MATH  MathSciNet  Google Scholar 

  20. Zhang, L., Tang, L., Luo, P., Chen, E., Jiao, L., Wang, M., Liu, G.: Harnessing the wisdom of the crowds for accurate web page clipping. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2012), pp. 570–578 (2012)

  21. Zhou, W., Xiong, H.: Volatile correlation computation: a checkpoint view. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 848–856 (2008)

  22. Zhou, W., Xiong, H.: Checkpoint evolution for volatile correlation computing. Mach. Learn. 83(1), 103–131 (2011)

    Article  MATH  MathSciNet  Google Scholar 

  23. Zhou, W., Zhang, H.: Correlation range query. In: Proceedings of the 14th International Conference on Web-Age Information Management (WAIM 2013), pp. 490–501 (2013)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Wenjun Zhou.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhou, W., Zhang, H. Correlation range query for effective recommendations. World Wide Web 18, 709–729 (2015). https://doi.org/10.1007/s11280-013-0265-x

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11280-013-0265-x

Keywords

Navigation