An Empirical Study of a Large Scale Online Recommendation System

  • Conference paper
Web Technologies and Applications (APWeb 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9461)

Abstract

Online recommendation services are widely used across the applications of telecommunication companies. The user base of such applications is typically very large, with diverse user characteristics and habits, which makes it challenging to achieve a high click-through rate (CTR) for online recommendations. In this paper, we propose an approach that combines ensemble trees with logistic regression (LR). The ensemble trees are effective at capturing joint information across different features, and this information is then used as input by the LR model. To address scalability, we implemented our system with both Apache Storm (for real-time prediction and classification) and Apache Spark (for fast offline model training). Experiments on real-world data sets demonstrate the efficiency and effectiveness of the proposed approach.
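
The combination of ensemble trees and LR described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two trees (`tree_a`, `tree_b`) are hand-written stand-ins for trained ensemble trees, the synthetic click data is invented for illustration, and the LR model is fitted with plain batch gradient descent rather than on Spark. The sketch assumes that each sample is mapped to the leaf it reaches in every tree, and that the one-hot leaf indicators are what the LR model consumes.

```python
import math
import random

def tree_a(x):
    """Hypothetical depth-2 tree on features 0 and 1; leaf index in 0..3."""
    if x[0] < 0.5:
        return 0 if x[1] < 0.5 else 1
    return 2 if x[1] < 0.5 else 3

def tree_b(x):
    """Hypothetical depth-1 tree on feature 2; leaf index in 0..1."""
    return 0 if x[2] < 0.5 else 1

# (tree function, number of leaves); a trained ensemble would replace these.
TREES = [(tree_a, 4), (tree_b, 2)]

def leaf_features(x):
    """One-hot encode the leaf reached in each tree and concatenate."""
    encoded = []
    for tree, n_leaves in TREES:
        one_hot = [0.0] * n_leaves
        one_hot[tree(x)] = 1.0
        encoded.extend(one_hot)
    return encoded

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(w, b, feats):
    """LR click probability for an already-encoded feature vector."""
    return sigmoid(sum(wi * fi for wi, fi in zip(w, feats)) + b)

def train_lr(rows, labels, epochs=200, step=0.5):
    """Fit logistic regression by batch gradient descent on the log loss."""
    dim = len(rows[0])
    w, b = [0.0] * dim, 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for feats, y in zip(rows, labels):
            err = score(w, b, feats) - y
            for j, fj in enumerate(feats):
                grad_w[j] += err * fj
            grad_b += err
        w = [wj - step * gj / n for wj, gj in zip(w, grad_w)]
        b -= step * grad_b / n
    return w, b

# Synthetic data (assumption): a click happens only when features 0 and 1
# fall on the same side of 0.5, an interaction that raw linear features miss.
rng = random.Random(0)
samples = [[rng.random() for _ in range(3)] for _ in range(200)]
clicks = [1.0 if (x[0] < 0.5) == (x[1] < 0.5) else 0.0 for x in samples]

w, b = train_lr([leaf_features(x) for x in samples], clicks)

def predict_ctr(x):
    """Predicted click probability for a raw feature vector."""
    return score(w, b, leaf_features(x))

print(predict_ctr([0.1, 0.1, 0.3]), predict_ctr([0.1, 0.9, 0.3]))
```

Because each leaf of `tree_a` corresponds to a joint condition on two features, the linear LR model can separate the clicked and non-clicked regions, which it could not do from the raw features alone. This is the sense in which the trees capture joint feature information for LR.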

Author information

Corresponding author

Correspondence to Huazheng Fu.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Fu, H., Chen, K., Ding, J. (2015). An Empirical Study of a Large Scale Online Recommendation System. In: Cai, R., Chen, K., Hong, L., Yang, X., Zhang, R., Zou, L. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science, vol 9461. Springer, Cham. https://doi.org/10.1007/978-3-319-28121-6_2

  • DOI: https://doi.org/10.1007/978-3-319-28121-6_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28120-9

  • Online ISBN: 978-3-319-28121-6

  • eBook Packages: Computer Science, Computer Science (R0)
