
Collaborative filtering driven by fast semantic feature analysis on Spark

Wireless Networks

Abstract

Collaborative filtering (CF) is a widely used recommendation technique and has been studied extensively as a remedy for information overload, particularly in the Big Data context. Traditional CF algorithms perform adequately in many settings, but they suffer from the cold-start and data-sparsity problems. A promising way to address these problems is to make full use of the semantic information contained in items. In this paper, we therefore propose a two-stage collaborative filtering approach driven by Simhash-based semantic feature analysis. The first stage extracts semantic features for items and their categories using Simhash; the second stage performs reinforced CF rating prediction using these highly compressed category features. Simhash allows the rich semantic features of large numbers of items and categories to be extracted and compressed rapidly in the first stage, and these features are then used to enhance the traditional CF process. To cope with the scale of Big Data, we also design a parallel algorithm on Spark that accelerates the time-consuming semantic feature extraction for large item sets. Finally, we conduct comprehensive experiments on real-world datasets to validate the reinforced CF approach; the results show that it performs favorably compared with traditional CF algorithms.
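The two stages summarized above can be illustrated with a minimal Python sketch. It assumes that each item's category is available as a short text description, weights tokens by term frequency, derives 64-bit token hashes from MD5, and blends a rating-based cosine similarity with the Simhash similarity through a hypothetical weight `alpha`. None of these choices is taken from the paper, and the Spark parallelization is omitted.

```python
import hashlib
import math
import re
from collections import Counter
from typing import Dict, Optional

BITS = 64  # fingerprint width (assumption); must be a multiple of 8 and <= 128 for MD5

Ratings = Dict[str, Dict[str, float]]  # user -> {item -> rating}


def token_hash(token: str, bits: int = BITS) -> int:
    """Stable integer hash of a token from the first bits/8 bytes of its MD5 digest."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def simhash(text: str, bits: int = BITS) -> int:
    """Simhash fingerprint: every token casts a +w/-w vote on each bit of its hash."""
    weights = Counter(re.findall(r"\w+", text.lower()))  # term frequency as weight (assumption)
    votes = [0] * bits
    for token, w in weights.items():
        h = token_hash(token, bits)
        for i in range(bits):
            votes[i] += w if (h >> i) & 1 else -w
    # Bit i of the fingerprint is set where the weighted vote is positive.
    return sum(1 << i for i in range(bits) if votes[i] > 0)


def semantic_similarity(fp_a: int, fp_b: int, bits: int = BITS) -> float:
    """1 minus the normalised Hamming distance between two fingerprints."""
    return 1.0 - bin(fp_a ^ fp_b).count("1") / bits


def rating_similarity(ratings: Ratings, item_a: str, item_b: str) -> float:
    """Cosine similarity of two items over users who rated both (0.0 if none)."""
    common = [u for u, r in ratings.items() if item_a in r and item_b in r]
    dot = sum(ratings[u][item_a] * ratings[u][item_b] for u in common)
    norm_a = math.sqrt(sum(ratings[u][item_a] ** 2 for u in common))
    norm_b = math.sqrt(sum(ratings[u][item_b] ** 2 for u in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(ratings: Ratings, fingerprints: Dict[str, int], user: str,
            target: str, alpha: float = 0.5) -> Optional[float]:
    """Item-based CF prediction using a blend of rating and semantic similarity.

    `alpha` is a hypothetical blending weight, not a parameter from the paper.
    """
    num = den = 0.0
    for item, rating in ratings.get(user, {}).items():
        if item == target:
            continue
        sim = (alpha * rating_similarity(ratings, target, item)
               + (1 - alpha) * semantic_similarity(fingerprints[target], fingerprints[item]))
        if sim > 0:
            num += sim * rating
            den += sim
    return num / den if den else None


if __name__ == "__main__":
    # Hypothetical category descriptions; "i4" is a new item with no ratings yet.
    categories = {
        "i1": "science fiction space adventure novel",
        "i2": "space opera science fiction novel",
        "i3": "italian cooking recipes",
        "i4": "science fiction novel about space travel",
    }
    fingerprints = {item: simhash(text) for item, text in categories.items()}
    ratings = {"u1": {"i1": 5.0, "i3": 2.0}, "u2": {"i1": 4.0, "i2": 5.0, "i3": 1.0}}
    print(predict(ratings, fingerprints, "u1", "i4"))
```

Because a cold-start item such as `i4` has no co-ratings, its rating similarity is 0 and the prediction falls back on the semantic term, which is the intuition behind using category features against cold start and sparsity. The per-item `simhash` calls are independent of one another, so in a Spark setting they could plausibly be distributed as a map over the item descriptions.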


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10



Acknowledgements

We would like to thank Professor Youping Li, the director of Future Network Research Center of SEU, for his enlightening suggestions for enhancing the traditional collaborative filtering via Simhash-based category features. This work is supported by the National Natural Science Foundation of China under Grants No. 61472080, No. 61672155, the Academician Consulting Project of Chinese Academy of Engineering under Grant No. 2018-XY-07, and the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information

Corresponding author

Correspondence to Peng Yang.


About this article


Cite this article

Yang, P., Gu, L. & Liu, X. Collaborative filtering driven by fast semantic feature analysis on Spark. Wireless Netw 28, 1321–1334 (2022). https://doi.org/10.1007/s11276-018-01901-8



  • DOI: https://doi.org/10.1007/s11276-018-01901-8
