
Neurocomputing

Volume 332, 7 March 2019, Pages 137-148

A novel Enhanced Collaborative Autoencoder with knowledge distillation for top-N recommender systems

https://doi.org/10.1016/j.neucom.2018.12.025

Abstract

In most recommender systems, user feedback is usually represented as a set of discrete values, which cannot exactly describe users’ interests. This makes it difficult to accurately model users’ latent preferences for recommendation. Intuitively, a basic idea for this issue is to predict continuous values with a trained model to reveal users’ essential feedback, and then use the generated data to retrain another model that learns users’ preferences. However, since these continuous data are generated by an imperfect model trained on discrete data, they contain a large amount of noise, which may have a severe adverse impact on performance. To address this problem, we propose a novel Enhanced Collaborative Autoencoder (ECAE) that learns robust information from the generated soft data with the technique of knowledge distillation. First, we propose a tightly coupled structure that incorporates the generation and retraining stages into a unified framework, so that the generated data can be fine-tuned to reduce noise by propagating the training errors of the retraining network. Second, since each unit of the generated data contains a different level of noise, we propose a novel distillation layer to balance the influence of noise and knowledge. Finally, we take the prediction results of both the generation and retraining networks into account to make final recommendations for each user. Experimental results on four public datasets for top-N recommendation show that the ECAE model performs better than several state-of-the-art algorithms on the MAP and NDCG metrics.

Introduction

With the rapid development of the Internet and E-commerce, information overload has become a severe problem that makes it difficult for users to find useful information [1], [2]. To address this problem, numerous recommender systems have been proposed to make personalized recommendations that help users find information meeting their requirements [3], [4], [5], [6], [7], [8], [9]. These methods are widely applied in web services such as Tmall, Ciao and Epinions.

Collaborative filtering (CF) is one of the most popular and successful solutions for recommendation [10]. The basic idea of CF is to model users and items based on users’ historical feedback on items. Many works have been proposed to model user feedback with matrix factorization techniques [1], [11], [12], [13], [14], [15]. Although these methods are quite effective in industrial recommender systems, their performance is potentially limited by the linear nature of matrix factorization [16].
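To make the linear nature of this family of models concrete, the following minimal sketch (with hypothetical sizes and variable names, not taken from any of the cited works) scores a user-item pair as the inner product of two latent factor vectors, which is exactly the linear form whose expressiveness is questioned in [16].

```python
import numpy as np

# Minimal matrix-factorization scoring sketch (illustrative only):
# each user u and item i is mapped to a k-dimensional latent vector,
# and the predicted preference is their inner product.
rng = np.random.default_rng(0)
n_users, n_items, k = 100, 50, 8               # hypothetical sizes
P = rng.normal(scale=0.1, size=(n_users, k))   # user latent factors
Q = rng.normal(scale=0.1, size=(n_items, k))   # item latent factors

def predict(u: int, i: int) -> float:
    """Predicted feedback of user u on item i under the linear MF model."""
    return float(P[u] @ Q[i])

print(predict(3, 7))
```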

In recent years, deep neural networks have achieved great success in many domains, such as image recognition [17], [18], object detection [19] and machine translation [20]. Compared with classic machine learning approaches [21], [22], [23], [24], [25], [26], [27], deep neural networks demonstrate a much stronger learning ability for pattern recognition tasks [17]. Motivated by this learning power, much attention has been paid to how deep learning techniques can be used to improve recommender systems [28], [29], [30], [31]. In [28], Sedhain et al. propose the AutoRec model, based on an autoencoder network, to predict unknown ratings for each user. Wu et al. [29] further propose to inject a user node into the autoencoder to make predictions for top-N recommender systems. Zuo et al. [30] propose to utilize a deep neural network to learn robust representations from tag information for recommendation. More recently, Li et al. [31] propose to train a deep autoencoder layer by layer through three stages.

In general, training an accurate and robust deep neural network, which contains a large number of parameters to fit the training data, requires a very large amount of data. For example, many deep neural networks [17] have been successfully trained on ImageNet, a very large image dataset. However, user feedback data in recommender systems always face a critical sparsity problem, because most users rate only a few items. This problem may severely limit the performance of deep neural networks for recommender systems. Therefore, new methods are needed to mine more robust information from sparse data for recommender systems.

As a matter of fact, the user feedback data collected by most online applications are in the form of discrete values. However, such discrete values can hardly reflect users’ interests in items exactly, and the implicit information and interactions among these latent preferences are difficult to learn from them. In other words, the discrete input data contain a lot of noise, which makes it difficult to learn robust information from them.

Towards this problem, some works have been proposed to leverage generated soft targets as auxiliary information to help learn robust representations [32], [33], [34]. In [32], Hinton et al. propose to utilize the “soft targets” generated by a well-trained network to transfer knowledge to another new model and achieve competitive performance. After that, Tang et al. develop a novel training method for recurrent neural networks that utilizes soft targets to prevent overfitting [33]. In [34], Kuchaiev et al. utilize the generated targets as input data to retrain the neural networks several times. Their experiments demonstrate that soft targets can be used to improve the performance of representation learning through regularization or pre-training. These works suggest a possible way to mine more implicit information from discrete user feedback data for recommendation.

However, there exists a critical problem in applying soft targets. Since the trained model is not perfect, the generated soft targets are inexact and very noisy. In [32], Hinton et al. propose to suppress the noise of soft targets by using a temperature parameter that makes them softer or sharper. As they state in [32], how to balance these two effects via the temperature remains an experimental and challenging question. It is worth noting that this technique is developed for outputs of the softmax function, which makes it unsuitable for recommender systems. Therefore, it is necessary to develop a new method to adjust the generated soft targets for retraining.
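For reference, the temperature mechanism of [32] can be sketched as below; this is only a minimal illustration of the standard softmax-based distillation recipe, not the distillation layer proposed in this paper. Raising the temperature flattens the distribution (exposing more “dark knowledge” but also more noise), while lowering it sharpens the distribution towards the hard prediction, which is precisely the trade-off described above.

```python
import numpy as np

def soften(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Temperature-scaled softmax used to produce soft targets as in [32].

    temperature > 1 flattens the distribution (more transferred knowledge, more noise),
    temperature < 1 sharpens it towards the hard prediction.
    """
    z = logits / temperature
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.5])
print(soften(logits, temperature=1.0))   # close to a hard, one-hot-like target
print(soften(logits, temperature=5.0))   # much softer target distribution
```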

In this paper, we develop a novel Enhanced Collaborative Autoencoder (ECAE) model to learn useful knowledge from discrete user feedback data for top-N recommendation. To learn robust representations from soft targets, we propose to fuse the separate generation and retraining stages into a unified framework, so that the soft targets generated by the generation network can be tuned dynamically with the training errors of the retraining network to reduce noise and improve performance. Moreover, we propose a novel distillation layer that balances the impact of knowledge and noise for each unit of the output vector according to its reliability. Finally, since the generation and retraining networks provide two prediction functions for each user from different perspectives, we propose to ensemble them to achieve a robust result without additional training cost.

In summary, the main contributions of this paper are summarized as follows:

  • To learn robust information from generated soft targets for recommendation, we propose to jointly incorporate the generation and retraining stages into a unified framework. In this way, the training errors of the retraining network can be propagated to update the soft targets, reducing noise while retaining useful knowledge.

  • To balance the influence of knowledge and noise in the generated soft targets, we propose a novel distillation layer for the generation network, designed for the multiple positive labels found in recommender systems.

  • We propose an enhanced knowledge distillation method based on the reliabilities of the output units of the distillation layer. We also carefully develop a reasonable mapping function that measures the reliability of each unit from its corresponding number of positive labels (a schematic sketch of one such mapping is given after this list).

  • To make full use of the prediction results of the generation and retraining networks, we further propose to take both of their results into account to make robust recommendations for users.

  • We conduct extensive experiments to compare the proposed ECAE model with state-of-the-art recommender systems. Experimental results on four datasets demonstrate that ECAE performs significantly better than these baselines on the MAP and NDCG metrics.
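As a rough illustration of the reliability idea mentioned in the third contribution, the sketch below blends observed hard feedback with generated soft targets using a per-unit weight derived from the number of positive labels. The monotone mapping and the blending rule are assumptions made for illustration only; the actual mapping function and distillation layer are defined in the paper.

```python
import numpy as np

def reliability(num_positive: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Map the number of observed positive labels to a weight in (0, 1):
    units with more observed feedback are treated as more reliable.
    (Assumed monotone mapping, for illustration only.)"""
    return num_positive / (num_positive + alpha)

def blend_targets(hard: np.ndarray, soft: np.ndarray, num_positive: np.ndarray) -> np.ndarray:
    """Per-unit blend of observed (hard) feedback and generated soft targets,
    weighting the soft part by its estimated reliability."""
    w = reliability(num_positive)
    return w * soft + (1.0 - w) * hard

hard = np.array([1.0, 0.0, 0.0, 1.0])    # observed implicit feedback
soft = np.array([0.9, 0.3, 0.1, 0.7])    # generation-network outputs
counts = np.array([40, 2, 1, 15])        # positive-label counts per unit
print(blend_targets(hard, soft, counts))
```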

Section snippets

Related work

In this section, we discuss work related to the ECAE model from three perspectives: collaborative filtering, deep learning for recommender systems, and knowledge distillation.

Proposed method

The proposed ECAE model consists of three components: a generation network, a distillation layer and a retraining network. Specifically, we utilize the CDAE model [29] as the generation network to produce soft targets in the ECAE model. Following this network, we propose a novel distillation layer to balance the knowledge and noise in the outputs of CDAE. After that, we utilize a retraining network, based on an autoencoder structure, to learn implicit knowledge from the soft targets. Finally, we ensemble the predictions of both the generation and retraining networks to make final recommendations.
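Although the remainder of this section is not reproduced here, the three components described above can be summarized in the schematic PyTorch sketch below. Layer sizes, activations, the reliability-based weighting, and the equal-weight ensemble are illustrative assumptions, not the exact design of the ECAE model.

```python
import torch
import torch.nn as nn

class ECAESketch(nn.Module):
    """Schematic sketch of the three ECAE components (assumed architecture,
    for illustration only)."""

    def __init__(self, n_users: int, n_items: int, hidden: int = 64):
        super().__init__()
        # Generation network: CDAE-style autoencoder with a per-user node [29].
        self.user_embed = nn.Embedding(n_users, hidden)
        self.gen_enc = nn.Linear(n_items, hidden)
        self.gen_dec = nn.Linear(hidden, n_items)
        # Retraining network: a second autoencoder trained on the distilled targets.
        self.ret_enc = nn.Linear(n_items, hidden)
        self.ret_dec = nn.Linear(hidden, n_items)

    def forward(self, user_idx, feedback, reliability):
        # 1) Generation network produces soft targets from discrete feedback.
        h = torch.sigmoid(self.gen_enc(feedback) + self.user_embed(user_idx))
        soft = torch.sigmoid(self.gen_dec(h))
        # 2) Distillation layer: per-unit blend of generated knowledge and
        #    observed labels (reliability weighting is an assumed stand-in).
        distilled = reliability * soft + (1.0 - reliability) * feedback
        # 3) Retraining network learns from the distilled targets; since the
        #    pipeline is differentiable, its errors also tune the generator.
        z = torch.sigmoid(self.ret_enc(distilled))
        retrained = torch.sigmoid(self.ret_dec(z))
        # 4) Final recommendation scores ensemble both predictions.
        return 0.5 * (soft + retrained), soft, retrained
```

In a training loop, a reconstruction loss on `retrained` (and optionally on `soft`) would be back-propagated through the whole pipeline, so that the retraining errors also fine-tune the generation network as described above.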

Experimental studies

To evaluate the performance of the ECAE model, we conduct extensive experiments that compare it with several state-of-the-art algorithms and study the influence of each of its modules.

Conclusions

In this paper, we explored the problem of learning user features from soft targets generated by a pretrained model. Specifically, we propose a tightly coupled structure that incorporates the generation and retraining stages into a unified framework. In this way, the soft targets can be dynamically updated to retain knowledge and reduce noise by propagating the training errors of both the generation and retraining networks. Moreover, we propose a novel distillation layer to adjust the outputs of the generation network, balancing knowledge and noise for each unit.

Acknowledgments

This research has been supported by the National Science Foundation of China (Grant No. 61472289) and the National Key Research and Development Project (Grant No. 2016YFC0106305).

References (53)

  • Yan, X., et al., An efficient particle swarm optimization for large-scale hardware/software co-design system, Int. J. Cooper. Inf. Syst. (2018)
  • Koren, Y., Factorization meets the neighborhood: a multifaceted collaborative filtering model, Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2008)
  • Wu, Y., et al., Service-oriented feature-based data exchange for cloud-based design and manufacturing, IEEE Trans. Serv. Comput. (2018)
  • Hu, L., et al., Deep modeling of group preferences for group-based recommendation, Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (2014)
  • Qi, L., et al., Structural balance theory-based E-commerce recommendation over big rating data, IEEE Trans. Big Data (2018)
  • Sarwar, B., et al., Item-based collaborative filtering recommendation algorithms, Proceedings of the Tenth International Conference on World Wide Web (2001)
  • Luo, X., et al., An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems, IEEE Trans. Ind. Inf. (2014)
  • Luo, X., et al., An efficient second-order approach to factorize sparse matrices in recommender systems, IEEE Trans. Ind. Inf. (2015)
  • Luo, X., et al., A nonnegative latent factor model for large-scale sparse matrices in recommender systems via alternating direction method, IEEE Trans. Neural Netw. Learn. Syst. (2016)
  • Luo, X., et al., An incremental-and-static-combined scheme for matrix-factorization-based collaborative filtering, IEEE Trans. Autom. Sci. Eng. (2016)
  • Luo, X., et al., A novel approach to extracting non-negative latent factors from non-negative big sparse matrices, IEEE Access (2016)
  • He, X., et al., Neural collaborative filtering, Proceedings of the Twenty-Sixth International Conference on World Wide Web (2017)
  • Krizhevsky, A., et al., ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (2012)
  • He, K., et al., Delving deep into rectifiers: surpassing human-level performance on ImageNet classification, Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015)
  • Dai, J., Li, Y., He, K., Sun, J., R-FCN: object detection via region-based fully convolutional networks, arXiv preprint …
  • Liu, S., et al., A recursive recurrent neural network for statistical machine translation, Proceedings of the ACL (2014)

    Yiteng Pan is currently a Ph.D. candidate at the School of Computer Science in Wuhan University. His research interests include data mining, image processing and deep learning.

    Fazhi He received his Ph.D. degree from Wuhan University of Technology. He was a postdoctoral researcher in the State Key Laboratory of CAD&CG at Zhejiang University, a visiting researcher at the Korea Advanced Institute of Science & Technology, and a visiting faculty member at the University of North Carolina at Chapel Hill. He is now a professor in the School of Computer Science, Wuhan University. His research interests are computer graphics, computer-aided design, image processing and computer supported cooperative work.

    Haiping Yu is currently a Ph.D. candidate at the School of Computer Science in Wuhan University. Her research interests are pattern recognition, image processing and computer graphics.
