Elsevier

Knowledge-Based Systems

Volume 200, 20 July 2020, 106030

Restricted Boltzmann Machine-driven Interactive Estimation of Distribution Algorithm for personalized search

https://doi.org/10.1016/j.knosys.2020.106030

Abstract

Effective and efficient personalized search is one of the most pursued objectives in the era of big data. The challenge lies in the difficulty of quantifying user evaluations and in dynamic user preferences. A user-involved interactive evolutionary algorithm is a good choice if it has a reliable preference surrogate and powerful evolutionary strategies. A Restricted Boltzmann Machine (RBM)-assisted Interactive Estimation of Distribution Algorithm (IEDA) is presented to enhance the IEDA in solving personalized search. Specifically, a dual-RBM module is developed to simultaneously provide a preference surrogate and a probability model for conducting individual selection and generation in the IEDA. First, the positive and negative preferences of the user currently involved in the IEDA are distinguished and combined to build a dual-RBM; the weighted energy functions of the RBM models, together with social group information from users with similar preferences, are then designed as the preference surrogate. The probability of the trained positive RBM over the visible units is taken as the reproduction model of the EDA, since it reflects the attribute distributions of the more preferred items. Benchmarks from the MovieLens and Amazon datasets are used to experimentally demonstrate the superiority of the proposed algorithm in improving the efficiency and effectiveness of interactive evolutionary computation for personalized search.

Introduction

The task of personalized search is to find items that meet a user’s specific (and possibly changing) preferences or requirements; it is therefore, by nature, an optimization problem. Evolutionary algorithms (EAs) would be effective for this problem if the user’s preferences or intentions could be explicitly expressed with accurate mathematical models. Unfortunately, this assumption is hard to satisfy even when the user’s preference is certain and clear, let alone in changing scenarios.

The optimization objective of personalized search is based on users’ qualitative evaluations, comparisons and decisions, which draw on their experiential knowledge and preferences, i.e., it is subjective, variable and fuzzy compared with traditional mathematically defined objectives. Accordingly, traditional optimization methods, as well as the various successful nature-inspired EAs for explicitly defined mathematical functions, are no longer applicable. It is of practical significance to develop suitable EAs that effectively solve personalized search problems.

In the family of EAs, interactive evolutionary computations (IECs) are powerful for optimizing problems with qualitative objectives and are expected to be effective for personalized search [1], [2], [3]. In the past decades, many studies on IECs have been devoted to alleviating users’ evaluation burden in the evolutionary process, especially for complicated optimization tasks. This work can be classified into three groups: (1) designing friendly interfaces, e.g., changing the continuous evaluation mode to a discrete or fuzzy-number one [2], [4]; (2) enhancing evolutionary operators to accelerate the evolution process, e.g., Chen et al. [5] presented a Bayesian model-based IEC to effectively reduce the initial decision space according to the historical search; (3) developing surrogate- or learning-assisted IECs to quantitatively approximate the preference or evaluation of a given user on a candidate, i.e., in such IECs, the fitness function of the qualitative objective is estimated to drive the evolutionary operations as in traditional EAs [4], [5], [6]. Here we try to solve personalized search with surrogate-assisted IECs, since they have been successfully applied to some complex design and multi-objective decision problems.

Surrogate-assisted IECs work similarly to surrogate-assisted EAs. A user is first required to evaluate some individuals during the evolutionary search, and these individuals, together with their scores, are used to train or build a model that approximates the user’s preferences. The model is then applied as a fitness surrogate in the subsequent evolution process, and the user only needs to revise the few estimations that the surrogate gets wrong. The model is managed or updated when the user finds that its estimations are far from his/her preferences. Clearly, surrogate building, including data collection, model selection and training, is critical for developing a reliable surrogate-assisted IEC [7], [8], [9]. Model selection and training have attracted great attention in various applications. Sun et al. [1] presented a semi-supervised learning-based surrogate for cases where sufficient training data for interactive genetic algorithms are difficult to collect when handling complicated design problems. Pan et al. [8] proposed a classification-based surrogate to improve interactive decisions when using a many-objective EA for numerically defined expensive optimization problems. Integrating parallel computing with surrogate-based EAs, Akinsolu et al. [10] proposed a parallel surrogate-assisted algorithm to enhance the mutation operators of differential evolution for electromagnetic design. We also used a probabilistic conditional preference network as a surrogate for personalized book search [11]. As for collecting the training data, only a few studies have been conducted. Chen et al. [5] presented a Bayesian-induced interactive Estimation of Distribution Algorithm (IEDA) for personalized laptop search, in which users’ interaction time is used to construct an RBF-based surrogate model. Tian et al. [12] incorporated granularity into surrogate building to effectively collect the training data at a relatively small computational cost when solving high-dimensional expensive optimization problems.
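The evaluate, train, estimate and revise cycle described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the cited works: `simulated_user` stands in for a human evaluator, `NearestNeighbourSurrogate` is a deliberately simple model, and items are plain bit vectors.

```python
import random


def simulated_user(item):
    # Hypothetical stand-in for a human evaluator: prefers items with many 1-bits.
    return sum(item) / len(item)


class NearestNeighbourSurrogate:
    """Toy surrogate: returns the score of the closest user-evaluated item."""

    def __init__(self):
        self.samples = []  # list of (item, user score) pairs

    def update(self, item, score):
        self.samples.append((tuple(item), score))

    def predict(self, item):
        # Closest evaluated item by Hamming distance.
        nearest = min(
            self.samples,
            key=lambda s: sum(a != b for a, b in zip(s[0], item)),
        )
        return nearest[1]


def interactive_search(n_attrs=8, pop_size=10, generations=5,
                       queries_per_gen=2, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_attrs)]
                  for _ in range(pop_size)]
    surrogate = NearestNeighbourSurrogate()
    user_queries = 0

    # Step 1: the user evaluates a few initial individuals.
    for item in population[:queries_per_gen]:
        surrogate.update(item, simulated_user(item))
        user_queries += 1

    for _ in range(generations):
        # Step 2: the surrogate estimates fitness for the whole population.
        population.sort(key=surrogate.predict, reverse=True)
        # Step 3: the user revises only the top few estimates,
        # and those evaluations update the surrogate.
        for item in population[:queries_per_gen]:
            surrogate.update(item, simulated_user(item))
            user_queries += 1
        # Step 4: reproduction (here: keep the better half, add mutated copies).
        parents = population[: pop_size // 2]
        children = []
        for parent in parents:
            child = list(parent)
            idx = rng.randrange(n_attrs)
            child[idx] = 1 - child[idx]
            children.append(child)
        population = parents + children

    best = max(population, key=surrogate.predict)
    return best, user_queries
```

With the default arguments the user is queried 12 times in total, whereas scoring every individual in every generation would require 50 evaluations; this gap is the burden that a surrogate is meant to remove.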

These surrogate-assisted EAs are effective for solving quantitatively or qualitatively defined complex problems. They construct and manage the surrogate model with supervised or semi-supervised learning on evaluated individuals, and then use the model to approximate individual fitness when performing the evolutionary operators. Three deficiencies of the existing algorithms can be identified. (1) A given user must provide initial interactions for constructing a surrogate, whether explicitly or implicitly, which inevitably conflicts with the motivation of alleviating user fatigue. Unsupervised learning-based surrogates are more promising for addressing this problem. (2) The relationship between the evolutionary operators and the surrogate has rarely been considered, although the information captured during surrogate construction may be valuable for strengthening the performance of the operators. (3) The intrinsic preference features hidden in a user’s historical interactions can greatly help to reveal the user’s preferences accurately; however, this has not been exploited in IECs, even though the technology is well developed and widely used in personalized recommendation. Therefore, integrating user interest models from personalized recommendation into surrogate-assisted IECs is expected to greatly improve the performance of personalized search.

To construct a surrogate with an unsupervised learning model and to capture the relationship between the evolutionary operators and the surrogate, we previously presented a Restricted Boltzmann Machine (RBM)-based Estimation of Distribution Algorithm (EDA) for complex numerical problems [13]. In this algorithm, the EDA is first run for some generations on the real problem to obtain the training data, and the RBM is then trained with the better individuals (without using specific fitness values). Both the probability model of the EDA and the fitness function are obtained from the trained RBM, i.e., the joint probability of the visible layer in the RBM is calculated as the probability model in the EDA for population reproduction, and the energy function of the RBM is used to estimate the individual fitness of the optimized complex problem. Experimental results demonstrate its superiority in reducing computational complexity and improving the accuracy of fitness estimation. Inspired by these results, we here further study an unsupervised RBM surrogate-assisted IEC for personalized search, since it addresses the shortcomings of existing surrogate-based ECs.
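To make the two roles of a trained RBM concrete, the following sketch uses the standard binary RBM, whose energy is E(v, h) = -a·v - b·h - vᵀWh and whose free energy F(v) = -a·v - Σ_j log(1 + exp(b_j + Σ_i v_i W_ij)) satisfies p(v) ∝ exp(-F(v)). The exact surrogate and reproduction formulas of [13] are not given in this text, so the functions below are only an illustration: the parameters `W`, `a` and `b` are hand-set rather than trained, and `sample_offspring` (one Gibbs step) is an assumed way of drawing new individuals from the visible-layer distribution.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def hidden_input(v, W, b, j):
    return b[j] + sum(v[i] * W[i][j] for i in range(len(v)))


def free_energy(v, W, a, b):
    """Free energy F(v) of a binary RBM; lower F(v) means higher p(v)."""
    visible_term = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_term = sum(softplus(hidden_input(v, W, b, j)) for j in range(len(b)))
    return -visible_term - hidden_term


def hidden_probs(v, W, b):
    # p(h_j = 1 | v)
    return [sigmoid(hidden_input(v, W, b, j)) for j in range(len(b))]


def visible_probs(h, W, a):
    # p(v_i = 1 | h)
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]


def sample_offspring(parent, W, a, b, rng):
    """One Gibbs step v -> h -> v', used here as an assumed reproduction operator."""
    h = [1 if rng.random() < p else 0 for p in hidden_probs(parent, W, b)]
    return [1 if rng.random() < p else 0 for p in visible_probs(h, W, a)]


# Hand-set illustrative parameters (2 visible units, 2 hidden units); not trained.
W = [[2.0, 0.0],
     [0.0, -2.0]]
a = [0.0, 0.0]
b = [0.0, 0.0]

# Energy-based fitness estimate: a lower free energy marks a more probable item.
fitness = {tuple(v): -free_energy(v, W, a, b) for v in ([1, 0], [0, 1])}
child = sample_offspring([1, 0], W, a, b, random.Random(0))
```

With these parameters the item `[1, 0]` has a lower free energy than `[0, 1]`, so it receives the higher estimated fitness. No fitness value enters the model itself, which matches the unsupervised setting described above.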

With regard to extracting the intrinsic features of a user’s preference, many interest models used in personalized recommendation provide valuable references [14], [15], e.g., the Bayesian model [16], [17], Factorization Machine [18], [19], Multilayer Perceptron [20], [21], RBM [22], [23], Autoencoder [24], [25], and Convolutional Neural Network (CNN) [26], [27]. Rendle et al. [16] presented a Bayesian learning method for personalized ranking by maximizing the posterior estimator. In this method, the training data are grouped into evaluated and unrated items, which serve as positive and negative information, respectively. Cheng et al. [20] proposed a Wide & Deep learning-based interest model that jointly trains wide linear models and deep neural networks (DNNs) to combine the benefits of memorization and generalization for recommendation. Kim et al. [26] presented a context-aware recommendation model called Convolutional Matrix Factorization (ConvMF), which makes full use of the positive and negative preferences and combines a CNN with probabilistic matrix factorization to improve the prediction accuracy. Zhou et al. [28] proposed an attention-based user behavior modeling framework, which effectively integrates all of a user’s historical interactive behaviors. However, these models have not been well combined with the IEC process to further improve personalized evolutionary search.

Motivated by our previous work, we extend it here to interactive personalized search and present a dual-RBM-assisted IEDA that couples interest model construction with historical interactive behaviors. The RBM-based surrogate in [13] is first enhanced by turning it into a dual-module one according to the grouped historical information, so as to extract the user’s preference features precisely. After the dual-module RBM is trained, the probability model of the EDA is constructed from the critical features of the positive RBM. The surrogate for estimating the fitness of the searched items is finally obtained from the energy functions of the RBMs. The probability and surrogate models are applied in the IEDA to find satisfactory TopN items for the current user. Experiments on typical real-world datasets demonstrate that the proposed algorithm not only enhances the performance of personalized search but also alleviates users’ evaluation burden, improving the user experience during the search.

Accordingly, the main contributions of our work are as follows: (1) A dual-RBM module is presented by constructing two related RBM models, i.e., a positive and a negative one. These two models are trained with the dominant and inferior items evaluated by the current user to accurately track the user’s preference features. The module is then used to define the probability model and the fitness surrogate for the EDA; (2) the reproduction probability model of the EDA for generating more preferred individuals is defined based on the probability of the visible layer in the positive RBM model, making full use of the positive preference features and suppressing the impact of the negative ones; (3) the fitness surrogate is obtained by combining the weighted energy functions of the positive and negative RBM models with social group knowledge.
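One way to picture contribution (3) is a weighted score over three signals. The sketch below is an assumption made for illustration only: the combination rule, the weights `w_pos`, `w_neg` and `w_group`, and the toy energy and group-rating functions are all hypothetical, and the paper's actual surrogate is defined in its Section 3, which is not reproduced here. The only idea carried over from the text is that low energy under the positive RBM should raise an item's estimated fitness, low energy under the negative RBM should lower it, and ratings from users with similar preferences add a further term.

```python
def dual_surrogate(item, energy_pos, energy_neg, group_rating,
                   w_pos=0.5, w_neg=0.3, w_group=0.2):
    """Hypothetical weighted preference score; higher means more preferred."""
    return (-w_pos * energy_pos(item)      # low positive energy -> liked
            + w_neg * energy_neg(item)     # low negative energy -> disliked
            + w_group * group_rating(item))


def top_n(candidates, n, score):
    """Return the n best candidates according to the given score function."""
    return sorted(candidates, key=score, reverse=True)[:n]


# Toy stand-ins for trained models (assumptions, not learned from data).
def energy_pos(item):
    # The "positive RBM" assigns low energy to items with attributes 0 and 1.
    return -(2 * item[0] + item[1])


def energy_neg(item):
    # The "negative RBM" assigns low energy to items with attribute 2.
    return -(3 * item[2])


GROUP_RATINGS = {(1, 1, 0): 0.8, (1, 0, 1): 0.9, (0, 0, 1): 0.1, (0, 1, 0): 0.5}


def group_rating(item):
    # Mean normalized rating from similar users; 0.0 when unknown.
    return GROUP_RATINGS.get(tuple(item), 0.0)


candidates = [(1, 1, 0), (1, 0, 1), (0, 0, 1), (0, 1, 0)]
ranked = top_n(
    candidates, 2,
    lambda x: dual_surrogate(x, energy_pos, energy_neg, group_rating),
)
```

In this toy setting the item `(1, 0, 1)` has the highest group rating but still falls out of the top two, because the negative model penalizes its third attribute. This is the kind of interaction between the two preference models that a dual surrogate is intended to capture.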

The remainder of the paper is organized as follows. Section 2 introduces the notation of our study and the related preliminary work. The proposed dual RBM-driven IEDA is described in detail in Section 3. Section 4 presents the comparative experiments and the corresponding analysis. Conclusions are given at the end.

Section snippets

Notation of personalized search

Personalized search is a search process in which a user finds satisfactory items according to his/her interests and preferences. It can be described as a combinatorial optimization problem with a qualitative index. An item (solution) with $n$ attributes (decision variables) is expressed as $x = (x_1, x_2, \ldots, x_n)$, and the objective function $f_u(x)$ of a user $u$ in the personalized search can be formally expressed as: $f_u(x) \quad \text{s.t.} \quad x \in G$, where $f_u(x)$ represents the preference of user $u$ on the item $x$ and often

Framework

As mentioned above, RBMs have been successfully used to approximate users’ preferences for recommendation but have not been applied in IECs for personalized search. Here, we propose a dual RBM-driven IEDA with social knowledge (abbreviated as SC_DRBMIEDA) for personalized search. The framework of the proposed algorithm is presented in Fig. 2.

The SC_DRBMIEDA algorithm consists of three main components:

(1) Construction of RBM

For getting a more reliable RBM to enhance the performance of IEDA, we here utilize

Experimental settings

Two kinds of typical datasets used in personalized recommendation are employed here to objectively demonstrate the performance of the proposed algorithms. The MovieLens dataset [33], i.e., MovieLens-latest-small (ML-l-s), and the Amazon datasets [34], i.e., Digital_Music (Music), Apps_for_Android (Apps) and Movies_and_TV (Movies), are selected as the benchmark tasks. The statistics of the datasets are shown in Table 1.

In the experiments, we run Python 3.6 on a computer with an AMD Ryzen

Conclusions

Personalized search is an optimization problem from the viewpoint of finding items that satisfy users, and few ECs have been developed to solve such problems. Motivated by research on user interest models in recommender systems and on surrogate models in IECs, we present an enhanced RBM-assisted IEDA that integrates social knowledge with a dual-RBM user preference model, in which the energy functions of the RBMs are designed as a user’s preference surrogate to approximate the individual

CRediT authorship contribution statement

Lin Bao: Methodology, Software, Validation, Writing-original draft preparation. Xiaoyan Sun: Conceptualization, Funding acquisition, Resources, Writing-reviewing and editing. Yang Chen: Investigation. Dunwei Gong: Formal analysis, Project administration, Supervision. Yongwei Zhang: Data curation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was jointly supported by the National Natural Science Foundation of China under grants No. 61876184 and No. 61473298. We also thank the anonymous reviewers for their valuable suggestions for helping improve the quality of this manuscript.

References (38)

  • Sun X. et al. A new surrogate-assisted interactive genetic algorithm with weighted semisupervised learning. IEEE Trans. Cybern. (2013)
  • Tian J. et al. Multiobjective infill criterion driven Gaussian process-assisted particle swarm optimization of high-dimensional expensive problems. IEEE Trans. Evol. Comput. (2018)
  • Pan L. et al. A classification-based surrogate-assisted evolutionary algorithm for expensive many-objective optimization. IEEE Trans. Evol. Comput. (2018)
  • Tian Y. et al. A surrogate-assisted multiobjective evolutionary algorithm for large-scale task-oriented pattern mining. IEEE Trans. Emerg. Top. Comput. Intell. (2018)
  • Akinsolu M.O. et al. A parallel surrogate model assisted evolutionary algorithm for electromagnetic design optimization. IEEE Trans. Emerg. Top. Comput. Intell. (2019)
  • Bao L. et al. Restricted Boltzmann machine-assisted estimation of distribution algorithm for complex problems. Complexity (2018)
  • Rendle S. et al. BPR: Bayesian personalized ranking from implicit feedback.
  • Wang X. et al. CMBPR: category-aided multi-channel Bayesian personalized ranking for short video recommendation. IEEE Access (2019)
  • Rendle S. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol. (TIST) (2012)