
1 Introduction

Air transportation has long been one of the most important modes of long-distance travel. The huge travel demand drives the growth and prosperity of the civil aviation industry. According to a bulletin from the Civil Aviation Administration of China (CAAC), China's civil aviation industry transported 551.56 million passengers in 2017, an annual growth rate of 13.0% [3]. Many of these passengers book tickets through online agents. There is thus a need for recommendation services that understand passengers' travel demands and suggest flights and airline carriers accordingly.

Inspired by the ubiquitous applications of recommender systems in online retail markets [15] and driven by the potential market and research value, companies and researchers have devoted themselves to airline customer analysis and service recommendation [2, 4, 5, 14, 17]. However, personalized air route and airline prediction remains challenging, for three reasons. First, profiling features that reflect passengers' travel demands or preferences are scarce. For security and privacy reasons, detailed customer information such as demographics, job titles, or social accounts is strictly confidential to researchers, which creates obstacles for accurate passenger profiling. Second, passenger behavior data is usually sparse. Owing to the high prices of flight tickets, traveling by air typically fulfills specific needs such as business trips or vacations rather than serving as a daily mode of travel for most people, so it is difficult to fully understand passengers' travel demands from their behavior data alone. Finally, both the travel frequency and the demand on different routes show a long-tail distribution with respect to the number of passengers, as illustrated in Fig. 1. Therefore, the behavior data of most passengers is that of low-frequency travelers and cannot be well modeled.

Fig. 1. Travel behavior analysis on two-year PNR data. (a) The long-tail distribution of travel frequencies; (b) the long-tail distribution of the demands on routes.

In this paper, to deal with the problems mentioned above, we propose a Joint Weighted Non-negative Matrix Factorization (JWNMF) model to learn latent representations of heterogeneous passengers, air routes, and airlines in a shared semantic space. Specifically, we first establish a heterogeneous information network (HIN) from the Passenger Name Record (PNR) data. Individual nodes are extracted through statistical analysis, each representing an instance of a passenger, route, or airline, and the edges between nodes describe their interactions. For example, a passenger-route edge records how many times the passenger has taken flights on that route. Meanwhile, we also extract auxiliary attributes that characterize nodes from the perspective of travel behaviors: the attributes of a passenger reflect how he travels, while the attributes of a route or airline describe what kind of passenger groups tend to take it. On the basis of the air travel HIN, we further devise a joint matrix factorization framework that learns node representations by integrating the heterogeneous interactions and node attributes, which alleviates the data sparsity problem. Finally, to deal with the long-tail distribution of the data, we utilize a weighting strategy based on an analysis of the implicit feedback contained in passenger behavior data, which mitigates the influence of imbalanced edge weights and improves performance. Heterogeneous recommendations are then conducted in the shared latent space.

To summarize, in this paper, we make the following contributions:

  • We analyze the characteristics of PNR data and formulate the air route and airline recommendation problem under a HIN analysis framework that integrates both the interactions and the attributes of different nodes.

  • For information integration purpose, a joint factorization model is proposed to simultaneously learn the latent representations of passengers, routes, and airlines based on the HIN.

  • Based on the analysis of the implicit feedback information, a weighting strategy is also devised to deal with the imbalanced edge weights caused by the long-tail distribution.

  • We conduct experiments to evaluate our proposed framework on the heterogeneous recommendation task with a real-world PNR dataset. Experimental results demonstrate the superiority of our model.

The remainder of this paper is organized as follows. Section 2 highlights related work. In Sect. 3, we formulate the problem and give the technical details of the proposed model. We evaluate the proposed model and analyze the experimental results in Sect. 4. Finally, we conclude our work in Sect. 5.

2 Related Work

With the development of the air travel industry, large quantities of complex and rapidly changing data are being created every second. Air travel data mining has attracted a lot of researchers’ attention [1, 18], and research results have been achieved on hot issues like security and safety [10, 22], intelligent marketing [7, 18], customer choice modeling and relation management [16, 17], and personalized recommendation [4, 5, 14], etc. In this paper, we focus on the personalized recommendation problem.

Inspired by the success of recommender systems in online retailing and other industries [2, 15], recommendations of flights, air routes, airlines, and auxiliary services are studied to improve the service quality and customer satisfaction. Cao et al. proposed a personalized flight recommendation approach based on the maximization of user’s choice utility over flight tickets through a paired-choice analysis of historical orders [5]. To overcome the problem of insufficient historical data, air route recommendation is modeled as a cross-domain recommendation problem in which the cross-domain data is integrated [4]. The combinations of user choice models and recommender systems are also explored for airline itinerary suggestion [17]. Other than flights, routes, or airlines, auxiliary services like in-flight music can also be recommended to enhance user experiences [13]. In this paper, we focus on the fundamental air route and airline company recommendation task and propose a matrix factorization framework.

Matrix factorization models are popular in recommender systems [8, 12, 19, 20], especially the Non-negative Matrix Factorization (NMF) model [11]. Gu et al. proposed a weighted NMF model to incorporate the attributes and relations of users and items into the factorization of user-item rating matrix [8]. Lian et al. incorporated the spatial clustering information of human mobility behavior into the factorization process for POI recommendation [12]. A deep matrix factorization model is also proposed to make use of both explicit ratings and implicit feedback with the help of deep neural networks [20]. And the joint matrix factorization models are popular as they help incorporate various auxiliary information into the factorization process [19, 21].

3 Approach

In this section, we first introduce how the air travel HIN is constructed from PNR datasets, followed by the details of the proposed model. The joint factorization model incorporates both the heterogeneous interactions and attribute information to overcome data sparsity. And the weighting strategy further deals with the imbalanced connection weights caused by the long-tail distribution.

3.1 The Air Travel HIN

The PNR datasets are made up of the flight records of passengers. Each entry usually contains brief passenger information such as ID number, age, and gender, and the flight-specific information such as the air route and the airline company. We focus on learning representations from such PNR datasets. A HIN \(\mathcal {G}=\{\mathcal {V}, \mathcal {E}\}\) is first constructed based on the extracted entities and their relations. We focus on the most important three kinds of entities, i.e., passengers, air routes, and airline companies. Thus we have \(\mathcal {V} = \mathcal {U} \cup \mathcal {R} \cup \mathcal {C}\), where \(\mathcal {U}\), \(\mathcal {R}\), and \(\mathcal {C}\) denote the set of passengers, routes, and airline companies respectively.

Usually, when a passenger needs to take a flight, he has a departure airport and an arrival airport in mind and just needs to figure out which flight of which airline suits him best. Based on this intuitive understanding, two kinds of relations are extracted, i.e., the passenger-route interactions \(\mathcal {E}^{ur} \subseteq \mathcal {U} \times \mathcal {R}\) and the passenger-airline interactions \(\mathcal {E}^{uc} \subseteq \mathcal {U} \times \mathcal {C}\), so that \(\mathcal {E} = \mathcal {E}^{ur} \cup \mathcal {E}^{uc}\).
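As a concrete sketch, the weighted edges can be derived by counting co-occurrences in the flight records. The record format and index mapping below are hypothetical, since the paper does not specify its data pipeline:

```python
import numpy as np
from scipy.sparse import coo_matrix

def build_interactions(records, n_passengers, n_routes, n_airlines):
    """Build the passenger-route and passenger-airline count matrices.

    Each record is a hypothetical (passenger, route, airline) triple of
    contiguous integer ids; duplicate (i, j) pairs are summed on
    conversion, so each entry ends up being a flight count.
    """
    u = [rec[0] for rec in records]
    rt = [rec[1] for rec in records]
    al = [rec[2] for rec in records]
    ones = np.ones(len(records))
    E_ur = coo_matrix((ones, (u, rt)), shape=(n_passengers, n_routes)).tocsr()
    E_uc = coo_matrix((ones, (u, al)), shape=(n_passengers, n_airlines)).tocsr()
    return E_ur, E_uc

# Passenger 0 flew route 1 twice with airline 0; passenger 1 flew once.
E_ur, E_uc = build_interactions([(0, 1, 0), (0, 1, 0), (1, 0, 1)], 2, 2, 2)
print(E_ur.toarray())  # [[0. 2.], [1. 0.]]
```

Sparse storage matters here: with millions of passengers and thousands of routes, the dense interaction matrix would be mostly zeros.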

Apart from the relation information, there also exist some factors that influence passengers’ choices over routes and airlines such as passengers’ age, gender, total travel mileage, and travel seasons. Therefore, we conduct statistical analysis on how these factors affect passengers’ travel behaviors by calculating the percentage of flight records generated by passengers with that attribute or in that season. Results are shown in Fig. 2 where the age and mileage are segmented into groups by maximizing the information gain [6], and the seasons are separated according to the Chinese lunar calendar. It can be observed that passengers have different travel frequency distributions over these factors, which illustrates the necessity to take them into account when modeling passengers’ behaviors. Therefore, the passenger attribute matrix \(\mathbf A ^{u}\) is built where each row describes the corresponding passenger’s age group, gender, travel mileage, and travel preference on different seasons. And the route attribute matrix \(\mathbf A ^{r}\) and airline attribute matrix \(\mathbf A ^{c}\) are also built by calculating the average attribute values of their passenger groups. In addition, customer loyalty and market share are also considered.
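One plausible reading of "average attribute values of their passenger groups" is a flight-count-weighted average of the passengers' attribute vectors. The aggregation below is our assumption rather than a formula from the paper:

```python
import numpy as np

def route_attributes(E_ur, A_u):
    """Build A^r rows as count-weighted averages of passenger attributes.

    E_ur: |U| x |R| flight-count matrix; A_u: |U| x d_u attribute matrix.
    """
    E = np.asarray(E_ur, dtype=float)
    flights_per_route = np.maximum(E.sum(axis=0), 1.0)  # avoid division by 0
    return (E.T @ A_u) / flights_per_route[:, None]     # |R| x d_u

A_u = np.array([[1.0, 0.0],   # e.g. one-hot age-group features
                [0.0, 1.0]])
E_ur = np.array([[2.0, 0.0],
                 [2.0, 1.0]])
print(route_attributes(E_ur, A_u))  # [[0.5 0.5], [0.  1. ]]
```

The airline attribute matrix \(\mathbf A ^{c}\) would be built the same way from \(\mathbf E ^{uc}\).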

Fig. 2. Distribution of flight records over different factors. (a) Passengers' age; (b) passengers' gender; (c) total travel mileage of passengers; (d) travel season.

3.2 The Joint Factorization Model

Through the above analysis, interactions between nodes and their attributes are extracted. We use matrices to denote them, i.e., the passenger-route interaction matrix \(\mathbf E ^{ur} \in \mathbb {R}_{+}^{|\mathcal {U}| \times |\mathcal {R}|}\), the passenger-airline interaction matrix \(\mathbf E ^{uc} \in \mathbb {R}_{+}^{|\mathcal {U}| \times |\mathcal {C}|}\), the passenger attribute matrix \(\mathbf A ^{u}\in \mathbb {R}_{+}^{|\mathcal {U}| \times d^{u}}\), the route attribute matrix \(\mathbf A ^{r}\in \mathbb {R}_{+}^{|\mathcal {R}| \times d^{r}}\), and the airline attribute matrix \(\mathbf A ^{c}\in \mathbb {R}_{+}^{|\mathcal {C}| \times d^{c}}\), where all these matrices are non-negative and \(d^{u}\), \(d^{r}\), and \(d^{c}\) are the dimensions of passenger, route, and airline attributes respectively.

For information integration, we devise a joint non-negative matrix factorization model to learn node representations. Let \(\mathbf U \), \(\mathbf R \), \(\mathbf C \), \(\mathbf H ^{u}\), \(\mathbf H ^{r}\), and \(\mathbf H ^{c}\) denote the latent representation matrices of passengers, routes, airlines, and their attributes respectively; the model learns them by reconstructing the interaction and attribute matrices. Specifically, we aim to minimize the reconstruction loss in Eq. (1), where the conventional squared Euclidean distance [11] is used to measure the reconstruction error. The non-negative \(\lambda _1\), \(\lambda _2\), \(\lambda _3\), and \(\lambda _4\) tune the weights of the different parts, and K is the latent dimension. By minimizing the objective function, the interaction and attribute information are integrated and the latent representations of heterogeneous nodes are learned.

$$\begin{aligned} \begin{aligned}&\mathcal {D}(\mathbf E ^{ur}, \mathbf E ^{uc}, \mathbf A ^{u}, \mathbf A ^{r}, \mathbf A ^{c}| \mathbf U , \mathbf R , \mathbf C , \mathbf H ^{u}, \mathbf H ^{r}, \mathbf{H }^{c}) \\&= \mathop {\sum }_{{e}^{ur}_{ij}> 0} {({e}^{ur}_{ij} - \mathbf u _i\mathbf r _j^\top )^2} + \lambda _{1} \mathop {\sum }_{{e}^{uc}_{ij} > 0} {({e}^{uc}_{ij} - \mathbf u _i\mathbf c _j^\top )^2} + \lambda _{2} \mathop {\sum }_{i, j} {({a}^{u}_{ij} - \mathbf u _i{\mathbf{h }^{u}_j}^\top )^2}\\&+ \lambda _{3} \mathop {\sum }_{i, j} {({a}^{r}_{ij} - \mathbf r _i{\mathbf{h }^{r}_j}^\top )^2} + \lambda _{4} \mathop {\sum }_{i, j} {({a}^{c}_{ij} - \mathbf c _i{\mathbf{h }^{c}_j}^\top )^2},\\&s.t. \ \mathbf U \in \mathbb {R}_{+}^{|\mathcal {U}| \times K}, \mathbf R \in \mathbb {R}_{+}^{|\mathcal {R}| \times K}, \mathbf C \in \mathbb {R}_{+}^{|\mathcal {C}| \times K}, \mathbf H ^{u}\in \mathbb {R}_{+}^{d^{u} \times K}, \mathbf H ^{r}\in \mathbb {R}_{+}^{d^{r} \times K}, \\&\mathbf H ^{c}\in \mathbb {R}_{+}^{d^{c} \times K}. \end{aligned} \end{aligned}$$
(1)
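For concreteness, the objective in Eq. (1) can be evaluated densely as below. This is a NumPy sketch in which the attribute factors are stored as \(d \times K\) matrices, so that \(a_{ij} \approx \mathbf u _i {\mathbf h _j}^\top \) matches the elementwise form, and masks restrict the interaction terms to positive entries:

```python
import numpy as np

def jnmf_loss(E_ur, E_uc, A_u, A_r, A_c, U, R, C, Hu, Hr, Hc, lams):
    """Reconstruction loss of Eq. (1).

    Interaction terms are summed over positive entries only; attribute
    terms over all entries. lams = (lam1, lam2, lam3, lam4).
    """
    l1, l2, l3, l4 = lams
    loss = np.sum((E_ur > 0) * (E_ur - U @ R.T) ** 2)
    loss += l1 * np.sum((E_uc > 0) * (E_uc - U @ C.T) ** 2)
    loss += l2 * np.sum((A_u - U @ Hu.T) ** 2)
    loss += l3 * np.sum((A_r - R @ Hr.T) ** 2)
    loss += l4 * np.sum((A_c - C @ Hc.T) ** 2)
    return loss
```

A perfect reconstruction of all five matrices drives this loss to zero, which is a convenient sanity check during development.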

3.3 The Weighting Strategy

The joint factorization model in Eq. (1) focuses on the reconstruction loss of positive edges and node attributes in the air travel HIN. However, there are only positive examples in \(\mathcal {E}^{ur}\) and \(\mathcal {E}^{uc}\) that implicitly describe passengers’ demands and preferences on the corresponding route and airlines without any negative information about what routes or airlines the passengers do not need or dislike. What is worse, due to the sparsity and long-tail distribution of travel behaviors, there exists an imbalance problem in the edge weights. Although the integration of attribute information can help to overcome this problem to some extent, it is still difficult to fit the imbalanced edge weights. Therefore, we adopt a weighting strategy that reduces the imbalance of edge weights by taking advantage of the implicit feedback information [9].

First, passengers’ preferences on the routes and airlines are extracted from \(\mathbf E ^{ur}\) and \(\mathbf E ^{uc}\) by binarizing the edge weights:

$$\begin{aligned} p_{ij}^{ur} = \left\{ \begin{array}{ll} 1 \quad \quad &{} {{e}^{ur}_{ij}> 0}\\ 0 \quad \quad &{} {{e}^{ur}_{ij} = 0} \\ \end{array} \right. , \quad \quad p_{ij}^{uc} = \left\{ \begin{array}{ll} 1 \quad \quad &{} {{e}^{uc}_{ij} > 0}\\ 0 \quad \quad &{} {{e}^{uc}_{ij} = 0} \\ \end{array} \right. . \end{aligned}$$
(2)

In other words, if a passenger has taken flights on a route (\({e}^{ur}_{ij} > 0\)) or with an airline company (\({e}^{uc}_{ij} > 0\)), this indicates a preference for them; otherwise, no preference is assumed. The binary preference strategy effectively reduces the gap between frequent and infrequent interactions and makes the fitting process easier. However, it loses the indication of which routes or airlines attract the passengers more. So a linear weighting strategy is utilized to assign different confidence scores to the preferences according to the edge weights:

$$\begin{aligned} w_{ij}^{ur} = \alpha {e}^{ur}_{ij} + 1, \quad w_{ij}^{uc} = \alpha {e}^{uc}_{ij} + 1, \end{aligned}$$
(3)

where \(\alpha \) is a non-negative hyperparameter that controls the increase rate of the confidence scores. With such a weighting strategy, when there is no interaction between passenger i and route j (\(e_{ij}^{ur} = 0\)), a minimum confidence score \(w_{ij}^{ur}=1\) is assigned, which means that it is uncertain whether the passenger has interest in the route or not. However, with the growth of \(e_{ij}^{ur}\), there is a larger confidence that the route meets the passenger’s needs. And it is the same for passenger-airline pairs.
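Eqs. (2) and (3) amount to two simple elementwise transformations of an interaction matrix, e.g.:

```python
import numpy as np

def preferences_and_weights(E, alpha=1.0):
    """Binary preferences (Eq. 2) and linear confidence weights (Eq. 3)."""
    P = (E > 0).astype(float)  # 1 wherever any interaction occurred
    W = alpha * E + 1.0        # unobserved pairs keep the minimum weight 1
    return P, W

E = np.array([[0.0, 3.0],
              [1.0, 0.0]])
P, W = preferences_and_weights(E, alpha=1.0)
print(P)  # [[0. 1.], [1. 0.]]
print(W)  # [[1. 4.], [2. 1.]]
```

Note that while the preferences stay binary, the three-flight interaction receives twice the confidence of the one-flight interaction, preserving the frequency signal.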

Together, the binary preferences and the linear confidences alleviate the imbalance problem caused by data sparsity and the long-tail distributions. The objective function is thus updated as in Eq. (4), where both the observed and unobserved passenger-route and passenger-airline interactions are fitted with different confidences. In this way, there is no gap between the fitting targets of frequent and infrequent passenger-route or passenger-airline interactions, which makes the learning process easier, while the valuable frequency information is not abandoned but used to decide how much weight JWNMF puts on each fitting target.

$$\begin{aligned} \begin{aligned}&\mathcal {D}(\mathbf E ^{ur}, \mathbf E ^{uc}, \mathbf A ^{u}, \mathbf A ^{r}, \mathbf A ^{c}| \mathbf U , \mathbf R , \mathbf C , \mathbf H ^{u}, \mathbf H ^{r}, \mathbf H ^{c}) \\&= \mathop {\sum }_{i,j} {w_{ij}^{ur}({p}^{ur}_{ij} - \mathbf u _i\mathbf r _j^\top )^2} + \lambda _{1} \mathop {\sum }_{i,j} {w_{ij}^{uc}({p}^{uc}_{ij} - \mathbf u _i\mathbf c _j^\top )^2}\\&+ \lambda _{2} \mathop {\sum }_{i, j} {({a}^{u}_{ij} - \mathbf u _i{\mathbf{h }^{u}_j}^\top )^2} + \lambda _{3} \mathop {\sum }_{i, j} {({a}^{r}_{ij} - \mathbf r _i{\mathbf{h }^{r}_j}^\top )^2}\\&+ \lambda _{4} \mathop {\sum }_{i, j} {({a}^{c}_{ij} - \mathbf c _i{\mathbf{h }^{c}_j}^\top )^2}, \end{aligned} \end{aligned}$$
(4)

3.4 Model Optimization

By optimizing the objective function in Eq. (4), the interaction and attribute information are effectively integrated and the latent representations are learned. Here we present the details of the optimization process. The derivatives of the objective function \(\mathcal {D}\) with respect to the latent variables are:

$$\begin{aligned} \begin{aligned} \frac{\partial \mathcal {D}}{\partial \mathbf U } =&-2(\mathbf W ^{ur} \otimes (\mathbf P ^{ur} - \mathbf U {} \mathbf R ^\top ))\mathbf R -2\lambda _1(\mathbf W ^{uc} \otimes (\mathbf P ^{uc} - \mathbf U {} \mathbf C ^\top ))\mathbf C \\&- 2\lambda _2(\mathbf A ^{u}- \mathbf U {\mathbf{H }^{u}}^\top )\mathbf H ^{u}, \\ \frac{\partial \mathcal {D}}{\partial \mathbf R } =&-2(\mathbf W ^{ur} \otimes (\mathbf P ^{ur} - \mathbf U {} \mathbf R ^\top ))^\top \mathbf U - 2\lambda _3(\mathbf A ^{r} - \mathbf R {\mathbf{H }^{r}}^\top )\mathbf H ^{r}, \\ \frac{\partial \mathcal {D}}{\partial \mathbf C } =&-2\lambda _1(\mathbf W ^{uc} \otimes (\mathbf P ^{uc} - \mathbf U {} \mathbf C ^\top ))^\top \mathbf U - 2\lambda _4(\mathbf A ^{c} - \mathbf C {\mathbf{H }^{c}}^\top )\mathbf H ^{c}, \\ \frac{\partial \mathcal {D}}{\partial \mathbf H ^{u}} =&-2\lambda _2({\mathbf{A }^{u}}^\top - \mathbf H ^{u}{} \mathbf U ^\top )\mathbf U , \ \frac{\partial \mathcal {D}}{\partial \mathbf H ^{r}} = -2\lambda _3({\mathbf{A }^{r}}^\top - \mathbf H ^{r}{} \mathbf R ^\top )\mathbf R , \\ \frac{\partial \mathcal {D}}{\partial \mathbf H ^{c}} =&-2\lambda _4({\mathbf{A }^{c}}^\top - \mathbf H ^{c}{} \mathbf C ^\top )\mathbf C . \end{aligned} \end{aligned}$$
(5)

With the gradients given in Eq. (5), the objective function can be optimized with any gradient-descent based methods. In this work, we adopt the popular multiplicative update method [11] which is guaranteed to converge to at least a locally optimal solution.
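As an illustration, one multiplicative update step for \(\mathbf U \) can be obtained by splitting its gradient in Eq. (5) into positive and negative parts, in the usual Lee-Seung fashion. This is a sketch under our reading of the notation (\(\mathbf H ^{u}\) stored as \(d^{u} \times K\)), not necessarily the authors' exact implementation:

```python
import numpy as np

def update_U(U, R, C, Hu, P_ur, W_ur, P_uc, W_uc, A_u, lam1, lam2, eps=1e-9):
    """One multiplicative update for U: multiply elementwise by the ratio
    of the negative to the positive part of the gradient, which keeps U
    non-negative throughout the iterations."""
    numer = (W_ur * P_ur) @ R + lam1 * (W_uc * P_uc) @ C + lam2 * A_u @ Hu
    denom = ((W_ur * (U @ R.T)) @ R + lam1 * (W_uc * (U @ C.T)) @ C
             + lam2 * U @ Hu.T @ Hu + eps)
    return U * numer / denom
```

Analogous updates for \(\mathbf R \), \(\mathbf C \), and the attribute factors follow from the remaining gradients in Eq. (5).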

Table 1. Statistics of the datasets.

4 Experiments

To investigate the effectiveness of JWNMF in learning latent representations for nodes in the air travel HIN, we evaluate our proposed method on a real-world PNR dataset. The learned representations are evaluated according to the route and airline recommendation performance, and the experimental results support our claims.

4.1 Dataset

We use a two-year anonymized PNR dataset that contains 2,956,088 passengers, 2728 routes, and 22 airline companies. To comprehensively analyze JWNMF's performance on both frequent flyers and normal passengers, two sub-datasets are extracted. The first consists of the top 100,000 passengers ranked by travel frequency and their records, denoted Top100K, while the second contains 100,000 randomly selected passengers and their records, denoted Rand100K. Details of the two sub-datasets are shown in Table 1. It can be observed from the number of records and the data density that frequent flyers behave differently from normal passengers, and models should perform well both on the valuable frequent flyers and on the huge group of normal passengers.

4.2 Baselines

To achieve comprehensive and comparative analysis of our approach, we compare it with three kinds of baselines: the trivial methods, the collaborative filtering methods, and the matrix factorization methods.

  • Random. Random guess (Random) is a trivial method in recommender systems. For each passenger, N routes and airlines are randomly selected from the candidate sets and recommended.

  • ItemPop. Item popularity (ItemPop) is another trivial method. The routes and airlines are sorted according to the frequencies with which they appear in the records, and the top N routes and airlines are recommended to all passengers.

  • UCF. User-based collaborative filtering (UCF) is widely used in a lot of applications. The passenger-item (route or airline) relevance score \(\mathrm {UCF}(i, j)\) is calculated as a weighted sum of the passenger’s similarity to all passengers that have consumed the item. We adopt the cosine similarity between passengers’ flight records and attributes with a parameter \(0 \le \beta _1 \le 1\) tuning the weight. After that, the top N items are recommended according to \(\mathrm {UCF}(i, j)\).

  • ICF. Item-based collaborative filtering (ICF) is also widely used in various applications. The passenger-item (route or airline) relevance score \(\mathrm {ICF}(i, j)\) is calculated as a weighted sum of the item’s cosine similarity to all items that the passenger has consumed. We adopt the cosine similarity between flight records and attributes with a parameter \(0 \le \beta _2 \le 1\) tuning the weight. After that, the top N items are recommended according to \(\mathrm {ICF}(i, j)\).

  • NMF. NMF is the most popular matrix factorization method in recommendations and is also the basic model of our JWNMF. Latent representations are learned by independently factorizing the passenger-route matrix \(\mathbf E ^{ur}\) or passenger-airline matrix \(\mathbf E ^{uc}\). And the routes and airlines are recommended according to their similarity to passengers in the latent space.

  • JNMF. To evaluate the performance of integrating the heterogeneous edges and node attributes, the joint NMF (JNMF) model in Eq. (1) is used in the experiments. JNMF simultaneously learns the representations of passengers, routes, and airlines in the shared latent space in which the recommendations are conducted.

  • WNMF. The weighting strategy is also evaluated independently by comparison with a weighted NMF (WNMF). The binary preference strategy in Eq. (2) and the linear weights in Eq. (3) are applied to \(\mathbf E ^{ur}\) or \(\mathbf E ^{uc}\). A preference matrix is factorized with the aid of the corresponding weight matrix, and recommendations are conducted accordingly.

  • JWNMF. JWNMF combines the advantages of both JNMF and WNMF as shown in Eq. (4). And recommendations are conducted in the shared latent representation space.
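The record/attribute similarity blending used by the CF baselines above can be sketched as follows for UCF; the precise blending formula is our assumption, as the text only names a weight \(\beta _1\):

```python
import numpy as np

def ucf_scores(E, A, beta=0.5):
    """User-based CF relevance: blend cosine similarity on flight
    records (E) and on attributes (A) with weight beta, then sum the
    similarities of each item's consumers."""
    def cosine(X):
        Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
        return Xn @ Xn.T
    S = beta * cosine(E) + (1.0 - beta) * cosine(A)
    np.fill_diagonal(S, 0.0)          # a passenger does not vote for himself
    return S @ (E > 0).astype(float)  # relevance of every item to every user

E = np.array([[1.0, 0.0],
              [1.0, 1.0]])
A = np.array([[1.0, 0.0],
              [1.0, 0.0]])
scores = ucf_scores(E, A, beta=0.5)
# Item 1 gets a positive score for passenger 0 via the similar passenger 1.
```

ICF follows the same pattern with item-item instead of user-user similarities.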

4.3 Experimental Settings

For both the Top100K and Rand100K datasets, 10% of the entries in \(\mathbf E ^{ur}\) and \(\mathbf E ^{uc}\) are randomly sampled for testing. For both CF models, the hyperparameters \(\beta _1\) and \(\beta _2\) are tuned in the range of 0 to 1 with step size 0.1, where 0 and 1 correspond to the attribute-only and record-only similarity measures respectively. The latent dimension K in all factorization models is tuned in the range of 50 to 500 with step size 50, and the maximum iteration number is set to 200. Finally, both the linear parameter \(\alpha \) in Eq. (3) and the weight parameters \(\lambda \)s are tuned in the range of \(10^{-3}\) to \(10^{3}\), multiplied by 10 at each step. After the representations are learned, the top \(N=5\) and \(N=10\) routes and airlines are recommended to each passenger according to their relevance scores in the latent space, and the performances are evaluated with the micro-averaged precision (P), recall (R), and F1 scores.
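The micro-averaged scores pool true positives across all passengers before computing precision and recall, e.g.:

```python
def micro_prf(recommended, relevant):
    """Micro-averaged precision, recall, and F1 over top-N lists.

    recommended, relevant: one set of item ids per passenger.
    """
    tp = sum(len(rec & rel) for rec, rel in zip(recommended, relevant))
    n_rec = sum(len(rec) for rec in recommended)
    n_rel = sum(len(rel) for rel in relevant)
    p = tp / n_rec if n_rec else 0.0
    r = tp / n_rel if n_rel else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Two passengers: one hit out of two recs, and one hit out of one rec.
p, r, f1 = micro_prf([{1, 2}, {3}], [{1}, {3, 4}])
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.667 0.667
```

Micro-averaging weights every recommendation equally, so frequent flyers with many test interactions influence the score more than passengers with a single one.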

Table 2. Recommendation performances on the Top100K dataset.
Table 3. Recommendation performances on the Rand100K dataset.

4.4 Experimental Results

Tables 2 and 3 show the results on the Top100K and Rand100K datasets respectively, where the best results are boldfaced. From these results, we have the following observations and analysis:

  • JWNMF achieves the best performances on both datasets and all six evaluation measures, which proves the superiority of the proposed model. On both datasets, both JNMF and WNMF perform better than NMF. What is more, by combining both the joint factorization and weighting strategy, JWNMF consistently performs better than all of them. Therefore, both of the proposed modifications are effective and necessary.

  • JNMF performs better than WNMF on the Rand100K dataset. The reason is that the Rand100K dataset is sparser than the Top100K dataset, as shown in Table 1. Because the joint factorization technique is designed to deal with the data sparsity problem, JNMF achieves more significant improvements than the weighting strategy.

  • On the other side, WNMF performs better than JNMF on the route recommendation task on the Top100K dataset. Because frequent flyers often interact frequently with specific routes that differ from each other, the passenger-route interactions are denser and have bigger value differences. Thus WNMF achieves more significant improvements by narrowing the gap between fitting targets while keeping the frequency information.

  • Due to the fact that the travel demands and behavior patterns of frequent flyers are more clearly reflected by the dense flight records, performances on the Top100K dataset are generally better than on the Rand100K dataset. However, our JWNMF demonstrates its robustness by achieving the best performances on both datasets.

Fig. 3. Analysis of the linear weight parameter \(\alpha \) and the balancing parameter \(\lambda _1\) on the Top100K dataset. (a) F1@5 scores of route recommendation as \(\alpha \) varies; (b) F1@5 scores of route recommendation as \(\lambda _1\) varies; (c) F1@5 scores of airline recommendation as \(\lambda _1\) varies.

4.5 Parameter Analysis

There are two types of important parameters in JWNMF: the linear weight \(\alpha \) in the weighting strategy and the balancing parameters \(\lambda _1\), \(\lambda _2\), \(\lambda _3\), and \(\lambda _4\) in the objective function. Other parameters such as the latent dimension K and the iteration number also matter, but due to space limitations we only analyze how \(\alpha \) and \(\lambda _1\) affect the performances in this subsection, as illustrated in Fig. 3.

As either parameter increases, all curves rise first and then decline. Because the entries in \(\mathbf E ^{ur}\) and \(\mathbf E ^{uc}\) are integer frequencies, a small \(\alpha \) (<1) fails to recognize the relevance information contained in high frequencies, while a large \(\alpha \) (>1) makes the model concentrate too much on high frequencies and overfit. Therefore, we set \(\alpha =1\) in the experiments. On the other hand, \(\lambda _1=0.1\) achieves the best performance on route recommendation while \(\lambda _1=1\) does so on airline recommendation, which demonstrates the trade-off between the two tasks. Taking the overall performance into account, we set \(\lambda _1=0.1\) in the experiments.

5 Conclusion

In this paper, we introduced a heterogeneous item recommendation framework JWNMF which incorporates heterogeneous information from the air travel HIN derived from the PNR dataset. The proposed JWNMF leverages both the interaction information between entities and their attributes, which effectively models passengers’ travel demands and behavior patterns. Through a joint weighted factorization framework, representations of multiple kinds of entities are simultaneously learned and mutually enhanced. Experiments conducted on a real-world PNR dataset demonstrated the effectiveness and superiority of JWNMF on air route and airline recommendation tasks.