A recommender system using GA K-means clustering in an online shopping market
Introduction
As the Internet has become popular, the consumer-oriented electronic commerce market has grown so large that companies are now convinced of the importance of understanding this emerging market. Analyzing and understanding the needs and expectations of online users or customers is becoming more important for companies, because the Internet is one of the most effective media for gathering, disseminating and utilizing information about them. It is therefore easier to extract knowledge from the shopping process and to create new business opportunities in the Internet environment.
Market segmentation is one way to represent such knowledge. It attempts to discover the classes into which consumers can be naturally grouped according to the information available (Vellido, Lisboa, & Meehan, 1999). By identifying the proper segments, it can provide the basis for effective targeting and for predicting prospects. Although the marketing literature has proposed various market segmentation techniques, clustering techniques are the ones frequently used in practice (Wedel & Kamakura, 1998), and among them K-means clustering is the most frequently used (Gehrt & Shim, 1998; Kuo et al., 2004). However, the major drawback of K-means clustering is that it often falls into local optima, and its result depends largely on the initial cluster centers. Prior studies pointed out this limitation and tried to integrate K-means clustering with global search techniques, including genetic algorithms (see Babu & Murty, 1993; Kuo et al., 2005; Maulik & Bandyopadhyay, 2000; Murthy & Chowdhury, 1996; Pena et al., 1999).
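To make the dependence on the initial cluster centers concrete, the following minimal Python sketch runs a plain K-means from two different starts on six points that form three obvious groups. It is not taken from the study: the toy data, the `kmeans` helper and the two sets of initial centers are illustrative assumptions.

```python
import numpy as np

def kmeans(X, init_centers, n_iter=100):
    """Plain Lloyd-style K-means started from the given centers (illustrative helper)."""
    centers = init_centers.copy()
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and sum of squared errors for the returned centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sse = d2[np.arange(len(X)), labels].sum()
    return centers, labels, sse

# Toy data (an assumption): three pairs of points at x = 0, 10 and 20.
X = np.array([[0.0, 0.0], [0.0, 1.0],
              [10.0, 0.0], [10.0, 1.0],
              [20.0, 0.0], [20.0, 1.0]])

# One seed per group: converges to the intended three clusters.
good_init = np.array([[0.0, 0.5], [10.0, 0.5], [20.0, 0.5]])
# Two seeds in the first group, one between the others: a stable but poor solution.
bad_init = np.array([[0.0, 0.0], [0.0, 1.0], [15.0, 0.5]])

_, _, good_sse = kmeans(X, good_init)
_, _, bad_sse = kmeans(X, bad_init)
print(good_sse, bad_sse)  # SSE of 1.5 for the good start versus 101.0 for the bad start
```

Both runs converge, yet the second one stays in a local optimum because no single reassignment step can move a center out of the first group. This is the behavior that a global search over the initial centers is meant to avoid.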
In this paper, we apply a hybrid of K-means clustering and genetic algorithms (GA) to carry out an exploratory segmentation of an online shopping market. To find the most effective clustering method for this kind of data, we apply several clustering methods and compare their performance using our suggested performance criteria. In addition, we validate the usefulness of the proposed model in a real-world application.
The rest of this paper is organized as follows. The next section reviews two traditional clustering algorithms, K-means and the self-organizing map (SOM), along with the performance criteria. Section 3 proposes the GA approach to optimizing K-means clustering. Section 4 describes the data and the experiments, and also summarizes and discusses the empirical results. The final section presents the conclusions and the limitations of this study.
Section snippets
Clustering algorithms
Cluster analysis is an effective tool in scientific or managerial inquiry. It groups a set of data in d-dimensional feature space to maximize the similarity within the clusters and minimize the similarity between two different clusters. There are various clustering methods and they are currently widely used. Among them, we apply two popular methods, K-means and SOM, and a novel hybrid method to market segmentation. Before providing a brief description of each method, the following assumptions
GA K-means clustering algorithm
As indicated in Section 2.1, the K-means algorithm does not have any mechanism for choosing appropriate initial seeds. However, selecting different initial seeds may generate huge differences in clustering results, especially when the target sample contains many outliers. In addition, random selection of initial seeds often causes the clustering to fall into a local optimum (Bradley & Fayyad, 1998). So, it is very important to select appropriate initial seeds in the traditional K
Experimental design
We apply three clustering algorithms (simple K-means, SOM and GA K-means) to our data. We try to segment the Internet users into 5 clusters (that is, K = 5). In the case of SOM, we set the learning rate (α) at 0.5.
For the controlling parameters of the GA search, the population size is set at 200 organisms. The value of the crossover rate is set at 0.7 while the mutation rate is set at 0.1. This study performs the crossover using a uniform crossover routine. The uniform crossover method is
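The GA settings above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The population size (200), crossover rate (0.7), mutation rate (0.1), uniform crossover and K = 5 come from the text. Everything else is assumed for illustration: each organism encodes K initial seeds, fitness is the mean squared distance to the assigned center after K-means (one reading of intraclass inertia), selection is a binary tournament with elitism, the mutation operator resets a gene to a data coordinate, and the number of generations is arbitrary.

```python
import numpy as np

def run_kmeans(X, seeds, n_iter=50):
    """Lloyd-style K-means from the given seeds; returns (centers, labels)."""
    centers = seeds.copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = centers.copy()
        for k in range(len(centers)):
            if np.any(labels == k):  # keep the old center if a cluster is empty
                new[k] = X[labels == k].mean(axis=0)
        if np.allclose(new, centers):
            break
        centers = new
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)

def intraclass_inertia(X, centers, labels):
    """Mean squared distance of each point to its cluster center (assumed definition)."""
    return ((X - centers[labels]) ** 2).sum(axis=1).mean()

def fitness(X, seeds):
    """Lower is better: inertia reached by K-means when started from these seeds."""
    centers, labels = run_kmeans(X, seeds)
    return intraclass_inertia(X, centers, labels)

def tournament(pop, scores, rng):
    """Binary tournament selection (an assumption; the text does not name a scheme)."""
    i, j = rng.choice(len(pop), size=2)
    return pop[i] if scores[i] <= scores[j] else pop[j]

def ga_kmeans(X, k=5, pop_size=200, crossover_rate=0.7, mutation_rate=0.1,
              generations=20, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # Each organism is a (k, d) array of initial seeds drawn from the data.
    pop = np.stack([X[rng.choice(n, size=k, replace=False)] for _ in range(pop_size)])
    scores = np.array([fitness(X, ind) for ind in pop])
    for _ in range(generations):
        children = [pop[scores.argmin()].copy()]  # elitism (an assumption)
        while len(children) < pop_size:
            a, b = tournament(pop, scores, rng), tournament(pop, scores, rng)
            child = a.copy()
            if rng.random() < crossover_rate:
                # Uniform crossover: each gene comes from either parent with equal chance.
                mask = rng.random(child.shape) < 0.5
                child[mask] = b[mask]
            # Mutation (assumed operator): reset a gene to the same coordinate
            # of a randomly chosen data point.
            mutate = rng.random(child.shape) < mutation_rate
            rows = rng.integers(0, n, size=child.shape)
            cols = np.broadcast_to(np.arange(d), child.shape)
            child[mutate] = X[rows, cols][mutate]
            children.append(child)
        pop = np.stack(children)
        scores = np.array([fitness(X, ind) for ind in pop])
    centers, labels = run_kmeans(X, pop[scores.argmin()])
    return centers, labels, scores.min()

# Small demo on random toy data (not the study's data); a reduced population
# and generation count keep the run short.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(90, 2))
centers, labels, inertia = ga_kmeans(X_demo, k=5, pop_size=20, generations=5, rng=rng)
print(centers.shape, round(float(inertia), 3))
```

Because the best organism is carried over unchanged in every generation, the best inertia in the population never gets worse from one generation to the next. Running with the population size of 200 stated above only requires dropping the `pop_size` override in the demo call.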
Conclusions
This study proposes a new clustering algorithm, GA K-means. We applied it to a real-world case of market segmentation in electronic commerce and found that, from the perspective of intraclass inertia, GA K-means might yield better segmentation than traditional clustering algorithms such as simple K-means and SOM. In addition, we empirically examined the usefulness of GA K-means as a preprocessing tool for a recommendation model.
However, this study has some limitations. Although we
References (31)
- et al. Application of Web usage mining and product taxonomy to collaborative recommendations in e-commerce. Expert Systems with Applications (2004)
- et al. A personalized recommender system based on Web usage mining and decision tree induction. Expert Systems with Applications (2002)
- et al. A shopping orientation segmentation of French consumers: implications for catalog marketing. Journal of Interactive Marketing (1998)
- et al. A framework for the description of evolutionary algorithms. European Journal of Operational Research (2000)
- et al. A personalized recommendation procedure for Internet shopping support. Electronic Commerce Research and Applications (2002)
- et al. The cluster-indexing method for case-based reasoning using self-organizing maps and learning vector quantization for bond rating cases. Expert Systems with Applications (2001)
- et al. Integration of self-organizing feature maps neural network and genetic K-means algorithm for market segmentation. Expert Systems with Applications (2006)
- et al. Integration of ART2 neural network and genetic K-means algorithm for analyzing Web browsing paths in electronic commerce. Decision Support Systems (2005)
- et al. Selecting variables for k-means cluster analysis by using a genetic algorithm that optimises the silhouettes. Analytica Chimica Acta (2004)
- et al. Genetic algorithm-based clustering technique. Pattern Recognition (2000)
- Clustering techniques. Future Generation Computer Systems
- In search of optimal clusters using genetic algorithms. Pattern Recognition Letters
- An empirical comparison of four initialization methods for the K-means algorithm. Pattern Recognition Letters
- Case-based reasoning supported by genetic algorithms for corporate bond rating. Expert Systems with Applications
- A near-optimal initial seed value selection in K-means algorithm using a genetic algorithm. Pattern Recognition Letters