A recommender system using GA K-means clustering in an online shopping market

https://doi.org/10.1016/j.eswa.2006.12.025

Abstract

The Internet is emerging as a new marketing channel, so understanding the characteristics of online customers’ needs and expectations is considered a prerequisite for activating the consumer-oriented electronic commerce market. In this study, we propose a novel clustering algorithm based on genetic algorithms (GAs) to segment the online shopping market effectively. GAs are generally considered effective for NP-complete global optimization problems, where they can provide good near-optimal solutions in reasonable time. We therefore expect a clustering technique that incorporates a GA to find relevant clusters more effectively. This paper applies K-means clustering whose initial seeds are optimized by a GA, which we call GA K-means, to a real-world online shopping market segmentation case, and compares the results of GA K-means with those of a simple K-means algorithm and self-organizing maps (SOM). The results showed that GA K-means clustering may improve segmentation performance compared with these typical clustering algorithms. In addition, our study validated the usefulness of the proposed model as a preprocessing tool for recommendation systems.

Introduction

Since the Internet became popular, the consumer-oriented electronic commerce market has grown so large that companies are now convinced of the importance of understanding this emerging market. Analyzing and understanding the needs and expectations of online users and customers is increasingly important, because the Internet is one of the most effective media for gathering, disseminating and using information about them. The Internet environment also makes it easier to extract knowledge from the shopping process and turn it into new business opportunities.

Market segmentation is one way to represent such knowledge. It attempts to discover the classes into which consumers can naturally be grouped, based on the available information (Velido, Lisboa, & Meehan, 1999). By identifying the proper segments, it provides a basis for effective targeting and for predicting prospects. Although the marketing literature has proposed many market segmentation techniques, clustering techniques are frequently used in practice (Wedel & Kamakura, 1998), and among them K-means clustering is the most frequently used for market segmentation (Gehrt and Shim, 1998, Kuo et al., 2004). The major drawback of K-means clustering, however, is that it often converges to a local optimum and its result depends heavily on the initial cluster centers. Prior studies have pointed out this limitation and tried to combine K-means clustering with global search techniques, including genetic algorithms (see Babu and Murty, 1993, Kuo et al., 2005, Maulik and Bandyopadhyay, 2000, Murthy and Chowdhury, 1996, Pena et al., 1999).

In this paper, we apply a hybrid of K-means clustering and genetic algorithms to an exploratory segmentation of an online shopping market. To find the most effective clustering method for this kind of data, we apply several clustering methods and compare their performance using our proposed performance criteria. We also validate the usefulness of the proposed model in a real-world application.

The rest of this paper is organized as follows. Section 2 reviews two traditional clustering algorithms, K-means and the self-organizing map (SOM), along with the performance criteria. Section 3 proposes the GA approach to optimizing K-means clustering, and Section 4 describes the data and the experiments and summarizes and discusses the empirical results. The final section presents the conclusions and the limitations of this study.

Section snippets

Clustering algorithms

Cluster analysis is an effective tool in scientific and managerial inquiry. It groups a set of data points in a d-dimensional feature space so as to maximize the similarity within clusters and minimize the similarity between different clusters. Many clustering methods exist and are widely used. Among them, we apply two popular methods, K-means and SOM, and a novel hybrid method to market segmentation. Before providing a brief description of each method, the following assumptions
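The description of K-means is cut off above. As a hedged illustration of the seed sensitivity raised in the Introduction, here is a minimal pure-Python sketch of standard Lloyd-style K-means that starts from caller-supplied initial seeds. The names `kmeans`, `sq_dist` and `sse` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], seeds: List[Point], max_iter: int = 100):
    """Lloyd-style K-means started from the given seeds; returns (centroids, labels)."""
    centroids = [tuple(float(v) for v in s) for s in seeds]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


def sse(points: List[Point], centroids: List[Point], labels: List[int]) -> float:
    """Within-cluster sum of squared errors (the quantity K-means reduces)."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
```

For example, with the six one-dimensional points (0,), (1,), (10,), (11,), (20,), (21,) and K = 3, seeding at (0,), (1,), (10,) stops with an SSE of 101.0, whereas seeding at (0,), (10,), (20,) reaches 1.5: the same data and the same K, but a different local optimum.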

GA K-means clustering algorithm

As indicated in Section 2.1, the K-means algorithm has no mechanism for choosing appropriate initial seeds. However, different initial seeds may produce very different clustering results, especially when the sample contains many outliers. In addition, randomly selected initial seeds often cause the clustering to converge to a local optimum (Bradley & Fayyad, 1998). It is therefore very important to select appropriate initial seeds in the traditional K
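The passage above is truncated before the algorithm itself is described. One plausible realization of GA K-means is sketched below under stated assumptions: a chromosome is taken to be K distinct data-point indices used as initial seeds (the paper may encode seeds differently, for example as real-valued coordinates); fitness is the within-cluster sum of squared errors after K-means converges, with lower values fitter; and binary tournament selection, two-chromosome elitism, duplicate-index repair and `generations=20` are choices of this sketch. The defaults for population size (200), crossover rate (0.7) and mutation rate (0.1) mirror the values reported in the experimental design. All function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], seeds: List[Point], max_iter: int = 100):
    """Plain K-means from fixed seeds; returns (centroids, labels)."""
    cents = [tuple(map(float, s)) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new = [min(range(len(cents)), key=lambda k: sq_dist(p, cents[k])) for p in points]
        if new == labels:
            break  # assignments stable, so centroids are stable too
        labels = new
        for k in range(len(cents)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                cents[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return cents, labels


def repair(chrom: List[int], n: int, rng: random.Random) -> List[int]:
    """Replace duplicate point indices so the K seeds stay distinct (assumes n >= K)."""
    seen, out = set(), []
    for g in chrom:
        while g in seen:
            g = rng.randrange(n)
        seen.add(g)
        out.append(g)
    return out


def ga_kmeans(points: List[Point], k: int, pop_size: int = 200, generations: int = 20,
              p_cross: float = 0.7, p_mut: float = 0.1, rng_seed: int = 0):
    """GA search over K-means initial seeds; returns (best_indices, best_sse)."""
    rng = random.Random(rng_seed)
    n = len(points)
    cache: Dict[Tuple[int, ...], float] = {}

    def cost(chrom: List[int]) -> float:
        # Fitness (assumed): SSE of the K-means run started from these seeds.
        key = tuple(chrom)
        if key not in cache:
            cents, labels = kmeans(points, [points[i] for i in chrom])
            cache[key] = sum(sq_dist(p, cents[l]) for p, l in zip(points, labels))
        return cache[key]

    # Hypothetical encoding: a chromosome is K distinct indices of data points.
    pop = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=cost)
        nxt = ranked[:2]  # elitism: keep the two best chromosomes unchanged
        while len(nxt) < pop_size:
            # Binary tournament on the ranked list: the smaller rank wins.
            p1 = ranked[min(rng.randrange(pop_size), rng.randrange(pop_size))]
            p2 = ranked[min(rng.randrange(pop_size), rng.randrange(pop_size))]
            c1, c2 = list(p1), list(p2)
            if rng.random() < p_cross:  # uniform crossover: swap each gene w.p. 0.5
                for i in range(k):
                    if rng.random() < 0.5:
                        c1[i], c2[i] = c2[i], c1[i]
            for child in (c1, c2):
                for i in range(k):  # mutation: replace a gene with a random index
                    if rng.random() < p_mut:
                        child[i] = rng.randrange(n)
                nxt.append(repair(child, n, rng))
        pop = nxt[:pop_size]
    best = min(pop, key=cost)
    return best, cost(best)
```

In this design K-means still does the clustering; the GA only searches over where K-means begins, which is the limitation the passage identifies.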

Experimental design

We apply three clustering algorithms (simple K-means, SOM and GA K-means) to our data and segment the Internet users into 5 clusters (that is, K = 5). For SOM, we set the learning rate (α) to 0.5.
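The paper states only the SOM learning rate. The following is a minimal, hedged SOM trainer with five output units (matching K = 5) and an initial learning rate of 0.5; the one-dimensional node grid, Gaussian neighborhood, linear decay schedule, 50 epochs and initialization from randomly chosen data points are assumptions of this sketch, and `train_som` and `som_labels` are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(points: List[Tuple[float, ...]], n_nodes: int = 5, alpha0: float = 0.5,
              epochs: int = 50, rng_seed: int = 0) -> List[List[float]]:
    """Train a 1-D chain of n_nodes SOM units and return their weight vectors."""
    rng = random.Random(rng_seed)
    weights = [list(p) for p in rng.sample(points, n_nodes)]  # init from data
    radius0 = n_nodes / 2.0
    for t in range(epochs):
        frac = t / epochs
        alpha = alpha0 * (1.0 - frac)              # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks too
        order = list(points)
        rng.shuffle(order)
        for x in order:
            # Best-matching unit: the node whose weights are closest to x.
            bmu = min(range(n_nodes), key=lambda j: sq_dist(x, weights[j]))
            for j in range(n_nodes):
                d = abs(j - bmu)  # distance measured along the 1-D node grid
                h = math.exp(-(d * d) / (2.0 * radius * radius))
                weights[j] = [w + alpha * h * (xi - w) for w, xi in zip(weights[j], x)]
    return weights


def som_labels(points, weights) -> List[int]:
    """Assign each point to its best-matching unit, i.e. its segment."""
    return [min(range(len(weights)), key=lambda j: sq_dist(p, weights[j])) for p in points]
```

Because each update moves a node toward the input by a factor of at most 0.5, the trained weights always stay within the range spanned by the data.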

For the control parameters of the GA search, the population size is set to 200 organisms, the crossover rate to 0.7 and the mutation rate to 0.1. Crossover is performed with a uniform crossover routine. The uniform crossover method is
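The description of the uniform crossover routine is cut off. In the usual form of uniform crossover, each gene position independently decides, by a fair coin flip, whether the two offspring exchange the parents' genes. The sketch below applies that operator with the stated crossover rate of 0.7 and pairs it with per-gene mutation at the stated rate of 0.1. Representing genes as real values normalized to [0, 1], and replacing a mutated gene with a uniform random value, are assumptions; `uniform_crossover` and `mutate` are hypothetical names.

```python
import random
from typing import List, Tuple


def uniform_crossover(p1: List[float], p2: List[float], rng: random.Random,
                      p_cross: float = 0.7) -> Tuple[List[float], List[float]]:
    """With probability p_cross, build two children gene by gene: a fair coin
    decides, per position, whether the children swap the parents' genes."""
    c1, c2 = list(p1), list(p2)
    if rng.random() < p_cross:
        for i in range(len(c1)):
            if rng.random() < 0.5:
                c1[i], c2[i] = c2[i], c1[i]
    return c1, c2


def mutate(chrom: List[float], rng: random.Random, p_mut: float = 0.1,
           low: float = 0.0, high: float = 1.0) -> List[float]:
    """Replace each gene, independently with probability p_mut, by a uniform
    random value in [low, high] (assumes genes are normalized to that range)."""
    return [rng.uniform(low, high) if rng.random() < p_mut else g for g in chrom]
```

Unlike one-point or two-point crossover, this operator does not keep contiguous runs of genes together, so every seed coordinate can be inherited from either parent.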

Conclusions

This study proposes a new clustering algorithm, GA K-means. We applied it to a real-world market segmentation case in electronic commerce and found that GA K-means may yield better segmentation than traditional clustering algorithms, including simple K-means and SOM, in terms of intraclass inertia. In addition, we empirically examined the usefulness of GA K-means as a preprocessing tool for a recommendation model.
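The comparison is made in terms of intraclass inertia. A common definition is the average squared Euclidean distance between each point and the mean of its cluster, F = (1/n) Σ_k Σ_{x_i ∈ C_k} ‖x_i − x̄_k‖². The sketch below assumes that definition rather than quoting the paper's exact formula, and `intraclass_inertia` is a hypothetical name.

```python
from typing import Dict, List, Sequence


def intraclass_inertia(points: List[Sequence[float]], labels: List[int]) -> float:
    """Average squared Euclidean distance from each point to its cluster mean.

    Assumed definition: F = (1/n) * sum_k sum_{i in C_k} ||x_i - mean_k||^2.
    """
    clusters: Dict[int, List[Sequence[float]]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        mean = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((x - m) ** 2 for x, m in zip(p, mean)) for p in members)
    return total / len(points)
```

For a fixed K, a lower value means more compact segments: the points (0, 0), (2, 0), (10, 10), (10, 12) score 1.0 when grouped as [0, 0, 1, 1] but 51.0 when grouped as [0, 1, 0, 1].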

However, this study has some limitations. Although we

References (31)

  • P. Michaud, Clustering techniques, Future Generation Computer Systems (1997).
  • C.A. Murthy et al., In search of optimal clusters using genetic algorithms, Pattern Recognition Letters (1996).
  • J.M. Pena et al., An empirical comparison of four initialization methods for the K-means algorithm, Pattern Recognition Letters (1999).
  • K.S. Shin et al., Case-based reasoning supported by genetic algorithms for corporate bond rating, Expert Systems with Applications (1999).
  • G.P. Babu et al., A near-optimal initial seed value selection in K-means algorithm using a genetic algorithm, Pattern Recognition Letters (1993).