An intelligent system for customer targeting: a data mining approach

https://doi.org/10.1016/S0167-9236(03)00008-3

Abstract

We propose a data mining approach for market managers that uses artificial neural networks (ANNs) guided by genetic algorithms (GAs). Our predictive model allows the selection of an optimal target point where expected profit from direct mailing is maximized. Our approach also produces models that are easier to interpret by using a smaller number of predictive features. Through sensitivity analysis, we also show that our chosen model significantly outperforms the baseline algorithms in terms of hit rate and expected net profit on key target points.

Introduction

The ultimate goal of decision support systems is to provide managers with information that is useful for understanding the various managerial aspects of a problem and for choosing the best solution among many alternatives. In this paper, we focus on a specific decision support system for market managers who want to develop and implement efficient marketing programs by fully utilizing a customer database. This is important because, with the growing interest in micro-marketing, many firms devote considerable resources to identifying households that may be open to targeted marketing messages. The task becomes even more critical with the ready availability of data warehouses combining demographic, psychographic, and behavioral information.

Both the marketing [8], [19], [33] and data-mining [4], [13], [27], [32] communities have presented various database-driven approaches to direct marketing. A good review of how data mining can be integrated into knowledge-based marketing can be found in [41]. Traditionally, the optimal selection of mailing targets has been considered one of the most important factors in successful direct marketing. Thus, many models aim to identify as many customers as possible who will respond to a specific solicitation campaign letter, based on each customer's estimated probability of responding to the marketing program.

This problem becomes more complicated when the interpretability of the model is important. For example, in database marketing applications, it is critical for managers to understand the key drivers of consumer response. A predictive model that is essentially a “black box” is not useful for developing comprehensive marketing strategies. At the same time, a rule-based system that consists of too many if-then statements can make it difficult for users to identify the key drivers. Note that two principal goals, model interpretability and predictive accuracy, can be in conflict.

Another important but often neglected aspect of models is the decision support function that helps market managers make strategic marketing plans. For example, market managers want to know how many customers should be targeted to maximize the expected net profit or increase market share while at least recovering the operational costs of a specific campaign. In order to attain this goal, market managers need a sensitivity analysis that shows how the value of the objective function (e.g., the expected net profit from the campaign) changes as campaign parameters vary (e.g., the campaign scope measured by the number of customers targeted).
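To make this kind of sensitivity analysis concrete, the expected net profit at a given target point can be computed from the campaign size, the cumulative hit rate, the margin per responding customer, and the per-piece mailing cost. The following sketch is illustrative only; the function and all numbers are hypothetical, not taken from the paper.

```python
def expected_net_profit(n_customers, response_rate, cum_hit_rate,
                        depth, revenue_per_responder, cost_per_mail):
    """Expected net profit when mailing the top `depth` fraction of a
    customer list, given the model's cumulative hit rate at that depth
    (the fraction of all responders captured)."""
    n_mailed = n_customers * depth
    n_responders_reached = n_customers * response_rate * cum_hit_rate
    return n_responders_reached * revenue_per_responder - n_mailed * cost_per_mail

# Hypothetical campaign: 10,000 households, 6% base response rate,
# $50 margin per responder, $2 per mail piece.
# Targeting the best 20% with a model capturing 50% of all responders:
profit_model = expected_net_profit(10_000, 0.06, 0.50, 0.20, 50.0, 2.0)  # 11000.0
# Mass mailing the entire list captures every responder but costs more:
profit_mass = expected_net_profit(10_000, 0.06, 1.00, 1.00, 50.0, 2.0)   # 10000.0
```

In this hypothetical setting, mailing only the top 20% earns more than mailing everyone; exposing trade-offs of exactly this kind is the purpose of the sensitivity analysis.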

In this paper, we propose a data-mining approach to building predictive models that satisfy these requirements efficiently and effectively. First, we show how to build predictive models that combine artificial neural networks (ANNs) [37] with genetic algorithms (GAs) [18] to help market managers identify prospective households. ANNs have been used in other marketing applications such as customer clustering [16], [1] and market segmentation [21], [2]. We use ANNs to identify optimal campaign targets based on each individual's likelihood of responding positively to the campaign message. This can be done by learning linear or possibly nonlinear relationships between the given input variables and the response indicator. We go one step beyond this traditional approach: because we are also interested in isolating the key determinants of customer response, we select different subsets of variables using GAs and use only those selected variables to train different ANNs.

GAs have become a powerful tool in finance, economics, accounting, operations research, and other fields as an alternative to hill-climbing search algorithms. This is mainly because hill-climbing heuristics can become trapped in a local optimum, whereas GAs are more likely to avoid local optima by evaluating multiple solutions simultaneously and adjusting their search bias toward more promising regions. Further, GAs are known to outperform other search algorithms on data sets with high dimensionality [28].

Second, we demonstrate through a sensitivity analysis that our approach can be used to determine the scope of a marketing campaign given the marginal revenue per customer and the marginal cost per piece of campaign mail. This can be a very useful tool for market managers who want to assess the impact of factors such as mailing cost and a limited campaign budget on the outcomes of a marketing campaign.

Finally, we enhance the interpretability of our model by reducing the dimensionality of the data sets. Traditionally, feature extraction algorithms such as principal component analysis (PCA) have often been used for this purpose. However, PCA is not appropriate when the ultimate goal is not only to reduce dimensionality but also to obtain highly accurate predictive models. This is because PCA does not take into account the relationship between the dependent variable and the input variables in the process of data reduction. Further, the resulting principal components can be difficult to interpret when the space of input variables is large.

Data reduction is performed via feature selection in our approach. Feature selection is the process of choosing a subset of the original predictive variables by eliminating features that are either redundant or carry little predictive information. If we extract as much information as possible from a given data set while using the smallest number of features, we can not only save a great amount of computing time and cost, but also build a model that generalizes better to households not in the test mailing. Feature selection can also significantly improve the comprehensibility of the resulting classifier models. Even a complicated model—such as a neural network—can be more easily understood if constructed from only a few variables.

Our methodology exploits the desirable characteristics of GAs and ANNs to achieve the two principal goals of household targeting at a specific target point: model interpretability and predictive accuracy. A standard GA is used to search through the possible combinations of features. The input features selected by the GA are used to train an ANN. The trained ANN is tested on an evaluation set, and the proposed model is evaluated in terms of two quality measures—cumulative hit rate (which is maximized) and complexity (which is minimized). We define the cumulative hit rate as the ratio of the number of actual customers identified to the total number of actual customers in a data set. This process is repeated many times as the algorithm searches for a desirable balance between predictive accuracy and model complexity. The result is a highly accurate predictive model that uses only a subset of the original features, thus simplifying the model and reducing the risk of overfitting. It also provides useful information for reducing future data collection costs.
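The accuracy-versus-complexity trade-off can be scalarized in many ways; one simple possibility is a weighted difference between the cumulative hit rate and the fraction of features used. The weighting below is our own illustration, not the paper's exact evaluation function, and all numbers are hypothetical.

```python
def fitness(cum_hit_rate, n_selected, n_total, accuracy_weight=0.8):
    """Illustrative fitness: reward cumulative hit rate (maximized),
    penalize the fraction of original features used (minimized)."""
    complexity = n_selected / n_total
    return accuracy_weight * cum_hit_rate - (1 - accuracy_weight) * complexity

# A model capturing 70% of responders with 10 of 85 features...
f_small = fitness(0.70, 10, 85)
# ...can outscore one capturing 72% of responders with 60 of 85 features.
f_large = fitness(0.72, 60, 85)
```

Under such a criterion, a slightly less accurate but much sparser model wins, which is the balance the search procedure is after.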

In order to help market managers determine the campaign scope, we run the GA/ANN model repeatedly over different target points to obtain local solutions. A local solution is a predictive feature subset with the highest fitness value at a specific target point. At a target point i, where 0≤i≤100, our GA/ANN model searches for a model that is optimal when the best i% of customers in a new data set are targeted based on the estimated probability of responding to the marketing campaign. Once we obtain local solutions, we combine them into an Ensemble, a global solution that is used to choose the best target point. Note that our Ensemble model is different from popular ensemble algorithms such as Bagging [7] and Boosting [15] that combine the predictions of multiple models by voting. Each local solution in our Ensemble model scores and selects prospects at a specific target point independently of the other local solutions. Finally, in order to present the performance of the local solutions and an Ensemble, we use a lift curve that shows the relationship between target points and the corresponding cumulative hit rates.
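The cumulative hit rate, and the lift curve built from it, can be sketched as follows; the scores and response labels are fabricated for illustration.

```python
def cumulative_hit_rate(scores, responded, depth):
    """Fraction of all actual responders captured when targeting the
    top `depth` fraction of customers, ranked by model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_target = int(round(depth * len(scores)))
    hits = sum(responded[i] for i in order[:n_target])
    return hits / sum(responded)

# Ten hypothetical customers: model scores and actual response flags.
scores    = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1,   1,   0,   1,   0,   0,   0,   1,   0,   0]

# One point of the lift curve per target point; here, targeting the
# top 20% already captures half of all responders.
lift_curve = [(d, cumulative_hit_rate(scores, responded, d))
              for d in (0.2, 0.4, 0.6, 0.8, 1.0)]
```

A model is useful to the extent that its lift curve rises faster than the diagonal produced by random targeting.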

This paper is organized as follows. In Section 2, we explain GAs for feature selection in detail, and motivate the use of a GA to search for the global optimum. In Section 3, we describe the structure of the GA/ANN model, and review the feature subset selection procedure. In Section 4, we present experimental results of both the GA/ANN model and a single ANN with the complete set of features. In particular, a global solution is constructed by incorporating the local solutions obtained over various target points. We show that such a model can be used to help market managers determine the best target point where the expected profit is maximized. In Section 5, we review related work for direct marketing from both the marketing and data mining communities. Section 6 concludes the paper and provides suggestions for future research directions.

Section snippets

Genetic algorithms for feature selection

A genetic algorithm (GA) is a parallel search procedure that simulates the evolutionary process by applying genetic operators. We provide a simple introduction to a standard GA in this section. More extensive discussions on GAs can be found in [18].

Since [18], various types of GAs have been used for many different applications. However, many variants still share common characteristics. Typically, a GA starts with and maintains a population of chromosomes that correspond to solutions to the
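For concreteness, a standard GA for feature-subset search can be sketched as a population of binary chromosomes (one bit per feature) evolved by tournament selection, one-point crossover, and bit-flip mutation. This is a generic textbook GA, not the paper's exact implementation; the toy fitness function merely stands in for training and evaluating an ANN on the selected features, and all parameter values are illustrative.

```python
import random

def ga_feature_select(n_features, fitness, pop_size=30, generations=40,
                      crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Standard GA over binary feature masks (1 = feature used).

    `fitness` maps a mask (tuple of 0/1) to a score to maximize; in the
    paper's setting it would train an ANN on the flagged features and
    combine cumulative hit rate with model complexity."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features))
           for _ in range(pop_size)]
    for _ in range(generations):
        next_pop = sorted(pop, key=fitness, reverse=True)[:2]  # elitism
        while len(next_pop) < pop_size:
            # binary tournament selection of two parents
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            if rng.random() < crossover_rate:                  # one-point crossover
                cut = rng.randrange(1, n_features)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1
            # independent bit-flip mutation
            child = tuple(b ^ (rng.random() < mutation_rate) for b in child)
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

# Toy fitness: features 0-2 carry signal; every selected feature adds a
# small complexity penalty, mimicking the accuracy/parsimony trade-off.
def toy_fitness(mask):
    return sum(mask[:3]) - 0.1 * sum(mask)

best = ga_feature_select(n_features=10, fitness=toy_fitness)
```

On this easy toy problem the search tends to settle on the three signal-bearing features with few extras, illustrating how the GA biases its population toward promising regions of the subset space.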

GA/ANN model for customer targeting

Our predictive model of household buying behavior is a hybrid of the GA and ANN procedures. In our approach, the GA identifies relevant consumer descriptors that are used by the ANN to forecast consumer choice given a specific target point. Our final solution, Ensemble, consists of multiple local solutions each of which is an optimized solution at a specific target point. In this section, we present the structure of our GA/ANN model and the evaluation criteria used to select an appropriate

Application

The new GA/ANN methodology is applied to the prediction of households interested in purchasing an insurance policy for recreational vehicles. To benchmark the new procedure, we contrast the predictive performance of Ensemble with that of a single ANN trained on the complete set of features. We do not compare our approach to a standard logit regression model because a logit regression model is a special case of a single ANN with one hidden node.
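The connection between logit regression and a minimal neural network can be made explicit: a logit model scores a customer with a sigmoid applied to a weighted sum of inputs. The sketch below is a generic illustration with hypothetical weights, not coefficients from the paper's data.

```python
import math

def logit_score(x, weights, bias):
    """Estimated response probability from a logit model: a sigmoid
    over a weighted sum of features, i.e. a one-unit neural network."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical two-feature customer record and coefficients:
# z = -1.0 + 2.0*1.0 + (-1.0)*0.0 = 1.0, so p = sigmoid(1.0)
p = logit_score([1.0, 0.0], [2.0, -1.0], -1.0)
```

Ranking customers by such a score and mailing the top i% is the same scoring-and-cutoff procedure used with the ANN models, which is why the two are directly comparable.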

Related work

Various multivariate statistical and analytical techniques from the marketing community have been applied to the database marketing problem. Routine mailings to existing customers are typically based on the RFM (recency, frequency, monetary) approach, which targets households using knowledge of the customer's purchase history [40]. However, the RFM model has major disadvantages, including its applicability to current customers only and redundancy because of inter-dependency among RFM

Conclusion

In this paper, we presented a novel approach for customer targeting in database marketing. We used a genetic algorithm to search for possible combinations of features and an artificial neural network to score customers. One of the clear strengths of the GA/ANN approach is its ability to construct predictive models that reflect the direct marketer's decision process. In particular, with information of campaign costs and profit per additional actual customer, we show that our system not only

Acknowledgements

The authors wish to thank Peter van der Putten and Maarten van Someren for making the CoIL data available for this paper. This work is partially supported by NSF grant IIS-99-96044.

Dr. YongSeog Kim is an Assistant Professor in the Business Information Systems department at Utah State University. He received his MS degree in Computer Science and PhD in Business Administration from the University of Iowa. Dr. Kim's primary research area is data mining, including feature selection, clustering, ensemble methods, streaming data analysis, and spatial and temporal data analysis. Recently, he has become interested in applying data mining algorithms to business problems in customer targeting, e-commerce, and computational finance and economics. His other research interests include digital government, electronic books, and the computer–human interface.

References (43)

  • G.R. Bitran et al., Mailing decisions in the catalog sales industry, Management Science (1996)
  • L. Breiman, Bagging predictors, Machine Learning (1996)
  • J.R. Bult et al., Optimal selection for direct mail, Marketing Science (1995)
  • P.K. Chan et al., Toward scalable learning with non-uniform class and cost distributions: a case study in credit card fraud detection
  • S. Chen et al., Non-standard crossover for a standard representation—commonality-based feature subset selection
  • P.B. Chou et al., Identifying prospective customers
  • F. Coetzee et al., Feature selection in web applications using ROC inflections
  • P. Domingos et al., Mining the network value of customers
  • T. Fawcett et al., Combining data mining and machine learning for effective user profiling
  • Y. Freund et al., Experiments with a new boosting algorithm
  • I. Gath et al., Unsupervised optimal fuzzy clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence (1988)


    Dr. Nick Street is an Associate Professor in the Management Sciences Department at the University of Iowa. He received a PhD in 1994 in Computer Sciences from the University of Wisconsin. His research interests are in machine learning and data mining, particularly the use of mathematical optimization in inductive learning techniques. His recent work has focused on dimensionality reduction (feature selection) in high-dimensional data for both classification and clustering, ensemble prediction methods for massive and streaming data sets, and learning shapes for image segmentation, classification, and retrieval. He has received an NSF CAREER award and an NIH INRSA postdoctoral fellowship.
