An intelligent system for customer targeting: a data mining approach
Introduction
The ultimate goal of decision support systems is to provide managers with information that is useful for understanding various managerial aspects of a problem and for choosing the best solution among many alternatives. In this paper, we focus on a specific decision support system for market managers who want to develop and implement efficient marketing programs by fully utilizing a customer database. This is important because, with the growing interest in micro-marketing, many firms devote considerable resources to identifying households that may be open to targeted marketing messages. The task becomes even more critical with the easy availability of data warehouses combining demographic, psychographic, and behavioral information.
Both the marketing [8], [19], [33] and data-mining [4], [13], [27], [32] communities have presented various database-driven approaches to direct marketing. A good review of how data mining can be integrated into knowledge-based marketing can be found in [41]. Traditionally, the optimal selection of mailing targets has been considered one of the most important factors for direct marketing to be successful. Thus, many models aim to identify as many customers as possible who will respond to a specific solicitation campaign letter, based on each customer's estimated probability of responding to the marketing program.
This problem becomes more complicated when the interpretability of the model is important. For example, in database marketing applications, it is critical for managers to understand the key drivers of consumer response. A predictive model that is essentially a “black box” is not useful for developing comprehensive marketing strategies. At the same time, a rule-based system that consists of too many if-then statements can make it difficult for users to identify the key drivers. Note that these two principal goals, model interpretability and predictive accuracy, can be in conflict.
Another important but often neglected aspect of models is the decision support function that helps market managers make strategic marketing plans. For example, market managers want to know how many customers should be targeted to maximize the expected net profit or increase market share while at least recovering the operational costs of a specific campaign. In order to attain this goal, market managers need a sensitivity analysis that shows how the value of the objective function (e.g., the expected net profit from the campaign) changes as campaign parameters vary (e.g., the campaign scope measured by the number of customers targeted).
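To make this sensitivity analysis concrete, the sketch below computes the expected net profit for several candidate campaign scopes. The response rates, revenue margin, and mailing cost are hypothetical placeholders, not values from our study.

```python
# Sketch of the sensitivity analysis described above: expected net profit
# as a function of campaign scope. All numbers are hypothetical.

def expected_net_profit(n_targeted, hit_rate, revenue_per_customer, cost_per_mail):
    """Expected profit when mailing n_targeted prospects, of whom a fraction
    hit_rate are expected to respond."""
    return n_targeted * hit_rate * revenue_per_customer - n_targeted * cost_per_mail

# Hypothetical model output: the response rate declines as the scope widens,
# because the best-ranked prospects are mailed first.
scopes = [1000, 2000, 5000, 10000]      # number of customers targeted
hit_rates = [0.20, 0.15, 0.08, 0.04]    # estimated response rate at each scope

best = max(
    zip(scopes, hit_rates),
    key=lambda s: expected_net_profit(s[0], s[1],
                                      revenue_per_customer=50.0,
                                      cost_per_mail=1.0),
)
print("Most profitable scope:", best[0])  # → 5000
```

Varying `revenue_per_customer` or `cost_per_mail` and re-running the comparison is exactly the kind of what-if question the sensitivity analysis is meant to answer.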
In this paper, we propose a data-mining approach to building predictive models that satisfies these requirements efficiently and effectively. First, we show how to build predictive models that combine artificial neural networks (ANNs) [37] with genetic algorithms (GAs) [18] to help market managers identify prospective households. ANNs have been used in other marketing applications such as customer clustering [16], [1] and market segmentation [21], [2]. We use ANNs to identify optimal campaign targets based on each individual's likelihood of responding positively to the campaign message. This can be done by learning linear or possibly nonlinear relationships between the given input variables and the response indicator. We go one step beyond this traditional approach: because we are also interested in isolating key determinants of customer response, we use GAs to select different subsets of variables and train different ANNs on only those selected variables.
GAs have become a very powerful tool in finance, economics, accounting, operations research, and other fields as an alternative to hill-climbing search algorithms. This is mainly because such heuristic algorithms can become trapped in a local optimum, while GAs are more likely to avoid local optima by evaluating multiple solutions simultaneously and adjusting their search bias toward more promising areas. Further, GAs are known to outperform other search algorithms on data sets with high dimensionality [28].
Second, we demonstrate through a sensitivity analysis that our approach can be used to determine the scope of a marketing campaign given the marginal revenue per customer and the marginal cost per campaign mailing. This can be a very useful tool for market managers who want to assess the impact of factors such as mailing cost and a limited campaign budget on the outcomes of a marketing campaign.
Finally, we enhance the interpretability of our model by reducing the dimensionality of the data sets. Traditionally, feature extraction algorithms such as principal component analysis (PCA) have often been used for this purpose. However, PCA is not appropriate when the ultimate goal is not only to reduce dimensionality but also to obtain a highly accurate predictive model. This is because PCA does not take into account the relationship between the dependent variable and the input variables during data reduction. Further, the principal components produced by PCA can be difficult to interpret when the space of input variables is large.
Data reduction is performed via feature selection in our approach. Feature selection is defined as the process of choosing a subset of the original predictive variables by eliminating features that are either redundant or possess little predictive information. If we extract as much information as possible from a given data set while using the smallest number of features, we can not only save a great amount of computing time and cost, but also build a model that generalizes better to households not in the test mailing. Feature selection can also significantly improve the comprehensibility of the resulting classifier models. Even a complicated model, such as a neural network, can be more easily understood if constructed from only a few variables.
Our methodology exploits the desirable characteristics of GAs and ANNs to achieve the two principal goals of household targeting at a specific target point: model interpretability and predictive accuracy. A standard GA is used to search through the possible combinations of features. The input features selected by the GA are used to train ANNs. Each trained ANN is tested on an evaluation set, and the proposed model is evaluated in terms of two quality measurements: cumulative hit rate (which is maximized) and complexity (which is minimized). We define the cumulative hit rate as the ratio of the number of actual customers identified to the total number of actual customers in a data set. This process is repeated many times as the algorithm searches for a desirable balance between predictive accuracy and model complexity. The result is a highly accurate predictive model that uses only a subset of the original features, thus simplifying the model and reducing the risk of overfitting. It also provides useful information for reducing future data collection costs.
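The cumulative hit rate defined above can be computed directly from model scores and actual responses. The following is a minimal sketch with hypothetical scores and labels; the function name `cumulative_hit_rate` is ours, not part of any library.

```python
# Cumulative hit rate as defined above: of all actual responders in the data,
# what fraction falls among the top-scoring i% of customers.

def cumulative_hit_rate(scores, labels, target_fraction):
    """scores: estimated response probabilities; labels: 1 = actual responder.
    Returns (# responders among top target_fraction) / (# responders overall)."""
    n_target = int(round(len(scores) * target_fraction))
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits_in_target = sum(label for _, label in ranked[:n_target])
    total_hits = sum(labels)
    return hits_in_target / total_hits

# Hypothetical model output for ten customers:
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1,   1,   0,   1,   0,   0,   1,   0,   0,   0]
print(cumulative_hit_rate(scores, labels, 0.4))  # → 0.75 (3 of 4 responders in top 40%)
```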
In order to help market managers determine the campaign scope, we run the GA/ANN model repeatedly over different target points to obtain local solutions. A local solution is a predictive feature subset with the highest fitness value at a specific target point. At a target point i, where 0≤i≤100, our GA/ANN model searches for a model that is optimal when the best i% of customers in a new data set is targeted, based on the estimated probability of responding to the marketing campaign. Once we obtain local solutions, we combine them into an Ensemble, a global solution that is used to choose the best target point. Note that our Ensemble model is different from popular ensemble algorithms such as Bagging [7] and Boosting [15], which combine the predictions of multiple models by voting. Each local solution in our Ensemble model scores and selects prospects at a specific target point independently of the other local solutions. Finally, in order to present the performance of the local solutions and the Ensemble, we use a lift curve that shows the relationship between target points and the corresponding cumulative hit rates.
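Under these definitions, each point on the lift curve compares a local solution's cumulative hit rate against random mailing, which on average captures i% of responders when i% of customers are targeted. A small sketch with hypothetical hit rates:

```python
# Hypothetical mapping: target point (% of customers mailed) -> cumulative hit
# rate achieved by the local solution optimized for that point. These numbers
# are placeholders, not results from our experiments.
local_solutions = {10: 0.32, 20: 0.55, 30: 0.68, 40: 0.78, 50: 0.85}

def lift(target_point, hit_rate):
    """Lift over random mailing: randomly targeting i% of customers captures
    i% of responders on average, so lift = hit_rate / (i / 100)."""
    return hit_rate / (target_point / 100.0)

for point, hr in sorted(local_solutions.items()):
    print(f"target {point:3d}% -> cumulative hit rate {hr:.2f}, lift {lift(point, hr):.2f}")
```

Plotting cumulative hit rate against target point for all local solutions yields the lift curve used to compare models and pick the target point.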
This paper is organized as follows. In Section 2, we explain GAs for feature selection in detail, and motivate the use of a GA to search for the global optimum. In Section 3, we describe the structure of the GA/ANN model, and review the feature subset selection procedure. In Section 4, we present experimental results of both the GA/ANN model and a single ANN with the complete set of features. In particular, a global solution is constructed by incorporating the local solutions obtained over various target points. We show that such a model can be used to help market managers determine the best target point where the expected profit is maximized. In Section 5, we review related work for direct marketing from both the marketing and data mining communities. Section 6 concludes the paper and provides suggestions for future research directions.
Section snippets
Genetic algorithms for feature selection
A genetic algorithm (GA) is a parallel search procedure that simulates the evolutionary process by applying genetic operators. We provide a simple introduction to a standard GA in this section. More extensive discussions on GAs can be found in [18].
Since [18], various types of GAs have been used for many different applications. However, many variants still share common characteristics. Typically, a GA starts with and maintains a population of chromosomes that correspond to solutions to the
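As a concrete illustration of the standard GA described here, the sketch below evolves bit-string chromosomes (1 = feature selected) under truncation selection, one-point crossover, and bit-flip mutation. The fitness function is a hypothetical placeholder; in our model it would be the cumulative hit rate of an ANN trained on the selected features, penalized by model complexity.

```python
import random

random.seed(0)
N_FEATURES, POP_SIZE, GENERATIONS = 12, 20, 30

def fitness(chrom):
    # Hypothetical placeholder: reward selecting the first three features,
    # penalize overall subset size (complexity).
    return sum(chrom[:3]) - 0.1 * sum(chrom)

def crossover(a, b):
    # One-point crossover of two parent bit strings.
    point = random.randrange(1, N_FEATURES)
    return a[:point] + b[point:]

def mutate(chrom, rate=0.05):
    # Flip each bit independently with a small probability.
    return [bit ^ 1 if random.random() < rate else bit for bit in chrom]

# Initial population of random feature subsets.
pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP_SIZE // 2]  # truncation selection with elitism
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    pop = parents + children

best = max(pop, key=fitness)
print("best subset:", best)
```

Real GA implementations vary in their selection scheme (e.g., roulette-wheel or tournament selection) and operator rates, but the population-based loop above is the common skeleton.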
GA/ANN model for customer targeting
Our predictive model of household buying behavior is a hybrid of the GA and ANN procedures. In our approach, the GA identifies relevant consumer descriptors that are used by the ANN to forecast consumer choice given a specific target point. Our final solution, Ensemble, consists of multiple local solutions each of which is an optimized solution at a specific target point. In this section, we present the structure of our GA/ANN model and the evaluation criteria used to select an appropriate
Application
The new GA/ANN methodology is applied to the prediction of households interested in purchasing an insurance policy for recreational vehicles. To benchmark the new procedure, we contrast the predictive performance of Ensemble to that of a single ANN with the complete set of features. We do not compare our approach to a standard logit regression model because a logit regression model is a special case of a single ANN with one hidden node.
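For intuition on this nesting claim, the sketch below scores a prospect with a single logistic unit, the degenerate network that a logit model corresponds to; the weights and feature values are hypothetical, not fitted coefficients from our data.

```python
import math

def logistic_score(features, weights, bias):
    # Weighted sum followed by the logistic (sigmoid) activation:
    # the estimated probability that this prospect responds.
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

weights = [0.8, -0.5, 1.2]   # hypothetical learned coefficients
bias = -0.3
print(round(logistic_score([1.0, 0.0, 1.0], weights, bias), 3))
```

An ANN with hidden nodes composes several such units, which is what lets it capture nonlinear relationships a plain logit model cannot.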
Related work
Various multivariate statistical and analytical techniques from the marketing community have been applied to the database marketing problem. Routine mailings to existing customers are typically based upon the RFM (recency, frequency, monetary) approach, which targets households using knowledge of the customer's purchase history [40]. However, the RFM model has major disadvantages, including its limited applicability to current customers only and redundancy because of inter-dependency among RFM
Conclusion
In this paper, we presented a novel approach for customer targeting in database marketing. We used a genetic algorithm to search for possible combinations of features and an artificial neural network to score customers. One of the clear strengths of the GA/ANN approach is its ability to construct predictive models that reflect the direct marketer's decision process. In particular, with information of campaign costs and profit per additional actual customer, we show that our system not only
Acknowledgements
The authors wish to thank Peter van der Putten and Maarten van Someren for making the CoIL data available for this paper. This work is partially supported by NSF grant IIS-99-96044.
References (43)
- et al., Competitive learning algorithms for vector quantization, Neural Networks (1990)
- et al., Comparative performance of the FSCL neural net and K-means algorithm for market segmentation, European Journal of Operational Research (1996)
- et al., Comparing performance of feedforward neural nets and K-means for market segmentation, European Journal of Operational Research (1999)
- et al., Applying latent trait analysis in the evaluation of prospects for cross-selling of financial services
- et al., Comparison of algorithms that select features for pattern classifiers, Pattern Recognition (2000)
- Advanced supervised learning in multi-layer perceptrons—from backpropagation to adaptive learning algorithms, International Journal of Computer Standards and Interfaces (1994)
- et al., Knowledge management and data mining for marketing, Decision Support Systems (2001)
- Predictive modeling
- Direct marketing response models using genetic algorithms
- Evolutionary algorithms in data mining: multi-objective performance modeling for direct marketing
- Mailing decisions in the catalog sales industry, Management Science
- Bagging predictors, Machine Learning
- Optimal selection for direct mail, Marketing Science
- Toward scalable learning with non-uniform class and cost distributions: a case study in credit card fraud detection
- Non-standard crossover for a standard representation—commonality-based feature subset selection
- Identifying prospective customers
- Feature selection in web applications using ROC inflections
- Mining the network value of customers
- Combining data mining and machine learning for effective user profiling
- Experiments with a new boosting algorithm
- Unsupervised optimal fuzzy clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence
Dr. YongSeog Kim is an Assistant Professor in the Business Information Systems Department at Utah State University. He received his MS degree in Computer Science and his PhD in Business Administration from the University of Iowa. Dr. Kim's primary research area is data mining, including feature selection, clustering, ensemble methods, streaming data analysis, and spatial and temporal data analysis. Recently, he has become interested in applying data mining algorithms to business problems in customer targeting, e-commerce, and computational finance and economics. His other research interests include digital government, electronic books, and computer–human interfaces.
Dr. Nick Street is an Associate Professor in the Management Sciences Department at the University of Iowa. He received a PhD in 1994 in Computer Sciences from the University of Wisconsin. His research interests are in machine learning and data mining, particularly the use of mathematical optimization in inductive learning techniques. His recent work has focused on dimensionality reduction (feature selection) in high-dimensional data for both classification and clustering, ensemble prediction methods for massive and streaming data sets, and learning shapes for image segmentation, classification, and retrieval. He has received an NSF CAREER award and an NIH INRSA postdoctoral fellowship.