An intelligent system for customer targeting: a data mining approach
Introduction
The ultimate goal of decision support systems is to provide managers with information that is useful for understanding various managerial aspects of a problem and for choosing the best solution among many alternatives. In this paper, we focus on a specific decision support system for market managers who want to develop and implement efficient marketing programs by fully utilizing a customer database. This is important because, with the growing interest in micro-marketing, many firms devote considerable resources to identifying households that may be open to targeted marketing messages. The task becomes even more critical with the easy availability of data warehouses combining demographic, psychographic, and behavioral information.
Both the marketing [8], [19], [33] and data-mining [4], [13], [27], [32] communities have presented various database-driven approaches to direct marketing. A good review of how data mining can be integrated into knowledge-based marketing can be found in [41]. Traditionally, the optimal selection of mailing targets has been considered one of the most important factors for direct marketing to be successful. Thus, many models aim to identify as many customers as possible who will respond to a specific solicitation campaign letter, based on each customer's estimated probability of responding to the marketing program.
This problem becomes more complicated when the interpretability of the model is important. For example, in database marketing applications, it is critical for managers to understand the key drivers of consumer response. A predictive model that is essentially a “black box” is not useful for developing comprehensive marketing strategies. At the same time, a rule-based system that consists of too many if-then statements can make it difficult for users to identify the key drivers. Note that these two principal goals, model interpretability and predictive accuracy, can be in conflict.
Another important but often neglected aspect of models is the decision support function that helps market managers make strategic marketing plans. For example, market managers want to know how many customers should be targeted to maximize the expected net profit or increase market share while at least recovering the operational costs of a specific campaign. In order to attain this goal, market managers need a sensitivity analysis that shows how the value of the objective function (e.g., the expected net profit from the campaign) changes as campaign parameters vary (e.g., the campaign scope measured by the number of customers targeted).
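To make this sensitivity analysis concrete, the sketch below computes the expected net profit for several candidate campaign scopes. The response rates, revenue margin, and mailing cost are hypothetical placeholders, not values from our study.

```python
# Sketch of the sensitivity analysis described above: expected net profit
# as a function of campaign scope. All numbers are hypothetical.

def expected_net_profit(n_targeted, hit_rate, revenue_per_customer, cost_per_mail):
    """Expected profit when mailing n_targeted prospects, of whom a fraction
    hit_rate are expected to respond."""
    return n_targeted * hit_rate * revenue_per_customer - n_targeted * cost_per_mail

# Hypothetical model output: the response rate declines as the scope widens,
# because the best-ranked prospects are mailed first.
scopes = [1000, 2000, 5000, 10000]      # number of customers targeted
hit_rates = [0.20, 0.15, 0.08, 0.04]    # estimated response rate at each scope

best = max(
    zip(scopes, hit_rates),
    key=lambda s: expected_net_profit(s[0], s[1],
                                      revenue_per_customer=50.0,
                                      cost_per_mail=1.0),
)
print("Most profitable scope:", best[0])  # → 5000
```

Varying `revenue_per_customer` or `cost_per_mail` and re-running the comparison is exactly the kind of what-if question the sensitivity analysis is meant to answer.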
In this paper, we propose a data-mining approach to building predictive models that satisfies these requirements efficiently and effectively. First, we show how to build predictive models that combine artificial neural networks (ANNs) [37] with genetic algorithms (GAs) [18] to help market managers identify prospective households. ANNs have been used in other marketing applications such as customer clustering [16], [1] and market segmentation [21], [2]. We use ANNs to identify optimal campaign targets based on each individual's likelihood of responding positively to the campaign message. This can be done by learning linear or possibly nonlinear relationships between the given input variables and the response indicator. We go one step beyond this traditional approach: because we are also interested in isolating key determinants of customer response, we use GAs to select different subsets of variables and train different ANNs on only those selected variables.
GAs have become a very powerful tool in finance, economics, accounting, operations research, and other fields as an alternative to hill-climbing search algorithms. This is mainly because such heuristic algorithms can become trapped in a local optimum, while GAs are more likely to avoid local optima by evaluating multiple solutions simultaneously and adjusting their search bias toward more promising areas. Further, GAs are known to outperform other search algorithms on data sets with high dimensionality [28].
Second, we demonstrate through a sensitivity analysis that our approach can be used to determine the scope of a marketing campaign given the marginal revenue per customer and the marginal cost per campaign mailing. This can be a very useful tool for market managers who want to assess the impact of factors such as mailing cost and a limited campaign budget on the outcomes of a marketing campaign.
Finally, we enhance the interpretability of our model by reducing the dimensionality of the data sets. Traditionally, feature extraction algorithms such as principal component analysis (PCA) have often been used for this purpose. However, PCA is not appropriate when the ultimate goal is not only to reduce dimensionality but also to obtain a highly accurate predictive model. This is because PCA does not take into account the relationship between the dependent variable and the input variables during data reduction. Further, the principal components produced by PCA can be difficult to interpret when the space of input variables is large.
Data reduction is performed via feature selection in our approach. Feature selection is defined as the process of choosing a subset of the original predictive variables by eliminating features that are either redundant or possess little predictive information. If we extract as much information as possible from a given data set while using the smallest number of features, we can not only save a great amount of computing time and cost, but also build a model that generalizes better to households not in the test mailing. Feature selection can also significantly improve the comprehensibility of the resulting classifier models. Even a complicated model, such as a neural network, can be more easily understood if constructed from only a few variables.
Our methodology exploits the desirable characteristics of GAs and ANNs to achieve the two principal goals of household targeting at a specific target point: model interpretability and predictive accuracy. A standard GA is used to search through the possible combinations of features. The input features selected by the GA are used to train ANNs. Each trained ANN is tested on an evaluation set, and the proposed model is evaluated in terms of two quality measurements: cumulative hit rate (which is maximized) and complexity (which is minimized). We define the cumulative hit rate as the ratio of the number of actual customers identified to the total number of actual customers in a data set. This process is repeated many times as the algorithm searches for a desirable balance between predictive accuracy and model complexity. The result is a highly accurate predictive model that uses only a subset of the original features, thus simplifying the model and reducing the risk of overfitting. It also provides useful information for reducing future data collection costs.
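The cumulative hit rate defined above can be computed directly from model scores and actual responses. The following is a minimal sketch with hypothetical scores and labels; the function name `cumulative_hit_rate` is ours, not part of any library.

```python
# Cumulative hit rate as defined above: of all actual responders in the data,
# what fraction falls among the top-scoring i% of customers.

def cumulative_hit_rate(scores, labels, target_fraction):
    """scores: estimated response probabilities; labels: 1 = actual responder.
    Returns (# responders among top target_fraction) / (# responders overall)."""
    n_target = int(round(len(scores) * target_fraction))
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits_in_target = sum(label for _, label in ranked[:n_target])
    total_hits = sum(labels)
    return hits_in_target / total_hits

# Hypothetical model output for ten customers:
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1,   1,   0,   1,   0,   0,   1,   0,   0,   0]
print(cumulative_hit_rate(scores, labels, 0.4))  # → 0.75 (3 of 4 responders in top 40%)
```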
In order to help market managers determine the campaign scope, we run the GA/ANN model repeatedly over different target points to obtain local solutions. A local solution is a predictive feature subset with the highest fitness value at a specific target point. At a target point i, where 0≤i≤100, our GA/ANN model searches for a model that is optimal when the best i% of customers in a new data set is targeted, based on the estimated probability of responding to the marketing campaign. Once we obtain local solutions, we combine them into an Ensemble, a global solution that is used to choose the best target point. Note that our Ensemble model is different from popular ensemble algorithms such as Bagging [7] and Boosting [15], which combine the predictions of multiple models by voting. Each local solution in our Ensemble model scores and selects prospects at a specific target point independently of the other local solutions. Finally, in order to present the performance of the local solutions and the Ensemble, we use a lift curve that shows the relationship between target points and the corresponding cumulative hit rates.
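Under these definitions, each point on the lift curve compares a local solution's cumulative hit rate against random mailing, which on average captures i% of responders when i% of customers are targeted. A small sketch with hypothetical hit rates:

```python
# Hypothetical mapping: target point (% of customers mailed) -> cumulative hit
# rate achieved by the local solution optimized for that point. These numbers
# are placeholders, not results from our experiments.
local_solutions = {10: 0.32, 20: 0.55, 30: 0.68, 40: 0.78, 50: 0.85}

def lift(target_point, hit_rate):
    """Lift over random mailing: randomly targeting i% of customers captures
    i% of responders on average, so lift = hit_rate / (i / 100)."""
    return hit_rate / (target_point / 100.0)

for point, hr in sorted(local_solutions.items()):
    print(f"target {point:3d}% -> cumulative hit rate {hr:.2f}, lift {lift(point, hr):.2f}")
```

Plotting cumulative hit rate against target point for all local solutions yields the lift curve used to compare models and pick the target point.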
This paper is organized as follows. In Section 2, we explain GAs for feature selection in detail, and motivate the use of a GA to search for the global optimum. In Section 3, we describe the structure of the GA/ANN model, and review the feature subset selection procedure. In Section 4, we present experimental results of both the GA/ANN model and a single ANN with the complete set of features. In particular, a global solution is constructed by incorporating the local solutions obtained over various target points. We show that such a model can be used to help market managers determine the best target point where the expected profit is maximized. In Section 5, we review related work for direct marketing from both the marketing and data mining communities. Section 6 concludes the paper and provides suggestions for future research directions.
Section snippets
Genetic algorithms for feature selection
A genetic algorithm (GA) is a parallel search procedure that simulates the evolutionary process by applying genetic operators. We provide a simple introduction to a standard GA in this section. More extensive discussions on GAs can be found in [18].
Since [18], various types of GAs have been used for many different applications. However, many variants still share common characteristics. Typically, a GA starts with and maintains a population of chromosomes that correspond to solutions to the
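As a concrete illustration of the standard GA described here, the sketch below evolves bit-string chromosomes (1 = feature selected) under truncation selection, one-point crossover, and bit-flip mutation. The fitness function is a hypothetical placeholder; in our model it would be the cumulative hit rate of an ANN trained on the selected features, penalized by model complexity.

```python
import random

random.seed(0)
N_FEATURES, POP_SIZE, GENERATIONS = 12, 20, 30

def fitness(chrom):
    # Hypothetical placeholder: reward selecting the first three features,
    # penalize overall subset size (complexity).
    return sum(chrom[:3]) - 0.1 * sum(chrom)

def crossover(a, b):
    # One-point crossover of two parent bit strings.
    point = random.randrange(1, N_FEATURES)
    return a[:point] + b[point:]

def mutate(chrom, rate=0.05):
    # Flip each bit independently with a small probability.
    return [bit ^ 1 if random.random() < rate else bit for bit in chrom]

# Initial population of random feature subsets.
pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP_SIZE // 2]  # truncation selection with elitism
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    pop = parents + children

best = max(pop, key=fitness)
print("best subset:", best)
```

Real GA implementations vary in their selection scheme (e.g., roulette-wheel or tournament selection) and operator rates, but the population-based loop above is the common skeleton.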
GA/ANN model for customer targeting
Our predictive model of household buying behavior is a hybrid of the GA and ANN procedures. In our approach, the GA identifies relevant consumer descriptors that are used by the ANN to forecast consumer choice given a specific target point. Our final solution, Ensemble, consists of multiple local solutions each of which is an optimized solution at a specific target point. In this section, we present the structure of our GA/ANN model and the evaluation criteria used to select an appropriate
Application
The new GA/ANN methodology is applied to the prediction of households interested in purchasing an insurance policy for recreational vehicles. To benchmark the new procedure, we contrast the predictive performance of Ensemble to that of a single ANN with the complete set of features. We do not compare our approach to a standard logit regression model because a logit regression model is a special case of a single ANN with one hidden node.
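For intuition on this nesting claim, the sketch below scores a prospect with a single logistic unit, the degenerate network that a logit model corresponds to; the weights and feature values are hypothetical, not fitted coefficients from our data.

```python
import math

def logistic_score(features, weights, bias):
    # Weighted sum followed by the logistic (sigmoid) activation:
    # the estimated probability that this prospect responds.
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

weights = [0.8, -0.5, 1.2]   # hypothetical learned coefficients
bias = -0.3
print(round(logistic_score([1.0, 0.0, 1.0], weights, bias), 3))
```

An ANN with hidden nodes composes several such units, which is what lets it capture nonlinear relationships a plain logit model cannot.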
Related work
Various multivariate statistical and analytical techniques from the marketing community have been applied to the database marketing problem. Routine mailings to existing customers are typically based upon the RFM (recency, frequency, monetary) approach, which targets households using knowledge of the customer's purchase history [40]. However, the RFM model has major disadvantages, including its limited applicability to current customers only and redundancy because of inter-dependency among RFM
Conclusion
In this paper, we presented a novel approach for customer targeting in database marketing. We used a genetic algorithm to search for possible combinations of features and an artificial neural network to score customers. One of the clear strengths of the GA/ANN approach is its ability to construct predictive models that reflect the direct marketer's decision process. In particular, with information of campaign costs and profit per additional actual customer, we show that our system not only
Acknowledgements
The authors wish to thank Peter van der Putten and Maarten van Someren for making the CoIL data available for this paper. This work is partially supported by NSF grant IIS-99-96044.
References (43)
- et al., Competitive learning algorithms for vector quantization, Neural Networks (1990)
- et al., Comparative performance of the FSCL neural net and K-means algorithm for market segmentation, European Journal of Operational Research (1996)
- et al., Comparing performance of feedforward neural nets and K-means for market segmentation, European Journal of Operational Research (1999)
- et al., Applying latent trait analysis in the evaluation of prospects for cross-selling of financial services
- et al., Comparison of algorithms that select features for pattern classifiers, Pattern Recognition (2000)
- Advanced supervised learning in multi-layer perceptrons—from backpropagation to adaptive learning algorithms, International Journal of Computer Standards and Interfaces (1994)
- et al., Knowledge management and data mining for marketing, Decision Support Systems (2001)
- Predictive modeling
- Direct marketing response models using genetic algorithms
- Evolutionary algorithms in data mining: multi-objective performance modeling for direct marketing
- Mailing decisions in the catalog sales industry, Management Science
- Bagging predictors, Machine Learning
- Optimal selection for direct mail, Marketing Science
- Toward scalable learning with non-uniform class and cost distributions: a case study in credit card fraud detection
- Non-standard crossover for a standard representation—commonality-based feature subset selection
- Identifying prospective customers
- Feature selection in web applications using ROC inflections
- Mining the network value of customers
- Combining data mining and machine learning for effective user profiling
- Experiments with a new boosting algorithm
- Unsupervised optimal fuzzy clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence
Dr. YongSeog Kim is an Assistant Professor in the Business Information Systems Department at Utah State University. He received his MS degree in Computer Science and his PhD in Business Administration from the University of Iowa. Dr. Kim's primary research area is data mining, including feature selection, clustering, ensemble methods, streaming data analysis, and spatial and temporal data analysis. Recently, he has become interested in applying data mining algorithms to business problems in customer targeting, e-commerce, and computational finance and economics. His other research interests include digital government, electronic books, and computer–human interfaces.
Dr. Nick Street is an Associate Professor in the Management Sciences Department at the University of Iowa. He received a PhD in 1994 in Computer Sciences from the University of Wisconsin. His research interests are in machine learning and data mining, particularly the use of mathematical optimization in inductive learning techniques. His recent work has focused on dimensionality reduction (feature selection) in high-dimensional data for both classification and clustering, ensemble prediction methods for massive and streaming data sets, and learning shapes for image segmentation, classification, and retrieval. He has received an NSF CAREER award and an NIH INRSA postdoctoral fellowship.