Abstract
This paper presents a novel embedded feature selection approach for Support Vector Machines (SVM) in a choice-based conjoint context. We extend the L1-SVM formulation and adapt the RFE-SVM algorithm to conjoint analysis to encourage sparsity in consumer preferences. This sparsity reflects the fact that consumers are selective about the attributes they consider when evaluating alternatives in choice tasks. Given the limited individual-level data in choice-based conjoint, we control for heterogeneity by pooling information across consumers and shrinking the individual weights of the relevant attributes towards a population mean. An extensive simulation study shows that the proposed approach can capture the sparseness implied by irrelevant attributes. We also illustrate the characteristics and use of our approach on two real-world choice-based conjoint data sets. The results show that the proposed method has better predictive accuracy than competing approaches and that it provides additional information at the individual level. Implications for product design decisions are discussed.
References
Abernethy J, Evgeniou T, Toubia O, Vert J (2008) Eliciting consumer preferences using robust adaptive choice questionnaires. IEEE Trans Knowl Data Eng 20(2):145–155
Argyriou A, Evgeniou T, Pontil M (2008) Convex multi-task feature learning. Mach Learn 73(3):243–272
Arora N, Huber J (2001) Improving parameter estimates and model prediction by aggregate customization in choice experiments. J Consum Res 28:273–283
Bi J, Bennett K, Embrechts M, Breneman C, Song M (2003) Dimensionality reduction via sparse support vector machines. J Mach Learn Res 3:1229–1243
Blum A, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97(1–2):245–271
Bradley P, Mangasarian O (1998) Feature selection via concave minimization and support vector machines. In: Shavlik J (ed) Proceedings of the Fifteenth International Conference on Machine Learning (ICML’98), Morgan Kaufmann, San Francisco, California, pp 82–90
Cerrada M, Sánchez R V, Pacheco F, Cabrera D, Zurita G, Li C (2016) Hierarchical feature selection based on relative dependency for gear fault diagnosis. Appl Intell 44(3):687–703
Chapelle O, Harchaoui Z (2005) A machine learning approach to conjoint analysis. Adv Neural Inf Proces Syst 17:257–264
Cui D, Curry D (2005) Prediction in marketing using the support vector machine. Mark Sci 24(4):595–615
Djuric N, Lan L, Vucetic S, Wang Z (2013) BudgetedSVM: A toolbox for scalable SVM approximations. J Mach Learn Res 14:3813–3817
Evgeniou T, Boussios C, Zacharia G (2005) Generalized robust conjoint estimation. Mark Sci 24(3):415–429
Evgeniou T, Pontil M, Toubia O (2007) A convex optimization approach to modeling heterogeneity in conjoint estimation. Mark Sci 26(6):805–818
Gao S, Ye Q, Ye N (2011) 1-norm least squares twin support vector machines. Neurocomputing 74(17):3590–3597
Gelman A, Pardoe I (2006) Bayesian measures of explained variance and pooling in multilevel (hierarchical) models. Technometrics 48(2):241–251
Green P E, Rao V R (1971) Conjoint measurement for quantifying judgmental data. J Mark Res 8:355–363
Green P E, Krieger A M, Wind Y (2001) Thirty years of conjoint analysis: Reflections and prospects. Interfaces 31(3):S56–S73
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
Guyon I, Gunn S, Nikravesh M, Zadeh L A (2006) Feature extraction: Foundations and applications. Springer, Berlin
Hensher D A, Rose J M, Greene W H (2012) Inferring attribute non-attendance from stated choice data: implications for willingness to pay estimates and a warning for stated choice experiment design. Transportation 39 (2):235–245
Hsu C W, Chang C C, Lin C J (2010) A practical guide to support vector classification. Technical report, Department of Computer Science, National Taiwan University
Le Thi H, Pham Dinh T, Thiao M (2016) Efficient approaches for l2-l0 regularization and applications to feature selection in SVM. Appl Intell 45(2):549–565
Maldonado S, Weber R, Basak J (2011) Kernel-penalized SVM for feature selection. Inf Sci 181(1):115–128
Maldonado S, Flores A, Verbraken T, Baesens B, Weber R (2015a) Profit-based feature selection using support vector machines - general framework and an application for customer churn prediction. Appl Soft Comput 35:740–748
Maldonado S, Montoya R, Weber R (2015b) Advanced conjoint analysis using feature selection via support vector machines. Eur J Oper Res 241(2):564–574
Orme B (2005) The CBC/HB system for hierarchical Bayes estimation. Technical paper, Sawtooth Software
Pan X, Xu Y (2016) Two effective sample selection methods for support vector machine. J Intell Fuzzy Syst 30:659–670
Rao V R (2014) Applied conjoint analysis. Springer
Rossi P E, Allenby G M, McCulloch R (2005) Bayesian statistics and marketing. Wiley, New York
Shen Q, Jensen R (2008) Approximation-based feature selection and application for algae population estimation. Appl Intell 28(2):167–181
Toubia O, Evgeniou T, Hauser J (2007a) Optimization-based and machine-learning methods for conjoint analysis: Estimation and question design. In: Conjoint measurement: Methods and applications. Springer, p 231
Toubia O, Hauser J, Garcia R (2007b) Probabilistic polyhedral methods for adaptive choice-based conjoint analysis. Mark Sci 26(5):596–610
Tsai H C, Hsiao S W (2004) Evaluation of alternatives for product customization using fuzzy logic. Inf Sci 158:233–262
Vapnik V, Chervonenkis A (1991) The necessary and sufficient conditions for consistency in the empirical risk minimization method. Pattern Recognit Image Anal 1(3):283–305
Weston J, Elisseeff A, BakIr G, Sinz F (2005) The spider machine learning toolbox. Software available at http://www.kyb.tuebingen.mpg.de/bs/people/spider/
Zhu J, Rosset S, Hastie T, Tibshirani R (2003) 1-norm support vector machines. In: Neural Information Processing Systems, MIT Press, pp 16–23
Acknowledgments
The authors thank Olivier Toubia and Bryan Orme for providing the data for the two empirical applications. The first author was supported by FONDECYT projects 1140831 and 1160738. The second author was supported by FONDECYT project 1151395. The third author was supported by FONDECYT project 1160894 and CONICYT Anillo ACT1106. This research was partially funded by the Complex Engineering Systems Institute, ISCI (ICM-FIC: P05-004-F, CONICYT: FB0816).
Appendices
Appendix A: HB mixed logit estimation
1.1 Prior and full conditional distributions
We denote by 𝜃_i the set of random-effect parameters for individual i, i = 1, …, N.
1.2 Priors
Random-effect parameters: 𝜃_i ~ N(μ, Σ), i = 1, …, N.
Population-level parameters: μ ~ N(μ_0, V_0) and Σ ~ IW(df_0, S_0).
1.3 Likelihood
P(data | {𝜃_i}) = ∏_{i=1}^{N} P(data_i | 𝜃_i),
where each term P(data_i | 𝜃_i) corresponds to the Multinomial Logit model.
1.4 Full conditionals
μ | {𝜃_i}, Σ ~ N(m, V),
Σ | {𝜃_i}, μ ~ IW(df_0 + N, S_0 + ∑_{i=1}^{N} (𝜃_i − μ)(𝜃_i − μ)′),
𝜃_i | μ, Σ ∝ P(data_i | 𝜃_i) · N(𝜃_i | μ, Σ),
where V = (V_0^{−1} + N Σ^{−1})^{−1}, m = V (V_0^{−1} μ_0 + N Σ^{−1} 𝜃̄), and 𝜃̄ denotes the average of the 𝜃_i.
The MCMC procedure generates a sequence of draws from the posterior distribution of the model's parameters. Since the full conditionals for 𝜃_i do not have a closed form, the Metropolis-Hastings (M-H) algorithm is used to draw the samples. In particular, we use a Gaussian random-walk M-H where the proposal vector of parameters φ^(t) for 𝜃_i at iteration t is drawn from N(φ^(t−1), σ²Δ) and accepted using the M-H acceptance ratio. The tuning parameters σ and Δ are chosen adaptively to yield an acceptance rate of approximately 20%.
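As a minimal sketch of this step (the function names, data shapes, and synthetic design below are our own illustration under stated assumptions, not the paper's implementation):

```python
import numpy as np

def mnl_loglik(theta, X, y):
    """Multinomial Logit log-likelihood for one respondent.

    X: (tasks, alternatives, features) design array; y: chosen alternative per task.
    """
    u = X @ theta                         # utilities, shape (tasks, alternatives)
    u = u - u.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    logp = u - np.log(np.exp(u).sum(axis=1, keepdims=True))
    return logp[np.arange(len(y)), y].sum()

def rw_metropolis_step(theta, X, y, mu, Sigma_inv, sigma, Delta, rng):
    """One Gaussian random-walk M-H update of theta_i.

    Target: MNL likelihood times the N(mu, Sigma) population prior
    (Sigma_inv is the prior precision). Proposal: N(theta, sigma^2 * Delta).
    """
    def log_post(t):
        d = t - mu
        return mnl_loglik(t, X, y) - 0.5 * d @ Sigma_inv @ d

    proposal = rng.multivariate_normal(theta, sigma**2 * Delta)
    if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
        return proposal, True   # accept
    return theta, False         # reject

# Toy example: 5 choice tasks, 3 alternatives, 2 attribute levels.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3, 2))
y = np.array([0, 1, 2, 0, 1])
theta, accepted = rw_metropolis_step(np.zeros(2), X, y, mu=np.zeros(2),
                                     Sigma_inv=np.eye(2), sigma=0.3,
                                     Delta=np.eye(2), rng=rng)
```

In practice σ (and Δ) would be adapted across iterations until the running acceptance fraction is near the 20% target mentioned above.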
We use the following uninformative prior hyperparameters: μ_0 = 0, V_0 = 10³ I_{N_𝜃 × N_𝜃}, df_0 = N_𝜃 + 5, and S_0 = df_0 C, where N_𝜃 is the number of random-effect parameters and C is an N_𝜃 × N_𝜃 matrix with 2 on the diagonal and 1 off the diagonal for the levels of each attribute. We assume that the parameters are a priori uncorrelated across attributes (see e.g. [25]).
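A sketch of how these hyperparameters and the conjugate draws for μ and Σ could be set up (the attribute level counts and function names are assumptions made for illustration):

```python
import numpy as np
from scipy.stats import invwishart

def build_C(levels_per_attribute):
    """Prior scale pattern: 2 on the diagonal, 1 off-diagonal within each
    attribute's block of levels, 0 across attributes (a priori uncorrelated)."""
    n = sum(levels_per_attribute)
    C = np.zeros((n, n))
    start = 0
    for L in levels_per_attribute:
        C[start:start + L, start:start + L] = 1.0  # within-attribute block
        start += L
    np.fill_diagonal(C, 2.0)
    return C

def draw_mu(thetas, Sigma, mu0, V0_inv, rng):
    """Gaussian full conditional for mu given Sigma and the theta_i."""
    N = thetas.shape[0]
    Sigma_inv = np.linalg.inv(Sigma)
    V = np.linalg.inv(V0_inv + N * Sigma_inv)
    m = V @ (V0_inv @ mu0 + N * Sigma_inv @ thetas.mean(axis=0))
    return rng.multivariate_normal(m, V)

def draw_Sigma(thetas, mu, df0, S0, rng):
    """Inverse-Wishart full conditional for Sigma given mu and the theta_i."""
    dev = thetas - mu
    return invwishart.rvs(df=df0 + thetas.shape[0],
                          scale=S0 + dev.T @ dev, random_state=rng)

levels = [3, 4, 2]                  # assumed attribute level counts
n_theta = sum(levels)
mu0 = np.zeros(n_theta)
V0_inv = 1e-3 * np.eye(n_theta)     # V0 = 10^3 I, so V0^{-1} = 10^{-3} I
df0 = n_theta + 5
S0 = df0 * build_C(levels)
```

One Gibbs sweep alternates `draw_mu` and `draw_Sigma` with the M-H updates of the individual 𝜃_i.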
Appendix B: Parameter calibration
In the proposed models, three parameters need to be calibrated: the regularization parameter C, the threshold 𝜖, and the shrinkage parameter 𝜃. We analyze how the performance of each model varies as a function of these parameters. For illustration, we show the procedure for the Camera data set; similar analyses were conducted for the other data sets. Our goal was to assess whether the results are stable across different values of these parameters, in which case a less rigorous validation strategy can be used. In contrast, high variance in performance requires an exhaustive model selection procedure, such as LOOCV, to find the best combination of parameters.
Figure 1 depicts the LOOCV hit rates as a function of C, 𝜖, and 𝜃 for the proposed feature selection approach.
Figure 1 reveals the influence of the parameters C, 𝜖, and 𝜃 on predictive performance (leave-one-out validation hit rate). Although these parameters have an important influence on the final outcome of the proposed method, results are relatively stable for small values of 𝜃 and 𝜖 and for values of C around one.
We therefore highly recommend performing an adequate grid search, varying C, 𝜖, and 𝜃 over the suggested ranges. Additionally, the fact that the optimal values of these parameters are always above zero confirms the importance of feature selection and shrinkage for controlling potential overfitting when the number of respondents is relatively small.
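Such a grid search can be sketched generically as follows (the evaluation function standing in for the LOOCV hit rate below is a hypothetical toy, not the paper's model):

```python
import itertools

def grid_search(eval_fn, C_grid, eps_grid, theta_grid):
    """Exhaustive search over (C, epsilon, theta); returns the best triple
    and its validation score (e.g. a leave-one-out hit rate)."""
    best, best_score = None, float("-inf")
    for C, eps, th in itertools.product(C_grid, eps_grid, theta_grid):
        score = eval_fn(C, eps, th)
        if score > best_score:
            best, best_score = (C, eps, th), score
    return best, best_score

# Toy stand-in for the LOOCV hit rate: peaks at C = 1, eps = 0, theta = 0.
toy_hit_rate = lambda C, eps, th: 1.0 - (C - 1.0) ** 2 - eps - th

best, score = grid_search(toy_hit_rate,
                          C_grid=[0.1, 1.0, 10.0],
                          eps_grid=[0.0, 0.05],
                          theta_grid=[0.0, 0.5])
```

In the actual procedure, `eval_fn` would train the proposed model on all but one respondent and average hit rates over the held-out choices.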
Cite this article
Maldonado, S., Montoya, R. & López, J. Embedded heterogeneous feature selection for conjoint analysis: A SVM approach using L1 penalty. Appl Intell 46, 775–787 (2017). https://doi.org/10.1007/s10489-016-0852-5