
Embedded heterogeneous feature selection for conjoint analysis: A SVM approach using L1 penalty


Abstract

This paper presents a novel embedded feature selection approach for Support Vector Machines (SVM) in a choice-based conjoint context. We extend the L1-SVM formulation and adapt the RFE-SVM algorithm to conjoint analysis to encourage sparsity in consumer preferences. This sparsity can be attributed to consumers being selective about the attributes they consider when evaluating alternatives in choice tasks. Given the limited individual-level data in choice-based conjoint, we control for heterogeneity by pooling information across consumers and shrinking the individual weights of the relevant attributes towards a population mean. We tested our approach through an extensive simulation study, which shows that it can capture the sparseness implied by irrelevant attributes. We also illustrate the characteristics and use of our approach on two real-world choice-based conjoint data sets. The results show that the proposed method has better predictive accuracy than competing approaches and provides additional information at the individual level. Implications for product design decisions are discussed.


References

  1. Abernethy J, Evgeniou T, Toubia O, Vert J (2008) Eliciting consumer preferences using robust adaptive choice questionnaires. IEEE Trans Knowl Data Eng 20(2):145–155

  2. Argyriou A, Evgeniou T, Pontil M (2008) Convex multi-task feature learning. Mach Learn 73(3):243–272

  3. Arora N, Huber J (2001) Improving parameter estimates and model prediction by aggregate customization in choice experiments. J Consum Res 28:273–283

  4. Bi J, Bennett K, Embrechts M, Breneman C, Song M (2003) Dimensionality reduction via sparse support vector machines. J Mach Learn Res 3:1229–1243

  5. Blum A, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97(1-2):245–271

  6. Bradley P, Mangasarian O (1998) Feature selection via concave minimization and support vector machines. In: Shavlik J (ed) Proceedings of the Fifteenth International Conference on Machine Learning (ICML'98), Morgan Kaufmann, San Francisco, California, pp 82–90

  7. Cerrada M, Sánchez RV, Pacheco F, Cabrera D, Zurita G, Li C (2016) Hierarchical feature selection based on relative dependency for gear fault diagnosis. Appl Intell 44(3):687–703

  8. Chapelle O, Harchaoui Z (2005) A machine learning approach to conjoint analysis. Adv Neural Inf Proces Syst 17:257–264

  9. Cui D, Curry D (2005) Prediction in marketing using the support vector machine. Mark Sci 24(4):595–615

  10. Djuric N, Lan L, Vucetic S, Wang Z (2013) BudgetedSVM: a toolbox for scalable SVM approximations. J Mach Learn Res 14:3813–3817

  11. Evgeniou T, Boussios C, Zacharia G (2005) Generalized robust conjoint estimation. Mark Sci 24(3):415–429

  12. Evgeniou T, Pontil M, Toubia O (2007) A convex optimization approach to modeling heterogeneity in conjoint estimation. Mark Sci 26(6):805–818

  13. Gao S, Ye Q, Ye N (2011) 1-norm least squares twin support vector machines. Neurocomputing 74(17):3590–3597

  14. Gelman A, Pardoe I (2006) Bayesian measures of explained variance and pooling in multilevel (hierarchical) models. Technometrics 48(2):241–251

  15. Green PE, Rao VR (1971) Conjoint measurement for quantifying judgmental data. J Mark Res 8:355–363

  16. Green PE, Krieger AM, Wind Y (2001) Thirty years of conjoint analysis: reflections and prospects. Interfaces 31(3):S56–S73

  17. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182

  18. Guyon I, Gunn S, Nikravesh M, Zadeh LA (2006) Feature extraction: foundations and applications. Springer, Berlin

  19. Hensher DA, Rose JM, Greene WH (2012) Inferring attribute non-attendance from stated choice data: implications for willingness to pay estimates and a warning for stated choice experiment design. Transportation 39(2):235–245

  20. Hsu CW, Chang CC, Lin C (2010) A practical guide to support vector classification. Technical report, National Taiwan University

  21. Le Thi H, Pham Dinh T, Thiao M (2016) Efficient approaches for ℓ2-ℓ0 regularization and applications to feature selection in SVM. Appl Intell 45(2):549–565

  22. Maldonado S, Weber R, Basak J (2011) Kernel-penalized SVM for feature selection. Inf Sci 181(1):115–128

  23. Maldonado S, Flores A, Verbraken T, Baesens B, Weber R (2015) Profit-based feature selection using support vector machines: general framework and an application for customer churn prediction. Appl Soft Comput 35:740–748

  24. Maldonado S, Montoya R, Weber R (2015) Advanced conjoint analysis using feature selection via support vector machines. Eur J Oper Res 241(2):564–574

  25. Orme B (2005) The CBC/HB system for hierarchical Bayes estimation. Sawtooth Software technical paper

  26. Pan X, Xu Y (2016) Two effective sample selection methods for support vector machine. J Intell Fuzzy Syst 30:659–670

  27. Rao VR (2014) Applied conjoint analysis. Springer

  28. Rossi PE, Allenby GM, McCulloch R (2005) Bayesian statistics and marketing. Wiley, New York

  29. Shen Q, Jensen R (2008) Approximation-based feature selection and application for algae population estimation. Appl Intell 28(2):167–181

  30. Toubia O, Evgeniou T, Hauser J (2007) Optimization-based and machine-learning methods for conjoint analysis: estimation and question design. In: Conjoint Measurement: Methods and Applications. Springer, p 231

  31. Toubia O, Hauser J, Garcia R (2007) Probabilistic polyhedral methods for adaptive choice-based conjoint analysis. Mark Sci 26(5):596–610

  32. Tsai HC, Hsiao SW (2004) Evaluation of alternatives for product customization using fuzzy logic. Inf Sci 158:233–262

  33. Vapnik V, Chervonenkis A (1991) The necessary and sufficient conditions for consistency in the empirical risk minimization method. Pattern Recognit Image Anal 1(3):283–305

  34. Weston J, Elisseeff A, BakIr G, Sinz F (2005) The Spider machine learning toolbox. Software available at http://www.kyb.tuebingen.mpg.de/bs/people/spider/

  35. Zhu J, Rosset S, Hastie T, Tibshirani R (2003) 1-norm support vector machines. In: Neural Information Processing Systems, MIT Press, pp 16–23


Acknowledgments

The authors thank Olivier Toubia and Bryan Orme for providing the data for the two empirical applications. The first author was supported by FONDECYT projects 1140831 and 1160738. The second author was supported by FONDECYT project 1151395. The third author was supported by FONDECYT project 1160894 and CONICYT Anillo ACT1106. This research was partially funded by the Complex Engineering Systems Institute, ISCI (ICM-FIC: P05-004-F, CONICYT: FB0816).

Author information


Corresponding author

Correspondence to Sebastián Maldonado.

Appendices

Appendix A: HB mixed logit estimation

1.1 Prior and full conditional distributions

We denote by θ_i the set of random-effect parameters.

1.2 Priors

Random-effect parameters θ_i

$$\begin{array}{rcl} \boldsymbol{\theta}_{i} &\sim& N(\boldsymbol{\mu}_{\theta},\mathbf{\Sigma}_{\theta}) \;\Rightarrow\; P(\boldsymbol{\theta}_{i}) \propto \exp\left(-\frac{1}{2}(\boldsymbol{\theta}_{i}-\boldsymbol{\mu}_{\theta})^{\top}\mathbf{\Sigma}_{\theta}^{-1}(\boldsymbol{\theta}_{i}-\boldsymbol{\mu}_{\theta})\right) \\ \boldsymbol{\mu}_{\theta} &\sim& N(\boldsymbol{\mu}_{0},\mathbf{V}_{0}) \;\Rightarrow\; P(\boldsymbol{\mu}_{\theta}) \propto \exp\left(-\frac{1}{2}(\boldsymbol{\mu}_{\theta}-\boldsymbol{\mu}_{0})^{\top}\mathbf{V}_{0}^{-1}(\boldsymbol{\mu}_{\theta}-\boldsymbol{\mu}_{0})\right) \\ \mathbf{\Sigma}_{\theta}^{-1} &\sim& W(df_{0},\mathbf{S}_{0}) \end{array}$$

1.3 Likelihood

$$L(\text{data},\{\boldsymbol{\theta}_{i}\},\boldsymbol{\mu}_{\theta},\mathbf{\Sigma}_{\theta}) = P(\text{data}|\{\boldsymbol{\theta}_{i}\})\, P(\{\boldsymbol{\theta}_{i}\}|\boldsymbol{\mu}_{\theta},\mathbf{\Sigma}_{\theta})\, P(\boldsymbol{\mu}_{\theta})\, P(\mathbf{\Sigma}_{\theta}),$$

where P(data | {θ_i}) corresponds to the multinomial logit model.
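
For concreteness, the following minimal Python sketch evaluates the multinomial logit log-likelihood P(data_i | θ_i) for a single respondent. The function name, the data layout (one attribute matrix per choice task), and the variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mnl_log_likelihood(theta_i, X_tasks, y_choices):
    """Multinomial logit log-likelihood for one respondent.

    theta_i   : (p,) vector of individual part-worths.
    X_tasks   : list of (J, p) arrays, one per choice task
                (J alternatives described by p attribute levels).
    y_choices : list with the index of the chosen alternative per task.
    """
    ll = 0.0
    for X, y in zip(X_tasks, y_choices):
        u = X @ theta_i                      # deterministic utilities
        u = u - u.max()                      # stabilise the softmax
        log_probs = u - np.log(np.exp(u).sum())
        ll += log_probs[y]
    return ll
```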

1.4 Full conditionals

$$P(\boldsymbol{\theta}_{i}|\boldsymbol{\mu}_{\theta},\mathbf{\Sigma}_{\theta},\text{data}_{i}) \propto \exp\left(-\frac{1}{2}(\boldsymbol{\theta}_{i}-\boldsymbol{\mu}_{\theta})^{\top}\mathbf{\Sigma}_{\theta}^{-1}(\boldsymbol{\theta}_{i}-\boldsymbol{\mu}_{\theta})\right) P(\text{data}_{i}|\boldsymbol{\theta}_{i})$$
$$\boldsymbol{\mu}_{\theta} \sim N(\boldsymbol{\mu}_{i},\mathbf{V}_{i}), \qquad \mathbf{\Sigma}_{\theta}^{-1} \sim W(df_{1},\mathbf{S}_{1}),$$

where

$$\begin{array}{rcl} \mathbf{V}_{i}^{-1}&=&\mathbf{V}_{0}^{-1}+N\mathbf{\Sigma}_{\theta}^{-1}\\ \boldsymbol{\mu}_{i}&=&\mathbf{V}_{i}\left[\mathbf{V}_{0}^{-1}\boldsymbol{\mu}_{0}+N\mathbf{\Sigma}_{\theta}^{-1}\bar{\boldsymbol{\theta}}\right]\\ df_{1} &=&df_{0}+N\\ \mathbf{S}_{1} &=& \sum\limits_{i=1}^{N}(\boldsymbol{\theta}_{i}-\boldsymbol{\mu}_{\theta})(\boldsymbol{\theta}_{i}-\boldsymbol{\mu}_{\theta})^{\top} +\mathbf{S}_{0}^{-1}, \end{array}$$

and $\bar{\boldsymbol{\theta}}$ denotes the mean of the individual draws {θ_i}.
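
As an illustration of these conjugate updates, the sketch below draws μ_θ and Σ_θ^{-1} from their full conditionals, assuming `theta` stacks the current individual draws row-wise. The Wishart draw is written with scale S_1^{-1}, a common convention in HB implementations; the notation W(df_1, S_1) above may follow a different parameterization.

```python
import numpy as np
from scipy.stats import wishart

def draw_population_parameters(theta, mu_0, V_0, df_0, S_0, Sigma_inv, rng):
    """One Gibbs update of (mu_theta, Sigma_theta^{-1}) given the theta_i draws.

    theta : (N, p) array; row i holds the current draw of theta_i.
    """
    N, p = theta.shape
    theta_bar = theta.mean(axis=0)
    V_0_inv = np.linalg.inv(V_0)

    # mu_theta | rest ~ N(mu_i, V_i)
    V_i = np.linalg.inv(V_0_inv + N * Sigma_inv)
    mu_i = V_i @ (V_0_inv @ mu_0 + N * Sigma_inv @ theta_bar)
    mu_theta = rng.multivariate_normal(mu_i, V_i)

    # Sigma_theta^{-1} | rest ~ Wishart(df_1, .)
    resid = theta - mu_theta
    S_1 = resid.T @ resid + np.linalg.inv(S_0)
    df_1 = df_0 + N
    Sigma_inv_new = wishart(df=df_1, scale=np.linalg.inv(S_1)).rvs(random_state=rng)
    return mu_theta, Sigma_inv_new
```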

The MCMC procedure generates a sequence of draws from the posterior distribution of the model's parameters. Since the full conditional for θ_i does not have a closed form, the Metropolis-Hastings (M-H) algorithm is used to draw these samples. In particular, we use a Gaussian random-walk M-H in which the proposal vector of parameters φ^(t) for θ_i at iteration t is drawn from N(φ^(t−1), σ²Δ) and accepted using the M-H acceptance ratio. The tuning parameters σ and Δ are chosen adaptively to yield an acceptance rate of approximately 20%.
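
A minimal sketch of this random-walk M-H step for a single θ_i follows; it reuses `mnl_log_likelihood` from the earlier sketch, and the σ-adaptation rule is a generic heuristic targeting the stated 20% acceptance rate, not necessarily the adaptation scheme used by the authors.

```python
import numpy as np

def mh_step_theta_i(theta_i, mu_theta, Sigma_inv, X_tasks, y_choices,
                    sigma, Delta_chol, rng):
    """One Gaussian random-walk M-H update of theta_i.

    Proposal: phi ~ N(theta_i, sigma^2 * Delta), with Delta_chol a
    Cholesky factor of the tuning matrix Delta.
    """
    def log_post(t):
        diff = t - mu_theta
        return -0.5 * diff @ Sigma_inv @ diff + mnl_log_likelihood(t, X_tasks, y_choices)

    proposal = theta_i + sigma * (Delta_chol @ rng.standard_normal(theta_i.size))
    log_alpha = log_post(proposal) - log_post(theta_i)
    if np.log(rng.uniform()) < log_alpha:
        return proposal, True
    return theta_i, False

def adapt_sigma(sigma, acceptance_rate, target=0.20, step=0.05):
    """Illustrative adaptation: nudge sigma toward the target acceptance rate."""
    return sigma * np.exp(step * (acceptance_rate - target))
```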

We use the following uninformative prior hyperparameters: μ_0 = 0, V_0 = 10³ I_{N_θ×N_θ}, df_0 = N_θ + 5, and S_0 = df_0 C, where N is the number of individuals, N_θ is the number of random-effect parameters, and C is an N_θ×N_θ matrix with 2 on the diagonal and 1 off the diagonal for the levels of each attribute. We assume that the parameters are a priori uncorrelated across attributes (see e.g. [25]).
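
To make this prior specification concrete, the following sketch builds the hyperparameters under the assumption that each attribute contributes a known number of part-worth parameters (e.g. levels minus one under dummy coding); that input and the function name are illustrative.

```python
import numpy as np

def build_prior_hyperparameters(params_per_attribute):
    """Construct mu_0, V_0, df_0 and S_0 as described above.

    params_per_attribute : list with the number of estimated part-worths
                           per attribute (illustrative input).
    """
    n_theta = sum(params_per_attribute)
    mu_0 = np.zeros(n_theta)
    V_0 = 1e3 * np.eye(n_theta)
    df_0 = n_theta + 5

    # C is block-diagonal: 2 on the diagonal and 1 off the diagonal within
    # the block of each attribute, 0 across attributes (a priori uncorrelated).
    C = np.zeros((n_theta, n_theta))
    start = 0
    for n_k in params_per_attribute:
        C[start:start + n_k, start:start + n_k] = np.ones((n_k, n_k)) + np.eye(n_k)
        start += n_k
    S_0 = df_0 * C
    return mu_0, V_0, df_0, S_0
```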

Appendix B: Calibration of the model parameters

In the proposed models, three parameters need to be calibrated: the regularization parameter C, the threshold 𝜖, and the shrinkage parameter 𝜃. We analyze how the performance of each model varies as a function of each parameter. For illustration, we show the procedure used for the Camera data set; similar analyses were conducted for the other data sets. Our goal was to assess whether the results are stable across different values of these parameters, in which case a less rigorous validation strategy can be used. In contrast, high variance in performance requires an exhaustive model selection procedure, such as LOOCV, to find the best combination of parameters.

Figure 1 depicts the LOOCV hit rates as a function of C, 𝜖, and 𝜃 for the proposed feature selection approach.

Fig. 1 Leave-one-out validation hit rates for L1-SVM for different values of C, 𝜖, and 𝜃 (Camera data set)

Figure 1 reveals the influence of the parameters C, 𝜖, and 𝜃 on the predictive performance (leave-one-out validation hit rate). Results are relatively stable for small values of 𝜃 and 𝜖 and for values of C around one, although these parameters have an important influence on the final outcome of the proposed method.

Performing an adequate grid search over C, 𝜖, and 𝜃, varying them along the suggested ranges, is therefore highly recommended. Additionally, the fact that the optimal values for these parameters are always above zero confirms the importance of feature selection and shrinkage to control for potential overfitting when the number of respondents is relatively small.
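
A minimal sketch of such a grid search is shown below; `loocv_hit_rate` stands for a user-supplied routine that estimates the model for one parameter combination and returns its leave-one-out hit rate, and the grid values are illustrative, not the ones used in the paper.

```python
import itertools

def grid_search(data, loocv_hit_rate,
                C_grid=(0.1, 0.5, 1.0, 2.0, 5.0),
                eps_grid=(0.0, 0.01, 0.05, 0.1),
                theta_grid=(0.0, 0.1, 0.5, 1.0)):
    """Exhaustive search over (C, epsilon, theta) combinations."""
    best_score, best_params = -float("inf"), None
    for C, eps, theta in itertools.product(C_grid, eps_grid, theta_grid):
        score = loocv_hit_rate(data, C=C, eps=eps, theta=theta)
        if score > best_score:
            best_score, best_params = score, {"C": C, "eps": eps, "theta": theta}
    return best_score, best_params
```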


Cite this article

Maldonado, S., Montoya, R. & López, J. Embedded heterogeneous feature selection for conjoint analysis: A SVM approach using L1 penalty. Appl Intell 46, 775–787 (2017). https://doi.org/10.1007/s10489-016-0852-5
