DOI: 10.1145/2835776.2835783
research-article

Your Cart tells You: Inferring Demographic Attributes from Purchase Data

Published: 08 February 2016

Abstract

Demographic attributes play an important role in the retail market in characterizing different types of users. In practice, however, such signals are often available for only a small fraction of users, because manual collection by retailers is difficult. In this paper, we aim to harness the power of big data to automatically infer users' demographic attributes from their purchase data. Typically, demographic prediction can be formalized as a multi-task multi-class prediction problem: multiple demographic attributes (e.g., gender, age, and income) are to be inferred for each user, and each attribute takes one of N possible classes (N ≥ 2). Most previous work on this problem explores different types of features and usually predicts each attribute independently. However, modeling the tasks separately forgoes the ability to leverage correlations among attributes. Meanwhile, manually defined features require professional knowledge and often suffer from under-specification. To address these problems, we propose a novel Structured Neural Embedding (SNE) model that automatically learns representations from users' purchase data and predicts multiple demographic attributes simultaneously. Experiments are conducted on a real-world retail dataset in which five attributes (gender, marital status, income, age, and education level) are to be predicted. The empirical results show that our SNE model improves performance significantly over state-of-the-art baselines.
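
The multi-task multi-class setup described in the abstract can be illustrated with a minimal sketch. This is not the paper's SNE model: it assumes, purely for illustration, that a user is represented by the mean of randomly initialized item embeddings and that each attribute has its own linear softmax head on top of that shared vector. The function name `predict_attributes`, the class counts in `n_classes`, and all dimensions are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def predict_attributes(purchased_items, item_embeddings, heads):
    """Return one class-probability vector per attribute for a single user."""
    # Shared user representation: mean embedding of the purchased items
    # (an assumption of this sketch, not necessarily the paper's choice).
    user_vec = item_embeddings[purchased_items].mean(axis=0)
    # One linear layer (W, b) plus softmax per task; every task has N >= 2 classes.
    return {name: softmax(W @ user_vec + b) for name, (W, b) in heads.items()}

rng = np.random.default_rng(0)
n_items, dim = 50, 8
item_embeddings = rng.normal(size=(n_items, dim))

# Hypothetical number of classes for each of the five attributes.
n_classes = {"gender": 2, "marital_status": 2, "income": 4, "age": 5, "education": 4}
heads = {name: (rng.normal(size=(k, dim)), np.zeros(k)) for name, k in n_classes.items()}

# A user who bought items 3, 17 and 42 (indices into item_embeddings).
probs = predict_attributes([3, 17, 42], item_embeddings, heads)
for name, p in probs.items():
    print(name, "-> predicted class", int(p.argmax()))
```

Because every head reads the same user vector, training such heads jointly would let the attributes share information. Predicting each attribute from separately engineered features, as the abstract says most prior work does, has no such shared component.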

    Published In

    WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining
    February 2016
    746 pages
    ISBN:9781450337168
    DOI:10.1145/2835776

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. demographic attribute
    2. multitask multi-class prediction
    3. structured neural embedding

    Qualifiers

    • Research-article

    Conference

    WSDM 2016: Ninth ACM International Conference on Web Search and Data Mining
    February 22 - 25, 2016
    San Francisco, California, USA

    Acceptance Rates

    WSDM '16 Paper Acceptance Rate 67 of 368 submissions, 18%;
    Overall Acceptance Rate 498 of 2,863 submissions, 17%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 19
    • Downloads (Last 6 weeks): 2
    Reflects downloads up to 01 Mar 2025

    Citations

    Cited By

    • (2025) Contextual Inference From Sparse Shopping Transactions Based on Motif Patterns. IEEE Transactions on Knowledge and Data Engineering 37:2 (572-583). DOI: 10.1109/TKDE.2024.3452638. Online publication date: Feb-2025
    • (2024) Fair Recommendations with Limited Sensitive Attributes: A Distributionally Robust Optimization Approach. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (448-457). DOI: 10.1145/3626772.3657822. Online publication date: 10-Jul-2024
    • (2024) User Profiling for Personalized Service Recommendation with Dual High-order Feature Learning. 2024 IEEE International Conference on Web Services (ICWS) (269-280). DOI: 10.1109/ICWS62655.2024.00049. Online publication date: 7-Jul-2024
    • (2024) Social demographics imputation based on similarity in multi-dimensional activity-travel pattern: A two-step approach. Travel Behaviour and Society 37 (100843). DOI: 10.1016/j.tbs.2024.100843. Online publication date: Oct-2024
    • (2023) Minimally supervised contextual inference from human mobility. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (2450-2458). DOI: 10.24963/ijcai.2023/272. Online publication date: 19-Aug-2023
    • (2022) Representation Learning on Variable Length and Incomplete Wearable-Sensory Time Series. ACM Transactions on Intelligent Systems and Technology 13:6 (1-21). DOI: 10.1145/3531228. Online publication date: 30-Sep-2022
    • (2022) Attribute Inference Based on User Similarity and Random Walk. 2022 IEEE International Conference on Services Computing (SCC) (215-220). DOI: 10.1109/SCC55611.2022.00040. Online publication date: Jul-2022
    • (2022) Prediction of the Impact of Mobile Payments on the Consumption of Individual Nodes on Communication Networks. 2022 IEEE 9th International Conference on Cyber Security and Cloud Computing (CSCloud)/2022 IEEE 8th International Conference on Edge Computing and Scalable Cloud (EdgeCom) (186-194). DOI: 10.1109/CSCloud-EdgeCom54986.2022.00040. Online publication date: Jun-2022
    • (2022) A Brief Survey on Privacy-Preserving Methods for Graph-Structured Data. The International Conference on Image, Vision and Intelligent Systems (ICIVIS 2021) (573-583). DOI: 10.1007/978-981-16-6963-7_52. Online publication date: 3-Mar-2022
    • (2021) Estimating Multiple Socioeconomic Attributes via Home Location—A Case Study in China. Journal of Social Computing 2:1 (71-88). DOI: 10.23919/JSC.2021.0003. Online publication date: Mar-2021