Cognitive computing for customer profiling: meta classification for gender prediction

  • Research Paper
Electronic Markets

Abstract

Analyzing micro blog data is an increasingly attractive way for enterprises to learn about customer sentiment, public opinion, or unmet needs. A better understanding of the underlying customer profiles (e.g., gender or age) can substantially enhance the economic value of the customer intimacy that this type of analytics provides. Following a design science approach, we draw on information processing theory and meta machine learning to propose an extendable, cognitive classifier that integrates and combines various isolated base classifiers for profiling purposes. We evaluate its feasibility and performance in a technical experiment, its suitability in a real use case, and its utility in an expert workshop. We thereby extend the body of knowledge with a cognitive method that integrates existing as well as emerging customer profiling classifiers to improve overall prediction performance. Specifically, we contribute a concrete classifier that predicts the gender of German-speaking Twitter users. This enables enterprises to draw on micro blog data to develop customer intimacy and to tailor individual offerings for smarter services.
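The idea of combining isolated base classifiers at a meta level can be illustrated with a minimal sketch. It is not the classifier proposed in the article: the two base classifiers, the tiny name and word lists, and the profile fields `name` and `bio` are hypothetical, and the meta level is a plain weighted vote whose weights are estimated from labelled held-out data, a much simpler stand-in for a learned meta classifier.

```python
from math import log
from typing import Callable, Dict, List, Optional, Tuple

Profile = Dict[str, str]          # hypothetical user record
Label = Optional[str]             # "f", "m", or None when a classifier abstains
BaseClassifier = Callable[[Profile], Label]

# Hypothetical lookup tables, for illustration only.
FEMALE_NAMES = {"anna", "julia", "lena"}
MALE_NAMES = {"jan", "lukas", "paul"}


def by_first_name(profile: Profile) -> Label:
    """Base classifier 1: look up the first token of the display name."""
    parts = profile.get("name", "").lower().split()
    first = parts[0] if parts else ""
    if first in FEMALE_NAMES:
        return "f"
    if first in MALE_NAMES:
        return "m"
    return None


def by_bio(profile: Profile) -> Label:
    """Base classifier 2: look for self-describing words in the bio."""
    words = profile.get("bio", "").lower().split()
    if "mutter" in words:
        return "f"
    if "vater" in words:
        return "m"
    return None


def fit_weights(bases: List[BaseClassifier],
                labelled: List[Tuple[Profile, str]]) -> List[float]:
    """Weight each base classifier by the log-odds of its smoothed accuracy."""
    weights = []
    for clf in bases:
        answered = correct = 0
        for profile, truth in labelled:
            guess = clf(profile)
            if guess is None:
                continue  # abstentions do not count for or against
            answered += 1
            correct += guess == truth
        acc = (correct + 1) / (answered + 2)  # Laplace smoothing keeps 0 < acc < 1
        weights.append(log(acc / (1 - acc)))
    return weights


def predict(bases: List[BaseClassifier], weights: List[float],
            profile: Profile) -> Label:
    """Meta level: signed weighted vote over the base classifier outputs."""
    score = 0.0
    for clf, weight in zip(bases, weights):
        guess = clf(profile)
        if guess == "f":
            score += weight
        elif guess == "m":
            score -= weight
    if score > 0:
        return "f"
    if score < 0:
        return "m"
    return None  # no usable evidence
```

Because each base classifier may abstain, a new one (for example, one based on the profile picture) can be appended to the list without changing the meta level; only the weights need to be re-estimated.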

Notes

  1. namedict.txt from https://heise.de/ct, softlink 0717182 by Jörg Michael, last accessed 15-11-2016

  2. Keywords (alphabetical, case-insensitive): bmw i3; e-tankstelle; eauto; ecar; egolf; electric mobility; electric vehicle; elektroauto; elektrofahrzeug; elektromobilitaet; elektromobilität; e-mobility; emobility; eup; fortwo electric drive; ladesaeule; ladesäule; miev; nissan leaf; opel ampera; peugeot ion; renault zoe; tesla model s.

  3. The expert workshop took place in October 2014 with four experts from the domain of e-mobility and two research associates. The experts were identified through their recent publications and their activities in publicly funded e-mobility research projects.
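The keyword list in note 2 could be applied to incoming tweets roughly as follows. This is a sketch under assumptions: the notes do not state how matching was implemented, so the whole-word, case-insensitive regular expression and the function name `matches_keywords` are illustrative choices.

```python
import re

# Keywords copied from note 2.
KEYWORDS = (
    "bmw i3", "e-tankstelle", "eauto", "ecar", "egolf", "electric mobility",
    "electric vehicle", "elektroauto", "elektrofahrzeug", "elektromobilitaet",
    "elektromobilität", "e-mobility", "emobility", "eup",
    "fortwo electric drive", "ladesaeule", "ladesäule", "miev", "nissan leaf",
    "opel ampera", "peugeot ion", "renault zoe", "tesla model s",
)

# Match a keyword only when it is not embedded in a longer word, so that
# "eup" does not fire on "Neupreis". Hashtags such as "#eauto" still match
# because "#" is not a word character.
_PATTERN = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")(?!\w)",
    re.IGNORECASE,
)


def matches_keywords(text: str) -> bool:
    """Return True if the text contains at least one keyword."""
    return _PATTERN.search(text) is not None
```

Plain substring matching would be simpler, but short keywords such as "eup" or "miev" would then produce false positives inside unrelated words.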

Author information

Corresponding author

Correspondence to Robin Hirt.

Additional information

Responsible Editor: Haluk Demirkan

This article is part of the Topical Collection on Smart Services: the move to customer-orientation

About this article

Cite this article

Hirt, R., Kühl, N. & Satzger, G. Cognitive computing for customer profiling: meta classification for gender prediction. Electron Markets 29, 93–106 (2019). https://doi.org/10.1007/s12525-019-00336-z

  • DOI: https://doi.org/10.1007/s12525-019-00336-z
