DOI: 10.1145/3281375.3281383
Research article

Predicting user gender on social media sites using geographical information

Published: 25 September 2018

Abstract

The tourism industry has grown remarkably in recent years, and marketing aimed at revitalizing the tourism market has attracted intense attention. Effective marketing depends on analyzing visitor attributes such as gender, age, and residential area, because knowing these attributes makes it possible to present an appropriate advertisement to each user. In this paper, we propose a method to estimate user attributes from the geographical information attached to content that users post on social media. The attributes of people visiting a specific area might be biased, as in "men visit Shimbashi frequently" and "women often visit Harajuku." Our approach assumes that "people with a certain attribute often visit a certain area" and that "such areas differ depending on attributes." Based on those assumptions, we create feature vectors from geographical information on social media sites and then estimate user attributes from those feature vectors using machine learning. We focus specifically on estimating user gender. Our experiments evaluate how well the proposed method estimates gender on a dataset obtained from Twitter and Flickr.
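
The idea in the abstract, turning each user's geotagged posts into a vector of per-area visit counts and then classifying that vector, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the grid size `CELL_DEG`, the helper names, the approximate station coordinates, and the toy labels are all hypothetical, and a small multinomial naive Bayes classifier is swapped in for brevity because the abstract does not say which learner is used.

```python
import math
from collections import Counter, defaultdict

CELL_DEG = 0.01  # hypothetical grid size in degrees (assumption, not from the paper)

def cell_of(lat, lon, size=CELL_DEG):
    """Map a coordinate to the id of the grid cell that contains it."""
    return (math.floor(lat / size), math.floor(lon / size))

def visit_counts(posts):
    """Feature vector for one user: how many posts fall in each grid cell."""
    return Counter(cell_of(lat, lon) for lat, lon in posts)

def train(users):
    """users: list of (posts, label). Aggregates cell counts per label."""
    cell_counts = defaultdict(Counter)
    label_counts = Counter()
    for posts, label in users:
        cell_counts[label].update(visit_counts(posts))
        label_counts[label] += 1
    vocab = {c for counts in cell_counts.values() for c in counts}
    return cell_counts, label_counts, vocab

def predict(model, posts):
    """Multinomial naive Bayes with add-one smoothing (a stand-in learner)."""
    cell_counts, label_counts, vocab = model
    total_users = sum(label_counts.values())
    features = visit_counts(posts)
    best_label, best_score = None, -math.inf
    for label in label_counts:
        denom = sum(cell_counts[label].values()) + len(vocab)
        score = math.log(label_counts[label] / total_users)
        for cell, n in features.items():
            if cell in vocab:  # cells never seen in training are ignored
                score += n * math.log((cell_counts[label][cell] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up toy data: approximate coordinates near Shimbashi and Harajuku stations.
toy_users = [
    ([(35.6664, 139.7583), (35.6668, 139.7579)], "m"),
    ([(35.6661, 139.7588), (35.6702, 139.7027)], "m"),
    ([(35.6702, 139.7027), (35.6706, 139.7031)], "f"),
    ([(35.6709, 139.7022), (35.6664, 139.7583)], "f"),
]
model = train(toy_users)
print(predict(model, [(35.6665, 139.7585)]))  # a user posting near Shimbashi
print(predict(model, [(35.6704, 139.7029)]))  # a user posting near Harajuku
```

A fixed latitude/longitude grid is only one way to define "areas"; named districts or clustered points would fit the same assumptions, and any standard classifier could replace the naive Bayes step.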

Cited By

  • (2024) Methods and Annotated Data Sets Used to Predict the Gender and Age of Twitter Users: Scoping Review. Journal of Medical Internet Research 26 (e47923). DOI: 10.2196/47923. Online publication date: 15-Mar-2024
  • (2021) Social Big Data: Case Studies. Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVII (80-111). DOI: 10.1007/978-3-662-62919-2_4. Online publication date: 17-Jan-2021

Published In

MEDES '18: Proceedings of the 10th International Conference on Management of Digital EcoSystems
September 2018
253 pages
ISBN: 9781450356220
DOI: 10.1145/3281375

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. gender classification
2. geo-tag
3. machine learning
4. social media

Conference

MEDES '18

Acceptance Rates

MEDES '18 paper acceptance rate: 29 of 77 submissions, 38%
Overall acceptance rate: 267 of 682 submissions, 39%
