research-article

Predicting user gender on social media sites using geographical information

Authors:

Masaharu Hirota,

Hiroshi Ishikawa

MEDES '18: Proceedings of the 10th International Conference on Management of Digital EcoSystems

Pages 219 - 226

https://doi.org/10.1145/3281375.3281383

Published: 25 September 2018

Abstract

The tourism industry has grown remarkably in recent years, and marketing aimed at revitalizing the tourism market has attracted intense attention. Effective marketing depends on analyzing visitor attributes such as gender, age, and residential area, because knowing these attributes makes it possible to present an appropriate advertisement to each user. In this paper, we propose a method to estimate user attributes from the geographical information attached to content that users post on social media. The attributes of people visiting a specific area might be biased, as in "men visit Shimbashi frequently" and "women often visit Harajuku." Our approach assumes that "people with a certain attribute often visit a certain area" and that "such areas differ depending on attributes." Based on those assumptions, we create feature vectors from geographical information related to social media sites and estimate user attributes from those feature vectors using machine learning. We focus specifically on estimating user gender. Our experiments evaluate the effectiveness of gender estimation with the proposed method on a dataset obtained from Twitter and Flickr.
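The abstract does not specify how the feature vectors are built or which learning algorithm is used, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that each user is represented by the share of their geotagged posts falling in each cell of a fixed latitude/longitude grid, and it uses a simple nearest-centroid rule as a stand-in for the machine-learning step. The grid size `CELL_DEG`, all function names, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
import math

CELL_DEG = 0.01  # hypothetical grid resolution in degrees (an assumption, not from the paper)


def to_cell(lat, lon, cell_deg=CELL_DEG):
    """Map a coordinate to a discrete grid-cell id."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def build_vocabulary(users_posts):
    """Assign a vector index to every grid cell seen in the training data."""
    cells = sorted({to_cell(lat, lon) for posts in users_posts for lat, lon in posts})
    return {cell: i for i, cell in enumerate(cells)}


def feature_vector(posts, vocab):
    """Fraction of a user's geotagged posts that fall in each known cell."""
    counts = Counter(to_cell(lat, lon) for lat, lon in posts)
    total = sum(counts.values())
    vec = [0.0] * len(vocab)
    for cell, n in counts.items():
        if cell in vocab:
            vec[vocab[cell]] = n / total
    return vec


def train_centroids(vectors, labels):
    """Average the feature vectors of each label (nearest-centroid stand-in classifier)."""
    sums, sizes = {}, Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def predict(vec, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Toy data: approximate coordinates near Shimbashi and Harajuku, with made-up labels.
train_posts = [
    [(35.666, 139.758), (35.6665, 139.7585)],  # user who posts around Shimbashi
    [(35.671, 139.703), (35.6712, 139.7032)],  # user who posts around Harajuku
]
train_labels = ["male", "female"]

vocab = build_vocabulary(train_posts)
centroids = train_centroids([feature_vector(p, vocab) for p in train_posts], train_labels)

new_user = [(35.6663, 139.7581), (35.6661, 139.7589), (35.6711, 139.7031)]
print(predict(feature_vector(new_user, vocab), centroids))
```

In practice the nearest-centroid rule would be replaced by whatever classifier is being evaluated; the point of the sketch is only that a user's visit distribution over areas can serve as the input vector.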

Cited By

K. O'Connor, S. Golder, D. Weissenbacher, A. Klein, A. Magge, and G. Gonzalez-Hernandez. 2024. Methods and Annotated Data Sets Used to Predict the Gender and Age of Twitter Users: Scoping Review. Journal of Medical Internet Research 26 (e47923). Online publication date: 15-Mar-2024. https://doi.org/10.2196/47923

H. Ishikawa and Y. Miyata. 2021. Social Big Data: Case Studies. Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVII (80-111). Online publication date: 17-Jan-2021. https://doi.org/10.1007/978-3-662-62919-2_4

Index Terms

Predicting user gender on social media sites using geographical information
1. Information systems
  1. Information systems applications
2. Social and professional topics
  1. User characteristics
    1. Geographic characteristics



Published In

MEDES '18: Proceedings of the 10th International Conference on Management of Digital EcoSystems

September 2018

253 pages

ISBN:9781450356220

DOI:10.1145/3281375

Conference Chairs:
Richard Chbeir,
Hiroshi Ishikawa,
Program Chairs:
Kazutoshi Sumiya,
Kenji Hatano,
Mario Koeppen

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MEDES '18

MEDES '18: The 10th International Conference on Management of Digital EcoSystems

September 25 - 28, 2018

Tokyo, Japan

Acceptance Rates

MEDES '18 Paper Acceptance Rate 29 of 77 submissions, 38%;

Overall Acceptance Rate 267 of 682 submissions, 39%

Bibliometrics

Total Citations: 2
Total Downloads: 101
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 0

Reflects downloads up to 01 Jan 2025
