Gender Prediction Using Browsing History

  • Conference paper
Knowledge and Systems Engineering

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 244)

Abstract

Demographic attributes of Internet users, such as gender and age, provide important information for marketing, personalization, and user behavior research. This paper addresses the problem of predicting users’ gender from their browsing history. We take a classification-based approach and investigate a number of features derived from browsing log data. We show that high-level content features, such as topics or categories, are highly predictive of gender, and that combining them with features derived from access times and browsing patterns significantly improves prediction accuracy. We empirically verified the effectiveness of the method on real datasets from Vietnamese online media. The method substantially outperformed a baseline and achieved a macro-averaged F1 score of 0.805. The experimental results also demonstrate the benefit of combining different feature types: the combined feature set achieved a 12% improvement in F1 score over the best-performing individual feature type.
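For readers unfamiliar with the evaluation measure, the following is a minimal sketch of how a macro-averaged F1 score is computed and how several feature groups can be merged into one feature vector before classification. It is an illustration only, not the authors' implementation: the function names `macro_f1` and `combine_features`, the group names (`topic`, `time`), and the toy labels are all assumptions made for this example.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


def combine_features(**groups):
    """Merge named feature groups into one flat dict.

    Each key is prefixed with its group name so that features from
    different groups (hypothetical names here) cannot collide.
    """
    combined = {}
    for group_name, features in groups.items():
        for name, value in features.items():
            combined[f"{group_name}:{name}"] = value
    return combined


# Toy data, invented for illustration only.
y_true = ["male", "male", "male", "female"]
y_pred = ["male", "male", "female", "female"]
score = macro_f1(y_true, y_pred)

user_vector = combine_features(
    topic={"sports": 0.6, "fashion": 0.1},   # content-based features
    time={"evening_share": 0.7},             # access-time features
)
```

Because every class contributes equally to the macro average regardless of its size, the score is not dominated by the majority class, which matters when the two genders are unevenly represented in a log.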

Author information

Corresponding author

Correspondence to Do Viet Phuong.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Phuong, D.V., Phuong, T.M. (2014). Gender Prediction Using Browsing History. In: Huynh, V., Denoeux, T., Tran, D., Le, A., Pham, S. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 244. Springer, Cham. https://doi.org/10.1007/978-3-319-02741-8_24

  • DOI: https://doi.org/10.1007/978-3-319-02741-8_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-02740-1

  • Online ISBN: 978-3-319-02741-8

  • eBook Packages: Engineering, Engineering (R0)
