
Gender Prediction Based on Chinese Name

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2019)

Abstract

Much work has been done on gender prediction for English names using probabilistic models or traditional machine learning methods. Unlike English and other alphabetic languages, Chinese is written with logosyllabic characters. Previous approaches work quite well for Indo-European languages in general and English in particular; however, their performance deteriorates on Asian languages such as Chinese, Japanese, and Korean. In this work, we focus on Simplified Chinese characters and present a novel approach that incorporates phonetic information (Pinyin) to enhance Chinese word embeddings trained with a BERT model. We compare our method with several previous methods, namely Naive Bayes, GBDT, and random forest with fastText word embeddings as features. Quantitative and qualitative experiments demonstrate the superiority of our model, which achieves 93.45% test accuracy. In addition, we have released to the community two large-scale gender-labeled datasets used in this study: one with over one million first names and the other with over six million full names.
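To make the Naive Bayes baseline named in the abstract concrete, the following is a minimal sketch of a character-level Naive Bayes classifier over given names. It is not the authors' implementation: the toy training pairs, the function names `train` and `predict`, and the add-one smoothing scheme are all assumptions chosen for illustration, and the data is hand-made rather than taken from the released datasets.

```python
import math
from collections import Counter

# Toy, hand-made (given name, label) pairs. Purely illustrative;
# NOT drawn from the paper's released datasets.
TOY_DATA = [
    ("丽娟", "F"), ("秀英", "F"), ("婷婷", "F"), ("美玲", "F"),
    ("建国", "M"), ("志强", "M"), ("伟军", "M"), ("国强", "M"),
]

def train(pairs):
    """Count character frequencies per class and the number of names per class."""
    char_counts = {"F": Counter(), "M": Counter()}
    class_counts = Counter()
    for name, label in pairs:
        class_counts[label] += 1
        char_counts[label].update(name)  # one count per character
    vocab = set(char_counts["F"]) | set(char_counts["M"])
    return char_counts, class_counts, vocab

def predict(name, model):
    """Return the label with the highest log-posterior under Naive Bayes."""
    char_counts, class_counts, vocab = model
    total_names = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_names)
        # Add-one smoothing; the extra +1 reserves mass for unseen characters.
        denom = sum(char_counts[label].values()) + len(vocab) + 1
        for ch in name:
            score += math.log((char_counts[label][ch] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TOY_DATA)
print(predict("丽英", model), predict("国强", model))
```

Each character is treated as an independent feature given the class, which is the simplifying assumption that embedding-based models such as the one described in the abstract aim to move beyond.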


Notes

  1. https://baike.baidu.com

  2. https://en.wikipedia.org/wiki/Main_Page

  3. http://sofasofa.io/tutorials/naive_bayes_classifier/

  4. https://github.com/jijeng/gender-prediction


Author information


Corresponding authors

Correspondence to Jizheng Jia or Qiyang Zhao.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Jia, J., Zhao, Q. (2019). Gender Prediction Based on Chinese Name. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_62


  • DOI: https://doi.org/10.1007/978-3-030-32236-6_62

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32235-9

  • Online ISBN: 978-3-030-32236-6

  • eBook Packages: Computer Science, Computer Science (R0)
