Leveraging Interactive Knowledge and Unlabeled Data in Gender Classification with Co-training

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9052)

Abstract

Conventional approaches to gender classification rely heavily on large amounts of labeled data, which are typically hard and expensive to obtain. In this paper, we propose a co-training approach to address this problem. Specifically, we use both non-interactive and interactive texts, i.e., the message texts and the comment texts, as two different views in co-training so that unlabeled data can be incorporated effectively. Experimental results on a large micro-blog data set demonstrate that interactive knowledge is useful for gender classification and that the proposed co-training approach is effective.
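The two-view idea in the abstract can be illustrated with a minimal sketch of generic co-training. This is not the authors' implementation: the small naive Bayes classifier, the `co_train` and `predict` helpers, the confidence-based selection rule, and the toy tokens are all assumptions made for illustration. One classifier is trained on the message view and one on the comment view; in each round, the most confident predictions from each view are added to the shared labeled set.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial naive Bayes over whitespace tokens (a stand-in classifier)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(text.split())
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict_proba(self, text):
        total = sum(self.prior.values())
        log_scores = {}
        for c in self.classes:
            # Add-one smoothing; tokens never seen in training are skipped.
            denom = sum(self.counts[c].values()) + len(self.vocab)
            score = math.log(self.prior[c] / total)
            for w in text.split():
                if w in self.vocab:
                    score += math.log((self.counts[c][w] + 1) / denom)
            log_scores[c] = score
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: (message, comment, label) triples; unlabeled: (message, comment) pairs."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        labels = [y for _, _, y in labeled]
        clf_msg = NaiveBayes().fit([m for m, _, _ in labeled], labels)
        clf_cmt = NaiveBayes().fit([c for _, c, _ in labeled], labels)
        picked = {}
        for view, clf in ((0, clf_msg), (1, clf_cmt)):
            scored = []
            for i, example in enumerate(pool):
                probs = clf.predict_proba(example[view])
                label = max(probs, key=probs.get)
                scored.append((probs[label], i, label))
            # Each view contributes its most confident pool examples;
            # if both views pick the same example, the message view's label is kept.
            for _, i, label in sorted(scored, reverse=True)[:per_round]:
                picked.setdefault(i, label)
        labeled += [(pool[i][0], pool[i][1], y) for i, y in picked.items()]
        pool = [ex for i, ex in enumerate(pool) if i not in picked]
    labels = [y for _, _, y in labeled]
    return (NaiveBayes().fit([m for m, _, _ in labeled], labels),
            NaiveBayes().fit([c for _, c, _ in labeled], labels))


def predict(clf_msg, clf_cmt, message, comment):
    """Combine the two views by multiplying their class probabilities."""
    p_msg, p_cmt = clf_msg.predict_proba(message), clf_cmt.predict_proba(comment)
    combined = {c: p_msg[c] * p_cmt[c] for c in p_msg}
    return max(combined, key=combined.get)


# Synthetic placeholder tokens, not real linguistic features.
labeled = [("alpha beta", "red green", "female"),
           ("gamma delta", "blue black", "male")]
unlabeled = [("alpha alpha", "red red"), ("gamma gamma", "blue blue"),
             ("beta epsilon", "green pink"), ("delta zeta", "black grey")]

clf_msg, clf_cmt = co_train(labeled, unlabeled)
print(predict(clf_msg, clf_cmt, "epsilon", "pink"))
print(predict(clf_msg, clf_cmt, "zeta", "grey"))
```

In this toy run, the tokens `epsilon`, `pink`, `zeta`, and `grey` appear only in the unlabeled pool, so the final classifiers can use them only because co-training assigned labels to those pool examples along the way.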

Notes

  1. http://weibo.com/

  2. http://mallet.cs.umass.edu/

Acknowledgments

This research was partially supported by three NSFC grants (No. 61273320, No. 61375073, and No. 61331011) and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information

Corresponding author

Correspondence to Shoushan Li.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, J., Xue, Y., Li, S., Zhou, G. (2015). Leveraging Interactive Knowledge and Unlabeled Data in Gender Classification with Co-training. In: Liu, A., Ishikawa, Y., Qian, T., Nutanong, S., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9052. Springer, Cham. https://doi.org/10.1007/978-3-319-22324-7_23

  • DOI: https://doi.org/10.1007/978-3-319-22324-7_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22323-0

  • Online ISBN: 978-3-319-22324-7

  • eBook Packages: Computer Science, Computer Science (R0)
