Leveraging Interactive Knowledge and Unlabeled Data in Gender Classification with Co-training

Wang, Jingjing; Xue, Yunxia; Li, Shoushan; Zhou, Guodong

doi:10.1007/978-3-319-22324-7_23

Conference paper
First Online: 01 January 2015

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9052)

Abstract

Conventional approaches to gender classification rely heavily on large amounts of labeled data, which are normally hard and expensive to obtain. In this paper, we propose a co-training approach to address this problem. Specifically, we employ both non-interactive and interactive texts, i.e., the message and comment texts, as two different views in our co-training approach so as to incorporate unlabeled data effectively. Experimental results on a large micro-blog data set demonstrate the appropriateness of leveraging interactive knowledge and the effectiveness of the proposed co-training approach in gender classification.
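
As an illustration only, the two-view procedure described in the abstract can be sketched as below. The sketch follows the generic co-training scheme (each view's classifier labels the unlabeled examples it is most confident about, and those examples join the shared training set), which may differ from the authors' exact algorithm. The tiny Naive Bayes stand-in, the helper names `train_views`, `co_train`, and `predict`, the parameters `rounds` and `per_round`, the class labels "A" and "B", and the placeholder tokens are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial Naive Bayes over whitespace tokens (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.counts[label].update(text.split())
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict_proba(self, text):
        total = sum(self.prior.values())
        scores = {}
        for label in self.labels:
            n = sum(self.counts[label].values())
            score = math.log(self.prior[label] / total)
            for w in text.split():
                # Laplace smoothing keeps unseen words from zeroing the score.
                score += math.log((self.counts[label][w] + 1) / (n + len(self.vocab) + 1))
            scores[label] = score
        top = max(scores.values())
        exp = {l: math.exp(s - top) for l, s in scores.items()}
        z = sum(exp.values())
        return {l: v / z for l, v in exp.items()}


def train_views(labeled):
    """Train one classifier per view: index 0 = message text, index 1 = comment text."""
    labels = [y for _, y in labeled]
    return [NaiveBayes().fit([x[v] for x, _ in labeled], labels) for v in (0, 1)]


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: list of ((message, comment), label); unlabeled: list of (message, comment)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        views = train_views(labeled)
        picked = {}
        for view, clf in enumerate(views):
            # Each view's classifier nominates the pool items it is most confident about.
            scored = []
            for i, pair in enumerate(pool):
                proba = clf.predict_proba(pair[view])
                label = max(proba, key=proba.get)
                scored.append((proba[label], i, label))
            for _conf, i, label in sorted(scored, reverse=True)[:per_round]:
                picked.setdefault(i, label)
        # Newly labeled pairs join the shared training set; both views retrain next round.
        labeled += [(pool[i], label) for i, label in picked.items()]
        pool = [p for i, p in enumerate(pool) if i not in picked]
    return train_views(labeled)


def predict(views, pair):
    """Combine the two views by multiplying their class posteriors."""
    a, b = (clf.predict_proba(text) for clf, text in zip(views, pair))
    return max(a, key=lambda l: a[l] * b[l])


# Placeholder tokens and labels only; they carry no claim about real gender cues.
labeled = [(("alpha beta", "red blue"), "A"), (("gamma delta", "green gold"), "B")]
unlabeled = [
    ("alpha epsilon", "red silver"),
    ("gamma zeta", "green white"),
    ("epsilon beta", "silver blue"),
]
views = co_train(labeled, unlabeled)
print(predict(views, ("alpha", "red")), predict(views, ("gamma", "green")))
```

Multiplying the two posteriors at prediction time is one simple way to combine the views; the abstract does not say how the final decision is made, so this choice is part of the sketch rather than the paper.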

Notes

1. http://weibo.com/
2. http://mallet.cs.umass.edu/

Acknowledgments

This research work has been partially supported by three NSFC grants, No. 61273320, No. 61375073, and No. 61331011, and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information

Authors and Affiliations

Natural Language Processing Lab, School of Computer Science and Technology, Soochow University, Suzhou, China
Jingjing Wang, Yunxia Xue, Shoushan Li & Guodong Zhou
Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, China
Jingjing Wang, Yunxia Xue, Shoushan Li & Guodong Zhou

Corresponding author

Correspondence to Shoushan Li.

Editor information

Editors and Affiliations

Soochow University, Suzhou, China
An Liu
Nagoya University, Nagoya, Japan
Yoshiharu Ishikawa
Wuhan University, Wuhan, China
Tieyun Qian
University of Hong Kong, Hong Kong, China
Sarana Nutanong
Monash University, Clayton, Victoria, Australia
Muhammad Aamir Cheema

About this paper

Cite this paper

Wang, J., Xue, Y., Li, S., Zhou, G. (2015). Leveraging Interactive Knowledge and Unlabeled Data in Gender Classification with Co-training. In: Liu, A., Ishikawa, Y., Qian, T., Nutanong, S., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9052. Springer, Cham. https://doi.org/10.1007/978-3-319-22324-7_23

DOI: https://doi.org/10.1007/978-3-319-22324-7_23
Published: 30 July 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22323-0
Online ISBN: 978-3-319-22324-7
eBook Packages: Computer Science, Computer Science (R0)