Weakly-Supervised Occupation Detection for Micro-blogging Users

Chen, Ying; Pei, Bei

doi:10.1007/978-3-662-45924-9_27

Part of the book series: Communications in Computer and Information Science (CCIS, volume 496)

Included in the following conference series:

CCF International Conference on Natural Language Processing and Chinese Computing

Abstract

In this paper, we propose a weakly-supervised approach that automatically detects the occupations of micro-blogging users. The approach draws on two types of user information, tweets and personal descriptions, and combines a rule-based detector with a detector based on a multiple classifier system (MCS). First, the rule-based detector uses the personal descriptions of some users to create pseudo-training data. Second, the MCS-based detector builds on this pseudo-training data and uses tweets to perform further occupation detection. However, the pseudo-training data is severely skewed and noisy, which poses a major challenge for the MCS-based detector. We therefore propose a class-based random sampling method and a cascaded ensemble learning method to overcome these data problems. Experiments show that the weakly-supervised approach achieves good performance. In addition, although our study is conducted on Chinese, the approach itself is language-independent.
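
The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyword rules, the toy Naive Bayes base learner, the sample users, and all names (`RULES`, `rule_label`, `class_balanced_sample`, `train_ensemble`, `vote`) are assumptions made for illustration. Class-balanced random sampling with majority voting stands in here for the paper's class-based random sampling, and the cascaded ensemble learning step is left out because the abstract does not specify it.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical keyword rules; the paper's actual rules are not given in the abstract.
RULES = {"teacher": ["teacher", "lecturer"], "engineer": ["engineer", "developer"]}

def rule_label(description):
    """Return an occupation if exactly one rule fires on the description, else None."""
    text = description.lower()
    hits = [occ for occ, kws in RULES.items() if any(k in text for k in kws)]
    return hits[0] if len(hits) == 1 else None

def class_balanced_sample(examples, per_class, rng):
    """Draw per_class examples from every class (with replacement for small classes)."""
    by_class = defaultdict(list)
    for tokens, label in examples:
        by_class[label].append((tokens, label))
    sample = []
    for label in sorted(by_class):
        items = by_class[label]
        if len(items) >= per_class:
            sample.extend(rng.sample(items, per_class))
        else:
            sample.extend(rng.choices(items, k=per_class))
    return sample

class TinyNaiveBayes:
    """Toy base learner: multinomial Naive Bayes with add-one smoothing, uniform prior."""
    def fit(self, examples):
        self.counts = defaultdict(Counter)
        for tokens, label in examples:
            self.counts[label].update(tokens)
        self.vocab = {t for c in self.counts.values() for t in c}
        return self

    def predict(self, tokens):
        def score(label):
            c = self.counts[label]
            total = sum(c.values()) + len(self.vocab)
            return sum(math.log((c[t] + 1) / total) for t in tokens)
        return max(sorted(self.counts), key=score)

def train_ensemble(examples, n_members=5, per_class=2, seed=0):
    """Train each member on its own class-balanced random sample of the pseudo-data."""
    rng = random.Random(seed)
    return [TinyNaiveBayes().fit(class_balanced_sample(examples, per_class, rng))
            for _ in range(n_members)]

def vote(ensemble, tokens):
    """Combine the members by simple majority vote."""
    ballots = Counter(member.predict(tokens) for member in ensemble)
    return ballots.most_common(1)[0][0]

# (personal description, tweet) pairs; invented toy data.
users = [
    ("High school teacher in Beijing", "grading homework before class today"),
    ("Software engineer", "fixing a bug in the build"),
    ("Backend developer", "deploy the code and fix the bug"),
    ("Android developer", "the build broke, reading code again"),
    ("I like food", "homework"),  # no rule fires, so this user stays unlabeled
]

# Stage 1: rules over descriptions yield skewed pseudo-training data (1 vs. 3 here).
pseudo = [(tweet.split(), rule_label(desc)) for desc, tweet in users if rule_label(desc)]

# Stage 2: an ensemble trained on tweets labels users the rules could not cover.
ensemble = train_ensemble(pseudo)
prediction = vote(ensemble, "another bug in the code".split())
print(prediction)
```

Balancing each member's training sample keeps the majority class from dominating any single classifier, while drawing a different random sample per member lets the ensemble still see most of the majority-class data overall.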

Author information

Authors and Affiliations

China Agricultural University, China, 100083
Ying Chen
Key Lab of Information Network Security, Ministry of Public Security, China, 200031
Bei Pei

Editor information

Editors and Affiliations

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 100190, Beijing, China
Chengqing Zong
Dept. of Computer Science and Operations Research, University of Montreal, Montreal, Quebec, Canada
Jian-Yun Nie
Peking University, Beijing, China
Dongyan Zhao
Institute of Computer Science & Technology, Peking University, 100871, Beijing, China
Yansong Feng

About this paper

Cite this paper

Chen, Y., Pei, B. (2014). Weakly-Supervised Occupation Detection for Micro-blogging Users. In: Zong, C., Nie, JY., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2014. Communications in Computer and Information Science, vol 496. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45924-9_27

DOI: https://doi.org/10.1007/978-3-662-45924-9_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45923-2
Online ISBN: 978-3-662-45924-9
eBook Packages: Computer Science, Computer Science (R0)
