Identifying Implicit Enterprise Users from the Imbalanced Social Data

You, Zhenni; Qian, Tieyun; Zhang, Baochao; Ying, Shi

doi:10.1007/978-3-319-48743-4_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10042)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Implicit enterprise users register as ordinary users but act like enterprise ones, and hence become noise in social media data. Identifying them improves data quality for applications such as user profiling and targeted advertising. Recognizing implicit enterprise users poses two challenges: (1) as a preprocessing step, it must be fast and cheap, and (2) it must cope with the highly skewed distribution of implicit enterprise users to ordinary users, which is about 1:10 on Sina Weibo, a social media site in China. To the best of our knowledge, this problem is so far unexplored.
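As a hedged illustration of why the skew matters (this example is not from the paper), consider a classifier that labels every account as ordinary. At the roughly 1:10 ratio quoted above it scores high accuracy while finding no implicit enterprise users at all. The function name `precision_recall_f1` and the label counts are assumptions chosen only for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels at a 1:10 ratio: 1 = implicit enterprise, 0 = ordinary.
y_true = [1] * 100 + [0] * 1000

# Trivial baseline: predict "ordinary" for every account.
y_majority = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_majority)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_majority)
print(f"accuracy={accuracy:.3f} recall={recall:.3f} f1={f1:.3f}")
# prints: accuracy=0.909 recall=0.000 f1=0.000
```

Accuracy alone therefore says little here; metrics computed on the minority class, such as recall and F1, expose the failure.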

In this paper, we present an efficient class-imbalance learning framework that uses several types of new features drawn from users' profiles. Specifically, a cost-sensitive learning strategy is designed to overcome the problem arising from the skewed data, and a set of novel features is extracted from the profile rather than the main content, which greatly reduces the overhead of crawling and processing microblogs. We conduct extensive experiments on a real data set of 2200 Sina Weibo users (2000 ordinary users and 200 implicit enterprise users). The results demonstrate that our method outperforms the baselines by a large margin.
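The abstract does not spell out the cost-sensitive strategy, so the following is only a minimal sketch of one common form of cost-sensitive learning, not the authors' method: weight each class inversely to its frequency and apply those weights inside the loss. The class counts (2000 and 200) come from the abstract; the weighting formula, the function names `balanced_class_weights` and `weighted_log_loss`, and the constant prediction of 0.05 are assumptions for illustration.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_log_loss(y_true, p_pos, weights, eps=1e-12):
    """Class-weighted binary cross-entropy; p_pos is P(label == 1)."""
    total, norm = 0.0, 0.0
    for y, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        w = weights[y]
        total += -w * (math.log(p) if y == 1 else math.log(1 - p))
        norm += w
    return total / norm


# 0 = ordinary user, 1 = implicit enterprise user (counts from the abstract).
labels = [0] * 2000 + [1] * 200
weights = balanced_class_weights(labels)
print(weights)  # {0: 0.55, 1: 5.5}

# A model that gives every account a 5% chance of being an enterprise user.
probs = [0.05] * len(labels)
uniform = weighted_log_loss(labels, probs, {0: 1.0, 1: 1.0})
weighted = weighted_log_loss(labels, probs, weights)
print(f"unweighted loss={uniform:.3f} weighted loss={weighted:.3f}")
```

With these weights a misclassified enterprise user costs ten times as much as a misclassified ordinary user, so a model that ignores the minority class is no longer rewarded by the loss.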

Acknowledgements

The work described in this paper has been supported in part by the NSFC projects (61572376, 61272275, 61373038), and the 111 project (B07037).

Author information

Authors and Affiliations

State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China
Zhenni You, Tieyun Qian, Baochao Zhang & Shi Ying

Corresponding author

Correspondence to Tieyun Qian.

Editor information

Editors and Affiliations

Poznań University of Economics, Poznan, Poland
Wojciech Cellary
University of Minnesota, Minneapolis, Minnesota, USA
Mohamed F. Mokbel
Tsinghua University, Beijing, China
Jianmin Wang
Victoria University, Melbourne, Victoria, Australia
Hua Wang
Victoria University, Melbourne, Victoria, Australia
Rui Zhou
Victoria University, Melbourne, Victoria, Australia
Yanchun Zhang

About this paper

Cite this paper

You, Z., Qian, T., Zhang, B., Ying, S. (2016). Identifying Implicit Enterprise Users from the Imbalanced Social Data. In: Cellary, W., Mokbel, M., Wang, J., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2016. WISE 2016. Lecture Notes in Computer Science, vol 10042. Springer, Cham. https://doi.org/10.1007/978-3-319-48743-4_8

DOI: https://doi.org/10.1007/978-3-319-48743-4_8
Published: 02 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48742-7
Online ISBN: 978-3-319-48743-4
eBook Packages: Computer Science, Computer Science (R0)
