Abstract
Identifying implicit enterprise users in social media improves data quality for many applications, such as user profiling and targeted advertising. These users register as ordinary users but behave like enterprise accounts, and so they become noise in the data. Recognizing implicit enterprise users poses two challenges: (1) because it is a preprocessing step, it must be fast and cheap, and (2) it must cope with the highly skewed distribution of implicit enterprise users versus ordinary users, which is about 1:10 on Sina Weibo, a Chinese social media site. To the best of our knowledge, this problem has not yet been explored.
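To see why the skew is a challenge, consider a trivial classifier that labels every user as ordinary. With the roughly 1:10 ratio above, it scores high accuracy while finding no implicit enterprise users at all:

```latex
\mathrm{Acc}_{\text{all-ordinary}} = \frac{N_{\text{ord}}}{N_{\text{ord}} + N_{\text{ent}}} \approx \frac{10}{10 + 1} \approx 0.909,
\qquad
\mathrm{Recall}_{\text{ent}} = \frac{0}{N_{\text{ent}}} = 0
```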
In this paper, we present an efficient class-imbalance learning framework that uses several new types of features derived from users' profiles. Specifically, we design a cost-sensitive learning strategy to overcome the problems arising from the skewed data, and we extract a set of novel features from user profiles rather than from the microblog content itself, which greatly reduces the overhead of crawling and processing the microblogs. We conduct extensive experiments on a real data set of 2200 Sina Weibo users (2000 ordinary users and 200 implicit enterprise users). The results demonstrate that our method outperforms the baselines by a large margin.
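The abstract does not specify the cost-sensitive strategy or the classifier. The following is a minimal sketch, assuming inverse-frequency misclassification costs and a logistic-regression learner (both are assumptions, and `cost_weights`, `train_weighted_logreg`, `predict`, and the feature vectors are hypothetical, not the authors' code). It shows how per-class costs rescale each training error so that the 200 implicit enterprise users carry as much total weight as the 2000 ordinary users.

```python
import math


def cost_weights(n_ordinary, n_enterprise):
    """Per-class misclassification cost, inversely proportional to class size.

    Assumed scheme (not taken from the paper): both classes contribute the
    same total weight, so misclassifying one enterprise user costs
    n_ordinary / n_enterprise times more than misclassifying an ordinary user.
    """
    total = n_ordinary + n_enterprise
    return {0: total / (2 * n_ordinary), 1: total / (2 * n_enterprise)}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_weighted_logreg(X, y, weights, lr=0.5, epochs=300):
    """Batch gradient descent on the cost-weighted logistic (log) loss.

    X: list of feature vectors (hypothetical profile features),
    y: labels (1 = implicit enterprise user, 0 = ordinary user),
    weights: dict mapping label -> cost, e.g. from cost_weights().
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    # Normalize by the total weight so the step size does not depend on data size.
    norm = sum(weights[label] for label in y)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Gradient of the weighted log loss: each error is scaled by its class cost.
            err = weights[yi] * (p - yi)
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / norm for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / norm
    return w, b


def predict(w, b, x):
    """Return 1 (implicit enterprise user) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


if __name__ == "__main__":
    costs = cost_weights(2000, 200)
    print(costs)  # {0: 0.55, 1: 5.5}
```

With the 2000:200 split, the costs come out to 0.55 per ordinary user and 5.5 per enterprise user, so one missed enterprise user weighs as much as ten misclassified ordinary users. Passing `{0: 1.0, 1: 1.0}` instead recovers plain, cost-insensitive logistic regression, which drifts toward predicting the majority class.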
Acknowledgements
The work described in this paper has been supported in part by the NSFC projects (61572376, 61272275, 61373038), and the 111 project (B07037).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
You, Z., Qian, T., Zhang, B., Ying, S. (2016). Identifying Implicit Enterprise Users from the Imbalanced Social Data. In: Cellary, W., Mokbel, M., Wang, J., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2016. WISE 2016. Lecture Notes in Computer Science, vol 10042. Springer, Cham. https://doi.org/10.1007/978-3-319-48743-4_8
DOI: https://doi.org/10.1007/978-3-319-48743-4_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48742-7
Online ISBN: 978-3-319-48743-4
eBook Packages: Computer Science, Computer Science (R0)