Finding the True Crowds: User Filtering in Microblogs

Hao, Bin; Zhang, Min; Ma, Weizhi; Sun, Jiashen; Liu, Yiqun; Ma, Shaoping; Zhu, Xuan; Luo, Hengliang

doi:10.1007/978-3-319-50496-4_50

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10102)

Abstract

Users increasingly share their opinions on products, services, and policies on social media, which gives manufacturers and governments a valuable channel for collecting feedback from the crowd. In microblogs, however, information is highly unbalanced: many posts are published and spread by ghost-writers/spammers, sellers, official accounts, and similar accounts, so information from the true crowd is frequently overwhelmed. Previous studies mostly focus on finding one specific type of user; they do not investigate how to filter out multiple types of users so that only the true crowd remains, which is the main topic of this work. In this paper, we first present a categorization of four types of users, namely ghost-writers, sellers, official accounts, and end-users (the first three are referred to as advertisers in a broad sense), and study their characteristics. We then propose a Topic-Specific Divergence based model that filters out advertisers so that end-users are kept. Meta-information and content are investigated in a comparative analysis. Encouraging experimental results on a real dataset show that the proposed approach significantly outperforms state-of-the-art methods.
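
The abstract names a divergence-based model but does not define it, so the following is only an illustrative sketch of the general idea of scoring a user by how far their word usage on a topic departs from the topic as a whole. It is not the authors' model: the choice of Jensen-Shannon divergence, the add-one smoothing, whitespace tokenization, and all function names (`word_distribution`, `topic_divergence`, and so on) are assumptions made for this example.

```python
import math
from collections import Counter


def word_distribution(posts, vocabulary):
    """Add-one-smoothed word probabilities over a fixed vocabulary (assumed scheme)."""
    counts = Counter(
        w for post in posts for w in post.lower().split() if w in vocabulary
    )
    total = sum(counts.values()) + len(vocabulary)
    return {w: (counts[w] + 1) / total for w in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) in bits; p and q share keys and are strictly positive."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded by 1."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def topic_divergence(user_posts, topic_posts):
    """Hypothetical per-user score: divergence of a user's words from the topic's."""
    vocabulary = sorted(
        {w for post in topic_posts + user_posts for w in post.lower().split()}
    )
    p = word_distribution(user_posts, vocabulary)
    q = word_distribution(topic_posts, vocabulary)
    return js_divergence(p, q)


if __name__ == "__main__":
    # Toy data invented for illustration only.
    topic = ["phone battery is good", "phone screen is nice", "battery lasts long"]
    print(topic_divergence(["phone battery is good"], topic))
    print(topic_divergence(["buy cheap discount link click"], topic))
```

In a sketch like this, a score of this kind would be one feature among others (such as meta-information) fed to a classifier; how the paper actually combines them is not stated in the abstract.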

Author information

Authors and Affiliations

Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, China
Bin Hao, Min Zhang, Weizhi Ma, Yiqun Liu & Shaoping Ma
Samsung R&D Institute, Beijing, China
Jiashen Sun, Xuan Zhu & Hengliang Luo

Corresponding author

Correspondence to Bin Hao.

Editor information

Editors and Affiliations

Microsoft Research Asia, Beijing, China
Chin-Yew Lin
Brandeis University, Waltham, Massachusetts, USA
Nianwen Xue
Peking University, Beijing, China
Dongyan Zhao
Fudan University, Shanghai, China
Xuanjing Huang
Peking University, Beijing, China
Yansong Feng

About this paper

Cite this paper

Hao, B. et al. (2016). Finding the True Crowds: User Filtering in Microblogs. In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL 2016, NLPCC 2016. Lecture Notes in Computer Science, vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_50

DOI: https://doi.org/10.1007/978-3-319-50496-4_50
Published: 02 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50495-7
Online ISBN: 978-3-319-50496-4
eBook Packages: Computer Science, Computer Science (R0)
