Identifying Implicit Enterprise Users from the Imbalanced Social Data

  • Conference paper
Web Information Systems Engineering – WISE 2016 (WISE 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10042)

Abstract

Identifying implicit enterprise users in social media improves data quality for applications such as user profiling and targeted advertising. These users register as ordinary users but behave like enterprise accounts, and hence become noise in the data. Recognizing them poses two challenges: (1) because the task is a preprocessing step, it must be fast and cheap, and (2) the class distribution is highly skewed, with roughly one implicit enterprise user for every ten ordinary users on the Chinese social media site Sina Weibo. To the best of our knowledge, this problem has not been explored so far.

In this paper, we present an efficient class-imbalance learning framework that uses several new types of features drawn from users’ profiles. Specifically, a cost-sensitive learning strategy is designed to overcome the problems arising from the skewed data, and a set of novel features is extracted from the profile rather than from the main content, which greatly reduces the overhead of crawling and processing microblogs. We conduct extensive experiments on a real data set of 2200 Sina Weibo users (2000 ordinary users and 200 implicit enterprise users). The results demonstrate that our method outperforms the baselines by a large margin.
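The abstract does not spell out the cost-sensitive strategy, so the following is only a generic illustration of the idea, not the authors' method. It assumes the 2000/200 class split stated above, uses the common "balanced" weighting heuristic (total samples divided by the number of classes times the class count), and assumes a hypothetical 10:1 misclassification cost ratio. The function names and the cost values are illustrative choices.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive proportionally larger weights, so errors on
    them contribute more to a weighted training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def cost_sensitive_threshold(cost_fp, cost_fn):
    """Probability above which predicting the positive class is cheaper.

    Predicting positive has expected cost (1 - p) * cost_fp; predicting
    negative has expected cost p * cost_fn. Positive is the cheaper
    choice when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def predict_enterprise(prob_enterprise, cost_fp=1.0, cost_fn=10.0):
    """Return True when labeling the user as enterprise minimizes expected cost.

    The default costs are an assumption: missing an enterprise user is
    treated as ten times worse than flagging an ordinary one.
    """
    return prob_enterprise > cost_sensitive_threshold(cost_fp, cost_fn)


# Class split taken from the abstract: 2000 ordinary, 200 implicit enterprise.
labels = ["ordinary"] * 2000 + ["enterprise"] * 200
weights = balanced_class_weights(labels)
threshold = cost_sensitive_threshold(cost_fp=1.0, cost_fn=10.0)

print(weights)
print(round(threshold, 4))
```

With these assumed costs the decision threshold drops from 0.5 to 1/11, so a user with a predicted enterprise probability of 0.2 would already be flagged. Weighting the loss and shifting the threshold are two standard ways to apply unequal costs; which one, if either, matches the paper's design cannot be determined from the abstract alone.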

Acknowledgements

The work described in this paper has been supported in part by the NSFC projects (61572376, 61272275, 61373038), and the 111 project (B07037).

Author information

Corresponding author

Correspondence to Tieyun Qian.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

You, Z., Qian, T., Zhang, B., Ying, S. (2016). Identifying Implicit Enterprise Users from the Imbalanced Social Data. In: Cellary, W., Mokbel, M., Wang, J., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2016. WISE 2016. Lecture Notes in Computer Science, vol 10042. Springer, Cham. https://doi.org/10.1007/978-3-319-48743-4_8

  • DOI: https://doi.org/10.1007/978-3-319-48743-4_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48742-7

  • Online ISBN: 978-3-319-48743-4

  • eBook Packages: Computer Science, Computer Science (R0)
