Person, Organization, or Personage: Towards User Account Type Prediction in Microblogs

  • Conference paper
Electronic Governance and Open Society: Challenges in Eurasia (EGOSE 2018)

Abstract

During the past decade, microblog services have been used extensively by millions of business and private users as one of the most powerful information broadcasting tools. Twitter, for example, has attracted many social science researchers because of its high popularity, its constrained format of thought expression, and its ability to react to current trends. However, unstructured data from microblogs often lack representativeness because of the tremendous amount of noise they contain. Such noise is often introduced by the activity of organizational and fake user accounts, which may not be useful in many application domains. To tackle this information filtering problem, in this paper we classify Twitter accounts into three categories: “Personal”, “Organization”, and “Personage”. Specifically, we use various text-based data representation approaches to extract features for our proposed microblog account type prediction framework, “POP-MAP”. To study the problem at a cross-language level, we harvested and learned from a multilingual Twitter dataset, which allows us to achieve better classification performance than various state-of-the-art baselines.
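As a rough illustration of the task described above, the following sketch trains a three-class text classifier over the categories named in the abstract. It is not the POP-MAP framework: the bag-of-words representation, the multinomial Naive Bayes model, the class name `NaiveBayesAccountClassifier`, and the toy training texts are all assumptions made for this example, since the abstract does not specify the features or the learner.

```python
import math
import re
from collections import Counter, defaultdict

# The three account categories named in the abstract.
LABELS = ("Personal", "Organization", "Personage")


def tokenize(text):
    """Lowercase the text and split it into word tokens (bag-of-words)."""
    return re.findall(r"\w+", text.lower())


class NaiveBayesAccountClassifier:
    """Hypothetical multinomial Naive Bayes over word counts (illustration only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of accounts
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: one string stands in for the concatenated tweets of an account.
train_texts = [
    "had coffee with my friends today, so tired",
    "my weekend trip was great, love my family",
    "our new store opens monday, visit our website for offers",
    "we are hiring, apply on our careers page",
    "meow, i am a grumpy cat and i hate mondays",
    "i am a wizard, magic spells every day",
]
train_labels = [
    "Personal", "Personal",
    "Organization", "Organization",
    "Personage", "Personage",
]

clf = NaiveBayesAccountClassifier().fit(train_texts, train_labels)
print(clf.predict("we are opening a new store, visit our website"))
```

A real system would need far more data per class and, for the cross-language setting mentioned in the abstract, language-aware tokenization; the sketch only shows the overall shape of text-based account type prediction.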

Notes

  1. http://twitter.com/adidas.

  2. http://twitter.com/dbsbank.

  3. http://twitter.com/hermitage_eng.

  4. http://twitter.com/realgrumpycat.

  5. http://twitter.com/arrypottah.

  6. http://twitter.com/facebook.

  7. http://twitter.com/google.

  8. http://twitter.com/yandex.

  9. http://twitter.com/vkontakte.

  10. http://twitter.com/zeppos.

  11. http://twitter.com/durov.

  12. http://twitter.com/jimmy_wales.

  13. http://dict.ruslang.ru/freq.php.

  14. http://www.wordfrequency.info.

  15. http://languagetool.org.

  16. http://twitter.com/yandextaxi.

  17. http://twitter.com/yandexmarket.

  18. http://www.cs.waikato.ac.nz/ml/weka/.

Author information

Corresponding author

Correspondence to Andrey Filchenkov.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Samborskii, I., Filchenkov, A., Korneev, G., Farseev, A. (2019). Person, Organization, or Personage: Towards User Account Type Prediction in Microblogs. In: Chugunov, A., Misnikov, Y., Roshchin, E., Trutnev, D. (eds) Electronic Governance and Open Society: Challenges in Eurasia. EGOSE 2018. Communications in Computer and Information Science, vol 947. Springer, Cham. https://doi.org/10.1007/978-3-030-13283-5_9

  • DOI: https://doi.org/10.1007/978-3-030-13283-5_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-13282-8

  • Online ISBN: 978-3-030-13283-5

  • eBook Packages: Computer Science, Computer Science (R0)