
Community Trolling: An Active Learning Approach for Topic Based Community Detection in Big Data

Journal of Grid Computing

Abstract

Community detection plays an important role in the creation and transfer of information. Active learning has recently been employed to improve the performance of community detection techniques, because it provides a semi-automatic approach to selectively sampling data. Based on this, a community trolling approach for topic-based community detection in big data is proposed. Community trolling uses active learning to selectively sample the data relevant to the current context from polluted big data. The resulting fine-tuned data is then used to study a community and its sub-communities. Applied as a precursor to community detection, community trolling reduces a huge, unreliable dataset to a reliable one and leads to better prediction of community elements such as important topics and important entities. Finally, the effectiveness of the approach is evaluated on a real-world Tumblr dataset. The results illustrate that community trolling provides a richer dataset, resulting in more appropriate communities.
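The selective-sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a classifier or a query strategy, so the Naive Bayes model, the uncertainty-sampling rule, the function name `troll_pool`, the keyword-based `oracle` (standing in for a human annotator), and the toy posts are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class TinyNaiveBayes:
    """Binary multinomial Naive Bayes with Laplace smoothing (assumed model)."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def prob_relevant(self, text):
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for c in (0, 1):
            # Smoothed class prior.
            score = math.log((self.class_counts[c] + 1) / (total_docs + 2))
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][word] + 1) / denom)
            log_scores[c] = score
        # Normalise in log space to avoid underflow.
        m = max(log_scores.values())
        e0 = math.exp(log_scores[0] - m)
        e1 = math.exp(log_scores[1] - m)
        return e1 / (e0 + e1)


def troll_pool(pool, seed_texts, seed_labels, oracle, budget):
    """Hypothetical helper: pool-based active learning, then filtering.

    Each round queries the oracle about the post whose predicted relevance
    is closest to 0.5 (uncertainty sampling), retrains, and finally keeps
    the pool posts that the model predicts to be relevant.
    """
    texts, labels = list(seed_texts), list(seed_labels)
    unlabeled = list(pool)
    model = TinyNaiveBayes().fit(texts, labels)
    for _ in range(min(budget, len(unlabeled))):
        query = min(unlabeled, key=lambda t: abs(model.prob_relevant(t) - 0.5))
        unlabeled.remove(query)
        texts.append(query)
        labels.append(oracle(query))  # 1 = relevant to the context, 0 = noise
        model = TinyNaiveBayes().fit(texts, labels)
    kept = [t for t in pool if model.prob_relevant(t) >= 0.5]
    return kept, model


# Toy data (invented for illustration): the context is photography.
pool = [
    "new camera lens review",
    "buy cheap followers now",
    "camera settings for portraits",
    "cheap pills buy now",
]
seed_texts = ["portrait photography camera tips", "buy cheap watches"]
seed_labels = [1, 0]
oracle = lambda post: int("camera" in post)  # stand-in for a human annotator

kept, model = troll_pool(pool, seed_texts, seed_labels, oracle, budget=2)
```

With this toy input, `kept` holds the two camera-related posts and the two spam-like posts are discarded. A topic-based community detection method would then run on `kept` rather than on the full pool.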



Author information


Correspondence to Preeti Gupta.


Cite this article

Gupta, P., Jindal, R. & Sharma, A. Community Trolling: An Active Learning Approach for Topic Based Community Detection in Big Data. J Grid Computing 16, 553–567 (2018). https://doi.org/10.1007/s10723-018-9457-z


  • DOI: https://doi.org/10.1007/s10723-018-9457-z
