Community Trolling: An Active Learning Approach for Topic Based Community Detection in Big Data

Gupta, Preeti; Jindal, Rajni; Sharma, Arun

doi:10.1007/s10723-018-9457-z

Published: 10 August 2018

Volume 16, pages 553–567, (2018)

Journal of Grid Computing

Preeti Gupta¹,
Rajni Jindal² &
Arun Sharma¹

Abstract

Community detection plays an important role in the creation and transfer of information. Active learning has recently been employed to improve the performance of community detection techniques, because it provides a semi-automatic approach to the selective sampling of data. Building on this, a community trolling approach for topic-based community detection in big data is proposed. Community trolling uses active learning to selectively sample the data relevant to the current context from polluted big data. The resulting fine-tuned data is then used to study a community and its sub-communities. Used as a precursor to community detection, community trolling reduces a huge, unreliable dataset to a reliable one and leads to better prediction of community elements such as important topics and important entities. Finally, the effectiveness of the approach was evaluated by implementing it on a real-world Tumblr dataset. The results illustrate that community trolling provides a richer dataset, resulting in more appropriate communities.
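To make the idea of selective sampling with active learning concrete, the following is a minimal sketch and not the authors' implementation. It assumes a pool-based setting with uncertainty sampling, a small hand-written Naive Bayes relevance classifier, and a labelling oracle; the function names (`train_nb`, `prob_relevant`, `active_sample`), the toy posts, and the query budget are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train_nb(labeled):
    """Fit a two-class multinomial Naive Bayes model (1 = on-topic, 0 = off-topic)."""
    counts = {0: Counter(), 1: Counter()}
    docs = {0: 0, 1: 0}
    for text, label in labeled:
        counts[label].update(tokenize(text))
        docs[label] += 1
    vocab = set(counts[0]) | set(counts[1])
    return counts, docs, vocab

def prob_relevant(model, text):
    """Return P(on-topic | text) using add-one smoothing; unseen words are ignored."""
    counts, docs, vocab = model
    total_docs = docs[0] + docs[1]
    log_post = {}
    for c in (0, 1):
        lp = math.log((docs[c] + 1) / (total_docs + 2))  # smoothed class prior
        denom = sum(counts[c].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:
                lp += math.log((counts[c][tok] + 1) / denom)
        log_post[c] = lp
    top = max(log_post.values())  # subtract the max for numerical stability
    e0 = math.exp(log_post[0] - top)
    e1 = math.exp(log_post[1] - top)
    return e1 / (e0 + e1)

def active_sample(seed, pool, oracle, budget):
    """Pool-based active learning with uncertainty sampling.

    Each round retrains the classifier, asks the oracle to label the post
    whose predicted relevance is closest to 0.5, and adds it to the
    training set. Afterwards the pool is filtered down to relevant posts.
    """
    labeled = list(seed)
    unlabeled = list(pool)
    queried = []
    for _ in range(min(budget, len(unlabeled))):
        model = train_nb(labeled)
        query = min(unlabeled, key=lambda t: abs(prob_relevant(model, t) - 0.5))
        unlabeled.remove(query)
        label = oracle(query)
        labeled.append((query, label))
        queried.append((query, label))
    model = train_nb(labeled)
    kept = [t for t, y in queried if y == 1]
    kept += [t for t in unlabeled if prob_relevant(model, t) >= 0.5]
    return model, labeled, kept

# Hypothetical toy data: a music-related topic polluted with spam posts.
seed = [("new album tour dates announced", 1),
        ("buy cheap watches online now", 0)]
truth = {
    "band announced new tour": 1,
    "cheap watches buy now": 0,
    "album review and tour photos": 1,
    "click here cheap online deals": 0,
    "concert tickets for the tour": 1,
}
pool = list(truth)

# The oracle stands in for a human annotator; here it just looks up the truth.
model, labeled, kept = active_sample(seed, pool, truth.__getitem__, budget=2)
print(kept)
```

In this sketch only two of the five pooled posts are sent to the oracle; the rest are filtered by the retrained classifier. The reduced list `kept` plays the role of the cleaner dataset that a later community detection step would consume.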

Author information

Authors and Affiliations

Department of IT, Indira Gandhi Delhi Technical University for Women (IGDTUW), Kashmere Gate, Delhi, 110006, India
Preeti Gupta & Arun Sharma
Department of CSE, Delhi Technological University (DTU), Delhi, India
Rajni Jindal

Corresponding author

Correspondence to Preeti Gupta.

About this article

Cite this article

Gupta, P., Jindal, R. & Sharma, A. Community Trolling: An Active Learning Approach for Topic Based Community Detection in Big Data. J Grid Computing 16, 553–567 (2018). https://doi.org/10.1007/s10723-018-9457-z

Received: 25 August 2017
Accepted: 01 August 2018
Published: 10 August 2018
Issue Date: December 2018
DOI: https://doi.org/10.1007/s10723-018-9457-z
