Abstract
Community detection plays an important role in the creation and transfer of information. Active learning has recently been employed to improve the performance of community detection techniques, as it provides a semi-automatic approach to selective sampling of data. Building on this, a community trolling approach for topic-based community detection in big data is proposed. Community trolling uses active learning to selectively sample the data relevant to the current context from polluted big data. The fine-tuned data is then used to study a community and its sub-communities. Used as a precursor to community detection, community trolling reduces a huge, unreliable dataset to a reliable one and results in better prediction of community elements such as important topics and important entities. Finally, the effectiveness of the approach was evaluated by implementing it on a real-world Tumblr dataset. The results illustrate that community trolling provides a richer dataset, resulting in more appropriate communities.
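The selective-sampling idea in the abstract can be illustrated with a minimal sketch. It assumes pool-based active learning with uncertainty sampling and a tiny naive Bayes relevance classifier; the abstract does not say which classifier or query strategy the authors use, so this is not their implementation. The names `active_filter`, `oracle`, `seed`, `pool`, and `budget` are hypothetical, and `oracle` stands in for a human annotator.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled):
    """Fit a tiny naive Bayes model from (text, is_relevant) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, label in labeled:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, class_counts


def prob_relevant(model, text):
    """Estimate P(relevant | text) using add-one smoothing."""
    word_counts, class_counts = model
    vocab_size = len(set(word_counts[True]) | set(word_counts[False]))
    n_docs = sum(class_counts.values())
    log_score = {}
    for label in (True, False):
        total = sum(word_counts[label].values())
        score = math.log((class_counts[label] + 1) / (n_docs + 2))
        for word in tokenize(text):
            # The extra +1 reserves probability mass for unseen words.
            score += math.log((word_counts[label][word] + 1)
                              / (total + vocab_size + 1))
        log_score[label] = score
    top = max(log_score.values())
    e_true = math.exp(log_score[True] - top)
    e_false = math.exp(log_score[False] - top)
    return e_true / (e_true + e_false)


def active_filter(seed, pool, oracle, budget):
    """Return the pool items judged relevant after at most `budget` queries.

    Assumption: each round queries the item whose predicted relevance is
    closest to 0.5 (uncertainty sampling), then retrains on the new label.
    """
    labeled = list(seed)
    unlabeled = list(pool)
    answers = {}
    for _ in range(min(budget, len(unlabeled))):
        model = train(labeled)
        query = min(unlabeled,
                    key=lambda t: abs(prob_relevant(model, t) - 0.5))
        unlabeled.remove(query)
        answers[query] = oracle(query)  # ask the (hypothetical) annotator
        labeled.append((query, answers[query]))
    model = train(labeled)
    relevant = []
    for text in pool:
        if text in answers:
            keep = answers[text]
        else:
            keep = prob_relevant(model, text) >= 0.5
        if keep:
            relevant.append(text)
    return relevant


# Toy usage: made-up posts, with a keyword rule standing in for the annotator.
seed = [("music concert band tour", True), ("buy cheap pills now", False)]
pool = ["band tour dates music", "cheap pills discount now",
        "concert tickets band", "win cheap prize now"]
oracle = lambda t: any(w in tokenize(t) for w in ("music", "band", "concert"))
print(active_filter(seed, pool, oracle, budget=2))
```

The reduced list returned by `active_filter` plays the role of the "fine-tuned" dataset that a topic-based community detection step would then consume. Querying only the most uncertain items is one common way to keep annotation effort small; other query strategies would fit the same loop.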
Cite this article
Gupta, P., Jindal, R. & Sharma, A. Community Trolling: An Active Learning Approach for Topic Based Community Detection in Big Data. J Grid Computing 16, 553–567 (2018). https://doi.org/10.1007/s10723-018-9457-z
DOI: https://doi.org/10.1007/s10723-018-9457-z