ABSTRACT
Classification is one of the most studied subjects in machine learning. Most classification methods developed over the last decade account for either structure (interactions, relationships) or attributes (text, numerical values, etc.), but not both. As a result, they miss significant patterns in a dataset that can only be captured by analyzing the features of an item together with its interactions. Collective classification methods use both structure and attributes, often by aggregating data from the neighbors of a node and learning a model on the aggregated data. In social networks, the degree distribution of nodes follows a power law: a few nodes have many neighbors, while many nodes have very few edges. High-degree nodes also receive incoming links from low-degree nodes of different classes. Hence, using only local structure may lead to poor predictions. In addition, many social networks allow for different types of interactions (retweet, reply, like, etc.) that affect classification differently. This article proposes a collective classification method that uses the structure of the network to determine the neighbors of each node. It then presents experiments aimed at detecting jihadi propagandists and malware distributors on social networks.
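The neighbor-aggregation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the method proposed in the article; it only shows the generic baseline of averaging the attributes of a node's in-neighbors, separately for each interaction type, and appending the result to the node's own attributes. The function name `aggregate_neighbors`, the edge format `(source, target, type)`, and the toy data are assumptions made for illustration.

```python
def aggregate_neighbors(features, edges, edge_types):
    """Return, per node, its own attributes followed by the mean attribute
    vector of its in-neighbors for each edge type (zeros if it has none).

    features:   dict node -> list of floats (all the same length)
    edges:      iterable of (source, target, edge_type) tuples (assumed format)
    edge_types: ordered list of edge types, e.g. ["retweet", "reply"]
    """
    dim = len(next(iter(features.values())))
    # Running sums and counts of neighbor attributes, per node and edge type.
    sums = {n: {t: [0.0] * dim for t in edge_types} for n in features}
    counts = {n: {t: 0 for t in edge_types} for n in features}

    for src, dst, etype in edges:
        # An edge src -> dst makes src an in-neighbor of dst.
        for i, value in enumerate(features[src]):
            sums[dst][etype][i] += value
        counts[dst][etype] += 1

    aggregated = {}
    for node, own in features.items():
        row = list(own)
        for t in edge_types:
            c = counts[node][t]
            row.extend(s / c if c else 0.0 for s in sums[node][t])
        aggregated[node] = row
    return aggregated


# Toy example (hypothetical data): three accounts, two interaction types.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
edges = [("a", "c", "retweet"), ("b", "c", "retweet"), ("a", "b", "reply")]
rows = aggregate_neighbors(features, edges, ["retweet", "reply"])
```

The resulting rows can be passed to any attribute-based classifier. Keeping one block of aggregated values per interaction type lets the downstream model weight retweets, replies, and likes differently, which is the concern the abstract raises about heterogeneous interactions.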
- Collective classification in social networks