DOI: 10.1145/3110025.3110128
research-article

Collective classification in social networks

Published: 31 July 2017

ABSTRACT

Classification is one of the most studied problems in machine learning. Most classification methods developed over the last decade account for either structure (interactions, relationships) or attributes (text, numerical values, etc.), but not both. They therefore miss significant patterns in a dataset that can only be captured by analyzing an item's features together with its interactions. Collective classification methods use both structure and attributes, often by aggregating data from the neighbors of a node and learning a model on the aggregated data. In social networks, the degree distribution of nodes follows a power law: a few nodes have many neighbors, and many nodes have very few edges. High-degree nodes also receive incoming links from low-degree nodes of different classes. Hence, using only local structure may lead to poor predictions. In addition, many social networks allow for different types of interactions (retweet, reply, like, etc.) that affect classification differently. This article proposes a collective classification method that uses the structure of the network to determine each node's neighbors. It then presents experiments aimed at detecting jihadi propagandists and malware distributors on social networks.
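
The neighbor-aggregation step that the abstract attributes to collective classification methods in general can be sketched as follows. This is a minimal illustration and not the method proposed in the paper: the function name `aggregate_neighbor_features`, the choice of the mean as aggregate, the zero padding for nodes without neighbors, and the per-interaction-type layout are all assumptions made for the example.

```python
from collections import defaultdict


def aggregate_neighbor_features(features, edges, edge_types):
    """Append, per edge type, the mean feature vector of a node's in-neighbors.

    features:   dict mapping node -> list of floats (all of equal length)
    edges:      iterable of (source, target, edge_type) triples
    edge_types: ordered list of edge types to aggregate over (hypothetical names)
    """
    dim = len(next(iter(features.values())))

    # neighbors[edge_type][target] -> list of source nodes linking to target
    neighbors = {t: defaultdict(list) for t in edge_types}
    for source, target, edge_type in edges:
        if edge_type in neighbors:  # ignore interaction types we were not asked about
            neighbors[edge_type][target].append(source)

    aggregated = {}
    for node, own in features.items():
        row = list(own)  # start from the node's own attributes
        for t in edge_types:
            sources = neighbors[t].get(node, [])
            if sources:
                mean = [
                    sum(features[s][i] for s in sources) / len(sources)
                    for i in range(dim)
                ]
            else:
                mean = [0.0] * dim  # assumption: pad with zeros when no neighbors exist
            row.extend(mean)
        aggregated[node] = row
    return aggregated


# Toy network with two interaction types (illustrative data only).
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
edges = [("a", "c", "retweet"), ("b", "c", "retweet"), ("a", "b", "reply")]
rows = aggregate_neighbor_features(features, edges, ["retweet", "reply"])
```

Each resulting row holds the node's own attributes followed by one aggregated block per interaction type, so any standard classifier can be trained on it. Keeping the interaction types in separate blocks lets the model weight retweets, replies, and likes differently, which is the concern the abstract raises.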


Published in: ASONAM '17: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017
July 2017, 698 pages
ISBN: 9781450349932
DOI: 10.1145/3110025

Copyright © 2017 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article, Research, Refereed limited

Overall acceptance rate: 116 of 549 submissions, 21%
