A Weighted Artificial Bee Colony algorithm for influence maximization

https://doi.org/10.1016/j.osnem.2021.100167

Abstract

Social media platforms are increasingly used to convey advertising campaigns for products or services. A key issue is to identify an appropriate set of influencers within a social network and invest resources to get them to adopt a product. Influence maximization is an optimization problem that aims at finding a small set of users that maximizes the spread of influence in a social network. In this paper we propose an influence maximization algorithm, named Weighted Artificial Bee Colony (WABC), that is based on a bio-inspired technique for identifying a subset of users which maximizes the spread. The proposed algorithm has been applied to a case study that analyzes the propagation of information among Twitter users during the Constitutional Referendum held in Italy in 2016. Our analysis is aimed at identifying the main influencers of the yes and no factions, and deriving the main information diffusion strategies of each faction during the political campaign. WABC outperformed ranking-proxy techniques based on classical centrality measures, i.e., PageRank, Rank and Degree. Even compared to DIRIE, which exploits a more complex algorithm, WABC was able to find a more accurate set of users that maximizes the spread in almost all the considered configurations.

Introduction

Millions of people every day interact on social media platforms by generating large amounts of data [1], which can be exploited for extracting valuable information in different application contexts, such as information diffusion [2], sentiment [3] and opinion mining [4], [5], news gathering [6] and misinformation blocking [7].

A very active research area that seeks to exploit the data available on social media is viral marketing. Viral marketing or viral advertising is a business strategy that uses social media to promote a product or service. An efficient way for performing a good marketing campaign is to identify an appropriate set of influencers among users and invest resources to make them adopt a product/service. This can lead to a cascade process, influencing consumer preferences in a large part of the network [8], [9].

Influence maximization is an optimization problem that aims at finding a small set of users that maximizes the spread of influence in a social network [10]. Initially proposed as a stochastic optimization problem in [11], it consists in identifying a set of k users with the greatest overall influence, by analyzing the structure of the network and user interconnections, as well as user-specific features such as demographic properties [12].

Influence maximization is an NP-Hard problem, with two sources of hardness: (i) the complexity of computing the spread, i.e., the number of influenced users; (ii) the combinatorial nature of identifying, among all possible combinations of users, the solution that maximizes the influence. For this reason, implementing efficient influence maximization algorithms requires the use of heuristic methods and also of parallel computing models. An effective parallel computing paradigm in this setting is the Bulk Synchronous Parallel (BSP) model, which simplifies the implementation of parallel applications by exploiting distributed-memory parallelism. An efficient implementation of BSP is provided by the Apache Hama framework.
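To make the first source of hardness concrete, the following minimal sketch estimates the spread of a seed set by Monte Carlo simulation. It assumes the Independent Cascade diffusion model and a toy graph stored as a dictionary of edge probabilities; the function names (`simulate_ic`, `estimate_spread`), the toy graph and the number of runs are illustrative assumptions, not part of the WABC implementation.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one Independent Cascade simulation and return the activated nodes.

    `graph` maps each node u to a dict {v: p(u, v)} (an assumed representation).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                # Each newly active node gets one chance to activate each neighbour.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Average the number of activated nodes over `runs` simulations."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs))
    return total / runs


# Hypothetical toy graph: a -> b (p=1.0), a -> c (p=0.5), b -> d (p=1.0).
toy_graph = {
    "a": {"b": 1.0, "c": 0.5},
    "b": {"d": 1.0},
    "c": {},
    "d": {},
}

if __name__ == "__main__":
    print(estimate_spread(toy_graph, ["a"]))
```

On this toy graph the exact expected spread of the seed set {a} is 3.5 (a, b and d are always active, c with probability 0.5), and the estimate approaches that value as the number of runs grows. Since every candidate seed set requires many such simulations, the cost of evaluating the spread multiplies with the number of candidate sets, which is the second source of hardness.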

This paper describes the functioning and the implementation of an influence maximization algorithm, namely Weighted Artificial Bee Colony (WABC), aimed at identifying a subset of users which maximizes the spread. It follows a bio-inspired approach built on the Artificial Bee Colony algorithm [13], which has been adapted to the influence maximization task [14], and introduces several changes and improvements with respect to previous related work. In particular, the proposed algorithm exploits an effective approach to evaluate the fitness value, which can be considered as the resolution of a reachability problem centered on the paths of maximum probability. We also address the influence overlap problem of classical influence ranking-proxy algorithms, avoiding the negative effects caused by influence redundancy during the maximization process. Moreover, the proposed algorithm is less sensitive to parameter tuning than related work, as it dynamically sets the depth at which to explore the graph, focusing more on the most promising paths. All of these factors contribute to making the model able to produce an accurate estimate of the total spread for the final seed set, which is useful for estimating the number of users who will actually be influenced.
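The idea of a reachability problem centered on maximum-probability paths can be illustrated with a short sketch. This is not the WABC fitness function, whose details are given in the full paper; it is an assumed simplification in which the score of a seed set is the sum, over all reachable nodes, of the probability of the most probable path from any seed. The names `max_path_probabilities` and `path_based_fitness` and the toy graph are hypothetical.

```python
import heapq


def max_path_probabilities(graph, seeds):
    """Return, for each reachable node, the probability of its best path from a seed.

    Uses a Dijkstra-style search that maximizes the product of edge probabilities.
    This is valid because every edge probability lies in [0, 1], so extending a
    path can never increase its probability.
    `graph` maps each node u to a dict {v: p(u, v)} (an assumed representation).
    """
    best = {s: 1.0 for s in seeds}
    heap = [(-1.0, s) for s in seeds]  # max-heap via negated probabilities
    heapq.heapify(heap)
    while heap:
        neg_p, u = heapq.heappop(heap)
        p_u = -neg_p
        if p_u < best.get(u, 0.0):
            continue  # stale heap entry: a better path to u was already found
        for v, p_uv in graph.get(u, {}).items():
            candidate = p_u * p_uv
            if candidate > best.get(v, 0.0):
                best[v] = candidate
                heapq.heappush(heap, (-candidate, v))
    return best


def path_based_fitness(graph, seeds):
    """Illustrative score: sum of best-path probabilities over reachable nodes."""
    return sum(max_path_probabilities(graph, seeds).values())


# Hypothetical toy graph: the indirect path a -> c -> b (0.9 * 0.8 = 0.72)
# is more probable than the direct edge a -> b (0.5).
toy_graph = {
    "a": {"b": 0.5, "c": 0.9},
    "b": {"d": 0.5},
    "c": {"b": 0.8},
    "d": {},
}

if __name__ == "__main__":
    print(max_path_probabilities(toy_graph, ["a"]))
    print(path_based_fitness(toy_graph, ["a"]))
```

Because each node is counted once regardless of how many seeds can reach it, a score of this kind does not reward redundant seeds whose neighbourhoods overlap, which is one simple way to picture the influence overlap issue mentioned above.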

The WABC algorithm has been applied to a case study that analyzes the propagation of information on Twitter during the Constitutional Referendum held in Italy in 2016, with the aim of identifying the main influencers of the two factions, i.e., yes and no, and deriving the main information diffusion strategies of each faction during the political campaign. We experimentally evaluated the accuracy of the WABC algorithm through its implementation in Apache Hama. To analyze qualitative aspects, we classified the identified influencers according to their profile (journalistic page, political activist, popular or normal user) to better determine the type of political campaign, and we carried out several simulations to measure their influence strength. As for the quantitative analysis, we compared the obtained results with both the standard ABC algorithm and other related state-of-the-art techniques in terms of computing time, evaluated spread and relative error on the expected spread. Specifically, WABC turned out to be more time consuming than its classical version (ABC), but much more accurate in determining the expected spread, with a decrease of up to 24% in the relative estimation error. Furthermore, it outperformed ranking-proxy techniques based on classical centrality measures, i.e., PageRank, Rank and Degree, with an improvement of up to 40%. Even compared to DIRIE, which is based on the Independent Cascade model and exploits a more complex algorithm, WABC was able to find a more accurate set of users that maximizes the spread in almost all the considered configurations. Overall, the obtained results confirm the effectiveness of the proposed approach in identifying the leading influencers of a social network and understanding the main information diffusion strategies.

The remainder of the paper is organized as follows. Section 2 describes the main information diffusion models used in literature. Section 3 discusses influence maximization related work. Section 4 describes the proposed algorithm. Section 5 presents the experimental evaluation on a case study, and Section 6 concludes the paper.

Section snippets

Information diffusion models

Interactions among users of a social network can be represented as a directed graph $G = (V, E)$, where $V$ is the set of users in the network and $E$ represents the relationships among them as edges directed from one vertex to another. The influence exerted by a user on the other members of the network is modeled as a function $p: E \to [0,1]$ that associates a weight with each relationship $(u,v) \in E$. Given a user node $u \in V$, we denote by $N_{in}(u)$ and $N_{out}(u)$ the sets of users $v \in V$ for which there exists a

Related work

The problem of identifying a set of k elements that maximizes the spread σ is an NP-Hard optimization problem. However, thanks to the properties of monotonicity and submodularity of σ, a greedy hill-climbing procedure, which selects at each iteration the most promising node in terms of influence spread, provides a pseudo-optimal solution S, achieving a $(1 - 1/e)$ approximation ratio.
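The greedy hill-climbing procedure can be sketched as follows. The spread function σ is passed in as a callable, since its exact form depends on the diffusion model; the example uses a deterministic one-hop coverage function on a toy graph purely for illustration (it is monotone and submodular, but it is not the spread function used in the paper). All names here are hypothetical.

```python
def greedy_seed_selection(nodes, k, spread):
    """Pick up to k seeds, adding at each step the node with the largest marginal gain.

    `spread` is any callable mapping a list of seeds to a number (assumed to be
    monotone and submodular for the approximation guarantee to apply).
    """
    seeds = []
    current = 0.0
    for _ in range(k):
        best_node, best_gain = None, float("-inf")
        for v in nodes:
            if v in seeds:
                continue
            gain = spread(seeds + [v]) - current
            if gain > best_gain:
                best_node, best_gain = v, gain
        if best_node is None:
            break  # fewer than k nodes available
        seeds.append(best_node)
        current += best_gain
    return seeds


# Hypothetical toy graph as adjacency lists (edge weights omitted for simplicity).
toy_graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": [],
    "d": ["e"],
    "e": [],
}


def coverage_spread(seeds):
    """Illustrative spread: number of nodes that are seeds or out-neighbours of a seed."""
    covered = set(seeds)
    for s in seeds:
        covered.update(toy_graph[s])
    return len(covered)


if __name__ == "__main__":
    print(greedy_seed_selection(list(toy_graph), 2, coverage_spread))
```

In this example the first pick is a, which covers three nodes. The second pick is d rather than b, because everything b covers is already covered by a, so its marginal gain is zero. Each iteration calls `spread` once per remaining node, which becomes expensive when σ itself has to be estimated by simulation.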

Despite the theoretical bound provided by the greedy algorithm, the influence maximization task remains hard to

Proposed algorithm

In recent years, nature has been a great source of inspiration for the development of different algorithms aimed at solving many real world optimization problems [32]. These bio-inspired techniques are related to Swarm Intelligence (SI), a particular field of Artificial Intelligence (AI) based on observing the behavior of social animals such as ants and bees. Swarm Intelligence can be defined as the collective behavior of decentralized and self-organized systems, in which the interaction among

Experimental evaluation

In this section, we evaluate the performance of the proposed algorithm, implemented with the Apache Hama framework and applied to the influence maximization task. Experiments have been designed to answer the following research questions: (i) what are the main advantages of the WABC algorithm with respect to its original version (ABC)? (ii) how does WABC perform compared to other state-of-the-art ranking-proxy approaches?

Conclusion

Influence maximization is an optimization problem aimed at finding a k-seed set which maximizes the spread of influence in a social network. This problem is a central one in understanding how information flows within a network of users, and is related to a wide range of applications in viral marketing, advertisement and news spread. In this paper we proposed a bio-inspired influence maximization algorithm, namely Weighted Artificial Bee Colony (WABC), improving fitness evaluation with respect

CRediT authorship contribution statement

Riccardo Cantini: Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing, Visualization. Fabrizio Marozzo: Conceptualization, Methodology, Validation, Writing – original draft, Writing – review & editing, Supervision. Silvio Mazza: Software, Investigation, Data curation, Visualization. Domenico Talia: Writing – review & editing, Supervision, Funding acquisition. Paolo Trunfio: Writing – review & editing, Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work has been supported by the ASPIDE Project funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No 801091.

References (34)

  • Sun J. et al., A survey of models and algorithms for social influence analysis
  • Guille A. et al., Information diffusion in online social networks: A survey, ACM Sigmod Rec. (2013)
  • Banerjee S. et al., A survey on influence maximization in a social network, Knowl. Inf. Syst. (2020)
  • P. Domingos, M. Richardson, Mining the network value of customers, in: Proceedings of the Seventh ACM SIGKDD...
  • Stoica A.-A. et al., Fairness in social influence maximization
  • Karaboga D., An Idea Based on Honey Bee Swarm for Numerical Optimization, Technical Report-tr06 (2005)
  • Sankar C.P. et al., Learning from bees: An approach for influence maximization on viral campaigns, PLoS One (2016)