A Weighted Artificial Bee Colony algorithm for influence maximization
Introduction
Millions of people every day interact on social media platforms by generating large amounts of data [1], which can be exploited for extracting valuable information in different application contexts, such as information diffusion [2], sentiment [3] and opinion mining [4], [5], news gathering [6] and misinformation blocking [7].
A very active research area that seeks to exploit the data available on social media is viral marketing. Viral marketing, or viral advertising, is a business strategy that uses social media to promote a product or service. An efficient way to run a marketing campaign is to identify an appropriate set of influencers among users and invest resources to make them adopt the product or service. This can trigger a cascade process that influences consumer preferences in a large part of the network [8], [9].
Influence maximization is an optimization problem that aims at finding a small set of users that maximizes the spread of influence in a social network [10]. Initially proposed as a stochastic optimization problem in [11], it consists of identifying the set of users with the greatest overall influence by analyzing the structure of the network and user interconnections, as well as user-specific features such as demographic properties [12].
Influence maximization is an NP-Hard problem, with two sources of hardness: (i) the complexity of computing the spread, i.e., the number of influenced users; (ii) the combinatorial nature of identifying, among all possible combinations, the solution that maximizes the influence. For this reason, implementing efficient influence maximization algorithms requires heuristic methods as well as parallel computing models. An effective parallel computing paradigm for this purpose is the Bulk Synchronous Parallel (BSP) model, which simplifies the implementation of parallel applications by exploiting distributed-memory parallelism. An efficient implementation of BSP is provided by the Apache Hama framework.
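To illustrate the first source of hardness, the following minimal sketch estimates the spread of a seed set by Monte Carlo simulation. It assumes the Independent Cascade diffusion model and a graph stored as a dictionary of weighted out-edges; the function names, the toy graph and the number of runs are illustrative assumptions and are not taken from the paper.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one Independent Cascade simulation; return the set of activated nodes.

    graph maps each node to a dict {neighbor: activation probability}.
    (Hypothetical representation, chosen only for this sketch.)
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                # Each newly active node gets a single chance to activate v.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Average the number of activated nodes over repeated simulations."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs))
    return total / runs


# Toy graph (illustrative): edge a->c succeeds half of the time.
toy_graph = {"a": {"b": 1.0, "c": 0.5}, "b": {"d": 1.0}}
print(estimate_spread(toy_graph, ["a"]))
```

Each estimate requires many simulations, and each simulation may traverse a large part of the graph, which is why evaluating the spread of even a single candidate seed set is expensive.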
This paper describes the functioning and the implementation of an influence maximization algorithm, namely Weighted Artificial Bee Colony (WABC), aimed at identifying a subset of users that maximizes the spread. It follows a bio-inspired approach built on the Artificial Bee Colony algorithm [13], which has been adapted to the influence maximization task [14] by introducing several changes and improvements with respect to previous related work. In particular, the proposed algorithm exploits an effective approach to evaluating the fitness value, which can be viewed as the solution of a reachability problem centered on the paths of maximum probability. We also address the influence overlap problem of classical ranking-proxy algorithms, avoiding the negative effects caused by influence redundancy during the maximization process. Moreover, the proposed algorithm is less sensitive to parameter tuning than related work, as it dynamically sets the depth at which to explore the graph, focusing on the most promising paths. All of these factors contribute to making the model able to produce an accurate estimate of the total spread for the final seed set, which is useful for estimating the number of users who will actually be influenced.
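The idea of reachability along paths of maximum probability can be sketched as follows. This is not the WABC fitness function, whose details are not given in this excerpt; it is a generic Dijkstra-style search, under the assumption that edge weights are probabilities in [0, 1] and that the probability of a path is the product of its edge weights. The function name and the `min_prob` cut-off are hypothetical.

```python
import heapq


def max_probability_reach(graph, source, min_prob=0.0):
    """Return {node: probability of the most probable path from source}.

    graph maps each node to a dict {neighbor: edge probability}.
    Paths whose probability falls below min_prob are not explored further.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap emulated with negated probabilities
    while heap:
        neg_p, u = heapq.heappop(heap)
        p = -neg_p
        if p < best.get(u, 0.0):
            continue  # stale entry: a more probable path to u was already found
        for v, w in graph.get(u, {}).items():
            q = p * w
            if q > best.get(v, 0.0) and q >= min_prob:
                best[v] = q
                heapq.heappush(heap, (-q, v))
    return best


# Toy graph (illustrative): the indirect path a->c->b beats the direct edge a->b.
toy_graph = {"a": {"b": 0.5, "c": 0.9}, "c": {"b": 0.8}, "b": {"d": 0.5}}
print(max_probability_reach(toy_graph, "a"))
```

Because path probabilities can only decrease as a path grows, discarding paths below a threshold bounds the explored depth without fixing it in advance, which is one plausible way to focus the search on the most promising paths.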
The WABC algorithm has been applied to a case study that analyzes the propagation of information on Twitter during the Constitutional Referendum held in Italy in 2016, in order to identify the main influencers of the two factions and derive the main information diffusion strategies of each faction during the political campaign. We experimentally evaluated the accuracy of the WABC algorithm through its implementation in Apache Hama. To analyze qualitative aspects, we classified the identified influencers according to their profile (journalistic page, political activist, popular or normal user) to better determine the type of political campaign, and we carried out several simulations to measure their influence strength. As for the quantitative analysis, we compared the obtained results with both the standard ABC algorithm and other related state-of-the-art techniques in terms of computing time, evaluated spread and relative error on the expected spread. Specifically, WABC turned out to be more time-consuming than its classical version (ABC), but much more accurate in determining the expected spread, with a decrease of up to 24% in the relative estimation error. Furthermore, it outperformed ranking-proxy techniques based on classical centrality measures, i.e., PageRank, Rank and Degree, with an improvement of up to 40%. Even compared to DIRIE, which is based on the Independent Cascade model and exploits a more complex algorithm, WABC was able to find a more accurate set of users that maximizes the spread in almost all the considered configurations. Overall, the obtained results confirm the effectiveness of the proposed approach in identifying the leading influencers of a social network and understanding the main information diffusion strategies.
The remainder of the paper is organized as follows. Section 2 describes the main information diffusion models used in literature. Section 3 discusses influence maximization related work. Section 4 describes the proposed algorithm. Section 5 presents the experimental evaluation on a case study, and Section 6 concludes the paper.
Information diffusion models
Interactions among users of a social network can be represented as a directed graph whose vertices are the users of the network and whose edges, directed from one vertex to another, represent the relationships among them. The influence exercised by a user on the other members of the network is modeled as a function that associates a weight with each relationship. Given a user node , we define with and the sets of users for which there exists a
Related work
The problem of identifying a set of elements that maximizes the spread is an NP-Hard optimization problem. However, thanks to the monotonicity and submodularity of the spread function, a greedy hill-climbing procedure, which selects at each iteration the most promising node in terms of influence spread, provides a pseudo-optimal solution, achieving a (1 − 1/e) approximation ratio.
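The greedy hill-climbing procedure can be sketched as follows. The spread function is passed in as a parameter; in this toy example it is replaced by a deterministic coverage count, which is monotone and submodular, whereas in practice the spread would have to be estimated, for instance by simulation. All names and the toy data are illustrative assumptions.

```python
def greedy_seed_selection(candidates, spread, k):
    """Pick up to k seeds, adding at each step the node with the largest marginal gain."""
    seeds = []
    remaining = set(candidates)
    for _ in range(min(k, len(remaining))):
        current = spread(seeds)
        # sorted() makes tie-breaking deterministic (first maximum wins).
        best_node = max(sorted(remaining),
                        key=lambda v: spread(seeds + [v]) - current)
        seeds.append(best_node)
        remaining.remove(best_node)
    return seeds


# Toy stand-in for the spread: each node "reaches" a fixed set of users.
reach = {
    "a": {"a", "b", "c"},
    "b": {"b", "c"},
    "d": {"d", "e"},
    "e": {"e"},
}


def coverage_spread(seeds):
    """Number of distinct users reached by the seed set (monotone, submodular)."""
    covered = set()
    for s in seeds:
        covered |= reach[s]
    return len(covered)


print(greedy_seed_selection(reach.keys(), coverage_spread, k=2))
```

Note that the second pick is driven by marginal gain rather than individual reach: a node whose audience overlaps with the seeds already chosen contributes little, which is the influence-overlap effect that plain ranking by centrality ignores.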
Despite the theoretical bound provided by the greedy algorithm, the influence maximization task remains hard to
Proposed algorithm
In recent years, nature has been a great source of inspiration for the development of different algorithms aimed at solving many real world optimization problems [32]. These bio-inspired techniques are related to Swarm Intelligence (SI), a particular field of Artificial Intelligence (AI) based on observing the behavior of social animals such as ants and bees. Swarm Intelligence can be defined as the collective behavior of decentralized and self-organized systems, in which the interaction among
Experimental evaluation
In this section, we evaluate the performance of the proposed algorithm, implemented with the Apache Hama framework and applied to the influence maximization task. Experiments have been designed to answer the following research questions: (i) what are the main advantages of the WABC algorithm with respect to its original version (ABC)? (ii) how does WABC perform compared to the other state-of-the-art ranking-proxy approaches?
Conclusion
Influence maximization is an optimization problem aimed at finding a k-seed set which maximizes the spread of influence in a social network. This problem is central to understanding how information flows within a network of users, and is related to a wide range of applications in viral marketing, advertisement and news spread. In this paper we proposed a bio-inspired influence maximization algorithm, namely Weighted Artificial Bee Colony (WABC), improving fitness evaluation with respect
CRediT authorship contribution statement
Riccardo Cantini: Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing, Visualization. Fabrizio Marozzo: Conceptualization, Methodology, Validation, Writing – original draft, Writing – review & editing, Supervision. Silvio Mazza: Software, Investigation, Data curation, Visualization. Domenico Talia: Writing – review & editing, Supervision, Funding acquisition. Paolo Trunfio: Writing – review & editing, Supervision, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work has been supported by the ASPIDE Project funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No 801091.
References (34)
- et al., Social media big data analytics: A survey, Comput. Hum. Behav. (2019)
- et al., Online social networks and information diffusion: The role of ego networks, Online Soc. Netw. Media (2017)
- Mining social media for newsgathering: A review, Online Soc. Netw. Media (2019)
- et al., Efficient and timely misinformation blocking under varying cost constraints, Online Soc. Netw. Media (2017)
- et al., Community-based influence maximization in social networks under a competitive linear threshold model, Knowl.-Based Syst. (2017)
- et al., CT-IC: Continuously activated and time-restricted independent cascade model for viral marketing, Knowl.-Based Syst. (2014)
- et al., A survey of swarm intelligence for dynamic optimization: Algorithms and applications, Swarm Evol. Comput. (2017)
- C.J. Hutto, E. Gilbert, Vader: A parsimonious rule-based model for sentiment analysis of social media text, in: Eighth...
- et al., Learning political polarization on social media using neural networks, IEEE Access (2020)
- et al., Analyzing polarization of social media users and news sites during political campaigns, Soc. Netw. Anal. Min. (2018)
- A survey of models and algorithms for social influence analysis
- Information diffusion in online social networks: A survey, ACM Sigmod Rec.
- A survey on influence maximization in a social network, Knowl. Inf. Syst.
- Fairness in social influence maximization
- An Idea Based on Honey Bee Swarm for Numerical Optimization, Technical Report-tr06
- Learning from bees: An approach for influence maximization on viral campaigns, PLoS One