1 Introduction

Monitoring social behavior, including its security implications, has become a paramount task for security agencies around the world. Sampling and monitoring social networks such as Facebook is therefore a major application area, and can be seen as a means of detecting emerging social behavior in a virtual environment. This uncertainty about what is actually happening around us has led us to develop control mechanisms able to cope with the information flows generated by Open Social Networks (OSNs).

There is a growing tendency to exchange information through OSNs, e.g., messages, emails, pictures, videos and breaking news [1]. This property of OSNs has led us to interact with social media systems while losing some degree of our privacy rights. In this situation, many users deliberately share all their information with the rest of the world simply by uploading or posting something on an OSN. Any attempt to monitor emerging social behavior on Facebook must distinguish which information sources are of interest, i.e., messages, pictures, locations, events, users, etc. All this information can be collected for further examination.

In this study we consider the Facebook structure as a whole, characterized by a graph whose entities (vertices and edges) are modeled from a graph-theoretic point of view [2]. However, it should be stressed that features describing the network structure, i.e., node degree, number of common friends and shortest path length, will not be considered in this study.

In Sect. 2 we present a review of the literature on four different random movement models together with their statistical properties. In the second part of the chapter we present an exploratory system that searches for “key events” and show how to use optimal search strategies in Facebook. We also review the paths generated by the random walk models. The third part of the chapter describes the set of experiments designed to evaluate our assumptions and the exploration feature for sampling emergent social behavior on Facebook. Finally, we report representative results on real-time data. We further demonstrate that random walk models are particularly suitable for searching large-scale social networks.

2 Random Walk Models

Traditionally, random walk models have addressed the question of which statistical strategy is best to adopt in order to search efficiently for randomly located objects [3]. In this regard, we consider a set of random walks that depend on the probability distribution of step lengths taken by a random walker. The question of how the direction of movement affects the path of the walker likewise provides us with various random movement models to be tested. Generally, it is possible to identify a set of diffusion models whose movement paths present idealized patterns that capture some of the essential dynamics of animal foraging.

According to [4], a general property arises from the use of optimal strategies: the more heterogeneous the target distribution, the more probable it is that, after an encounter, a search starts with a nearby target. Furthermore, it is important to note that there is no universal solution to searching problems, i.e., such problems are sensitive to initial conditions. Thus, bio-inspired movement paths can be optimal strategies for searching for randomly located targets; these models show interesting properties, e.g., scale invariance (fractal property) and superdiffusion at all scales [5].

A foraging task can be defined as an extensive search until a target is found, followed by an intensive grazing period of limited length; this is generally an efficient strategy in all types of environments [7]. Generally, random walk models can be considered as processes with a high degree of associated uncertainty. However, an important feature of these models is that they provide the behavioral flexibility to adapt to different scenarios [8].

A random walk model is a formalization of the intuitive idea of taking successive steps, each in a random direction. Random walks are thus simple stochastic processes consisting of a discrete sequence of displacement events (i.e., move lengths) separated by successive reorientation events (i.e., turning angles) [8]. The influence of reorientations is another key concept that we must take into account; in particular, we need to consider how the degree of directional memory (i.e., persistence) in the walk leads the explorer to optimize encounter rates [9]. There are two different features related to turning angle distributions:

  1. The shape (relative kurtosis; see Footnote 1) of the angular distribution and

  2. The correlations between successive relative orientations (directional memory).

This variability in the shape of the turning angle distributions can appear only at limited spatio-temporal scales [9], i.e., after a certain amount of time some random walk models tend to settle down to a Gaussian distribution [7]. It should be noted that the simplicity of random walk models makes them methodologically attractive for use in several scenarios [10]. Thus, improving the ability to search involves the selection of a specific set of “rules of search” that enhances the probability of finding items at unknown locations [11]. It seems that different optimal solutions arise by merely embracing different random strategies [6]. Therefore, in any given environment there might be a range of search strategies that can be successful, and individuals may differ in the search strategy used [11].
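The description of a random walk as move lengths separated by turning angles can be made concrete with a minimal sketch. The function name `build_path` and its parameters are illustrative assumptions, not part of the original study; the sketch only shows how a sequence of (move length, turning angle) pairs determines a two-dimensional trajectory.

```python
import math

def build_path(step_lengths, turning_angles, start=(0.0, 0.0), heading=0.0):
    """Return the (x, y) positions visited by a 2-D walker.

    Hypothetical helper: each step first reorients the walker by a turning
    angle (in radians) and then displaces it by the move length along the
    new heading.
    """
    x, y = start
    path = [(x, y)]
    for length, turn in zip(step_lengths, turning_angles):
        heading += turn                    # reorientation event
        x += length * math.cos(heading)    # displacement event
        y += length * math.sin(heading)
        path.append((x, y))
    return path

# Zero turning angles give a straight line (full directional persistence).
straight = build_path([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
# A quarter turn before every unit step traces a closed square.
square = build_path([1.0] * 4, [math.pi / 2] * 4)
```

The different models are then distinguished only by how the move lengths and turning angles are drawn.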

In the present work we focus our attention on those particular cases where the efficiency of searching and sampling is fully determined by the statistical features of turning angle distributions. In this chapter we consider the problem of a social explorer searching for specific information inside an OSN over a short time period. This raises the question of whether a social explorer is able to cope with an OSN (Facebook).

Fig. 1 Plots showing simulated random walks. a Lévy walk with \(\mu = 2\); b Brownian walk with \(\mu = 3\); c Correlated Random Walk (CRW) with \(0 \le \rho \le 1\) and d adaptive or composite strategy switching between Lévy and Brownian motion

The statistical properties of these random walks are as follows:

  • Lévy Walk: Lévy walks are a class of stochastic processes based on the Lévy-stable distribution. The stochastic processes arising from such distributions are closely related to anomalous diffusion phenomena. Furthermore, in a Lévy walk the explorer must move along the trajectory, so the time needed to complete a jump is taken into account [8]. From the statistical point of view, Lévy walks are characterized by a distribution function \(P(l_j) \sim l_j^{-\mu } \) with \(1 < \mu \le 3\), where \(l_j\) is the flight length and the symbol \(\sim \) refers to the asymptotic limiting behavior as the relevant quantity goes to infinity or zero [3]. In a Lévy walk the turning angles are usually not considered directly (they are uniform on the unit circle). For the special case \(\mu \ge 3\) a Gaussian distribution arises due to the central limit theorem [12]. On the other hand, when \(\mu \le 1\) the probability distribution cannot be normalized. The exponent of the power law is called the Lévy index (\(\mu \)); it controls the range of correlations in the movement, introducing a family of distributions ranging from Brownian motion \((\mu \ge 3)\) to straight-line paths \((\mu \rightarrow 1)\) [11]. Figure 1a shows a plot of a simulated Lévy walk.

  • Brownian Motion: The classic diffusion model known as Brownian motion is a simple strategy that is uncorrelated and unbiased. Uncorrelated means that the direction of movement is completely independent of the previous direction, and unbiased means that there is no preferred direction: the direction moved at each step is completely random. This model has several characteristic features. Explorations over short distances can be made in much shorter times than explorations over long distances. The random walker tends to explore a given region of space rather thoroughly, returning to the same point many times before finally wandering away. It chooses new regions to explore blindly and has no tendency to move toward regions that it has not occupied before. It has absolutely no inkling of the past, and its track does not fill up the space uniformly [13]. Figure 1b shows a plot of a simulated Brownian walk.

  • Correlated Random Walk (CRW): Correlated random walks (CRWs) involve a correlation between successive step orientations, so that movement paths show persistence. This produces a local directional bias: each step tends to point in the same direction as the previous one [7]. The CRW consists of a series of discrete steps of length \(l_j\) and direction \(\theta _j\). The length \(l_j\) of the \(j\)th move and the turning angle \(\phi _j = \theta _{j + 1} - \theta _j \) are assumed to be random variables with no autocorrelation or cross-correlation (and no correlation between step length and step direction) [7]. Furthermore, the simplest way to incorporate directional persistence into a random walk model is to introduce correlations (i.e., memory effects) between successive random walk steps. Thus, the trajectories generated by correlated random walk models appear more similar to empirical data than those generated by uncorrelated random walks (e.g., Brownian motion). This model controls directional persistence via the probability distribution of turning angles, combining persistence with a preferred direction. The strategy uses a wrapped Cauchy distribution (WCD) [14] for the turning angles:

    $$\begin{aligned} \theta = \Bigg [ 2 \times \arctan \Bigg ( \frac{(1 - \rho ) \times \tan (\pi \times (r - 0.5))}{1 + \rho }\Bigg )\Bigg ], \end{aligned}$$
    (1)

    where \(\rho \) is the shape parameter \((0 \le \rho \le 1)\) and \(r\) is a uniformly distributed random variable \(r \in [0,1]\). Directional persistence is controlled by changing the shape parameter of the WCD (\(\rho \)). Thus, for \(\rho = 0\) we obtain a uniform distribution with no correlation between successive steps (Brownian motion), and for \(\rho = 1\) we get a delta distribution at \(0^\circ \), leading to straight-line searches. Figure 1c shows a plot of a simulated CRW.

  • Adaptive or Composite strategy: In some cases, when purely random search models become less effective, the explorer must attempt to move in such a way as to optimize its chances of locating targets by increasing the chances of covering a given area [11]. A random search model in which the explorer can change its behavior depending on the environmental conditions is known as an adaptive search [4]. Thus, a composite strategy or adaptive random walk consists of an explorer undergoing an extensive search (Footnote 2; in this study, a Lévy walk) until a target is encountered, at which point the explorer changes to Brownian motion to undergo an intensive search [15–17]. This kind of model switches between Brownian motion and a Lévy walk according to a biological oscillator used by Nurzaman et al. [18]. The switch is based on environmental changes (encounter rate) sensed by the forager. We use the adaptive switching behavior defined in [18, 19]. Specifically, we compute \(P(t) = \exp (-z(t))\) with a conditional function: if \(P(t) = 1\), Brownian motion is triggered; otherwise, a Lévy walk is used as the default behavior. Figure 1d shows a plot of a simulated composite strategy.
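The statistical definitions above can be illustrated with a short simulation sketch. It is not the authors' implementation and rests on several assumptions: Lévy step lengths are drawn from a pure power law truncated at a hypothetical minimum length `l_min` (inverse-transform sampling) rather than from a full Lévy-stable law, Brownian and correlated walks use unit step lengths, and all function names are illustrative. Turning angles follow Eq. (1). The composite strategy is not sketched, since the oscillator \(z(t)\) is specified only in [18, 19].

```python
import math
import random

def levy_step_length(mu, l_min=1.0, rng=random):
    """Draw l >= l_min with density proportional to l**(-mu), for mu > 1.

    Assumption: a truncated power law sampled by inverting its CDF.
    """
    if mu <= 1:
        raise ValueError("mu must exceed 1 for a normalizable distribution")
    u = rng.random()                                  # uniform on [0, 1)
    return l_min * (1.0 - u) ** (-1.0 / (mu - 1.0))

def wrapped_cauchy_angle(rho, rng=random):
    """Turning angle in [-pi, pi] computed as in Eq. (1), with 0 <= rho <= 1."""
    r = rng.random()
    return 2.0 * math.atan((1.0 - rho) * math.tan(math.pi * (r - 0.5)) / (1.0 + rho))

def simulate_walk(n_steps, mu=None, rho=0.0, seed=None):
    """Simulate a 2-D walk and return its list of (x, y) positions.

    mu is None -> unit step lengths (assumption); otherwise power-law lengths.
    rho = 0    -> uniform turning angles; rho = 1 gives straight-line motion.
    """
    rng = random.Random(seed)
    x = y = heading = 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        heading += wrapped_cauchy_angle(rho, rng)
        length = 1.0 if mu is None else levy_step_length(mu, rng=rng)
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        path.append((x, y))
    return path

levy = simulate_walk(1000, mu=2.0, seed=1)       # Lévy walk, uniform angles
brownian = simulate_walk(1000, rho=0.0, seed=1)  # uncorrelated, unit steps
crw = simulate_walk(1000, rho=0.9, seed=1)       # persistent headings
```

Passing a fixed `seed` makes a run reproducible, which is convenient when comparing strategies on the same random sequence.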

3 Emerging Social Behavior in Facebook

We develop a social explorer that searches for “key events” and provides a quick mechanism to monitor Facebook, tracking the ebb and flow of social media as an innovative approach. It should be stressed that most emerging social events are regularly reported on social networks, e.g., Facebook, Twitter and Foursquare. This implies that social media has become a primary source of information about events and developing situations.

Velasco and Keller, in [20, 21] respectively, pointed out that the main goal is to view publicly available open-source and non-private social data that is readily available on the Internet. Our application is not focused on specific persons or protected groups. It is mainly focused on words related to events and activities connected with social behavior or potential threats. Examples of these words include lock down, bomb, suspicious package, white powder, active shoot, etc. This technique is intended to capture Facebook users related to a specific keyword or key event in order to display an activity plot (see Fig. 2) showing popular or emerging events according to the attending lists obtained from Facebook. These plots present a graphical random distribution based on any of the random walk models presented in Sect. 2. Thus, our study can be considered a way of quickly getting an idea of whether a status update has been posted by someone who could potentially be involved in a particular event.

Fig. 2 An example plot showing a random distribution of nodes/users on Facebook. This distribution is based on a keyword as a parameter of search. Users are displayed in various clusters according to a particular attractor, in this case a key event

One of the main problems highlighted in social media is locating and mapping “bad actors” and analyzing their movements, vulnerabilities, limitations, and possible adverse actions [20]. Thus, the use of social media to target specific users or groups of users becomes a high-priority plan for security agencies around the world. It seems reasonable to assume that security agencies could be alerted if a specific search produces evidence of breaking events, incidents, and emerging social behavior. These breaking events are represented graphically by clusters of users located around their respective activities on Facebook (see Fig. 2).

4 How to Use Optimal Search Strategies in Facebook

Initially, we consider the Facebook social network as an unknown complex environment modeled by an undirected graph \(G = (V,E)\), where \(V\) is a set of vertices assuming the role of nodes (users) and \(E\) is a set of edges assuming two different roles. The first role is determined by the paths traced by the social explorer. The second role is determined by the edges linking a key event to its attending list of users. In this work we consider two stages: \((1)\) analysis of movement pattern and \((2)\) linkage between a key event and its corresponding attending list.
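The two edge roles can be kept apart in a small data structure. The following is a minimal sketch, assuming string identifiers such as `"event:1"` and `"user:a"`; the class name `ExplorationGraph` and its methods are hypothetical and are not taken from the original system.

```python
from dataclasses import dataclass, field

@dataclass
class ExplorationGraph:
    """Undirected graph G = (V, E) whose edges play two roles (illustrative)."""
    vertices: set = field(default_factory=set)
    path_edges: set = field(default_factory=set)        # role 1: explorer's path
    attendance_edges: set = field(default_factory=set)  # role 2: event-user links

    def _edge(self, u, v):
        self.vertices.update((u, v))
        return frozenset((u, v))   # unordered pair, since the graph is undirected

    def add_path_edge(self, event_a, event_b):
        self.path_edges.add(self._edge(event_a, event_b))

    def add_attendance_edge(self, event, user):
        self.attendance_edges.add(self._edge(event, user))

    @property
    def edges(self):
        return self.path_edges | self.attendance_edges

g = ExplorationGraph()
g.add_path_edge("event:1", "event:2")        # stage 1: movement pattern
g.add_attendance_edge("event:1", "user:a")   # stage 2: attending list
g.add_attendance_edge("event:1", "user:b")
g.add_attendance_edge("event:2", "user:b")
print(len(g.vertices), len(g.edges))
```

Storing each edge as a `frozenset` means that adding the same link twice, in either direction, does not inflate the edge count.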

4.1 Analysis of Movement Pattern

This stage consists of a social explorer that carries out an extensive search based on a keyword until a target node is found. The searching mechanism is based on one of the random walk models mentioned in Sect. 2. Thus, we propose a single social explorer to interact with Facebook with the intention to obtain publicly available information from this network.

Facebook is a large-scale social network that is continually changing over time [22]. Therefore, Facebook is essentially unknowable. But at the same time, it does not need to be known in detail [1, 23].

An example of a sampled movement path based on the random walk models explained in Sect. 2 is given in Fig. 3. This figure shows our proposed process for a random walk carried out in Facebook. Firstly, a sample is obtained from a random walk model. In this case the information is provided as a set of key event nodes. It should be noted that these key event nodes have a list of attending users (not displayed at this stage). All nodes are then placed randomly on the graphical user interface (GUI), see Fig. 3a.

Fig. 3 Representative examples of a movement path based on a random walk model displaying only key event nodes (a), and a movement path linking all the key event nodes to their respective edges forming a statistically generated pattern (b)

In Fig. 3b, the key event nodes are linked to each other according to a specific random walk model. Therefore, a movement pattern is statistically generated by a stochastic process and depicted on the GUI. It is important to mention that these links are merely conceptual, i.e., we are taking into consideration the underlying processes that generate them. As described earlier, the initial stage works as a tracking and monitoring system based on initial conditions determined by a keyword.

4.2 Linkage Between a Key Event and Its Corresponding Attending List

In this second stage, each user who is attending a particular event is directly linked to that event, i.e., a key event becomes a kernel for all the surrounding users (see Fig. 4a). This linkage process continues for every event that belongs to the random sample. Thus, each event is linked to a group of users, forming various clusters of nodes presented in the GUI. At this point, it is possible to distinguish between two types of elements, events and users, and it is also feasible to identify which user is attending which event, see Fig. 4b.
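The linkage stage can be sketched as a grouping step. The example below assumes that the attending lists are already available as a mapping from event identifiers to user identifiers; the function `link_events` and the sample data are hypothetical and serve only to show how clusters and the reverse user-to-event lookup could be derived.

```python
from collections import defaultdict

def link_events(attending_lists):
    """Build event clusters and a reverse lookup from users to events.

    attending_lists maps an event id to an iterable of user ids
    (made-up identifiers here, not real Facebook data).
    """
    # Each key event acts as the kernel of a cluster of attending users.
    clusters = {event: set(users) for event, users in attending_lists.items()}
    # Reverse index: which events does each user attend?
    events_of_user = defaultdict(set)
    for event, users in clusters.items():
        for user in users:
            events_of_user[user].add(event)
    return clusters, dict(events_of_user)

sample = {
    "event:1": ["user:a", "user:b"],
    "event:2": ["user:b", "user:c"],
}
clusters, events_of_user = link_events(sample)
```

A user who appears in several attending lists, such as `user:b` here, is attached to more than one cluster.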

Fig. 4 Representative examples of (a) the early stage of the structural pattern forming a kernel surrounded by collections of nodes, and (b) the final stage showing a set of clusters linking all the key event nodes to their respective attending group of users

5 Methods

We designed a set of experiments to evaluate our assumptions about improving the ability to search and sample Facebook using random walk models. We investigated the influence of random walks on the probability of successful social network exploration. We also investigated the exploration feature for sampling emergent social behavior by viewing publicly available social data. We used our social explorer to search for a list of twelve proposed keywords (Footnote 3; see Table 1). The investigations were conducted using the following metrics: number of vertices, number of edges, number of events and elapsed time.

Table 1 List of twelve proposed keywords used as search parameters on the social network environment (Facebook)

6 Results

We report representative results from experiments conducted in a social network environment using four strategies. Across all random walk models, the number of nodes/vertices explored (\(V\)), edges explored (\(E\)) and events examined were greatest in the Brownian motion case (see Fig. 5a, b). Our results agree with those reported in [6], where it is argued that Brownian motion follows a Gaussian distribution. Thus, a redundant search is able to explore or examine more nodes (users or events) even in an OSN scenario.

Results regarding the elapsed time reflect assumptions about the complexity of each algorithm from the statistical point of view, e.g., the CRW spent more time than the other strategies, probably because of how the turning angles are computed (see Eq. 1). In contrast, Brownian motion and the Lévy walk spent less time searching Facebook (see Fig. 5c). Brownian motion and the Lévy walk also showed reduced variability in their elapsed times, i.e., these strategies had the least variable time response (see Fig. 5d). Finally, we found that our reported measures showed statistical differences when the means and standard errors of the mean (SEM) were compared in Fig. 5c. Our results suggest that the average sampling time is fully acceptable for using these algorithms on Facebook.
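For reference, the mean and SEM of a set of elapsed times can be computed as in the following sketch. It assumes the usual definition of the SEM as the sample standard deviation divided by the square root of the number of runs; the function name and the timing values are made up for illustration and are not measurements from this study.

```python
import math
import statistics

def mean_and_sem(samples):
    """Return (mean, standard error of the mean) for a list of numbers."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are needed to estimate the SEM")
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)   # sample std dev / sqrt(n)
    return mean, sem

# Hypothetical elapsed times (in seconds) for repeated runs of one strategy.
elapsed = [1.0, 2.0, 3.0]
mean, sem = mean_and_sem(elapsed)
```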

Exploratory experiments were also completed with the aim to compare the ability to search using the list of twelve keywords provided in Table 1. It should be stressed that results regarding the elapsed time could be associated with the response capability of Facebook servers (see Fig. 6).

Fig. 5 a Total of vertices explored V and edges explored E. b Total of events examined in Facebook. c Means and SEMs bars corresponding to the elapsed time and d variability in the elapsed time using four different strategies

Fig. 6 Elapsed time of the 12 keywords during an exhaustive exploration through Facebook. Means and SEMs bars corresponding to the elapsed time are also indicated below

7 Conclusions

We have completed a comprehensive study of social network exploration using random walk models. Our results agree with classic studies in the literature where the influence of turning angles is used to optimize encounter rates in non-oriented searches [3, 9]. Our comparison of the performance of several search strategies on Facebook suggests that searching and sampling the content of Facebook can be used as a simple methodology for network exploration. It should be stressed that we are using a real-time social explorer, and our results should be considered not only in a quantitative sense but also in a qualitative sense related to conceptual modeling. The main contribution of this work is that it compares the performance of four different techniques with the aim of monitoring social behavior in Facebook in an informative way.

Future work in this area should consider different random walk models provided by natural mechanisms or artificial techniques. Such models should examine all generated trajectories in order to decide which search strategy to follow. We can also suggest some areas of potential application for our study, e.g., detecting and monitoring threats, taking into account that privacy laws have been passed that actually enable monitoring of social networks even when settings are set to private [24].