1 Introduction

Jeff Howe defines “crowdsourcing” as the practice in which a company or organization outsources tasks formerly performed by employees to an unspecified public on the network on a voluntary basis [3]. Mobile crowdsourcing services extend the traditional crowdsourcing pattern to the mobile space: workers are not required to perform tasks on a fixed web platform, but the tasks carry additional constraints of time and location. The core issue is task assignment. Mobile crowdsourcing task assignment aims to assign spatial tasks (i.e., tasks tied to a location and time) to a set of workers [4], who complete them individually or cooperatively while meeting the time, location, and other constraints of the task [1, 2, 5].

With the rapid development of computer networks and the growing number of smart mobile terminals, a mobile crowdsourcing platform can receive tasks from publishers, or requests from workers to perform tasks, anytime and anywhere. This places high demands on the platform’s ability to allocate tasks promptly and to adapt dynamically. Today, most task assignment strategies simply assign tasks to nearby workers without considering how the workers’ trajectories and locations change. If workers are moving away from the task location, they will probably refuse the task; and because they will end up far from the task location, the platform must pay extra to encourage them to perform it. This not only reduces the success rate of task assignment but also increases the workers’ time cost and the platform’s additional cost.

Diverse data from different sources and of different types hide much valuable information. This paper observes that users’ historical trajectory data are one such source: by analyzing these data, we can learn about users’ behaviors, interests, and preferences. We predict the region a user will move to based on the user’s current location and information, and then allocate the tasks in that region to the user.

In prediction, this paper considers not only the binary “user-location” relation but also the user’s context information (such as time, task location, and weather), forming a “context-user-location” relation. This allows context information to be discovered and used automatically during prediction, and it accommodates users’ personalized needs, which change as the context changes. For example, after work users prefer to exercise first and then eat, rather than eat first. Compared with workdays, users are more willing to go to an entertainment plaza on weekends. In this paper, time is divided into workday and weekend, treated as temporal context information, and integrated into the user’s location prediction process. Fusing context information into the location prediction algorithm in this way both respects the practical meaning of the context and substantially helps location prediction by improving its accuracy.

The main contributions of this paper are as follows:

  1. Based on the discrete historical data, we mine context-dependent user movement patterns.

  2. Based on the context-dependent user movement patterns, we propose a location prediction algorithm for mobile crowdsourcing workers, which can provide support for spatial task assignment.

  3. Based on experiments on real data sets, we verify the validity and accuracy of the proposed mobile user location prediction algorithm.

The rest of this paper is organized as follows. Sect. 2 discusses related work; Sect. 3 gives the definitions; Sect. 4 proposes the method and describes the specific algorithms with examples; Sect. 5 presents the experimental results; and Sect. 6 concludes the paper.

2 Related Work

There are usually two approaches to location prediction. The first predicts the current location from the user’s last access point by calculating transition probabilities. The Markov and hidden Markov algorithms in [10, 12] are used for location prediction, combined with user and time matching; the prediction depends only on the probability of transferring from the previous location to the current one. The work in [7] uses the ramble algorithm and the Markov algorithm together for prediction, with the user’s access path and the time interval as additional influencing factors. The work in [6] predicts a user’s future cell based on the user’s current cell. Although the user’s current location is the most important information for predicting the future location, prediction performance can be greatly improved if the user’s locations over the preceding period are also taken into account.

The second approach collects historical location points to predict the current location. The work in [13] models the locations of historical activities and takes the user’s movement trend as an important factor in location prediction. The SPM (Sampled Pattern Matching) algorithm [8] and the PPM (Prediction by Partial Matching) algorithm [9] are also based on the Markov model for trajectory prediction. These algorithms extend Markov-based trajectory prediction and make some improvements in prediction accuracy or in time and space complexity. However, some problems remain; for example, simple Markov models use little historical information, which limits their prediction accuracy.

Li et al. [14] used the historical trajectories of workers to recommend a route that contains as many tasks as possible. Their approach addresses dynamic route planning and can update the route promptly when new tasks arrive. However, it does not consider the influence of contextual information on workers, so workers may refuse the recommended route.

In this paper, we improve the location prediction method proposed in [13]: we add the influence of context to the mining of movement patterns, taking into account the differences in user movement patterns between weekends/holidays and workdays, and we extract movement rules from the context-sensitive movement patterns to improve the accuracy and adaptability of location prediction.

3 Problem Definition

To aid understanding, this section introduces the definitions used by the methods in this paper (Table 1).

Definition 1

Workers’ Movement Patterns (WMPs). A context-dependent movement pattern (WMP) is a sequence of region numbers that a worker has visited one after another in a day, expressed as \(W^{mp}(w)\) = (\(<(r_{1}, t_{1}), (r_{2},t_{2}), \dots ,(r_{n},t_{n})>\), C, supp). Movement patterns describe the trajectories of workers in daily life. C is the context information; this paper mainly considers the temporal context, so C is either workday or weekend. supp is the support, which measures how likely a route is to appear in the user’s historical trajectories, \(supp \ge 0\). We follow the Apriori algorithm for calculating the support and setting the threshold. In this paper, the threshold is set to 1.33.
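As an illustration of Definition 1, the following minimal sketch groups raw sign-in records into daily region sequences and attaches a temporal context label to each. It rests on several assumptions that are not stated in the paper: records are `(user_id, timestamp, region)` tuples, timestamps use the `YYYY-MM-DDTHH:MM:SSZ` form, Saturday and Sunday count as weekend (holidays are ignored), and the labels "WD" and "PD" follow the abbreviations used with Table 2. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def context_of(day):
    """Temporal context C: 'PD' for Saturday/Sunday, 'WD' otherwise (assumption)."""
    return "PD" if day.weekday() >= 5 else "WD"

def daily_trajectories(checkins):
    """Build one (user, context, [(region, time), ...]) entry per user and day.

    checkins: iterable of (user_id, timestamp_string, region_number).
    """
    by_day = defaultdict(list)
    for user, ts, region in checkins:
        t = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
        by_day[(user, t.date())].append((t, region))
    result = []
    for (user, day), visits in sorted(by_day.items()):
        visits.sort()  # chronological order within the day
        sequence = [(region, t) for t, region in visits]
        result.append((user, context_of(day), sequence))
    return result

# Hypothetical records: two sign-ins on a Tuesday, one on a Saturday.
records = [
    (0, "2010-10-19T23:55:27Z", 2),
    (0, "2010-10-19T08:10:00Z", 1),
    (0, "2010-10-23T12:00:00Z", 5),
]
trajs = daily_trajectories(records)
```

Each resulting entry corresponds to the sequence and context parts of a WMP; the support is computed later, during pattern mining.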

Definition 2

Workers’ Movement Rules (WMRs). A movement rule (WMR) describes the transfer relationship between the regions at which a worker arrives, expressed as \(W^{mr}(w)\) = \(<(r_{1}, t_{1}), (r_{2},t_{2}),\dots , (r_{k-1},t_{k-1})>\) \(\rightarrow \) \(<(r_{k},t_{k})>\). The sequence \(<(r_{1}, t_{1}), (r_{2},t_{2}), \dots , (r_{k-1},t_{k-1})>\) is the rule head, which represents the worker’s current trajectory, and the rule tail \(<(r_{k},t_{k})>\) represents the region at which the worker will arrive with the greatest probability. Movement rules are derived from movement patterns. The following table gives an example of a set of movement rules:

Table 1. An example of movement rule set

4 Location Prediction Based on the Mining of Movement Rules

The location prediction process is shown in Fig. 1.

Fig. 1. Location prediction process for mobile workers

4.1 Generate Regions

The method proposed in this paper is based on region-level prediction. All discrete location points in the history log of the mobile crowdsourcing platform are first clustered into regions, so that transfers between locations in a worker’s historical trajectory become transfers between regions. We assume that the locations of all tasks also fall within these regions. The location points are aggregated into regions using the K-Means algorithm, which realizes the conversion from points to regions.

Since the location points are discrete and relatively sparse, we use the K-Means algorithm, which is simple, efficient on large data sets, and low in time and space complexity.
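The region generation step can be sketched with a plain implementation of Lloyd's K-Means iteration. This is a minimal sketch, not the authors' implementation: it assumes location points are `(latitude, longitude)` tuples, uses Euclidean distance on raw coordinates as an approximation, and draws the initial centres at random from the data. The function name and parameters are hypothetical.

```python
import math
import random

def kmeans(points, k, iterations=50, seed=0):
    """Cluster 2-D points into k regions with Lloyd's algorithm.

    Returns (centroids, labels); labels[i] is the region number of points[i],
    taken from the last assignment step.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centres drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centre
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels

# Hypothetical sign-in coordinates forming two well-separated groups.
checkins = [(30.26, -97.74), (30.27, -97.75), (30.25, -97.73),
            (40.71, -74.00), (40.72, -74.01), (40.70, -73.99)]
centroids, labels = kmeans(checkins, k=2)
```

After clustering, each sign-in is replaced by its region number in `labels`, so a worker's point-level trajectory becomes a region-level one.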

4.2 Mining Workers’ Movement Patterns

In this section, we describe how to mine workers’ movement patterns, following the Apriori algorithm. Given the actual routes of multiple workers, the first step is to determine the temporal context; workday and weekend movement patterns are then mined separately. First, we obtain a candidate pattern set \(C_1\) of length 1, calculate the support of each candidate, and add it to the length-1 movement pattern set \(L_1\) if its support is greater than the threshold of 1.33 used in this paper. Next, for each pattern in \(L_1\), we find the regions that can be reached directly from its region and append their region numbers to form the candidate pattern set \(C_2\) of length 2. We then calculate the supports and add the candidates whose support exceeds the threshold to the length-2 movement pattern set \(L_2\). This procedure is repeated, generating longer movement pattern sets until no candidates remain. Finally, the pattern sets are combined. Table 2 gives an example of workers’ context-dependent movement patterns (WD represents workday, and PD represents weekend).

 

Table 2. A set of workers’ movement patterns
Table 3. Movement rule set
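The level-wise mining procedure of Sect. 4.2 can be sketched as follows. The paper does not spell out how support is computed, so this sketch assumes that the support of a pattern is the number of daily trajectories that contain it as a contiguous run of regions; it also drops the timestamps and keeps only region numbers. All function names and the sample data are hypothetical.

```python
from collections import defaultdict

def contains(trajectory, pattern):
    """True if pattern occurs in trajectory as a contiguous run of regions."""
    n = len(pattern)
    return any(tuple(trajectory[i:i + n]) == pattern
               for i in range(len(trajectory) - n + 1))

def mine_patterns(trajectories, min_supp=1.33):
    """Apriori-style mining of frequent region sequences for ONE context.

    Returns {pattern tuple: support}.
    """
    # Regions directly reachable from each region in the history.
    successors = defaultdict(set)
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            successors[a].add(b)

    def support(pattern):
        # Assumed definition: number of trajectories containing the pattern.
        return sum(contains(t, pattern) for t in trajectories)

    frequent = {}
    candidates = {(r,) for t in trajectories for r in t}  # C_1
    while candidates:
        level = {}  # L_k: candidates whose support exceeds the threshold
        for p in candidates:
            s = support(p)
            if s > min_supp:
                level[p] = s
        frequent.update(level)
        # C_{k+1}: extend each pattern in L_k by a directly reachable region.
        candidates = {p + (r,) for p in level for r in successors[p[-1]]}
    return frequent

def mine_by_context(tagged):
    """tagged: list of (context, region_list); mine each context separately."""
    groups = defaultdict(list)
    for context, traj in tagged:
        groups[context].append(traj)
    return {c: mine_patterns(ts) for c, ts in groups.items()}

history = [
    ("WD", [1, 2, 3]),
    ("WD", [1, 2, 4]),
    ("WD", [1, 2, 3]),
    ("PD", [5, 6]),
    ("PD", [5, 7]),
]
patterns = mine_by_context(history)
```

With this toy history, the workday pattern (1, 2, 3) has support 2 and is kept, while (1, 2, 4) has support 1 and falls below the 1.33 threshold.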

4.3 Generate Movement Rules

For a rule R:\(<(r_{1},t_{1}),(r_{2},t_{2}),\dots ,(r_{k-1},t_{k-1})>\) \(\rightarrow \) \(<(r_{k},t_{k})>\), confidence is defined using the following formula:

$$\begin{aligned} confidence=\frac{<(r_{1},t_{1}),(r_{2},t_{2}),\dots , (r_{k},t_{k})>.supp}{<(r_{1},t_{1}),(r_{2},t_{2}),\dots , (r_{k-1},t_{k-1})>.supp} \end{aligned}$$
(1)

If the confidence of a rule is higher than the pre-set confidence threshold (\(coff_{min}\)), it will be selected for the next regional prediction phase. Since the movement pattern is extracted based on different contexts (workday/weekend), each movement rule also needs a contextual label to indicate a specific context.

Assume that the confidence threshold is 50%; the resulting set of movement rules is shown in Table 3 (WD represents workday, and PD represents weekend).
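Rule generation with Eq. (1) can be illustrated by the short sketch below. It assumes that mined patterns are available as a dictionary from region tuples to supports, that the last region of a pattern forms the rule tail, and that confidence is kept as a ratio, so a 50% threshold becomes 0.5. The dictionary layout, the function name, and the sample supports are hypothetical.

```python
def generate_rules(patterns, context, min_conf=0.5):
    """Derive rules head -> tail from {pattern: support} using Eq. (1)."""
    rules = []
    for pattern, supp in patterns.items():
        if len(pattern) < 2:
            continue  # a rule needs a non-empty head and a tail
        head, tail = pattern[:-1], pattern[-1]
        head_supp = patterns.get(head)
        if not head_supp:
            continue  # the head was not mined as a pattern
        confidence = supp / head_supp  # Eq. (1)
        if confidence > min_conf:
            rules.append({"head": head, "tail": tail,
                          "confidence": confidence, "context": context})
    return rules

# Hypothetical workday patterns with their supports.
wd_patterns = {(1,): 6, (2,): 4, (3,): 5,
               (1, 2): 4, (1, 3): 2, (1, 2, 3): 3}
rules = generate_rules(wd_patterns, "WD")
```

Here the rule (1) → 3 has confidence 2/6 and is discarded, while (1) → 2 and (1, 2) → 3 are kept and carry the "WD" context label.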

4.4 Predict Workers’ Regions

Prediction of regions is the last stage, and the pseudocode of the algorithm is described below:

figure a

During scanning, if a rule’s context information is inconsistent with the current context, the rule is skipped and the next rule is scanned, which improves the efficiency of the algorithm. The matching rules are then sorted first by the length of the rule head and then by confidence. This ensures that the prediction is based on the longest matching sequence whenever possible, which improves prediction accuracy.
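The scanning and sorting steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's pseudocode: a rule is taken to match when its head equals the most recent regions of the worker's current trajectory, rules are dictionaries with hypothetical keys `head`, `tail`, `confidence`, and `context`, and `None` is returned when nothing matches.

```python
def predict_region(rules, current_path, context):
    """Return the predicted next region, or None when no rule matches."""
    path = tuple(current_path)
    matches = []
    for rule in rules:
        if rule["context"] != context:
            continue  # inconsistent context: skip this rule
        head = rule["head"]
        # Assumed matching criterion: the head is the tail end of the path.
        if len(head) <= len(path) and path[len(path) - len(head):] == head:
            matches.append(rule)
    if not matches:
        return None
    # Longest head first, then highest confidence.
    matches.sort(key=lambda r: (len(r["head"]), r["confidence"]), reverse=True)
    return matches[0]["tail"]

# Hypothetical rule set.
rules = [
    {"head": (1,), "tail": 2, "confidence": 1.0, "context": "WD"},
    {"head": (2,), "tail": 4, "confidence": 0.9, "context": "WD"},
    {"head": (1, 2), "tail": 3, "confidence": 0.67, "context": "WD"},
    {"head": (2,), "tail": 7, "confidence": 0.8, "context": "PD"},
]
workday_guess = predict_region(rules, [1, 2], "WD")
weekend_guess = predict_region(rules, [1, 2], "PD")
```

On a workday the rule with head (1, 2) wins over the more confident but shorter rule with head (2), which reflects the preference for the longest matching sequence.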

After a worker’s location has been predicted, the tasks in the predicted region are recommended to the worker. The purpose of task assignment is to achieve local optimization by allocating the maximum number of tasks within a period of time. Because we take the change in the worker’s movement trajectory into account before allocating a task, workers avoid extra time and travel cost. This increases the success rate of task assignment, maximizes the number of assigned tasks, and reduces the cost to the platform.

5 Experiment

5.1 Experimental Design

To test the proposed method in a real-world environment, we use the data set of Gowalla, a location-based social network on which users sign in at different locations. Each record includes the user, time, latitude, longitude, and location ID. The data set contains more than 6.44 million records collected from 2009 to October 2010; we selected the first 1 million records, with user numbers from 0 to 4806, as our data set, which contains more than 4,000 users and 45,000 different locations.

In the experiment, the locations and users in the data set are used to represent the spatial crowdsourcing tasks and the locations of workers. A crowdsourcing task is considered accepted and completed as soon as the worker arrives at the designated place and signs in. Although the data set does not come directly from spatial crowdsourcing, it provides the distributions of workers and tasks. Since the algorithms studied in this paper rely on locations, we use this data set to draw some reasonable conclusions about their relative performance.

5.2 Experimental Result

As shown in Fig. 2, after the data are divided into workdays and weekends, the success rate on workdays is significantly higher than when workdays are not distinguished. For weekends, our analysis indicates two factors: a worker has only two days of weekend trajectory per week, so the data volume is small; and workers have many choices of destination, although the impact of this is small. The main reason is insufficient data. Distinguishing workdays not only improves the success rate but also lowers the time cost of the algorithm, because the context labels tell us directly whether a rule applies to a workday, which reduces the time spent scanning rules and allows regions to be predicted and tasks assigned efficiently.

Fig. 2. Match of WMP prediction

Fig. 3. Accuracy of WMP and UMP prediction

5.3 Experimental Evaluation

Next, we compare the WMP method proposed in this paper with the UMP method proposed by Yavas et al. [13] in terms of the accuracy of the region numbers predicted on the test set, as shown in Fig. 3.

The accuracy is defined as follows:

$$\begin{aligned} Accuracy\#k=\frac{|hit\#k|}{|Total|} \end{aligned}$$
(2)

\(Accuracy\#k\) is the accuracy when there are k clusters; \(hit\#k\) is the number of data items predicted successfully; and \(|Total|\) is the total number of sign-in data items under k clusters.
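As a small worked instance of Eq. (2), the sketch below counts the hits over a list of predictions for one fixed number of clusters k. It assumes that a prediction counts as a hit when the predicted region number equals the region actually visited, and that a missing prediction (`None`) counts as a miss; the function name and sample values are hypothetical.

```python
def accuracy(predicted, actual):
    """Eq. (2): fraction of test items whose region was predicted correctly."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    hits = sum(p == a for p, a in zip(predicted, actual))  # |hit#k|
    return hits / len(actual)                              # |hit#k| / |Total|

# Four hypothetical test items: two hits, one wrong region, one missing prediction.
score = accuracy([3, 4, None, 7], [3, 5, 2, 7])
```

In this example two of the four items are hits, so the accuracy is 0.5.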

It is clear that the accuracy of our method is higher than that of the UMP method in [13]. We consider the sequential order of the workers’ historical tracks. The probability of a user going to the first 100 locations is 0.5 greater than that of going to the following locations, which indicates that there are potential connections between locations and that prediction should not consider only the last region [11]. We take into account the regions that workers have visited in their history, which greatly improves the accuracy and further increases the probability of success in task assignment.

6 Conclusion

In mobile crowdsourcing services, it is crucial to predict mobile workers’ trajectories effectively, so that workers are willing to travel to the task location and perform their tasks with as little travel and time cost as possible. We propose a context-sensitive approach to predicting workers’ moving paths in mobile crowdsourcing services. With it, when spatial tasks are assigned on a crowdsourcing service platform, a task can be pushed to the workers who will enter its region before the task’s deadline. Our approach avoids extra time and travel cost for workers performing spatial tasks; as a result, it is expected to increase the probability that a task is accepted and completed, and ultimately to improve the success rate of task assignment.