1 Introduction

The rich and explosive growth of social media data has resulted in the integration of social data into a range of data-centric applications [1, 2]. Recent communication devices such as smartphones, i.e., social-sensors, provide the ability to embed sensor data directly into cloud-based social networks, i.e., social-sensor clouds [2, 4]. Monitoring the activities of these social-sensors provides multiple benefits in various domains. For example, urban management requires scene reconstruction and analysis in an area. Suppose the surveillance of a road segment through traditional sensors is limited in coverage. In such cases, social-sensors help to fill the information gap about events or happenings [5].

Social-sensor data, e.g., social media image meta-data and related posted data (e.g., location, description, and comments), are inherently multi-modal because of the different data formats and sources in social media platforms. This multifaceted data poses a significant challenge for the efficient and real-time delivery of the social-sensors’ data to the users [7, 13]. In our previous work, we proposed social-sensor cloud services to provide an open, flexible, and reconfigurable platform for monitoring and controlling applications [4, 5]. We abstract social-sensor cloud data, e.g., images’ annotations (meta-data and related information like description and comments), as a service, i.e., a social-sensor cloud service, to fulfill the users’ information requirements [4, 5].

This paper focuses on using the service paradigm as a vehicle to devise a method for scene reconstruction and analysis without carrying out actual image processing. The aim is to provide information about the required scene that is as useful as the information image processing provides [5]. A complete scene analysis needs images from multiple angles and different time intervals. In such cases, a composition of services is required to form multiple viewing angles to fulfill the users’ requirement(s) [10, 12]. In this regard, we have identified the following challenges:

  • Relevance Model for Spatio-temporal Cloud Service. Accurate information regarding the context of the service is vital for better utilization and selection of the social-sensor cloud service as per user requirements [8]. The relevance of the service to the given query helps to ascertain whether the service is in the same context as the query.

  • Spatio-temporal Composition. Social-sensor cloud services composition becomes even more challenging in dynamic service environments characterized by changing conditions and context. An optimal composite service is a set of social-sensor cloud services, providing the best-suited services at any given time as per the users’ query. A spatio-temporal composition aims to execute an optimal composition based on the functional attributes.

This paper addresses the challenges mentioned above. We propose a composite service that provides the user-required view and related information about any event or happening for the scene analysis. The proposed composition model forms a tapestry in the spatial aspect and a storyboard in the temporal aspect. In the spatial aspect, the composition forms a scene by selecting images from un-coordinated users and placing them in a tapestry-like structure. In the temporal aspect, a timeline is formed by combining various tapestries to form the story of the event.

2 Motivation Scenario

Let us assume an accident occurred on 5th July 2016 around 8:30 pm on Pascovale Road, Glenroy. The crash involves two vehicles, Cars A and B, crossing an intersection. The service user, i.e., the police, has requested a scene analysis of the accident. The aim is to find the original behavior leading to the crash and the objects of interest, i.e., the vehicles or people involved. In such a case, anyone in the area can act as a social-sensor by sharing images over a social network. We rely on these social media images, treated as social-sensor cloud services in the vicinity during that specific period, to reconstruct the desired scene.

This work proposes a model for the selection and composition of social-sensor cloud services based on the user query. The query includes a region of interest, a textual description and the time of the queried event: (1) Query phrase, e.g., a car accident involving Car A and B on city-bound Pascovale Road. Car A and Car B are the objects of interest. (2) Query region, i.e., a decimal latitude-longitude position, for example, (−37.694264, 144.9131593), covering an area of 10 m on all sides of the road. (3) Query time, e.g., 5th July 2016, from 8:25 pm to 8:40 pm.

The basic functional attributes of a social-sensor cloud service Serv, are abstracted from the social media image information. These include:

  • Time T of the service at which the image is taken.

  • Description D is a set of keywords or key-phrases providing additional information regarding the image, e.g., Car A crashes into Car B, Car accident.

  • Location L(x, y) is the longitude and latitude position where the image is taken.

  • Coverage Cov of the image is defined by VisD, the maximum visible distance covered by the image; \(\overrightarrow{dir}\), the orientation angle of the image; and \(\alpha \), the angular extent of the scene covered by the image.

It is assumed that the available services are tagged with location and time. We index all the available services considering their spatio-temporal features using a 3D R-tree [4, 6]. The search space is reduced by selecting the services that are spatio-temporally close to the querying location and contextually related to the query description. For example, at time \(t_{-1}\), the descriptions of three images \(img_{1}\), \(img_{2}\) and \(img_{3}\) show that Cars A and B were travelling city-bound along Pascovale Road, and Car C was taking the exit from the M80 Ring Road. Cars A and B are objects of interest and therefore \(img_{1}\), \(img_{2}\) and \(img_{3}\) are selected due to their contextual relevance to the query. Further, at time \(t_{0}\), the description of an image \(img_{4}\) shows that Car A stopped and avoided a collision with Car C. Therefore, Car C is considered as interacting with the object of interest Car A and \(img_{4}\) is selected. Three images \(img_{5}\), \(img_{6}\) and \(img_{7}\) in the spatio-temporal query region show that Car C ran the red light at the intersection. At time \(t_{1}\), four images \(img_{8}\), \(img_{9}\), \(img_{10}\) and \(img_{11}\) are selected due to their spatio-temporal and contextual relevance. The descriptions of \(img_{8}\), \(img_{9}\) and \(img_{10}\) say that Car B and Car A crashed. The description of \(img_{11}\) says that Car C escaped the accident scene.

Eleven services (images) are selected in this scenario. We cluster the selected services according to their spatio-temporal and contextual relationships. The contextual clustering is based on the interactions and relations between the services. The interactions and relationships between the objects of interest of the services are determined on the basis of the semantic similarity between the service description and the query description. A vocabulary dictionary of event-specific relationships, provided by domain experts, is used for this purpose. We then assess the services for composability using predefined relations, explained in the relevance and composability models (Sect. 4). Finally, we build up the composition, i.e., a visual summary, by forming a tapestry-like scene. The composition is formulated by selecting the composable services covering the accident, the objects of interest and the interacting object. The composition depicts the crash and the cars involved, i.e., Cars A and B crashed, and Car C escaped the crash scene.

3 Model for Social-Sensor Cloud Service

In this section, we define the social-sensor cloud service, selection, and composition model.

3.1 Model for an Atomic Social-Sensor Cloud Service

An atomic social-sensor cloud service Serv is defined by:

  • Serv_id is a unique service id of the service provider SocSen.

  • F is a set of functional properties of the service Serv.

3.2 Functional Model of an Atomic Social-Sensor Cloud Service

The functional requirements capture the intended behavior of an atomic service and form the baseline functionality. The minimal functional requirements associated with an atomic service and their information sources are:

  • Social-sensor device: The basic functional attributes of a social-sensor cloud service associated with social-sensor device are time t, location L(x,y) and coverage Cov of the sensor. We have discussed all these parameters in [5].

  • Social-sensor service owner: Context Con of a social-sensor cloud service is associated with the service owner. It is the description of a service provided by the service owner. Context Con is defined by D and T. Description D of the service provides additional information regarding the image. It is assumed that the service’s description includes complete detail of the service specifics related to the scene captured, e.g., the objects captured and their relations. Tags T provide the location and focus of the image.

  • Social-sensor cloud: Interaction I is the information provided by the social network regarding objects of interest in the image. It is assumed that the description includes detail of the objects of interest. This description is provided by the users of the cloud, i.e., social media, through comments. We assume that the information collected through comments is trustworthy. The trustworthiness of the comments is dealt with in our previous work [3, 9].
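The attributes listed above can be collected into one record per service. The following Python sketch is illustrative only: the class and field names are hypothetical, the units (metres, degrees) are assumptions, and the sample values are invented for the example rather than taken from the dataset.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Coverage:
    """Cov: the part of the scene an image covers."""
    vis_d: float          # VisD, maximum visible distance (assumed metres)
    direction_deg: float  # dir, orientation angle (assumed degrees)
    alpha_deg: float      # alpha, angular extent (assumed degrees)


@dataclass(frozen=True)
class Context:
    """Con: information supplied by the service owner."""
    description: frozenset  # D, keywords or key-phrases
    tags: frozenset         # T, tags giving location and focus


@dataclass(frozen=True)
class Service:
    """One atomic social-sensor cloud service (one image)."""
    serv_id: str
    t: datetime             # time the image was taken
    location: tuple         # L(x, y) as (longitude, latitude)
    cov: Coverage
    con: Context
    interaction: frozenset  # I, objects of interest named in comments


# Hypothetical sample record for one image of the motivating scenario.
img8 = Service(
    serv_id="img8",
    t=datetime(2016, 7, 5, 20, 31),
    location=(144.9131593, -37.694264),
    cov=Coverage(vis_d=30.0, direction_deg=90.0, alpha_deg=60.0),
    con=Context(
        description=frozenset({"car accident"}),
        tags=frozenset({"Pascovale Road"}),
    ),
    interaction=frozenset({"Car A", "Car B"}),
)
```

Frozen dataclasses are used here so that a service record can be stored in sets and used as a dictionary key during selection.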

4 Social-Sensor Cloud Service Composability

In this section, we propose the relevance and composability models for social-sensor cloud services.

4.1 Model for Social-Sensor Cloud Service Relevance

The relevance of a service to a given query or another service helps to ascertain whether the service is in the same context as the query or the other service. The relevance between two or more social-sensor cloud services can be described as spatio-temporal closeness, contextual relatedness and interaction relevance. The relevance between two services \(Serv_{1}\) and \(Serv_{2}\) can be defined as:

Spatial Relevance. \(Rel_{S}\) means \(Serv_{1}\) and \(Serv_{2}\) are close in space boundaries and have similar coverage direction. This encompasses \((Serv_{1}.Cov_{(\alpha ,dir)} \cong Serv_{2}.Cov_{(\alpha ,dir)})\), i.e., similar in directions and angles, AND \(Serv_{1}.L = Serv_{2}.L \pm \varDelta \), i.e., close in geo-location, where \(\varDelta \) is the maximum allowed spatial difference.
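As an illustration of the spatial relevance test, the sketch below checks both conditions for two services. It is not the authors' implementation: the function names, the use of the haversine formula for the distance, and the separate tolerances for direction and angular extent are all assumptions made for the example.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def angle_diff_deg(a, b):
    """Smallest absolute difference between two compass angles."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def spatially_relevant(loc1, dir1, alpha1, loc2, dir2, alpha2,
                       delta_m, dir_tol_deg, alpha_tol_deg):
    """Rel_S: close in geo-location AND similar direction and angle.

    loc1 and loc2 are (longitude, latitude) pairs; delta_m plays the
    role of the maximum allowed spatial difference (assumed metres).
    """
    close = haversine_m(*loc1, *loc2) <= delta_m
    similar = (angle_diff_deg(dir1, dir2) <= dir_tol_deg
               and abs(alpha1 - alpha2) <= alpha_tol_deg)
    return close and similar
```

The wrap-around handling in `angle_diff_deg` matters for directions near north: 350° and 10° differ by 20°, not 340°.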

Temporal Relevance. \(Rel_{t}\) means \(Serv_{1}\) and \(Serv_{2}\) coincide in time, i.e., \(((Serv_{1}.t_{e} = Serv_{2}.t_{s} \pm \varepsilon ) \mid (Serv_{1}.t_{s} = Serv_{2}.t_{s} \pm \varepsilon ) \mid (Serv_{1}.t_{e} = Serv_{2}.t_{e} \pm \varepsilon ))\). Here, \((Serv_{1}.t_{e} = Serv_{2}.t_{s} \pm \varepsilon )\) means the end time of \(Serv_{1}\) is close to the start time of \(Serv_{2}\), \((Serv_{1}.t_{s} = Serv_{2}.t_{s} \pm \varepsilon )\) means the start time of \(Serv_{1}\) is close to the start time of \(Serv_{2}\), and \((Serv_{1}.t_{e} = Serv_{2}.t_{e} \pm \varepsilon )\) means the end time of \(Serv_{1}\) is close to the end time of \(Serv_{2}\). \(\varepsilon \) is the maximum allowed time difference.
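The three alternatives in the temporal relevance condition translate directly into a predicate. The sketch below is a minimal illustration; the function and parameter names are hypothetical, and `eps` stands for the tolerance \(\varepsilon \).

```python
from datetime import datetime, timedelta


def temporally_relevant(s1_start, s1_end, s2_start, s2_end, eps):
    """Rel_t: any of the three time conditions holds within eps."""
    def close(a, b):
        return abs(a - b) <= eps

    return (close(s1_end, s2_start)       # Serv1 ends near Serv2's start
            or close(s1_start, s2_start)  # both start near each other
            or close(s1_end, s2_end))     # both end near each other


t0 = datetime(2016, 7, 5, 20, 30)
eps = timedelta(minutes=2)
consecutive = temporally_relevant(
    t0, t0 + timedelta(minutes=1),
    t0 + timedelta(minutes=2), t0 + timedelta(minutes=3),
    eps,
)
far_apart = temporally_relevant(
    t0, t0,
    t0 + timedelta(minutes=10), t0 + timedelta(minutes=10),
    eps,
)
```

For a single image, the start and end times can both be set to the capture time, as in the second call.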

Spatio-Temporal Relevance. \(Rel_{St}\) means \(Serv_{1}\) and \(Serv_{2}\) have overlap in time and space. This encompasses \(Rel_{S}\) \(\cap \) \(Rel_{t}\).

Contextual Relevance. \(Rel_{C}\) means \(Serv_{1}\) and \(Serv_{2}\) share the same or a similar context. This encompasses \((Serv_{1}.Con \cong Serv_{2}.Con)\). The contextual relevance is based on the textual similarity of the contextual descriptions of both services. Contextual relevance is calculated as a semantic distance between the descriptions of the services and the query [5]. Event-specific relationships are used for the implementation of the similarity measure. These event-specific relationships are described in the vocabulary dictionary provided by the domain experts. We use \( \theta \) to define \(related_{LIN}(Serv_{1}.Con, Serv_{2}.Con)\). A higher value of \(\theta \) indicates higher similarity in context.
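To make the thresholding step concrete, the sketch below compares two descriptions and applies a cut-off. Note that it deliberately swaps in a much simpler measure: \(related_{LIN}\) needs a lexical resource and the expert vocabulary, so token-level Jaccard overlap is used here purely as a stand-in. The function names and the default cut-off are assumptions for the example.

```python
def tokens(text):
    """Lower-cased word set of a description (naive whitespace split)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def contextually_relevant(con1, con2, theta=0.5):
    """Stand-in for Rel_C: similarity of the two descriptions >= theta."""
    return jaccard(tokens(con1), tokens(con2)) >= theta


same_event = contextually_relevant("Car A crashed into Car B",
                                   "car B crashed into car A")
unrelated = contextually_relevant("Car A crashed into Car B",
                                  "sunset over the river")
```

A semantic measure would also relate phrases such as "crash" and "collision", which plain token overlap cannot do; only the comparison-against-threshold structure carries over.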

Interaction Relevance. \(Rel_{I}\) means \(Serv_{1}\) and \(Serv_{2}\) both share objects of interest in the coverage (see Sect. 3.2). This encompasses \((Serv_{1}.I \cap Serv_{2}.I)\).

4.2 Model for Social-Sensor Cloud Service Composability

The spatio-temporal and contextual composability of two or more social-sensor cloud services falls into four cases:

  • \((Rel_{St} \cap Rel_{C})\). Two or more services are composable if these services are spatio-temporally and contextually relevant.

  • \((Rel_{t} \cap Rel_{C})\). Two or more services are composable if these services are temporally and contextually relevant. In such cases, services might be located outside the region of interest but still capture a scene inside.

  • \((Rel_{S} \cap Rel_{C})\). Two or more services are composable if these services are spatially comparable and contextually relevant. In such cases, services are available either before or after the required period.

  • \((Rel_{C} \cap Rel_{I})\). Two or more services might be composable if these services share context and objects of interest. In such cases, services might be located outside the region of interest but still capture some related objects of interest.
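The four cases above all require contextual relevance and differ in what accompanies it. A minimal sketch of that decision, assuming the individual relevance tests have already been evaluated to booleans, is given below; the class name, field names and returned labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relevance:
    """Outcome of the four pairwise relevance tests for two services."""
    spatial: bool      # Rel_S
    temporal: bool     # Rel_t
    contextual: bool   # Rel_C
    interaction: bool  # Rel_I


def composability_case(r: Relevance) -> Optional[str]:
    """Return the first matching case, or None if not composable."""
    if not r.contextual:
        return None            # every case requires Rel_C
    if r.spatial and r.temporal:
        return "St+C"          # Rel_St and Rel_C
    if r.temporal:
        return "t+C"           # Rel_t and Rel_C
    if r.spatial:
        return "S+C"           # Rel_S and Rel_C
    if r.interaction:
        return "C+I"           # Rel_C and Rel_I
    return None
```

Checking the cases from the strictest to the loosest is a design choice of this sketch, so that a pair satisfying several cases is reported under the strongest one.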

5 Social-Sensor Cloud Service Composition Approach

We propose an approach to filter, select and compose the best available social-sensor cloud services to form a visual summary according to the user’s query. The composition is achieved by combining the information context of the service with its functional attributes. The composite service comprises a set of selected atomic services that form a visual summary of the queried event. The visual summary offers an arrangement of the 2D images, forming a tapestry-like scene of the required event. Our approach aims to efficiently compose the available services into a single composite service that matches the users’ requirements.

A query q can be defined as \(q = (Rgn,des,t_{s},t_{e})\), giving the region of interest, description and time of the required service(s).

  • \(Rgn = \{P<x,y>,l,w\}\) [5], where P is a geospatial co-ordinate set, i.e., a decimal longitude-latitude position, and l and w are the length and width distances from P to the edge of the region of interest.

  • \(t_{s}\) is the start time and \(t_{e}\) is the end time of the query.

  • des is a phrase describing the query. The query description includes details of the objects of interest obj, i.e., the objects involved, and the context of the query cont, i.e., the scene to be captured.
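The query tuple can be represented as a small record with a membership test for the region and time window. The sketch below is an assumption-laden illustration: the class names are hypothetical, l is taken to run east-west and w north-south, both in metres, and a flat-earth approximation around P converts degrees to metres.

```python
import math
from dataclasses import dataclass
from datetime import datetime

M_PER_DEG_LAT = 111_320.0  # approximate metres per degree of latitude


@dataclass(frozen=True)
class Region:
    """Rgn = {P<x, y>, l, w}: a rectangle centred on P."""
    lon: float  # x of P
    lat: float  # y of P
    l: float    # metres from P to the edge, assumed east-west
    w: float    # metres from P to the edge, assumed north-south

    def contains(self, lon, lat):
        # Local equirectangular approximation, adequate for small regions.
        dx = (lon - self.lon) * M_PER_DEG_LAT * math.cos(math.radians(self.lat))
        dy = (lat - self.lat) * M_PER_DEG_LAT
        return abs(dx) <= self.l and abs(dy) <= self.w


@dataclass(frozen=True)
class Query:
    """q = (Rgn, des, t_s, t_e)."""
    rgn: Region
    des: str
    t_s: datetime
    t_e: datetime

    def matches(self, lon, lat, t):
        """True if a service at (lon, lat, t) lies inside the query."""
        return self.rgn.contains(lon, lat) and self.t_s <= t <= self.t_e


q = Query(
    rgn=Region(lon=144.9131593, lat=-37.694264, l=10.0, w=10.0),
    des="car accident involving Car A and B",
    t_s=datetime(2016, 7, 5, 20, 25),
    t_e=datetime(2016, 7, 5, 20, 40),
)
```

For example, `q.matches(144.9132, -37.6943, datetime(2016, 7, 5, 20, 31))` tests an image taken a few metres from P during the query window.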

5.1 Social-Sensor Cloud Service Selection

The indexing and spatio-temporal filtering of the services enable the fast discovery of the services. We index all the available services using a 3D R-tree [4] and select the services inside the bounded region of interest [5]. Next, the services are selected and classified based on the relevance between the services, the queried scene and the objects of interest. A service may lie spatio-temporally in the query area Rgn but have no contextual relation with the query q, or carry too much noise in the form of unwanted information. In such cases, the object(s) of interest and behavior relations are used for the service filtration. The contextual relevance of all the services to a query’s scene and objects of interest is assessed. Following previous research, we set the threshold \(\theta \) = 0.5 for the contextual relevance [14]. The services related to the queried scene and objects of interest are selected. The services are classified into three sets according to their relevance: (1) spatio-temporally and contextually relevant services \(S_{StC}\), (2) spatio-temporally relevant and interacting services \(S_{StI}\) and (3) contextually relevant and interacting services \(S_{CI}\).
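The classification into the three sets can be sketched as follows, assuming the spatio-temporal test, the contextual score and the interaction test have already been computed per service. The dictionary keys and the rule that a service is placed only in the first set it qualifies for are assumptions of this example, not statements about the original implementation.

```python
def classify(candidates, theta=0.5):
    """Split candidate services into S_StC, S_StI and S_CI.

    Each candidate is a dict with hypothetical keys:
      id                 -- service identifier
      in_region_and_time -- inside Rgn and [t_s, t_e]
      context_score      -- contextual relevance to the query
      interacts          -- shares or interacts with an object of interest
    """
    s_stc, s_sti, s_ci = [], [], []
    for c in candidates:
        contextual = c["context_score"] >= theta
        if c["in_region_and_time"] and contextual:
            s_stc.append(c["id"])
        elif c["in_region_and_time"] and c["interacts"]:
            s_sti.append(c["id"])
        elif contextual and c["interacts"]:
            s_ci.append(c["id"])
        # Anything else is treated as noise and dropped.
    return s_stc, s_sti, s_ci


candidates = [
    {"id": "img1", "in_region_and_time": True, "context_score": 0.8, "interacts": True},
    {"id": "img4", "in_region_and_time": True, "context_score": 0.3, "interacts": True},
    {"id": "img11", "in_region_and_time": False, "context_score": 0.7, "interacts": True},
    {"id": "noise", "in_region_and_time": True, "context_score": 0.1, "interacts": False},
]
s_stc, s_sti, s_ci = classify(candidates)
```

The scores in `candidates` are invented to exercise each branch and do not correspond to measured values.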

5.2 Social-Sensor Cloud Service Composability Assessment

The composability rules aim to construct a composite service. Composability assessment among component services is based on their spatio-temporal and contextual parameters. The relevance and overlap are considered to define the composability relations between the services, e.g., \(Serv_{1}\) and \(Serv_{2}\). We aim to define the composability of the services as quantitative relations. The relevance between the services is the arithmetic mean of the considered parameters. It is calculated as:

$$\begin{aligned} Rel(Serv_{1},Serv_{2}) = \frac{1}{3}\left[ Rel_{St}(Serv_{1},Serv_{2}) + Rel_{C}(Serv_{1},Serv_{2}) + Rel_{I}(Serv_{1},Serv_{2})\right] \end{aligned}$$
(1)

where \(Rel_{St}(Serv_{1},Serv_{2})\) is based on the time of the services and their proximity in space. \(\lambda \) is the shortest distance between \(Serv_{1}\) and \(Serv_{2}\) and \(\vartheta \) is the difference between the coverage angles \(Serv_{1}.Cov_{dir}\) and \(Serv_{2}.Cov_{dir}\). The thresholds for the spatial relevance are set as \(\lambda _{thr}\) for distance and \(\vartheta _{thr}\) for \(\overrightarrow{dir}\). Therefore, the services are considered spatio-temporally relevant if both their distance \(\lambda \) and their direction difference \(\vartheta \) are below the respective thresholds. \(Rel_{C}(Serv_{1},Serv_{2})\) is the semantic distance between the descriptions of \(Serv_{1}\) and \(Serv_{2}\) (see Sect. 5.1). \(Rel_{I}(Serv_{1},Serv_{2})\) is the count of the mutual objects of interest in \(Serv_{1}\) and \(Serv_{2}\). The overlap between the services is considered:

$$\begin{aligned} Overlap (Serv_{1},Serv_{2}) = Overlap_{spatial}(Serv_{1},Serv_{2}) \end{aligned}$$
(2)

The quantitative value of the mutual composability is calculated as:

$$\begin{aligned} Comp (Serv_{1},Serv_{2}) = Rel(Serv_{1},Serv_{2}) - Overlap(Serv_{1},Serv_{2}) \end{aligned}$$
(3)
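A small worked example of Eqs. (1) to (3) is given below. It assumes that each relevance term and the spatial overlap have been normalised to the range [0, 1] beforehand, which the text does not specify; the function names and the numeric inputs are invented for illustration.

```python
def relevance(rel_st, rel_c, rel_i):
    """Eq. (1): arithmetic mean of the three relevance terms."""
    return (rel_st + rel_c + rel_i) / 3.0


def composability(rel_st, rel_c, rel_i, overlap_spatial):
    """Eq. (3): relevance minus overlap, with Eq. (2) as the overlap."""
    return relevance(rel_st, rel_c, rel_i) - overlap_spatial


# Two hypothetical neighbours of the same service.
neighbours = {
    "img9": composability(0.9, 0.6, 0.3, overlap_spatial=0.2),   # mean 0.6, overlap 0.2
    "img10": composability(0.9, 0.9, 0.6, overlap_spatial=0.7),  # mean 0.8, overlap 0.7
}
best = max(neighbours, key=neighbours.get)
```

Here `img10` is more relevant but mostly duplicates the area already covered, so `img9` ends up with the higher composability and is preferred.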

A geographic coverage patch GeoPatch is formed to assess the composability of each service from the spatio-temporal and contextual selection \(S_{StC}\). A set N of the spatio-temporally nearest services is selected for each GeoPatch. The mutual composability Comp is calculated with each service in N. The process of calculating the mutual composability of the services is repeated with the sets \(N'\) and \(N''\). \(N'\) is the set of the nearest services concerning the spatio-contextual and temporal-contextual relevance. \(N''\) is the set of the nearest services based on the contextual relevance and interaction. The assessment process of the mutual composability is based on the relevance and overlap of the services.

5.3 Social-Sensor Cloud Service Composition

The composition is handled like sewing a tapestry to form the scene. We start with the central piece, in terms of space and time, and build a tapestry around it. The build-up is based on selecting the best composable services from the set of nearest services. The best neighbor service is the one with the maximum relevance and the minimum overlap.

The composition covers the visual summary of the whole queried scene, i.e., all objects of interest and their context. We choose the central service \(Serv_{c}\) in terms of space and time from the spatio-temporal and contextual selection. We then add \(Serv_{c}\)’s neighbors to a separate pool. We assume that the central service is in the middle of the spatio-temporal dimension. Next, we extract the best neighbor service \(Serv_{k.bn}\) from the pool and place it with \(Serv_{c}\) by joining the patch. \(Serv_{k.bn}\) is selected according to the maximum composability. We add the neighbors of \(Serv_{k.bn}\) to the pool. The process of selecting the best neighbor and joining it to the patch continues as long as there are services in the pool. If the pool is empty, we reassess the composability of the remaining services and start again with the nearest service. Spatial gaps in the composition are assessed after the utilization of all services from the spatio-temporal and contextual selection. Comp.C is the total coverage of the services in the composition overlapping the bounded region Rgn and within the times \(t_{s}\) and \(t_{e}\). The relationship between Serv and q.Rgn can be illustrated as:

$$\begin{aligned} Composition \longrightarrow \{Comp \in \bigcup _{i=1}^{n} Serv_{i} \mid (Comp.C \cap q.Rgn)\cap Rel_{C} \cap Rel_{I},\ t_{s} \le t \le t_{e}\} \end{aligned}$$
(4)

In our previous work, we have discussed the coverage of the composition and gap assessment [5]. We estimate and select an arbitrary neighbor \(Serv_{kc'}\) if there are any spatial gaps. Next, the best nearest service \(Serv_{k.bn}\) is selected from the set of the spatio-temporally relevant and interacting services. The process of selecting and joining the services continues until we fill in the gaps and get the maximum available coverage. The composite service is a series of spatial tapestries in time, providing a timeline of the visual summary of the event.
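The build-up loop of this section can be sketched as a greedy procedure over a neighbor pool. The code below is a simplified illustration under stated assumptions: the neighbor sets and the pairwise composability function are supplied by the caller, a candidate is scored against its best match among the services already placed, ties are broken alphabetically, and gap filling is omitted. All names are hypothetical.

```python
def compose_tapestry(central, services, neighbors, comp):
    """Greedy build-up; returns services in the order they are joined.

    central   -- identifier of the central service Serv_c
    services  -- identifiers of all selected services
    neighbors -- dict mapping a service to its nearest services
    comp      -- function (a, b) -> mutual composability score
    """
    placed = [central]
    remaining = set(services) - {central}
    pool = set(neighbors.get(central, ())) & remaining

    def score(s):
        # Composability with the best-matching service already placed.
        return max(comp(p, s) for p in placed)

    while remaining:
        if not pool:
            # Pool exhausted: restart from the best remaining service.
            pool = {max(sorted(remaining), key=score)}
        best = max(sorted(pool), key=score)
        placed.append(best)
        remaining.discard(best)
        pool.discard(best)
        # Neighbors of the newly placed service become candidates.
        pool |= set(neighbors.get(best, ())) & remaining
    return placed


# Hypothetical scores for five services; missing pairs default to 0.
table = {
    frozenset(("a", "b")): 0.8, frozenset(("a", "c")): 0.5,
    frozenset(("b", "d")): 0.7, frozenset(("a", "d")): 0.1,
    frozenset(("b", "c")): 0.2, frozenset(("c", "d")): 0.3,
}
neighbors = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"], "e": []}
order = compose_tapestry(
    "a", ["a", "b", "c", "d", "e"], neighbors,
    lambda p, s: table.get(frozenset((p, s)), 0.0),
)
```

In this example the isolated service `e` has no neighbors, so it is reached only through the restart step once the pool runs empty.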

6 Experiment and Evaluation

We evaluate the proposed approach using a real dataset. The set is a collection of 10,000 user-uploaded images downloaded from social networks (Flickr, Twitter, Google+). We extracted their geo-tagged locations, the time when each image was captured, the post description and the tags to create the services. Further, the camera direction \(\overrightarrow{dir}\), the maximum visible distance of the image VisD and the viewable angle \(\alpha \) are abstracted as the functional property Cov.

We generated eight different queries based on the locations and events in our dataset. In the first part of the experiment, we evaluated the service composition based on the spatial relevance. The results of these experiments are evaluated against the traditional image processing technique SIFT (Scale-Invariant Feature Transform) [11]. We used the images’ geolocation information, associated directions and viewing angles to gather an associated image dataset I from Google Street View of the area of interest R. We first downloaded \(360^{\circ }\) views from Google Street View using the GPS position of the image and collected the views related to the service. Further, we compared the similarity between the images in the composition and the image set I using SIFT features. This comparison is achieved by individually comparing the key point feature vectors of the images in I and the images in the composition, and finding the images’ matching features based on the Euclidean distance of their feature vectors. Further, we assessed whether the images in the composition are correctly positioned in spatial relations. The similarity threshold is set at around 60%; a 40% noise margin is allowed for traffic and pedestrian obstruction in the images.
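The matching step can be illustrated independently of the feature extractor. The sketch below assumes that key point descriptors have already been extracted for both images and are given as plain sequences of numbers; a descriptor counts as matched when its nearest neighbor in the other image lies within a fixed Euclidean distance. The function name, the `max_dist` parameter and the toy two-dimensional descriptors are assumptions for the example, not the experimental setup.

```python
import math


def match_fraction(desc_a, desc_b, max_dist):
    """Fraction of descriptors in desc_a with a close match in desc_b."""
    if not desc_a or not desc_b:
        return 0.0
    matched = sum(
        1 for d in desc_a
        if min(math.dist(d, e) for e in desc_b) <= max_dist
    )
    return matched / len(desc_a)


# Toy descriptors; real SIFT descriptors have 128 dimensions.
service_image = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
street_view = [(0.0, 0.1), (1.0, 1.2), (9.0, 9.0)]

fraction = match_fraction(service_image, street_view, max_dist=0.5)
is_similar = fraction >= 0.6  # similarity threshold of around 60%
```

Two of the three toy descriptors find a close counterpart, so the pair passes the 60% threshold; the unmatched descriptor plays the role of an obstruction in the image.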

In the second part of the experiment, we assessed how useful the composite service is in completing the contextual storyboard. The assessment is done by manually analyzing the composition for spatio-temporal and contextual coverage. The effectiveness of the composite service is assessed on the selection and composition of the related and accurate services. We assess whether the composite service contains the required object(s) of interest and their behavior according to the user query.

6.1 Evaluation

We have evaluated the proposed approach by (1) accuracy in the spatial coverage of the user-required region, (2) effectiveness in selecting the related services (precision), and (3) effectiveness of the composite service in capturing the required context, i.e., the object(s) of interest and their behaviors (recall). All images and the composed services are analyzed manually to form a baseline.

Table 1. Relative accuracy in spatio-temporal coverage
Table 2. Precision and recall

We have assessed the composite services by comparing the similarity between the service image and the Google Street View image. SIFT image processing is used for the comparison for all eight queries (Table 1). We observed that approximately 63% of the services in the compositions are accurately categorized in space. The 37% error rate is attributable to noise in the images. Noise is an obstruction in the image affecting the scene building. For example, a vehicle obstructing the building of interest can be considered as noise. Further, we have assessed the composite services by manually analyzing the effectiveness of selecting the relevant spatio-temporal services, i.e., precision (Table 2). The average precision of the proposed approach is 78% for the location-based queries and 64% for the event-based queries. The effective spatio-temporal and contextual coverage is assessed by recall (Table 2). The average recall of the proposed approach is 65% for the location-based queries and 76% for the event-based queries. The results show that the values of precision are higher for the location-oriented queries, e.g., Melbourne Central (Q6). The values of recall are higher for the event or scene-oriented queries, e.g., Bourke Street Accident (Q1). Therefore, we conclude that our proposed approach helps in the accurate composition of the services for the scene analysis. The proposed approach considers the related contextual data that describes the situation from various aspects, e.g., what has happened, where it happened, who is involved and what the effects on the surrounding area are.
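For reference, precision and recall against the manual baseline can be computed from two sets of service identifiers, as sketched below. The function name and the example identifiers are hypothetical; the definitions are the standard ones.

```python
def precision_recall(selected, relevant):
    """Precision and recall of a composition against a manual baseline.

    selected -- set of services placed in the composition
    relevant -- set of services the human baseline marks as required
    """
    true_positives = len(selected & relevant)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


selected = {"img1", "img2", "img3", "img4"}
relevant = {"img2", "img3", "img4", "img5", "img6"}
precision, recall = precision_recall(selected, relevant)
```

In this toy case three of the four selected services are relevant, and three of the five relevant services were found.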

7 Conclusion

We propose a social-sensor cloud service composition approach based on spatio-temporal and contextual relevance. Our experiments evaluate the accuracy and effectiveness of the compositions produced by the proposed approach. In future work, we plan to focus on optimal social-sensor cloud service composition under uncertain time, location and context requirements.