Abstract
The scope of the Internet of Things (IoT) environment has been expanding from private to public spaces, where selecting the most appropriate service by predicting the service quality has become a timely problem. However, IoT services can be physically affected by (1) uncertain environmental factors such as obstacles and (2) interference among services in the same environment while interacting with users. With the traditional modeling-based approach, analyzing the influence of such factors on the service quality requires substantial modeling effort and lacks generalizability. In this study, we propose Learning Physical Environment factors based on the Attention mechanism to Select Services for UsERs (PLEASSURE), a novel framework that selects IoT services by learning the uncertain influence and predicting the long-term quality from the users’ feedback without additional modeling. Furthermore, we propose fingerprint attention, which extends the attention mechanism to capture the physical interference among services. We evaluate PLEASSURE by simulating various IoT environments with mobile users and IoT services. The results show that PLEASSURE outperforms the baseline algorithms in terms of rewards consisting of users’ feedback on satisfaction and interference.
1 Introduction
Based on the recent proliferation of Internet of Things (IoT) technology, the scope of IoT has been expanding from private to public spaces, where multiple providers can offer the functions of shareable IoT devices as services to users, commonly referred to as IoT services [1, 2]. With this expansion, the capability of IoT services to physically interact with users has come into the spotlight [3, 4]. Users can utilize IoT services on the fly for various purposes [5], such as bike sharing [6], customized route navigation on public displays/drones [7, 8], or even evacuation support in an emergency [9].
When a user requests a public IoT service, the service registry in the environment may discover and return the candidate services along with predicted qualities to let the user select the most appropriate one. However, quality prediction for IoT services is more challenging than for traditional Web services for the following reasons. First, IoT services are physically affected by various environmental factors with high uncertainty [10]. For instance, obstacles may block a display or speaker, preventing the user from receiving content visually or acoustically from the service. Second, the actuation of IoT services may unintentionally interfere with other services in the same space. For instance, when multiple speakers generate sounds simultaneously, some users may not be able to perceive the sound appropriately.
In Fig. 1, we illustrate a scenario in an airport to motivate the challenges of selecting public IoT services. A user urgently needs to find the way to a plane that is about to depart. To this end, the user may select one of the speaker services in the airport to receive wayfinding support. First, the user may select the service with the lowest network latency (speaker 1). However, the user cannot perceive the sound appropriately because the speaker is too far away. Second, the user may select a nearby speaker (speaker 2). However, walls and obstacles block the sound from the speaker. Detecting every wall and obstacle in the airport and simulating complex physical phenomena such as sound absorption, reflection, and diffraction is difficult and computationally heavy. Third, the user may somehow select another speaker that is close enough and not blocked by obstacles (speaker 3). However, the user still cannot perceive the content appropriately because the sounds from other nearby speakers are too noisy and interfering. Numerous speakers in the airport may affect each other depending on the distance or spatial layout. Finally, the user selects the most appropriate speaker that can deliver sounds to the user (speaker 4). As the scenario shows, selecting the most appropriate IoT service in public spaces is challenging because of the uncertain influence of environmental factors and other services.
Using traditional modeling-based approaches, the physical influence of environmental factors and other services can be modeled through sophisticated processes. However, developing such models requires considerable effort from domain experts, the service-specific models are difficult to generalize to other service types, and installing sensor devices in public environments is costly.
In this study, we propose a framework named Learning Physical Environment factors based on the Attention mechanism to Select Services for UsERs (PLEASSURE) for selecting IoT services that can be physically affected by uncertain environmental factors and other services in the same space. To eliminate the necessity of modeling processes and sensor devices, PLEASSURE builds quality prediction models solely based on user feedback on the service. With users’ feedback on satisfaction and interference, PLEASSURE uses multi-agent reinforcement learning to predict long-term quality by learning the influence of environmental factors and other services.
PLEASSURE allocates a distributed and specialized prediction model to each service agent instead of a global and generalized one because the influence of environmental factors and other services differs for each agent. The service registry aggregates the service quality predicted by each available agent so that the user may select the most appropriate service. The selected agent provides the service and collects the user’s feedback on whether the service was satisfactory or was interfered with by other services. Using the collected feedback, each service agent individually improves its prediction model. Therefore, after sufficient training of the distributed agents, the quality prediction of each agent specializes toward its environmental conditions.
Furthermore, PLEASSURE uses a new attention mechanism that we propose, called fingerprint attention, to effectively estimate the potential interference of other services. The attention mechanism [11] is a popular class of machine learning techniques that adjusts the attention on other entities by calculating their importance. However, common attention mechanisms require predefined or calculated vector representations of the entities, which may be unavailable for service agents with hardly detectable physical factors. Therefore, we propose fingerprint attention, in which each agent continuously learns and represents the hidden physical factors as a fingerprint vector.
We evaluate PLEASSURE in simulated environments of public IoT services with randomly distributed devices and obstacles. Note that multi-agent reinforcement learning methods [12, 13] are conventionally evaluated in simulated environments [14, 15] because deploying and training interactive agents in the real world may cause safety issues [16]. We compare different versions of PLEASSURE with the baseline algorithms, and the results show that PLEASSURE with fingerprint attention performs the best in most environments. This means that the service agents of PLEASSURE successfully learn the physical influence of environmental factors and other services from the users’ feedback using the suggested fingerprint attention, accurately predicting the long-term quality.
The main contributions of this work are summarized as follows:
-
We define a new service selection problem for interactive IoT services in public spaces, focusing on environmental factors and other services that may physically affect the IoT services.
-
We propose a learning-based framework called PLEASSURE that uses multi-agent reinforcement learning to train service agents. The service agents predict the long-term quality of the services in a distributed manner by learning the influence of environmental factors and other services based on the users’ feedback.
-
We propose an extended attention mechanism called fingerprint attention that calculates the attention weights between agents according to the learnable fingerprint assigned to each service agent. The fingerprints are continuously updated to reflect the hidden physical factors.
-
We developed a simulation for public IoT services physically affected by environmental factors. Experiment results and analysis in the simulated environments show that PLEASSURE outperforms other baseline algorithms.
The organization of this paper is as follows. We discuss related studies on the selection problem in Web and IoT services in Section 2. Next, we state the system design and the selection problem of public IoT services in Section 3. As the solutions for the stated problem, we propose PLEASSURE and fingerprint attention in Section 4. Then, we present the simulation results and evaluate PLEASSURE in Section 5. Finally, we conclude the paper in Section 6.
2 Related work
Research on service selection problems has a long history in the domain of web services. Traditionally, web services are composed as composite services to provide enhanced functionality. The goal of web service selection is to select the services that optimize the quality of the composite service in terms of system quality attributes such as latency, throughput, and reliability [17]. The high complexity of web service selection mainly comes from the exponential number of possible combinations of services to compose, and researchers have proposed various search-based optimization algorithms for service selection.
In contrast, research on selection problems in the IoT domain has a relatively short history, and new research challenges have emerged from the unique characteristics of IoT services [18,19,20]. One of the unique characteristics of IoT services is the cyber-physical interactions between the user and the service using sensors and actuators [19, 21, 22]. However, traditional network-level quality metrics cannot evaluate the quality of cyber-physical interactions from the users’ perspectives. For instance, a display service cannot deliver its contents appropriately if the user is not directly in front of the display even though the service shows low latency and high bitrate [21]. In a study [19], the authors extend the software service model to the physical service model with a new concept called service area that represents the locations where the services are available to the users. Considering such physical constraints of IoT services as one of the quality attributes is meaningful. However, in their study, the service area of an IoT service is statically defined, which means that the area is not adjustable according to changing environmental factors. In a recent study [21], the authors suggest a metric that objectively measures the quality of delivering visual content from display services to the users. However, the metric cannot be applied generally to other service types differently affected by physical factors.
Predicting the quality of services is one of the most critical elements of service selection. However, predicting the subjective quality of services from objective factors is challenging [23, 24], especially for IoT services with physical characteristics. Recent trends in quality prediction have transitioned from model-based to learning-based approaches for better scalability and adaptation [25]. Most quality prediction methodologies require every influencing factor to be numerically represented for input. However, physical factors of an IoT service, such as obstacles and potential influence among services, require domain-specific models and additional sensor devices to calculate their influence [26]. In a recent study [22], the authors suggest the Quality of the Internet of Things (QoIoT) model that consists of Quality of Experience (QoE) and Quality of Machine Experience (QoME). However, this model is still in the conceptualization phase without concrete metrics to quantitatively predict the quality of IoT services.
To predict and optimize the quality of services, several studies adopt optimization approaches such as evolutionary algorithms [27], bio-inspired algorithms [28], and reinforcement learning [29]. These algorithms define the service selection problem as a search-based optimization problem for a given static instance, which means that they cannot be applied to the instant selection of IoT services in dynamically changing environments. For instance, in a study [30], the authors utilize multi-agent reinforcement learning for adaptive service compositions. However, their algorithm requires recorded or instantly measured quality profiles of the candidate services, which are unavailable for IoT services physically affected by environmental factors.
Recently, task offloading technologies have been spotlighted [14, 15, 31, 32], which suffer from similar challenges to the service selection problem of this work. The users may offload heavy computation tasks by selecting the most appropriate edge server rather than processing them on a resource-constrained handheld device. In a recent study [32], the authors focus on the dynamic selection of public services for multiple users and the challenge of unknownness when the system information is unavailable. Furthermore, the authors point out that users should be unaware of each other in public environments. Finally, the authors design a fully decentralized online learning algorithm that learns system information from previous observations, considering coexisting users without inter-user communications. However, the algorithm uses simple averaging-based prediction that lacks context awareness for IoT services that may be physically affected by environmental factors.
In summary, current technologies suggested in previous studies cannot address the following challenges of selecting IoT services in public environments:
-
Existing works cannot predict the dynamically changing quality of physical interaction between the user and devices.
-
Existing works cannot adapt to the physical influence of uncertain environmental factors.
-
Existing works cannot mediate interference among services in the same environment.
In this work, we propose a novel approach that learns the uncertain influence of environmental factors and interfering services from user feedback, different from model-based existing works that require sophisticated modeling processes.
3 System design
In this section, we define the public IoT environments we target. Then, based on the environment definition, we formulate the selection problem of public IoT services physically affected by uncertain environmental factors and other services. We summarize the notations in Table 1.
3.1 Environment definition
As shown in Fig. 2, we target public IoT environments where multiple service agents provide IoT services by utilizing shareable devices in the space, and the users can publicly use them. When a user requests a service, the service registry of the environment discovers candidate services, and the user selects the most appropriate service agent. Meanwhile, various environmental factors and other services may physically affect the service agents. Therefore, we define the environment at time t in terms of users, services, environmental factors, and the service registry, \(env_t = \langle U_t, S, W, g \rangle \), as follows:
-
\(U_t\) is the set of users in the environment at time t. The number of users \(|U_t|\) is variable because users can enter and exit public spaces.
-
S is the set of IoT services in the environment, where \(s \in S\) is a service provided by a software agent that utilizes the associated device \(d_s\). The number of services |S| is constant and not concretely limited because we assume that the installation and removal of IoT services are relatively rare in the public environment.
-
W is the set of environmental factors such as walls and obstacles that may physically affect the service agents. Since we assume the environment does not change frequently, the number of environmental factors |W| is also constant.
-
g is the service registry where the services in the environment are registered and managed.
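To make this definition concrete, the following is a minimal Python sketch of the environment data model; the class and field names are our illustrative choices for exposition, not identifiers from the PLEASSURE implementation.

```python
from dataclasses import dataclass

@dataclass
class User:
    location: tuple[float, float]      # tracked via GPS/IPS at time t

@dataclass
class Service:
    agent_id: str                         # the software agent s_i
    device_location: tuple[float, float]  # location of the device d_s
    intensity: float                      # e.g., current speaker volume

@dataclass
class Obstacle:
    shape: str                         # environmental factor w in W
    location: tuple[float, float]
    orientation: float

@dataclass
class Environment:
    users: list[User]          # U_t: varies as users enter and exit
    services: list[Service]    # S: constant set of services
    obstacles: list[Obstacle]  # W: constant set of environmental factors
    registry: object           # g: the service registry
```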
3.1.1 User model
Each user in the environment carries a smart device (e.g., a smartphone) as the control interface of the services and as the tracking sensor that detects the current location through the Global Positioning System (GPS) or an Indoor Positioning System (IPS). A user \(u\in U_t\) may send a service request \(q = \langle \vec {l_{u, t}}, f\rangle \), where \(\vec {l_{u, t}}\) is the location at time t and f is the functional requirements. Note that the high mobility of the user may negatively affect the service quality and user satisfaction when the user moves beyond the service coverage where the service can be provided appropriately [31].
The user may give feedback to the service agent through the control interface. Primarily, the user will give positive feedback if the service agent provides the service appropriately. However, even though the service agent fulfills the requirements, the user may report negative feedback if the service has been interfered with by other services (e.g., the sound from another service is louder than that of the selected service). We assume the users’ feedback data are available to the service agents. However, in the real world, most users may not be willing to give feedback manually. In Section 4.2, we discuss in detail how to increase the practicality of the feedback system for PLEASSURE by improving interfaces or inferring from the users’ reactions.
3.1.2 Service model
An IoT service \(s = \langle s_i, d_s\rangle \in S\) is provided by the service agent \(s_i\) which is associated with a distributed IoT device \(d_s\). A service agent may run on the associated IoT device if the device has enough computing resources or otherwise on a gateway such as an edge computing server. We define the state of each service in terms of intensity (e.g., the sound volume of speaker devices) and the device location. We assume that a service agent cannot provide its service simultaneously to multiple users due to the exclusive nature of IoT devices. For instance, a music service cannot play multiple sound clips simultaneously. Grouped users may share a service, but we only focus on single users in this work. Note that the user’s location and mobility heavily affect the quality of IoT services because the agents cannot interact with the user if the distance between the user and the associated device is too far.
3.1.3 Environmental factor model
Various environmental factors may physically affect IoT services. For instance, obstacles such as walls and vehicles may block the sound and light from speakers and displays. In this work, we set walls as the primary environmental factor affecting the spatial layout, user mobility, and physics phenomena. We define the factors by physical objects’ shape, location, and orientation. However, in our future work, the scope of environmental factors can be expanded to more complex types such as placeness and social characteristics. Note that we assume that the details of the environmental factors are hidden from the service agents due to the lack of sophisticated detection models and additional sensor devices.
3.2 Service selection process
Figure 3 shows the sequence diagram of the service provision process that the user, service registry, and service agents perform upon the user’s request. Initially, the service agents register their services with the service registry (interaction 1, 4) and periodically report their current states (interaction 2-3, 5-6). When a user requests a service from the service registry (interaction 7), the service registry discovers a set of candidate services (interaction 8). For each candidate, the service registry sends a call for quality prediction to the service agent (interaction 9, 11) with information such as the user state and other service agents. The service registry collects the prediction results from the service agents (interaction 10, 12) and returns the set of candidate services with their predicted quality (interaction 13). Finally, the user selects the most promising service agent according to the predicted quality (interaction 14). After the selection, the selected agent provides the service to the user (interaction 16), and the user reports feedback to the agent (interaction 17). The service agent improves its quality prediction by learning from the user feedback (interaction 18).
3.3 Problem statement
To apply multi-agent reinforcement learning, we formulate the service selection problem of public IoT services as a Partially Observable Stochastic Game (POSG) \(\langle \! X, I, O, A, P, R\!\rangle \), which extends the Markov Decision Process (MDP) with multiple agents and partial observability [13]. Note that the POSG represents the optimization problem that service agents would solve by performing actions and iteratively learning hidden environmental dynamics. Following the structure of POSG, service agents observe the current state of the environment and transition to the next state by jointly performing actions.
-
X is the set of potential states \(x \in X\) that the environment may have.
-
I is the set of service agents \(s_i \in I\), where \(s_i\) is a service agent in the environment that receives user requests and provides services according to the request.
-
O is the set of observation functions \(O_i \in O\), where \(O_i(x)\) represents what agent \(s_i\) can observe from the current state x of the environment. Note that some state information such as the device locations is not observable to the service agents due to their restricted sensing capability.
-
A is the set of action functions \(A_i \in A\), where \(A_i(x)\) represents the set of actions that agent \(s_i\) can perform at the current state x. Joint action \(\vec {a}\) is the simultaneous actions by agents in one step, which represents service selection in this work.
-
\(P(x, \vec {a}, x')\) is the transition probability from x to \(x'\) when the agents perform a joint action \(\vec {a}\) at state x. Note that the transition probability is initially unknown to the service agents and thus should be learned from the collected data.
-
R is the set of reward functions \(R_i \in R\), where \(R_i(x, \vec {a}, x')\) represents the reward that agent \(s_i\) receives according to the transition.
Figure 4 shows an example timeline of discrete steps in the environment where agents select and provide services following the defined POSG. The state of the environment transitions to the next state periodically. For each transition, the users move around, and the selected agent may start to provide service upon each user’s request. Primarily, the service selection and provision affect the environment state.
Commonly, a POSG has initial and terminal states that start and finish an episode of the agents. On the one hand, the POSG we define is non-episodic and has no specific initial and terminal states because the service agents in the environment provide services to users continually. On the other hand, each service provision has explicit initial and terminal states. Therefore, we model the service provisions of the agents as sub-episodes that may occur in parallel, and the goal of the service agents is to maximize the average reward [12]. The service agents start and finish a sub-episode upon a request, as shown in Fig. 4.
For each step, the users give feedback, which we use as rewards for the service agents to reinforce or weaken their behavior. In other words, an agent reinforces its behavior upon positive feedback and weakens it upon negative feedback. We consider two types of feedback in this work: satisfaction and interference. First, the service agent is rewarded or punished depending on the user’s satisfaction. The source of satisfaction is the agent that provides the service to the user, so the reward is given to that agent only. Second, the user may give negative feedback if other services interfere with the service. In Fig. 4, red shades represent potential interference among services when multiple agents simultaneously provide services in the same space. Because the source of such interference is uncertain, there is a credit-assignment problem [33] in deciding which agent to punish. In this work, we equally divide and distribute the negative reward to every active agent in the environment.
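As a minimal sketch of this credit-assignment rule (the function name and signature are our own), the satisfaction reward is credited to the providing agent alone, while the interference penalty is split evenly across all active agents:

```python
def distribute_rewards(satisfaction, interference_penalty,
                       provider_id, active_agent_ids):
    """Assign per-agent rewards for one step.

    satisfaction         -- signed reward from the served user's feedback
    interference_penalty -- non-negative magnitude of interference feedback
    """
    rewards = {agent_id: 0.0 for agent_id in active_agent_ids}
    # The source of satisfaction is the providing agent, so the
    # satisfaction reward goes to that agent only.
    rewards[provider_id] = rewards.get(provider_id, 0.0) + satisfaction
    # The source of interference is uncertain, so the negative reward
    # is divided equally among every active agent in the environment.
    if active_agent_ids:
        share = interference_penalty / len(active_agent_ids)
        for agent_id in active_agent_ids:
            rewards[agent_id] -= share
    return rewards
```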
The ultimate goal of the POSG is to find the optimal policy of the service agents that maximizes the average reward per step, which in our environment implies maximizing satisfaction while minimizing interference. Each service agent maintains and updates its policy \(\pi _i(x)\) by which the agent will perform its action in the given state x.
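In equation form, the objective of each agent \(s_i\) under this average-reward criterion can be written as follows (our notation, following [12]):

\[ \pi _i^{*} = \arg \max _{\pi _i} \; \lim _{T \rightarrow \infty } \frac{1}{T} \sum _{t=0}^{T-1} \mathbb {E}\left[ R_i(x_t, \vec {a}_t, x_{t+1}) \right] \]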
4 Proposed approach
In this section, along with the fingerprint attention mechanism, we propose PLEASSURE consisting of a selection algorithm and an online training algorithm for the service agents.
4.1 PLEASSURE: Learning physical environment factors based on the attention mechanism to select services for UsERs
We propose PLEASSURE to select public IoT services by predicting the long-term quality of candidates in a distributed manner. Figure 5 presents the overall process of PLEASSURE. For each IoT device, the corresponding service agent has the roles of (1) providing the service and (2) predicting the long-term quality. First, when a user requests a service, idle service agents receive the request along with the user state. Second, each agent predicts its service quality based on the user state and the environmental context summarized by fingerprint attention (Section 4.1.1). Third, the most appropriate service is selected according to the aggregated quality prediction results. Fourth, the user gives feedback to the agent according to the perceived service quality. Finally, the service agent updates its neural network and fingerprint vector by performing multi-agent reinforcement learning (MARL) using the feedback.
4.1.1 Service selection algorithm based on fingerprint attention
Algorithm 1 shows the details of the service selection process of PLEASSURE. The primary part of the selection algorithm is the prediction of the Q-value in a distributed manner, where the Q-value Q(x, a) is the expectation value of the cumulative reward when performing action a at state x [12]. In the context of our problem statement, the Q-value represents the expected long-term quality when selecting a service. When a user makes a service request q, the service registry forwards the request to appropriate candidates \(S'\) discovered based on the functional requirement. Each service agent that receives the request predicts the Q-value using its prediction model (line 2-14).
To be aware of the potential interference of other services, each service agent collects the state of other service agents from the service registry for the Q-value prediction (line 5-9). To summarize the information of other service agents as a fixed-length context vector, an attention mechanism [11] can be used to determine how much attention to give to each agent. However, conventional attention mechanisms require influencing factors to be numerically represented, while the physical factors of service agents are hardly observable. Therefore, we propose fingerprint attention, which lets the service agents of PLEASSURE maintain learnable fingerprints to calculate the attention weights of the other service agents. PLEASSURE initially assigns a random fingerprint vector to each service agent. The service agents update the fingerprints using gradient descent to increase the prediction accuracy. After sufficient training, the fingerprint vector of each service agent may implicitly represent its environmental condition, including physical factors.
Figure 6 shows how each service agent calculates the context vector using fingerprint attention to estimate the influence of the other services (line 5-9). First, the service agent transforms the fingerprints of the other service agents through a single fully-connected neural network layer (line 6). Second, the agent calculates the dot products [34] of the transformed fingerprints and its own fingerprint to obtain their similarities as the attention scores (line 6). Third, a scaled softmax is applied to transform the attention scores into attention weights. Finally, the agent concatenates the current states and fingerprints of the other service agents to calculate the weighted sum as the context vector (line 11). As shown in Fig. 6, the neural network receives the concatenation of the user request vector and the context vector as input.
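The following PyTorch-style sketch illustrates one way to realize this computation; the layer sizes, names, and exact tensor shapes are our assumptions based on the description above rather than the authors’ released code.

```python
import torch
import torch.nn as nn

class FingerprintAttention(nn.Module):
    """Sketch of the fingerprint attention of one service agent."""

    def __init__(self, fp_dim: int):
        super().__init__()
        # Learnable fingerprint of this agent, randomly initialized
        # and updated by gradient descent during training.
        self.fingerprint = nn.Parameter(torch.randn(fp_dim))
        # Single fully-connected layer that transforms the
        # fingerprints of the other service agents.
        self.key_proj = nn.Linear(fp_dim, fp_dim)
        self.scale = fp_dim ** 0.5

    def forward(self, other_fps: torch.Tensor, other_states: torch.Tensor):
        # other_fps: (n, fp_dim), other_states: (n, state_dim)
        keys = self.key_proj(other_fps)                # transform fingerprints
        scores = keys @ self.fingerprint / self.scale  # dot-product similarity
        weights = torch.softmax(scores, dim=0)         # scaled softmax
        # Concatenate states and fingerprints, then take the weighted sum.
        values = torch.cat([other_states, other_fps], dim=-1)
        context = (weights.unsqueeze(-1) * values).sum(dim=0)
        return context                                 # fixed-length context vector
```

The resulting context vector is then concatenated with the user request vector and fed to the agent’s prediction network, as described above.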
The service agent predicts satisfaction \(Q_s\) and interference \(Q_i\) separately based on the service request and the context (line 12), and Q-value is calculated by subtracting \(Q_i\) from \(Q_s\) (line 13). Based on the results of Q-value estimation, the user selects the service agent with the highest Q-value (line 15).
4.1.2 Online training algorithm of the service agents
Algorithm 2 outlines the details of the online training of the service agents after deployment in the environment. After deployment, each service agent initializes its experience memory [35] and neural network (line 1-3). At each step, the users in the environment may make new service requests, and the service agents observe the states of the environment (line 5). The service agents continually estimate Q-values according to the users’ requests (line 4-33) and use users’ feedback to improve accuracy (line 29). For each service request, the candidate service agents return the highest Q-value (line 8).
If the highest Q-value returned by the service agents is higher than zero, the user selects the service with the highest Q-value (line 9). Otherwise, if the user cannot expect high quality from any candidate, the user selects the least-selected service, expecting novel data (line 11). Then, the selected service agent starts providing the service to the user and becomes unavailable to the other users (line 18-19).
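A minimal sketch of this selection rule (the names are ours) is as follows:

```python
def select_service(predictions: dict, selection_counts: dict) -> str:
    """predictions      -- {service_id: predicted Q-value}
    selection_counts -- {service_id: number of previous selections}
    """
    best_id = max(predictions, key=predictions.get)
    if predictions[best_id] > 0:
        # Exploit: select the service with the highest predicted Q-value.
        return best_id
    # No candidate promises positive quality, so select the
    # least-selected service, expecting novel training data.
    return min(predictions, key=lambda sid: selection_counts.get(sid, 0))
```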
According to the current state and the selections, the environment transitions to the next state (line 22). As a result of the transition, the environment returns the reward value and ongoing sub-episodes of the service provision. For each sub-episode, the service agent records (1) the reward value, (2) the user state, and (3) the other services’ states (line 23-32). After the user finishes using the service, the user releases the service agent (line 26). The released service agent adds the sub-episode to its memory (line 27) and updates its neural network and fingerprint (line 29). The update occurs only if the memory is full enough to avoid over-fitting problems (line 28).
Algorithm 3 shows how each service agent updates its neural network and fingerprint vector according to the collected data. A service agent performs the learning process when a new sub-episode is added. Initially, the service agent splits the memory into several folds, where the number of folds is predefined (line 1). For each fold, the service agent sets the fold as the validation set (line 3) and the other folds as the training set (line 4). The service agent iteratively updates the neural network using the training set (line 8), measuring the prediction loss using the validation set (line 9). If the loss increases, the service agent decreases the patience value and stops the training when the value becomes zero (line 10-15). Splitting the training and validation set and early-stopping the training according to the loss are popular techniques for avoiding over-fitting problems.
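Under the description above, the update procedure can be sketched as follows; the `train_step` and `validation_loss` interfaces of the model are our assumptions, and the hyperparameter values are illustrative rather than those of Table 3.

```python
import numpy as np

def update_agent(model, memory, n_folds=5, patience=3, max_iters=100):
    """Sketch of Algorithm 3: k-fold, early-stopped network updates."""
    folds = np.array_split(np.arange(len(memory)), n_folds)
    for fold in folds:
        fold_set = set(fold.tolist())
        val_set = [memory[i] for i in fold_set]
        train_set = [memory[i] for i in range(len(memory)) if i not in fold_set]
        best_loss, budget = float("inf"), patience
        for _ in range(max_iters):
            model.train_step(train_set)            # one update of the network
                                                   # and the fingerprint vector
            loss = model.validation_loss(val_set)  # loss on the held-out fold
            if loss < best_loss:
                best_loss = loss
            else:
                budget -= 1                        # loss increased: lose patience
                if budget == 0:
                    break                          # early stop to avoid over-fitting
```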
4.2 Discussion
We discuss PLEASSURE’s limitations and potential solutions as future research directions.
First, PLEASSURE suffers from the cold-start problem and requires enough feedback data to learn in a new environment. However, before the quality prediction becomes accurate, the users may be dissatisfied. Therefore, the training algorithm of PLEASSURE should be data-efficient such that the service agents can learn effectively even with a small amount of data. In our future work, we plan to adopt federated learning techniques [36] that can improve data efficiency by utilizing knowledge collected from other agents.
Second, PLEASSURE lacks adaptability toward the dynamic and continual changes in public IoT environments. For instance, obstacles and devices may join, leave, or move to another location. In particular, the users have mobility patterns that may require continual reconfiguration of the prediction model or additional analysis. Furthermore, an unbounded number of agents may dynamically join the environment, which may raise network scalability issues caused by numerous agents securely exchanging private information. In particular, maintaining security and reliability in public IoT environments is essential but challenging because of their highly dynamic nature [37]. We plan to adopt rapid exploration and domain adaptation techniques [38] so that the service agents can deal with statistical shifts, maintaining high prediction accuracy even in continually changing environments.
Third, PLEASSURE assumes that the users report feedback after utilizing services. However, collecting explicit feedback is hard in practice because of inconvenient processes and privacy protection. There are several potential ways to replace the explicit feedback system.

First, various sensors embedded in commercial off-the-shelf wearable devices can be utilized to measure the service effectiveness. For instance, if the microphone of the user’s smartphone can receive the sound from the speakers, the user can probably perceive the sound as well [39]. In another instance, the brightness sensor of the user’s smartwatch can measure the brightness of the environment and compare it with the user’s preference [40]. Such measurable values can partially indicate user satisfaction but cannot represent subjective factors such as cognition.

Second, another alternative is observing the users’ reactions and implicitly inferring feedback. For instance, if the user repeatedly increases the sound volume of the speaker, it probably means that the user cannot perceive the sound appropriately. In another instance, if the user moves to a particular location and stays until the content ends, it probably means that the user perceives the content from the display or speaker there. Such implicit feedback inference may require additional sensor devices to detect user reactions and sophisticated inference models based on intuitions. Based on the recent advances in wearable devices such as wireless earphones and smart watches, user reactions would be detectable without severe privacy leakage. However, recognizing user reactions from the extremely noisy data of wearable devices is a challenging problem.

Third, designing an easy-to-use user interface would be another solution, from the perspective of Human-Computer Interaction (HCI), to reduce the required effort. For instance, collecting feedback through conversational interfaces would be easier than through visual interfaces. Based on the significant advances in chatbots and Large Language Models (LLMs), developing conversational interfaces that receive user feedback through natural language becomes feasible. Furthermore, users’ dissatisfaction with environmental conditions can be extracted by inferring implicatures in the conversations [41]. For instance, if the user says “It is too hot”, it probably means that the user is dissatisfied with the heating or cooling service that controls the room temperature.
5 Evaluation
We evaluate PLEASSURE in this section by simulating various instances of public IoT environments. We train and test the service agents of PLEASSURE in each simulated environment and compare them with the baseline algorithms. The simulation code used for the evaluation and the generated dataset are available online.
5.1 Evaluation goals
The primary goal of the evaluation is to assess PLEASSURE’s ability to train the service agents to learn the influence of environmental factors and other services solely based on the users’ feedback. We measure the performance of PLEASSURE in terms of the rewards collected by the service agents in the simulations. If PLEASSURE results in higher rewards than other algorithms, we can conclude that the service agents successfully maximize the users’ satisfaction while minimizing the interference among the services.
Another evaluation goal is to measure the learning speed of PLEASSURE with limited exploration. Commonly, reinforcement learning utilizes randomized exploration strategies to escape locally optimal solutions by collecting novel training data [12]. However, PLEASSURE is designed to avoid such exploration, which may dissatisfy users in the real world. Therefore, we measure the training time required for PLEASSURE to exceed the performance of the baseline algorithms.
The final evaluation goal is to check the execution time of selecting services. If the quality prediction of the service agents takes an unacceptably long time for computation, PLEASSURE loses its practicality. Thus, the execution time of the service selection process should be in an acceptable range.
5.2 Simulation settings
Table 2 shows the specific configurations of the simulations. We assume small room-scale environments (10m \(\times \) 10m \(\times \) 3m) that may accommodate a maximum of ten users. We randomly distribute the IoT devices and obstacles to construct a new environment. Note that the numbers and the locations of the IoT devices and obstacles are static for each simulation. To control the influence of environmental factors, we set the number of obstacles in environments from 0 to 15. Note that we conducted experiments with variable numbers of obstacles, services, and users to simulate various degrees of uncertainty. When the number of obstacles, services, and users increases, each agent faces higher uncertainty caused by more affecting factors.
We simulate acoustic services that deliver acoustic content to users using distributed speaker devices, one of the most common IoT service types physically affected by environmental factors. Each service agent has a different maximum intensity, ranging from calm conversation (40 dB) to loud music (100 dB), according to the randomly assigned specification of the associated speaker device. The intensity the user perceives is inversely proportional to the distance between the speaker device and the user. Obstacles may absorb, reflect, and diffract the sounds generated by speaker devices, but we simulate only sound absorption because it is the most influential phenomenon. Therefore, obstacles between the speaker device and the user absorb the sound by their randomly assigned absorption rates (90-100%).
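As a rough sketch of this perception model (the exact attenuation formula used in the simulator may differ; here we assume a linear intensity scale, inverse-distance decay, and multiplicative per-obstacle absorption):

```python
def perceived_intensity(source_intensity, distance, absorption_rates):
    """source_intensity -- intensity emitted by the speaker device
    distance         -- distance between the device and the user
    absorption_rates -- absorption rate (0.9-1.0) of each obstacle
                        lying between the device and the user
    """
    # Perceived intensity is inversely proportional to the distance.
    intensity = source_intensity / max(distance, 1.0)
    # Each blocking obstacle absorbs 90-100% of the remaining sound.
    for rate in absorption_rates:
        intensity *= (1.0 - rate)
    return intensity
```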
Users can enter the environment at random moments and move until leaving the environment. The number of users in the environment is variable but has an upper limit. Users can request a service at random moments with randomly assigned intensity and duration. The range of required intensity is predefined as the range of conversation-level sounds (50-60 dB); the range of required duration is predefined from short music tracks to news clips (5-15 steps). The user may adjust the intensity of the selected service during the provision to fit the requirement. The service agent, however, may fail to fulfill the required intensity because of distance, obstacles, and limited maximum intensity. For instance, if the service is too far from the user or an obstacle blocks the service from the user, the user cannot perceive the sound even if the intensity is set to the maximum.
The distribution of obstacles in the environment may restrict the mobility of the users. We simulate the mobility of the users by setting patterns for each environment according to the obstacles. When constructing a new environment, we distribute random mobility patterns over the space. Each instance of the mobility patterns has location, direction, and strength. We adjust the direction of the mobility patterns to be nonorthogonal to the obstacles considering that users may walk along the obstacles. Furthermore, we normalize the adjacent mobility patterns to be similar. When a user enters the environment, the user is given a random momentum. For each step, the user updates the momentum by referring to the mobility patterns in the vicinity. Then, the user walks according to the updated momentum until leaving the environment. As a result, the users have similar tendencies but not identical mobility trajectories.
We simulate the satisfaction and interference feedback of the users according to the users’ perception of the sound from the services. Primarily, the user gives positive feedback if the perceived intensity is higher than the required intensity and the interference. Otherwise, the user gives negative feedback. The interference encompasses sound from other services and background noise. For realistic simulations, the users may give negative feedback if the interference in the environment exceeds a predefined threshold.
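Putting these rules together, the feedback simulation can be sketched as follows, assuming all quantities are on the same intensity scale and `noise_threshold` is the predefined threshold mentioned above:

```python
def simulate_feedback(perceived, required, interference, noise_threshold):
    """Return (satisfaction_feedback, interference_feedback)."""
    # Positive feedback only if the service is both loud enough and
    # louder than the competing sounds (other services plus noise).
    satisfied = perceived > required and perceived > interference
    # Negative interference feedback if the interference in the
    # environment exceeds the predefined threshold.
    interfered = interference > noise_threshold
    return (1 if satisfied else -1, -1 if interfered else 0)
```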
5.3 Evaluation settings
We compare the performance of PLEASSURE with the baseline algorithms in several settings. For a fair comparison, we test the performance of different algorithms with identical trajectories of user locations, mobility, and requests. We construct 25 testing trajectories for each of the 25 environments to ensure statistical significance.
Table 3 shows the training settings of PLEASSURE. Furthermore, we compare different versions of PLEASSURE to show the effectiveness of fingerprint attention. The agents of the ‘Independent’ version ignore the other service agents and train independently as a naïve reinforcement learning baseline based on Deep Q-Network (DQN) [43]. The agents of the ‘EqualAttention’ version collect the other service agents’ states but average them without fingerprint attention. The ‘FingerprintAttention’ version is the full version of PLEASSURE, in which the agents learn their fingerprints and calculate attention to summarize the states of the other service agents.
Because there is no other work that attempts to solve the selection problem of public IoT services under the physical influence of environmental factors, we compare PLEASSURE with the following baseline algorithms, which are common heuristics for selecting IoT services [44]. First, the ‘NearestGreedy’ algorithm selects the nearest service regardless of obstacles. Note that selecting the nearest service is the most straightforward solution for IoT services but requires additional sensor devices to detect the locations of the devices. Second, the ‘WallNearestGreedy’ algorithm selects the service provided by the nearest device that is not blocked by obstacles. The WallNearestGreedy algorithm requires the locations of the devices and obstacles in the environment as input, which are hardly available in practice.
5.4 Results and analysis
Figure 7 shows the performance of the algorithms during training in environments with 15 services and different numbers of obstacles. We measure the rewards collected by each algorithm following the testing trajectories at the beginning and end of each training day. We focus on relative rather than absolute values because the feedback in real-world situations may have different scales. The reward collected by the NearestGreedy algorithm is low, and that of the WallNearestGreedy algorithm is slightly higher because it avoids the services blocked by obstacles, except in cases where there are no obstacles to consider. The difference between the NearestGreedy and WallNearestGreedy algorithms increases as the number of obstacles increases. Before training, PLEASSURE performs worse than the baseline algorithms. However, after one to four training days, depending on the number of obstacles, the service agents of PLEASSURE successfully learn the influence of environmental factors and service interference by taking advantage of multi-agent reinforcement learning, even with dynamically behaving users and agents. Finally, the performance of PLEASSURE exceeds that of the baseline algorithms. Therefore, PLEASSURE effectively selects services that satisfy the users and reduce interference without the details of the environmental factors that are given to the baseline algorithms. Furthermore, the performance of the FingerprintAttention version of PLEASSURE is higher than that of the EqualAttention and Independent versions because the versions without fingerprint attention are less effective in estimating the interference among services.
Figure 8 shows the final performance of the algorithms in different environmental settings, with the number of services set to 15. We train the service agents with five maximum users and test with one, three, five, seven, and nine maximum users to evaluate the generalization of the agents to different numbers of users. We test the statistical significance of whether the performance of the FingerprintAttention version of PLEASSURE is greater than that of (1) the Independent version and (2) the WallNearestGreedy algorithm. We use the Wilcoxon signed-rank test given that the data consists of paired samples with identical testing trajectories and that the normality of the data is unknown. Note that we disregard the EqualAttention version and the NearestGreedy algorithm for the statistical test since they are generally outperformed by the Independent version and the WallNearestGreedy algorithm, respectively. EqualAttention’s poorer performance may be due to the lack of appropriate weightings in incorporating the context of other service agents into the quality prediction. In the plots, the test results with p-values lower than 0.0001 are labeled with four asterisks while those that are not significant are labeled with ‘ns’. As depicted in the green box, the FingerprintAttention version of PLEASSURE generally outperforms the other versions and baseline algorithms. However, when the number of users is higher in the testing phase than in the training phase, the baseline algorithms and other versions of PLEASSURE may show better performance, as shown in the red dotted box. This implies that (1) PLEASSURE may have difficulties in extrapolating the quality prediction with more users than the users in the training phase, or (2) the environments consisting of dense users and obstacles are too harsh to provide services appropriately, and the overall performances of the algorithms are low. Also, we expected the collected rewards to decrease as the number of obstacles increases. However, the differences are not significant because the obstacles also block interference. As the number of users increases, the mean of the collected reward decreases due to higher interference, and the variance increases due to higher randomness.
Figure 9 shows the performance of the algorithms in environments with different numbers of services. Both the number of users and the number of obstacles are set to five. The rewards collected by the baseline algorithms increase as the number of services increases. However, the rewards collected by PLEASSURE vary relatively little even as the number of services increases. These results imply that PLEASSURE can achieve high performance even in environments with few services. Furthermore, we expected the FingerprintAttention version to outperform the other versions of PLEASSURE in environments with more services. However, the collected rewards of the EqualAttention version are higher than those of the FingerprintAttention version in environments with more services. These results imply that the amount of service interference depends more on the number of users activating services than on the number of services.
Figure 10 shows the time spent on each selection by the different versions of PLEASSURE. The selection time heavily depends on the amount of required computation according to the size of the service agents’ neural networks. Note that the heuristic-based baselines require little computation but instead need precise localization of users and devices, and they cannot deal with environmental factors. Because the FingerprintAttention version has an additional layer for calculating attention, its selection time is almost twice that of the Independent version. We expected the selection time of the FingerprintAttention version to increase as the number of users increases with more potential interference. However, the selection time slightly decreases as the number of users increases because the number of candidate services decreases as more services are occupied. Therefore, the selection time of the FingerprintAttention version may depend on the number of services linearly, not exponentially. Furthermore, in real-world environments, the quality predictions may require more computation time given the resource-constrained nature of IoT devices. However, this computation can be distributed over multiple service agents in parallel, reducing the overall computation time. Therefore, fingerprint attention can be applied to environments with many users without severe computational costs.
5.5 Threats to validity
Internally, PLEASSURE and the baseline algorithms should be compared under fair and identical conditions. We test the algorithms following the same trajectories (user location, mobility, and requests) in the same environment (locations of obstacles and services). Therefore, we can conclude that the results and analysis of the evaluation are valid. Externally, the evaluation results should be generalizable to other environments. We test PLEASSURE in a sufficient number of randomly constructed environments. Therefore, we can conclude that PLEASSURE will generally outperform the baseline algorithms in other environments. The simulation should include the influencing factors that may affect the quality of IoT services. In the simulations, we include the essential factors that may affect the quality of acoustic services, such as sound attenuation and absorption. However, the simulated environments have limitations such as (1) artificial user behaviors, (2) unrealistic distributions of obstacles and services, and (3) the lack of full acoustics simulation. We plan to address these limitations in future work by conducting real-world studies in controlled or public environments.
6 Conclusion
In public spaces, where service agents openly provide interactive IoT services to users, users should select the most promising service among discovered candidate services to maximize satisfaction while minimizing interference from other services. However, existing approaches to IoT service selection cannot deal with (1) environmental factors such as walls that may physically affect the device and the user and (2) interference among the services in the same space. We propose a novel learning-based service selection framework named PLEASSURE to select services by predicting expected long-term user satisfaction and service interference based on multi-agent reinforcement learning. Each service agent of PLEASSURE learns its specialized prediction model of satisfaction and interference solely based on the users’ feedback without any sophisticated models or additional sensors. Furthermore, we propose fingerprint attention that extends the conventional attention mechanism to enable the service agents to learn their fingerprints to calculate attention weights of each other. Powered by fingerprint attention, the service agents of PLEASSURE can learn the influences of other services, which are necessary but hidden factors for quality prediction.
We evaluate PLEASSURE by simulating public IoT services in randomly constructed environments. We compare the performance of several versions of PLEASSURE and baseline algorithms on the collected reward that represents the satisfaction and interference feedback from the users. The simulation results show that PLEASSURE with fingerprint attention outperforms the baseline algorithms in most environments.
We have three plans for future work to improve on the limitations of PLEASSURE. First, we may adopt federated learning [36] and domain adaptation [38] techniques to accelerate the learning process of PLEASSURE, making the service agents adapt flexibly to environmental changes. Second, we will conduct user studies with real-world implementations of PLEASSURE and a feedback system to verify the practicality of our approach. Third, we may extend the actions of the service agents so that the agents can provide services more autonomously and cooperatively.
Data Availability
The simulation code and generated datasets of this work are publicly available.
References
Fang S-S, Chai Z-Y, Li Y-L (2021) Dynamic multi-objective evolutionary algorithm for iot services. Appl Intell 51(3):1177–1200
Benazzouz Y, Munilla C, Günalp O, Gallissot M, Gürgen L (2014) Sharing user iot devices in the cloud. In: 2014 IEEE World Forum on Internet of Things (WF-IoT), pp 373–374. IEEE
Want R, Schilit BN, Jenson S (2015) Enabling the internet of things. Computer 48(1):28–35
Bouguettaya A, Sheng QZ, Benatallah B, Neiat AG, Mistry S, Ghose A, Nepal S, Yao L (2021) An internet of things service roadmap. Commun ACM 64(9):86–95
Issarny V, Bouloukakis G, Georgantas N, Billet B (2016) Revisiting service-oriented architecture for the iot: a middleware perspective. In: International conference on service-oriented computing, pp 3–17. Springer
Chen C-F, Huang C-Y (2021) Investigating the effects of a shared bike for tourism use on the tourist experience and its consequences. Curr Issues Tour 24(1):134–148
Coenen J, Wouters N, Moere AV (2016) Synchronized wayfinding on multiple consecutively situated public displays. In: Proceedings of the 5th ACM international symposium on pervasive displays, pp 182–196
Knierim P, Maurer S, Wolf K, Funk M (2018) Quadcopter-projected in-situ navigation cues for improved location awareness. In: Proceedings of the 2018 CHI conference on human factors in computing systems, pp 1–6
Alqahtani A, Alsubai S, Bhatia M (2024) Applied artificial intelligence framework for smart evacuation in industrial disasters. Appl Intell 1–16
Liu Q, Xu H, He B, Yuan H, Liu Z, Fan S, Xu J, Li T, Li J, Wang M et al (2023) A novel context inconsistency elimination algorithm based on the optimized dempster-shafer evidence theory for context-awareness systems. Appl Intell 53(12):15261–15277
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv:1409.0473
Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction. MIT press
Busoniu L, Babuska R, De Schutter B (2008) A comprehensive survey of multiagent reinforcement learning. IEEE Trans Syst Man Cybern Part C (Appl Rev) 38(2):156–172
Chai Z, Hou H, Li Y (2023) A dynamic queuing model based distributed task offloading algorithm using deep reinforcement learning in mobile edge computing. Appl Intell 53(23):28832–28847
Zhang X, Wang Y (2023) Deepmecagent: multi-agent computing resource allocation for uav-assisted mobile edge computing in distributed iot system. Appl Intell 53(1):1180–1191
Dulac-Arnold G, Levine N, Mankowitz DJ, Li J, Paduraru C, Gowal S, Hester T (2021) Challenges of real-world reinforcement learning: definitions, benchmarks and analysis. Mach Learn 110(9):2419–2468
Moghaddam M, Davis JG (2014) Service selection in web service composition: A comparative review of existing approaches. Web services foundations, pp 321–346
Yen I, Bastani F, Hwang S-Y, Zhu W, Zhou G et al (2017) From software services to iot services: the modeling perspective. In: International conference on serviceology, pp 215–223. Springer
Jin X, Chun S, Jung J, Lee K-H (2017) A fast and scalable approach for iot service selection based on a physical service model. Inf Syst Front 19(6):1357–1372
Altaf A, Abbas H, Iqbal F, Khan MMZM, Daneshmand M (2020) Robust, secure, and adaptive trust-oriented service selection in iot-based smart buildings. IEEE Internet Things J 8(9):7497–7509
Baek K, Ko I-Y (2023) Dynamic and effect-driven output service selection for iot environments using deep reinforcement learning. IEEE Internet Things J 10(4):3339–3355
Minovski D, Åhlund C, Mitra K (2020) Modeling quality of iot experience in autonomous vehicles. IEEE Internet Things J 7(5):3833–3849
Brooks P, Hestnes B (2010) User measures of quality of experience: why being objective and quantitative is important. IEEE Netw 24(2):8–13
Mitra K, Zaslavsky A, Åhlund C (2013) Context-aware qoe modelling, measurement, and prediction in mobile computing systems. IEEE Trans Mob Comput 14(5):920–936
Kougioumtzidis G, Poulkov V, Zaharis ZD, Lazaridis PI (2022) A survey on multimedia services qoe assessment and machine learning-based prediction. IEEE Access 10:19507–19538
Skorin-Kapov L, Varela M, Hoßfeld T, Chen K-T (2018) A survey of emerging concepts and challenges for qoe management of multimedia services. ACM Trans Multimed Comput Commun Appl (TOMM) 14(2s):1–29
Purohit L, Kumar S (2021) A study on evolutionary computing based web service selection techniques. Artif Intell Rev 54(2):1117–1170
Dahan F, Mathkour H, Arafah M (2019) Two-step artificial bee colony algorithm enhancement for qos-aware web service selection problem. IEEE Access 7:21787–21794
Ren L, Wang W, Xu H (2017) A reinforcement learning method for constraint-satisfied services composition. IEEE Trans Serv Comput 13(5):786–800
Wang H, Wu Q, Chen X, Yu Q, Zheng Z, Bouguettaya A (2014) Adaptive and dynamic service composition via multi-agent reinforcement learning. In: 2014 IEEE international conference on web services, pp 447–454. IEEE
Zhang X, Tian S, Liu Y, Cao Z (2023) User location-aware edge services selection based on generative adversarial network and improved ant colony algorithm. Appl Intell 53(11):13643–13664
Wang X, Ye J, Lui JC (2023) Online learning aided decentralized multi-user task offloading for mobile edge computing. IEEE Trans Mob Comput
Zhou M, Liu Z, Sui P, Li Y, Chung YY (2020) Learning implicit credit assignment for cooperative multi-agent reinforcement learning. Adv Neural Inf Process Syst 33:11853–11864
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Advances in neural information processing systems, vol 30
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
McMahan B, Moore E, Ramage D, Hampson S, Arcas BA (2017) Communication-efficient learning of deep networks from decentralized data. In: Artificial intelligence and statistics, pp 1273–1282. PMLR
Schiller E, Aidoo A, Fuhrer J, Stahl J, Ziörjen M, Stiller B (2022) Landscape of iot security. Comput Sci Rev 44:100467
Steinparz CA, Schmied T, Paischer F, Dinu M-C, Patil VP, Bitto-Nemling A, Eghbal-zadeh H, Hochreiter S (2022) Reactive exploration to cope with non-stationarity in lifelong reinforcement learning. In: Conference on lifelong learning agents, pp 441–469. PMLR
Kim W, Lee S, Chang Y, Lee T, Hwang I, Song J (2021) Hivemind: social control-and-use of iot towards democratization of public spaces. In: Proceedings of the 19th annual international conference on mobile systems, applications, and services, pp 467–482
Karapetyan A, Chau SC-K, Elbassioni K, Khonji M, Dababseh E (2018) Smart lighting control using oblivious mobile sensors. In: Proceedings of the 5th conference on systems for built environments, pp 158–167
Kim S, Ko I-Y (2022) A conversational approach for modifying service mashups in iot environments. In: Proceedings of the 2022 CHI conference on human factors in computing systems, pp 1–16
Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv:1412.6980
Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing atari with deep reinforcement learning. arXiv:1312.5602
Baek K-D, Ko I-Y (2017) Spatially cohesive service discovery and dynamic service handover for distributed iot environments. In: International conference on web engineering, pp 60–78. Springer
Acknowledgements
This work was partly supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP)-ITRC (Information Technology Research Center) grant funded by the Korea government (MSIT) (IITP-2025-RS-2020-II201795, 50%). This research was also supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (RS-2024-00451716, 50%).
Funding
Open Access funding enabled and organized by KAIST.