Abstract
Mobile crowdsensing assigns multiple tasks around points of interest to mobile users (MUs) for execution. The task allocation strategy is crucial to the entire system because it directly affects the benefits of the stakeholders. Multi-agent reinforcement learning (MARL) has recently shown distinct advantages in modeling complex interactions among multiple agents. Building on it, we propose a pricing strategy based on the multi-agent proximal policy optimization (MAPPO) algorithm with an improved behavior network. Specifically, we formulate the problem as a multi-leader multi-follower Stackelberg game and solve it with MAPPO, a MARL technique that employs centralized training and decentralized execution. To better capture complex sequential input information and learn better behavior strategies, we integrate an attention mechanism and a gated recurrent unit (GRU) network into the actor network, yielding a MARL algorithm with an improved behavior network, termed GRU-and-Attention-based MAPPO (GA-MAPPO). Simulation results demonstrate that the proposed GA-MAPPO algorithm is effective compared with baseline approaches. It learns an optimal pricing strategy that maximizes the benefits of task initiators (TIs) and guides TIs in pricing MUs effectively.












Appendix A
Proof of Theorem 1
The corresponding Lagrangian form of Problem 1 is
where \({\lambda _{n0}}\) and \({\lambda _{nm}}\) are the Lagrangian multipliers.
The Karush-Kuhn-Tucker (KKT) conditions are given as follows.
Without loss of generality, consider \(t_m^n > 0\). Complementary slackness then gives \({\lambda _{nm}} = 0\), and Eqn. (A2) reduces to
Then, Eqn. (A5) can be decomposed into two cases as follows.
(1) Case I: \(\lambda_{n0} = 0\). According to Eqn. (A5), we have
$$\begin{aligned} t_m^n = \frac{p_m^n - b_m^n}{2a_m^n}. \end{aligned}$$(A6)
(2) Case II: \(\lambda_{n0} > 0\). According to Eqn. (A5), we have \(t_m^n = \frac{p_m^n - b_m^n - \lambda_{n0}}{2a_m^n}\). Substituting \(t_m^n\) into Eqn. (A3), we have \(\lambda_{n0} = \frac{\sum \nolimits_m \left[ \left( p_m^n - b_m^n \right) \prod \nolimits_{m1 \ne m} a_{m1}^n \right] - 2\kappa_n \prod \nolimits_m a_m^n}{\sum \nolimits_m \prod \nolimits_{m1 \ne m} a_{m1}^n}\), and therefore \(t_m^n\) can be obtained as follows.
$$t_m^n = \begin{cases} \frac{p_m^n - b_m^n}{2a_m^n}, & \text{if } \sum\nolimits_{m=1}^{M} \frac{p_m^n - b_m^n}{2a_m^n} \le \kappa_n, \\ \frac{p_m^n - b_m^n - \lambda_{n0}}{2a_m^n}, & \text{else if } \lambda_{n0} > 0, \\ 0, & \text{otherwise}. \end{cases}$$(A7)
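The step from the Case II stationary point to the expression for \(\lambda_{n0}\) can be checked directly. The sketch below assumes that Eqn. (A3) is the complementary slackness condition for the time budget, so that \(\lambda_{n0} > 0\) forces \(\sum_m t_m^n = \kappa_n\). Eqn. (A3) is not displayed in this text, so this reading is an assumption rather than the authors' exact statement.

```latex
% Assumption: with \lambda_{n0} > 0, Eqn. (A3) reduces to \sum_m t_m^n = \kappa_n.
\begin{aligned}
\sum_{m} \frac{p_m^n - b_m^n - \lambda_{n0}}{2a_m^n} &= \kappa_n \\
\lambda_{n0} \sum_{m} \frac{1}{a_m^n} &= \sum_{m} \frac{p_m^n - b_m^n}{a_m^n} - 2\kappa_n \\
\lambda_{n0} &= \frac{\sum_{m} \frac{p_m^n - b_m^n}{a_m^n} - 2\kappa_n}{\sum_{m} \frac{1}{a_m^n}}
\end{aligned}
```

Multiplying the numerator and the denominator of the last line by \(\prod_m a_m^n\) recovers the product form of \(\lambda_{n0}\) stated in Case II.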
However, the derivation above does not constrain the sign of \(t_m^n\), although non-negativity is required. We therefore supplement Eqn. (A7). Considering that \(\sum _m t_m^n\le \kappa _n\), we define \({C_1} = \sum \nolimits _m {\min \left( {0,t_m^n} \right) }\) and \({C_2} = \sum \nolimits _m {\max \left( {0,t_m^n} \right) }\), which are the cumulative sums of the sensing times that are less than and greater than 0, respectively. We set each \(t_m^n\) less than 0 to 0 and adjust each \(t_m^n\) greater than 0 by the ratio \({{{C_1}} / {{C_2}}}\), so that the losses are distributed consistently with the initial benefits.
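The closed-form solution and the sign adjustment can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `sensing_times` and its argument layout are hypothetical, and the way the ratio \(C_1/C_2\) is applied (scaling each positive \(t_m^n\) by \(1 + C_1/C_2\), so that the total sensing time is unchanged) is one assumed reading of the rule described above.

```python
from typing import List


def sensing_times(p: List[float], b: List[float], a: List[float],
                  kappa: float) -> List[float]:
    """Sketch of the sensing times of one MU n over M TIs.

    p, b, a hold p_m^n, b_m^n, a_m^n (each a_m^n > 0); kappa is kappa_n.
    The name and signature are illustrative assumptions.
    """
    # Case I (lambda_n0 = 0): Eqn. (A6).
    t = [(pm - bm) / (2.0 * am) for pm, bm, am in zip(p, b, a)]

    if sum(t) > kappa:
        # Case II (lambda_n0 > 0): the time budget binds.
        # Same lambda_n0 as in the text, with numerator and denominator
        # divided by the product of all a_m^n.
        lam = (sum((pm - bm) / am for pm, bm, am in zip(p, b, a))
               - 2.0 * kappa) / sum(1.0 / am for am in a)
        t = [(pm - bm - lam) / (2.0 * am) for pm, bm, am in zip(p, b, a)]

    # Sign adjustment. Assumed reading of the C1/C2 rule: negative
    # entries become 0 and positive entries are scaled by 1 + C1/C2,
    # which keeps the total at C1 + C2.
    c1 = sum(min(0.0, x) for x in t)  # cumulative negative time (<= 0)
    c2 = sum(max(0.0, x) for x in t)  # cumulative positive time (>= 0)
    if c2 <= 0.0:
        return [0.0] * len(t)
    scale = max(0.0, 1.0 + c1 / c2)
    return [max(0.0, x) * scale for x in t]


if __name__ == "__main__":
    # Loose budget (Case I), binding budget (Case II), and a binding
    # budget that drives one entry negative before the adjustment.
    print(sensing_times([4.0, 6.0], [0.0, 0.0], [1.0, 1.0], kappa=10.0))
    print(sensing_times([4.0, 6.0], [0.0, 0.0], [1.0, 1.0], kappa=3.0))
    print(sensing_times([1.0, 10.0], [0.0, 0.0], [1.0, 1.0], kappa=2.0))
```

In the third call, the Case II solution is negative for the first TI. After the adjustment that entry is 0 and the remaining sensing time still sums to \(\kappa_n\).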
In summary, the optimal solution to Problem 1 is given as follows.
where,
\(\square\)
About this article
Cite this article
Zhao, S., Yu, Y., Huang, T. et al. Pricing strategies in mobile crowdsensing: an enhanced MAPPO approach using a behavior network. J Supercomput 81, 457 (2025). https://doi.org/10.1007/s11227-025-06957-w
DOI: https://doi.org/10.1007/s11227-025-06957-w