Abstract
Data valuation, the process of assigning value to data based on its utility, is a critical and largely unexplored aspect of data markets. Within the Machine Learning Data Market (MLDM), a platform that enables data exchange among multiple agents, the challenge of quantifying the value of data becomes particularly prominent. Agents within MLDM are motivated to exchange data based on its potential impact on their individual performance. Shapley Value-based methods have gained traction in addressing this challenge, prompting our study to investigate their effectiveness within the MLDM context. Specifically, we propose the Gain Data Shapley Value (GDSV) method tailored for MLDM and compare it to the original data valuation method used in MLDM. Our analysis focuses on two common learning algorithms, Decision Tree (DT) and K-nearest neighbors (KNN), within a simulated society of five agents, tested on 45 classification datasets. Results show that GDSV leads to incremental improvements in predictive performance for both DT and KNN compared to performance-based valuation or the baseline. These findings underscore the potential of Shapley Value-based methods in identifying high-value data within MLDM while indicating areas for further improvement.
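For readers unfamiliar with the underlying idea, the following is a minimal sketch of the classical Shapley value applied to data blocks. It is not the GDSV method, whose definition the abstract does not give. The function names `shapley_values` and `toy_utility`, the block labels, and the gain figures are all hypothetical and chosen only for illustration; in a real setting the utility would be something like the validation accuracy of a model trained on the chosen blocks.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, utility):
    """Exact Shapley values by enumerating every coalition.

    `utility` maps a frozenset of players to a number. The cost is
    exponential in len(players), so this only suits small examples.
    """
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p when it joins coalition s.
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


def toy_utility(coalition):
    """Hypothetical utility: an assumed performance gain from a set of data blocks."""
    gains = {"a": 0.10, "b": 0.05, "c": 0.00}  # made-up individual gains
    base = sum(gains[p] for p in coalition)
    # Made-up synergy: blocks "a" and "b" are worth slightly more together.
    bonus = 0.02 if {"a", "b"} <= coalition else 0.0
    return base + bonus


values = shapley_values(["a", "b", "c"], toy_utility)
for name in sorted(values):
    print(name, round(values[name], 4))
```

In this toy game the synergy between blocks "a" and "b" is split equally between them, block "c" contributes nothing and receives a value of zero, and the values sum to the utility of the full set. Exact enumeration is exponential in the number of blocks, which is why practical data valuation typically relies on sampling-based approximations.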
Acknowledgments
This work was financially supported (or partially financially supported) by Base Funding – UIDB/00027/2020 of the Artificial Intelligence and Computer Science Laboratory - LIACC - funded by national funds through the FCT/MCTES (PIDDAC) and by a PhD grant from Fundação para a Ciência e Tecnologia (FCT), reference SFRH/BD/06064/2021.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Baghcheband, H., Soares, C., Reis, L.P. (2024). Shapley-Based Data Valuation Method for the Machine Learning Data Markets (MLDM). In: Appice, A., Azzag, H., Hacid, MS., Hadjali, A., Ras, Z. (eds) Foundations of Intelligent Systems. ISMIS 2024. Lecture Notes in Computer Science, vol 14670. Springer, Cham. https://doi.org/10.1007/978-3-031-62700-2_16
DOI: https://doi.org/10.1007/978-3-031-62700-2_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-62699-9
Online ISBN: 978-3-031-62700-2
eBook Packages: Computer Science, Computer Science (R0)