Shapley-Based Data Valuation Method for the Machine Learning Data Markets (MLDM)

  • Conference paper
Foundations of Intelligent Systems (ISMIS 2024)

Abstract

Data valuation, the process of assigning value to data based on its utility and usefulness, is a critical and largely unexplored aspect of data markets. Within the Machine Learning Data Market (MLDM), a platform that enables data exchange among multiple agents, the challenge of quantifying the value of data becomes particularly prominent. Agents within MLDM are motivated to exchange data based on its potential impact on their individual performance. Shapley Value-based methods have gained traction in addressing this challenge, prompting our study to investigate their effectiveness within the MLDM context. Specifically, we propose the Gain Data Shapley Value (GDSV) method tailored for MLDM and compare it to the original data valuation method used in MLDM. Our analysis focuses on two common learning algorithms, Decision Tree (DT) and K-Nearest Neighbors (KNN), within a simulated society of five agents, tested on 45 classification datasets. Results show that GDSV leads to incremental improvements in predictive performance for both DT and KNN compared with performance-based valuation or the baseline. These findings underscore the potential of Shapley Value-based methods for identifying high-value data within MLDM, while indicating areas for further improvement.
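The abstract does not define GDSV itself, so the following is only a minimal sketch of the classic Shapley value that such methods build on, not the authors' method. It computes exact Shapley values for a five-agent society by enumerating all coalitions. The agent names, the `QUALITY` table, and `toy_utility` are hypothetical stand-ins; in an MLDM-like setting the utility would presumably be the predictive performance of a model trained on the coalition's pooled data.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, utility):
    """Exact Shapley value of each player for a set-valued utility function."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p when joining coalition s.
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


# Hypothetical per-agent data quality (illustrative numbers, not from the paper).
QUALITY = {"a1": 0.10, "a2": 0.05, "a3": 0.20, "a4": 0.0, "a5": 0.15}


def toy_utility(coalition):
    """Assumed utility: additive accuracy gain with diminishing returns (capped at 0.4)."""
    return min(0.4, sum(QUALITY[a] for a in coalition))


values = shapley_values(list(QUALITY), toy_utility)
for agent, value in sorted(values.items()):
    print(f"{agent}: {value:.4f}")
```

Exact enumeration is feasible here because five agents give only 2^5 coalitions. For valuing individual data points rather than agents, the number of coalitions grows exponentially, which is why approximations such as Monte Carlo permutation sampling are typically used.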

Acknowledgments

This work was financially supported (or partially financially supported) by Base Funding – UIDB/00027/2020 of the Artificial Intelligence and Computer Science Laboratory (LIACC), funded by national funds through the FCT/MCTES (PIDDAC), and by a PhD grant from Fundação para a Ciência e Tecnologia (FCT), reference SFRH/BD/06064/2021.

Corresponding author

Correspondence to Hajar Baghcheband.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Baghcheband, H., Soares, C., Reis, L.P. (2024). Shapley-Based Data Valuation Method for the Machine Learning Data Markets (MLDM). In: Appice, A., Azzag, H., Hacid, MS., Hadjali, A., Ras, Z. (eds) Foundations of Intelligent Systems. ISMIS 2024. Lecture Notes in Computer Science, vol 14670. Springer, Cham. https://doi.org/10.1007/978-3-031-62700-2_16

  • DOI: https://doi.org/10.1007/978-3-031-62700-2_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-62699-9

  • Online ISBN: 978-3-031-62700-2

  • eBook Packages: Computer Science (R0)