Abstract
Trading machine learning-based prediction services has become an up-and-coming business for individuals and small companies. Such services directly provide predictions, e.g., classifications, to consumers who lack domain knowledge. Existing prediction service pricing methods rely on the strong assumption that service quality and consumers’ valuations are completely known. In this paper, we study, for the first time, the profit maximization problem of pricing prediction services under incomplete information. We propose a novel Service Market model, named SMELT, that considers multiple types of customers with dEmand and quaLity-aware valuaTions. We first derive the theoretically optimal solution that maximizes service profit under complete information. Then, we develop an effective framework, PSPricer, with a profit ratio guarantee, to solve the profit maximization problem under incomplete information. It not only efficiently obtains a sub-optimal service price with bounded revenue loss, but also effectively learns the service quality function via maximum likelihood estimation. Moreover, because machine learning model inference is resource-intensive and costly, we further extend the SMELT model to account for the inevitable inference cost in service trading. We formulate a novel cost-aware profit maximization problem and derive its general optimal solution. The PSPricer framework is tailored with an effective heuristic to maximize the cost-aware profit with a theoretical profit ratio guarantee. Extensive experiments on real-life datasets confirm our theoretical findings and demonstrate the effectiveness and efficiency of PSPricer in various settings, compared with state-of-the-art approaches.
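To make the notion of cost-aware profit concrete, the following is a minimal sketch of a generic posted-price setting. It is not the SMELT model or the PSPricer framework, whose details are not given in this abstract. It assumes that each consumer buys one prediction whenever their valuation is at least the posted price, and that every served prediction incurs a fixed inference cost. The function name `best_posted_price`, the valuations, the price grid, and the cost values are all hypothetical.

```python
from typing import Sequence, Tuple


def best_posted_price(
    valuations: Sequence[float],
    candidate_prices: Sequence[float],
    unit_cost: float = 0.0,
) -> Tuple[float, float]:
    """Return the (price, profit) pair maximizing (price - unit_cost) * buyers.

    Toy assumption: a consumer buys one prediction iff valuation >= price.
    This is an illustrative posted-price model, not the paper's SMELT model.
    """
    if not candidate_prices:
        raise ValueError("candidate_prices must not be empty")

    best_price = candidate_prices[0]
    best_profit = float("-inf")
    for price in candidate_prices:
        # Count the consumers willing to pay this price.
        buyers = sum(1 for v in valuations if v >= price)
        # Each sale earns the price minus the per-prediction inference cost.
        profit = (price - unit_cost) * buyers
        if profit > best_profit:  # ties keep the earlier (lower) price
            best_price, best_profit = price, profit
    return best_price, best_profit


# Hypothetical data: four consumers and a four-point price grid.
valuations = [1.0, 2.0, 3.0, 4.0]
prices = [1.0, 2.0, 3.0, 4.0]

free_price, free_profit = best_posted_price(valuations, prices, unit_cost=0.0)
cost_price, cost_profit = best_posted_price(valuations, prices, unit_cost=2.0)
print(free_price, free_profit)
print(cost_price, cost_profit)
```

In this toy instance, adding a per-prediction cost moves the profit-maximizing price upward, which illustrates why ignoring inference cost can lead to a mispriced service. With complete information the valuations are known, as assumed here; the incomplete-information setting studied in the paper is precisely the case where they are not.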
Acknowledgements
This work is partly supported by the National Natural Science Foundation of China under Grants No. 62372404, No. 62125206, and No. 61825205, Leading Goose R&D Program of Zhejiang under Grant No. 2024C01109, Project of National Office for Philosophy and Social Sciences under Grant No. 22&ZD154, and the Fundamental Research Funds for the Central Universities under Grant No. 226-2024-00030. Jinshan Zhang was supported by National Natural Science Foundation Project of China under Grant No. 62472376, the Key Research and Development Jianbing Program of Zhejiang Province under Grant No. 2023C01002, Hangzhou Major Project and Development Program under Grant No. 2022AIZD0140, and Yongjiang Talent Introduction Program under Grant No. 2022A-236-G. Xiaoye Miao is the corresponding author of the work.
About this article
Cite this article
Peng, H., Miao, X., Zhang, J. et al. Cost-aware prediction service pricing with incomplete information. The VLDB Journal 34, 28 (2025). https://doi.org/10.1007/s00778-025-00909-9
DOI: https://doi.org/10.1007/s00778-025-00909-9