research-article

Generating synthetic data in finance: opportunities, challenges and pitfalls

Authors:

Samuel A. Assefa, Danial Dervovic, Mahmoud Mahfouz, Robert E. Tillman, Prashant Reddy, Manuela Veloso

ICAIF '20: Proceedings of the First ACM International Conference on AI in Finance

Article No.: 44, Pages 1 - 8

https://doi.org/10.1145/3383455.3422554

Published: 07 October 2021

Abstract

Financial services generate a huge volume of data that is extremely complex and varied. These datasets are often stored in silos within organisations for various reasons, including, but not limited to, regulatory requirements and business needs. As a result, data sharing across different lines of business, as well as outside the organisation (e.g. with the research community), is severely limited. It is therefore critical to investigate methods for synthesising financial datasets that preserve the properties of the real data while respecting the privacy of the parties involved.
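
As a minimal illustration of synthesising data under a privacy constraint, the sketch below releases a differentially private histogram of a single categorical column (for example, a transaction category) and samples synthetic records from it. It is a toy under stated assumptions, not the method of this paper: the function names, example categories and parameter values are hypothetical; each record is assumed to contribute exactly one categorical value; the category list is assumed to be public and fixed in advance rather than derived from the private data; and neighbouring datasets are assumed to differ by adding or removing one record, so the histogram's L1 sensitivity is 1 and Laplace noise of scale 1/ε per bin gives ε-differential privacy. Python's `random` module is not cryptographically secure, so this is not suitable for a production release.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent Exponential(rate=1/scale) draws is
    Laplace-distributed with location 0 and the given scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram_synthesizer(
    records: Sequence[Hashable],
    categories: Sequence[Hashable],
    epsilon: float,
    n_synthetic: int,
    seed: int = 0,
) -> List[Hashable]:
    """Hypothetical toy: sample synthetic records from an epsilon-DP noisy histogram.

    Assumes `categories` is a public list fixed in advance; records whose
    value is not in `categories` are ignored.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    counts = Counter(records)
    # Adding or removing one record changes a single bin by 1, so the L1
    # sensitivity is 1 and Laplace(1/epsilon) noise per bin gives epsilon-DP.
    noisy = [counts.get(c, 0) + laplace_noise(1.0 / epsilon, rng) for c in categories]
    # Clipping negatives and sampling are post-processing of the noisy counts,
    # so they do not weaken the privacy guarantee.
    weights = [max(v, 0.0) for v in noisy]
    if sum(weights) == 0:
        weights = [1.0] * len(categories)  # degenerate case: fall back to uniform
    return rng.choices(list(categories), weights=weights, k=n_synthetic)
```

For example, `dp_histogram_synthesizer(["groceries"] * 50 + ["travel"] * 30, ["groceries", "travel", "fuel"], epsilon=1.0, n_synthetic=100)` returns 100 synthetic category labels. A smaller ε adds more noise, trading fidelity to the real distribution for stronger privacy.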

This introductory paper aims to highlight the growing need for effective synthetic data generation in the financial domain. We identify three areas of particular importance when generating synthetic financial datasets: 1) generating realistic synthetic datasets; 2) measuring the similarity between real and generated datasets; and 3) ensuring that the generative process satisfies any privacy constraints.
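
One common way to quantify the second area, the similarity between real and generated datasets, is a kernel two-sample statistic such as the maximum mean discrepancy (MMD). The sketch below computes the unbiased estimate of squared MMD with a Gaussian (RBF) kernel using only the standard library. The function names, the default bandwidth of 1.0 and the toy 2-D samples are illustrative assumptions, and this is not presented as the evaluation procedure used in the paper. Values near zero suggest the two samples are hard to tell apart under this kernel; larger values indicate a mismatch.

```python
import math
import random
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, bandwidth: float) -> float:
    """Gaussian (RBF) kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(
    real: Sequence[Vector], synthetic: Sequence[Vector], bandwidth: float = 1.0
) -> float:
    """Unbiased estimate of squared MMD between two samples (can be slightly negative)."""
    m, n = len(real), len(synthetic)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples from each dataset")
    # Within-sample terms skip the diagonal (i == j), which removes the bias.
    k_xx = sum(
        rbf_kernel(real[i], real[j], bandwidth)
        for i in range(m) for j in range(m) if i != j
    )
    k_yy = sum(
        rbf_kernel(synthetic[i], synthetic[j], bandwidth)
        for i in range(n) for j in range(n) if i != j
    )
    # The cross term averages over all m * n real/synthetic pairs.
    k_xy = sum(rbf_kernel(x, y, bandwidth) for x in real for y in synthetic)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


if __name__ == "__main__":
    # Toy data: a "real" sample, a synthetic sample from the same distribution,
    # and a synthetic sample whose first coordinate is shifted by 3.
    rng = random.Random(42)
    real = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
    close = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
    shifted = [(rng.gauss(3, 1), rng.gauss(0, 1)) for _ in range(100)]
    print("same distribution:", mmd2_unbiased(real, close))
    print("shifted:", mmd2_unbiased(real, shifted))
```

The estimator costs O((m + n)^2) kernel evaluations, so for large datasets one would typically subsample; the result also depends on the bandwidth, which here is simply fixed rather than tuned.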

Although these challenges are also present in other domains, the additional regulatory and privacy requirements within financial services raise unique questions that do not arise elsewhere. Given the size and influence of the financial services industry, answering these questions has the potential for significant and lasting impact. Finally, we aim to develop a shared vocabulary and context for generating synthetic financial data, using two types of financial datasets as examples.

Published In

ICAIF '20: Proceedings of the First ACM International Conference on AI in Finance

October 2020

422 pages

ISBN:9781450375849

DOI:10.1145/3383455

Conference Chair:
Tucker Balch
J.P. Morgan AI Research

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

ACM: Association for Computing Machinery

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICAIF '20

Sponsor:

ACM

ICAIF '20: ACM International Conference on AI in Finance

October 15 - 16, 2020

New York, New York

Article Metrics

Total citations: 93
Total downloads: 2,542
Downloads (last 12 months): 836
Downloads (last 6 weeks): 108

Reflects downloads up to 03 Mar 2025

Cited By

Bruni Prenestino F, Barbierato E, Gatti A (2025). Robust Synthetic Data Generation for Sequential Financial Models Using Hybrid Variational Autoencoder–Markov Chain Monte Carlo Architectures. Future Internet 17, 2 (95). Online publication date: 19 Feb 2025.
https://doi.org/10.3390/fi17020095
Tkachuk S, Łukasik S, Wróblewska A (2025). Consumer Transactions Simulation Through Generative Adversarial Networks Under Stock Constraints in Large-Scale Retail. Electronics 14, 2 (284). Online publication date: 12 Jan 2025.
https://doi.org/10.3390/electronics14020284
Ravn L (2025). The fabrication of synthetic data promises: Tracing emerging arenas of expectations and boundary work. Big Data & Society 12, 1. Online publication date: 31 Jan 2025.
https://doi.org/10.1177/20539517241307915
Sai S, Arunakar K, Chamola V, Hussain A, Bisht P, Kumar S (2025). Generative AI for Finance: Applications, Case Studies and Challenges. Expert Systems 42, 3. Online publication date: 13 Feb 2025.
https://doi.org/10.1111/exsy.70018
Kiran A, Rubini P, Kumar S (2025). Comprehensive Review of Privacy, Utility, and Fairness Offered by Synthetic Data. IEEE Access 13, 15795-15811. Online publication date: 2025.
https://doi.org/10.1109/ACCESS.2025.3532128
Long Y, Kroeger S, Zaeh M, Brintrup A (2025). Leveraging synthetic data to tackle machine learning challenges in supply chains: challenges, methods, applications, and research opportunities. International Journal of Production Research, 1-22. Online publication date: 8 Jan 2025.
https://doi.org/10.1080/00207543.2024.2447927
Wang A, Nguyen B (2025). TTVAE: Transformer-based Generative Modeling for Tabular Data Generation. Artificial Intelligence, 104292. Online publication date: Jan 2025.
https://doi.org/10.1016/j.artint.2025.104292
Alonso-Robisco A, Carbó J (2025). Should We Trust the Credit Decisions Provided by Machine Learning Models? Computational Economics. Online publication date: 17 Jan 2025.
https://doi.org/10.1007/s10614-025-10855-x
Kulkarni P, Pathak P, Pillai S, Tigga V (2025). Role of Generative AI for Fraud Detection and Prevention. In Generative Artificial Intelligence in Finance, 175-198. Online publication date: 21 Jan 2025.
https://doi.org/10.1002/9781394271078.ch10
Wang Z, Draghi B, Rotalinti Y, Lunn D, Myles P (2024). High-Fidelity Synthetic Data Applications for Data Augmentation. In Deep Learning - Recent Findings and Research. Online publication date: 29 May 2024.
https://doi.org/10.5772/intechopen.113884
