research-article

Controllable Tabular Data Synthesis Using Diffusion Models

Published: 26 March 2024

Abstract

Controllable tabular data synthesis plays a crucial role in numerous applications by allowing users to generate synthetic data with specific conditions. These conditions can include synthesizing tuples with predefined attribute values or creating tuples that exhibit a particular correlation with an external table. However, existing approaches lack the flexibility to support new conditions and can be time-consuming when dealing with multiple conditions. To overcome these limitations, we propose a novel approach that leverages diffusion models to first learn an unconditional generative model. Subsequently, we introduce lightweight controllers to guide the unconditional generative model in generating synthetic data that satisfies different conditions. The primary research challenge lies in effectively supporting controllability using lightweight solutions while ensuring the realism of the synthetic data. To address this challenge, we design an unconditional diffusion model tailored specifically for tabular data. Additionally, we propose a new sampling method that enables correlation-aware controls throughout the data generation process. We conducted extensive experiments across various applications for controllable tabular data synthesis, which show that our approach outperforms the state-of-the-art methods.
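The abstract does not specify how the controllers or the sampling method work, so the following is only a generic sketch of how a lightweight controller can steer an unconditional diffusion sampler, in the style of gradient-based guidance. It assumes the unconditional model supplies a per-attribute Gaussian mean and variance for one reverse step, and that a controller supplies the gradient of the log-likelihood of the condition with respect to the current noisy tuple. The function name `guided_reverse_step` and the parameters `cond_grad` and `scale` are hypothetical and are not taken from the paper.

```python
import math
import random


def guided_reverse_step(mean, variance, cond_grad, scale=1.0, rng=None):
    """One guided reverse-diffusion step on a numeric tuple (illustrative only).

    mean, variance: per-attribute parameters of the unconditional reverse step
                    (assumed to come from an already trained diffusion model).
    cond_grad:      per-attribute gradient of log p(condition | x_t), assumed
                    to come from a small, separately trained controller.
    scale:          guidance strength; 0 recovers the unconditional step.
    """
    if rng is None:
        rng = random.Random()
    # Shift the unconditional mean toward regions that satisfy the condition.
    shifted_mean = [m + scale * v * g for m, v, g in zip(mean, variance, cond_grad)]
    # Sample around the shifted mean using the unchanged unconditional variance.
    sample = [m + math.sqrt(v) * rng.gauss(0.0, 1.0)
              for m, v in zip(shifted_mean, variance)]
    return sample, shifted_mean


# Toy usage with three numeric attributes and made-up values.
mean = [0.0, 0.0, 0.0]
variance = [0.5, 0.5, 0.5]
cond_grad = [1.0, -2.0, 0.0]
sample, shifted_mean = guided_reverse_step(
    mean, variance, cond_grad, scale=2.0, rng=random.Random(0)
)
print(shifted_mean)
```

In this sketch the unconditional model is never retrained: supporting a new condition only means supplying a different `cond_grad`, which is the sense in which such a controller can be called lightweight.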


Published in

Proceedings of the ACM on Management of Data (PACMMOD), Volume 2, Issue 1
February 2024, 1874 pages
EISSN: 2836-6573
DOI: 10.1145/3654807

Copyright © 2024 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States
