research-article

Controllable Tabular Data Synthesis Using Diffusion Models

Authors:
Tongyu Liu

Renmin University of China, Beijing, China

Renmin University of China, Beijing, China

0009-0003-8326-8351
View Profile

,
Ju Fan

Renmin University of China, Beijing, China

Renmin University of China, Beijing, China

0000-0003-4729-9903
View Profile

,
Nan Tang

HKUST (GZ), Guangzhou, China

HKUST (GZ), Guangzhou, China

0000-0003-2832-0295
View Profile

,
Guoliang Li

Tsinghua University, Beijing, China

Tsinghua University, Beijing, China

0000-0002-1398-0621
View Profile

,
Xiaoyong Du

Renmin University of China, Beijing, China

Renmin University of China, Beijing, China

0000-0002-5757-9135
View Profile

Proceedings of the ACM on Management of Data Volume 2 Issue 1Article No.: 28pp 1–29https://doi.org/10.1145/3639283

Published:26 March 2024Publication History

Proceedings of the ACM on Management of Data

Abstract

Controllable tabular data synthesis plays a crucial role in numerous applications by allowing users to generate synthetic data with specific conditions. These conditions can include synthesizing tuples with predefined attribute values or creating tuples that exhibit a particular correlation with an external table. However, existing approaches lack the flexibility to support new conditions and can be time-consuming when dealing with multiple conditions. To overcome these limitations, we propose a novel approach that leverages diffusion models to first learn an unconditional generative model. Subsequently, we introduce lightweight controllers to guide the unconditional generative model in generating synthetic data that satisfies different conditions. The primary research challenge lies in effectively supporting controllability using lightweight solutions while ensuring the realism of the synthetic data. To address this challenge, we design an unconditional diffusion model tailored specifically for tabular data. Additionally, we propose a new sampling method that enables correlation-aware controls throughout the data generation process. We conducted extensive experiments across various applications for controllable tabular data synthesis, which show that our approach outperforms the state-of-the-art methods.

References

[n. d.]. Airbnb Data Set. https://public.opendatasoft.com/explore/dataset/airbnb-listings.Google Scholar
[n. d.]. Default Data Set. https://archive.ics.uci.edu/dataset/350/defaultofcreditcardclients.Google Scholar
[n. d.]. Heart Data Set. https://www.openml.org/data/download/6358/BNG_heart-statlog.arff.Google Scholar
[n. d.]. Imdb Data Set. http://homepages.cwi.nl/~boncz/job/imdb.tgz.Google Scholar
[n. d.]. WeatherAUS Data Set. https://www.kaggle.com/jsphyg/weather-dataset-rattle-package.Google Scholar
Rameen Abdal, Peihao Zhu, Niloy J. Mitra, and Peter Wonka. 2021. StyleFlow: Attribute-conditioned Exploration of StyleGAN-Generated Images using Conditional Continuous Normalizing Flows. ACM Trans. Graph. 40, 3 (2021), 21:1--21:21. https://doi.org/10.1145/3447648Google ScholarDigital Library
Vadim Borisov, Tobias Leemann, Kathrin Seßler, Johannes Haug, Martin Pawelczyk, and Gjergji Kasneci. 2021. Deep Neural Networks and Tabular Data: A Survey. CoRR abs/2110.01889 (2021). arXiv:2110.01889 https://arxiv.org/abs/2110.01889Google Scholar
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6--12, 2020, virtual, Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.). https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.htmlGoogle Scholar
Zhipeng Cai, Zuobin Xiong, Honghui Xu, Peng Wang, Wei Li, and Yi Pan. 2021. Generative adversarial networks: A survey toward private and secure applications. ACM Computing Surveys (CSUR) 54, 6 (2021), 1--38.Google ScholarDigital Library
Tânia Carvalho, Nuno Moniz, Pedro Faria, and Luís Antunes. 2022. Survey on Privacy-Preserving Techniques for Data Publishing. CoRR abs/2201.08120 (2022). arXiv:2201.08120 https://arxiv.org/abs/2201.08120Google Scholar
Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 16 (2002), 321--357. https://doi.org/10.1613/jair.953Google ScholarDigital Library
Ting Chen, Ruixiang Zhang, and Geoffrey E. Hinton. 2022. Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning. CoRR abs/2208.04202 (2022). https://doi.org/10.48550/arXiv.2208.04202 arXiv:2208.04202Google ScholarCross Ref
Edward Choi, Siddharth Biswal, Bradley A. Malin, Jon Duke, Walter F. Stewart, and Jimeng Sun. 2017. Generating Multi-label Discrete Patient Records using Generative Adversarial Networks. In Proceedings of the Machine Learning for Health Care Conference, MLHC 2017, Boston, Massachusetts, USA, 18--19 August 2017 (Proceedings of Machine Learning Research, Vol. 68), Finale Doshi-Velez, Jim Fackler, David C. Kale, Rajesh Ranganath, Byron C. Wallace, and Jenna Wiens (Eds.). PMLR, 286--305. http://proceedings.mlr.press/v68/choi17a.htmlGoogle Scholar
Yunjey Choi, Min-Je Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. 2018. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18--22, 2018. Computer Vision Foundation / IEEE Computer Society, 8789--8797. https://doi.org/10.1109/CVPR.2018.00916Google ScholarCross Ref
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, and Noah Fiedel. 2022. PaLM: Scaling Language Modeling with Pathways. CoRR abs/2204.02311 (2022). https://doi.org/10.48550/arXiv.2204.02311 arXiv:2204.02311Google ScholarCross Ref
Prafulla Dhariwal and Alexander Quinn Nichol. 2021. Diffusion Models Beat GANs on Image Synthesis. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6--14, 2021, virtual, Marc'Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (Eds.). 8780--8794. https://proceedings.neurips.cc/paper/2021/hash/49ad23d1ec9fa4bd8d77d02681df5cfa-Abstract.htmlGoogle Scholar
Ju Fan, Tongyu Liu, Guoliang Li, Junyou Chen, Yuwei Shen, and Xiaoyong Du. 2020. Relational Data Synthesis using Generative Adversarial Networks: A Design Space Exploration. Proc. VLDB Endow. 13, 11 (2020), 1962--1975. http://www.vldb.org/pvldb/vol13/p1962-fan.pdfGoogle ScholarDigital Library
Liyue Fan. 2020. A survey of differentially private generative adversarial networks. In The AAAI Workshop on Privacy-Preserving Artificial Intelligence, Vol. 8.Google Scholar
Maayan Frid-Adar, Idit Diamant, Eyal Klang, Michal Amitai, Jacob Goldberger, and Hayit Greenspan. 2018. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 321 (2018), 321--331. https://doi.org/10.1016/j.neucom.2018.09.013Google ScholarCross Ref
Benjamin C. M. Fung, Ke Wang, Rui Chen, and Philip S. Yu. 2010. Privacy-preserving data publishing: A survey of recent developments. ACM Comput. Surv. 42, 4 (2010), 14:1--14:53. https://doi.org/10.1145/1749603.1749605Google ScholarDigital Library
Mathieu Germain, Karol Gregor, Iain Murray, and Hugo Larochelle. 2015. MADE: Masked Autoencoder for Distribution Estimation. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6--11 July 2015 (JMLR Workshop and Conference Proceedings, Vol. 37), Francis R. Bach and David M. Blei (Eds.). JMLR.org, 881--889. http://proceedings.mlr.press/v37/germain15.htmlGoogle Scholar
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8--13 2014, Montreal, Quebec, Canada, Zoubin Ghahramani, Max Welling, Corinna Cortes, Neil D. Lawrence, and Kilian Q. Weinberger (Eds.). 2672--2680. https://proceedings.neurips.cc/paper/2014/hash/5ca3e9b122f61f8f06494c97b1afccf3-Abstract.htmlGoogle ScholarDigital Library
Alexandros Graikos, Nikolay Malkin, Nebojsa Jojic, and Dimitris Samaras. 2022. Diffusion models as plug-and-play priors. CoRR abs/2206.09012 (2022). https://doi.org/10.48550/arXiv.2206.09012 arXiv:2206.09012Google ScholarCross Ref
Jiaxian Guo, Sidi Lu, Han Cai, Weinan Zhang, Yong Yu, and Jun Wang. 2018. Long Text Generation via Adversarial Training with Leaked Information. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2--7, 2018, Sheila A. McIlraith and Kilian Q. Weinberger (Eds.). AAAI Press, 5141--5148. https://doi.org/10.1609/aaai.v32i1.11957Google ScholarCross Ref
Erik Härkönen, Aaron Hertzmann, Jaakko Lehtinen, and Sylvain Paris. 2020. GANSpace: Discovering Interpretable GAN Controls. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6--12, 2020, virtual, Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.). https://proceedings.neurips.cc/paper/2020/hash/6fe43269967adbb64ec6149852b5cc3e-Abstract.htmlGoogle Scholar
Shohedul Hasan, Saravanan Thirumuruganathan, Jees Augustine, Nick Koudas, and Gautam Das. 2020. Deep Learning Models for Selectivity Estimation of Multi-Attribute Queries. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020, David Maier, Rachel Pottinger, AnHai Doan, Wang-Chiew Tan, Abdussalam Alawini, and Hung Q. Ngo (Eds.). ACM, 1035--1050. https://doi.org/10.1145/3318464.3389741Google ScholarDigital Library
Benjamin Hilprecht and Carsten Binnig. 2021. ReStore - Neural Data Completion for Relational Databases. In SIGMOD '21: International Conference on Management of Data, Virtual Event, China, June 20--25, 2021, Guoliang Li, Zhanhuai Li, Stratos Idreos, and Divesh Srivastava (Eds.). ACM, 710--722. https://doi.org/10.1145/3448016.3457264Google ScholarDigital Library
Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6--12, 2020, virtual, Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.). https://proceedings.neurips.cc/paper/2020/hash/4c5bcfec8584af0d967f1ab10179ca4b-Abstract.htmlGoogle Scholar
Yuzheng Hu, Fan Wu, Qinbin Li, Yunhui Long, Gonzalo Munilla Garrido, Chang Ge, Bolin Ding, David A. Forsyth, Bo Li, and Dawn Song. 2023. SoK: Privacy-Preserving Data Synthesis. CoRR abs/2307.02106 (2023). https://doi.org/10.48550/arXiv.2307.02106 arXiv:2307.02106Google ScholarCross Ref
Naoto Inoue, Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, and Kota Yamaguchi. 2023. LayoutDM: Discrete Diffusion Model for Controllable Layout Generation. CoRR abs/2303.08137 (2023). https://doi.org/10.48550/arXiv.2303.08137 arXiv:2303.08137Google ScholarCross Ref
Ali Jahanian, Lucy Chai, and Phillip Isola. 2020. On the "steerability" of generative adversarial networks. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26--30, 2020. OpenReview.net. https://openreview.net/forum?id=HylsTT4FvBGoogle Scholar
James Jordon, Jinsung Yoon, and Mihaela van der Schaar. 2019. PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6--9, 2019. OpenReview.net. https://openreview.net/forum?id=S1zk9iRqF7Google Scholar
Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. 2018. Progressive Growing of GANs for Improved Quality, Stability, and Variation. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=Hk99zCeAbGoogle Scholar
Jayoung Kim, Jinsung Jeon, Jaehoon Lee, Jihyeon Hyeong, and Noseong Park. 2021. OCT-GAN: Neural ODE-based Conditional Tabular GANs. In WWW '21: The Web Conference 2021, Virtual Event / Ljubljana, Slovenia, April 19--23, 2021, Jure Leskovec, Marko Grobelnik, Marc Najork, Jie Tang, and Leila Zia (Eds.). ACM / IW3C2, 1506--1515. https://doi.org/10.1145/3442381.3449999Google ScholarDigital Library
Jayoung Kim, Chaejeong Lee, and Noseong Park. 2022. STaSy: Score-based Tabular data Synthesis. CoRR abs/2210.04018 (2022). https://doi.org/10.48550/arXiv.2210.04018 arXiv:2210.04018Google ScholarCross Ref
Jayoung Kim, Chaejeong Lee, Yehjin Shin, Sewon Park, Minjung Kim, Noseong Park, and Jihoon Cho. 2022. SOS: Score-based Oversampling for Tabular Data. In KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14 - 18, 2022, Aidong Zhang and Huzefa Rangwala (Eds.). ACM, 762--772. https://doi.org/10.1145/3534678.3539454Google ScholarDigital Library
Akim Kotelnikov, Dmitry Baranchuk, Ivan Rubachev, and Artem Babenko. 2022. TabDDPM: Modelling Tabular Data with Diffusion Models. CoRR abs/2209.15421 (2022). https://doi.org/10.48550/arXiv.2209.15421 arXiv:2209.15421Google ScholarCross Ref
Jaehoon Lee, Jihyeon Hyeong, Jinsung Jeon, Noseong Park, and Jihoon Cho. 2021. Invertible Tabular GANs: Killing Two Birds with One Stone for Tabular Data Synthesis. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6--14, 2021, virtual, Marc'Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (Eds.). 4263--4273. https://proceedings.neurips.cc/paper/2021/hash/22456f4b545572855c766df5eefc9832-Abstract.htmlGoogle Scholar
Jiwei Li, Will Monroe, Alan Ritter, Dan Jurafsky, Michel Galley, and Jianfeng Gao. 2016. Deep Reinforcement Learning for Dialogue Generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1--4, 2016, Jian Su, Xavier Carreras, and Kevin Duh (Eds.). The Association for Computational Linguistics, 1192--1202. https://doi.org/10.18653/v1/d16--1127Google ScholarCross Ref
Xiang Li, John Thickstun, Ishaan Gulrajani, Percy Liang, and Tatsunori B. Hashimoto. 2022. Diffusion-LM Improves Controllable Text Generation. In NeurIPS. http://papers.nips.cc/paper_files/paper/2022/hash/1be5bc25d50895ee656b8c2d9eb89d6a-Abstract-Conference.htmlGoogle Scholar
Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiaowei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, and Jianfeng Gao. 2020. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XXX (Lecture Notes in Computer Science, Vol. 12375), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). Springer, 121--137. https://doi.org/10.1007/978--3-030--58577--8_8Google ScholarCross Ref
Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, Percy Liang, and Tatsunori B. Hashimoto. 2022. Diffusion-LM Improves Controllable Text Generation. CoRR abs/2205.14217 (2022). https://doi.org/10.48550/arXiv.2205.14217 arXiv:2205.14217Google ScholarCross Ref
Tongyu Liu, Ju Fan, Yinqing Luo, Nan Tang, Guoliang Li, and Xiaoyong Du. 2021. Adaptive Data Augmentation for Supervised Learning over Missing Data. Proc. VLDB Endow. 14, 7 (2021), 1202--1214. https://doi.org/10.14778/3450980.3450989Google ScholarDigital Library
Xihui Liu, Dong Huk Park, Samaneh Azadi, Gong Zhang, Arman Chopikyan, Yuxiao Hu, Humphrey Shi, Anna Rohrbach, and Trevor Darrell. 2021. More Control for Free! Image Synthesis with Semantic Diffusion Guidance. CoRR abs/2112.05744 (2021). arXiv:2112.05744 https://arxiv.org/abs/2112.05744Google Scholar
Chao Ma, Sebastian Tschiatschek, Richard Turner, José Miguel Hernández-Lobato, and Cheng Zhang. 2020. VAEM: a deep generative model for heterogeneous mixed type data. Advances in Neural Information Processing Systems 33 (2020), 11237--11247.Google Scholar
Abdul Majeed and Sungchang Lee. 2021. Anonymization Techniques for Privacy Preserving Data Publishing: A Comprehensive Survey. IEEE Access 9 (2021), 8512--8545. https://doi.org/10.1109/ACCESS.2020.3045700Google ScholarCross Ref
Mehdi Mirza and Simon Osindero. 2014. Conditional Generative Adversarial Nets. CoRR abs/1411.1784 (2014). arXiv:1411.1784 http://arxiv.org/abs/1411.1784Google Scholar
Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, and Jason Yosinski. 2017. Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21--26, 2017. IEEE Computer Society, 3510--3520. https://doi.org/10.1109/CVPR.2017.374Google ScholarCross Ref
Alexander Quinn Nichol and Prafulla Dhariwal. 2021. Improved Denoising Diffusion Probabilistic Models. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18--24 July 2021, Virtual Event (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 8162--8171. http://proceedings.mlr.press/v139/nichol21a.htmlGoogle Scholar
Noseong Park, Mahmoud Mohammadi, Kshitij Gorde, Sushil Jajodia, Hongkyu Park, and Youngmin Kim. 2018. Data Synthesis based on Generative Adversarial Networks. Proc. VLDB Endow. 11, 10 (2018), 1071--1083. https://doi.org/10.14778/3231751.3231757Google ScholarDigital Library
Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. 2019. Semantic Image Synthesis With Spatially-Adaptive Normalization. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16--20, 2019. Computer Vision Foundation / IEEE, 2337--2346. https://doi.org/10.1109/CVPR.2019.00244Google ScholarCross Ref
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18--24 July 2021, Virtual Event (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 8748--8763. http://proceedings.mlr.press/v139/radford21a.htmlGoogle Scholar
Alec Radford, Luke Metz, and Soumith Chintala. 2016. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2--4, 2016, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http://arxiv.org/abs/1511.06434Google Scholar
Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. 2014. Stochastic backpropagation and approximate inference in deep generative models. In International conference on machine learning. PMLR, 1278--1286.Google Scholar
Cemal Okan Sakar, Suleyman Olcay Polat, Mete Katircioglu, and Yomi Kastro. 2019. Real-time prediction of online shoppers' purchasing intention using multilayer perceptron and LSTM recurrent neural networks. Neural Comput. Appl. 31, 10 (2019), 6893--6908. https://doi.org/10.1007/s00521-018--3523-0Google ScholarDigital Library
Tamar Rott Shaham, Tali Dekel, and Tomer Michaeli. 2019. SinGAN: Learning a Generative Model From a Single Natural Image. In 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019. IEEE, 4569--4579. https://doi.org/10.1109/ICCV.2019.00467Google ScholarCross Ref
Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6--11 July 2015 (JMLR Workshop and Conference Proceedings, Vol. 37), Francis R. Bach and David M. Blei (Eds.). JMLR.org, 2256--2265. http://proceedings.mlr.press/v37/sohl-dickstein15.htmlGoogle Scholar
Kihyuk Sohn. 2016. Improved Deep Metric Learning with Multi-class N-pair Loss Objective. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5--10, 2016, Barcelona, Spain, Daniel D. Lee, Masashi Sugiyama, Ulrike von Luxburg, Isabelle Guyon, and Roman Garnett (Eds.). 1849--1857. https://proceedings.neurips.cc/paper/2016/hash/6b180037abbebea991d8b1232f8a8ca9-Abstract.htmlGoogle ScholarDigital Library
Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. 2021. Score-Based Generative Modeling through Stochastic Differential Equations. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3--7, 2021. OpenReview.net. https://openreview.net/forum?id=PxTIG12RRHSGoogle Scholar
Nan Tang, Ju Fan, Fangyi Li, Jianhong Tu, Xiaoyong Du, Guoliang Li, Samuel Madden, and Mourad Ouzzani. 2021. RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation. Proc. VLDB Endow. 14, 8 (2021), 1254--1261. https://doi.org/10.14778/3457390.3457391Google ScholarDigital Library
Boris van Breugel, Trent Kyono, Jeroen Berrevoets, and Mihaela van der Schaar. 2021. DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6--14, 2021, virtual, Marc'Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (Eds.). 22221--22233. https://proceedings.neurips.cc/paper/2021/hash/ba9fab001f67381e56e410575874d967-Abstract.htmlGoogle Scholar
L Vivek Harsha Vardhan and Stanley Kok. 2020. Generating privacy-preserving synthetic tabular data using oblivious variational autoencoders. In Proceedings of the Workshop on Economics of Privacy and Data Labor at the 37 th International Conference on Machine Learning (ICML).Google Scholar
Liyang Xie, Kaixiang Lin, Shu Wang, Fei Wang, and Jiayu Zhou. 2018. Differentially Private Generative Adversarial Network. CoRR abs/1802.06739 (2018). arXiv:1802.06739 http://arxiv.org/abs/1802.06739Google Scholar
Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. 2019. Modeling Tabular data using Conditional GAN. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8--14, 2019, Vancouver, BC, Canada, Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett (Eds.). 7333--7343. https://proceedings.neurips.cc/paper/2019/hash/254ed7d2de3b23ab10936522dd547b78-Abstract.htmlGoogle Scholar
Jingyi Yang, Peizhi Wu, Gao Cong, Tieying Zhang, and Xiao He. 2022. SAM: Database Generation from Query Workloads with Supervised Autoregressive Models. In SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12 - 17, 2022, Zachary Ives, Angela Bonifati, and Amr El Abbadi (Eds.). ACM, 1542--1555. https://doi.org/10.1145/3514221.3526168Google ScholarDigital Library
Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Xi Chen, Pieter Abbeel, Joseph M. Hellerstein, Sanjay Krishnan, and Ion Stoica. 2019. Deep Unsupervised Cardinality Estimation. Proc. VLDB Endow. 13, 3 (2019), 279--292. https://doi.org/10.14778/3368289.3368294Google ScholarDigital Library
Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. 2017. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4--9, 2017, San Francisco California, USA, Satinder Singh and Shaul Markovitch (Eds.). AAAI Press, 2852--2858. https://doi.org/10.1609/aaai.v31i1.10804Google ScholarCross Ref
Han Zhang, Tao Xu, and Hongsheng Li. 2017. StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22--29, 2017. IEEE Computer Society, 5908--5916. https://doi.org/10.1109/ICCV.2017.629Google ScholarCross Ref
Jun Zhang, Graham Cormode, Cecilia M. Procopiuc, Divesh Srivastava, and Xiaokui Xiao. 2014. PrivBayes: private data release via bayesian networks. In International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22--27, 2014, Curtis E. Dyreson, Feifei Li, and M. Tamer Özsu (Eds.). ACM, 1423--1434. https://doi.org/10.1145/2588555.2588573Google ScholarDigital Library
Ziyuan Zhong, Davis Rempe, Danfei Xu, Yuxiao Chen, Sushant Veer, Tong Che, Baishakhi Ray, and Marco Pavone. 2022. Guided Conditional Diffusion for Controllable Traffic Simulation. CoRR abs/2210.17366 (2022). https://doi.org/10.48550/arXiv.2210.17366 arXiv:2210.17366Google ScholarCross Ref
Bin Zhou, Jian Pei, and Wo-Shun Luk. 2008. A brief survey on anonymization techniques for privacy preserving publishing of social network data. SIGKDD Explor. 10, 2 (2008), 12--22. https://doi.org/10.1145/1540276.1540279Google ScholarDigital Library
Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. 2017. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22--29, 2017. IEEE Computer Society, 2242--2251. https://doi.org/10.1109/ICCV.2017.244Google ScholarCross Ref

Index Terms

Controllable Tabular Data Synthesis Using Diffusion Models
1. Information systems
  1. Data management systems

Recommendations

Tabular data synthesis with generative adversarial networks: design space and optimizations
Abstract
The proliferation of big data has brought an urgent demand for privacy-preserving data publishing. Traditional solutions to this demand have limitations on effectively balancing the trade-off between privacy and utility of the released data. To ...
Read More
An Empirical Study on the Membership Inference Attack against Tabular Data Synthesis Models
CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management

Tabular data typically contains private and important information; thus, precautions must be taken before they are shared with others. Although several methods (e.g., differential privacy and k-anonymity) have been proposed to prevent information ...
Read More
HDL presynthesis optimizations using a tabular model

In this paper, we introduce presynthesis optimizations on hardware description languages (HDLs). Presynthesis optimizations consist of two categories of tasks: 1) source-level transformations, which produce optimized behavioral HDL descriptions that ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the ACM on Management of Data Volume 2, Issue 1
PACMMOD
February 2024
1874 pages
EISSN:2836-6573
DOI:10.1145/3654807
Editor:
Divyakant Agrawal
UC Santa Barbara, United States
Issue’s Table of Contents
Copyright © 2024 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 26 March 2024
Published in pacmmod Volume 2, Issue 1

Permissions
Request permissions about this article.
Request Permissions
Author Tags
controllable data synthesis
diffusion model
tabular data synthesis
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 0
  Total Citations
  View Citations
- 181
  Total Downloads
- Downloads (Last 12 months)181
- Downloads (Last 6 weeks)132
Other Metrics
View Author Metrics
Cited By
This publication has not been cited yet

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Controllable Tabular Data Synthesis Using Diffusion Models

Proceedings of the ACM on Management of Data

Abstract

References

Cited By

Index Terms

Recommendations

Tabular data synthesis with generative adversarial networks: design space and optimizations

An Empirical Study on the Membership Inference Attack against Tabular Data Synthesis Models

HDL presynthesis optimizations using a tabular model

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Controllable Tabular Data Synthesis Using Diffusion Models

Proceedings of the ACM on Management of Data

Abstract

References

Cited By

Index Terms

Recommendations

Tabular data synthesis with generative adversarial networks: design space and optimizations

An Empirical Study on the Membership Inference Attack against Tabular Data Synthesis Models

HDL presynthesis optimizations using a tabular model

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media