Abstract
We propose a method for generating synthetic data using Support Vector Data Description (SVDD), a variant of the Support Vector Machine for one-class classification. Our method assumes that the observed data are samples of a random variable that satisfies an unknown membership decision function. This function is learned by SVDD from the training data.
Using the learned membership decision function, we perform rejection sampling. First, we generate a random data point. Second, we test the data point against the membership decision function. Finally, if the data point fails the test, we repeat from the first step; otherwise, we accept it.
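The rejection-sampling loop can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function `rejection_sample`, the bounding box, and the stand-in membership test `in_unit_disc` are all assumptions made for the example. In the method described above, the membership test would be the decision function learned by SVDD.

```python
import random


def rejection_sample(is_member, lower, upper, rng, max_tries=100_000):
    """Draw one point that passes is_member (hypothetical helper).

    Candidates are drawn uniformly from the box [lower, upper]; the box
    is an assumption of this sketch, not part of the original method.
    """
    for _ in range(max_tries):
        # Step 1: generate a random data point.
        candidate = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        # Step 2: test it against the membership decision function.
        if is_member(candidate):
            return candidate
        # Step 3: the test failed, so repeat from step 1.
    raise RuntimeError("no accepted point within max_tries")


def in_unit_disc(point):
    """Stand-in membership function (assumed); SVDD would supply the real one."""
    return point[0] ** 2 + point[1] ** 2 <= 1.0


rng = random.Random(0)
samples = [
    rejection_sample(in_unit_disc, [-1.0, -1.0], [1.0, 1.0], rng)
    for _ in range(200)
]
```

The expected number of candidates per accepted point is the inverse of the fraction of the sampling box that the positive region occupies, which is one way the loop can become slow when that fraction is small.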
However, in some cases rejection sampling runs slowly. We therefore propose a second approach, which uses a heuristic to find a good starting point and then performs gradient descent to gradually move the data point inside the boundary of the positive region while maintaining the randomness of the generated data. This approach runs noticeably faster than rejection sampling in the cases where rejection sampling is slow.
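A rough sketch of the gradient-descent idea is given below, under several assumptions that are not taken from the paper. The support vectors, coefficients, and kernel width are made-up values rather than the result of SVDD training. The starting-point heuristic (a random support vector plus Gaussian noise) is a hypothetical stand-in for the authors' heuristic. Stopping as soon as the point enters the positive region is one possible way to retain randomness. The squared distance to the centre follows the standard SVDD form with a Gaussian kernel, for which $K(z, z) = 1$.

```python
import math
import random

# All values below are assumptions for illustration, not trained SVDD output.
GAMMA = 1.0
SUPPORT = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ALPHAS = [0.25, 0.25, 0.25, 0.25]


def kernel(u, v):
    """Gaussian (RBF) kernel exp(-GAMMA * ||u - v||^2)."""
    return math.exp(-GAMMA * sum((a - b) ** 2 for a, b in zip(u, v)))


# Constant term sum_ij alpha_i alpha_j K(x_i, x_j) of the squared distance.
CENTER_NORM = sum(
    ai * aj * kernel(xi, xj)
    for ai, xi in zip(ALPHAS, SUPPORT)
    for aj, xj in zip(ALPHAS, SUPPORT)
)


def dist_sq(z):
    """Squared feature-space distance from z to the SVDD centre."""
    cross = sum(a * kernel(z, x) for a, x in zip(ALPHAS, SUPPORT))
    return 1.0 - 2.0 * cross + CENTER_NORM


# Squared radius, taken so that the first support vector lies on the boundary.
RADIUS_SQ = dist_sq(SUPPORT[0])


def grad_dist_sq(z):
    """Gradient of dist_sq: 4 * GAMMA * sum_i alpha_i K(z, x_i) (z - x_i)."""
    grad = [0.0] * len(z)
    for a, x in zip(ALPHAS, SUPPORT):
        k = kernel(z, x)
        for d in range(len(z)):
            grad[d] += 4.0 * GAMMA * a * k * (z[d] - x[d])
    return grad


def sample_by_descent(rng, noise=0.3, step=0.2, max_steps=10_000):
    """Start near a random support vector, then descend until inside."""
    anchor = rng.choice(SUPPORT)  # hypothetical starting-point heuristic
    z = [c + rng.gauss(0.0, noise) for c in anchor]
    for _ in range(max_steps):
        if dist_sq(z) <= RADIUS_SQ:  # inside the positive region: accept
            return z
        g = grad_dist_sq(z)
        z = [c - step * gc for c, gc in zip(z, g)]
    raise RuntimeError("descent did not reach the positive region")


rng = random.Random(0)
points = [sample_by_descent(rng) for _ in range(100)]
```

With a Gaussian kernel the gradient vanishes far from the data, so a starting point close to the training data matters for this sketch; every start here yields a sample, whereas rejection sampling discards the candidates that fail the test.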
Acknowledgement
This work is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its Campus for Research Excellence and Technological Enterprise (CREATE) programme.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Yunus, F., Dandekar, A., Bressan, S. (2017). Data Driven Generation of Synthetic Data with Support Vector Data Description. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10439. Springer, Cham. https://doi.org/10.1007/978-3-319-64471-4_23
DOI: https://doi.org/10.1007/978-3-319-64471-4_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64470-7
Online ISBN: 978-3-319-64471-4
eBook Packages: Computer Science, Computer Science (R0)