Data Driven Generation of Synthetic Data with Support Vector Data Description

  • Conference paper
Database and Expert Systems Applications (DEXA 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10439)

Abstract

We propose a method for generating synthetic data with Support Vector Data Description (SVDD), a variant of the Support Vector Machine for one-class classification problems. Our method assumes that the observed data are a sample of a random variable that satisfies an unknown membership decision function. This function is learned by SVDD from the training data.
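
For orientation, the learning step can be written in the standard SVDD form. The abstract does not state the formulation used in the paper, so the symbols below (centre a, radius R, slack variables \xi_i, trade-off constant C, feature map \phi) are assumptions taken from the usual textbook presentation rather than from the paper itself.

```latex
% Standard soft-margin SVDD (assumed notation, not taken from the paper)
\begin{aligned}
\min_{R,\,a,\,\xi}\quad & R^{2} + C \sum_{i=1}^{n} \xi_{i} \\
\text{subject to}\quad & \lVert \phi(x_{i}) - a \rVert^{2} \le R^{2} + \xi_{i},
\qquad \xi_{i} \ge 0, \qquad i = 1, \dots, n .
\end{aligned}

% Membership decision function: a point z is accepted when f(z) >= 0
f(z) = R^{2} - \lVert \phi(z) - a \rVert^{2}
```

Under this reading, the "membership decision function" of the abstract corresponds to the test f(z) >= 0.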

Using the learned membership decision function, we perform rejection sampling. First, we generate a random data point. Second, we test the data point against the membership decision function. Finally, if the data point fails the test, we repeat from the first step.
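
The three steps of the rejection sampling loop can be sketched as follows. This is a minimal illustration, not the authors' implementation: `is_member` is a hypothetical stand-in for the learned SVDD decision function (here simply a unit disk), and the uniform proposal over the box [-2, 2]^2 is likewise an assumption.

```python
import random


def is_member(point):
    # Hypothetical stand-in for the learned membership decision function:
    # accept points inside the unit disk centred at the origin.
    x, y = point
    return x * x + y * y <= 1.0


def propose(rng, low=-2.0, high=2.0, dim=2):
    # Step 1: generate a random data point (assumed uniform over a box).
    return tuple(rng.uniform(low, high) for _ in range(dim))


def rejection_sample(rng, accept=is_member, max_tries=10_000):
    for _ in range(max_tries):
        candidate = propose(rng)
        # Step 2: test the point against the membership decision function.
        if accept(candidate):
            return candidate
        # Step 3: the point failed the test, so repeat from step 1.
    raise RuntimeError("no candidate accepted within max_tries")


rng = random.Random(0)
samples = [rejection_sample(rng) for _ in range(100)]
```

The expected number of proposals per accepted point is the inverse of the acceptance rate, which is why the loop becomes slow when the positive region occupies only a small fraction of the proposal box.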

However, in some cases the rejection sampling approach runs slowly, so we also propose a second approach. It uses a heuristic to find a good starting point and then performs gradient descent to gradually move the data point inside the boundary of the positive region while maintaining the randomness of the generated data. This approach runs noticeably faster than rejection sampling in the cases where rejection sampling is slow.
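
The abstract does not specify the heuristic or the objective used for gradient descent, so the following is only a rough sketch under stated assumptions. The positive region is taken to be a ball with hypothetical `CENTER` and `RADIUS`, the starting heuristic is assumed to be "perturb a randomly chosen training point", and a small Gaussian noise term is added at each step to keep the output random. None of these choices come from the paper.

```python
import random

CENTER = (0.0, 0.0)  # assumed centre of the learned description
RADIUS = 1.0         # assumed radius of the positive region
TRAINING = [(0.2, 0.1), (-0.4, 0.3), (0.1, -0.6)]  # toy training data


def violation(p):
    # Squared distance to the centre minus R^2; a value <= 0 means inside.
    return sum((pi - ci) ** 2 for pi, ci in zip(p, CENTER)) - RADIUS ** 2


def gradient(p):
    # Gradient of violation(p) with respect to p.
    return tuple(2.0 * (pi - ci) for pi, ci in zip(p, CENTER))


def start_point(rng, spread=1.0):
    # Hypothetical heuristic: start near a randomly chosen training point.
    base = rng.choice(TRAINING)
    return tuple(b + rng.gauss(0.0, spread) for b in base)


def descend_into_region(rng, step=0.1, noise=0.02, max_iters=1000):
    p = start_point(rng)
    for _ in range(max_iters):
        if violation(p) <= 0.0:
            return p  # the point is now inside the positive region
        g = gradient(p)
        # Gradient step towards the region, plus a little noise.
        p = tuple(pi - step * gi + rng.gauss(0.0, noise)
                  for pi, gi in zip(p, g))
    raise RuntimeError("did not reach the positive region")


rng = random.Random(1)
points = [descend_into_region(rng) for _ in range(50)]
```

Unlike the rejection loop, every starting point here is eventually moved into the region, so the cost per sample does not grow with the inverse of the acceptance rate.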

Acknowledgement

This work is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its Campus for Research Excellence and Technological Enterprise (CREATE) programme.

Author information

Corresponding author

Correspondence to Fajrian Yunus.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Yunus, F., Dandekar, A., Bressan, S. (2017). Data Driven Generation of Synthetic Data with Support Vector Data Description. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10439. Springer, Cham. https://doi.org/10.1007/978-3-319-64471-4_23

  • DOI: https://doi.org/10.1007/978-3-319-64471-4_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64470-7

  • Online ISBN: 978-3-319-64471-4

  • eBook Packages: Computer Science, Computer Science (R0)
