Abstract
An efficient data augmentation algorithm generates samples that improve the accuracy and robustness of trained models. Augmentation with informative samples imparts meaning to the augmented data set. In this paper, we propose CoPASample (Covariance Preserving Algorithm for generating Samples), a data augmentation algorithm that generates samples reflecting the first- and second-order statistics of the data set, thereby augmenting the data set in a manner that preserves its total covariance. To avoid the exponential cost of generating candidate points for augmentation, we formulate an optimisation problem, motivated by the approach used in \(\nu \)-SVR, that iteratively computes a heuristics-based optimal set of points for augmentation in polynomial time. Experimental results on several data sets and comparisons with other data augmentation algorithms validate the potential of our proposed algorithm.
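To make the covariance-preservation property concrete, the following is a minimal illustrative sketch (not the CoPASample algorithm itself, which selects augmentation points via the \(\nu \)-SVR-motivated optimisation described in the paper): reflecting each sample about the data mean is one simple way to augment a data set while leaving its mean and total (population) covariance unchanged, since each reflected point has the same deviation from the mean up to sign.

```python
import numpy as np

def reflect_augment(X):
    """Toy covariance-preserving augmentation (illustrative only):
    append the reflection of each point about the sample mean.
    The augmented set retains the original mean and population
    covariance, because reflected deviations -(x - mu) contribute
    the same outer products (x - mu)(x - mu)^T as the originals."""
    mu = X.mean(axis=0)
    return np.vstack([X, 2.0 * mu - X])

# Demonstration on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X_aug = reflect_augment(X)

# Mean and population covariance are preserved exactly.
assert np.allclose(X.mean(axis=0), X_aug.mean(axis=0))
assert np.allclose(np.cov(X.T, bias=True), np.cov(X_aug.T, bias=True))
```

Reflection doubles the data set without moving its first two moments; CoPASample instead generates informative samples and selects among them, but the invariant being preserved is the same.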
R. Agrawal and P. Kothari contributed equally.
Acknowledgement
The authors would like to thank Dr. Sriparna Bandopadhyay and Dr. Ayon Ganguly (both of the Indian Institute of Technology Guwahati) for their valuable feedback.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Agrawal, R., Kothari, P. (2019). CoPASample: A Heuristics Based Covariance Preserving Data Augmentation. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds.) Machine Learning, Optimization, and Data Science. LOD 2019. Lecture Notes in Computer Science, vol. 11943. Springer, Cham. https://doi.org/10.1007/978-3-030-37599-7_26
DOI: https://doi.org/10.1007/978-3-030-37599-7_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37598-0
Online ISBN: 978-3-030-37599-7