Over-Sampling from an Auxiliary Domain

  • Conference paper
Neural Information Processing (ICONIP 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7663)

Abstract

The exponential growth of data dimensionality presents an obstacle in informatics, as data miners must construct ever-larger training sets to satisfy the sample-size requirements of statistical learning theory. Machine learning models require a minimum number of samples per label to develop a representative hypothesis. To meet these bounds, we developed an algorithm that extracts samples from an auxiliary domain to augment the training set. Our work exploits concepts from the “Transfer Learning” and “Imbalanced Learning” literature to expand the training set and permit standard models to be applied. We present theoretical verification of our method and demonstrate the effectiveness of our framework with experimental results on real-world data.
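
For context on the bounds the abstract invokes: a standard PAC-learning result states that a hypothesis class of VC dimension d can be learned to accuracy ε with confidence 1 − δ from roughly m = O((d + log(1/δ)) / ε²) samples, so a class represented by too few examples cannot support a reliable hypothesis. The sketch below shows, in Python with NumPy and scikit-learn, one plausible way to augment an under-represented class with auxiliary-domain samples. It is an illustrative nearest-neighbor heuristic of our own (the function name and selection rule are assumptions), not the paper's algorithm, which this abstract does not specify.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def oversample_from_auxiliary(X_tgt, y_tgt, X_aux, y_aux, label, k=5, n_add=50):
        """Add the auxiliary-domain samples of `label` that lie closest to the
        target-domain samples of that label (illustrative heuristic only)."""
        X_tgt_cls = X_tgt[y_tgt == label]   # scarce target-class samples
        X_aux_cls = X_aux[y_aux == label]   # candidate auxiliary samples

        # Rank auxiliary candidates by mean distance to their k nearest
        # target-class samples; the closest ones are the safest to borrow.
        nn = NearestNeighbors(n_neighbors=min(k, len(X_tgt_cls))).fit(X_tgt_cls)
        dist, _ = nn.kneighbors(X_aux_cls)
        chosen = X_aux_cls[np.argsort(dist.mean(axis=1))[:n_add]]

        # Append the borrowed samples to the target training set.
        X_new = np.vstack([X_tgt, chosen])
        y_new = np.concatenate([y_tgt, np.full(len(chosen), label)])
        return X_new, y_new

A standard classifier can then be trained on (X_new, y_new) as usual; how many auxiliary samples to admit, and how to weight them relative to genuine target samples, is the kind of question the paper's framework addresses.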


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Al-Stouhi, S., Pandya, A. (2012). Over-Sampling from an Auxiliary Domain. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds) Neural Information Processing. ICONIP 2012. Lecture Notes in Computer Science, vol 7663. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34475-6_69

  • DOI: https://doi.org/10.1007/978-3-642-34475-6_69

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34474-9

  • Online ISBN: 978-3-642-34475-6

  • eBook Packages: Computer Science (R0)
