Over-Sampling from an Auxiliary Domain

  • Conference paper
Neural Information Processing (ICONIP 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7663)

Abstract

The exponential growth of data dimensionality presents an obstacle in informatics, as data miners must construct ever-larger training sets to satisfy the sample-size requirements of statistical learning theory. Machine learning models require a minimum number of samples per label to develop a representative hypothesis. To meet these bounds, we developed an algorithm that extracts samples from an auxiliary domain to augment the training set. Our work exploits concepts from the “Transfer Learning” and “Imbalanced Learning” literature to expand the training set and permit standard models to be applied. We present theoretical verification of our method and demonstrate the effectiveness of our framework with experimental results on real-world data.
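
For context on the bounds the abstract invokes: a standard PAC-learning result states that a hypothesis class of VC dimension d can be learned to accuracy ε with confidence 1 − δ from roughly m = O((d + log(1/δ)) / ε²) samples, so a class represented by too few examples cannot support a reliable hypothesis. The sketch below shows, in Python with NumPy and scikit-learn, one plausible way to augment an under-represented class with auxiliary-domain samples. It is an illustrative nearest-neighbor heuristic of our own (the function name and selection rule are assumptions), not the paper's algorithm, which this abstract does not specify.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def oversample_from_auxiliary(X_tgt, y_tgt, X_aux, y_aux, label, k=5, n_add=50):
        """Add the auxiliary-domain samples of `label` that lie closest to the
        target-domain samples of that label (illustrative heuristic only)."""
        X_tgt_cls = X_tgt[y_tgt == label]   # scarce target-class samples
        X_aux_cls = X_aux[y_aux == label]   # candidate auxiliary samples

        # Rank auxiliary candidates by mean distance to their k nearest
        # target-class samples; the closest ones are the safest to borrow.
        nn = NearestNeighbors(n_neighbors=min(k, len(X_tgt_cls))).fit(X_tgt_cls)
        dist, _ = nn.kneighbors(X_aux_cls)
        chosen = X_aux_cls[np.argsort(dist.mean(axis=1))[:n_add]]

        # Append the borrowed samples to the target training set.
        X_new = np.vstack([X_tgt, chosen])
        y_new = np.concatenate([y_tgt, np.full(len(chosen), label)])
        return X_new, y_new

A standard classifier can then be trained on (X_new, y_new) as usual; how many auxiliary samples to admit, and how to weight them relative to genuine target samples, is the kind of question the paper's framework addresses.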


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Al-Stouhi, S., Pandya, A. (2012). Over-Sampling from an Auxiliary Domain. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds) Neural Information Processing. ICONIP 2012. Lecture Notes in Computer Science, vol 7663. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34475-6_69

  • DOI: https://doi.org/10.1007/978-3-642-34475-6_69

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34474-9

  • Online ISBN: 978-3-642-34475-6

  • eBook Packages: Computer Science (R0)
