Abstract
Ensembles are often capable of greater prediction accuracy than any of their individual members. As a consequence of the diversity between the individual base-learners, an ensemble is less prone to overfitting. On the other hand, in many cases we are dealing with imbalanced data, and a classifier built using all of the data tends to ignore the minority class. As a solution to this problem, we propose to consider a large number of relatively small and balanced subsets, where representatives of both classes are selected randomly. Using different pre-processing techniques combined with available background knowledge, which may involve subjective judgement, we can generate many secondary databases for training. The relevance of these databases may be tested with five-fold cross-validation (CV5). Further, we can use the CV5 results to optimise the blending structure. Note that it is appropriate to use different software for the CV5 evaluation and for the computation of the final solution. Our model was tested online during the International Carvana Data Mining Contest on the Kaggle platform. The contest was highly popular, attracting 582 actively participating teams, and our team was awarded the 2nd prize.
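The core idea of balanced random sets can be sketched as follows: from an imbalanced dataset, repeatedly draw small subsets containing equal numbers of randomly chosen examples from each class, then train one base-learner per subset. This is a minimal illustrative sketch, not the authors' actual implementation; the function name, the subset size, and the toy class ratio are assumptions.

```python
import random

def balanced_random_subset(labels, size_per_class, seed=0):
    """Draw indices for one balanced subset: an equal number of
    randomly sampled examples from each class.
    (Hypothetical helper for illustration only.)"""
    rng = random.Random(seed)
    # Group example indices by their class label.
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    # Sample the same number of indices from every class.
    subset = []
    for idxs in by_class.values():
        subset.extend(rng.sample(idxs, size_per_class))
    rng.shuffle(subset)
    return subset

# Toy imbalanced labels: 90 majority-class (0) and 10 minority-class (1) examples.
labels = [0] * 90 + [1] * 10

# A large number of small, balanced subsets, each with its own random seed;
# in the paper's scheme, each such subset would train one base-learner.
subsets = [balanced_random_subset(labels, 8, seed=s) for s in range(50)]
```

Each subset here holds 8 examples per class, so the minority class carries the same weight as the majority class in every base-learner, while the diversity across subsets comes from the random sampling of the (much larger) majority class.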
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nikulin, V. (2012). On the Homogeneous Ensembling with Balanced Random Sets and Boosting. In: Yao, J., et al. (eds.) Rough Sets and Current Trends in Computing. RSCTC 2012. Lecture Notes in Computer Science, vol. 7413. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32115-3_21
DOI: https://doi.org/10.1007/978-3-642-32115-3_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32114-6
Online ISBN: 978-3-642-32115-3
eBook Packages: Computer Science (R0)