SVM Parameter Tuning with Grid Search and Its Impact on Reduction of Model Over-fitting

  • Conference paper
Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9437)

Abstract

In this paper we describe our submission to the IJCRS’15 Data Mining Competition, which is concerned with predicting dangerous concentrations of methane in the longwalls of a Polish coal mine. We address the challenge of building robust classification models from time series data with support vector machines (SVMs). Moreover, we investigate how tuning SVM parameters with grid search affects classification performance and whether it helps prevent over-fitting. Our results show that proper parameter tuning improves predictive performance and also makes the classification models more stable, even when the test data comes from a different time period and has a different class distribution. With the proposed method we built a classification model that scored even better on the unseen test data than on the training data, which highlights that the model does not over-fit. The submitted solution was about 2% behind the winning solution.
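The abstract names the core technique, an exhaustive grid search over SVM hyperparameters, and judges over-fitting by comparing performance on data from a different time period. The sketch below illustrates that workflow; it is not the authors' code. It assumes Python with scikit-learn and NumPy, an RBF-kernel SVM tuned over `C` and `gamma`, and ROC AUC as the metric. The data generator `make_period_data`, the helper names, the grid ranges, and the two synthetic "periods" with different class balances are all hypothetical stand-ins for the competition's sensor features.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_period_data(n_samples, positive_rate, rng):
    """Hypothetical stand-in for features extracted from one period of sensor data.

    The positive rate is a parameter so that the training and test periods
    can have different class distributions.
    """
    y = (rng.random(n_samples) < positive_rate).astype(int)
    X = rng.normal(size=(n_samples, 5))
    X[:, 0] += 1.5 * y  # informative feature
    X[:, 1] += 0.8 * y  # weakly informative feature
    return X, y


def tune_svm(X_train, y_train, seed=0):
    """Exhaustive grid search over C and gamma of an RBF-kernel SVM."""
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    # Exponentially spaced grid (assumed ranges, not the paper's).
    param_grid = {
        "svc__C": [2.0 ** k for k in range(-3, 8, 2)],
        "svc__gamma": [2.0 ** k for k in range(-9, 2, 2)],
    }
    search = GridSearchCV(
        model,
        param_grid,
        scoring="roc_auc",
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=seed),
        return_train_score=True,  # needed to compare train vs. validation scores
    )
    search.fit(X_train, y_train)
    return search


def overfitting_gap(search):
    """Mean train score minus mean validation score for the selected parameters."""
    i = search.best_index_
    return (search.cv_results_["mean_train_score"][i]
            - search.cv_results_["mean_test_score"][i])


rng = np.random.default_rng(42)
# Training period with rare positives; a later test period with a different balance.
X_train, y_train = make_period_data(400, positive_rate=0.1, rng=rng)
X_test, y_test = make_period_data(400, positive_rate=0.2, rng=rng)

search = tune_svm(X_train, y_train)
default_model = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)

# decision_function yields continuous scores suitable for ROC AUC.
tuned_auc = roc_auc_score(y_test, search.decision_function(X_test))
default_auc = roc_auc_score(y_test, default_model.decision_function(X_test))

print("best parameters:", search.best_params_)
print(f"train-minus-validation AUC gap: {overfitting_gap(search):.3f}")
print(f"test-period AUC, tuned:   {tuned_auc:.3f}")
print(f"test-period AUC, default: {default_auc:.3f}")
```

A small train-minus-validation gap, together with a test-period score close to the cross-validated score, is the kind of evidence the abstract uses to argue that a tuned model does not over-fit. The grid here is deliberately coarse and narrow so the example runs quickly; a real search would typically cover wider exponential ranges for both parameters.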

P. Lameski: This work was partially financed by the Faculty of Computer Science and Engineering at Ss. Cyril and Methodius University, Skopje, Macedonia.

Author information

Corresponding author: Eftim Zdravevski.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Lameski, P., Zdravevski, E., Mingov, R., Kulakov, A. (2015). SVM Parameter Tuning with Grid Search and Its Impact on Reduction of Model Over-fitting. In: Yao, Y., Hu, Q., Yu, H., Grzymala-Busse, J.W. (eds) Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing. Lecture Notes in Computer Science, vol 9437. Springer, Cham. https://doi.org/10.1007/978-3-319-25783-9_41

  • DOI: https://doi.org/10.1007/978-3-319-25783-9_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25782-2

  • Online ISBN: 978-3-319-25783-9

  • eBook Packages: Computer Science, Computer Science (R0)
