A Parallel Implementation of Relief Algorithm Using Mapreduce Paradigm

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9876)

Abstract

Feature selection is an important research topic in machine learning and pattern recognition. In recent years, data sets have grown in both the number of instances and the number of features. The number of features found in Big Data is hard to deal with, and the number that most classification algorithms can process is considerably smaller. It is therefore important to develop techniques for selecting features from very large data sets. However, the efficiency of existing feature selection algorithms degrades significantly, or the algorithms become inapplicable altogether, when the data size exceeds hundreds of gigabytes. Traditional approaches such as filter, wrapper and embedded methods do not scale well enough to handle data sets with millions of instances and to produce useful results within a reasonable time. The main purpose of this paper is therefore to propose a new parallel feature selection framework that enables the use of feature selection methods on large data sets.
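
The abstract does not describe the framework itself, so the following is only a minimal illustrative sketch of how Relief-style feature weighting can be split into a map step and a reduce step. It uses one common formulation of Relief (nearest hit and nearest miss, with range-normalised absolute differences) and assumes that every mapper can see the whole data set and that every class has at least two instances. All function names (`relief_map`, `relief_reduce`, `parallel_relief`) and the chunking scheme are hypothetical and are not taken from the paper.

```python
from functools import reduce


def feature_ranges(X):
    """Per-feature (max - min), used to scale differences into [0, 1]."""
    cols = list(zip(*X))
    return [(max(c) - min(c)) or 1.0 for c in cols]


def diff(a, b, ranges):
    """Per-feature normalised absolute difference between two instances."""
    return [abs(x - y) / r for x, y, r in zip(a, b, ranges)]


def nearest(i, X, y, ranges, same_class):
    """Index of the nearest other instance of the same (or another) class."""
    candidates = [j for j in range(len(X))
                  if j != i and (y[j] == y[i]) == same_class]
    return min(candidates, key=lambda j: sum(diff(X[i], X[j], ranges)))


def relief_map(chunk, X, y, ranges):
    """Map step: partial weight vector for one chunk of instance indices."""
    partial = [0.0] * len(X[0])
    for i in chunk:
        hit = nearest(i, X, y, ranges, same_class=True)
        miss = nearest(i, X, y, ranges, same_class=False)
        d_hit = diff(X[i], X[hit], ranges)
        d_miss = diff(X[i], X[miss], ranges)
        # Penalise features that differ from the hit, reward those that
        # differ from the miss.
        partial = [w - h + m for w, h, m in zip(partial, d_hit, d_miss)]
    return partial


def relief_reduce(a, b):
    """Reduce step: element-wise sum of two partial weight vectors."""
    return [u + v for u, v in zip(a, b)]


def parallel_relief(X, y, n_chunks=2):
    """Assumed driver: here every instance is used, split into n_chunks."""
    ranges = feature_ranges(X)
    indices = list(range(len(X)))
    chunks = [indices[k::n_chunks] for k in range(n_chunks)]
    partials = map(lambda c: relief_map(c, X, y, ranges), chunks)
    total = reduce(relief_reduce, partials)
    return [w / len(indices) for w in total]


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = parallel_relief(X, y, n_chunks=2)
```

In this sketch the built-in `map` runs sequentially; on a cluster the same two functions would correspond to the mapper and reducer of a MapReduce job. Because the reduce step is a plain sum, the final weights do not depend on how the instances are chunked, apart from floating-point rounding.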

Author information

Corresponding author

Correspondence to Jamila Yazidi.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Yazidi, J., Bouaguel, W., Essoussi, N. (2016). A Parallel Implementation of Relief Algorithm Using Mapreduce Paradigm. In: Nguyen, N., Iliadis, L., Manolopoulos, Y., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2016. Lecture Notes in Computer Science, vol 9876. Springer, Cham. https://doi.org/10.1007/978-3-319-45246-3_40

  • DOI: https://doi.org/10.1007/978-3-319-45246-3_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45245-6

  • Online ISBN: 978-3-319-45246-3

  • eBook Packages: Computer Science, Computer Science (R0)
