Online Multi-label Feature Selection on Imbalanced Data Sets

  • Conference paper
Wireless Sensor Networks (CWSN 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 812)

Abstract

Feature selection is an important step in data processing. Performing feature selection for a multi-label classification problem in an online learning fashion is known as online multi-label feature selection. Online feature selection suits practical situations in which the data are not available in advance, the data set is very large, or fast running speed is required. We propose an online multi-label feature selection algorithm that divides the data set into multiple single-label data sets, conducts feature selection on each of them, and then chooses the final features from among those selected for the single labels. Because many data sets are imbalanced, we apply the basic idea of cost-sensitive learning to counter the imbalance. Experimental results corroborate the performance of our algorithm on various data sets and demonstrate that it can effectively improve online classification performance on imbalanced data sets.
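The three stages named in the abstract (split by label, select per label, combine) can be illustrated with a minimal sketch. The abstract does not specify the learner, the cost scheme, or the combination rule, so everything concrete below is an assumption made for illustration: a truncated perceptron as the per-label online selector, hypothetical `pos_cost` and `neg_cost` parameters that scale updates on misclassified positive and negative examples, and a simple vote count to pick the final features. It is not the authors' implementation.

```python
from collections import Counter


def truncate(w, k):
    """Keep the k largest-magnitude weights and zero out the rest."""
    order = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    keep = set(order[:k])
    return [wj if j in keep else 0.0 for j, wj in enumerate(w)]


def online_select_one_label(stream, n_features, k, eta=0.2,
                            pos_cost=1.0, neg_cost=1.0):
    """Single pass of a cost-sensitive truncated perceptron for one label.

    `stream` yields (x, y) pairs with y in {-1, +1}. `pos_cost` and
    `neg_cost` are hypothetical knobs, not taken from the paper.
    """
    w = [0.0] * n_features
    for x, y in stream:
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        if margin <= 0:  # mistake: cost-weighted update, then truncate
            cost = pos_cost if y > 0 else neg_cost
            w = [wj + eta * cost * y * xj for wj, xj in zip(w, x)]
            w = truncate(w, k)
    return {j for j, wj in enumerate(w) if wj != 0.0}


def online_multilabel_select(X, Y, k_per_label, k_final, pos_cost=5.0):
    """Split into single-label problems, select per label, then vote."""
    n_features, n_labels = len(X[0]), len(Y[0])
    votes = Counter()
    for label in range(n_labels):
        # Single-label view of the data: +1 if the label is present.
        stream = ((x, 1 if y[label] else -1) for x, y in zip(X, Y))
        votes.update(online_select_one_label(
            stream, n_features, k_per_label, pos_cost=pos_cost))
    # Assumed combination rule: most-voted features first, ties by index.
    ranked = sorted(votes, key=lambda j: (-votes[j], j))
    return sorted(ranked[:k_final])


# Toy data: feature 0 tracks label 0, feature 2 tracks label 1.
X = [[1.0, 0.1, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.1],
     [1.0, 0.0, 1.0, 0.0],
     [0.0, 0.1, 0.0, 0.1]]
Y = [[1, 0], [0, 1], [1, 1], [0, 0]]

selected = online_multilabel_select(X, Y, k_per_label=1, k_final=2)
print(selected)
```

Weighting mistakes on the positive class more heavily is one common way to apply cost-sensitive learning when positives are rare; the value `pos_cost=5.0` here is arbitrary and would normally be tuned or derived from the class ratio.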


Acknowledgments

This work is supported by the National Natural Science Foundation of China (NSFC) under grant numbers 61379127 and 61379128.

Corresponding author

Correspondence to Zhongwen Guo.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Liu, J., Guo, Z., Sun, Z., Liu, S., Wang, X. (2018). Online Multi-label Feature Selection on Imbalanced Data Sets. In: Li, J., et al. Wireless Sensor Networks. CWSN 2017. Communications in Computer and Information Science, vol 812. Springer, Singapore. https://doi.org/10.1007/978-981-10-8123-1_15

  • DOI: https://doi.org/10.1007/978-981-10-8123-1_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8122-4

  • Online ISBN: 978-981-10-8123-1

  • eBook Packages: Computer Science, Computer Science (R0)
