Supervised Selection of Dynamic Features, with an Application to Telecommunication Data Preparation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4065)

Abstract

In data mining, data preparation is increasingly becoming a bottleneck: collecting and storing data keeps getting cheaper, while modelling costs remain unchanged. As a result, feature selection is now routinely performed. In the data preparation step, selection often relies on feature ranking. In supervised classification, the ranking is based on the information that each explanatory feature provides about the categorical target attribute.

With the increasing presence in databases of features measured over time, i.e. dynamic features, new supervised ranking methods have to be designed. In this paper, we propose a new method for evaluating dynamic features, derived from a probabilistic criterion. The criterion is non-parametric and automatically handles the problem of overfitting the data. The resulting evaluation is reliable. Furthermore, the design of the criterion relies on a simple and understandable approach, which makes it possible to provide a meaningful visualization of the evaluation in addition to the computed score. The advantages of the new method are illustrated on a telecommunication dataset.
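
As a rough illustration of the supervised ranking idea mentioned in the abstract (scoring each explanatory feature by the information it brings about the categorical target), the sketch below ranks static categorical features by their empirical mutual information with the target. This is not the probabilistic criterion proposed in the paper: plain mutual information is a stand-in chosen for illustration, it does not guard against overfitting, and it does not handle dynamic features. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature, target):
    """Empirical mutual information (in bits) between two categorical sequences."""
    n = len(feature)
    joint = Counter(zip(feature, target))  # counts of (x, y) pairs
    px = Counter(feature)                  # marginal counts of x
    py = Counter(target)                   # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(columns, target):
    """Return (name, score) pairs sorted by decreasing mutual information."""
    scores = {name: mutual_information(values, target)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: four customers and a binary target.
target = ["churn", "churn", "stay", "stay"]
columns = {
    "plan": ["a", "a", "b", "b"],    # determines the target exactly
    "region": ["n", "s", "n", "s"],  # empirically independent of the target
}
ranking = rank_features(columns, target)
```

On this toy data, `plan` is ranked first and `region` last, since `region` is empirically independent of the target. On real data with many feature values and few instances, raw mutual information tends to overrate features, which is the overfitting problem that the abstract says the proposed criterion handles automatically.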

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ferrandiz, S., Boullé, M. (2006). Supervised Selection of Dynamic Features, with an Application to Telecommunication Data Preparation. In: Perner, P. (eds) Advances in Data Mining. Applications in Medicine, Web Mining, Marketing, Image and Signal Mining. ICDM 2006. Lecture Notes in Computer Science, vol 4065. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11790853_19

  • DOI: https://doi.org/10.1007/11790853_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-36036-0

  • Online ISBN: 978-3-540-36037-7

  • eBook Packages: Computer Science, Computer Science (R0)
