
Discovering Highly Informative Feature Sets from Data Streams

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6261)

Abstract

Selecting interesting feature sets from data streams is a new and important research topic that poses three major challenges. First, rather than evaluating features individually and independently, we aim to select a subset of features whose joint importance, or weight, is the highest. Second, the selection must operate over large, dynamic, online data streams that are only partly available at the time of selection. This distinguishes the problem from feature selection on static data, which is completely available before selection begins. Finally, data streams may evolve over time, so an online feature selection technique must capture and adapt to such changes. We introduce the problem of online feature selection over data streams and provide a heuristic solution. We also demonstrate the effectiveness and efficiency of our method through experiments on real-world mobile web usage data.
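To make the problem setting concrete, the following is a minimal sketch and not the method of the paper, whose heuristic is not described in the abstract. It assumes that joint importance is measured by empirical joint entropy, that the stream is handled with a fixed-size sliding window, and that the feature set is chosen by greedy forward selection. The names `joint_entropy`, `greedy_feature_set`, and `SlidingWindowSelector` are hypothetical.

```python
import math
from collections import Counter, deque


def joint_entropy(rows, features):
    """Empirical joint entropy (in bits) of the given feature columns.

    Assumption: joint entropy stands in for the 'joint importance'
    mentioned in the abstract; the paper's actual measure may differ.
    """
    n = len(rows)
    if n == 0:
        return 0.0
    counts = Counter(tuple(row[f] for f in features) for row in rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def greedy_feature_set(rows, num_features, k):
    """Pick up to k features by greedy forward selection (a heuristic).

    At each step, add the feature that maximizes the joint entropy of
    the set chosen so far. Ties go to the lowest feature index.
    """
    chosen = []
    for _ in range(min(k, num_features)):
        remaining = [f for f in range(num_features) if f not in chosen]
        best = max(remaining, key=lambda f: joint_entropy(rows, chosen + [f]))
        chosen.append(best)
    return chosen


class SlidingWindowSelector:
    """Hypothetical online selector: keeps only the most recent rows."""

    def __init__(self, num_features, k, window_size):
        self.num_features = num_features
        self.k = k
        # Old rows fall out automatically once the window is full.
        self.window = deque(maxlen=window_size)

    def update(self, row):
        """Consume one stream record and return the current feature set."""
        self.window.append(tuple(row))
        return greedy_feature_set(list(self.window), self.num_features, self.k)


if __name__ == "__main__":
    selector = SlidingWindowSelector(num_features=3, k=2, window_size=4)
    # Phase 1: features 0 and 1 are identical, feature 2 is independent.
    for row in [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]:
        selected = selector.update(row)
    print(selected)
    # Phase 2: the stream drifts; feature 0 becomes constant.
    for row in [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]:
        selected = selector.update(row)
    print(selected)
```

The sketch touches each of the three challenges in a simplified way: features are scored jointly, so a redundant copy of an already chosen feature adds nothing; only the window of recent records is ever available; and the selection is recomputed as the window contents change. Recomputing from scratch on every record is chosen here for clarity rather than efficiency.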




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, C., Masseglia, F. (2010). Discovering Highly Informative Feature Sets from Data Streams. In: Bringas, P.G., Hameurlain, A., Quirchmayr, G. (eds) Database and Expert Systems Applications. DEXA 2010. Lecture Notes in Computer Science, vol 6261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15364-8_7


  • DOI: https://doi.org/10.1007/978-3-642-15364-8_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15363-1

  • Online ISBN: 978-3-642-15364-8

  • eBook Packages: Computer Science, Computer Science (R0)
