Discovering Highly Informative Feature Sets from Data Streams

Zhang, Chongsheng; Masseglia, Florent

doi:10.1007/978-3-642-15364-8_7

Chongsheng Zhang & Florent Masseglia

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6261)

Abstract

Selecting interesting feature sets from data streams is a new and important research topic that poses three major challenges. First, rather than evaluating features individually and independently, we aim to select a subset of features whose joint importance, or weight, is the highest. Second, the selection must operate over large, dynamic, online data streams that are only partly available at selection time; this distinguishes the problem from feature selection on static data, which is completely available beforehand. Finally, data streams may evolve over time, which calls for an online feature selection technique that can capture and adapt to such changes. We introduce the problem of online feature selection over data streams and provide a heuristic solution. We also demonstrate the effectiveness and efficiency of our method through experiments on real-world mobile web usage data.
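
To make the problem statement concrete, the following is a minimal sketch of selecting a jointly informative feature subset over a stream. It is not the heuristic of the paper, which the abstract does not describe. It assumes that the "joint importance" of a feature set is measured by its empirical joint entropy, that the stream is handled with a fixed-size sliding window, and that the subset is chosen by greedy forward selection. The names `joint_entropy`, `greedy_informative_set` and `SlidingWindowSelector` are hypothetical.

```python
from collections import Counter, deque
from math import log2


def joint_entropy(rows, features):
    """Empirical joint entropy (in bits) of the given feature columns.

    Assumption: joint entropy stands in for the 'joint importance' of a set.
    """
    counts = Counter(tuple(row[f] for f in features) for row in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def greedy_informative_set(rows, k):
    """Greedily pick up to k feature indices; each step adds the feature
    that yields the highest joint entropy together with those already chosen."""
    if not rows:
        return []
    remaining = list(range(len(rows[0])))
    selected = []
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: joint_entropy(rows, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


class SlidingWindowSelector:
    """Keeps only the most recent rows (assumed sliding-window model) and
    recomputes the selection on each arrival, so it can follow changes."""

    def __init__(self, window_size, k):
        self.window = deque(maxlen=window_size)  # old rows drop out automatically
        self.k = k

    def update(self, row):
        self.window.append(tuple(row))
        return greedy_informative_set(list(self.window), self.k)


# Features 0 and 1 are copies of each other; feature 2 is independent of both.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
selector = SlidingWindowSelector(window_size=4, k=2)
for r in rows:
    chosen = selector.update(r)
```

In this toy window each feature alone carries one bit, but features 0 and 1 are redundant, so the pair (0, 2) is jointly more informative than the pair (0, 1). This is the difference between scoring features individually and scoring them as a set. Recomputing the selection from scratch on every arrival is the simplest possible choice and would be too slow for a real stream.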

Author information

Authors and Affiliations

AxIS team, INRIA, 2004 Route des lucioles, 06902, Sophia-Antipolis, France
Chongsheng Zhang & Florent Masseglia

Editor information

Editors and Affiliations

DeustoTech Computing, University of Deusto, Avda. Universidades, 24, 48007, Bilbao, Spain
Pablo García Bringas
Institut de Recherche en Informatique de Toulouse (IRIT), Paul Sabatier University, 118, route de Narbonne, 31062, Toulouse Cedex, France
Abdelkader Hameurlain
Faculty of Computer Science, Department of Distributed Systems and Multimedia Systems, University of Vienna, Liebiggasse 4/3-4, 1010, Vienna, Austria
Gerald Quirchmayr

Cite this paper

Zhang, C., Masseglia, F. (2010). Discovering Highly Informative Feature Sets from Data Streams. In: Bringas, P.G., Hameurlain, A., Quirchmayr, G. (eds) Database and Expert Systems Applications. DEXA 2010. Lecture Notes in Computer Science, vol 6261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15364-8_7

DOI: https://doi.org/10.1007/978-3-642-15364-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15363-1
Online ISBN: 978-3-642-15364-8
eBook Packages: Computer Science (R0)
