Abstract
Selecting interesting feature sets from data streams is a new and important research topic that poses three major challenges. First, rather than discovering features individually and independently, we want to select a subset of features whose joint importance, or weight, is the highest. Second, we select feature sets over dynamic, large, online data streams that are only partly available at selection time; this distinguishes the problem from feature selection on static data, which is completely available beforehand. Finally, data streams may evolve over time, so an online feature selection technique must capture and adapt to such changes. We introduce the problem of online feature selection over data streams and provide a heuristic solution. We also demonstrate the effectiveness and efficiency of our method through experiments on real-world mobile web usage data.
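To make "joint importance" concrete, here is a minimal sketch, not the authors' heuristic. It assumes that the weight of a feature set is its empirical joint entropy (the abstract does not name the measure), that features are discrete, and that the stream is handled with a fixed-size sliding window and a plain greedy forward selection. The names `joint_entropy`, `greedy_select`, and `SlidingWindowSelector` are hypothetical.

```python
from collections import Counter, deque
from math import log2


def joint_entropy(rows, features):
    """Empirical joint entropy (in bits) of the given feature columns.

    Assumption: each row is a dict mapping feature name -> discrete value.
    """
    counts = Counter(tuple(row[f] for f in features) for row in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def greedy_select(rows, all_features, k):
    """Greedy forward selection: repeatedly add the feature that maximises
    the joint entropy of the selected set. This is a stand-in heuristic,
    not the method proposed in the paper."""
    selected = []
    for _ in range(min(k, len(all_features))):
        best = max(
            (f for f in all_features if f not in selected),
            key=lambda f: joint_entropy(rows, selected + [f]),
        )
        selected.append(best)
    return selected


class SlidingWindowSelector:
    """Re-selects features over the most recent `window_size` rows, so the
    result can follow changes in the stream (assumed windowing scheme)."""

    def __init__(self, features, k, window_size):
        self.features = list(features)
        self.k = k
        self.window = deque(maxlen=window_size)  # old rows drop out automatically

    def update(self, row):
        self.window.append(row)
        return greedy_select(list(self.window), self.features, self.k)


if __name__ == "__main__":
    stream = [
        {"a": 0, "b": 0, "c": 0},
        {"a": 0, "b": 1, "c": 0},
        {"a": 1, "b": 0, "c": 0},
        {"a": 1, "b": 1, "c": 0},
    ]
    selector = SlidingWindowSelector(["a", "b", "c"], k=2, window_size=4)
    for row in stream:
        chosen = selector.update(row)
    print(chosen)
```

The point of scoring the set jointly is that two features can each look informative on their own yet be redundant together; evaluating `selected + [f]` as a whole penalises that redundancy, which per-feature ranking cannot do. Recomputing the selection on every arrival is simple but costly, so a practical version would likely update counts incrementally.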
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, C., Masseglia, F. (2010). Discovering Highly Informative Feature Sets from Data Streams. In: Bringas, P.G., Hameurlain, A., Quirchmayr, G. (eds) Database and Expert Systems Applications. DEXA 2010. Lecture Notes in Computer Science, vol 6261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15364-8_7
DOI: https://doi.org/10.1007/978-3-642-15364-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15363-1
Online ISBN: 978-3-642-15364-8
eBook Packages: Computer Science; Computer Science (R0)