Abstract
Machine learning studies automatic methods for acquiring domain knowledge, with the goal of improving system performance as a result of experience. In the past two decades, machine learning research and practice have focused on batch learning, usually with small data sets. The rationale behind this practice is that examples are generated at random according to some stationary probability distribution. Most learners use a greedy, hill-climbing search in the space of models. They are prone to overfitting, local maxima, etc. Data are scarce and statistical estimates have high variance. A paradigmatic example is the TDIDT (top-down induction of decision trees) algorithm for learning decision trees [14]. As the tree grows, fewer and fewer examples are available to compute the sufficient statistics, and variance increases, leading to model instability. Moreover, the growing process re-uses the same data, exacerbating the overfitting problem. Regularization and pruning mechanisms are mandatory.
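The effect of tree growth on the variance of node statistics can be illustrated with a small sketch. It is not taken from the chapter: it assumes perfectly balanced binary splits and a single class proportion as the sufficient statistic, and the function names `examples_at_depth` and `proportion_std_error` are hypothetical, chosen only for this illustration.

```python
import math


def examples_at_depth(n_root: int, depth: int) -> int:
    """Examples reaching one node at `depth`.

    Assumption: every split is binary and perfectly balanced,
    so each level halves the number of examples per node.
    """
    return n_root // (2 ** depth)


def proportion_std_error(p: float, n: int) -> float:
    """Standard error of a class proportion p estimated from n examples."""
    return math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    n_root = 1024  # illustrative training-set size, not from the source
    p = 0.5        # worst-case class proportion for the variance
    for depth in range(0, 9, 2):
        n = examples_at_depth(n_root, depth)
        se = proportion_std_error(p, n)
        print(f"depth={depth} examples={n} std_error={se:.4f}")
```

Under these assumptions the standard error doubles every two levels, which is one way to see why estimates deep in the tree are unstable and why pruning is needed in the batch setting.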
References
Bifet, A., Gavaldà, R.: Mining adaptively frequent closed unlabeled rooted trees in data streams. In: Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining, Las Vegas, USA, pp. 34–42 (2008)
Bifet, A., Gavaldà, R.: Adaptive XML tree classification on evolving data streams. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds.) ECML PKDD 2009, Part I. LNCS, vol. 5781, pp. 147–162. Springer, Heidelberg (2009)
Cauwenberghs, G., Poggio, T.: Incremental and decremental support vector machine learning. In: Proceedings of the Neural Information Processing Systems (2000)
Chen, R., Sivakumar, K., Kargupta, H.: Collective mining of Bayesian networks from heterogeneous data. Knowledge and Information Systems Journal 6(2), 164–187 (2004)
Gaber, M., Yu, P.S.: A framework for resource-aware knowledge discovery in data streams: a holistic approach with its application to clustering. In: ACM Symposium Applied Computing, pp. 649–656. ACM Press (2006)
Medhat, M., Gaber, M., Krishnaswamy, S., Zaslavsky, A.: Cost-efficient mining techniques for data streams. In: Proceedings of the Second Workshop on Australasian Information Security, pp. 109–114. Australian Computer Society, Inc. (2004)
Gama, J.: Knowledge Discovery from Data Streams. Data Mining and Knowledge Discovery. Chapman & Hall CRC Press, Atlanta (2010)
Gama, J., Sebastião, R., Rodrigues, P.P.: Issues in evaluation of stream learning algorithms. In: KDD, pp. 329–338 (2009)
Hulten, G., Domingos, P.: Catching up with the data: research issues in mining data streams. In: Proc. of Workshop on Research Issues in Data Mining and Knowledge Discovery, Santa Barbara, USA (2001)
Kargupta, H., Joshi, A., Sivakumar, K., Yesha, Y.: Data Mining: Next Generation Challenges and Future Directions. AAAI Press and MIT Press (2004)
Kargupta, H., Park, B.H.: Mining decision trees from data streams in a mobile environment. In: IEEE International Conference on Data Mining, pp. 281–288. IEEE Computer Society, San Jose (2001)
Kargupta, H., Park, B.H., Dutta, H.: Orthogonal decision trees. IEEE Transactions on Knowledge and Data Engineering 18, 1028–1042 (2006)
Kifer, D., Ben-David, S., Gehrke, J.: Detecting change in data streams. In: Proceedings of the International Conference on Very Large Data Bases, pp. 180–191. Morgan Kaufmann, Toronto (2004)
Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, Inc., San Mateo (1993)
Sharfman, I., Schuster, A., Keren, D.: A geometric approach to monitoring threshold functions over distributed data streams. ACM Transactions Database Systems 32(4), 301–312 (2007)
Wald, A.: Sequential Analysis. John Wiley and Sons, Inc. (1947)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Gama, J. (2015). Challenges in Learning from Streaming Data Extended Abstract. In: Bogdanova, A., Gjorgjevikj, D. (eds) ICT Innovations 2014. ICT Innovations 2014. Advances in Intelligent Systems and Computing, vol 311. Springer, Cham. https://doi.org/10.1007/978-3-319-09879-1_1
DOI: https://doi.org/10.1007/978-3-319-09879-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09878-4
Online ISBN: 978-3-319-09879-1
eBook Packages: Engineering, Engineering (R0)