Abstract
Dynamic feature spaces arise when different records or instances in a database are described by different features. This contrasts with the usual (static) feature spaces of standard databases, where the schema is known and fixed, so all records share the same set of variables, attributes, or features. Dynamic feature mining algorithms aim to extract knowledge from data in dynamic feature spaces. As an example, spam detection methods have been developed from a dynamic feature space perspective: words are taken as features, so new words appearing in new emails are considered new features. In this case, spam detection is formulated as a classification problem (a supervised machine learning problem).
The relevance of dynamic feature spaces is increasing. The large amounts of data currently available to or received by systems are not necessarily described using the same feature space. This is the case for distributed databases with data about customers, providers, etc. Industry 4.0, the Internet of Things, and RFID are, and will continue to be, sources of data in dynamic feature spaces. New sensors added to an industrial environment, new devices connected to a smart home, and new types of analyses and sensors in healthcare are all examples of dynamic feature spaces. Machine learning algorithms are needed to deal with these types of scenarios.
In this paper we motivate the interest in dynamic feature mining, give some examples of scenarios where these techniques are needed, review some of the existing solutions and their relationship with other areas of machine learning and data mining (e.g., incremental learning, concept drift, topic modeling), discuss some open problems, and discuss synthetic data generation for this type of problem.
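To make the spam detection example concrete, the following is a minimal sketch of a classifier whose feature space grows as new words arrive. It is an illustration only, not the method of any work cited here: the class name `GrowingVocabularyNB`, its methods, and the toy emails are all hypothetical, and a multinomial naive Bayes model with Laplace smoothing is assumed purely because it is simple to update incrementally.

```python
import math
from collections import Counter, defaultdict


class GrowingVocabularyNB:
    """Hypothetical multinomial naive Bayes whose vocabulary grows over time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.vocabulary = set()                  # features (words) seen so far
        self.class_counts = Counter()            # number of emails per class
        self.word_counts = defaultdict(Counter)  # per-class word frequencies
        self.total_words = Counter()             # per-class token totals

    def update(self, text, label):
        """Learn from one labelled email; unseen words become new features."""
        words = text.lower().split()
        new_features = set(words) - self.vocabulary
        self.vocabulary |= new_features
        self.class_counts[label] += 1
        self.word_counts[label].update(words)
        self.total_words[label] += len(words)
        return new_features

    def predict(self, text):
        """Return the most probable class; words outside the vocabulary are ignored."""
        words = [w for w in text.lower().split() if w in self.vocabulary]
        n_emails = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_emails)  # log prior
            denom = self.total_words[label] + self.alpha * vocab_size
            for w in words:
                score += math.log((self.word_counts[label][w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy usage: the third email introduces words that were not features before.
model = GrowingVocabularyNB()
model.update("cheap pills buy now", "spam")
model.update("meeting agenda for monday", "ham")
added = model.update("win crypto now", "spam")
print(sorted(added))                    # the newly created features
print(model.predict("buy crypto now"))
```

Because the smoothing denominator depends on the current vocabulary size, every word probability shifts slightly whenever a new feature is added. This is one small illustration of why a changing feature space interacts with the learned model rather than simply extending it.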
Acknowledgements
This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Kaya, S.K., Navarro-Arribas, G., Torra, V. (2020). Dynamic Features Spaces and Machine Learning: Open Problems and Synthetic Data Sets. In: Huynh, VN., Entani, T., Jeenanunta, C., Inuiguchi, M., Yenradee, P. (eds) Integrated Uncertainty in Knowledge Modelling and Decision Making. IUKM 2020. Lecture Notes in Computer Science, vol 12482. Springer, Cham. https://doi.org/10.1007/978-3-030-62509-2_11
DOI: https://doi.org/10.1007/978-3-030-62509-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62508-5
Online ISBN: 978-3-030-62509-2
eBook Packages: Computer Science, Computer Science (R0)