Dynamic Features Spaces and Machine Learning: Open Problems and Synthetic Data Sets

Kaya, Sema Kayapinar; Navarro-Arribas, Guillermo; Torra, Vicenç

doi:10.1007/978-3-030-62509-2_11

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12482)

Included in the following conference series:

International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making

Abstract

Dynamic feature spaces appear when different records or instances in a database are defined in terms of different features. This contrasts with the usual (static) feature spaces of standard databases, where the schema is known and fixed, so that all records share the same set of variables, attributes, or features. Dynamic feature mining algorithms aim to extract knowledge from data in dynamic feature spaces. For example, spam detection methods have been developed from a dynamic feature space perspective: words are taken as features, so new words appearing in new emails are considered new features. In this case, spam detection is framed as a classification problem (a supervised machine learning problem).
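To make the spam example concrete, here is a minimal sketch, assuming emails arrive already tokenized into words and using incremental multinomial naive Bayes as a stand-in learner. The classifier choice, the class name `DynamicVocabNB`, and its methods are hypothetical illustrations, not the method of the paper or of any work it reviews. The vocabulary plays the role of the feature space: any word not seen before becomes a new feature when a labelled email is learned, and smoothing over the current vocabulary size lets the model score emails without a fixed schema.

```python
import math
from collections import defaultdict


class DynamicVocabNB:
    """Incremental multinomial naive Bayes over a growing vocabulary.

    Hypothetical illustration, not the authors' method: every word seen in a
    labelled email becomes a feature, so the feature space expands over time.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                                        # Laplace smoothing
        self.vocab = set()                                        # current feature space
        self.doc_counts = defaultdict(int)                        # class -> number of emails
        self.word_counts = defaultdict(lambda: defaultdict(int))  # class -> word -> count
        self.total_words = defaultdict(int)                       # class -> total tokens

    def learn_one(self, words, label):
        """Update the model with one labelled email given as a list of tokens."""
        self.doc_counts[label] += 1
        for w in words:
            self.vocab.add(w)            # an unseen word becomes a new feature
            self.word_counts[label][w] += 1
            self.total_words[label] += 1

    def predict_one(self, words):
        """Return the class with the highest log-posterior for the email."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, n in self.doc_counts.items():
            score = math.log(n / n_docs)                       # log prior
            denom = self.total_words[label] + self.alpha * v   # smoothed over current vocabulary
            for w in words:
                if w in self.vocab:      # words never learned carry no evidence; skip them
                    score += math.log((self.word_counts[label][w] + self.alpha) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


clf = DynamicVocabNB()
clf.learn_one("win cash now".split(), "spam")
clf.learn_one("meeting agenda attached".split(), "ham")
print(len(clf.vocab))                          # 6
clf.learn_one("crypto giveaway win".split(), "spam")
print(len(clf.vocab))                          # 8: 'crypto' and 'giveaway' are new features
print(clf.predict_one("free crypto".split()))  # spam
```

Words that no training email has contained yet (such as `free` in the usage lines) are skipped at prediction time because no class has evidence for them; they only enter the feature space once a labelled email containing them is learned.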

The relevance of dynamic feature spaces is increasing. The large amounts of data currently available to, or received by, systems are not necessarily described using the same feature spaces. This is the case for distributed databases holding data about customers, providers, etc. Industry 4.0, the Internet of Things, and RFID are, and will continue to be, sources of data in dynamic feature spaces. New sensors added to an industrial environment, new devices connected to a smart home, and new types of analyses and sensors in healthcare are all examples of dynamic feature spaces. Machine learning algorithms are needed to deal with these types of scenarios.
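The case of records coming from sources with different feature sets can be sketched in a similar way. The snippet below is a minimal illustration, assuming records arrive as Python dictionaries keyed by feature name (for instance, sensor identifiers); the class `GrowingSchema`, its methods, and the smart-home readings are hypothetical. Features are only ever appended, so vectors produced under an older schema remain valid and can be padded with NaN once new features appear.

```python
import math


class GrowingSchema:
    """Hypothetical helper: maps records with different feature sets onto one
    append-only feature space, using NaN for features a record does not report."""

    def __init__(self):
        self.features = []   # feature names in order of first appearance
        self._index = {}     # feature name -> column position

    def observe(self, record):
        """Register any feature in `record` that has not been seen before."""
        for name in record:
            if name not in self._index:
                self._index[name] = len(self.features)
                self.features.append(name)

    def to_vector(self, record):
        """Project a record onto the feature space as it currently stands."""
        self.observe(record)
        return [record.get(name, math.nan) for name in self.features]

    def pad(self, vector):
        """Extend a vector built under an older, smaller schema with NaN."""
        return vector + [math.nan] * (len(self.features) - len(vector))


# Hypothetical smart-home readings: a humidity sensor is installed later on.
stream = [
    {"temp": 21.5, "motion": 0},
    {"temp": 22.0, "motion": 1},
    {"temp": 21.8, "motion": 0, "humidity": 40.0},
]
schema = GrowingSchema()
rows = [schema.to_vector(r) for r in stream]
print([len(v) for v in rows])    # [2, 2, 3]
rows = [schema.pad(v) for v in rows]
print(schema.features)           # ['temp', 'motion', 'humidity']
print(rows[0])                   # [21.5, 0, nan]
```

NaN is used here only as a simple "feature absent for this record" marker; a learner for dynamic feature spaces still has to decide how to treat such gaps.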

In this paper we motivate interest in dynamic feature mining, give some examples of scenarios where these techniques are needed, review some of the existing solutions and their relationship with other areas of machine learning and data mining (e.g., incremental learning, concept drift, topic modeling), discuss some open problems, and address synthetic data generation for this type of problem.
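As a starting point for synthetic data, the following is a minimal sketch of a generator for a stream whose feature space evolves, assuming a simple birth/death process on features and a random linear concept for the labels. The function `dynamic_feature_stream`, its parameters, and the labelling rule are hypothetical choices for illustration, not the generator proposed in the paper. Because removing a feature also removes its term from the concept, the stream exhibits a simple form of concept drift alongside the changing feature space.

```python
import random


def dynamic_feature_stream(n_records, p_birth=0.05, p_death=0.02,
                           initial_features=3, seed=0):
    """Yield (record, label) pairs whose feature set changes over time.

    Hypothetical generator: at each step a new feature may appear with
    probability p_birth and an existing one may disappear with probability
    p_death. The label is 1 when a random linear concept over the features
    the record carries is positive, else 0.
    """
    rng = random.Random(seed)
    weights = {}                 # active feature name -> concept weight
    next_id = 0

    def add_feature():
        nonlocal next_id
        weights[f"f{next_id}"] = rng.uniform(-1, 1)
        next_id += 1

    for _ in range(initial_features):
        add_feature()

    for _ in range(n_records):
        if rng.random() < p_birth:
            add_feature()                                # feature space grows
        if len(weights) > 1 and rng.random() < p_death:
            del weights[rng.choice(sorted(weights))]     # feature space shrinks
        record = {name: rng.gauss(0, 1) for name in weights}
        score = sum(weights[name] * value for name, value in record.items())
        yield record, int(score > 0)


stream = list(dynamic_feature_stream(500, seed=42))
seen = set().union(*(record.keys() for record, _ in stream))
print(len(seen), "distinct features appeared over the stream")
print(sorted(stream[0][0]), sorted(stream[-1][0]))
```

A removed feature never reappears in this sketch, since new features always receive fresh names; generating recurring features or concepts would require reusing retired names and weights.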

Acknowledgements

This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.

Author information

Authors and Affiliations

Department of Industrial Engineering, Munzur University, Tunceli, Turkey
Sema Kayapinar Kaya
Department of Information and Communications Engineering, CYBERCAT-Center for Cybersecurity Research of Catalonia, Universitat Autònoma de Barcelona, Bellaterra, Spain
Guillermo Navarro-Arribas
Department of Computing Science, Umeå University, Umeå, Sweden
Vicenç Torra

Corresponding author

Correspondence to Vicenç Torra.

Editor information

Editors and Affiliations

Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Van-Nam Huynh
University of Hyogo, Kobe, Japan
Tomoe Entani
Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand
Chawalit Jeenanunta
Graduate School of Engineering Science, Osaka University, Toyonaka, Osaka, Japan
Masahiro Inuiguchi
Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand
Pisal Yenradee

About this paper

Cite this paper

Kaya, S.K., Navarro-Arribas, G., Torra, V. (2020). Dynamic Features Spaces and Machine Learning: Open Problems and Synthetic Data Sets. In: Huynh, VN., Entani, T., Jeenanunta, C., Inuiguchi, M., Yenradee, P. (eds) Integrated Uncertainty in Knowledge Modelling and Decision Making. IUKM 2020. Lecture Notes in Computer Science, vol 12482. Springer, Cham. https://doi.org/10.1007/978-3-030-62509-2_11

DOI: https://doi.org/10.1007/978-3-030-62509-2_11
Published: 02 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62508-5
Online ISBN: 978-3-030-62509-2
eBook Packages: Computer Science, Computer Science (R0)
