Dynamic Features Spaces and Machine Learning: Open Problems and Synthetic Data Sets

  • Conference paper
Integrated Uncertainty in Knowledge Modelling and Decision Making (IUKM 2020)

Abstract

Dynamic feature spaces arise when different records or instances in a database are defined in terms of different features. This contrasts with the usual (static) feature spaces of standard databases, where the schema is known and fixed, so that all records have the same set of variables, attributes, or features. Dynamic feature mining algorithms aim to extract knowledge from data in dynamic feature spaces. As an example, spam detection methods have been developed from a dynamic feature space perspective: words are taken as features, and new words appearing in new emails are therefore treated as new features. In this case, spam detection is formulated as a classification problem (a supervised machine learning problem).
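The spam example can be made concrete with a small sketch. The code below is not taken from the paper; it assumes a multinomial naive Bayes classifier with Laplace smoothing, and the class name `DynamicNaiveBayes` and its methods are hypothetical. It only illustrates the idea that the feature set (the vocabulary) grows as new words arrive, without retraining from scratch.

```python
import math
from collections import Counter, defaultdict


class DynamicNaiveBayes:
    """Illustrative multinomial naive Bayes whose vocabulary grows over time.

    Hypothetical sketch: names and design are assumptions, not the paper's method.
    """

    def __init__(self):
        self.class_counts = Counter()            # emails seen per class
        self.word_counts = defaultdict(Counter)  # word_counts[label][word]
        self.total_words = Counter()             # word tokens seen per class
        self.vocabulary = set()                  # features observed so far

    def update(self, words, label):
        """Incorporate one labelled email; unseen words become new features."""
        self.class_counts[label] += 1
        for w in words:
            self.word_counts[label][w] += 1
            self.total_words[label] += 1
            self.vocabulary.add(w)

    def predict(self, words):
        """Return the most probable label under Laplace-smoothed naive Bayes."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)
            for w in words:
                if w not in self.vocabulary:
                    continue  # a word never seen in training carries no evidence
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = DynamicNaiveBayes()
model.update(["cheap", "pills", "offer"], "spam")
model.update(["meeting", "agenda", "monday"], "ham")
# A later email introduces the new features "crypto" and "bonus".
model.update(["crypto", "offer", "bonus"], "spam")
print(len(model.vocabulary), model.predict(["crypto", "bonus"]))
```

Storing counts in dictionaries keyed by word, rather than in fixed-length vectors, is what lets the feature space change without reshaping the model; other representations are equally possible.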

The relevance of dynamic feature spaces is increasing. The large amounts of data currently available to or received by systems are not necessarily described using the same feature space. This is the case for distributed databases with data about customers, providers, etc. Industry 4.0, the Internet of Things, and RFID are, and will continue to be, sources of data in dynamic feature spaces. New sensors added to an industrial environment, new devices connected to a smart home, and new types of analyses and sensors in healthcare are all examples of dynamic feature spaces. Machine learning algorithms are needed to deal with these types of scenarios.

In this paper we motivate the interest in dynamic feature mining, give some examples of scenarios where these techniques are needed, review some of the existing solutions and their relationship with other areas of machine learning and data mining (e.g., incremental learning, concept drift, topic modeling), discuss some open problems, and discuss synthetic data generation for this type of problem.
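To give a rough idea of what synthetic data in a dynamic feature space can look like, the following sketch generates a stream of records in which new features are introduced over time and each record carries only a subset of the features available so far. It is not the generator discussed in the paper; the function `generate_stream` and all of its parameters are assumptions made for illustration.

```python
import random


def generate_stream(n_records, initial_features=3, new_feature_every=5,
                    presence_prob=0.7, seed=0):
    """Yield (t, record) pairs; record maps feature names to Gaussian values.

    Hypothetical sketch: a new feature becomes available every
    `new_feature_every` records, and each available feature is present
    in a given record with probability `presence_prob`.
    """
    rng = random.Random(seed)
    active = [f"f{i}" for i in range(initial_features)]
    for t in range(n_records):
        if t > 0 and t % new_feature_every == 0:
            active.append(f"f{len(active)}")  # e.g. a newly installed sensor
        record = {name: rng.gauss(0.0, 1.0)
                  for name in active if rng.random() < presence_prob}
        yield t, record


for t, record in generate_stream(12):
    print(t, sorted(record))
```

Records are plain dictionaries, so two records need not share the same keys; a fixed seed keeps the stream reproducible for comparing algorithms.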



Acknowledgements

This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.

Author information


Corresponding author

Correspondence to Vicenç Torra.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Kaya, S.K., Navarro-Arribas, G., Torra, V. (2020). Dynamic Features Spaces and Machine Learning: Open Problems and Synthetic Data Sets. In: Huynh, VN., Entani, T., Jeenanunta, C., Inuiguchi, M., Yenradee, P. (eds) Integrated Uncertainty in Knowledge Modelling and Decision Making. IUKM 2020. Lecture Notes in Computer Science, vol 12482. Springer, Cham. https://doi.org/10.1007/978-3-030-62509-2_11


  • DOI: https://doi.org/10.1007/978-3-030-62509-2_11


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62508-5

  • Online ISBN: 978-3-030-62509-2

  • eBook Packages: Computer Science, Computer Science (R0)
