ABSTRACT
The plethora of available analytic platforms makes it increasingly difficult to select the most appropriate platform for a given data mining task and for datasets with varying characteristics. Novice analysts in particular struggle to keep up with the latest technical developments. In this demo, we present the ASAP-DM framework, which automatically selects a well-performing analytic platform for a given data mining task via an intuitive web interface and thereby supports novice analysts in particular. Demo attendees take away: (1) a good understanding of the challenges posed by various data mining workloads and dataset characteristics, and of their effects on the selection of analytic platforms, (2) insights into how ASAP-DM works internally, and (3) an understanding of how to benefit from ASAP-DM for exploratory data analysis.