Abstract
Nowadays, the amount of data being collected and stored has been constantly increasing. Data come from different sources such as various devices, sensors, networks, transactional applications, web and social media. Conventional technologies and methods are not able to store and analyze such amount of data. In this paper, a comparative analysis of the existing data mining systems is performed and it shows that the most of existing data mining solutions are not appropriate to solve Big Data problems. In order to bring conventional data mining to a new level and to cope with challenges of massive and complex data of different nature, requirements for data mining systems suitable for Big Data are derived.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Barker, A., Van Hemert, J.I.: Scientific workflow: A survey and research directions. PPAM 4967, 746–753 (2008). doi:10.1007/978-3-540-68111-3_78
Bhagat, J., Tanoh, F., Nzuobontane, E., Laurent, T., Orlowski, J., Roos, M., Wolstencroft, K., Aleksejevs, S., Stevens, R., Pettifer, S., Lopez, R., Goble, C.A.: Biocatalogue: A universal catalogue of web services for the life sciences. Nucleic Acid Res. 38 (2010). doi:10.1093/nar/gkq394
Birant, D.: Service-oriented data mining (2011). doi:10.5772/14066
Cerezo, N., Montagnat, J., Blay-Fornarino, M.: Computer-assisted scientific workflow design. J. Grid Comput. 11(3), 585–612 (2013). doi:10.1007/s10723-013-9264-5
Che, D., Safran, M., Peng, Z.: From big data to big data mining: challenges, issues, and opportunities. In: Database Systems for Advanced Applications, Lecture Notes in Computer Science, pp. 1–15. Springer (2013). doi:10.1007/978-3-642-40270-8
Chen, X., Ye, Y., Williams, G., Xu, X.: A survey of open source data mining systems. Emerg. Technol. Knowl. Discov. Data Min. 4819, 3–14 (2007). doi:10.1007/978-3-540-77018-3_2
Congiusta, A., Talia, D., Trunfio, P.: Service-oriented middleware for distributed data mining on the grid. J. Parallel Distrib. Comput. 68, 3–15 (2008). doi:10.1016/j.jpdc.2007.07.007
Demšar, J., Curk, T., Erjavec, A., Črt Gorup, Hočevar, T., Milutinovič, M., Možina, M., Polajnar, M., Toplak, M., Starič, A., Štajdohar, M., Umek, L., Žagar, L., Žbontar, J., Žitnik, M., Zupan, B.: Orange: Data mining toolbox in Python. J. Mach. Learn. Res. 14, 2349–2353 (2013). http://jmlr.org/papers/v14/demsar13a.html
De Roure, D., Goble, C., Stevens, R.: The design and realisation of the virtual research environment for social sharing of workflows (2009). doi:10.1016/j.future.2008.06.010
Domenico, T., Paolo, T.: Service-oriented distributed knowledge discovery. Chapman and Hall/CRC (2012)
Foster, I.: Globus toolkit version 4: Software for service-oriented systems. Netw. Parallel Comput. 3779, 2–13 (2005). doi:10.1007/11577188_2
Goecks, J., Nekrutenko, A., Taylor, J.: Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11, R86 (2010). doi:10.1186/gb-2010-11-8-r86
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: An update. SIGKDD Explor. Newsl. 11(1), 10–18 (2009). doi:10.1145/1656274.1656278
Heather, K.: Web services conceptual architecture (wsca 1.0). Architecture 5, 6–7 (2001)
Hmida, M.B.H., Slimani, Y.: Meta-learning in grid-based data mining systems. Int. J. Commun. Networks Distrib. Syst. 5(3), 214–228 (2010). 10.5121/ijcnc.2010.2514
Japkowicz, N., Stefanowski, J.: A machine learning perspective on big data analysis. In: Big Data Analysis: New Algorithms for a New Society, pp. 1–31. Springer (2016). doi:10.1007/978-3-319-26989-4
Jovic, A., Brkic, K., Bogunovic, N.: An overview of free software tools for general data mining. In: 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2014) 11(3), 1112–1117 (2014). doi:10.1109/MIPRO.2014.6859735
Kranjc, J., Podpecan, V., Lavrac, N.: Clowdflows: A cloud based scientific workflow platform. In: Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science, vol. 7524, pp. 816–819. Springer, Berlin, Heidelberg (2012). doi:10.1007/978-3-642-33486-3
Kranjc, J., Smailovič, J., Podpečan, V., Grčar, M., Žnidaršič, M., Lavrač, N.: Active learning for sentiment analysis on data streams: Methodology and workflow implementation in the clowdflows platform. Inf. Process. Manage. 51(2), 187–203 (2014). doi:10.1016/j.ipm.2014.04.001
Kravtsov, V., Niessen, T., Stankovski, V., Schuster, A.: Service-based resource brokering for grid-based data mining. In: in: Proceedings of the International Conference on Grid Computing and Applications, pp. 163–169 (2006)
Kurasova, O., Marcinkevičius, V., Medvedev, V., Rapečka, A., Stefanovič, P.: Strategies for big data clustering. In: 26th International Conference on Tools with Artificial Intelligence (ICTAI2014), pp. 740–747. IEEE (2014). doi:10.1109/ICTAI.2014.115
Massimo, B., Giuseppe, L., Castellani, M., Cavuoti, S., D’Abrusco, R., Laurino, O.: Dame: A distributed web based framework for knowledge discovery in databases. Metnorie della Soc. Astron. Ital. Suppl. 19, 324–329 (2012)
Meinl, T., Cebron, N., Gabriel, T.R., Dill, F., Kötter, T.: The konstanz information miner 2, (2009). doi:10.1145/1656274.1656280
Missier, P., Soiland-Reyes, S., Owen, S., Tan, W., Nenadic, A., Dunlop, I., Williams, A., Oinn, T., Goble, C.: Taverna, reloaded. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6187 LNCS, pp. 471–481 (2010). doi:10.1007/978-3-642-13818-8
Pattnaik, K., Mishra, B.S.P.: Introduction to big data analysis. In: Techniques and Environments for Big Data Analysis, pp. 1–20. Springer (2016). doi:10.1007/978-3-319-27520-8
Podpečan, V., Zemenova, M., Lavrač, N.: Orange4ws environment for service-oriented data mining. Comput. J. 55, 82–98 (2012). doi:10.1093/comjnl/bxr077
Schmidt, S.: Data is exploding: the 3 versus of big data. Bus. Comput. World 15 (2012)
Stankovski, V., Swain, M., Kravtsov, V., Niessen, T., Wegener, D., Kindermann, J., Dubitzky, W.: Grid-enabling data mining applications with DataMiningGrid: An architectural perspective. Future Gener. Comput. Syst. 24, 259–279 (2008). doi:10.1016/j.future.2007.05.004
Talia, D., Trunfio, P.: How distributed data mining tasks can thrive as knowledge services. Commun. ACM 53, 132–137 (2010). doi:10.1145/1785414.1785451
Talia, D., Trunfio, P., Verta, O.: The weka4ws framework for distributed data mining in service-oriented grids. Concurrency Comput. Pract. Experience 20, 1933–1951 (2008). doi:10.1002/cpe.v20:16
Werner, D.: Data Mining Meets Grid Computing: Time to Dance? John Wiley and Sons. Ltd (2009). doi:10.1002/9780470699904.ch1
White, T.: Hadoop: The definitive guide, vol. 54. O’Reilly Media (2012)
Wojnarski, M., Stawicki, S., Wojnarowski, P.: Tunedit.org: System for automated evaluation of algorithms in repeatable experiments. Rough Sets Current Trends Comput. 6086, 20–29 (2010). doi:10.1007/978-3-642-13529-3_4
Wolstencroft, K., Haines, R., Fellows, D., Williams, A., Withers, D., Owen, S., Soiland-Reyes, S., Dunlop, I., Nenadic, A., Fisher, P., Bhagat, J., Belhajjame, K., Bacall, F., Hardisty, A., Nieva de la Hidalga, A., Balcazar Vargas, M.P., Sufi, S., Goble, C.: The Taverna workflow suite: designing and executing workflows of web services on the desktop, web or in the cloud. Nucleic Acids Res. 41(W1), W557–W561 (2013). doi:10.1093/nar/gkt328
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing AG
About this chapter
Cite this chapter
Medvedev, V., Kurasova, O. (2016). Cloud Technologies: A New Level for Big Data Mining. In: Pop, F., Kołodziej, J., Di Martino, B. (eds) Resource Management for Big Data Platforms. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-44881-7_3
Download citation
DOI: https://doi.org/10.1007/978-3-319-44881-7_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44880-0
Online ISBN: 978-3-319-44881-7
eBook Packages: Computer ScienceComputer Science (R0)