Abstract
In data science, development focuses on improving methods for individual data analytical tasks, while the combination of these methods has not been properly researched. We believe this is caused by the lack of a framework that focuses solely on the data analytical tasks themselves rather than on complicated transformations between individual methods. In this paper, we define a new analytical algebra based on a flat transaction file structure and operations over it. We also propose definitions of several data analytical tasks within the algebra. The algebra is recursive and extendable. To demonstrate its usability, we describe one complex analytical task built by combining analytical operators.
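The idea of operators that consume and produce the same flat structure, so that they can be chained freely, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's formal definitions: the representation of a transaction file as a flat list of (transaction id, item) records, and the hypothetical operators `frequent_items`, `merge_identical` and `compose`.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Assumed representation (not taken from the paper): a transaction file is a
# flat list of (transaction_id, item) records.
Record = Tuple[str, str]
TransactionFile = List[Record]
Operator = Callable[[TransactionFile], TransactionFile]


def frequent_items(min_support: int) -> Operator:
    """Hypothetical operator: keep records whose item occurs in at least
    min_support distinct transactions."""
    def apply(tf: TransactionFile) -> TransactionFile:
        # Deduplicate records so each transaction counts an item once.
        support = Counter(item for _, item in set(tf))
        return [(tid, item) for tid, item in tf if support[item] >= min_support]
    return apply


def merge_identical() -> Operator:
    """Hypothetical operator: relabel every transaction by its itemset, so
    transactions with identical items collapse into one."""
    def apply(tf: TransactionFile) -> TransactionFile:
        items = {}
        for tid, item in tf:
            items.setdefault(tid, set()).add(item)
        label = {tid: "+".join(sorted(s)) for tid, s in items.items()}
        return sorted({(label[tid], item) for tid, item in tf})
    return apply


def compose(*ops: Operator) -> Operator:
    """Chain operators left to right. The result is itself an operator,
    which is what makes arbitrarily nested combinations possible."""
    def apply(tf: TransactionFile) -> TransactionFile:
        for op in ops:
            tf = op(tf)
        return tf
    return apply


data = [("t1", "a"), ("t1", "b"), ("t2", "a"),
        ("t3", "a"), ("t3", "c"), ("t2", "b")]
pipeline = compose(frequent_items(2), merge_identical())
result = pipeline(data)
```

Because every operator in this sketch maps a transaction file to a transaction file, no conversion step is needed between the two tasks, and `pipeline` can itself be passed to `compose` again.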
This research has been supported by the GACR project No. GA19-02033S.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Peschel, J., Batko, M., Zezula, P. (2020). Algebra for Complex Analysis of Data. In: Hartmann, S., Küng, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2020. Lecture Notes in Computer Science, vol 12391. Springer, Cham. https://doi.org/10.1007/978-3-030-59003-1_12
DOI: https://doi.org/10.1007/978-3-030-59003-1_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59002-4
Online ISBN: 978-3-030-59003-1
eBook Packages: Computer Science (R0)