Algebra for Complex Analysis of Data

Peschel, Jakub; Batko, Michal; Zezula, Pavel

doi:10.1007/978-3-030-59003-1_12

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12391)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

In data science, development focuses on improving methods for individual data analytical tasks, while the combination of these methods has not been properly researched. We believe this situation is caused by the lack of a framework that focuses solely on data analytical tasks rather than on complicated transformations between individual methods. In this paper, we define a new analytical algebra based on a flat transaction file structure and operations over it. As part of the paper, we propose definitions of several data analytical tasks. The algebra is recursive and extendable. To demonstrate its usability, we describe one complex analytical task created by combining analytical operators.
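The idea of operators over a flat transaction file can be pictured with a small Python sketch. It is not the algebra defined in the paper: the names `select`, `frequent_itemsets`, and `compose`, the representation of a transaction file as a list of item sets, and the reading of "recursive" as "every operator returns another transaction file" are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, FrozenSet, List

# Assumed representation: a transaction file is a flat list of item sets.
Transaction = FrozenSet[str]
TransactionFile = List[Transaction]
Operator = Callable[[TransactionFile], TransactionFile]


def select(predicate: Callable[[Transaction], bool]) -> Operator:
    """Hypothetical operator: keep only transactions satisfying the predicate."""
    def op(tf: TransactionFile) -> TransactionFile:
        return [t for t in tf if predicate(t)]
    return op


def frequent_itemsets(min_support: int, size: int) -> Operator:
    """Hypothetical operator: brute-force frequent itemsets of a fixed size.

    Each frequent itemset becomes one transaction of the output file, so the
    result can be fed into further operators.
    """
    def op(tf: TransactionFile) -> TransactionFile:
        counts: Counter = Counter()
        for t in tf:
            for combo in combinations(sorted(t), size):
                counts[frozenset(combo)] += 1
        return [s for s, c in counts.items() if c >= min_support]
    return op


def compose(*ops: Operator) -> Operator:
    """Chain operators left to right into one complex analytical task."""
    def op(tf: TransactionFile) -> TransactionFile:
        for o in ops:
            tf = o(tf)
        return tf
    return op


# Made-up example data, not taken from the paper.
baskets: TransactionFile = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "eggs"}),
    frozenset({"milk", "eggs"}),
    frozenset({"bread", "eggs", "milk"}),
]

pipeline = compose(
    select(lambda t: "milk" in t),
    frequent_itemsets(min_support=3, size=2),
)
result = pipeline(baskets)
```

Because every operator in this sketch maps a transaction file to a transaction file, `pipeline` is itself an operator and could be passed to `compose` again, which is one way to read the closure property the abstract describes.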

This research has been supported by the GACR project No. GA19-02033S.

Author information

Authors and Affiliations

Masaryk University, Brno, Czech Republic
Jakub Peschel, Michal Batko & Pavel Zezula

Corresponding author

Correspondence to Jakub Peschel.

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Johannes Kepler University of Linz, Linz, Austria
Josef Küng
Johannes Kepler University of Linz, Linz, Austria
Gabriele Kotsis
IFS, Vienna University of Technology, Vienna, Wien, Austria
A Min Tjoa
Johannes Kepler University of Linz, Linz, Austria
Ismail Khalil

About this paper

Cite this paper

Peschel, J., Batko, M., Zezula, P. (2020). Algebra for Complex Analysis of Data. In: Hartmann, S., Küng, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2020. Lecture Notes in Computer Science, vol 12391. Springer, Cham. https://doi.org/10.1007/978-3-030-59003-1_12

DOI: https://doi.org/10.1007/978-3-030-59003-1_12
Published: 14 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59002-4
Online ISBN: 978-3-030-59003-1
eBook Packages: Computer Science, Computer Science (R0)
