Queries for Data Analysis

Siebes, Arno

doi:10.1007/978-3-642-34156-4_3

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7619))

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

If we view data as a set of queries with an answer, what would a model be? In this paper we explore this question. The motivation is that more and more kinds of data have to be analysed, data of such a diverse nature that it is not easy to define precisely what data analysis actually is. Since all these different types of data share one characteristic – they can be queried – it seems natural to base a notion of data analysis on this characteristic.

The discussion in this paper is preliminary at best. There is no attempt made to connect the basic ideas to other – well known – foundations of data analysis. Rather, it just explores some simple consequences of its central tenet: data is a set of queries with their answer.

Author information

Authors and Affiliations

Algorithmic Data Analysis Group, Universiteit Utrecht, The Netherlands
Arno Siebes

Authors

Arno Siebes

Editor information

Editors and Affiliations

Department of Information and Computer Science, Aalto University School of Science, P.O. Box 15400, 00076, Aalto, Finland
Jaakko Hollmén
Department of Computer Science, Ostfalia University of Applied Sciences, Salzdahlumer Straße 46/48, 38302, Wolfenbüttel, Germany
Frank Klawonn
School of Information Systems, Computing and Mathematics, Brunel University, UB8 3PH, Uxbridge, Middlesex, UK
Allan Tucker

About this paper

Cite this paper

Siebes, A. (2012). Queries for Data Analysis. In: Hollmén, J., Klawonn, F., Tucker, A. (eds) Advances in Intelligent Data Analysis XI. IDA 2012. Lecture Notes in Computer Science, vol 7619. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34156-4_3

DOI: https://doi.org/10.1007/978-3-642-34156-4_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34155-7
Online ISBN: 978-3-642-34156-4
eBook Packages: Computer Science, Computer Science (R0)
