Abstract
If we view data as a set of queries with their answers, what would a model be? In this paper we explore this question. The motivation is that an ever-growing variety of data has to be analysed, data so diverse in nature that it is not easy to define precisely what data analysis actually is. Since all these different types of data share one characteristic, namely that they can be queried, it seems natural to base a notion of data analysis on this characteristic.
The discussion in this paper is preliminary at best. No attempt is made to connect the basic ideas to other, well-known foundations of data analysis. Rather, the paper simply explores some consequences of its central tenet: data is a set of queries with their answers.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Siebes, A. (2012). Queries for Data Analysis. In: Hollmén, J., Klawonn, F., Tucker, A. (eds) Advances in Intelligent Data Analysis XI. IDA 2012. Lecture Notes in Computer Science, vol 7619. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34156-4_3
DOI: https://doi.org/10.1007/978-3-642-34156-4_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34155-7
Online ISBN: 978-3-642-34156-4
eBook Packages: Computer Science, Computer Science (R0)