Abstract
The paper presents minimum variance patterns, a new class of itemsets and rules for numerical data that capture arbitrary continuous relationships between numerical attributes without requiring discretization. The approach is based on finding polynomials over sets of attributes whose variance in a given dataset is close to zero; sets of attributes for which such polynomials exist are considered interesting. Two types of rules are further introduced to help extract understandable relationships from such itemsets. Efficient algorithms for mining minimum variance patterns are presented and verified experimentally.
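The abstract describes the core computation only in words. The sketch below shows one standard way to make it concrete, under assumptions that are ours rather than confirmed by the paper: a polynomial of degree at most d over the chosen attributes is written as f(x) = Σ c_i m_i(x), where the m_i are the non-constant monomials; the trivial solution c = 0 is excluded by requiring ‖c‖ = 1; and since Var(f) = cᵀΣc, with Σ the covariance matrix of the monomials, the minimum variance is the smallest eigenvalue of Σ and the minimizing coefficients form the corresponding eigenvector. Eigenvalues come from a plain cyclic Jacobi iteration to keep the example dependency-free. All function names are hypothetical, and `low_variance_sets` is a naive exhaustive scan, not the efficient mining algorithms the paper refers to.

```python
import itertools
import math


def monomial_exponents(num_vars, degree):
    """Exponent tuples of all monomials of total degree 1..degree.

    The constant monomial is omitted: adding a constant does not change variance.
    """
    exps = []
    for d in range(1, degree + 1):
        for combo in itertools.combinations_with_replacement(range(num_vars), d):
            e = [0] * num_vars
            for i in combo:
                e[i] += 1
            exps.append(tuple(e))
    return exps


def covariance(rows):
    """Population covariance matrix (dividing by N) of a list of feature rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
             for j in range(k)] for i in range(k)]


def jacobi_eigen(a, tol=1e-18, max_sweeps=100):
    """Cyclic Jacobi method for a symmetric matrix; column j of v pairs with eigenvalue j."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the rotated (p, q) entry is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def min_variance_polynomial(data, degree=2):
    """Smallest variance of a unit-norm polynomial over the columns of data.

    Returns (variance, coefficients, exponents); coefficients[i] multiplies
    the monomial whose exponent tuple is exponents[i].
    """
    exps = monomial_exponents(len(data[0]), degree)
    feats = [[math.prod(x ** e for x, e in zip(row, ex)) for ex in exps] for row in data]
    eigvals, eigvecs = jacobi_eigen(covariance(feats))
    i = min(range(len(eigvals)), key=eigvals.__getitem__)
    return eigvals[i], [eigvecs[k][i] for k in range(len(exps))], exps


def low_variance_sets(columns, degree=2, max_var=1e-8, max_size=2):
    """Hypothetical naive scan over attribute sets of size 2..max_size."""
    found = []
    for size in range(2, max_size + 1):
        for subset in itertools.combinations(list(columns), size):
            data = list(zip(*(columns[name] for name in subset)))
            var, _, _ = min_variance_polynomial(data, degree)
            if var <= max_var:
                found.append((subset, var))
    return sorted(found, key=lambda item: item[1])
```

For attributes x and y = x² sampled at x = 0, 0.3, ..., 2.7, the smallest eigenvalue is zero up to rounding, and its eigenvector puts weights of about ±0.707 on y and x² with zeros elsewhere, i.e. the polynomial is proportional to y − x². Because every monomial over an attribute set is also a monomial over any superset, the minimum variance can only shrink as attributes are added, so running the naive scan with a larger `max_size` would also report the supersets of every low-variance set.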
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jaroszewicz, S. (2008). Minimum Variance Associations — Discovering Relationships in Numerical Data. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol. 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_17
DOI: https://doi.org/10.1007/978-3-540-68125-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0
eBook Packages: Computer Science, Computer Science (R0)