Abstract
Consider a scenario where one aims to learn models from data characterized by very large fluctuations that are attributable neither to noise nor to outliers. This may be the case, for instance, when predicting the potential future damage of earthquakes or oil spills, or when conducting financial data analysis. It follows that, in such a situation, the standard central limit theorem does not apply, since the associated Gaussian distribution exponentially suppresses large fluctuations. In this paper, we present an analysis of data aggregation and correlation in such scenarios. To this end, we introduce the Lévy, or stable, distribution, which is a generalization of the Gaussian distribution. Our theoretical conclusions are illustrated with various simulations, as well as with a benchmark financial database. We show which specific strategies should be adopted for aggregation, depending on the stability exponent of the Lévy distribution. First, our results indicate that the correlation between two attributes may be underestimated if a Gaussian distribution is erroneously assumed. Secondly, we show that, in the scenario where we aim to learn a set of rules to estimate the level of stability of a stock market, the Lévy distribution produces superior results. Thirdly, we illustrate that, in a multi-relational database mining setting, aggregation using average values may be highly unsuitable.
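The claim that averaging can be an unsuitable aggregate for heavy-tailed data can be illustrated with a small simulation. The following is a minimal sketch and not the authors' experimental code: it assumes symmetric stable variables drawn with the Chambers-Mallows-Stuck method, and the function names `sample_symmetric_stable` and `aggregate_spread`, the sample sizes, and the seed are all hypothetical choices made for illustration. It compares how tightly the mean and the median of 1000 values concentrate across repeated trials for several stability exponents.

```python
import numpy as np


def sample_symmetric_stable(alpha, size, rng):
    """Draw symmetric alpha-stable samples (Chambers-Mallows-Stuck method).

    alpha = 2 gives a Gaussian with variance 2; alpha = 1 gives a Cauchy.
    """
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)  # random angle
    w = rng.exponential(1.0, size)                # unit-mean exponential
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def iqr(values):
    """Interquartile range, a spread measure that exists for any alpha."""
    q75, q25 = np.percentile(values, [75, 25])
    return q75 - q25


def aggregate_spread(alpha, n_values=1000, n_trials=200, seed=0):
    """Spread of the mean and of the median over repeated aggregations.

    Each trial aggregates n_values samples; the returned pair is the
    interquartile range of the trial means and of the trial medians.
    """
    rng = np.random.default_rng(seed)
    x = sample_symmetric_stable(alpha, (n_trials, n_values), rng)
    return iqr(x.mean(axis=1)), iqr(np.median(x, axis=1))


if __name__ == "__main__":
    for alpha in (2.0, 1.5, 1.0):
        mean_spread, median_spread = aggregate_spread(alpha)
        print(f"alpha={alpha}: spread of mean={mean_spread:.3f}, "
              f"spread of median={median_spread:.3f}")
```

Under these assumptions, the mean concentrates tightly in the Gaussian case (alpha = 2) but not at all in the Cauchy case (alpha = 1), where the average of many values is as widely dispersed as a single value, while the median remains stable in both cases.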
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Paquet, E., Viktor, H.L., Guo, H. (2013). Learning in the Presence of Large Fluctuations: A Study of Aggregation and Correlation. In: Appice, A., Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z.W. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2012. Lecture Notes in Computer Science, vol 7765. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37382-4_4
DOI: https://doi.org/10.1007/978-3-642-37382-4_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37381-7
Online ISBN: 978-3-642-37382-4
eBook Packages: Computer Science, Computer Science (R0)