Abstract
Data reduction is a technique used in big data applications, where the volume, velocity, and variety of data introduce time and space complexity problems into computation. Although there are several approaches to data reduction, dimension reduction and redundancy removal are among the most common. In these approaches, data are treated as points in a large space. This paper considers the scenario of analyzing a topic for which similar multi-dimensional data are available from different sources; the problem can then be stated as data reduction by source selection. The paper examines distance correlation (DC) as a technique for identifying similar data sources. For demonstration, COVID-19 in the United States of America (US) is taken as the topic of analysis because it is of considerable interest, and the data reported by the individual US states are treated as the data sources. We define and use a variation of concordance for validation analysis.
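The abstract does not say how DC is computed. The sketch below is a minimal, hedged illustration of the standard sample distance correlation (the V-statistic form of Székely, Rizzo and Bakirov). It makes three assumptions: distances are Euclidean, each data source is a sequence of paired observations (scalars or equal-length tuples), and the O(n²) direct formula is acceptable. It is not the author's implementation. The function names `distance_correlation`, `_double_centered` and `_as_points` are hypothetical helpers introduced only for illustration. The sketch double-centres each pairwise distance matrix, then forms dCor² = dCov² / √(dVar²(x)·dVar²(y)), which lies in [0, 1] and equals 0 for a constant sample.

```python
import math
from typing import List, Sequence, Tuple, Union

Point = Union[float, Sequence[float]]


def _as_points(sample: Sequence[Point]) -> List[Tuple[float, ...]]:
    """Hypothetical helper: treat scalars as 1-D points so math.dist applies uniformly."""
    return [tuple(float(v) for v in p) if isinstance(p, (list, tuple)) else (float(p),)
            for p in sample]


def _double_centered(sample: Sequence[Point]) -> List[List[float]]:
    """Pairwise Euclidean distances a_jk, double-centred:
    A_jk = a_jk - mean(row j) - mean(column k) + grand mean."""
    pts = _as_points(sample)
    n = len(pts)
    d = [[math.dist(pts[j], pts[k]) for k in range(n)] for j in range(n)]
    row_means = [sum(row) / n for row in d]  # d is symmetric, so column means are equal
    grand = sum(row_means) / n
    return [[d[j][k] - row_means[j] - row_means[k] + grand for k in range(n)]
            for j in range(n)]


def distance_correlation(x: Sequence[Point], y: Sequence[Point]) -> float:
    """Sample distance correlation dCor_n(x, y) in [0, 1]; O(n^2) time and memory."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    a = _double_centered(x)
    b = _double_centered(y)
    # Squared sample distance covariance and variances (V-statistics).
    dcov2 = sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n**2
    dvar2_x = sum(v * v for row in a for v in row) / n**2
    dvar2_y = sum(v * v for row in b for v in row) / n**2
    if dvar2_x == 0.0 or dvar2_y == 0.0:
        return 0.0  # a constant sample: dCor is taken to be 0
    # dCor^2 = dCov^2 / sqrt(dVar^2(x) * dVar^2(y)); clamp tiny negative rounding error.
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar2_x * dvar2_y))


if __name__ == "__main__":
    xs = [-2, -1, 0, 1, 2]
    print(distance_correlation(xs, [2 * v + 3 for v in xs]))  # linear relation: 1.0 up to rounding
    print(distance_correlation(xs, [v * v for v in xs]))       # Pearson r is 0 here, dCor is not
```

As a hypothetical usage example, `x` and `y` could be two states' daily case series aligned by date. A value near 1 would suggest the two sources carry largely redundant information. The abstract does not state how the paper turns DC values into a source-selection rule, so any threshold is an assumption. For large n, a faster algorithm than this direct O(n²) version would be preferable.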
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
George, K.M. (2021). Data Reduction with Distance Correlation. In: Hong, TP., Wojtkiewicz, K., Chawuthai, R., Sitek, P. (eds) Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2021. Communications in Computer and Information Science, vol 1371. Springer, Singapore. https://doi.org/10.1007/978-981-16-1685-3_9
DOI: https://doi.org/10.1007/978-981-16-1685-3_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1684-6
Online ISBN: 978-981-16-1685-3
eBook Packages: Computer Science, Computer Science (R0)