Abstract
Correlation analysis is an effective mechanism for studying patterns in data and making predictions. Many interesting discoveries have been made by finding correlations in seemingly unrelated data. We propose an algorithm that quantifies the theory of correlations and yields an intuitive, more accurate correlation coefficient. This predictive metric for paired values is called the general rank-based correlation coefficient. It satisfies the five basic criteria of a predictive metric: independence from sample size, a value between −1 and 1, measurement of the degree of monotonicity, insensitivity to outliers, and intuitive demonstration. The metric has been validated through experiments on a real-time dataset and through random-number simulations, and mathematical derivations of the proposed equations are provided. We compared the metric with Spearman’s rank correlation coefficient, and the results show that the proposed metric performs better than this existing metric on all of the predictive metric criteria.
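The abstract uses Spearman’s rank correlation coefficient as the baseline for comparison. For reference, the following is a minimal sketch of that baseline only. It is not the authors’ general rank-based coefficient, whose formula is not given here. Tied values receive the mean of the ranks they span, and ρ is computed as Pearson’s correlation on the resulting ranks. The function names `average_ranks` and `spearman_rho` are illustrative and do not come from the paper.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson's correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("a sequence has constant ranks; rho is undefined")
    return cov / sqrt(sxx * syy)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    print(spearman_rho(x, [v ** 3 for v in x]))  # 1.0: monotone increasing
    print(spearman_rho(x, [10, 8, 6, 4, 2]))     # -1.0: monotone decreasing
    print(spearman_rho(x, [2, 4, 6, 8, 1000]))   # 1.0: outlier leaves ranks unchanged
```

Only ranks enter the computation, so replacing the last y value with 1000 leaves ρ at 1. This illustrates the insensitivity to outliers and the focus on monotonicity that appear among the criteria above. When there are no ties, the same value follows from the shortcut ρ = 1 − 6Σd²/(n(n² − 1)), where d is the difference between paired ranks.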
Cite this article
Pandove, D., Goel, S. & Rani, R. An intuitive general rank-based correlation coefficient. Frontiers Inf Technol Electronic Eng 19, 699–711 (2018). https://doi.org/10.1631/FITEE.1601549
DOI: https://doi.org/10.1631/FITEE.1601549
Key words
- General rank-based correlation coefficient
- Multivariate analysis
- Predictive metric
- Spearman’s rank correlation coefficient