Abstract
We present Redhyte, an interactive platform for “Rapid exploration of data and hypothesis testing”. Redhyte aims to augment the conventional statistical hypothesis-testing framework with data-mining techniques, in a bid for more wholesome and efficient hypothesis testing. The platform is self-diagnosing (it can detect whether the user is doing a valid statistical test), self-correcting (it can propose and make corrections to the user’s statistical test), and helpful (it can search for promising or interesting hypotheses related to the initial user-specified hypothesis). In Redhyte, hypothesis mining consists of four steps: context mining, mined-hypothesis formulation, scoring of mined hypotheses on interestingness, and statistical adjustments. To capture and evaluate specific aspects of interestingness, we developed and implemented various hypothesis-mining metrics. Redhyte is an R Shiny web application available online at https://tohweizhong.shinyapps.io/redhyte, and its source code is hosted in a GitHub repository at https://github.com/tohweizhong/redhyte.
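As a rough illustration of what a "self-diagnosing" check could involve, the sketch below compares an outcome rate between two groups and then repeats the comparison within each value of a context attribute, flagging contexts where the direction of the difference reverses. This is not Redhyte's implementation (Redhyte is written in R, and the abstract does not specify its algorithms); the function names, field names, and counts are all hypothetical, and the data are synthetic.

```python
from collections import defaultdict


def rate_by_group(rows, group_key, outcome_key):
    """Return {group value: proportion of rows whose outcome is truthy}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += 1
        if row[outcome_key]:
            hits[row[group_key]] += 1
    return {g: hits[g] / totals[g] for g in totals}


def reversing_contexts(rows, group_key, outcome_key, context_key, a, b):
    """Return the overall rate difference (a minus b) and the context values
    in which the within-context difference has the opposite sign."""
    overall = rate_by_group(rows, group_key, outcome_key)
    overall_diff = overall[a] - overall[b]

    # Split the rows by the value of the context attribute.
    strata = defaultdict(list)
    for row in rows:
        strata[row[context_key]].append(row)

    flagged = []
    for value, subset in sorted(strata.items()):
        rates = rate_by_group(subset, group_key, outcome_key)
        if a not in rates or b not in rates:
            continue  # one of the groups is absent in this context
        if (rates[a] - rates[b]) * overall_diff < 0:
            flagged.append(value)
    return overall_diff, flagged


def expand(group, context, successes, failures):
    """Build synthetic rows from counts (hypothetical helper)."""
    hit = {"group": group, "context": context, "outcome": True}
    miss = {"group": group, "context": context, "outcome": False}
    return [hit] * successes + [miss] * failures


# Synthetic counts chosen so that the overall comparison favours B,
# while A does better inside each context.
rows = (
    expand("A", "x", 8, 2)
    + expand("A", "y", 30, 60)
    + expand("B", "x", 63, 27)
    + expand("B", "y", 2, 8)
)

overall_diff, flagged = reversing_contexts(
    rows, "group", "outcome", "context", "A", "B"
)
print(overall_diff, flagged)
```

A real check of this kind would also need a significance test within each context and an adjustment for the number of contexts examined, which corresponds to the "statistical adjustments" step mentioned above; the sketch omits both.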
Acknowledgements
This work was supported in part by a Singapore Ministry of Education tier-2 grant (MOE2012-T2-1-061) and by NCS Pte Ltd, Singapore.
Copyright information
© 2016 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Toh, W.Z., Choi, K.P., Wong, L. (2016). Redhyte: Towards a Self-diagnosing, Self-correcting, and Helpful Analytic Platform. In: Nguyen, N.T., Trawiński, B., Fujita, H., Hong, TP. (eds) Intelligent Information and Database Systems. ACIIDS 2016. Lecture Notes in Computer Science, vol 9622. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49390-8_1
DOI: https://doi.org/10.1007/978-3-662-49390-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49389-2
Online ISBN: 978-3-662-49390-8
eBook Packages: Computer Science, Computer Science (R0)