Abstract
With the boom in social media and health informatics, there is a pressing need for a powerful tool to support comprehensive analysis of public and personal health information. In particular, such a tool should be able (1) to maximize the discovery of association rules among data items and (2) to handle rapidly growing data volumes. The FP-Growth algorithm is a prominent association rule learning method for exploring potential relations in a database, possibly without a priori knowledge. It has low time and space complexity, but it cannot handle negative association rules, which are necessary for comprehensive mining of health data. To enable comprehensive discovery of association rules, this study extends the FP-Growth algorithm to mine both positive and negative frequent patterns, yielding the PNFP-Growth framework. The extended approach also adopts a pruning strategy that filters out as many misleading patterns as possible by correlating the negative data items with the positive ones. Experiments were performed to evaluate the performance of PNFP-Growth on a public data set and on a database of real health examination records from thousands of people (collected within 5 years of the date of this publication). The results indicate that (1) PNFP-Growth discovers more patterns than the traditional counterpart while remaining highly efficient, and (2) the analysis of the health examination data is informative and agrees well with clinical practice, e.g., more than 30% of people suffering from hypertension have high systolic pressure and liver problems.
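To make the idea of positive and negative patterns concrete, the following is a minimal illustrative sketch, not the PNFP-Growth algorithm itself. It assumes a brute-force scan over item pairs instead of an FP-tree, and it uses lift as a stand-in for the correlation test behind the pruning step; the abstract does not specify the actual measure. The function names, thresholds, and toy data are all hypothetical.

```python
from itertools import combinations, product


def literal_support(transactions, literals):
    """Fraction of transactions satisfying every (item, present) literal.

    present=True means the item must occur; present=False means it must not.
    """
    hits = sum(
        1 for t in transactions
        if all((item in t) == present for item, present in literals)
    )
    return hits / len(transactions)


def mine_signed_pairs(transactions, min_support=0.3, min_lift=1.0):
    """Brute-force search for frequent pairs of positive/negated items.

    Assumption: a pair is kept only if its support reaches min_support and
    its lift exceeds min_lift (lift is an illustrative correlation filter).
    """
    items = sorted(set().union(*transactions))
    kept = {}
    for a, b in combinations(items, 2):
        for sign_a, sign_b in product((True, False), repeat=2):
            if not (sign_a or sign_b):
                continue  # skip pairs in which both items are negated
            pair = ((a, sign_a), (b, sign_b))
            joint = literal_support(transactions, pair)
            expected = (literal_support(transactions, pair[:1])
                        * literal_support(transactions, pair[1:]))
            if joint < min_support or expected == 0:
                continue
            lift = joint / expected
            if lift > min_lift:
                name = tuple(i if s else "~" + i for i, s in pair)
                kept[name] = (joint, lift)
    return kept


# Toy transactions (hypothetical, not from the study's data).
demo = [frozenset(t) for t in (
    {"A", "B"}, {"A", "B"}, {"A", "B"}, {"C"}, {"C"},
)]

for pattern, (supp, lift) in sorted(mine_signed_pairs(demo).items()):
    print(pattern, round(supp, 2), round(lift, 2))
```

In this toy data, a purely positive miner would report only the pair (A, B). Allowing negated items also surfaces patterns such as (A, ~C), meaning A tends to occur when C is absent, which is the kind of rule the abstract says standard FP-Growth cannot express.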
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (Nos. 61272314, 81402760), the Fundamental Research Funds for the Central Universities (Nos. 2042015kf1009, 211410100028 (WHU), CCNU16A02020 (CCNU)), the Science & Technology Supporting Program in Hubei Province (No. 2015BAA113), the Humanities and Social Sciences Foundation of the Ministry of Education (No. 14YJAZH005), and the Natural Science Foundation of Jiangsu Province (No. BK20161563).
Cite this article
Wang, B., Chen, D., Shi, B. et al. Comprehensive Association Rules Mining of Health Examination Data with an Extended FP-Growth Method. Mobile Netw Appl 22, 267–274 (2017). https://doi.org/10.1007/s11036-016-0793-6
DOI: https://doi.org/10.1007/s11036-016-0793-6