Abstract
This study aimed to improve the accuracy and computational efficiency of the MapReduce distributed parallel computing framework and to use it to mine the diagnosis and treatment data of Kashin-Beck Disease (KBD) of the knee joint. To address the shortcomings of the traditional K-means Clustering Algorithm (KCA), a simplified distance calculation was proposed in which the Manhattan distance replaced the Euclidean distance. Further improvement strategies were proposed, and the MapReduce-based KCA (MR-KCA) and the Improved MR-KCA (IMR-KCA) were implemented and compared. On the same data, the sum of squared errors of both MR-KCA and IMR-KCA decreased as the number of center points increased. The clustering quality of IMR-KCA was higher than that of MR-KCA, and the difference was most evident at a data capacity of 8 GB. The total execution time of both MR-KCA and IMR-KCA increased with the number of center points. Compared with MR-KCA, the total execution time of IMR-KCA was significantly lower, especially at a data capacity of 8 GB; with 5000 center points, IMR-KCA could reduce the total execution time by 50%. The experiments showed that IMR-KCA better presented the diagnosis and treatment data of patients with knee joint KBD. The scalability rates of MR-KCA and IMR-KCA decreased as the number of nodes increased but remained above 0.80 for both algorithms, indicating good scalability, and the scalability of IMR-KCA was significantly higher than that of MR-KCA. The IMR-KCA proposed in this study had high accuracy and computing efficiency and could be used in the visualization of KBD diagnosis and treatment.
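The distance substitution described in the abstract can be illustrated with a small single-process sketch. This is not the authors' implementation: the function names (`map_assign`, `reduce_update`, `kmeans`, `sse`), the mean-based center update, and the in-memory "map" and "reduce" steps are all assumptions made for illustration, and the further improvement strategies of IMR-KCA are not reproduced because the abstract does not specify them.

```python
from collections import defaultdict


def manhattan(p, q):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def squared_euclidean(p, q):
    """Squared Euclidean distance, used here only for the SSE metric."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def map_assign(points, centers, dist=manhattan):
    """Map step (hypothetical): emit (nearest center index, point) pairs."""
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        yield nearest, p


def reduce_update(pairs, centers):
    """Reduce step (hypothetical): replace each center by the mean of its points.

    Assumption: the usual K-means mean update is kept; centers that
    receive no points are left unchanged.
    """
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centers = list(centers)
    for idx, members in groups.items():
        dims = len(members[0])
        new_centers[idx] = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dims)
        )
    return new_centers


def kmeans(points, centers, iterations=10, dist=manhattan):
    """Alternate map and reduce steps until the centers stop changing."""
    centers = list(centers)
    for _ in range(iterations):
        updated = reduce_update(map_assign(points, centers, dist), centers)
        if updated == centers:
            break
        centers = updated
    return centers


def sse(points, centers, dist=manhattan):
    """Sum of squared errors of each point to its assigned center."""
    return sum(
        squared_euclidean(p, centers[idx])
        for idx, p in map_assign(points, centers, dist)
    )
```

Calling `kmeans(points, initial_centers)` with a list of coordinate tuples returns the final centers, and `sse(points, centers)` gives the quality measure the abstract compares. The Manhattan distance avoids the squaring needed for the Euclidean distance in the assignment step, which is the kind of simplification the abstract refers to; how much time this saves on a real cluster depends on the deployment.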
Acknowledgements
This work was supported by the Fund of Gansu Health Care Research Plan (GSWSKY-2019-12).
About this article
Cite this article
Dang, C., Yi, G., Zhu, Z. et al. MapReduce distributed parallel computing framework for diagnosis and treatment of knee joint Kashin-Beck disease. J Supercomput 77, 9088–9101 (2021). https://doi.org/10.1007/s11227-020-03608-0
DOI: https://doi.org/10.1007/s11227-020-03608-0