Abstract
With the development of the medical insurance industry in China, medical insurance data, which are complex, multidimensional and interdisciplinary, are growing rapidly. How to mine the potential value of such vast amounts of data and how to improve the efficiency of data analysis are topical issues in data mining research. This paper presents GdiLOF, an improved LOF outlier detection algorithm that reduces the dataset by removing normal data and introduces information entropy to improve the accuracy of the LOF algorithm. Its platform adaptability is analyzed by running it on the Hadoop platform. The experimental results show that GdiLOF is highly efficient and that its accuracy is 6 percentage points higher than that of the LOF algorithm. It also runs better on the Hadoop distributed platform and has clear advantages when processing large amounts of data.
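For readers unfamiliar with the baseline, the following is a minimal sketch of the standard LOF score (Breunig et al., 2000) in plain Python, with an optional per-attribute weight in the distance. The abstract does not say how GdiLOF applies information entropy or how it removes normal data, so the `entropy_weights` helper and the `weights` parameter are assumptions made for illustration only, and no dataset reduction step is shown. All function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_weights(points):
    """ASSUMPTION: weight each attribute by its share of the total entropy.

    This is one plausible way to use entropy; it is not taken from the paper.
    Attribute values are treated as discrete symbols.
    """
    columns = list(zip(*points))
    entropies = [shannon_entropy(col) for col in columns]
    total = sum(entropies)
    if total == 0:
        # All attributes are constant: fall back to uniform weights.
        return [1.0 / len(columns)] * len(columns)
    return [h / total for h in entropies]


def weighted_distance(p, q, weights):
    """Weighted Euclidean distance between two points."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(p, q, weights)))


def lof_scores(points, k, weights=None):
    """Return the LOF score of every point (standard definition)."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    if weights is None:
        weights = [1.0] * len(points[0])

    dist = [[weighted_distance(points[i], points[j], weights)
             for j in range(n)] for i in range(n)]

    # k-distance and k-neighbourhood (ties at the k-distance are included).
    k_dist, neighbours = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]
        k_dist.append(kd)
        neighbours.append([j for j in order if dist[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        mean_reach = sum(reach) / len(reach)
        # Guard against division by zero when points are duplicated.
        lrd.append(1.0 / max(mean_reach, 1e-12))

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    scores = lof_scores(data, k=2, weights=entropy_weights(data))
    for point, score in zip(data, scores):
        print(point, round(score, 3))
```

Scores close to 1 indicate a point whose local density matches that of its neighbours, while scores well above 1 indicate a likely outlier. In the small example, the isolated point `(10, 10)` receives by far the largest score.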
Acknowledgement
This work is supported by the National Science Foundation of China (Grant No. 61502082) and the Fundamental Research Funds for the Central Universities (Grant No. ZYGX2014J065).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Xie, Z., Li, X., Wu, W., Zhang, X. (2016). An Improved Outlier Detection Algorithm to Medical Insurance. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2016. IDEAL 2016. Lecture Notes in Computer Science, vol 9937. Springer, Cham. https://doi.org/10.1007/978-3-319-46257-8_47
DOI: https://doi.org/10.1007/978-3-319-46257-8_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46256-1
Online ISBN: 978-3-319-46257-8
eBook Packages: Computer Science, Computer Science (R0)