Abstract
Outlier detection is critical for many applications, such as healthcare, health insurance, medical diagnosis, predictive analytics, pattern recognition, intrusion detection, anomaly or defect detection, video surveillance, credit card fraud detection, and text mining. Outlier detection techniques may be statistics-based, distance-based, or model-based. Techniques that rely on a single detection method have their own strengths and weaknesses and are mostly unstable. Outlier detection ensembles harness the strengths of the individual detectors and yield stable performance. This paper presents a new parameter-based growing self-organizing maps ensemble (GSOME) for outlier detection in multivariate patterns. The proposed GSOME transforms the non-linear relationships between high-dimensional patterns into a simple 1D geometric relationship: whatever its dimensionality, each pattern is mapped to a single point on a line. The dispersion of the mapped points is then used to locate the outliers and measure their degree of outlyingness. Several experiments on both real and synthetic data sets show the promising performance of the proposed GSOME.
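The idea of reducing each multivariate pattern to one scalar and scoring outliers by the dispersion of those scalars can be illustrated with a minimal sketch. It is not the authors' GSOME: the abstract does not specify the mapping, the growth rule, or the fusion step, so everything below is an assumption made for illustration. The sketch trains a few small 1D SOMs of increasing size with a batch update, uses the distance to the best-matching unit as the scalar for each pattern (a stand-in for the paper's mapping), and averages a median/MAD robust z-score over the ensemble. The names `train_som_1d`, `robust_z`, and `ensemble_scores`, the ensemble sizes, and the synthetic data are all hypothetical.

```python
import numpy as np


def train_som_1d(X, n_nodes, n_epochs=20, sigma_start=1.0, sigma_end=0.5, seed=0):
    """Batch-train a 1D SOM (a chain of n_nodes weight vectors) on the rows of X."""
    rng = np.random.default_rng(seed)
    # Initialise the nodes from randomly chosen training patterns.
    W = X[rng.choice(len(X), size=n_nodes, replace=False)].astype(float)
    grid = np.arange(n_nodes)
    for epoch in range(n_epochs):
        # The neighbourhood width shrinks linearly but never reaches zero.
        t = epoch / max(n_epochs - 1, 1)
        sigma = sigma_start + t * (sigma_end - sigma_start)
        # Best-matching unit (BMU) of every pattern.
        dist = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
        bmu = dist.argmin(axis=1)
        # Gaussian neighbourhood between each node and each pattern's BMU.
        h = np.exp(-((grid[:, None] - bmu[None, :]) ** 2) / (2.0 * sigma ** 2))
        # Batch update: each node becomes a neighbourhood-weighted data mean.
        W = (h @ X) / h.sum(axis=1, keepdims=True)
    return W


def robust_z(values):
    """Dispersion-based outlyingness of scalars: |x - median| / (1.4826 * MAD)."""
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    return np.abs(values - med) / (1.4826 * mad + 1e-12)


def ensemble_scores(X, sizes=(2, 3, 4, 5), seed=0):
    """Average the robust z-scores over 1D SOMs of increasing size."""
    member_scores = []
    for k, n_nodes in enumerate(sizes):
        W = train_som_1d(X, n_nodes, seed=seed + k)
        # Assumed 1D mapping: each pattern's distance to its BMU.
        q = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2).min(axis=1)
        member_scores.append(robust_z(q))
    return np.mean(member_scores, axis=0)


# Synthetic data (illustrative only): 200 Gaussian inliers in 5 dimensions
# plus one planted outlier stored as the last row.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(size=(200, 5)), np.full((1, 5), 8.0)])
scores = ensemble_scores(X)
print("index of highest score:", int(np.argmax(scores)))
```

Averaging over maps of different sizes is one simple way to reduce the instability of any single detector; other fusion rules (maximum, rank aggregation) would fit the same structure.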
Acknowledgements
This work was supported by the Deanship of Scientific Research at King Saud University, Riyadh, Saudi Arabia, through the Research Group Project under Grant RG-1436-023.
Cite this article
Elmougy, S., Hossain, M.S., Tolba, A.S. et al. A parameter based growing ensemble of self-organizing maps for outlier detection in healthcare. Cluster Comput 22 (Suppl 1), 2437–2460 (2019). https://doi.org/10.1007/s10586-017-1327-0
DOI: https://doi.org/10.1007/s10586-017-1327-0