Abstract
Although outlier detection has long been an active research area, few studies address the Internet of Things (IoT) domain. Many critical decisions in daily business operations depend on data collected over time, so the correctness, integrity, and accuracy of that data must be guaranteed before any further processing can commence. Most earlier algorithms assume that outliers are Errors and attribute them to faulty sensors. This assumption has since been investigated, and results show that outliers can be classified into Error and Event types with the support of a non-parametric sequence-based learning algorithm. Event-type outliers are mainly caused by abnormal sensor readings; they carry important information and should not be ignored. However, the non-parametric sequence approach and other existing techniques still struggle to detect outliers in the global search space of a large dataset. This paper therefore proposes an Enhanced Non-parametric Sequence-based Learning Algorithm, built on ensemble clustering techniques, to detect Event and Error outliers in large datasets. Experiments are conducted on six datasets, all from the UCI repository except one collected from a laboratory testbed, to demonstrate the robustness and effectiveness of the proposed approach over the existing techniques. The results show 96.653% accuracy, 94.284% precision, and 98.112% for Error outlier detection. The approach also performs better than the existing techniques in Event outlier detection, with 87.611% accuracy, 71.141% precision, and 85.755% specificity, at an execution time of 1291 s.
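For readers unfamiliar with the reported measures, the following is a minimal sketch of how accuracy, precision, and specificity are conventionally derived from a binary confusion matrix. It is not the authors' evaluation code: the helper names (`ConfusionCounts`, `confusion_counts`, `accuracy`, `precision`, `specificity`) and the toy labels are assumptions introduced purely for illustration, with label 1 standing for "outlier".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # outliers correctly flagged
    fp: int  # normal readings wrongly flagged
    tn: int  # normal readings correctly passed
    fn: int  # outliers missed


def confusion_counts(y_true, y_pred, positive=1):
    """Tally TP/FP/TN/FN for two equal-length label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    # Assumption: report 0.0 when the denominator is zero.
    return numerator / denominator if denominator else 0.0


def accuracy(c):
    """Share of all readings that were classified correctly."""
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


def precision(c):
    """Share of flagged readings that are true outliers."""
    return _ratio(c.tp, c.tp + c.fp)


def specificity(c):
    """Share of normal readings that were correctly left unflagged."""
    return _ratio(c.tn, c.tn + c.fp)


# Toy labels (made up for illustration): 1 = outlier, 0 = normal.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(accuracy(counts), precision(counts), specificity(counts))
```

Precision and specificity both penalise false alarms but from different angles: precision is measured against everything that was flagged, while specificity is measured against everything that was actually normal.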
Acknowledgement
Special thanks to Neshreen Nesa, a Research Scholar in the Department of Information Technology at IIEST Shibpur, India, for her tremendous support in providing a detailed explanation and the deliverables of the existing research work, of which she is the lead author. Our gratitude also goes to the other co-authors of that work (Tania Ghosh and Indrajit Banerjee), on which our current study is based. We also thank Swapan Shakhari for retrieving the laboratory dataset used in this study. Appreciation also goes to the Research Management Centre (RMC), Universiti Teknologi Malaysia, for its support with the research grant (Q.J130000.2451.07G48), in collaboration with the Research Management Centre of Universiti Tun Hussein Onn Malaysia (K028) and Universiti Teknikal Malaysia Melaka (PJP/2018/FTMK-CACT/CRG/S01649).
Cite this article
Edje, A.E., Abd Latiff, S.M. & Chan, H.W. Enhanced Non-parametric Sequence-based Learning Algorithm for Outlier Detection in the Internet of Things. Neural Process Lett 53, 1889–1919 (2021). https://doi.org/10.1007/s11063-021-10473-2