EXPERIENCE: Algorithms and Case Study for Explaining Repairs with Uniform Profiles over IoT Data

Published: 27 April 2021

Abstract

Timestamped IoT data, such as GPS trajectories or sensor readings, often contain outliers. While existing systems mostly focus on detecting temporal outliers without explaining or repairing them, a decision maker may be more interested in why an outlier appears, so that subsequent actions can be taken, e.g., cleaning unreliable readings, repairing broken devices, or adopting a strategy for data repairs. Such outlier detection, explanation, and repair are expected to run in either offline (batch) mode or online mode (over streaming IoT data with timestamps). In this work, we present TsClean, a new prototype system for detecting and repairing outliers with explanations over IoT data. The framework defines uniform profiles to explain the outliers detected by various algorithms, including outliers with varying time intervals, and applies approaches to repair them. Both batch and streaming processing are supported in a uniform framework. In particular, by varying the block size, it provides a tradeoff between computing accurate results and approximating them with efficient incremental computation. In this article, we present several case studies of applying TsClean in industry, e.g., how the framework detects and repairs outliers over excavator water temperature data, and how it obtains reasonable explanations and repairs for the outliers detected in tracking excavators.
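
The block-size tradeoff mentioned in the abstract can be illustrated with a minimal sketch. It is not TsClean's implementation: the function name `detect_and_repair`, the parameter `k`, the sample readings, and the choice of a per-block median/MAD detector with median replacement are all assumptions made here for illustration. One block covering the whole series corresponds to batch processing, while smaller blocks can be handled as they arrive and only approximate the batch result.

```python
from statistics import median


def detect_and_repair(values, block_size, k=3.0):
    """Flag points far from their block's median and replace them with it.

    Hypothetical illustration only: the median/MAD rule is an assumed
    stand-in detector, not the method used by TsClean.
    Returns (outlier_indices, repaired_values).
    """
    repaired = list(values)
    outliers = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        med = median(block)
        # Median absolute deviation of the block around its median.
        mad = median(abs(v - med) for v in block)
        for offset, v in enumerate(block):
            # With mad == 0 any value that differs from the median is flagged.
            if abs(v - med) > k * mad:
                outliers.append(start + offset)
                repaired[start + offset] = med
    return outliers, repaired


# Made-up water-temperature readings with one spike at index 4.
readings = [80, 81, 79, 80, 150, 82, 81, 80]

batch = detect_and_repair(readings, block_size=8)   # one block: batch result
stream = detect_and_repair(readings, block_size=4)  # two blocks: approximation

print(batch)   # ([4], [80, 81, 79, 80, 80.5, 82, 81, 80])
print(stream)  # ([4], [80, 81, 79, 80, 81.5, 82, 81, 80])
```

Both runs flag the same reading, but the repaired value differs (80.5 versus 81.5) because the smaller block sees less context, which is the kind of accuracy-for-efficiency tradeoff the abstract describes.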

      Published In

      Journal of Data and Information Quality, Volume 13, Issue 3
      September 2021
      117 pages
      ISSN:1936-1955
      EISSN:1936-1963
      DOI:10.1145/3460503

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 27 April 2021
      Accepted: 01 November 2020
      Revised: 01 October 2020
      Received: 01 February 2020
      Published in JDIQ Volume 13, Issue 3

      Author Tags

      1. Outlier explanation
      2. outlier repairs
      3. data profiling
      4. time series

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • National Key Research and Development Plan
      • National Natural Science Foundation of China
