Abstract
On-line statistical and machine learning analytics over large-scale contextual data streams, coming from, e.g., wireless sensor networks and Internet of Things environments, have become highly popular owing to their significance in knowledge extraction, regression, and classification tasks and, more generally, in making sense of large-scale streaming data. The quality of the received contextual information, however, affects predictive analytics tasks, especially when dealing with uncertain data, outliers, and data containing missing values. Low-quality contextual data significantly degrades progressive inference and on-line statistical reasoning, thereby introducing bias into the induced knowledge, e.g., classification and decision making. To alleviate this situation, which is not rare in real-time contextual information processing systems, we propose a progressive, time-optimized, data quality-aware mechanism that attempts to deliver high-quality contextual information to predictive analytics engines by progressively introducing a controlled delay. The mechanism progressively delivers data of the highest quality possible, thus eliminating possible biases in knowledge extraction and predictive analysis tasks. We propose an analytical model for this mechanism and show the benefits that stem from this approach through comprehensive experimental evaluation and comparative assessment against quality-unaware methods over real multivariate contextual sensor data.




Cite this article
Anagnostopoulos, C. Quality-optimized predictive analytics. Appl Intell 45, 1034–1046 (2016). https://doi.org/10.1007/s10489-016-0807-x
DOI: https://doi.org/10.1007/s10489-016-0807-x