A method to detect data outliers from smart urban spaces via tensor analysis

https://doi.org/10.1016/j.future.2018.09.062

Highlights

  • A new method that explores the multiway nature of urban space data for outlier detection.

  • Our method outperforms the MPCA-based classical method by about 23.5% in accuracy.

  • We used a real large-scale dataset collected from urban sensors.

  • Deep understanding of the dynamic patterns of smart urban spaces.

Abstract

With the increasing amount of data available nowadays, especially in urban spaces, extracting knowledge and insight from this big data has become critical. This need becomes even more important, and harder to meet, when the data contain discrepant events (i.e., outliers). Here we propose a method that explores the multiway nature of urban space data for outlier detection. It comprises three stages: (i) dimensionality reduction, where we model the data as a 3rd-order tensor and extract a set of latent factors that best fit the next classification step; (ii) classification of latent factors, where the latent factors from stage (i) are used to group instances of similar events in the monitoring of smart urban spaces, yielding high-quality clusters from the factorization; and (iii) combination of stages (i) and (ii) to generate a refined urban space pattern identification model. We analyzed a real large-scale dataset captured and streamed by urban sensors in 4 cities: Elda and Rois (Spain), Nuremberg (Germany), and Tallinn (Estonia). Our results indicate that urban sensing data follow cyclic temporal patterns.

Introduction

Urban areas play a relevant role in the trend of environmental variable changes [1]. Initiatives around the world have allowed urban environments such as Amsterdam, San Francisco, and Barcelona to become smarter [2]. In all these cases, Information and Communication Technologies (ICTs) help to improve citizens' quality of life and the efficiency of urban infrastructures. Monitoring air quality, noise levels, waste, public lighting, vehicle traffic, heat islands, and other applications related to a smart city vision help us better understand the new challenges of big urban centers.

The Internet of Things (IoT) and cyber-physical systems underpin many smart city applications [3], [2]. A significant challenge is monitoring, mining, and analyzing the massive and heterogeneous data gathered from smart objects and sensor devices. Furthermore, traditional processing techniques and analytical procedures have shown limited performance in such scenarios [4], [5]. The challenge becomes even more significant when the data contain observations that deviate from the rest of the dataset, which we call outliers.

Outliers are observations that appear to be inconsistent with the rest of the dataset [6]. Practical applications of outlier detection in smart cities are broad, such as identifying patterns of unusual events in urban traffic flow, trends in air quality change, or water quality monitoring [7], [8], [9]. Here we propose an outlier detection method that explores the complex dependencies and higher-order interactions between space, time, and environmental variables. We focus on a multidimensional configuration that summarizes the high-dimensional data in tensors [10]. In our environmental modeling, each multivariate structure represents a specific city. Our method includes three stages:

  • 1. Dimensionality reduction, where we model the data as a third-order tensor, and we extract a set of latent factors from the reduction, to obtain the best approximation of the information retained for the next classification step;

  • 2. Classification of latent factors, where the obtained factors (stage 1) are used for classification to obtain instances of occurrence of similar events in environmental monitoring, thus obtaining high-quality clusters from the factorization; and

  • 3. Generation of a refined environmental pattern identification (combining stages 1 and 2) where we apply a process monitoring statistic to detect events outside the normality patterns of the observed dataset.
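The three stages can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, only the time-mode factor matrix of a truncated HOSVD is used, k-means is a plain Lloyd iteration, and the stage 3 rule (distance to the assigned centroid above mean plus three standard deviations, or membership in a very small cluster) is a hypothetical stand-in for the process monitoring statistic, which this excerpt does not specify. All function names are illustrative.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows index `mode`, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def time_factors(tensor, rank):
    """Stage 1: leading left singular vectors of the time-mode unfolding."""
    u, _, _ = np.linalg.svd(unfold(tensor, 0), full_matrices=False)
    return u[:, :rank]

def kmeans(points, k, n_iter=50, seed=0):
    """Stage 2: plain Lloyd's k-means, returning (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's centroid unchanged
                centroids[j] = points[labels == j].mean(axis=0)
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids

def flag_outliers(points, labels, centroids, n_sigma=3.0, min_size=3):
    """Stage 3 (assumed rule): far from own centroid, or in a tiny cluster."""
    dist = np.linalg.norm(points - centroids[labels], axis=1)
    far = dist > dist.mean() + n_sigma * dist.std()
    sizes = np.bincount(labels, minlength=len(centroids))
    tiny = sizes[labels] < min_size
    return np.flatnonzero(far | tiny)

# Synthetic tensor: 120 hours x 6 variables x 4 sensors with a daily cycle.
rng = np.random.default_rng(42)
hours = np.arange(120)
cycle = np.sin(2 * np.pi * hours / 24)
amplitude = np.arange(1, 7, dtype=float)
data = cycle[:, None, None] * amplitude[None, :, None] * np.ones((1, 1, 4))
data += 0.01 * rng.standard_normal(data.shape)
data[100] += 10.0  # injected discrepant event at hour 100

factors = time_factors(data, rank=2)
labels, centroids = kmeans(factors, k=2)
flagged = flag_outliers(factors, labels, centroids)
print(flagged)
```

The small-cluster rule is included because k-means can place an extreme observation in a cluster of its own, where its distance to the centroid is zero and a distance threshold alone would miss it.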

To perform our evaluation, we used a real large-scale dataset [11], [12] that monitors environmental variables in the following cities: Elda (Spain), Rois (Spain), Nuremberg (Germany), and Tallinn (Estonia). In the information extracted from the multiway configuration, we found cyclically recurring temporal patterns revealed by the latent factors. These patterns allow us to infer the degree of influence of each variable on this behavior, and thus to reveal discrepant events that were previously invisible. We compared our method with Multiway Principal Component Analysis (MPCA) [13]. This method uses the multidimensional nature of the data and explores the intrinsic relationships between the dimensions of the environmental data collected in this research.
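For orientation, a common textbook form of an MPCA baseline unfolds the tensor into a matrix, applies ordinary PCA, and monitors Hotelling's T² on the retained scores. The sketch below follows that general recipe on synthetic data. The unfolding direction, the number of components, and the use of T² are assumptions here, since this excerpt does not describe how the MPCA comparison was configured.

```python
import numpy as np

def mpca_t2(tensor, n_components):
    """Hotelling's T^2 per time step from PCA on the time-mode unfolding."""
    n = tensor.shape[0]
    unfolded = tensor.reshape(n, -1)                 # time x (variables * sensors)
    centered = unfolded - unfolded.mean(axis=0)
    scaled = centered / unfolded.std(axis=0, ddof=1) # unit variance per column
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # principal component scores
    variances = s[:n_components] ** 2 / (n - 1)      # variance of each score
    return np.sum(scores ** 2 / variances, axis=1)

# Synthetic tensor: 120 hours x 6 variables x 4 sensors with a daily cycle.
rng = np.random.default_rng(7)
hours = np.arange(120)
cycle = np.sin(2 * np.pi * hours / 24)
amplitude = np.arange(1, 7, dtype=float)
data = cycle[:, None, None] * amplitude[None, :, None] * np.ones((1, 1, 4))
data += 0.01 * rng.standard_normal(data.shape)
data[100] += 10.0  # injected discrepant event at hour 100

t2 = mpca_t2(data, n_components=2)
most_anomalous_hour = int(np.argmax(t2))
print(most_anomalous_hour)
```

In practice the T² values would be compared against a control limit rather than ranked; the choice of limit is left open here.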

The main contribution of this paper is the combination of tensor decomposition with data classification to detect outliers in environmental urban space applications, providing useful information for better planning and operation of cities. The tensor decomposition technique explores the multidimensional nature of the data, improving outlier detection, and the classification-based method extracts the patterns in the data collected from the application.

This paper is structured as follows: Section 2 introduces the background and related work; Section 3 describes our proposal; Section 4 discusses the results; and Section 5 concludes the paper and outlines future work.

Section snippets

Background and related work

The central concept addressed here is the tensor, or multidimensional array, which we use to reduce the dimensionality of the analyzed data. Tensors generalize scalars, vectors, and matrices to an arbitrary number of indices; the number of dimensions, also called ways or modes, defines the order of a tensor [10]. According to the order of the multidimensional data, Kolda established the following division:

  • The zero-order tensor (x), representing a scalar;

  • The first
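As a small illustration of tensor order, the sketch below builds NumPy arrays of order zero to three and unfolds the third-order one along each mode. The array contents and the `unfold` helper are illustrative assumptions. The helper uses NumPy's default row-major layout, so the column ordering of each unfolding differs from the convention in Kolda's work while the shapes are the same.

```python
import numpy as np

scalar = np.array(3.5)                      # order 0: no index
vector = np.arange(4.0)                     # order 1: one index
matrix = np.arange(12.0).reshape(3, 4)      # order 2: two indices
tensor3 = np.arange(24.0).reshape(2, 3, 4)  # order 3: three indices

# The order of each array is its number of dimensions.
orders = [a.ndim for a in (scalar, vector, matrix, tensor3)]

def unfold(tensor, mode):
    """Mode-n unfolding: rows index `mode`, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

unfolded_shapes = [unfold(tensor3, m).shape for m in range(tensor3.ndim)]
print(orders)           # [0, 1, 2, 3]
print(unfolded_shapes)  # [(2, 12), (3, 8), (4, 6)]
```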

Outliers detection through multiway analysis

This section introduces the outlier detection methodology. Fig. 2 shows the proposed method.

In this diagram, based on the one presented by Aquino et al. [40], [42], N represents the environment and the process to be measured. The study is restricted to (denoted as “”) E, the time–space domain and topological characteristics of the monitored area. The phenomenon of interest is P, and V is its domain, i.e., V is the set of all possible phenomena. An example of this model is a city (N), with our

Results and discussion

In this study, we used a real dataset from the Smart Citizen platform (https://smartcitizen.me/) [11], [12]. The Smart Citizen platform provides a 6-tuple of sensing data: temperature, humidity, brightness, noise, carbon monoxide, and nitrogen dioxide. We used o=6 observers in four different cities (Elda and Rois in Spain, Nuremberg in Germany, and Tallinn in Estonia). We used 15 days of collected data (01/July/2017–15/July/2017), which produced n=360 hourly observations.
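The figure n=360 follows from 15 days of 24 hourly slots each. The sketch below checks that arithmetic and allocates one empty third-order array per city. The axis layout (hours × variables × observers), the reading of o=6 as six observers per city, and all identifiers are assumptions made for illustration; the excerpt does not state how the tensors were arranged.

```python
from datetime import datetime, timedelta

import numpy as np

VARIABLES = ["temperature", "humidity", "brightness",
             "noise", "carbon_monoxide", "nitrogen_dioxide"]
CITIES = ["Elda", "Rois", "Nuremberg", "Tallinn"]
N_OBSERVERS = 6  # assumed to be per city

start = datetime(2017, 7, 1)
end = datetime(2017, 7, 16)  # exclusive bound, so 15 July is fully covered
n_hours = int((end - start) / timedelta(hours=1))
timestamps = [start + timedelta(hours=h) for h in range(n_hours)]

# One tensor per city, filled with NaN until real readings are loaded.
tensors = {
    city: np.full((n_hours, len(VARIABLES), N_OBSERVERS), np.nan)
    for city in CITIES
}
print(n_hours, tensors["Elda"].shape)
```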

Conclusion

Here we present a new method for detecting outliers that combines a multiway decomposition technique with multivariate techniques to recognize patterns in data collected from smart urban sensors. To this end, we characterized high-dimensional interactions through the HOSVD tensor factorization method to extract simple latent structures, and we combined it with the k-means classification method. We compared our method (HOSVD + k-means) with the MPCA multidimensional method also

Acknowledgments

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior-Brasil (CAPES), Brazil - Finance Code 001. The authors also acknowledge the financial support of the CNPq, Brazil (Conselho Nacional de Desenvolvimento Científico e Tecnológico-Brasil, processes #432585/2016-8, #311878/2016-4, #404895/2016-6), FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo, process #2015/24544-5) and FAPEAL (Fundação de Amparo à Pesquisa do Estado de Alagoas,

References (45)

  • Bigdeli E. et al. Incremental anomaly detection using two-layer cluster-based structure. Inform. Sci. (2018)

  • Alamdari M.M. et al. A spectral-based clustering for structural health monitoring of the Sydney Harbour Bridge. Mech. Syst. Signal Process. (2017)

  • Piro G. et al. Information centric services in smart cities. J. Syst. Softw. (2014)

  • Fanaee-T H. et al. Tensor-based anomaly detection: An interdisciplinary survey. Knowl.-Based Syst. (2016)

  • Ahmed M. et al. A survey of anomaly detection techniques in financial domain. Future Gener. Comput. Syst. (2016)

  • Ahmed M. et al. A survey of network anomaly detection techniques. J. Netw. Comput. Appl. (2016)

  • Ayadi A. et al. Outlier detection approaches for wireless sensor networks: A survey. Comput. Netw. (2017)

  • Huang J. et al. A novel outlier cluster detection algorithm without top-n parameter. Knowl.-Based Syst. (2017)

  • Osanaiye O. et al. Distributed denial of service (DDoS) resilience in cloud: Review and conceptual cloud DDoS mitigation framework. J. Netw. Comput. Appl. (2016)

  • Jain A.K. Data clustering: 50 years beyond k-means. Pattern Recognit. Lett. (2010)

  • Mur A. et al. Determination of the optimal number of clusters using a spectral clustering optimization. Expert Syst. Appl. (2016)

  • Mills G. Cities as agents of global change. Int. J. Climatol. (2007)
Thiago I. A. Souza is a PhD candidate at Federal University of Ceara. Thiago obtained his BSc in Physics (2013) and his MSc in Teleinformatics Engineering (2016) both from the Federal University of Ceara, Brazil. His research interests include big data analytics, urban computing and environmental monitoring.

Andre L. L. Aquino is a Professor at Federal University of Alagoas, Brazil. He received his PhD in Computer Science from the Federal University of Minas Gerais, Brazil, in 2008. His research interests include data reduction, distributed algorithms, wireless ad hoc and sensor networks, mobile and pervasive computing. In addition, he has published several papers in the area of wireless sensor networks.

Danielo G. Gomes is an associate professor at the Department of Teleinformatics Engineering from the Federal University of Ceara, Brazil. He received his PhD degree in Réseaux et Telecoms from the University of Evry, France (2004). His research interests include sensor networks, urban computing, precision apiculture, environmental monitoring, natural and renewable resources management.
