Water end-use consumption in low-income households: Evaluation of the impact of preprocessing on the construction of a classification model

https://doi.org/10.1016/j.eswa.2021.115623

Highlights

  • Water consumption variability affects the preprocessing of time series features.

  • Random Forest and 1NN with the ERP measure show similar performances.

  • Errors linked to the preprocessing method must be known in order to select the model.

Abstract

The challenge of transforming massive water flow data into disaggregated smart information according to water end uses has motivated many researchers. This challenge is even more difficult in low-income regions owing to the high variability of the data: the predominant hydraulic devices are controlled by globe valves, which offer users many activation possibilities. Devices with standardized flow rates, such as washing machines or dishwashers, are exceptions. A common practice is to apply commercial software that classifies events at the end-use level and then to develop a personalized classification model with enhanced alignment with the database. If the preprocessing step is not performed properly, it can affect perceived device behaviors, which may lead to incorrect conclusions. To evaluate how this variability can interfere with commercial software responses, we developed classification models using a dataset preprocessed by Trace Wizard® as training data and then applied the trained models to a test dataset consisting of events that were authenticated by individual flow sensors. Our goal was to identify the degree of difference between the two datasets. The results demonstrate that when Trace Wizard® is applied, the features of each device differ from the original water consumption flow, indicating that data variability interferes with the credibility of feedback. Additionally, preprocessing tended to increase the volume, duration, and flow rates, giving the impression that consumption was higher than in the real scenario. The constructed models were not able to overcome the distortions introduced by the Trace Wizard® classification. For example, fixtures had poor matches for several houses, with statistical measures below 50%.

Introduction

Water end use control and rationalization are essential for the universalization of water access. This concept has been proven in the cases of agricultural and industrial water use and must be expanded to urban and domestic water use. Many countries, particularly developing countries, still experience unacceptable water losses in their urban distribution systems. In some cases, more than 50% of the water produced at treatment facilities does not reach households. Additionally, significant water waste occurs in both rich and poor households with various population demographics.

Given current consumption patterns and habits, there will likely be an increase in residential water demand as a result of global urban growth (Cominola, Giuliani, Piga, Castelletti, & Rizzoli, 2015). Additionally, water pollution, urban development, agricultural irrigation, climate change, and droughts also contribute to disparities between the availability of quality water sources and consumption demand (Jorgensen, Graymore, & O’Toole, 2009). Therefore, in the face of water scarcity, information regarding how and when water is used can aid the development of policies aimed at reducing water consumption (Vašak, Banjac, & Novak, 2015).

Water consumption in buildings, residential areas, commercial enterprises, or institutional facilities depends on multiple factors as discussed by Kiperstok and Kiperstok (2017). Consumption depends on technological, managerial, and behavioral issues. It is widely accepted that awareness is key to the rational use of water and that there can be no awareness without proper control. However, control is not possible without accurate measurements. Water distribution systems are being designed to tackle this issue and the hydro metering of all consumer units is either already conducted or is being actively pursued by water authorities worldwide. Combining information from residential water meters with regional or water sector flow and pressure records can allow urban water losses to be curtailed. Moving water control inside buildings and households is an important challenge for water authorities and consumers. Understanding how water is consumed and whether it is properly meeting a demand or being wasted allows consumers to adopt necessary measures to reduce consumption while satisfying their desires. It also allows authorities, researchers, and suppliers to design strategies to favor more rational equipment and practices.

To identify how water is consumed, wasted, or lost through building hydraulic installations, pipes, reservoirs, faucets, tubs, washing machines, or showers, two main methods are typically used: installing a water meter on each piece of equipment or developing a means to interpret the flow signals from a central water meter. Previous works (Mello et al., 2018; Soares et al., 2018) have applied both types of methods.

Over the past three decades, research has promoted the development of intelligent water meters for fostering the characterization of water consumption patterns according to end uses (Bennett et al., 2013; Liu et al., 2016; Nguyen et al., 2014). Several issues make water end use recognition challenging, such as whether an observed time series represents individual or combined events and how combined events can be separated. Additionally, it is difficult to handle multiple behaviors associated with the same fixture or with new user patterns. Currently, a common practice is to use commercial software such as Trace Wizard® to address these issues. Another possibility is to create custom models using preprocessed data, that is, data that have already been treated (in cases with simultaneous uses) and labeled according to end-use equipment. This requires reliable software responses because processed data may not represent the real water consumption behaviors of each device. This is a significant issue because models (including secondary steps, if any) mold themselves to the characteristics of the data, which influences the choice of pattern recognition techniques, decisions regarding the factors that influence consumption, information about the quantity of water used by hydraulic equipment, and conclusions about user behavior. Therefore, it is crucial to have a means of verifying the preprocessing step.

This study aimed to explore the importance of having a dataset that is reliably labeled by end-use equipment. It highlights the impact of Trace Wizard® preprocessing on data behavior, as well as the impact when a model is constructed from preprocessed data and then applied to classify reliably labeled data. Two models are explored: a random forest (RF) based on features extracted from the time series and a 1-nearest neighbor (1NN) model using the edit distance with real penalty (ERP) measure, which quantifies how similar an unknown time series is to the series in a reference dataset.
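To make the second model concrete, the following is a minimal sketch of a 1NN classifier built on the ERP distance, assuming the standard dynamic-programming formulation with a constant gap value (here `gap = 0.0`). The function names, the gap choice, and the toy flow series and labels are illustrative assumptions, not the implementation or data used in this study.

```python
from typing import List, Sequence, Tuple


def erp_distance(x: Sequence[float], y: Sequence[float], gap: float = 0.0) -> float:
    """Edit distance with real penalty between two flow series.

    Assumes the usual recurrence: matching two samples costs their
    absolute difference, and skipping a sample costs its absolute
    difference from the constant gap value.
    """
    n, m = len(x), len(y)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]

    # Border cells: one series is exhausted, so every sample is a gap.
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + abs(x[i - 1] - gap)
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + abs(y[j - 1] - gap)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + abs(x[i - 1] - y[j - 1]),  # match x_i with y_j
                d[i - 1][j] + abs(x[i - 1] - gap),           # x_i aligned to a gap
                d[i][j - 1] + abs(y[j - 1] - gap),           # y_j aligned to a gap
            )
    return d[n][m]


def classify_1nn(
    query: Sequence[float],
    references: List[Tuple[Sequence[float], str]],
) -> str:
    """Return the label of the reference series closest to the query under ERP."""
    _, best_label = min(references, key=lambda ref: erp_distance(query, ref[0]))
    return best_label


# Hypothetical labeled events (flow samples in arbitrary units).
reference_events = [
    ([0, 8, 8, 8, 0], "shower"),
    ([0, 3, 0], "faucet"),
]
predicted = classify_1nn([0, 3, 3, 0], reference_events)
```

Because ERP tolerates series of different lengths, events do not need to be resampled to a common duration before comparison, which is one reason an elastic measure of this kind suits water-use events.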

The remainder of this paper is organized as follows. The limitations of Trace Wizard® and their implications are discussed in Section 2. In Section 3, a concise literature review of related works on end-use classification methods is presented. In Section 4, the considered classification models are presented. In Section 5, information regarding water flow data is presented and the water consumption characterization results are discussed. Our methodology is discussed in Section 6. We (a) compare a dataset classified by Trace Wizard® with the same dataset classified by individual flow sensors and (b) demonstrate that the selected models cannot conform to the data acceptably when there are major differences between the responses of the preprocessing methods. Additionally, models were constructed using training data from Trace Wizard® and tested on data classified by sensors. The results are discussed in Section 7. Finally, in Section 8, we summarize the main conclusions drawn from our experimental results.

Section snippets

Limitations of the Trace Wizard® application

Trace Wizard® (DeOreo, Heaney, & Mayer, 1996) is a commercial software package that can split simultaneous device uses and classify a time series of flow data into end uses. It uses a decision tree to perform event classification by evaluating similarity based on manually predefined parameters for each type of equipment. The use of this software requires attention to some key points. For example, it is highly dependent on human inputs for the choice of statistics derived from water flow series,

Background of End Use classification methods

For water consumption time series classification, the most popular approach is to use software that applies pattern recognition tools. Suitable commercially available software packages include Identiflow® (Kowalski & Marshallsay, 2005), HydroSense® (Larson et al., 2012), BuntBrainEndUses® (Pastor-Jabaloyes, Arregui, & Cobacho, 2018), and Trace Wizard®.

Identiflow® applies a decision tree, similar to Trace Wizard®, to identify and classify events based on discriminating information regarding the use of

Developed classification models

The most commonly used hydraulic devices are faucets (kitchen, bathroom, outdoor areas), showers, and toilets. These types of end uses depend on human handling, which in turn depends on user behavior, level of awareness regarding the proper use of water, and the condition of hydraulic installations (Kiperstok & Kiperstok, 2017). Additionally, based on the characteristics of these types of devices, it is natural to assume that they will be used for performing several functions, contributing to

Characteristics of the study area

Our study was performed in Plataforma, Salvador, Bahia. The study location is highlighted in Fig. 3.

Plataforma is one of the oldest districts in Salvador. This neighborhood can be characterized as residential based on the presence of only small- and medium-sized businesses. An important local characteristic is that the majority of the inhabitants have low purchasing power and little schooling, with most residents having only completed elementary school.

This information was confirmed in a survey

Comparison of databases using preprocessing methods

The investigative week was a period used to understand and label the water consumption values related to each fixture. The data collected during this period represent 2% to 4% of the training dataset for each residence. This dataset was classified using Trace Wizard® (TW-class) and individual flow sensors (FS-class). For TW-class, an experienced researcher entered the features and statistics required for classification. In FS-class, data on consumption were obtained from the YF-S201

Impact of preprocessing on the perception of equipment behavior

When comparing the classification results provided by the two methods, approximately 34.3% of the events were classified equally, corresponding to 33.8% of the water volume (with FS-class as the reference for classification). Tables 3 and 4 list the percentages of correspondence per house and fixture in terms of both events and consumed volumes. For the external faucet, the classification for house E was completely opposite to that of Trace Wizard®, whereas 45% of the events in house F were classified
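The correspondence percentages described above can be computed along the following lines. This is a minimal sketch under assumed conventions: each event is a hypothetical tuple of the TW-class label, the FS-class label, and the event volume, with FS-class taken as the reference. The function names and the sample events are invented for illustration and are not the study's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (tw_label, fs_label, volume_litres); the layout is an assumption for this sketch.
Event = Tuple[str, str, float]


def agreement(events: Iterable[Event]) -> Tuple[float, float]:
    """Return (% of events, % of volume) where TW-class matches FS-class."""
    events = list(events)
    total_volume = sum(volume for _, _, volume in events)
    if not events or total_volume == 0:
        raise ValueError("need at least one event with non-zero volume")

    matched = [(tw, fs, volume) for tw, fs, volume in events if tw == fs]
    event_pct = 100.0 * len(matched) / len(events)
    volume_pct = 100.0 * sum(volume for _, _, volume in matched) / total_volume
    return event_pct, volume_pct


def agreement_by_fixture(events: Iterable[Event]) -> Dict[str, Tuple[float, float]]:
    """Group events by the reference (FS-class) fixture and score each group."""
    groups: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        groups[event[1]].append(event)
    return {fixture: agreement(group) for fixture, group in groups.items()}


# Hypothetical events for one house.
sample_events = [
    ("shower", "shower", 40.0),
    ("faucet", "toilet", 6.0),
    ("toilet", "toilet", 6.0),
    ("faucet", "shower", 48.0),
]
overall = agreement(sample_events)
per_fixture = agreement_by_fixture(sample_events)
```

Reporting both percentages matters because a few long, high-volume events can dominate the volume figure even when most events are matched, and vice versa.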

Conclusions

An investigative period is fundamental not only for the development and validation of supervised models, but also for the quantitative understanding of features per device and potential changes over time. However, even residences located in neighborhoods with similar architectural, socioeconomic, and climatic characteristics exhibit considerable water consumption variations per device, which makes generalization difficult. Therefore, a reliable prior labelling of time series is fundamental for

CRediT authorship contribution statement

Karla Oliveira-Esquerre: Conceptualization, Methodology, Project administration, Supervision, Writing - original draft, Writing - review & editing, Formal analysis, Funding acquisition. Mariza Mello: Software, Writing - original draft, Writing - review & editing, Visualization, Formal analysis, Validation, Data curation. Gabriella Botelho: Writing - original draft, Writing - review & editing, Data curation, Investigation, Formal analysis. Zikang Deng: Software, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors would like to acknowledge PROSAB and FINEP for providing financial support for our research, Teclim-BA for providing datasets, and the Coordination for the Improvement of Higher Education Personnel-CAPES (CAPES/PRINT - 41/2017, Proc. N. 88887.467907/2019–00) for their visiting scholarship at UCSD. Additionally, we wish to acknowledge PhD Kelly Fontoura for helping with data collection and the residents of the households who gave their time and allowed their water consumption to be

References (34)

  • K.A. Nguyen et al. (2014). An autonomous and intelligent expert system for residential water end-use classification. Expert Systems with Applications.

  • E.S. Soares et al. (2018). Development of a model to identify combined use in residential water end use events. Computer Aided Chemical Engineering.

  • M. Vašak et al. (2015). Water use disaggregation based on classification of feature vectors extracted from smart meter data. Procedia Engineering.

  • M. Wonders et al. (2016). Training with synthesised data for disaggregated event classification at the water meter. Expert Systems with Applications.

  • L. Abualigah et al. (2020). A comprehensive survey of the Grasshopper optimization algorithm: Results, variants, and applications. Neural Computing and Applications.

  • L. Abualigah et al. (2021). Advances in Sine Cosine Algorithm: A comprehensive survey. Artificial Intelligence Review.

  • J. Bergstra et al. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research.