Water end-use consumption in low-income households: Evaluation of the impact of preprocessing on the construction of a classification model
Introduction
Water end use control and rationalization are essential for the universalization of water access. This concept has been proven in the cases of agricultural and industrial water use and must be expanded to urban and domestic water use. Many countries, particularly developing countries, still experience unacceptable water losses in their urban distribution systems. In some cases, more than 50% of the water produced at treatment facilities does not reach households. Additionally, significant water waste occurs in both rich and poor households with various population demographics.
Given current consumption patterns and habits, there will likely be an increase in residential water demand as a result of global urban growth (Cominola, Giuliani, Piga, Castelletti, & Rizzoli, 2015). Additionally, water pollution, urban development, agricultural irrigation, climate change, and droughts also contribute to disparities between the availability of quality water sources and consumption demand (Jorgensen, Graymore, & O’Toole, 2009). Therefore, in the face of water scarcity, information regarding how and when water is used can aid the development of policies aimed at reducing water consumption (Vašak, Banjac, & Novak, 2015).
Water consumption in buildings, residential areas, commercial enterprises, or institutional facilities depends on multiple factors as discussed by Kiperstok and Kiperstok (2017). Consumption depends on technological, managerial, and behavioral issues. It is widely accepted that awareness is key to the rational use of water and that there can be no awareness without proper control. However, control is not possible without accurate measurements. Water distribution systems are being designed to tackle this issue and the hydro metering of all consumer units is either already conducted or is being actively pursued by water authorities worldwide. Combining information from residential water meters with regional or water sector flow and pressure records can allow urban water losses to be curtailed. Moving water control inside buildings and households is an important challenge for water authorities and consumers. Understanding how water is consumed and whether it is properly meeting a demand or being wasted allows consumers to adopt necessary measures to reduce consumption while satisfying their desires. It also allows authorities, researchers, and suppliers to design strategies to favor more rational equipment and practices.
To identify how water is consumed, wasted, or lost through building hydraulic installations, pipes, reservoirs, faucets, tubs, washing machines, or showers, two main methods are typically used: installing a water meter on each piece of equipment or developing a means to interpret the flow signals from a central water meter. Previous works (Mello et al., 2018; Soares et al., 2018) have applied both types of methods.
Over the past three decades, research has promoted the development of intelligent water meters to support the characterization of water consumption patterns according to end uses (Bennett et al., 2013; Liu et al., 2016; Nguyen et al., 2014). Several issues make water end-use recognition challenging, such as determining whether an observed time series represents individual or combined events and how combined events can be separated. Additionally, it is difficult to handle multiple behaviors associated with the same fixture or with new user patterns. Currently, a common practice is to use commercial software such as Trace Wizard® to address these issues. Another possibility is to create custom models using preprocessed data, that is, data that have already been treated (to separate simultaneous uses) and labeled according to end-use equipment. This approach requires reliable software responses because processed data may not represent the real water consumption behavior of each device. This is a significant issue because models (including secondary steps, if any) mold themselves to the characteristics of the data, which influences the choice of pattern recognition techniques, decisions regarding the factors that influence consumption, information about the quantity of water used by hydraulic equipment, and conclusions about user behavior. Therefore, it is crucial to have a means of verifying the preprocessing step.
This study aimed to explore the importance of having a dataset that is truly rated by end-use equipment to highlight the impact on data behavior when using Trace Wizard® preprocessing, as well as the impact when a model is constructed based on preprocessed data and then applied to classify truly rated data. Two models are explored: a random forest (RF) based on extracting features from time series and a 1-nearest neighbor (1NN) model using edit distance with a real penalty measure (ERP), which calculates the similarity between an unknown time series and reference time series dataset.
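As an illustration of the 1NN/ERP idea described above, the following is a minimal sketch and not the authors' implementation. It uses the standard dynamic-programming form of ERP, in which a gap is penalized by its distance to a constant reference value. The reference value of 0 (zero flow), the function names `erp_distance` and `classify_1nn`, and the flow values in the usage lines are all assumptions made for this example.

```python
def erp_distance(x, y, gap=0.0):
    """Edit distance with real penalty between two flow series.

    `gap` is the constant reference value used to penalize unmatched
    samples; 0.0 (zero flow) is an assumption for this sketch.
    """
    n, m = len(x), len(y)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix with an empty series costs the sum of gap penalties.
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + abs(x[i - 1] - gap)
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + abs(y[j - 1] - gap)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + abs(x[i - 1] - y[j - 1]),  # match both samples
                d[i - 1][j] + abs(x[i - 1] - gap),           # gap in y
                d[i][j - 1] + abs(y[j - 1] - gap),           # gap in x
            )
    return d[n][m]


def classify_1nn(query, references, gap=0.0):
    """Return the label of the reference series closest to `query`.

    `references` is a list of (series, label) pairs.
    """
    if not references:
        raise ValueError("at least one labeled reference series is required")
    _, label = min(references, key=lambda ref: erp_distance(query, ref[0], gap))
    return label


# Hypothetical labeled events (flow per time step, arbitrary units).
references = [
    ([8.0, 8.0, 8.0, 8.0, 8.0, 8.0], "shower"),
    ([6.0, 6.0, 0.0], "toilet"),
]
predicted = classify_1nn([8.0, 8.0, 8.0, 8.0, 8.0], references)
```

Because ERP tolerates series of different lengths, events do not need to be resampled to a common duration before comparison, although the quadratic cost per pair can become significant for long events.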
The remainder of this paper is organized as follows. The limitations of Trace Wizard® and their implications are discussed in Section 2. In Section 3, a concise literature review of related works on end-use classification methods is presented. In Section 4, the considered classification models are presented. In Section 5, information regarding water flow data is presented and the water consumption characterization results are discussed. Our methodology is discussed in Section 6. We (a) compare a dataset classified by Trace Wizard® with the same dataset classified by individual flow sensors and (b) demonstrate that the selected models cannot conform to the data acceptably when there are major differences between the responses of the preprocessing methods. Additionally, models were constructed using training data from Trace Wizard® and tested on data classified by sensors. The results are discussed in Section 7. Finally, in Section 8, we summarize the main conclusions drawn from our experimental results.
Limitations of the Trace Wizard® application
Trace Wizard® (DeOreo, Heaney, & Mayer, 1996) is a commercial software package that can split simultaneous device uses and classify a time series of flow data into end uses. It uses a decision tree to perform event classification by evaluating similarity based on manually predefined parameters for each type of equipment. The use of this software requires attention to some key points. For example, it is highly dependent on human inputs for the choice of statistics derived from water flow series,
Background of End Use classification methods
For water consumption time series classification, the most popular approach is to use software that applies pattern recognition tools. Suitable commercially available software packages include Identiflow® (Kowalski & Marshallsay, 2005), HydroSense® (Larson et al., 2012), BuntBrainEndUses® (Pastor-Jabaloyes, Arregui, & Cobacho, 2018), and Trace Wizard®.
Identiflow® applies a decision tree, similar to Trace Wizard®, to identify and classify events based on discriminating information regarding the use of
Developed classification models
The most commonly used hydraulic devices are faucets (kitchen, bathroom, outdoor areas), showers, and toilets. These types of end uses depend on human handling, which in turn depends on user behavior, level of awareness regarding the proper use of water, and the condition of hydraulic installations (Kiperstok & Kiperstok, 2017). Additionally, based on the characteristics of these types of devices, it is natural to assume that they will be used for performing several functions, contributing to
Characteristics of the study area
Our study was performed in Plataforma, Salvador, Bahia. The study location is highlighted in Fig. 3.
Plataforma is one of the oldest districts in Salvador. This neighborhood can be characterized as residential based on the presence of only small- and medium-sized businesses. An important local characteristic is that the majority of the inhabitants have low purchasing power and little schooling, with most residents having only completed elementary school.
This information was confirmed in a survey
Comparison of databases using preprocessing methods
The investigative week was a period used to understand and label the water consumption values related to each fixture. The data collected during this period represent 2% to 4% of the training dataset for each residence. This dataset was classified using Trace Wizard® (TW-class) and individual flow sensors (FS-class). For TW-class, an experienced researcher input the features and statistics required for classification. In FS-class, data on consumption were obtained from the YF-S201
Impact of preprocessing on the perception of equipment behavior
When comparing the classification results provided by the two methods, approximately 34.3% of the events were classified identically, corresponding to 33.8% of the water volume (with FS-class as the reference for classification). Tables 3 and 4 list the percentages of correspondence per house and fixture in terms of both events and consumed volumes. For the external faucet, house E was classified completely oppositely from Trace Wizard®, whereas 45% of the events in house F were classified
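The two correspondence measures mentioned above (share of events and share of water volume on which the two labelings agree) can be sketched as follows. This is only an illustration under stated assumptions: the function name `agreement`, the tuple layout, and the sample events are hypothetical, and the volumes are not taken from the study.

```python
def agreement(events):
    """Compute event-level and volume-level agreement between two labelings.

    `events` is a list of (volume, tw_label, fs_label) tuples, where
    fs_label (flow sensor) is treated as the reference classification.
    Returns (share_of_events, share_of_volume), each in [0, 1].
    """
    if not events:
        raise ValueError("no events to compare")
    total_volume = sum(volume for volume, _, _ in events)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    # Keep only the volumes of events on which both labelings agree.
    matched = [volume for volume, tw, fs in events if tw == fs]
    return len(matched) / len(events), sum(matched) / total_volume


# Hypothetical events: (volume in litres, Trace Wizard label, sensor label).
sample = [
    (10.0, "shower", "shower"),
    (5.0, "toilet", "faucet"),
    (5.0, "faucet", "faucet"),
    (20.0, "shower", "toilet"),
]
event_share, volume_share = agreement(sample)
```

Reporting both shares is useful because a method may agree on many small events while misclassifying a few large ones, or the reverse.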
Conclusions
An investigative period is fundamental not only for the development and validation of supervised models, but also for the quantitative understanding of features per device and potential changes over time. However, even residences located in neighborhoods with similar architectural, socioeconomic, and climatic characteristics exhibit considerable water consumption variations per device, which makes generalization difficult. Therefore, a reliable prior labelling of time series is fundamental for
CRediT authorship contribution statement
Karla Oliveira-Esquerre: Conceptualization, Methodology, Project administration, Supervision, Writing - original draft, Writing - review & editing, Formal analysis, Funding acquisition. Mariza Mello: Software, Writing - original draft, Writing - review & editing, Visualization, Formal analysis, Validation, Data curation. Gabriella Botelho: Writing - original draft, Writing - review & editing, Data curation, Investigation, Formal analysis. Zikang Deng: Software, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors would like to acknowledge PROSAB and FINEP for providing financial support for our research, Teclim-BA for providing datasets, and the Coordination for the Improvement of Higher Education Personnel-CAPES (CAPES/PRINT - 41/2017, Proc. N. 88887.467907/ 2019–00) for their visiting scholarship at UCSD. Additionally, we wish to acknowledge PhD Kelly Fontoura for helping with data collection and the residents of the households who gave their time and allowed their water consumption to be
References (34)
- et al. (2021). The Arithmetic Optimization Algorithm. Computer Methods in Applied Mechanics and Engineering.
- et al. (2013). ANN-based residential water end-use demand forecasting model. Expert Systems with Applications.
- et al. On The Marriage of Lp-norms and Edit Distance.
- et al. (2015). Benefits and challenges of using smart meters for advancing residential water demand modeling and management: A review. Environmental Modelling and Software.
- et al. (2009). Household water use behavior: An integrated model. Journal of Environmental Management.
- et al. (2012). Disaggregated water sensing from a single, pressure-based sensor: An extended analysis of HydroSense using staged experiments. Pervasive and Mobile Computing.
- et al. (2016). Urban water conservation through customised water and end-use information. Journal of Cleaner Production.
- et al. (2018). Comparative study of similarity measures used to classify residential water flow pattern of low-income households in Salvador - Brazil. Computer Aided Chemical Engineering.
- et al. (2013). An intelligent pattern recognition model to automate the categorisation of residential water end-use events. Environmental Modelling & Software.
- et al. (2013). Development of an intelligent model to categorise residential water end use events. Journal of Hydro-Environment Research.