Full length article
Vision-based estimation of the number of occupants using video cameras
Introduction
The building industry has been shifting towards improved building sustainability, while the tools and methods for evaluating building performance have been gaining importance. Simulation tools are widely used to analyze building performance and to benchmark design alternatives quantitatively across a range of relevant criteria. As such, decisions on the selection of systems and components, form exploration, equipment sizing or spatial organization can be guided by simulation results. Similarly, during building retrofit, simulations are used to verify whether a building fulfils its performance targets and to explore interventions that improve performance.
An extensive amount of building information is required in an energy model to describe a building to a sufficient level of detail. However, there are many assumptions and uncertainties underlying performance simulations that may not fully match reality, resulting in a ‘performance gap’ [1]. Building-related uncertainties are mainly due to occupant behavior patterns, such as metabolic heat gains, set points, use of equipment/lighting, operation of windows and shading devices [2]. The unpredictable nature of occupants, their interaction with building systems/components, and their lifestyle choices are critical factors that introduce uncertainties into the results and contribute to the discrepancy between actual and calculated energy use [3], [4]. Accurate representations of occupancy are critical for the correct estimation of a variety of factors including occupant-produced air contaminants, metabolic heat gains and other model dynamics related to human-building interaction. Fine-grained occupancy information with high temporal resolution is also crucial in demand-driven control applications, and can lead to improved performance in lighting, heating, ventilation and air conditioning as well as improved space utilization [5].
Accurate occupancy modeling has become crucial for optimal decision making in efficient energy management and smart refurbishments [6]. To this end, predictive and descriptive models have emerged for estimating occupancy, appliance use, a variety of loads and operation hours. Descriptive models primarily aim to specify important and realistic occupant characteristics. Predictive models, on the other hand, aim to forecast occupant behavior at future time-steps for application in building control [7]. Stochastic models and ML-based methods are widely used for predictive modeling. Stochastic approaches make predictions by assigning uncertainties to existing deterministic schedules and by establishing correlations between environmental variables (e.g. illumination, temperature) and the likelihood of occupants engaging in adaptive actions (e.g. changing setpoints, turning lights on). Stochastic models can introduce various degrees of uncertainty, adding variation to an otherwise uniform occupancy pattern. Such models uncover statistical relationships between environmental factors and the targeted operations by processing large amounts of observed data [8]. Stochastic models can support the prediction of the number of people in a zone [9], window opening behavior [10], [11] and occupant behavior [12]. Another type of stochastic modeling is the agent-based approach, which recreates the actual behavior of occupants as autonomous decision-making entities that interact with each other and with building systems [13]. Langevin et al. developed a system in which agents with comfort desires modify clothing and the use of windows, fans and heaters [13]. A multi-agent stochastic simulator was proposed by Wate et al. for heating and cooling load predictions [14]. Micolier et al. developed an agent-based approach that simulates the reactive, deliberative and social behavior of occupants based on the Belief–Desire–Intention architecture [15].
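To make the idea of assigning uncertainty to a deterministic schedule concrete, the following minimal sketch perturbs a fixed hourly occupancy schedule with Gaussian noise. It is an illustration only and does not reproduce any of the cited models; the function name, the `spread` parameter, the example schedule and the room capacity are all assumptions.

```python
import random

def sample_occupancy(schedule, capacity, spread=0.15, seed=None):
    """Draw one stochastic day from a deterministic hourly schedule.

    schedule: 24 fractions of capacity in [0, 1] (hypothetical input format).
    spread:   standard deviation of the Gaussian perturbation (assumed value).
    """
    rng = random.Random(seed)
    counts = []
    for fraction in schedule:
        # Hours scheduled as empty stay empty; occupied hours are perturbed.
        noisy = rng.gauss(fraction, spread) if fraction > 0 else 0.0
        # Clamp to the physically possible range before converting to people.
        noisy = min(1.0, max(0.0, noisy))
        counts.append(round(noisy * capacity))
    return counts

# Hypothetical weekday schedule: empty at night, two teaching blocks by day.
schedule = [0.0] * 8 + [0.5, 1.0, 1.0, 0.8, 0.3, 0.8, 1.0, 1.0, 0.5] + [0.0] * 7
day = sample_occupancy(schedule, capacity=60, seed=1)
```

Repeated calls with different seeds yield different days around the same schedule, which is the kind of variation that a uniform deterministic pattern cannot express.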
Machine learning (ML) methods (e.g. artificial neural networks, support vector machines, decision trees) are also used for predictive modeling. ML approaches are typically black-box models that can substitute for knowledge-driven models embodying domain-specific knowledge. Examples include the use of ML algorithms for changes of solar shading states [16], artificial lighting [17] and equipment use [18].
Although predictive approaches can generate realistic occupancy time-series, their validity and reliability are challenged when they have a weak empirical basis [19]. These models may even fail when a priori knowledge on the probabilistic relationships between occupants and building-related factors (e.g. attributes, behavioral rules, memory, resources, decision-making sophistication, and rules for modifying current behavioral rules) is missing [8]. ML approaches also require empirical evidence to train models for the prediction of temporal occupancy sequences. The lack of empirical data is a serious obstacle for rooms that offer limited opportunities for the long-term, large-scale in-situ monitoring needed to generalize across populations, building types, locations and climates [20]. Rooms with a high number of occupants who have diverse, unknown motivations and interactions, and mixed-use building types that lack historical data for model training, are also challenging cases for occupant modeling. The systematic collection of observational data can support the development of descriptive, stochastic and ML models that are empirically grounded [8], [21].
The real-time detection and localization of occupants is also important for smart building applications that require immediate occupant information, such as occupancy-driven control systems, model predictive control systems or responsive building components. Real-time occupant detection requires a sensing infrastructure for data collection [7]. Among the available options, wearable sensors or smartphones (e.g. using Radio Frequency Identification (RFID), Wi-Fi or Bluetooth Low Energy (BLE)) can be used to track occupants’ movement patterns, but they can be disadvantageous because they intrude on the natural behavior of occupants [22]. Other monitoring techniques involve measurements through environmental sensor networks (e.g. indoor CO2, CO, TVOC, small particulates, temperature, humidity). These are called “proxy” measurements, which are based on constitutive models that use the spatial and physical features of a room to infer occupant presence [23]. However, according to the same source, proxy measurements may require comprehensive sensor infrastructures, sensor calibrations with a high maintenance cost, and inference models that depend on sensor location, link function and latent factors.
Computer vision methods offer a robust alternative, using video cameras for data collection and advanced video analysis techniques. The artificial intelligence methods used in computer vision can transcend traditional decision-making processes to achieve high levels of operational efficiency, decision quality and system reliability [24]. Computer vision methods are already widely used on construction sites for worker safety [25], [26], [27], [28], [29], [30], [31], [32], worker detection [26], ergonomic posture recognition [27], multiple worker tracking [33], automated productivity analysis [25], worker trajectory prediction [34] and activity analysis [26]. Similarly, vision-based methods are being used in urban analytics to exploit deep learning and computer vision technologies on image-based or video-based data resources [35]. Specifically, vision-based object detection methods can address a wide range of urban issues such as neighborhood safety [36], crowd disaster avoidance [37], pedestrian risk analysis in traffic [38], pedestrian-vehicle interactions [39], vehicle–bicycle interactions [40], pedestrian flow statistics [41], commercial activeness [42], the perception of the urban environment [43] and the quality and impact of urban appearance [44].
In building energy modeling, computer vision methods can facilitate the fast analysis of temporal and spatial video content and increase the accuracy of simulation results. Wang et al. proposed a vision-based method for occupant number detection using data fusion of video and CO2 concentration for the predictive control of the indoor environment [45]. Majumder et al. developed an occupancy logger with vision sensors that collect occupancy data under different movement scenarios and illumination conditions [46]. Balaji et al. proposed a cascading video analysis algorithm based on a support vector machine (SVM), a convolutional neural network (CNN) and K-means clustering to provide fine-grained occupancy-based HVAC actuation [47]. Liu et al. proposed a two-stage DBN-based fusion method based on multiple vision sensors with video recordings and motion sensors [48]. Benezeth et al. proposed a vision-based system for human detection and activity analysis with change detection, tracking and recognition [49]. Erickson et al. used wireless camera sensor networks for occupancy mobility detection with Gaussian and agent-based models [50]. Chen et al. developed an algorithm for head-contour-based detection using a novel blob segmentation method [51]. Zou et al. developed a video analysis technique combining deep learning and traditional artificial features [52]. Wang et al. proposed fusion techniques using a combination of active RFID and video cameras for occupancy monitoring [53]. Wang et al. proposed an image-based system that uses a human pose estimation model to provide positioning and orientation information in real time [54].
While existing computer vision-based methods for occupant counting have achieved a certain degree of success, they have not addressed some key challenges regarding object recognition. The existing approaches were applied in smaller rooms (<100 m2) with few occupants. However, occupants located far away in a scene are difficult to detect because they are captured at very low resolution and lack adequate visual detail. This may require the installation of multiple cameras, each responsible for a different region of the room. Moreover, in rooms with dense occupancy patterns and large furniture, the scene is usually heavily cluttered as objects occlude one another. Consequently, segmentation and recognition become harder and detection accuracy can drop. These conditions are typically observed in public and mixed-use buildings, where a high number of people and high occupancy variation make it difficult to estimate the number of occupants and to model occupancy [55].
This paper explores the applicability of vision-based methods to capture occupancy patterns in indoor spaces that are densely populated and highly cluttered. We aim to develop a video-analytical method that uses deep learning architectures to estimate the number of people in large indoor environments with multiple cameras. Based on the gaps in the existing literature discussed above, we focus on indoor spaces that are large enough to require multiple cameras for full visual coverage. For the same reason, we select a highly obstructed and crowded room that also shows large variations in the number of people over time.
We first install IP cameras in a large classroom in an educational building. We develop a counting method that instantaneously counts the number of people in a scene, a second method that incrementally counts the number of people entering or exiting the room through the door, and a third method that combines the data from the former two methods under different conditions. The estimates of the proposed method can be used to generate (sub)hourly data on the number of occupants. We implement and validate our results using video recordings made in the classroom over one week. To make a detailed comparison between the actual and calculated data, ground truth measurements are made on the day with the highest number of occupants. Finally, discrepancies between the calculated and theoretical occupancy counts are quantified. The occupants’ impact on building energy performance (energy use and thermal discomfort) is also calculated through energy simulations, which are presented in the supplementary section.
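The second and third counting ideas, together with the (sub)hourly aggregation, can be pictured with the minimal sketch below. It is not the authors' implementation: the event encoding (+1 for an entry, -1 for an exit), the reliability flag and the fusion rule are assumptions made for illustration, since the conditions under which the two estimates are combined are not detailed in this section.

```python
def door_count(events, initial=0):
    """Running occupancy from door events: +1 for an entry, -1 for an exit."""
    count, series = initial, []
    for step in events:
        # A room cannot hold fewer than zero people, so clip any drift
        # caused by missed entries.
        count = max(0, count + step)
        series.append(count)
    return series

def combine(scene_count, door_estimate, scene_reliable):
    """Hypothetical fusion rule: use the instantaneous scene count when it
    is flagged as reliable, otherwise fall back to the door-based estimate."""
    return scene_count if scene_reliable else door_estimate

def hourly_mean(samples):
    """Average (hour, count) samples into one occupancy value per hour."""
    sums = {}
    for hour, count in samples:
        total, n = sums.get(hour, (0, 0))
        sums[hour] = (total + count, n + 1)
    return {hour: total / n for hour, (total, n) in sums.items()}

running = door_count([1, 1, -1, -1, -1, 1])
profile = hourly_mean([(9, 10), (9, 20), (10, 30)])
```

An incremental count accumulates errors over time, whereas an instantaneous count does not, which is one plausible motivation for combining the two.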
The rest of the paper is organized in four parts. Section 2 describes the experimental setup of the room in which the method is implemented and points to the challenges that our approach specifically addresses. Section 3 proposes the video-analytical method, and Section 4 presents the results of the method. The final section discusses limitations and future work in light of the analyses.
Section snippets
Experimental setup
We conduct our experiments in a classroom located on the second floor of an existing university building (Fig. 1). The total floor area of the classroom is 316 m2, with dimensions of 24 m × 13.1 m and a ceiling height of 5.1 m. The classroom is used as a studio for first-year architecture students. Each student has a desk used for modeling and drawing, so the environment is heavily cluttered with furniture, modeling tools and materials.
The classroom is
Methodology
The main aim of the proposed method is to estimate the number of people in a room from video recordings. To this end, the proposed method estimates the number of heads in a scene using object detection methods that are widely used in computer vision. Background elimination and object tracking are also used to eliminate false alarms and to benefit from temporal continuity among object pairs across frames.
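One simple way in which background elimination can suppress false alarms is to discard head detections that overlap too few foreground pixels. The sketch below illustrates this idea only; it assumes that a binary foreground mask has already been produced by some background subtraction step, and the function names, box format and `min_ratio` threshold are hypothetical rather than taken from the paper.

```python
def foreground_ratio(mask, box):
    """Fraction of foreground pixels inside box = (x0, y0, x1, y1).

    mask is a list of rows of 0/1 values; x1 and y1 are end-exclusive.
    """
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    if area <= 0:
        return 0.0
    hits = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return hits / area

def filter_detections(mask, boxes, min_ratio=0.3):
    """Keep detections whose box contains enough foreground pixels."""
    return [b for b in boxes if foreground_ratio(mask, b) >= min_ratio]

# Toy 4 x 6 mask with foreground only in the top-left 2 x 2 corner.
mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
boxes = [(0, 0, 2, 2), (3, 1, 5, 3)]
kept = filter_detections(mask, boxes)
```

In this toy case the second box lies entirely on background and is rejected, as a head-shaped object on a desk or wall would be.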
For estimating the number of people, three different
Experimental results of the proposed person counting methods
In this section, the performance of the proposed methods in estimating the number of people within the classroom is evaluated. To this end, the ground truth measurement types, evaluation measures and results are explained below.
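The evaluation measures themselves are not listed in this excerpt. As an assumption, the sketch below uses two common choices for comparing an estimated count series with ground truth, the mean absolute error (MAE) and the root mean square error (RMSE); the sample values are invented for illustration and are not results from the paper.

```python
import math

def mae(truth, estimate):
    """Mean absolute error between two equally long count series."""
    return sum(abs(t - e) for t, e in zip(truth, estimate)) / len(truth)

def rmse(truth, estimate):
    """Root mean square error; penalizes large deviations more than MAE."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))

# Invented example: three time-steps of ground truth and estimated counts.
truth = [10, 20, 30]
estimate = [12, 18, 30]
error_mae = mae(truth, estimate)    # (2 + 2 + 0) / 3
error_rmse = rmse(truth, estimate)  # sqrt((4 + 4 + 0) / 3)
```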
Conclusion
Buildings have been identified as one of the major factors influencing energy consumption. It is widely argued in the literature that building performance is highly sensitive to occupant presence, behavior and occupant-related aspects. For reliable and accurate modeling with minimal uncertainties, capturing precise information on occupants is essential. This work presented a vision-based method that estimates the number of people in large indoor environments with multiple cameras by utilizing
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was supported by an Institutional Links grant under the Newton-Katip Celebi partnership, Grant No. 217 M519 by the Scientific and Technological Research Council of Turkey (TUBITAK) and ID [352335596] by British Council, UK. All building occupants who agreed to take part in the experiments are gratefully acknowledged. The authors would also like to thank Sahin Akin, who contributed to data collection.
References (79)
- et al. Predictability of occupant presence and performance gap in building energy simulation. Appl. Energy (2017)
- et al. User behavior in whole building simulation (2009)
- et al. Occupancy measurement in commercial office buildings for demand-driven control applications - A survey and detection system evaluation. Energy Build. (2015)
- et al. Modeling occupant behavior in buildings. Build. Environ. (2020)
- et al. Occupant behavior modeling for building performance simulation: current state and future challenges. Energy Build. (2015)
- et al. A generalised stochastic model for the simulation of occupant presence. Energy Build. (2008)
- et al. A stochastic model of user behaviour regarding ventilation. Build. Environ. (1990)
- et al. Framework for emulation and uncertainty quantification of a stochastic building performance simulator. Appl. Energy (2020)
- et al. Li-BIM, an agent-based approach to simulate occupant-building interaction from the Building-Information Modelling. Eng. Appl. Artif. Intell. (2019)
- et al. ANN based automatic slat angle control of venetian blind for minimized total load in an office building. Sol. Energy (2019)
- Smart lighting system using ANN-IMC for personalized lighting control and daylight harvesting. Build. Environ.
- A scalable Bluetooth Low Energy approach to identify occupancy patterns and profiles in office spaces. Build. Environ.
- A systematic literature review on intelligent automation: aligning concepts from theory, practice, and future perspectives. Adv. Eng. Informatics
- Computer vision for behaviour-based safety in construction: a review and future directions. Adv. Eng. Informatics
- Computer vision techniques for construction safety and health monitoring. Adv. Eng. Informatics
- A deep learning-based approach for mitigating falls from height with computer vision: convolutional neural network. Adv. Eng. Informatics
- Real-time smart video surveillance to manage safety: a case study of a transport mega-project. Adv. Eng. Informatics
- Automated 2D detection of construction equipment and workers from site video streams using histograms of oriented gradients and colors. Autom. Constr.
- Autonomous pro-active real-time construction worker and equipment operator proximity safety alert system. Autom. Constr.
- Tracking multiple construction workers through deep learning and the gradient based method with re-matching based on multi-object tracking accuracy. Autom. Constr.
- A context-augmented deep learning approach for worker trajectory prediction on unstructured and dynamic construction sites. Adv. Eng. Informatics
- Understanding cities with machine eyes: a review of deep computer vision in urban analytics. Cities
- Computer vision based crowd disaster avoidance system: a survey. Int. J. Disaster Risk Reduct.
- Investigating secondary pedestrian-vehicle interactions at non-signalized intersections using vision-based trajectory data. Transp. Res. Part C Emerg. Technol.
- Automated safety diagnosis of vehicle–bicycle interactions using computer vision analysis. Saf. Sci.
- Predictive control of indoor environment using occupant number detected by video data and CO2 concentration. Energy Build.
- Towards a sensor for detecting human presence and characterizing activity. Energy Build.
- Occupancy detection in the office by analyzing surveillance videos and its application to building energy conservation. Energy Build.
- Image-based occupancy positioning system using pose-estimation model for demand-oriented ventilation. J. Build. Eng.
- Review of occupancy sensing systems and occupancy modeling methodologies for the application in institutional buildings. Energy Build.
- Efficient detection under varying illumination conditions and image plane rotations. Comput. Vis. Image Underst.
- Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognit. Lett.
- The impact of climate change on the overheating risk in dwellings - A Dutch case study. Build. Environ.
- Visual privacy protection methods: a survey. Expert Syst. Appl.
- Uncertainty analysis in building performance simulation for design support. Energy Build.
- Machine learning for estimation of building energy consumption and performance: a review. Vis. Eng.
- Verification of stochastic models of window opening behaviour for residential buildings. J. Build. Perform. Simul.
- Coupling stochastic occupant models to building performance simulation using the discrete event system specification formalism. J. Build. Perform. Simul.