Full length article
Vision-based estimation of the number of occupants using video cameras

https://doi.org/10.1016/j.aei.2022.101662

Highlights

  • A vision-based approach using deep learning architectures to estimate people count.

  • Two methods that instantaneously and incrementally count people are combined.

  • The method was tested in a large, crowded and severely occluded classroom.

  • Results show that our method has high predictive capacity.

  • Future work should address high computational cost and privacy preservation.

Abstract

Although occupancy information is critical to the energy consumption of existing buildings, it remains a major source of uncertainty. For reliable and accurate occupant modeling with minimal uncertainties, capturing precise information on occupants is essential. This paper proposes a computer vision-based approach that utilizes deep learning architectures to estimate the number of people in large, crowded spaces using multiple cameras. Various vision techniques (head detection, background elimination, head tracking) are implemented in three methods: (i) a method that instantaneously counts people in a scene, (ii) a method that incrementally counts people entering/exiting a room and (iii) a combination of the first two methods. These methods were applied in a classroom with heavy occlusions, and showed a high prediction capacity when compared to ground truth measurements. Future work in video-analytical approaches can address lowering the computational cost of analysis, capturing occupancy data in complex room geometries and addressing concerns in privacy preservation.

Introduction

The building industry has been shifting towards improved building sustainability, while the tools and methods for evaluating building performance have been gaining increased importance. Simulation tools are widely used in analyzing building performance and in the quantitative benchmarking of design alternatives to improve performance across a range of relevant criteria. As such, various decisions on the selection of systems and components, form explorations, equipment sizing or spatial organization can be guided by simulation results. Similarly, during building retrofit, simulations are used to verify whether a building fulfils its performance requirements, and to explore various interventions to improve performance.

An extensive amount of building information is required in an energy model to describe a building to a sufficient level of detail. However, there are many assumptions and uncertainties underlying performance simulations that may not fully match reality, resulting in a ‘performance gap’ [1]. Building-related uncertainties are mainly due to occupant behavior patterns, such as metabolic heat gains, set points, use of equipment/lighting, operation of windows and shading devices [2]. The unpredictable nature of occupants, their interaction with building systems/components, and their lifestyle choices are critical factors that introduce uncertainties into the results and contribute to the discrepancy between actual and calculated energy use [3], [4]. Accurate representations of occupancy are critical for the correct estimation of a variety of factors including occupant-produced air contaminants, metabolic heat gains and other model dynamics related to human-building interaction. Fine-grained occupancy information with high temporal resolution is also crucial in demand-driven control applications, and can lead to improved performance in lighting, heating, ventilation and air conditioning as well as improved space utilization [5].

Accurate occupancy modeling has become a crucial aspect for optimal decision making in efficient energy management and smart refurbishments [6]. For this, predictive and descriptive models have emerged for the estimation of occupancy, appliances, a variety of loads and operation hours. Descriptive models have the primary aim of specifying important and realistic occupant characteristics. Predictive models, on the other hand, aim to forecast occupant behavior at future time-steps for application in building control [7]. Stochastic models and ML-based methods are widely used in predictive models. Stochastic approaches make predictions by assigning uncertainties to existing deterministic schedules and by establishing correlations between environmental variables (e.g. illumination, temperature) and the likelihood of occupants engaging in adaptive actions (e.g. changing setpoints, turning lights on). Stochastic models can introduce various degrees of uncertainty, presenting variation to an otherwise uniform occupancy pattern. Such models uncover statistical relationships between environmental factors and the targeted operations by processing large amounts of observed data [8]. The use of stochastic models can support the prediction of the number of people in a zone [9], window opening behavior [10], [11] and occupant behavior [12]. Another type of stochastic modeling is the agent-based approach, which recreates the actual behavior of occupants as autonomous decision-making entities that interact with each other and with building systems [13]. Langevin et al. developed a system in which agents with comfort desires modify clothing and the use of windows, fans and heaters [13]. A multi-agent stochastic simulator was proposed by Wate et al. for heating and cooling load predictions [14]. Mocolier et al. developed an agent-based approach that simulates the reactive, deliberative and social behavior of occupants based on the Belief–Desire–Intention architecture [15].

Machine learning (ML) methods (e.g. artificial neural networks, support vector machines, decision trees) are also used for predictive modeling. ML approaches are known to be based on black-box models that can substitute for knowledge-driven models embodying domain-specific knowledge. Some examples include the use of ML algorithms to predict changes in solar shading states [16], artificial lighting [17], and equipment use [18].
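
As a minimal illustration of the stochastic idea described above (assigning uncertainty to an existing deterministic schedule), the following sketch perturbs a fixed hourly occupancy schedule with multiplicative Gaussian noise. The schedule values, the room capacity, the noise model and the function name `perturb_schedule` are all assumptions made for illustration; they are not taken from any of the cited models.

```python
import random


def perturb_schedule(schedule, capacity, spread=0.15, seed=None):
    """Draw one stochastic occupancy profile from a deterministic schedule.

    schedule: hourly occupancy fractions in [0, 1] (hypothetical values).
    capacity: maximum number of occupants in the zone (hypothetical).
    spread:   standard deviation of the relative Gaussian noise (assumed).
    """
    rng = random.Random(seed)
    profile = []
    for fraction in schedule:
        # Multiplicative noise keeps unoccupied hours at exactly zero.
        noisy = fraction * (1.0 + rng.gauss(0.0, spread))
        noisy = min(1.0, max(0.0, noisy))  # clamp to a valid fraction
        profile.append(round(noisy * capacity))
    return profile


# Hypothetical deterministic schedule for ten consecutive hours.
deterministic = [0.0, 0.0, 0.5, 1.0, 1.0, 0.5, 1.0, 1.0, 0.5, 0.0]
sample = perturb_schedule(deterministic, capacity=60, seed=1)
print(sample)
```

Repeated draws with different seeds give an ensemble of occupancy profiles around the same schedule, which is the kind of variation that a single uniform schedule cannot express.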

Although predictive approaches can generate realistic occupancy time-series, their validity and reliability are challenged when they have a weak empirical basis [19]. These models may even fail when a priori knowledge on the probabilistic relationships regarding occupants and building-related factors (i.e. attributes, behavioral rules, memory, resources, decision-making sophistication, and rules for modifying current behavioral rules) is missing [8]. ML approaches also require empirical evidence to train models for the prediction of temporal occupancy sequences. The lack of empirical data is a serious obstacle for rooms that offer restricted opportunities for long-term, large-scale in-situ monitoring, which would otherwise increase the capacity to generalize across populations, building types, locations, and climates [20]. Also, rooms with a high number of occupants having diverse, unknown motivations and interactions, and mixed-use building types that do not have historical data for model training, are challenging cases for occupant modeling. The systematic collection of observational data can augment the development of descriptive, stochastic and ML models that are empirically grounded [8], [21].

The real-time detection and localization of occupants is also important for smart building applications that require immediate occupant information, such as occupancy-driven control systems, model predictive control systems or responsive building components. Real-time occupant detection requires a sensing infrastructure for data collection [7]. Among the available options, wearable sensors or smartphones (e.g. Radio Frequency Identification (RFID), Wi-Fi or Bluetooth Low Energy (BLE)) can be used to track occupants’ movement patterns, but can be disadvantageous as they can intrude on the natural behaviors of occupants [22]. Other monitoring techniques may involve measurements through environmental sensor networks (e.g. indoor CO2, CO, TVOC, small particulates, temperature, humidity). These are called “proxy” measurements, which are based on constitutive models that utilize the spatial and physical features in a room to infer occupant presence [23]. However, according to the same source, proxy measurements may require comprehensive sensor infrastructures, sensor calibrations with a high maintenance cost and inference models that are dependent upon sensor location, link function and latent factors.

Computer vision methods offer a robust alternative, using video cameras for data collection and advanced video analysis techniques. The artificial intelligence methods that are also utilized in computer vision can transcend traditional decision making processes to achieve high levels of operational efficiency, decision quality and system reliability [24]. Computer vision methods are already widely used in construction sites for worker safety [25], [26], [27], [28], [29], [30], [31], [32], worker detection [26], ergonomic posture recognition [27], multiple worker tracking [33], automated productivity analysis [25], worker trajectory prediction [34] and activity analysis [26]. Similarly, vision-based methods are being utilized in urban analytics to exploit deep learning and computer vision technologies using image-based or video-based data resources [35]. Specifically, vision-based object detection methods can address a wide range of urban issues such as neighborhood safety [36], crowd disaster avoidance [37], pedestrian risk analysis in traffic [38], pedestrian-vehicle interactions [39], vehicle–bicycle interactions [40], pedestrian flow statistics [41], commercial activeness [42], the perception of urban environment [43] and the quality and impact of urban appearance [44].

During building energy modeling, computer vision methods can facilitate the fast analysis of temporal and spatial video content and increase the accuracy of simulation results. Wang et al. proposed a vision-based method for occupant number detection using data fusion of video and CO2 concentration for the predictive control of indoor environment [45]. Majumder et al. developed an occupancy logger with vision sensors that collect occupancy data under different movement scenarios and illumination conditions [46]. Balaji et al. proposed a cascading video analysis algorithm based on support vector machine (SVM), convolutional neural network (CNN) and K-means clustering to provide fine-grained occupancy-based HVAC actuation [47]. Liu et al. proposed a two-stage DBN-based fusion method based on multiple vision sensors with video recordings and motion sensors [48]. Benezeth et al. proposed a vision-based system for human detection and activity analysis with change detection, tracking and recognition [49]. Erickson et al. used wireless camera sensor networks for occupancy mobility detection with Gaussian and agent-based models [50]. Chen et al. developed an algorithm for head contour based detection using a novel blob segmentation method [51]. Zou et al. developed a video analysis technique combining deep learning and traditional artificial features [52]. Wang et al. proposed fusion techniques using a combination of active RFID and video cameras for occupancy monitoring [53]. Wang et al. proposed an image-based system that uses a human pose estimation model providing positioning and orientation information in real time [54].

While existing computer vision-based methods targeting occupant counting have achieved certain degrees of success, they did not address some key challenges regarding object recognition. The existing approaches were applied in smaller rooms (<100 m²) with few occupants. However, the detection of occupants located far away in a scene is difficult as they are captured in very low resolution and lack an adequate amount of visual detail. This might require the installation of multiple cameras that are responsible for different regions in the room. Moreover, in rooms with dense occupancy patterns and large furniture, the scene is usually heavily cluttered as objects occlude one another. Consequently, segmentation and recognition can be challenged and the detection accuracy can be reduced. These conditions are typically observed in public and mixed-use buildings, where a high number of people with high occupancy variation makes it difficult to estimate the number of occupants and to model occupancy [55].

This paper explores the applicability of vision-based methods to capture occupancy patterns in indoor spaces that are densely populated and highly cluttered. We aim to develop a video-analytical method that utilizes deep learning architectures to estimate the number of people in large indoor environments with multiple cameras. Based on the gaps in the existing literature discussed above, we particularly focus on indoor spaces that are large enough to necessitate multiple cameras for complete visual coverage. For the same reason, a highly obstructed and crowded room is selected that also shows large variations in the number of people over time.

We first install IP cameras in a large classroom in an educational building. We develop a counting method that instantaneously counts the number of people in a scene, a second method that incrementally counts the number of people entering or exiting a room through the door, and a third method that combines the data from the former two methods under different conditions. By utilizing the estimates of the proposed method, (sub)hourly data on the number of occupants can be generated. We implement the methods and validate the results using one week of video recordings from the classroom. To make a detailed comparative analysis between the actual and calculated data, ground truth measurements are made during the day with the highest number of occupants. Finally, discrepancies between the calculated and theoretical occupancy count are calculated. The occupants’ impact on building energy performance (energy use and thermal discomfort) is also calculated through energy simulations, which are presented in the supplementary section.
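
The relationship between the three counting methods can be sketched as follows. This is only an illustration under stated assumptions: the per-step door flows and in-scene counts are invented, the fusion rule in `combine` (fall back to the in-scene count when the door-based count drifts by more than `max_gap`) is a hypothetical stand-in for the unspecified conditions under which the two methods are combined, and `hourly_mean` is one simple way to obtain (sub)hourly values.

```python
def incremental_counts(net_door_flow, start=0):
    """Running occupancy from door events (entries minus exits per step)."""
    counts, current = [], start
    for flow in net_door_flow:
        current = max(0, current + flow)  # occupancy cannot be negative
        counts.append(current)
    return counts


def combine(instant, incremental, max_gap=5):
    """Hypothetical fusion rule (an assumption, not the paper's rule).

    Keep the door-based count while it stays within max_gap of the
    in-scene count; otherwise fall back to the in-scene count.
    """
    fused = []
    for scene, door in zip(instant, incremental):
        fused.append(door if abs(door - scene) <= max_gap else scene)
    return fused


def hourly_mean(counts, samples_per_hour):
    """Average consecutive samples into one value per hour."""
    means = []
    for i in range(0, len(counts), samples_per_hour):
        chunk = counts[i:i + samples_per_hour]
        means.append(sum(chunk) / len(chunk))
    return means


# Invented example data: three samples per hour, two hours.
net_flow = [3, 2, 0, -1, 4, 0]          # entries minus exits at the door
instant = [3, 4, 5, 4, 15, 14]          # heads counted in the scene
incremental = incremental_counts(net_flow)
fused = combine(instant, incremental, max_gap=5)
print(incremental, fused, hourly_mean(fused, 3))
```

In this sketch the door-based count accumulates any missed entry or exit, whereas the in-scene count has no memory, which is one reason a combination of the two can be attractive.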

The rest of the paper is organized in four parts. Section 2 outlines the experimental setup of the room in which the method is implemented and points to the challenges that our approach specifically addresses. Section 3 proposes the video-analytical method, and Section 4 presents the results of the method. The final section discusses limitations and future work in light of the provided analyses.

Section snippets

Experimental setup

We conduct our experiments in a classroom located on the second floor of an existing university building (Fig. 1). The total floor area of the classroom is 316 m², with dimensions 24 m × 13.1 m and a ceiling height of 5.1 m. The classroom is used as a studio for first-year architecture students. In the classroom, each student has a drawing desk used for modeling and drawing. This means that the environment is heavily cluttered with furniture, modeling tools and materials.

The classroom is

Methodology

The main aim of the proposed method is to estimate the number of people in a room from video recordings. For this aim, the proposed method estimates the number of heads of people in a scene using object detection methods that are widely used in computer vision. Moreover, background elimination and object tracking are also utilized in order to eliminate false alarms and to benefit from temporal continuity among object-pairs in each frame.
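
A minimal sketch of how these ingredients could interact is given below. It assumes that a head detector has already produced bounding boxes `(x1, y1, x2, y2)` and that a background model has produced a binary foreground mask; both inputs, the thresholds and all function names are hypothetical. Detections with too little foreground support are discarded as likely false alarms, and the remaining boxes are associated between consecutive frames by greedy intersection-over-union (IoU) matching, which is one simple tracking strategy and not necessarily the one used in this work.

```python
def foreground_ratio(box, mask):
    """Fraction of pixels inside the box that the mask marks as foreground."""
    x1, y1, x2, y2 = box
    pixels = [mask[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    return sum(pixels) / len(pixels) if pixels else 0.0


def filter_detections(boxes, mask, min_ratio=0.3):
    """Drop detections that lie mostly on the background (assumed rule)."""
    return [b for b in boxes if foreground_ratio(b, mask) >= min_ratio]


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def match_tracks(previous, current, min_iou=0.3):
    """Greedily pair each previous box with its best unused current box."""
    pairs, used = [], set()
    for i, prev_box in enumerate(previous):
        best_j, best_score = None, min_iou
        for j, cur_box in enumerate(current):
            score = iou(prev_box, cur_box)
            if j not in used and score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs


# Invented 6 x 8 foreground mask: a 4 x 4 foreground patch at the top left.
mask = [[1 if (x < 4 and y < 4) else 0 for x in range(8)] for y in range(6)]
detections = [(0, 0, 4, 4), (4, 2, 8, 6)]   # second box lies on background
kept = filter_detections(detections, mask)
pairs = match_tracks(kept, [(1, 0, 5, 4)])  # same head, shifted by one pixel
print(kept, pairs)
```

A purely motion-based mask would also suppress occupants who sit still for long periods, so any practical threshold would need to be tuned against that risk.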

For estimating the number of people, three different

Experimental results of the proposed person counting methods

In this section, the proposed methods’ performance in the estimation of the number of people within the classroom is evaluated. For this aim, ground truth measurement types, evaluation measures and results are explained below.
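
The snippet above does not name the evaluation measures that are used. As an assumption, the sketch below shows two measures commonly applied when estimated counts are compared with ground truth counts: the mean absolute error (MAE) and the root mean squared error (RMSE). The count values are invented for illustration.

```python
import math


def mean_absolute_error(truth, estimate):
    """Average absolute difference between true and estimated counts."""
    return sum(abs(t - e) for t, e in zip(truth, estimate)) / len(truth)


def root_mean_squared_error(truth, estimate):
    """Square root of the average squared difference; penalizes large misses."""
    squared = sum((t - e) ** 2 for t, e in zip(truth, estimate))
    return math.sqrt(squared / len(truth))


# Invented counts at four observation times.
ground_truth = [10, 20, 30, 40]
estimated = [12, 18, 33, 40]
mae = mean_absolute_error(ground_truth, estimated)
rmse = root_mean_squared_error(ground_truth, estimated)
print(mae, rmse)
```

Here the absolute errors are 2, 2, 3 and 0, so the MAE is 1.75 people, while the RMSE is slightly higher because it weights the largest miss more heavily.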

Conclusion

Buildings have been identified as one of the major factors influencing energy consumption. It is widely argued in the literature that building performance is highly sensitive to occupant presence, behavior and occupant-related aspects. For reliable and accurate modeling with minimal uncertainties, capturing precise information on occupants is essential. This work presented a vision-based method that estimates the number of people in large indoor environments with multiple cameras by utilizing

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by an Institutional Links grant under the Newton-Katip Celebi partnership, Grant No. 217 M519 by the Scientific and Technological Research Council of Turkey (TUBITAK) and ID [352335596] by British Council, UK. All building occupants who agreed to take part in the experiments are gratefully acknowledged. The authors would also like to thank Sahin Akin, who contributed to data collection.

References (79)

  • N.K. Kandasamy et al., Smart lighting system using ANN-IMC for personalized lighting control and daylight harvesting, Build. Environ. (2018)
  • Z.D. Tekler et al., A scalable Bluetooth Low Energy approach to identify occupancy patterns and profiles in office spaces, Build. Environ. (2020)
  • K.K.H. Ng et al., A systematic literature review on intelligent automation: aligning concepts from theory, practice, and future perspectives, Adv. Eng. Informatics (2021)
  • W. Fang et al., Computer vision for behaviour-based safety in construction: a review and future directions, Adv. Eng. Informatics (2020)
  • J. Seo et al., Computer vision techniques for construction safety and health monitoring, Adv. Eng. Informatics (2015)
  • W. Fang et al., A deep learning-based approach for mitigating falls from height with computer vision: convolutional neural network, Adv. Eng. Informatics (2019)
  • H. Luo et al., Real-time smart video surveillance to manage safety: a case study of a transport mega-project, Adv. Eng. Informatics (2020)
  • M. Memarzadeh et al., Automated 2D detection of construction equipment and workers from site video streams using histograms of oriented gradients and colors, Autom. Constr. (2013)
  • J. Teizer et al., Autonomous pro-active real-time construction worker and equipment operator proximity safety alert system, Autom. Constr. (2010)
  • O. Angah et al., Tracking multiple construction workers through deep learning and the gradient based method with re-matching based on multi-object tracking accuracy, Autom. Constr. (2020)
  • J. Cai et al., A context-augmented deep learning approach for worker trajectory prediction on unstructured and dynamic construction sites, Adv. Eng. Informatics (2020)
  • M.R. Ibrahim et al., Understanding cities with machine eyes: a review of deep computer vision in urban analytics, Cities (2020)
  • B. Yogameena et al., Computer vision based crowd disaster avoidance system: a survey, Int. J. Disaster Risk Reduct. (2017)
  • T. Fu et al., Investigating secondary pedestrian-vehicle interactions at non-signalized intersections using vision-based trajectory data, Transp. Res. Part C Emerg. Technol. (2019)
  • T. Sayed et al., Automated safety diagnosis of vehicle–bicycle interactions using computer vision analysis, Saf. Sci. (2013)
  • F. Wang et al., Predictive control of indoor environment using occupant number detected by video data and CO2 concentration, Energy Build. (2017)
  • Y. Benezeth et al., Towards a sensor for detecting human presence and characterizing activity, Energy Build. (2011)
  • J. Zou et al., Occupancy detection in the office by analyzing surveillance videos and its application to building energy conservation, Energy Build. (2017)
  • H. Wang et al., Image-based occupancy positioning system using pose-estimation model for demand-oriented ventilation, J. Build. Eng. (2021)
  • J. Yang et al., Review of occupancy sensing systems and occupancy modeling methodologies for the application in institutional buildings, Energy Build. (2016)
  • M. Osadchy et al., Efficient detection under varying illumination conditions and image plane rotations, Comput. Vis. Image Underst. (2004)
  • Z. Zivkovic et al., Efficient adaptive density estimation per image pixel for the task of background subtraction, Pattern Recognit. Lett. (2006)
  • M. Hamdy et al., The impact of climate change on the overheating risk in dwellings—A Dutch case study, Build. Environ. (2017)
  • J.R. Padilla-López et al., Visual privacy protection methods: a survey, Expert Syst. Appl. (2015)
  • P. De Wilde, Y. Sun, G. Augenbroe, Quantifying the performance gap-An initial probabilistic attempt, in: Eur. Gr....
  • C.J. Hopfe et al., Uncertainty analysis in building performance simulation for design support, Energy Build. (2011)
  • S. Seyedzadeh et al., Machine learning for estimation of building energy consumption and performance: a review, Vis. Eng. (2018)
  • M. Schweiker et al., Verification of stochastic models of window opening behaviour for residential buildings, J. Build. Perform. Simul. (2012)
  • H.B. Gunay et al., Coupling stochastic occupant models to building performance simulation using the discrete event system specification formalism, J. Build. Perform. Simul. (2014)