Abstract
Developing policies for a greener society calls for understanding the energy consumption patterns of its households. Using data from the Bureau of Labor Statistics 2015 Consumer Expenditure Survey and the U.S. Energy Information Administration, this article examines variations in energy expenditure and consumption patterns in the United States and seeks to determine whether a household’s energy expenditure and use patterns are related to its sociodemographic characteristics. The study begins with a set of sociodemographic characteristics such as housing size, family size, number of cars, and education level, and uses cluster analysis to reduce these variables to a single categorical sociodemographic variable. Analyses of variance are then performed to study differences in energy consumption patterns among the clusters across the United States. Additionally, chi-square tests are applied to study associations between energy use and other defining variables such as geographic region and housing tenure. Notable findings include economies of scale when multiple people live together, larger energy demands of more isolated residences, and lower energy demands of urban blue-collar households. In the face of climate change, there has been growing interest in developing energy conservation goals. With this study, we seek to contribute to that discussion by investigating factors associated with particular energy use patterns.
Acknowledgements
We would like to thank Eric Nord of the Pennsylvania State University for his R expertise and guidance. We would also like to acknowledge Mosuk Chow, also of the Pennsylvania State University, for her continued encouragement during this project. Lastly, we would like to thank the ASA Statistical Computing, Government Statistics, and Statistical Graphics sections, as well as Wendy Martinez, Thesia Garner, Jürgen Symanzik, and the team at the Bureau of Labor Statistics, for sponsoring the 2017 Data Challenge, for which this study was carried out. The cluster, correlation, and chi-square analyses in this project were performed in R using the following packages: nlme, scales, cluster, lsmeans, and multcompView. Documentation for all R packages can be found on their respective pages on the Comprehensive R Archive Network: https://cran.r-project.org/. Multiple imputation and analyses of variance were conducted in SAS using the SURVEYREG, MI, MIANALYZE, and SGPLOT procedures. Documentation for all SAS procedures can be found at SAS Support: https://support.sas.com/en/documentation.html.
Appendices
Appendix 1: redefinition of levels for BLS_URBN, REGION, BUILDING, CUTENURE, FAM_TYPE, and HIGH_EDU
The Bureau of Labor Statistics data dictionary defines and records values for these variables on a numeric scale. Table 8 shows the original BLS definition and the simplification used in this study. See Sect. 3.1.1.
Appendix 2: test results for determining number of clusters k
This appendix reports results from the Elbow, Silhouette, and Gap Statistic methods. As referenced in Sect. 4.2.2, the Elbow (Fig. 12), Silhouette (Fig. 13), and Gap Statistic (Table 9) methods were considered when determining the number of clusters.
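As a rough illustration of the Silhouette criterion mentioned above, the following Python sketch clusters a small toy data set for several candidate values of k and keeps the k with the largest mean silhouette width. It is not the authors' code (the study used R), and the toy points, the function names, and the deterministic farthest-first initialization are all assumptions made for illustration.

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.dist(p, q)

def farthest_first_centers(points, k):
    """Deterministic initialization: start at the first point, then repeatedly
    add the point farthest from its nearest already-chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def mean_silhouette(points, labels):
    """Mean silhouette width; needs at least two clusters and distinct points.
    Singleton clusters contribute 0 by the usual convention."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical toy data: three tight, well-separated groups of four points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (0, 10), (0, 11), (1, 10), (1, 11)]

scores = {k: mean_silhouette(points, kmeans(points, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On real survey data the silhouette curve is usually much flatter than in this toy case, which is one reason to weigh it alongside the Elbow and Gap Statistic results rather than rely on a single criterion.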
Appendix 3: average quantitative sociodemographic characteristics of the six sociodemographic clusters
Table 10 summarizes the characteristics of the household clusters that result from the six-cluster K-Means partitioning. See Sect. 4.2.3.
Appendix 4: 95% confidence intervals (CI) for least squares means of the inverse hyperbolic sine transformed energy consumption levels
This appendix reports results from the two-way ANOVA by region and cluster. Tables 11, 12, 13 and 14 and Figs. 14, 15, 16 and 17 show the results of the Tukey–Kramer procedure, with confidence intervals for the consumption level of each energy type. Table 11 and Fig. 14 present natural gas consumption by region and sociodemographic cluster; Table 12 and Fig. 15 show electricity consumption by region and sociodemographic cluster; Table 13 and Fig. 16 show gasoline and motor oil consumption by region; and Table 14 and Fig. 17 show gasoline and motor oil consumption by sociodemographic cluster. See Sect. 4.3.
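For readers unfamiliar with the inverse hyperbolic sine (IHS) transformation named in this appendix title, the Python sketch below shows the transform and its inverse. The scale parameter `theta` (set to 1 here), the function names, and the sample values are assumptions for illustration; the text above does not state which parameterization the study used.

```python
import math

def ihs(y, theta=1.0):
    """Inverse hyperbolic sine transform: asinh(theta * y) / theta.
    Unlike the logarithm, it is defined at zero (and for negative values)."""
    return math.asinh(theta * y) / theta

def ihs_inverse(z, theta=1.0):
    """Map a value on the IHS scale back to the original scale."""
    return math.sinh(theta * z) / theta

def back_transform_interval(lower, upper, theta=1.0):
    """Back-transform the endpoints of an interval computed on the IHS scale.
    The transform is monotone, so the ordering of the endpoints is preserved."""
    return ihs_inverse(lower, theta), ihs_inverse(upper, theta)

# Hypothetical consumption values, including a household with zero use.
consumption = [0.0, 12.5, 480.0, 1000.0]
transformed = [ihs(y) for y in consumption]
print(transformed)
```

For large values the transform behaves like ln(2y), so results on the IHS scale can be read much like results on a log scale. Back-transformed interval endpoints describe the transformed mean and should not be interpreted as an interval for the arithmetic mean of the original values.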
Appendix 4.1: natural gas consumption by region and sociodemographic cluster
Appendix 4.2: electricity consumption by region and sociodemographic cluster
Appendix 4.3: gasoline and motor oil consumption by region
Appendix 4.4: gasoline and motor oil consumption by sociodemographic cluster
Appendix 5: variance and efficiency tables
To address the incomplete observations in the data, a multiple imputation (MI) procedure was applied as described in Sect. 3.2.1. Tables 15, 16, 17 and 18 of this appendix show the variance and efficiency values from this imputation and the analyses of variance: Table 15 presents the values from the analysis of natural gas consumption by region and sociodemographic cluster; Table 16, electricity consumption by region and sociodemographic cluster; Table 17, gasoline and motor oil consumption by region; and Table 18, gasoline and motor oil consumption by sociodemographic cluster. See Sect. 4.3.
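The variance and efficiency quantities that are typically reported after multiple imputation can be sketched with Rubin's combining rules. The Python example below is a minimal illustration under stated assumptions, not the authors' SAS workflow: the function name `pool_rubin` and the three per-imputation estimates and variances are hypothetical, and the formulas are the standard ones for a scalar estimate.

```python
import statistics

def pool_rubin(estimates, variances):
    """Combine one scalar estimate across m imputed data sets (Rubin's rules).
    estimates: point estimate from each imputed data set
    variances: squared standard error from each imputed data set"""
    m = len(estimates)
    q_bar = statistics.mean(estimates)       # pooled point estimate
    within = statistics.mean(variances)      # within-imputation variance W
    between = statistics.variance(estimates) # between-imputation variance B (divisor m - 1)
    total = within + (1 + 1 / m) * between   # total variance T
    if between == 0:
        # No variation across imputations: no information lost to missingness.
        return {"estimate": q_bar, "within": within, "between": between,
                "total": total, "fmi": 0.0, "relative_efficiency": 1.0}
    r = (1 + 1 / m) * between / within       # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2          # degrees of freedom
    fmi = (r + 2 / (df + 3)) / (r + 1)       # fraction of missing information
    rel_eff = 1 / (1 + fmi / m)              # efficiency relative to infinitely many imputations
    return {"estimate": q_bar, "within": within, "between": between,
            "total": total, "fmi": fmi, "relative_efficiency": rel_eff}

# Hypothetical results from m = 3 imputed data sets.
pooled = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled)
```

A relative efficiency close to 1 suggests that the chosen number of imputations loses little precision compared with using many more.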
About this article
Cite this article
Meechai, J., Wijesinha, M. Household energy expenditure and consumption patterns in the United States. Comput Stat 37, 2095–2127 (2022). https://doi.org/10.1007/s00180-022-01255-y
DOI: https://doi.org/10.1007/s00180-022-01255-y