Predicting body fat percentage from anthropometric and laboratory measurements using artificial neural networks
Introduction
Obesity [1] is widely considered to be one of the most important current public health problems, owing on the one hand to its continuously increasing prevalence in the developed world (affecting both adults [2], [3] and children [4], [5]) and on the other hand to the seriousness of the health risks it gives rise to. An increased risk of a number of diseases has been causally linked to obesity, including type 2 diabetes mellitus, hypertension, ischaemic heart disease, stroke, infertility, osteoarthritis, liver and gallbladder disease and certain tumors [6]. Not surprisingly, obesity also increases all-cause mortality [7] and poses a significant economic burden as well [8], [9].
Screening for the disease and accurate tracking of its severity in those already affected both underline the importance of measuring obesity accurately. This is, however, not a trivial task: the definition of obesity (“condition of excess body fat” [1, p. 3]) does not directly give rise to any quantitative metric. Weight is a straightforward proxy for body fat and is easy to measure, but it is almost meaningless without information on the overall stature of the person. Usually height is used for that purpose, leading to indicators such as body mass index (BMI) [10], which is endorsed by the World Health Organization [11] and is so widely used that even the definition of obesity is sometimes linked to it.
It is, however, well known that these indicators, even though they take stature into account, often perform poorly [12] in predicting health outcomes, because they do not measure body fat itself, much less its distribution. The latter is also known to be prognostic: visceral fat, i.e. abdominal obesity, is especially associated with negative outcomes [13]. Measures such as waist circumference or waist-to-hip ratio try to correct for this aspect [14].
A much better approach would be the direct measurement of body fat mass itself, or of body fat percentage (BFP), i.e. body fat mass divided by body weight. This is hindered, however, by the fact that the measurement is difficult and therefore unfit for wider use. (Precise methods include dual energy X-ray absorptiometry (DXA), bioelectrical impedance analysis (BIA) and air displacement plethysmography [15].)
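As a minimal illustration of the two quantities contrasted so far, the following Python sketch computes BMI (weight in kilograms divided by the square of height in metres) and BFP (body fat mass divided by body weight, expressed here as a fraction). The function names and the example person are hypothetical and are not taken from the study.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def body_fat_percentage(fat_mass_kg: float, weight_kg: float) -> float:
    """BFP as a fraction in [0, 1]: body fat mass divided by body weight."""
    if weight_kg <= 0 or not 0 <= fat_mass_kg <= weight_kg:
        raise ValueError("need weight > 0 and 0 <= fat mass <= weight")
    return fat_mass_kg / weight_kg


# Hypothetical person: 80 kg, 1.80 m tall, 20 kg of body fat.
bmi = body_mass_index(80.0, 1.80)
bfp = body_fat_percentage(20.0, 80.0)
print(f"BMI = {bmi:.1f} kg/m^2, BFP = {bfp:.0%}")
```

Note that BMI needs only a scale and a stadiometer, whereas the fat mass entering BFP has to come from one of the measurement methods listed above.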
It would therefore be valuable if BFP could be predicted from easily measurable parameters such as basic sociodemographic data (age, gender), basic anthropometric data (weight, height, waist circumference) and basic laboratory parameters obtained from routine blood tests. The rationale for this last component is that obesity is associated with a state of systemic inflammation [16] and has been shown to be associated with changes in clinical chemistry parameters [17]. It is therefore intuitively appealing to include these parameters too.
The aim of the present research is to investigate how well BFP can be predicted from these parameters. That is, clinical prediction models [18] were built and validated from these variables using an empirical database (where BFP was measured with BIA as gold standard).
The aim was not to analyse the relationships between the predictor variables and the response in order to provide new clinical knowledge, but rather to build a purely predictive model. This allowed us to use modern tools of soft computing (machine learning), which are black-box in that sense but might provide better predictions. Neural networks were chosen as an example, in particular ordinary multi-layer feedforward neural networks [19] and support vector machines [20]. As a comparison, linear regression was used, illustrating the more traditional biostatistical approach.
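To make the linear regression baseline concrete, here is a minimal sketch of ordinary least squares fitted through the normal equations, using only the Python standard library. The toy data, the choice of BMI and age as predictors, and all function names are illustrative assumptions; this is not the study's dataset or code.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, coef_1, ...] minimising the squared error."""
    design = [[1.0, *r] for r in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs: Sequence[float], row: Sequence[float]) -> float:
    """Linear prediction: intercept plus weighted sum of predictors."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Hypothetical toy data: (BMI, age) -> BFP as a fraction.
X = [(22.0, 30.0), (27.0, 45.0), (31.0, 50.0), (24.0, 60.0), (35.0, 40.0)]
y = [0.20, 0.28, 0.33, 0.27, 0.36]
coefs = fit_ols(X, y)
print("intercept and slopes:", [round(c, 4) for c in coefs])
print("prediction for BMI 29, age 55:", round(predict(coefs, (29.0, 55.0)), 3))
```

Each fitted coefficient can be read directly as the expected change in BFP per unit change of its predictor, which is the sense in which such a model is interpretable; a neural network or support vector machine offers no comparably simple reading.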
Database
The National Health and Nutrition Examination Survey (NHANES) is now a continuous American public health program, with results published in two-year cycles [21]. It is a nationwide survey designed to be representative of the whole civilian non-institutionalized US population by employing a complex, stratified multi-stage probability sampling plan. The amount of collected data is tremendous (although sometimes varying from cycle to cycle), including demographic data, physical examination, collection
Results
The results of the linear regression for the whole database are shown in Fig. 2.
This illustrates that this model is interpretable, i.e. a clinical meaning can be associated with its results. (But note that it is the model for the whole dataset, without validation.)
Results of the parameter search, and the fitted vs. actual plot of the best model can be seen in Fig. 3 for the ordinary feedforward neural network and in Fig. 4 for the support vector machine. As the figures show, the optimal
Discussion
Results obtained with modern soft computing techniques were not convincingly better than linear regression.
Only the SVM clearly outperformed simple regression, and its advantage lay in the stability of the results across bootstrap replicates rather than in a substantially better average value (RMSE: 0.0988 ± 0.00288 vs. 0.107 ± 0.012). Feedforward neural networks had an average performance nearly identical to that of regression, with only minimally improved stability (RMSE: 0.108 ± 0.00928).
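The "mean ± spread across bootstrap replicates" style of reporting can be sketched as follows. The exact validation scheme of the study is not given in this excerpt, so the out-of-bag evaluation below is an assumption, as are the synthetic data, the one-predictor stand-in model and all names.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line y = a + b*x (a stand-in for any model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # degenerate resample: predict the mean
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bootstrap_rmse(xs, ys, n_replicates=200, seed=0) -> List[float]:
    """Out-of-bag RMSE for each bootstrap replicate (assumed scheme)."""
    rng = random.Random(seed)
    n = len(xs)
    scores = []
    for _ in range(n_replicates):
        train = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        chosen = set(train)
        test = [i for i in range(n) if i not in chosen]  # out-of-bag cases
        if not test:
            continue  # no held-out cases to score in this replicate
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in test]
        scores.append(rmse([ys[i] for i in test], preds))
    return scores


# Hypothetical synthetic data: BFP (fraction) rising with BMI plus noise.
data_rng = random.Random(42)
bmi = [data_rng.uniform(18.0, 40.0) for _ in range(300)]
bfp = [0.01 * x + data_rng.gauss(0.0, 0.05) for x in bmi]

scores = bootstrap_rmse(bmi, bfp)
print(f"RMSE: {statistics.fmean(scores):.4f} ± {statistics.stdev(scores):.4f}")
```

The mean of the replicate scores summarises average performance, while their standard deviation reflects stability across replicates; these are the two aspects on which the models are compared above.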
Conclusions
The soft computing methods investigated in the present study were not able to substantially outperform simple linear regression. While the support vector machine did exhibit some advantage (although even it had an R² of only 44%), this is offset overall by the fact that regression models are clinically interpretable, i.e. “white-box”. While exploration of the fitted model is possible for some modern machine learning methods, we aimed to focus on very well established, classical methods. Inclusion of other
Funding sources
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Author declaration
There are no conflicts of interest associated with this publication and there has been no financial support for this work that could have influenced its outcome.
Acknowledgements
Tamás Ferenci was supported by the UNKP-16-4/III New National Excellence Program of the Ministry of Human Capacities.
References (43)
- et al. Global, regional, and national prevalence of overweight and obesity in children and adults during 1980–2013: a systematic analysis for the Global Burden of Disease Study 2013. Lancet (2014)
- et al. Health and economic burden of the projected obesity trends in the USA and the UK. Lancet (2011)
- et al. Indices of relative weight and obesity. J. Chron. Dis. (1972)
- et al. Indices of abdominal obesity are better discriminators of cardiovascular risk factors than BMI: a meta-analysis. J. Clin. Epidemiol. (2008)
- et al. The assessment of obesity: methods for measuring body fat and global prevalence of obesity. Best Pract. Res. Clin. Endocrinol. Metab. (1999)
- et al. Healthy percentage body fat ranges: an approach for developing guidelines based on body mass index. Am. J. Clin. Nutr. (2000)
- et al. The relationship between BMI and percent body fat, measured by bioelectrical impedance, in a large adult sample is curvilinear and influenced by age and sex. Clin. Nutr. (2010)
- et al. Predicting body fat percentage based on gender, age and BMI by using artificial neural networks. Comput. Methods Programs Biomed. (2014)
- et al. Obesity: Epidemiology, Pathophysiology, and Prevention (2012)
- et al. Prevalence of obesity and trends in the distribution of body mass index among US adults, 1999–2010. JAMA (2012)
- Epidemiology of obesity in the western hemisphere. J. Clin. Endocrinol. Metab.
- Prevalence of obesity and trends in body mass index among US children and adolescents, 1999–2010. JAMA
- Health risks associated with overweight and obesity. Obes. Rev.
- Association of all-cause mortality with overweight and obesity using standard body mass index categories: a systematic review and meta-analysis. JAMA
- The economic burden of obesity worldwide: a systematic review of the direct costs of obesity. Obes. Rev.
- Technical report series 894: Obesity: Preventing and Managing the Global Epidemic
- Body mass index, waist circumference and waist:hip ratio as predictors of cardiovascular risk – a review of the literature. Eur. J. Clin. Nutr.
- Abdominal obesity and the metabolic syndrome: contribution to global cardiometabolic risk. Arterioscler. Thromb. Vasc. Biol.
- Inflammation, oxidative stress, and obesity. Int. J. Mol. Sci.
- Two applications of biostatistics in the analysis of pathophysiological processes
- Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating