Elsevier

Applied Soft Computing

Volume 67, June 2018, Pages 834-839
Predicting body fat percentage from anthropometric and laboratory measurements using artificial neural networks

https://doi.org/10.1016/j.asoc.2017.05.063

Highlights

  • Body fat percentage is predicted from easily measurable data to quantify obesity risk.

  • Linear regression, neural networks and support vector machines are used.

  • Models built on empirical data from a representative US health survey (n = 862).

  • Optimal parameters are chosen and bootstrap validation is used.

  • Support vector machines slightly outperform linear regression, but neural networks do not.

Abstract

Purpose of the research

Obesity is a major public health problem with rapidly growing prevalence and serious associated health risks. Because obesity is characterized by excess body fat, measuring it accurately is a non-trivial question. Widely used indicators, such as the body mass index, often predict actual risk poorly, but the direct measurement of body fat mass is complicated. The aim of the present research is to investigate how well body fat percentage can be predicted from easily measurable data: age, gender, weight, height, waist circumference and different laboratory results. To that end, linear regression, feedforward neural networks and support vector machines are applied to the data of adult males from a representative US health survey (n = 862). Optimal parameters are chosen and bootstrap validation is used to get realistic error estimates.

Results

None of the methods predicted body fat percentage well, but support vector machines slightly outperformed feedforward neural networks and linear regression (root mean square error 0.0988 ± 0.00288, 0.108 ± 0.00928 and 0.107 ± 0.012, respectively).

Conclusion

Even the best performance corresponds to an R² of only 44% for the soft computing methods, and this slight advantage is offset by the fact that regression models are clinically interpretable.
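The two performance measures quoted above can be computed from paired actual and predicted values. The following is a minimal sketch, not the authors' code: the function names and the four example values are hypothetical, and body fat percentage is assumed to be expressed as a fraction between 0 and 1.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_residual / ss_total


# Hypothetical body fat fractions, for illustration only (not study data).
actual_bfp = [0.20, 0.25, 0.30, 0.35]
predicted_bfp = [0.22, 0.24, 0.31, 0.33]

print(f"RMSE = {rmse(actual_bfp, predicted_bfp):.4f}")
print(f"R^2  = {r_squared(actual_bfp, predicted_bfp):.2f}")
```

RMSE is on the scale of the response, whereas R² relates the residual variation to the total variation of the response, which is why both are useful when judging a prediction model.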

Introduction

Obesity [1] is widely considered to be one of the most important current public health problems due to its continuously increasing prevalence in the developed world (affecting both adults [2], [3] and children [4], [5]) on the one hand, and the seriousness of the health risks it gives rise to on the other hand. Increased risk of a number of diseases has been causally linked to obesity, including type 2 diabetes mellitus, hypertension, ischaemic heart disease, stroke, infertility, osteoarthritis, liver and gallbladder disease and certain tumors [6]. Not surprisingly, obesity also increases all-cause mortality [7] and poses a significant economic burden as well [8], [9].

Both screening for the disease and accurately tracking its severity in those already affected underline the importance of measuring obesity exactly. This is, however, not a trivial question: the definition of obesity (“condition of excess body fat” [1, p. 3]) does not directly give rise to any quantitative metric. Weight is a straightforward proxy for body fat and is easy to measure, but it is almost meaningless without information on the overall stature of the person. Usually height is used for that purpose, leading to indicators such as body mass index (BMI) [10], which is so widely used that even the definition of obesity is sometimes linked to it, and which is endorsed by the World Health Organization [11].

It is, however, well known that these indicators, even though they take stature into account, often perform poorly [12] in predicting health outcomes because, among other reasons, they measure neither body fat itself nor its distribution (which is also known to be prognostic: visceral fat, i.e. abdominal obesity, is especially associated with negative outcomes [13]). Measures such as waist circumference or waist-to-hip ratio try to correct for this aspect [14].

A much better approach would be to measure body fat mass directly, or body fat percentage (BFP), i.e. body fat mass divided by body weight, but such measurement is difficult and unsuitable for widespread use. (Precise methods include dual energy X-ray absorptiometry (DXA), bioelectrical impedance analysis (BIA) and air displacement plethysmography [15].)
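The indicators discussed so far reduce to simple ratios. The sketch below assumes metric units and returns BFP as a fraction between 0 and 1; the function names and example values are hypothetical and not taken from the study.

```python
def body_mass_index(weight_kg, height_m):
    """BMI in kg/m^2: body weight divided by squared height."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def waist_to_hip_ratio(waist_cm, hip_cm):
    """Waist circumference divided by hip circumference (dimensionless)."""
    if waist_cm <= 0 or hip_cm <= 0:
        raise ValueError("circumferences must be positive")
    return waist_cm / hip_cm


def body_fat_percentage(fat_mass_kg, weight_kg):
    """BFP as a fraction: body fat mass divided by body weight."""
    if weight_kg <= 0 or not 0 <= fat_mass_kg <= weight_kg:
        raise ValueError("need 0 <= fat mass <= weight and weight > 0")
    return fat_mass_kg / weight_kg


# Hypothetical adult: 80 kg, 1.80 m, 94 cm waist, 100 cm hip, 20 kg fat mass.
print(round(body_mass_index(80, 1.80), 1))
print(round(waist_to_hip_ratio(94, 100), 2))
print(round(body_fat_percentage(20, 80), 2))
```

Only the last function needs a quantity (fat mass) that cannot be obtained with a scale and a tape measure, which is the practical obstacle described above.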

It would therefore be valuable if BFP could be predicted from easily measurable parameters such as basic sociodemographic data (age, gender), basic anthropometric data (weight, height, waist circumference) and basic laboratory parameters obtained from routine blood draws. The rationale for this last component is that obesity is associated with a state of systemic inflammation [16] and has been shown to be associated with changes in clinical chemistry parameters [17]. It is therefore intuitively appealing to include these parameters too.

The aim of the present research is to investigate how well BFP can be predicted from these parameters. That is, clinical prediction models [18] were built and validated from these variables using an empirical database (where BFP was measured with BIA as gold standard).

The aim was not to analyse the relationships between the predictor variables and the response in order to provide new clinical knowledge, but rather to build a purely predictive model. This allowed us to use modern tools of soft computing (machine learning), which are black-box in that sense but might provide better predictions. Ordinary multi-layer feedforward neural networks [19] and support vector machines [20] were chosen as examples. As a comparison, linear regression was used, illustrating the more traditional biostatistical approach.

Database

The National Health and Nutrition Examination Survey (NHANES) is now a continuous American public health program, with results published in two-year cycles [21]. It is a nationwide survey designed to be representative of the whole civilian non-institutionalized US population, employing a complex, stratified multi-stage probability sampling plan. The amount of collected data is tremendous (although sometimes varying from cycle to cycle), including demographic data, physical examination, collection

Results

The results of the linear regression for the whole database are shown in Fig. 2.

This illustrates that this model is interpretable, i.e. a clinical meaning can be associated with its results. (But note that it is the model for the whole dataset, without validation.)

Results of the parameter search, and the fitted vs. actual plot of the best model can be seen in Fig. 3 for the ordinary feedforward neural network and in Fig. 4 for the support vector machine. As the figures show, the optimal

Discussion

Results obtained with modern soft computing techniques were not convincingly better than linear regression.

Only SVM was able to clearly outperform simple regression, and its advantage lay in the stability of the results across bootstrap replicates rather than in a substantially better average value (RMSE: 0.0988 ± 0.00288 vs. 0.107 ± 0.012). Feedforward neural networks had an average performance nearly identical to regression with only minimally improved stability (RMSE: 0.108 ± 0.00928).
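To make the notion of "stability across bootstrap replicates" concrete, the following sketch computes a mean and standard deviation of RMSE over bootstrap resamples. It rests on several assumptions that the text above does not confirm: each model is refitted on a resample drawn with replacement and evaluated on the rows left out of that resample (out-of-bag), the "±" is read as a standard deviation, and a single-predictor least-squares line stands in for the study's models. All names and the synthetic data are hypothetical.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bootstrap_rmse(x, y, n_replicates=200, seed=0):
    """Mean and SD of out-of-bag RMSE over bootstrap replicates."""
    rng = random.Random(seed)
    n = len(x)
    scores = []
    for _ in range(n_replicates):
        # Resample row indices with replacement to form the training set.
        train = [rng.randrange(n) for _ in range(n)]
        chosen = set(train)
        # Rows never drawn (out-of-bag) serve as the test set.
        test = [i for i in range(n) if i not in chosen]
        if not test or len(chosen) < 2:
            continue  # degenerate resample, skip it
        intercept, slope = fit_line([x[i] for i in train],
                                    [y[i] for i in train])
        squared_errors = [(y[i] - (intercept + slope * x[i])) ** 2
                          for i in test]
        scores.append(math.sqrt(statistics.fmean(squared_errors)))
    return statistics.fmean(scores), statistics.stdev(scores)


# Synthetic illustration only: BFP fraction as a noisy linear function
# of waist circumference (cm). These are not study data.
data_rng = random.Random(42)
waist_cm = [70 + i for i in range(60)]
bfp = [0.005 * w - 0.22 + data_rng.gauss(0, 0.03) for w in waist_cm]

mean_rmse, sd_rmse = bootstrap_rmse(waist_cm, bfp)
print(f"RMSE: {mean_rmse:.4f} ± {sd_rmse:.4f}")
```

Under this reading, a smaller second number means the error estimate varies less from one resample to the next, which is the sense in which one method can be more stable than another even when the averages are close.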

Conclusions

The soft computing methods investigated in the present study were not able to substantially outperform simple linear regression. While the support vector machine did exhibit some advantage (although even it had an R² of only 44%), this is on balance offset by the fact that regression models are clinically interpretable, i.e. “white-box”. While for some modern methods of machine learning, exploration of the models is a possibility, we aimed to focus on very well established, classical methods. Inclusion of other

Funding sources

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author declaration

There are no conflicts of interest associated with this publication and there has been no financial support for this work that could have influenced its outcome.

Acknowledgements

Tamás Ferenci was supported by the UNKP-16-4/III New National Excellence Program of the Ministry of Human Capacities.

References (43)

  • E.S. Ford et al. Epidemiology of obesity in the western hemisphere. J. Clin. Endocrinol. Metab. (2008)
  • C.L. Ogden et al. Prevalence of obesity and trends in body mass index among US children and adolescents, 1999–2010. JAMA (2012)
  • P. Kopelman. Health risks associated with overweight and obesity. Obes. Rev. (2007)
  • K.M. Flegal et al. Association of all-cause mortality with overweight and obesity using standard body mass index categories: a systematic review and meta-analysis. JAMA (2013)
  • D. Withrow et al. The economic burden of obesity worldwide: a systematic review of the direct costs of obesity. Obes. Rev. (2011)
  • World Health Organization. Technical report series 894: Obesity: Preventing and Managing the Global Epidemic (2000)
  • R. Huxley et al. Body mass index, waist circumference and waist:hip ratio as predictors of cardiovascular risk – a review of the literature. Eur. J. Clin. Nutr. (2010)
  • J.-P. Després et al. Abdominal obesity and the metabolic syndrome: contribution to global cardiometabolic risk. Arterioscler. Thromb. Vasc. Biol. (2008)
  • A. Fernández-Sánchez et al. Inflammation, oxidative stress, and obesity. Int. J. Mol. Sci. (2011)
  • T. Ferenci. Two applications of biostatistics in the analysis of pathophysiological processes (2013)
  • E. Steyerberg. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating (2008)