We apply jackknife model averaging (JMA) and a new prediction technique, hybrid-boost model averaging (HbMA), to a surgical dataset that includes categorical explanatory variables. The model requirements for HbMA differ from those for JMA: HbMA generally does not require reasonably good models to be included in the model average. However, the utility of HbMA is limited by the possibility of multiple solutions for the HbMA weights. Both model averaging approaches are comparable under the appropriate conditions. Among all the model averages considered, the best jackknife model average gives slightly better predictions of surgery durations than the best hybrid-boost model average when evaluated on our surgical dataset. Finally, we discuss several methods that may further improve the performance of HbMA.

The term “stacking” is occasionally used in the literature to denote a particular model averaging method. In Refs. [8, 9], it denotes “jackknife model averaging”; however, it is also used to denote “ex-ante model averaging” [10]. This inconsistency creates unnecessary confusion, so we conform to the widely recognised view among statisticians that stacking refers to the class of methods that combine models to improve their predictive ability.
Al-Benna S. The impact of late-starts and overruns on theatre utilisation rates. J Perioper Pract. 2012;22(6):197–9. https://doi.org/10.1177/175045891202200603.
Dexter F, Epstein RH, Bayman EO, Ledolter J. Estimating surgical case durations and making comparisons among facilities: identifying facilities with lower anesthesia professional fees. Anesth Analg. 2013;116(5):1103–15. https://doi.org/10.1213/ANE.0b013e31828b3813.
Devi SP, Rao KS, Sangeetha SS. Prediction of surgery times and scheduling of operation theaters in optholmology department. J Med Syst. 2012;36(2):415–30. https://doi.org/10.1007/s10916-010-9486-z.
ShahabiKargar Z, Khanna S, Good N, Sattar A, Lind J, O’Dwyer J. Predicting procedure duration to improve scheduling of elective surgery. In: Pham D-N, Park S-B, editors. PRICAI 2014: trends in artificial intelligence. Cham: Springer International Publishing; 2014. p. 998–1009. https://doi.org/10.1007/978-3-319-13560-1_86.
Hosseini N, Sir MY, Jankowski CJ, Pasupathy K. Surgical duration estimation via data mining and predictive modeling: a case study. In: AMIA annual symposium proceedings. 2015. p. 640–8.
Edelman ER, van Kuijk SMJ, Hamaekers AEW, de Korte MJM, van Merode GG, Buhre WFFA. Improving the prediction of total surgical procedure time using linear regression modeling. Front Med. 2017;4:85. https://doi.org/10.3389/fmed.2017.00085.
Wolpert DH. Stacked generalization. Neural Netw. 1992;5(2):241–59.
Clarke B. Comparing Bayes model averaging and stacking when model approximation error cannot be ignored. J Mach Learn Res. 2003;4:683–712.
Le T, Clarke B. A Bayes interpretation of stacking for \({\cal{M}}\)-complete and \({\cal{M}}\)-open settings. Bayesian Anal. 2017;12(3):807–29. https://doi.org/10.1214/16-BA1023.
Morana C. Model averaging by stacking. Open J Stat. 2015;5(7):797–807. https://doi.org/10.4236/ojs.2015.57079.
Clemen RT. Combining forecasts: a review and annotated bibliography. Int J Forecast. 1989;5(4):559–83.
Hansen BE, Racine JS. Jackknife model averaging. J Econom. 2012;167(1):38–46. https://doi.org/10.1016/j.jeconom.2011.06.019.
Gao Y, Luo M, Zou G. Forecasting with model selection or model averaging: a case study for monthly container port throughput. Transp A Transport Sci. 2016;12(4):366–84. https://doi.org/10.1080/23249935.2015.1137652.
Yu X, Xiao L, Zeng P, Huang S. Jackknife model averaging prediction methods for complex phenotypes with gene expression levels by integrating external pathway information. Comput Math Methods Med. 2019;2019:8. https://doi.org/10.1155/2019/2807470.
Soh KW, Lumley T, Walker C, O’Sullivan M. Model averaging with the hybrid model: an asymptotic study and demonstration. 2020 (submitted).
Soh KW, Walker C, O’Sullivan M, Wallace J. An evaluation of the hybrid model for predicting surgery duration. J Med Syst. 2020;44:42. https://doi.org/10.1007/s10916-019-1501-4.
LeBlanc M, Tibshirani R. Combining estimates in regression and classification. J Am Stat Assoc. 1996;91(436):1641–50. https://doi.org/10.2307/2291591.
van der Laan MJ, Polley EC, Hubbard AE. Super learner. Stat Appl Genet Mol Biol. 2007;6:123. https://doi.org/10.2202/1544-6115.1309.
Ando T, Li K-C. A model-averaging approach for high-dimensional regression. J Am Stat Assoc. 2014;109(505):254–65. https://doi.org/10.1080/01621459.2013.838168.
Soh KW, Walker C, O’Sullivan M, Wallace J, Grayson D. Case study of the prediction of elective surgery durations in a New Zealand teaching hospital. Int J Health Plan Manag. 2020 (forthcoming). https://doi.org/10.1002/hpm.3046.
R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2018. https://www.R-project.org/.
Ethics declarations
Conflict of interest
All the authors declare that they have no conflicts of interest.
Human and animal participants
This article does not contain any studies with human participants performed by any of the authors. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix 1: Formulae of Models for JMA
In “Models for JMA”, we briefly describe the linear regression models for JMA. To describe the fitted equations of these models, we require the following notation. For a finite set \(A = \{a_{0}, a_{1}, \ldots , a_{p}\}\) of binary explanatory variables considered in the linear regression model, we write its fitted relationship for the response variable, y, as \(y = f(A) = \beta _{0} + \sum _{i=1}^{p}\beta _{i}a_{i}\). Here, \(\beta _{0}\) is the predicted response for the baseline, which is represented by \(a_{0} = 1\) and \(a_{1} = a_{2} = \ldots = a_{p} = 0\). We similarly consider another set of binary explanatory variables \(B = \{b_{0}, b_{1}, \ldots , b_{q} \}\), where \(b_{0} = 1\) and \(b_{1} = b_{2} = \ldots = b_{q} = 0\) is the baseline. The fitted relationship between the explanatory variables in \(A \cup B\) and the response variable can then be expressed in a similar form: \(y = f(A \cup B) = \beta _{0} + \sum _{i=1}^{p}\beta _{i}a_{i} + \sum _{j=1}^{q}\delta _{j}b_{j}\). We also denote \(A * B = A \cup B \cup \{a_{i}b_{j} \;\vert \; 1 \le i \le p,\ 1 \le j \le q \}\). Note that each interaction term \(a_{i}b_{j}\) is binary, and the associated fitted equation is \(y = f(A * B) = \beta _{0} + \sum _{i=1}^{p}\beta _{i}a_{i} + \sum _{j=1}^{q}\delta _{j}b_{j} + \sum _{i=1}^{p}\sum _{j=1}^{q}\gamma _{ij}a_{i}b_{j}\).
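The notation above is ordinary dummy (treatment) coding with interaction columns. As a minimal sketch, assuming Python with NumPy and using hypothetical helper names (`indicators`, `design`) that do not come from the paper, the following builds the design matrices for \(f(A)\), \(f(A \cup B)\) and \(f(A * B)\) from categorical labels and fits \(f(A)\) by least squares on a small made-up example. It is an illustration, not the authors' implementation.

```python
import numpy as np


def indicators(labels, levels):
    """Binary columns a_1, ..., a_p for one categorical variable (hypothetical helper).

    levels[0] plays the role of the baseline a_0: it gets no column, so a
    baseline observation has a_1 = ... = a_p = 0 and is predicted by beta_0.
    """
    labels = np.asarray(labels)
    return np.column_stack([(labels == lev).astype(float) for lev in levels[1:]])


def design(a_labels, a_levels, b_labels=None, b_levels=None, interact=False):
    """Design matrix for f(A), f(A union B) or, with interact=True, f(A * B)."""
    A = indicators(a_labels, a_levels)
    cols = [np.ones((A.shape[0], 1)), A]        # columns for beta_0, beta_1..beta_p
    if b_labels is not None:
        B = indicators(b_labels, b_levels)
        cols.append(B)                            # columns for delta_1..delta_q
        if interact:
            # gamma_ij multiplies the binary product a_i * b_j for i, j >= 1
            cols.append(np.column_stack([A[:, i] * B[:, j]
                                         for i in range(A.shape[1])
                                         for j in range(B.shape[1])]))
    return np.hstack(cols)


# Toy example (made-up data): 3 procedure levels, 2 surgeon levels.
proc = ["p0", "p1", "p1", "p2", "p0", "p2"]
surg = ["s0", "s0", "s1", "s1", "s1", "s0"]
y = np.array([10.0, 20.0, 22.0, 30.0, 12.0, 34.0])

X_AB = design(proc, ["p0", "p1", "p2"], surg, ["s0", "s1"], interact=True)
print(X_AB.shape)  # 1 + p + q + p*q = 1 + 2 + 1 + 2 = 6 columns

# For f(A) alone, beta_0 is the baseline (p0) mean and beta_i are differences
# from it: approximately [11, 10, 21] for this toy data.
beta, *_ = np.linalg.lstsq(design(proc, ["p0", "p1", "p2"]), y, rcond=None)
print(beta)
```

With a single categorical variable the model is saturated, so \(\beta_0\) equals the mean response of the baseline level and each \(\beta_i\) is the difference between the mean of level \(i\) and the baseline mean.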
Referring back to the surgical dataset, we let \(\mathcal {P}\) be the set of binary explanatory variables for the 172 procedures. Similarly, let \(\mathcal {S}\) and \(\mathcal {R}\) be the sets of binary explanatory variables for the 43 surgeons and the 5 ratings of patients’ physical status, respectively. Recall that each (rolling) training set is a subset of the dataset, so we can similarly define the sets \(P_{k}, S_{k}, R_{k}\) for the \(k\)th training set such that \(P_{k} \subseteq \mathcal {P}\), \(S_{k} \subseteq \mathcal {S}\) and \(R_{k} \subseteq \mathcal {R}\). We drop the subscripts because we are interested only in the general form of the fitted equations, not in the specific levels of the explanatory variables that appear in each training set. The linear regression models trained on each training set and used in our jackknife model averages are \(y = f(P)\), \(y = f(P \cup S)\), \(y = f(P \cup S \cup R)\) and \(y = f(P * S)\). As in “Models for JMA”, we denote these models by “J1”, “J2”, “J3” and “J4”, respectively.
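To illustrate how J1 to J4 could feed a jackknife model average, the sketch below builds their design matrices, computes each model's leave-one-out residuals with the standard OLS shortcut \(\tilde{e}_i = e_i/(1-h_{ii})\), and chooses weights that minimise the leave-one-out criterion \(\mathrm{CV}(w) = n^{-1}\lVert \tilde{E}w \rVert^2\) over the unit simplex, following the formulation of Hansen and Racine. Everything here is an assumption for illustration: Python with NumPy, hypothetical helper names, synthetic data in place of the surgical dataset, and an exhaustive enumeration of weight supports (exact, but only practical for a handful of models) in place of a general quadratic-programming solver. It also assumes every \(h_{ii} < 1\), which fails, for example, when a procedure and surgeon combination appears only once in a training set under J4.

```python
import numpy as np
from itertools import combinations


def dummies(labels):
    """Binary columns for a categorical variable (hypothetical helper).
    The first level in sorted order is the baseline and gets no column."""
    labels = np.asarray(labels)
    levels = np.unique(labels)
    return np.column_stack([(labels == lev).astype(float) for lev in levels[1:]])


def interactions(A, B):
    """All binary products a_i * b_j with i, j >= 1, as in f(A * B)."""
    return np.column_stack([A[:, i] * B[:, j]
                            for i in range(A.shape[1])
                            for j in range(B.shape[1])])


def jma_design_matrices(proc, surg, rating):
    """Design matrices for J1 = f(P), J2 = f(P u S), J3 = f(P u S u R), J4 = f(P * S)."""
    P, S, R = dummies(proc), dummies(surg), dummies(rating)
    one = np.ones((P.shape[0], 1))  # intercept column for beta_0
    return {
        "J1": np.hstack([one, P]),
        "J2": np.hstack([one, P, S]),
        "J3": np.hstack([one, P, S, R]),
        "J4": np.hstack([one, P, S, interactions(P, S)]),
    }


def loo_residuals(X, y):
    """Leave-one-out OLS residuals via the shortcut e_i / (1 - h_ii).
    Assumes every h_ii < 1, e.g. no cell of the model is observed only once."""
    X_pinv = np.linalg.pinv(X)
    h = np.sum(X * X_pinv.T, axis=1)   # diagonal of the hat matrix X X^+
    resid = y - X @ (X_pinv @ y)
    return resid / (1.0 - h)


def jma_weights(E):
    """Minimise CV(w) = ||E w||^2 / n over the unit simplex (w >= 0, sum w = 1).
    E is n x M with one column of leave-one-out residuals per model.
    Exact for small M: enumerate every support set, solve the
    equality-constrained problem in closed form, keep the best feasible one."""
    n, M = E.shape
    G = E.T @ E
    best_w, best_cv = None, np.inf
    for size in range(1, M + 1):
        for support in combinations(range(M), size):
            idx = list(support)
            G_s = G[np.ix_(idx, idx)]
            try:
                v = np.linalg.solve(G_s, np.ones(size))
            except np.linalg.LinAlgError:
                continue                    # singular sub-problem: skip it
            w_s = v / v.sum()               # minimiser on the affine hull of the support
            if np.any(w_s < 0):
                continue                    # leaves the simplex: not feasible
            cv = w_s @ G_s @ w_s / n
            if cv < best_cv:
                best_w = np.zeros(M)
                best_w[idx] = w_s
                best_cv = cv
    return best_w, best_cv


# Usage on synthetic data (NOT the surgical dataset in the paper).
rng = np.random.default_rng(0)
n = 240
proc = rng.integers(0, 4, n)       # stand-in for procedures
surg = rng.integers(0, 3, n)       # stand-in for surgeons
rating = rng.integers(0, 2, n)     # stand-in for physical-status ratings
y = 60 + 15 * proc + 5 * surg + rng.normal(0, 10, n)   # synthetic durations
Xs = jma_design_matrices(proc, surg, rating)
E = np.column_stack([loo_residuals(X, y) for X in Xs.values()])
w, cv = jma_weights(E)
print(dict(zip(Xs, np.round(w, 3))), round(cv, 2))
```

Because every single model is a vertex of the simplex, the selected average can never have a larger leave-one-out criterion than the best individual model among J1 to J4; in practice, a dedicated quadratic-programming routine would replace the enumeration once the number of candidate models grows.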
About this article
Cite this article
Soh, K., Walker, C., O’Sullivan, M. et al. Comparison of Jackknife and Hybrid-Boost Model Averaging to Predict Surgery Durations: A Case Study. SN COMPUT. SCI. 1, 316 (2020). https://doi.org/10.1007/s42979-020-00339-0
DOI: https://doi.org/10.1007/s42979-020-00339-0