Derivative-free optimization of combinatorial problems – A case study in colorectal cancer screening

doi:10.1016/j.compchemeng.2020.107193

Computers & Chemical Engineering

Volume 145, February 2021, 107193

https://doi.org/10.1016/j.compchemeng.2020.107193

Abstract

In the US, colorectal cancer (CRC) is a significant burden on society as the second most deadly cancer. This burden can be mitigated by early detection or prevention through screening of asymptomatic individuals according to a screening strategy. The progression of CRC is not known with certainty, which poses a challenge for determining an optimal screening strategy for a population. A microsimulation model is used to incorporate this uncertainty within a population and to estimate the benefits of a given screening strategy. The optimization problem for determining CRC screening strategies is formulated as a combinatorial problem, a challenging problem type for derivative-free optimization (DFO) solvers. We assess the ability of ten DFO solvers to handle combinatorial problems using a test problem. Then, a simulation-optimization approach is used to determine the optimal strategy for the population. The best-identified screening strategies were shown to reduce the societal impact of CRC more than the currently recommended screening strategies.

Introduction

In 2020, it is estimated that there will be approximately 105,000 new cases of colorectal cancer (CRC) and an estimated 43,000 deaths due to colorectal cancer in the United States alone (American Cancer Society, 2020). This makes CRC the fourth most common form of cancer and the second most deadly form of cancer in the US for men and women combined (American Cancer Society, 2020). The survival rate of CRC is closely linked to the diagnostic stage. If diagnosed at a localized stage, there is a 90% five-year survival rate; if diagnosed at a late stage, i.e., after the cancer has metastasized, the survival rate decreases to about 14% (National Cancer Institute, 2019). The difference in outcomes based on the diagnostic stage demonstrates the benefits and necessity of early detection. Early detection of the disease can be achieved through screening, i.e., tests that are performed on asymptomatic individuals to detect CRC or its precursors.

There are two main categories of screening tests for CRC: visual tests, e.g., colonoscopy or computed tomographic colonography (CTC), and stool sample tests. Visual tests are often more accurate; however, they have a higher risk of complications compared to stool-based tests (Issa and Noureddine, 2017). The US Preventive Services Task Force (USPSTF) has compiled a list of the recommended screening guidelines, or strategies, for numerous screening test types (Bibbins-Domingo et al., 2016). These screening strategies, depicted in Fig. 1, are defined by: (1) screening test type(s), (2) how often to use each test type, (3) starting age of screening, and (4) ending age of screening. In Fig. 1, the black boxes represent the years in which screening takes place for the different screening modalities. When the strategy is a combination of sigmoidoscopy and fecal immunochemical test (FIT), the black boxes represent the years in which a sigmoidoscopy is recommended, and the dark gray boxes represent the years in which a FIT is recommended. The currently recommended screening strategies were constructed through a review of clinical trials (Lin et al., 2016) as well as a computational study (Knudsen et al., 2016), which evaluated and compared the effectiveness of a set of screening strategies for the general population of the US. The effectiveness was assessed per screening test using the life-years gained due to screening, considering the test's burden, and was calculated using data-driven microsimulation models.

A microsimulation model is a type of model that simulates a large number of independent entities, each described through stochastic parameters, to study the effects of specified actions over an entire population (Orcutt, 1957). With microsimulation models, researchers replicate the growth and progression of colorectal cancer within a population and assess population-level trends for given screening strategies. These models estimate the effects of various screening strategies in a shorter timeframe and at a drastically lower cost than clinical trials. The models also benefit from a framework of tunable parameters, allowing researchers to tailor their studies to a specific population, e.g., the general US population (Knudsen et al., 2016), Israel's population (Strul et al., 2006), the population of Illinois (Gopalappa et al., 2011), and the general population of China (Song and Wang, 2016). Three well-known CRC microsimulations are MISCAN-COLON (Loeve et al., 1999), SimCRC (Frazier et al., 2000), and CRC-SPIN (Rutter and Savarino, 2010). These simulations are part of the American National Institutes of Health's initiative for modeling various forms of cancer, including colorectal cancer, called the Cancer Intervention and Surveillance Modeling Network (National Cancer Institute, 2020), or CISNET. Microsimulations are routinely used to evaluate screening strategies or new screening and treatment options for CRC. Recent studies include determining the cost-effectiveness of a theoretical non-invasive biomarker-based screening test compared to using FIT for screening (Lansdorp-Vogelaar et al., 2018), exploring the effect of an early start age for screening in the general US population (Ladabaum et al., 2019), estimating the cost-effectiveness of risk-related screening strategies (Naber et al., 2020), and evaluating the cost-effectiveness of FIT-based screening in the Medicaid population of the US (Wheeler et al., 2020).
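The idea of a microsimulation can be illustrated with a deliberately small sketch. Everything in it is a hypothetical stand-in chosen for illustration: the 30% lesion prevalence, the uniform onset and dwell-time distributions, the 0.9 test sensitivity, and the function names are assumptions, not parameters of MISCAN-COLON, SimCRC, CRC-SPIN, or the model used in this paper.

```python
import random

def simulate_person(rng, screen_ages, sensitivity=0.9):
    """Simulate one individual; return 'none', 'prevented', or 'cancer'."""
    # Assumed natural history: 30% of people ever develop a precursor lesion.
    if rng.random() > 0.3:
        return "none"
    onset_age = rng.uniform(40, 80)              # age at which the lesion appears
    cancer_age = onset_age + rng.uniform(5, 20)  # lesion dwell time before cancer
    for age in sorted(screen_ages):
        # A screen can only find a lesion that exists and has not yet progressed.
        if onset_age <= age < cancer_age and rng.random() < sensitivity:
            return "prevented"
    # Cancers that would appear after age 85 are outside the simulated horizon.
    return "cancer" if cancer_age <= 85 else "none"

def simulate_population(screen_ages, n_people=20000, seed=0):
    """Aggregate independent individual outcomes into population-level counts."""
    rng = random.Random(seed)
    counts = {"none": 0, "prevented": 0, "cancer": 0}
    for _ in range(n_people):
        counts[simulate_person(rng, screen_ages)] += 1
    return counts

no_screening = simulate_population(screen_ages=[])
ten_yearly = simulate_population(screen_ages=[50, 60, 70, 80])
```

Comparing `no_screening["cancer"]` with `ten_yearly["cancer"]` gives a population-level estimate of the effect of the strategy, which is the kind of output a simulation-optimization approach would treat as its objective.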

The comparative studies that are carried out using microsimulations employ what-if analysis for evaluating the screening strategies. For the what-if analysis, a set of screening strategies is constructed by selecting two or three values for each screening strategy component and evaluating all combinations of those values. The most comprehensive study, Knudsen et al. (2016), considered 204 unique screening strategies. If every unique strategy were considered based on the range of variables defined within the study, around 1,800 CRC screening strategies would have been evaluated. With only about 11% of all possible strategies evaluated, it is unlikely that the screening periods and the starting and ending ages identified as the best in the study are the optimal ones for the different test types. The identification of the best screening strategy can be formulated as an optimization problem in which the objective is to reduce the overall health and economic impact CRC has on society, and the optimal decision is the screening strategy identified based on the population under study. Here, we refer to this optimization problem as the CRC screening problem.
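A what-if grid of this kind is a Cartesian product of the values chosen for each component. The sketch below uses made-up component values (they are not the values used by Knudsen et al.) to show how quickly the grid grows, and reproduces the 11% figure from the two counts quoted above.

```python
from itertools import product

# Hypothetical values, three per component, for illustration only.
test_types = ["colonoscopy", "FIT", "sigmoidoscopy"]
periods = [1, 5, 10]        # years between screens
start_ages = [45, 50, 55]
end_ages = [75, 80, 85]

# Every combination of the chosen values is one candidate strategy.
strategies = list(product(test_types, periods, start_ages, end_ages))
print(len(strategies))      # 3 * 3 * 3 * 3 combinations

# Share of the full space covered, using the counts quoted in the text.
fraction_evaluated = 204 / 1800
print(f"{fraction_evaluated:.1%}")
```

Because the grid size is the product of the number of values per component, adding even one more value to each component multiplies the number of microsimulation runs, which is why what-if studies restrict themselves to a small subset.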

Several recent publications that aim to solve the CRC screening problem, or similar cancer screening problems, use one of two main approaches: Partially Observable Markov Decision Processes (POMDPs) or simulation-optimization. To solve the problem, POMDPs develop a screening policy, i.e., when to take a screening action, based on the believed state of the underlying Markov chain representation of the progression of CRC. Simulation-optimization searches for the optimal screening strategy using derivative-free optimization (DFO) algorithms, with the output of a microsimulation as the objective function.
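The simulation-optimization loop can be sketched as a search over discrete candidates in which the objective is only available as a black-box function call. Uniform random search is used here purely as the simplest stand-in; it is not one of the solvers assessed in the paper, and `toy_objective` is a hypothetical deterministic placeholder for a microsimulation run.

```python
import random

def random_search(objective, domains, budget, seed=0):
    """Minimize a black-box objective over a discrete domain by random sampling."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(budget):
        # Draw one value per decision variable; no derivatives are needed.
        x = tuple(rng.choice(values) for values in domains)
        f = objective(x)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f

def toy_objective(x):
    """Placeholder for a microsimulation output (assumed, smaller is better)."""
    test, period, start, end = x
    test_penalty = 0.0 if test == "colonoscopy" else 1.0
    return (period - 5) ** 2 + (start - 50) ** 2 / 25 + (end - 75) ** 2 / 25 + test_penalty

domains = [
    ["colonoscopy", "FIT"],     # categorical variable
    [1, 2, 3, 5, 10],           # screening period in years
    [40, 45, 50, 55, 60],       # start age
    [70, 75, 80, 85],           # end age
]
best_x, best_f = random_search(toy_objective, domains, budget=2000)
```

In a real study each call to the objective is an expensive stochastic simulation, so the number of evaluations a solver needs, rather than its per-iteration cost, is what dominates the run time.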

POMDPs have been used to determine screening strategies for numerous forms of cancer in addition to CRC, such as cervical (Namén León, 2015), stomach (Saumoy et al., 2018), prostate (Zhang and Denton, 2018), breast (Cevik et al., 2018; Madadi et al., 2015; Molani et al., 2019), liver (Lee et al., 2019), and lung (Petousis et al., 2019) cancer. As the literature on the use of POMDPs for cancer screening in general is vast, we focus our review on POMDP studies specific to CRC (Erenay et al., 2014; Li et al., 2015). Erenay et al. (2014) solved a POMDP to develop personalized colonoscopy screening policies based on the sex of the individual and three levels of individualized risk: low risk, high risk, and post-CRC. The aim was to maximize the total quality-adjusted life-years gained (TQALYG) from screening. Their findings revealed that even with perfect screening compliance, the currently recommended colonoscopy screening strategy is not aggressive enough, with the optimal screening policy determined by the POMDP requiring around twice the total screens as the recommended strategy. Li et al. (2015) investigated the effects of the screening compliance rate on optimal screening policies for TQALYG. The study included both FOBT and colonoscopy as screening tests. The findings confirmed those of Erenay et al. (2014) that the currently recommended screening strategies are not aggressive enough. The results also revealed that a more frequent screening rate should be recommended for individuals with lower compliance rates. The strategies identified by both of these studies yielded greater TQALYG estimates than the recommended strategies. However, the application of POMDPs for identifying optimal screening strategies requires many simplifying assumptions about the health states included, the types of screening tests available, and/or how screening is applied in the model to maintain computational tractability.

Simulation-optimization has been used to determine optimal cancer screening strategies for different forms of cancer, including cervical (McLay et al., 2010), breast (Rauner et al., 2010), and prostate (Bertsimas et al., 2018) cancer. Rauner et al. (2010) developed an optimization framework to generate screening strategies for chronic disease that maximized quality-adjusted life-years gained and minimized cost. The framework was used to generate the Pareto optimal set of breast cancer screening strategies. It determined the percentage of a population to screen for disease and how often screenings should occur, including a penalty term if a screening strategy exceeds an annual budget. Using a multi-objective implementation of the ant colony optimization algorithm, the study found over 5,000 screening policies that outperformed the currently recommended screening policy for breast cancer, suggesting that policy-makers should urge a higher compliance rate for older women. McLay et al. (2010) performed a simulation-optimization study to generate dynamic age-based screening strategies for cervical cancer. The simulation-optimization was performed using OptQuest for a total number of lifetime screens ranging from one to 22 to determine the optimal ages at which screening should be performed. The age-based screening strategies identified showed similar benefits to current fixed-interval screening strategies while reducing the total number of screens in an individual's lifetime. Bertsimas et al. (2018) developed and presented an approach that aimed to determine screening strategies that performed robustly for multiple microsimulation models. They generated a Pareto optimal set of screening strategies using two objectives: maximizing the average quality-adjusted life expectancy (QALE) across the models, and maximizing the minimum of the QALE values of the models. The Pareto optimal screening strategies improved the average QALE compared to the average QALE obtained by the currently recommended screening strategies.
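For readers less familiar with the term, a Pareto optimal set keeps every candidate that no other candidate beats on all objectives at once. The following is a minimal sketch for the two objectives described for Rauner et al. (2010), with QALYs gained maximized and cost minimized; the function name and the numbers are hypothetical and are not results from any of the cited studies.

```python
def pareto_front(points):
    """Return the non-dominated (qaly_gained, cost) pairs.

    A point is dominated if another point has at least as many QALYs gained
    and at most the same cost, and is strictly better in one of the two.
    """
    front = []
    for i, (q_i, c_i) in enumerate(points):
        dominated = any(
            q_j >= q_i and c_j <= c_i and (q_j > q_i or c_j < c_i)
            for j, (q_j, c_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((q_i, c_i))
    return front

# Made-up candidate strategies: (QALYs gained, cost).
candidates = [(1.0, 100), (2.0, 150), (1.5, 200), (2.0, 120), (0.5, 50)]
front = pareto_front(candidates)
```

Here `(2.0, 150)` is dropped because `(2.0, 120)` delivers the same benefit at lower cost, and `(1.5, 200)` is dropped because it is both less beneficial and more expensive than `(2.0, 120)`.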

In the simulation-optimization studies, little justification was provided for how the DFO solver was selected to identify the optimum screening strategy. Different DFO solvers use different search heuristics, which may not converge to a solution within a given computational budget. We argue that the selection of the DFO solver should depend on the optimization problem and its characteristics. As such, it is desirable to use a solver that reliably identifies the optimal or a near-optimal solution in the fewest microsimulation evaluations for the problem. Nevertheless, all of the studies demonstrate the benefit of using a simulation-optimization approach to generate cancer screening strategies. To the best of our knowledge, no studies have developed screening strategies for CRC using simulation-optimization.

The objective of the CRC screening problem is to identify a screening strategy that maximizes the expected gain in quality-adjusted life-years (QALY) while minimizing the costs associated with CRC screening and treatment. This problem is a combinatorial optimization problem, where the decision variables are the ages to start and cease screening, the period between screens, and the screening test type. The ages and screening period could be defined as continuous variables. However, an issue arises when implementing the identified strategy due to the infeasibility of scheduling a screening test with infinite precision. The screening test type is a categorical variable. A preliminary analysis revealed that the objective function has multiple local minima with numerous symmetric solutions. The DFO solver that is used for solving the CRC screening problem should be able to handle these complexities.
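One way to picture the decision variables is as a small record with one categorical field and three integer fields. The sketch below is an illustrative encoding only: the class name, the whole-year granularity, and the convention that the end age is inclusive are assumptions, not the paper's formulation. It also shows one plausible source of symmetric solutions, namely distinct decision vectors that produce the same set of screening ages.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScreeningStrategy:
    test_type: str   # categorical variable, e.g. "FIT" or "colonoscopy"
    period: int      # years between screens
    start_age: int   # age of the first screen
    end_age: int     # last age at which a screen may occur (assumed inclusive)

    def is_feasible(self):
        """Basic consistency check on the integer variables."""
        return self.period >= 1 and self.start_age <= self.end_age

    def screening_ages(self):
        """Ages at which a screen is scheduled under this strategy."""
        return tuple(range(self.start_age, self.end_age + 1, self.period))

a = ScreeningStrategy("FIT", period=10, start_age=50, end_age=75)
b = ScreeningStrategy("FIT", period=10, start_age=50, end_age=79)
print(a.screening_ages())  # (50, 60, 70)
print(b.screening_ages())  # (50, 60, 70)
```

Under this encoding `a` and `b` differ as decision vectors but schedule identical screens, so any simulation would assign them the same objective value, creating plateaus that a solver has to cope with.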

A comprehensive review of DFO solvers has been carried out by Rios and Sahinidis (2013). The review compares the performance of 22 DFO solvers on 502 NLP problems, which were categorized into four groups (smooth-convex, non-smooth-convex, smooth-nonconvex, and non-smooth-nonconvex), and also considered the effects of the number of decision variables. The review concluded that the solvers within the TOMLAB optimization suite outperformed the other DFO solvers in general. However, the analysis did not include combinatorial problems. Recently, Ploskas et al. (2018) performed a comparison of five constrained DFO solvers on the heat exchanger refrigerant circuitry problem, which is a binary combinatorial optimization problem. In the analysis, the number of tubes in the heat exchanger ranged from 4 to 36 to explore the effect of the number of decision variables on the DFO solver performance. The findings matched those of Rios and Sahinidis (2013) in that the TOMLAB solvers outperformed the other solvers.

This paper aims to solve the CRC screening problem using simulation-optimization. To determine the DFO solver that is best suited to solve the problem, we constructed a test problem that shares similar characteristics with the CRC screening problem. We evaluated the performance of ten DFO solvers that can handle combinatorial optimization problems using this test problem. In the second section of the paper, brief descriptions of the selected DFO solvers are presented (more detailed information on these solvers is included in the supplementary material). Section 3 outlines the test problem used for in-depth analysis of the solvers, the setup for the computational experiments, and the results of the experiments. The details of the CRC screening problem, the simulation-optimization framework built to solve this problem, and the results are discussed in Section 4. Finally, concluding remarks are presented in Section 5.

Section snippets

Overview of DFO solvers that can handle combinatorial optimization problems

Among 32 DFO solvers reviewed, ten publicly and commercially available DFO solvers were identified as candidates for our study, given their ability to handle a combinatorial problem. These solvers are described in the following sections and have been separated by their core search domains. Derivative Free Line search (DFL) solver and Nonlinear Optimization by Mesh Adapted Direct Search (NOMAD) solver are the local search solvers. The global search solvers include Distributed Evolutionary

Definition of the test problem

The DFO algorithms were used to solve several instances of a test problem that share the characteristics of the CRC screening problem. The number of decision variables was varied to explore the impact of problem dimensions on the performance of the solvers. The decision variables within the test problem are $x_{i}$, for $i = 0, 1, \dots, N$, where $N$ depends on the test instance. The test problem instances, all nonlinear combinatorial programs, are given in Eqs. (1)-(7) (centered test problem), and in Eqs. (1)-

CRC screening problem description

The aim of the CRC screening problem is to identify a screening strategy for a defined population that maximizes the health benefits, quality-adjusted life years (QALY), of screening for CRC while minimizing the costs associated with CRC and screening. The decision variables are the screening test type, $x_{1}$, screening period, $x_{2}$, screening start age, $x_{3}$, and screening end age, $x_{4}$. The objective (Eq. (12)) is to minimize the expected value of the difference between the total cost incurred due to

Conclusions and future directions

In this paper, we used two different test problems to assess ten different derivative-free optimization solvers. Two of the solvers, DFL and NOMAD, were local search solvers that systematically restrict the search domain. Six of the solvers, TOMLAB/glcSolve and glcDirect, SimAnneal, DEAP-GA, GPyOpt, and CMA-ES, were global search solvers that allow the entire search domain to be considered for new points. The last two solvers, RBFOpt and MIDACO, utilize aspects of both global and local search

CRediT authorship contribution statement

David Young: Conceptualization, Methodology, Software, Investigation, Formal analysis, Writing - original draft, Visualization. Wyatt Haney: Software, Investigation. Selen Cremaschi: Conceptualization, Methodology, Writing - review & editing, Supervision, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Funding

This work was supported by the Department of Education GAANN Fellowship [P200A150074].

References (55)

F. Boukouvala et al. Global optimization advances in Mixed-Integer Nonlinear Programming, MINLP, and Constrained Derivative-Free Optimization, CDFO. Eur. J. Oper. Res. (2016)

I. Lansdorp-Vogelaar et al. Cost-effectiveness of High-performance Biomarker Tests vs Fecal Immunochemical Test for Noninvasive Colorectal Cancer Screening. Clin. Gastroenterol. Hepatol. (2018)

F. Loeve et al. National Polyp Study data: evidence for regression of adenomas. Int. J. Cancer (2004)

L.A. McLay et al. Using simulation-optimization to construct screening strategies for cervical cancer. Health Care Manag. Sci. (2010)

S.K. Naber et al. Cost-Effectiveness of Risk-Stratified Colorectal Cancer Screening Based on Polygenic Risk: current Status and Future Potential. JNCI Cancer Spectr (2020)

G.H. Orcutt. A new type of socio-economic system. Rev. Econ. Stat. (1957)

F. De Rainville et al. DEAP: a Python Framework for Evolutionary Algorithms

M. Schlüter et al. An extended ant colony optimization algorithm for integrated process and control system design. Ind. Eng. Chem. Res. (2009)

M. Schlüter et al. The oracle penalty method. J. Glob. Optim. (2010)

D. Ackley. A connectionist machine for genetic Hillclimbing. The Springer International Series in Engineering and Computer Science (2012)

American Cancer Society, 2020. Colorectal Cancer Facts & Figures [WWW Document]. URL...

M. Aronsson et al. Cost-effectiveness of high-sensitivity faecal immunochemical test and colonoscopy screening for colorectal cancer. Br. J. Surg. (2017)

C. Audet et al. Erratum: mesh adaptive direct search algorithms for constrained optimization. SIAM J. Optim. (2008)

D. Bertsimas et al. Optimal healthcare decision making under multiple mathematical models: application in prostate cancer screening. Health Care Manag. Sci. (2018)

K. Bibbins-Domingo et al. Screening for colorectal cancer: US Preventive Services Task Force recommendation statement. JAMA (2016)

M. Cevik et al. Analysis of mammography screening policies under resource constraints. Prod. Oper. Manag. (2018)

A. Costa et al. RBFOpt: an open-source library for black-box optimization with costly function evaluations. Optim. (2014)

F.S. Erenay et al. Optimizing colonoscopy screening for colorectal cancer prevention and surveillance. Manuf. Serv. Oper. Manag. (2014)

A. Frazier et al. Cost-effectiveness of screening for colorectal cancer in the general population. JAMA (2000)

Frazier, P.I., 2018. A Tutorial on Bayesian Optimization...

C. Gopalappa et al. Probability model for estimating colorectal polyp progression rates. Health Care Manag. Sci. (2011)

N. Hansen. A CMA-ES for Mixed-Integer Nonlinear Optimization. INRIA [Researc R (2011)

Haug, U., Knudsen, A.B., Lansdorp-vogelaar, I., Kuntz, K.M., Baden-wuerttemberg, C.R., Cancer, G., Hospital, M.G.,...

Holmstr, K., Anders, O.G., Edvall, M.M., 2010. User's Guide for TOMLAB...

Issa, I.A., Noureddine, M., 2017. Colorectal cancer screening: an updated review of the available options 23,...

D.R. Jones. Direct Global Optimization Algorithm. Encycl. Optimization. (2010)

A.B. Knudsen et al. Estimation of Benefits, Burden, and Harms of Colorectal Cancer Screening Strategies: modeling Study for the US Preventive Services Task Force. JAMA (2016)

