
1 Introduction

The goal of reducing Greenhouse Gas (GHG) emissions is only achievable by increasing the use of Renewable Energy Sources (RESs) in electricity generation and electrifying fossil-fuel-driven processes to shift demand. However, this transformation also challenges the existing energy system due to the volatility of future generation and consumption patterns. Even in the early stages of this transformation, large RES farms already overload the electrical grid in some periods, leading to generation curtailment. To address this challenge, providing flexibility on both the generation and consumption sides is essential. Distributed Energy Resources (DERs) can offer this flexibility to the grid. Especially with the Energy Hub (EH) concept, first introduced in [22,23,24,25], the combination of a wide range of already existing technologies offers a promising approach to the upcoming challenges. The EH concept has already shown its benefits in previous works [45, 46, 48]. However, scheduling problems, e.g., the scheduling of EH components, are NP-hard optimization problems [10, 56] and require powerful heuristic algorithms to find high-quality solutions within reasonable time. Due to its high dimensionality, as stated in [39], the problem is hard to formulate in a simplified form as a Mixed Integer Linear Program (MILP) and requires excessive computational effort to solve as a Mixed Integer Non-Linear Program (MINLP). Fortunately, using an Evolutionary Algorithm (EA) to find suitable solutions has proven to be a valid approach [34, 41, 45, 48].

To apply such an EA to EH scheduling, we first need to define the objective function to be optimized. Although different objectives are plausible, one key objective for providing flexibility to the surrounding grid is to follow an external control signal for the electrical output of the EH provided by a grid operator. In this regard, an EA can produce low-quality solutions concerning this objective if the control signal, further referred to as the target schedule, is highly variable. Each of the proposed solutions is evaluated with regard to the objective by an Energy Management System (EMS). Previously, this evaluation was based on a fixed interval length of 15 minutes. Whilst the overall time resolution could be increased to improve the solution quality further, this increase leads to greater computational effort. Therefore, the time resolution should only be increased when required, i.e., for particularly complex regions of the search space. Regarding the benefit of a variable time resolution, [46] shows that increasing the time resolution, particularly in complex regions of the search space, leads to improved optimization quality. Furthermore, in [47], the authors show that an inference of the optimization quality can be used to direct the computational effort of the EA to such difficult time segments. However, such approaches rely on knowing which regions of the search space are complex and, therefore, require a prediction of the optimization quality.

Consequently, this paper presents a concept and its application for the dynamically adapted interpretation of EA-based optimization solutions using the predicted optimization quality. We conduct the schedule optimization with the EA General Learning Evolutionary Algorithm and Method (GLEAM) [8, 9], a generic optimization algorithm that interacts with an EMS. Our approach is designed to use the computing capacity efficiently whilst also improving the optimization results. This is achieved by dynamically adapting the time interval length, enabled by adapting the genotype-phenotype mapping within the EMS based on the predicted quality of the optimization solution.

The paper is structured as follows: In Sect. 2, existing related work is discussed before we present our novel approach in Sect. 3. After a detailed description of the used forecasting method in Sect. 4, this approach is evaluated in Sect. 5. Finally, the paper concludes in Sect. 6 and provides an outlook on future work.

2 Related Work

Using a generic EA for optimized scheduling is a widely applied approach. In [41], an EA is compared to a MILP in the context of unit commitment. It is stated that for highly constrained use cases, similar to the EH scheduling problem, the EA performs better than the presented MILP. Further applications of EAs can be found, e.g., in [50] and in the reviews conducted in [18, 53].

Several concepts using multiple timescales are investigated, e.g., multi-timescale coordinated optimized scheduling in [57], multi-timescale rolling optimal dispatch in [49], timescale-adaptive dispatch in [37], and multi-timescale model predictive control in [14], which are grouped under the term multi-timescale scheduling. The basic idea is to use multiple timescales for calculating the optimal schedule of various energy resources [35]. In the literature, multi-timescales are mainly used for handling uncertainties in the optimization, e.g., in [3, 37, 49], separating optimization for economic and operational factors, e.g., in [14, 58, 59], improving the control in energy systems, or shifting deferrable loads in [40]. In this context, a rolling optimization approach is often used, allowing different look-ahead periods to combine short-term and long-term benefits [36]. Furthermore, different components, e.g., tap changers in a substation and batteries, are separated into various timescales for their operation coordination in [60]. All mentioned works on multi-timescale scheduling for the optimal control of DERs use discrete and equal-length time steps within each timescale. The approach introduced in the present paper differs from these previous works in this point by adjusting individual time interval lengths according to the respective predicted optimization quality.

A different approach to reducing the complexity of the optimization process with an equation-based algorithm, involving the substitution of similar time intervals, is presented in [52]. Hence, the number of decision variables can be reduced if similar intervals can be determined. In [51], this approach is extended by a systematic method for identifying promising initial time intervals that can be aggregated, thus further reducing the number of decision variables.

In the literature, some research works, e.g., [5, 6, 16, 17], have proposed different algorithms and techniques to deal with complex search spaces of EAs. While [16, 17] alter the genetic operators or the optimization problem itself to reduce the complexity, [5, 6] employ a Variational Autoencoder (VAE) to generate a new and simple search space from a complex and discontinuous one, aiming to reduce the problem dimensionality. The authors of [5, 6] propose a three-step method to create a better search space by mapping a difficult search space to a learned latent representation that enables easier discovery by an EA. The results of both works show that using a VAE to reduce the complexity of a search space can improve the overall performance of an EA in terms of solution quality and the effort required to reach such a solution.

To dynamically parameterize the chromosome interpretation of the used optimizer and direct its computational effort towards difficult regions of the search space, the present work uses a forecast of the quality of the optimization solution. This quality is interpreted as the absolute error between the target schedule and the actual EH output power for each point in time and is therefore a time series. Time series forecasts are commonly applied in the context of renewable energy systems [1], and a variety of time series forecasting methods exist [7, 12, 13, 38, 43, 54, 55]. Such time series forecasts are often used as inputs for a given optimization [1], and have even been applied to optimally determine the input parameters of an optimization problem [21]. However, to the best of our knowledge, no previous work has used time series forecasts of the quality of an optimization solution to dynamically parameterize the interpretation of the used optimizer.

3 Concept

In this section, we introduce our approach for scheduling DERs using an EA with a dynamic genotype-phenotype mapping based on Machine Learning (ML) techniques. Our approach emphasizes the interpretation of the solutions proposed by the utilized EA, rather than focusing on the EA itself. Therefore, it can be adapted to any EA designed for optimizing scheduling tasks, provided that it utilizes an external evaluation service to calculate the objectives of the proposed chromosomes. In the following, we consider prior research that forms the foundation for our novel concept, elaborate on the problem statement, and present the overarching framework for dynamic genotype-phenotype mapping.

Fig. 1. Concept Overview [47]

3.1 Previous Work

The selection of the EA GLEAM was motivated by its proven effectiveness in various scheduling tasks, both in general [9, 31] and specifically for DERs [30, 34, 45, 48]. GLEAM also offers a versatile, application-centric approach for mapping decision variables and other degrees of freedom of a certain application to the genetic representation [8, 9], which is critical for the coding process applied in this paper [30, 34]. The applied framework [33] is based on parallelization utilizing a coarse-grained or island population model [11]. Each island employs a structured population model based on a neighborhood, i.e., a fine-grained model utilizing a ring [27]. This combination of population models effectively mitigates the risk of premature convergence and allows for a self-adaptive balance between the breadth and depth of the search [28, 31]. Due to the utilization of dynamic-length chromosomes [30, 33], the related genetic operators described in [8, 30, 31] can be applied. In the island sub-populations, a neighborhood size of 8 is employed, as commonly used in GLEAM applications [8]. Subsequently, in [46] the authors introduce a method with adaptive time segments and varying resolution using the same EA. This approach enhances flexibility compared to previous works [45, 48] by employing a two-step optimization process: first, an optimization with low time resolution is conducted, followed by an evaluation; then, a second optimization with higher time resolution is performed for time segments with substantial deviations from the target EH power output. This improvement comes at the cost of increased computation time. Furthermore, the distinction between well and poorly approximated time segments is discrete. A proof of concept to overcome these drawbacks is investigated in [47], on which the present work is based.

3.2 Problem Statement

As described in Sect. 1, the scheduling of DERs is a difficult task, especially with regard to multiple objectives. Further, the solution space grows exponentially with the number of decision variables. In the context of scheduling DERs, it is common to use a fixed time interval length of, e.g., 15 min. This results in a homogeneous distribution of decision variables within the search space, neglecting differences in the difficulty of finding adequate solutions for individual parts of the optimized time frame. To overcome this difficulty, we present a general approach that focuses the computational effort on the challenging areas within the search space. By applying ML to predict the quality of the schedules proposed by the EA optimization, the associated time interval lengths are adapted. Hence, more alleles are placed in time ranges where the prediction forecasts poor approximation quality, and vice versa. Although the optimization task is a multi-objective problem, the presented concept focuses on a single objective, because the needed flexibility provision by an EH mainly depends on an adequate approximation of the target schedule. This approximation is expressed as the Root Mean Squared Error (RMSE), \(d_{RMSE}\), between the scheduled electrical output of an EH, \(P_{EH}(t)\), and the target schedule, \(P_{target}(t)\), where n represents the number of intervals to be scheduled for each optimization task. In the context of scheduling an EH, n is set to 96, according to the previous homogeneous distribution of 15-minute intervals over a complete day:

$$\begin{aligned} \min \, d_{RMSE} = \sqrt{\frac{\sum _{t=1}^{n} \left( P_{EH}(t) - P_{target}(t)\right) ^2}{n}} \end{aligned}$$
(1)

\(d_{RMSE}\), as the approximation measure, is related to the maximal and minimal possible deviations, \(d_{RMSE,max}\) and \(d_{RMSE,min}\), according to Eq. 2 to determine a Degree of Fulfillment (DOF), \(DOF_{RMSE}\). Finally, this DOF is mapped to a fitness value in the context of the EA-based optimization.

$$\begin{aligned} DOF_{RMSE} = \frac{d_{RMSE,max} - d_{RMSE}}{d_{RMSE,max} - d_{RMSE,min}} \end{aligned}$$
(2)
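The following minimal sketch illustrates Eqs. (1) and (2) under the assumptions stated above (96 intervals of 15 minutes per day); the synthetic schedules and the deviation bounds are illustrative, not taken from the evaluation:

```python
import numpy as np

def rmse(p_eh: np.ndarray, p_target: np.ndarray) -> float:
    """d_RMSE between the scheduled EH output and the target schedule (Eq. 1)."""
    return float(np.sqrt(np.mean((p_eh - p_target) ** 2)))

def degree_of_fulfillment(d: float, d_max: float, d_min: float) -> float:
    """Relate d_RMSE to the extreme possible deviations (Eq. 2)."""
    return (d_max - d) / (d_max - d_min)

n = 96  # one day at 15-minute resolution
p_target = 2.0 + np.sin(np.linspace(0, 2 * np.pi, n))  # synthetic target schedule
p_eh = p_target + np.random.normal(0.0, 0.1, n)        # a candidate EH output
d = rmse(p_eh, p_target)
dof = degree_of_fulfillment(d, d_max=2.0, d_min=0.0)   # assumed deviation bounds
```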

3.3 General Concept

The concept is illustrated in Fig. 1. Initially, a forecast model is trained using historical data generated through an optimization process with a fixed time interval length. Subsequently, scheduling optimization with dynamic interval lengths is performed based on the forecasts of the optimization quality generated by the trained forecast model. The optimized scheduling process commences with the initialization of an initial population within the EA, such as GLEAM. As further elaborated in Subsect. 3.5, the list of chromosomes is interpreted as raw schedules. Each of these raw schedules is processed within the interval length assignment to define the exact timestamp for each power fraction from the previously ascending-ordered interval numbers (see Subsect. 3.6). To ensure the validity of a schedule, boundary conditions, including ramp rate and State-of-Charge (SOC), are enforced within the EMS or an equivalent evaluation service equipped with domain-specific knowledge (see Subsect. 3.7). As a result of this evaluation process, each chromosome (i.e., schedule) and its respective outcomes related to the assessed objectives are recorded. This list of results for the entire generation is then provided to the EA for mapping to weighted fitness values. Subsequently, considering the termination criteria, the EA determines either to finalize the schedule or to continue the evolutionary process. During the evolutionary process, genetic operators are applied based on the fitness of each individual. Detailed information on this process can be found in [8, 30, 31].

3.4 Population Generation and Coding

Concerning the optimization of DER scheduling, the population generated by the EA comprises a set of chromosomes, each representing a schedule proposal. In each generation, every chromosome contains a list of genes, where each gene comprises different decision variables in the form of alleles. In the context of DER scheduling, these alleles encompass the attributes “Unit ID”, “Start Time”, “Duration”, and “Power Fraction”. The initial population can be constructed using various methods, as elaborated in [8, 9]. We employ a random distribution method to generate the initial population in our approach.
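As an illustration of this coding, the following sketch models a gene with the four alleles named above; the concrete data structure is an assumption for readability, not GLEAM's internal representation:

```python
from dataclasses import dataclass

@dataclass
class Gene:
    unit_id: int           # designated component within the EH
    start_time: int        # interval index from which the setpoint applies
    duration: int          # number of intervals the setpoint is held
    power_fraction: float  # setpoint as a fraction of the rated power

# A chromosome is a dynamic-length list of genes, i.e., one schedule proposal.
Chromosome = list[Gene]
```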

3.5 Chromosome Interpretation

In the context of our approach, the interpretation of the proposed chromosomes as schedules is critical. Depending on the adopted gene model of the EA, this offers a broad spectrum of applications and interpretation space. As mentioned in Subsect. 3.4, following the framework of GLEAM, a gene consists of four alleles. This modeling scheme is adapted from the work in [30]. Specifically, the “Unit ID” corresponds to a designated component within the respective EH. The “Start Time” determines the interval from which the “Power Fraction” serves as a setpoint for the given “Unit ID”. The “Duration” specifies the validity of the “Power Fraction” in terms of interval counts. Each chromosome comprises a list of n genes, collectively interpreted as a schedule. The transformation process from a chromosome to a raw schedule is adopted from [34]. To construct the raw schedule from an individual chromosome within the chromosome list, the chromosome interpretation iterates through the listed genes sequentially. During this process, if multiple genes contain information for the same time interval and component, the preceding gene is overwritten by the subsequent one. Hence, a uniformly shaped phenotype is built from diverse genotypes. Following this, a list of raw schedules is generated and subsequently fed into the interval length assignment process. Within each raw schedule from this list, each component is associated with a power fraction that serves as a setpoint during a time interval, as further defined in the subsequent step.
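A minimal sketch of this genotype-to-phenotype mapping is given below; the tuple-based gene layout mirrors the four alleles from Subsect. 3.4 and is purely illustrative:

```python
import numpy as np

# A gene as (unit_id, start_time, duration, power_fraction), cf. Subsect. 3.4.
Gene = tuple[int, int, int, float]

def interpret(chromosome: list[Gene], n_units: int, n_intervals: int = 96) -> np.ndarray:
    """Map a chromosome to a raw schedule: one setpoint per component and interval."""
    raw = np.zeros((n_units, n_intervals))
    for unit_id, start, duration, power_fraction in chromosome:
        stop = min(start + duration, n_intervals)
        raw[unit_id, start:stop] = power_fraction  # later genes overwrite earlier ones
    return raw

# Two genes for unit 0; the second overwrites intervals 4-7 set by the first.
raw_schedule = interpret([(0, 0, 8, 0.5), (0, 4, 4, 0.9)], n_units=1)
```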

3.6 Interval Length Assignment

The primary contribution of the presented concept is to direct the computational effort of the optimization towards the time intervals that are difficult to approximate. We direct this computational effort by introducing variability in the interval lengths, as described in Subsect. 3.2. Starting with a previously ordered set of setpoints per DER within each schedule proposal, a specific point in time is determined for each setpoint during the interval length assignment process. This determination is based on the predicted quality of the approximation by the optimizer. Where a higher degree of accuracy is anticipated, longer time intervals are assigned, and conversely, shorter intervals are assigned where the approximation quality is expected to be lower. Thus, the EA is given a particularly large number of alleles for variation in time intervals for which an approximation of the target schedule is predicted to be difficult. The total number of intervals does not change, but the length of each interval depends on the predicted quality. This results in dynamically determined interval lengths.

Translating the Forecast Result into Time Interval Lengths. The machine-learning-based forecasts predict the deviation between the target schedule and the predicted power output of the considered EH instance. To derive the corresponding time interval lengths for evaluating the proposed schedules, the relative error \({\mathcal {E}}_\text{opt, rel}(n)\) for each time interval n is determined. To determine this relative error, we divide the predicted error \(\hat{\mathcal {E}}_\text{opt}(n)\) per time interval n by the total predicted error summed over the entire day, as illustrated in Eq. (3). Following the standard time interval length of 15 minutes for unit commitment, the total number of intervals per day is 96 in our example:

$$\begin{aligned} \mathcal{E}_\text{opt, rel}(n) = \frac{\hat{\mathcal{E}}_\text{opt}(n)}{\sum _{i=1}^{96}\hat{\mathcal{E}}_\text{opt}(i)} \end{aligned}$$
(3)

To ensure that the relative error distribution \(\mathcal {E}_\text{opt, rel}\) aligns with the updated setpoint distribution over time, each relative error per time interval is scaled by the total number of setpoints (96). This redefined number of setpoints per interval is accumulated and plotted as a function of time. To determine the new time interval length per setpoint, the inverse of this cumulative function is calculated. Consequently, a list of 96 entries, each providing time information, is obtained. This time information is crucial in the interval length assignment process, enabling the preparation of schedules for evaluation within the EMS.
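A minimal sketch of this translation, assuming 96 setpoints per day, a strictly positive predicted error per 15-minute interval, and linear interpolation to invert the cumulative setpoint curve:

```python
import numpy as np

def assign_setpoint_times(err_pred: np.ndarray, n_setpoints: int = 96) -> np.ndarray:
    """Return one timestamp (minutes of the day) per setpoint."""
    rel = err_pred / err_pred.sum()  # relative error, Eq. (3)
    # Cumulative number of setpoints as a function of time on the 15-min grid.
    cum_setpoints = np.concatenate(([0.0], np.cumsum(n_setpoints * rel)))
    grid_minutes = 15.0 * np.arange(len(cum_setpoints))  # 0 .. 1440 min
    # Invert the cumulative curve: the time at which each setpoint is placed.
    return np.interp(np.arange(1, n_setpoints + 1), cum_setpoints, grid_minutes)

# High predicted error -> setpoints packed densely -> short intervals there.
```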

3.7 Boundary Condition Enforcement and Evaluation

Before the schedules can undergo evaluation, the technical boundary conditions derived from the underlying physical models must be analyzed. If necessary, adjustments must be made to the respective power fractions to ensure these conditions are met, which leads to the generation of updated schedules. During this process, the rates of change are considered, the current SOC of the storage infrastructure is monitored, and the technical viability of implementing the proposed schedule is confirmed. To facilitate this process, the verifying instance, such as an EMS, possesses critical information regarding the underlying models. Following the enforcement of boundary conditions, the evaluation of the objective functions takes place. In this step, one or multiple objective functions can be considered. For each objective function, the attainable maximum and minimum values are determined, and the evaluation outcome is related to these extremes according to [44]. Consequently, each objective's relative DOF is computed as the final result. This DOF for each schedule proposal and each objective is then consolidated into a list of results, which is subsequently handed back to the EA for translation into fitness values per schedule proposal.
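The following sketch hints at such a repair step for a single discharge-only storage component; the greedy strategy and the limits are assumptions for illustration and do not reproduce the EMS used here:

```python
import numpy as np

def enforce(schedule: np.ndarray, p_max: float, ramp_max: float,
            soc0: float, dt_h: np.ndarray) -> np.ndarray:
    """Greedily repair a power schedule against rated power, ramp rate, and SOC."""
    repaired = np.clip(schedule, 0.0, p_max)  # respect the rated power
    soc = soc0  # available stored energy, e.g., in kWh
    for k in range(len(repaired)):
        if k > 0:  # limit the rate of change between consecutive setpoints
            prev = repaired[k - 1]
            repaired[k] = np.clip(repaired[k], prev - ramp_max, prev + ramp_max)
        # curtail any discharge the storage cannot supply within this interval
        repaired[k] = min(repaired[k], soc / dt_h[k])
        soc -= repaired[k] * dt_h[k]
    return repaired
```

Note that dt_h carries the individual interval lengths in hours, so such a repair step remains valid under the dynamic interval length assignment.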

4 Forecasting Method

The prediction of the optimization quality for the next 24 h provides the basis for the interval length assignment. Since there is a monotonic relationship between the fitness value of the EA and the error between the target schedule and the actual EH power time series, we use this error as a proxy for optimization quality. We define this optimization error time series \(\mathcal {E}_\text {opt}\) as

$$\begin{aligned} \mathcal {E}_\text {opt} = \left| \mathcal {P}_\text {EH} - \mathcal {P}_\text {Target} \right| , \end{aligned}$$
(4)

where \(\mathcal {P}_\text {EH}\) is the power time series of the EH and \(\mathcal {P}_\text {Target}\) the target schedule time series. Our approach to forecasting \(\mathcal {E}_\text {opt}\) is implemented as a pyWATTS [29] pipeline. In this section, we explain our approach by first describing the pre- and post-processing, before discussing the evaluation criteria and the ML methods applied.

Pre- and post-processing: In the pre-processing, we first re-sample the numerical input time series to a resolution of fifteen minutes before standardizing them, i.e., re-scaling each time series to zero mean and unit variance. Furthermore, we extract the hour of the day, day of the week, and month of the year and encode these with sine and cosine to retain temporal similarity. We also include a Boolean variable indicating whether it is a workday or not. Since the resulting optimization error forecast is also standardized, we post-process this forecast with a simple inverse standardization to obtain the original scale.
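A sketch of the calendar encoding, written with pandas for brevity rather than the pyWATTS modules actually used in the pipeline:

```python
import numpy as np
import pandas as pd

def calendar_features(index: pd.DatetimeIndex) -> pd.DataFrame:
    """Sine/cosine-encode hour, weekday, and month; flag workdays."""
    feats = pd.DataFrame(index=index)
    for name, value, period in [("hour", index.hour, 24),
                                ("weekday", index.weekday, 7),
                                ("month", index.month - 1, 12)]:
        feats[f"{name}_sin"] = np.sin(2 * np.pi * value / period)
        feats[f"{name}_cos"] = np.cos(2 * np.pi * value / period)
    feats["workday"] = (index.weekday < 5).astype(int)  # ignores public holidays
    return feats

idx = pd.date_range("2021-01-01", periods=96, freq="15min")  # one example day
X_calendar = calendar_features(idx)
```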

Evaluation Criteria: As described in Subsect. 3.6, the computational effort of the optimization is directed with a varied interval length assignment. This interval length assignment is calculated based on the relative optimization error defined in Eq. (3). As a result, the forecast of the optimization quality should ideally perform well with regard to the relative optimization error. Therefore, to evaluate the performance of our forecasts in these terms, we first calculate the relative optimization error time series according to Eq. (3) and then assess the quality of the forecast based on this time series.

Forecast Methods: To forecast the \(\mathcal {E}_\text {opt}\) time series, we consider multiple forecasting methods, including state-of-the-art and basic methods. Each of these methods receives the same input data, i.e., historical information, calendar information, and the target schedule time series, and directly forecasts the next 24 h of the \(\mathcal {E}_\text {opt}\) time series with a resolution of 15 min. This multi-horizon forecasting approach results in each method having an output dimension of 96. We use data from 2021, pre-processed as described above, for training and validation of the models, and evaluate the models on the year 2022.

For benchmarking, the first forecasting method is a perfect forecast. This method is designed to explore the maximum benefit of the proposed dynamic interval assignment and is not viable in practical applications. Here, the perfect forecast is the mean optimization error calculated from the 30 simulation runs used for training. As simple forecasting methods, we consider a Random Forest (RF) regression [54] and a simple Feed-Forward Neural Network (NN) [55] with six hidden layers consisting of 256, 210, 150, 80, 64, and 52 neurons, respectively, both implemented with scikit-learn [42]. To include a more complex non-linear forecasting method, we implement an XGBoost regression [13] with default hyper-parameters using the XGBoost library [19]. Finally, to include state-of-the-art forecasting methods, we consider the Neural Hierarchical Interpolation for Time Series Forecasting (N-HITS) [12] and the Temporal Fusion Transformer (TFT) [38], both implemented with default hyper-parameters via the PyTorch Forecasting library [4]. We train these models using the default configurations of the respective library implementations. This results in RF, XGBoost, and NN minimizing a squared error, N-HITS minimizing the Mean Absolute Scaled Error (MASE), and TFT minimizing a quantile loss.
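As a sketch of the simpler end of this model spectrum, a multi-output RF can be fitted directly to the 96-dimensional day-ahead target; the feature construction and shapes below are placeholders, not the pipeline's actual inputs:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(365, 96 + 7))      # e.g., target schedule plus calendar features per day
y = np.abs(rng.normal(size=(365, 96)))  # next-day E_opt at 15-min resolution

# chronological split, since time series data must not be shuffled
X_train, X_val, y_train, y_val = train_test_split(X, y, shuffle=False)
model = RandomForestRegressor(n_estimators=100).fit(X_train, y_train)
e_opt_hat = model.predict(X_val)        # shape (n_days, 96): one day ahead per row
```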

5 Evaluation

In this section, we present our evaluation by first introducing the evaluation environment, including a definition of the use case and the evaluation criteria to assess whether the approach enhances the flexibility provision of an EH. Furthermore, we evaluate the forecast of the optimization quality and the optimization results with a dynamic interval length assignment.

5.1 Evaluation Environment and Criteria

The evaluation environment is based on previous work [46], and the relevant data for the use case are taken from real-world data from the years 2021 and 2022, according to [45]. More specifically, the electrical load profile of an industrial area for the mentioned period is used for defining the target schedule, which would be provided by a grid operator in a real-world application for flexibility retrieval. The target schedule aims to dampen the power flow variation at the substation. The general configuration of the used EA GLEAM comprises a limit of 50 generations and a population size of 100. These settings were tested in advance and proved to produce sufficiently high-quality results: a drastically smaller population size or generation count leads to poor solutions, whereas larger values increase the calculation time without a significant gain in solution quality. Further settings concerning the genetic operators are adopted from [8, 31, 32]. Although the optimization is implemented as a multi-objective optimization, the evaluation of the present approach concentrates on the objective of approximating the given target schedule for the electrical output of the EH. Hence, two different aspects are taken into account as evaluation criteria. First, the general quality of the approximation by the optimized scheduling is assessed by calculating the RMSE, the resulting DOF according to [44], and its fitness representation within GLEAM; the corresponding objective function is given by Eq. 1. Taking the DOF into account for the evaluation, our approach is compared to the theoretical upper and lower boundaries, as described in [44]. Second, the influence of the forecast quality is determined by comparing two different forecast models (RF and TFT) to a perfect forecast as ground truth and to the Base Case without dynamic interval length assignment.

5.2 Forecast of the Optimization Quality

We evaluate the forecast quality by calculating the Mean Absolute Error (MAE) and RMSE on \({\mathcal {E}}_\text {opt, rel}\) for each forecast model, as presented in Table 1. With regard to MAE and RMSE, we observe that the RF performs best, whilst XGBoost performs comparably to the RF. Furthermore, the N-HITS and the NN perform similarly, and the TFT performs noticeably worse.

To further evaluate our approach, we select two forecast models to determine the dynamic interval length assignment. First, we select the RF as the best-performing forecast model. Second, to investigate whether even a poor forecast is beneficial, we consider the TFT as the worst-performing forecast model. Combined with the benchmark perfect forecast, this selection covers a range of forecasts with varying performance.

Table 1. MAE and RMSE for the optimization target error \({\mathcal {E}}_\text {opt, rel}\), calculated over the forecast period for each of the considered forecasting models on the test data set. The best values for each metric are highlighted in bold.

5.3 Results

To evaluate our approach, we consider an exemplary week in 2022. This week is characterized by a typical load profile for an industrial area, which is the basis for the target schedule to approximate. In total, we compare four different cases in our evaluation. First, we perform a fixed-time interval optimization for each day in the week to generate a Base Case. Second, we create a Ground Truth forecast that is based on previous optimization results that are also used for training the forecast models. This Ground Truth is then used for the time interval length assignment and is included to show the maximum potential of our approach. Third, we use the forecasts from the trained RF for the interval length assignment and fourth, the forecasts from the trained TFT. Given these four cases and the exemplary week, we evaluate the average performance.

We report the mean DOF of each considered case (Base Case, RF, TFT and Ground Truth) over a complete week in Fig. 2. The mean DOF of the Base Case over 30 repetitions of the week, resulting in 210 optimizations, is \(88.9\%\). Using the forecast results of the RF model, the mean DOF is \(4.4\%\) higher at \(92.8\%\). Furthermore, the mean DOF obtained with the TFT model is \(91.3\%\) and \(2.7\%\) higher than the Base Case. Finally, using the Ground Truth results in a mean DOF of \(93.2\%\), which is increased by \(4.8\%\) compared to the Base Case. Interestingly, the variance of the DOF values is similar for all cases except for the RF, where a noticeably smaller range of values over the optimizations is observed.

Fig. 2. Boxplot of mean DOF values over the complete week

Regarding the significance of the results, we first note that, according to the D'Agostino-Pearson test for normality [15], the results for all four cases are normally distributed. Therefore, we use the parametric t-test according to [20] to compare the result distributions of the Base Case and the three cases using the dynamic interval length assignment. All three tests show with sufficient confidence (p-values \(\ll 1\%\)) that the DOF values of the cases with dynamic interval length assignment are better than those of the Base Case.
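A sketch of this significance analysis with SciPy; the DOF samples below are synthetic placeholders, not the reported results:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
dof_base = rng.normal(0.889, 0.010, 30)  # 30 weekly repetitions (assumed layout)
dof_rf = rng.normal(0.928, 0.005, 30)

# D'Agostino-Pearson normality test for each sample
print(stats.normaltest(dof_base).pvalue, stats.normaltest(dof_rf).pvalue)
# one-sided two-sample t-test: is the RF case's mean DOF greater?
print(stats.ttest_ind(dof_rf, dof_base, alternative="greater").pvalue)
```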

Daily Performance: In Fig. 3, we compare the daily DOF from three cases (Base Case, RF, and TFT) across the considered week in 2022. Furthermore, we plot the associated evaluation metric (MAE and RMSE) for each day for the two considered forecast models. Based on this plot, we make three key observations. First, the dynamic interval length optimization improves the DOF for each day when using the RF model and for each day except Monday when using the TFT model. Second, there is a large variation in the DOF across the days for all considered cases. However, this variation is most noticeable in the Base Case and least noticeable when using the RF for the interval length assignment. Third, we observe a general correlation between the evaluation metrics and the DOF improvement when using the corresponding forecast model for the interval length assignment. Specifically, the performance of the TFT is noticeably worse than that of the RF on Monday, resulting in a worse DOF. A similar result is seen on Thursday and Friday, where the TFT also performs worse. On Tuesday and Wednesday, the performance of both forecast models is similar, resulting in a similar DOF. However, this correlation cannot be confirmed for the weekend. On Saturday and Sunday, the TFT performs worse than the RF according to the evaluation metrics, but the resulting DOF is similar or higher.

Fig. 3. Comparison of daily DOF and the associated forecast evaluation metric

5.4 Discussion

The evaluation shows the validity of the approach; however, several points are worth discussing further. In this section, we first discuss the results in more detail before highlighting some key aspects regarding the forecasts of the optimization quality. Finally, we discuss the limitations and benefits.

Insights from Results: With regard to the results, we discuss four important aspects. First, our evaluation shows that the proposed approach can be used to enhance the optimization quality of the EA and thus obtain a better approximation of the power output of an EH to a target schedule. This results in better flexibility provision by an EH. Second, although our approach improves the mean optimization quality, it generally does not reduce the variance in the DOF results. This may be due to the fact that the phenotype mapping does not affect the EA core algorithm with its genetic operators. Third, improving the forecast quality generally results in an improvement of the DOF. However, fourth, this correlation does not carry over to weekends. One possible explanation for this lack of correlation is the difference in the weekend target schedule: due to this differing control schedule, the TFT possibly directs the computational power to a particularly difficult region on weekends, even though its performance across the entire day is worse.

Forecast of Optimization Quality: Although our results are promising, the forecasts have not been explicitly designed to meet the use case requirements. For the dynamic interval length calculation, the forecasts should be designed to accurately predict the shape of the optimization error independent of its scale. This is due to the relative optimization error (see Eq. (3)) used when determining the dynamic interval lengths. Currently, the forecast models are trained with quality-based metrics, such as MAE and RMSE, and an accurate forecast according to these metrics may not be optimal regarding the shape. Furthermore, the predictions of the optimization quality are based on a single simulation run, in contrast to the perfect forecast, which is calculated as the mean of 30 simulation runs. However, the results of our forecasts suggest that the considered forecasting methods are robust and multiple simulation runs are not required.

Limitations and Benefits: The first limitation of our approach is that it requires an initial training dataset. As a result, when no prior data are available, i.e., the “cold start” problem, it is currently impossible to apply our approach directly. A second limitation is that we only consider one objective (RMSE) for the dynamic interpretation of chromosomes, due to the focus on flexibility provision. Consequently, the positive effect of our approach is concentrated on this objective without influencing other objectives. A further limitation is that the approach currently does not consider uncertainty as proposed by [2, 26].

The key benefit of our approach is its general performance, which leads to an improvement of the optimization quality by up to \(4.8\%\) when using a perfect forecast, as a theoretical maximum improvement. The implemented forecast models, i.e., RF and TFT, show sufficient results by enhancing the optimization quality by \(4.4\%\) and \(2.7\%\), respectively. This enhancement leads to a significant improvement of the flexibility provision by the underlying EH. Thereby, the concept of directing the computational effort of the EA towards difficult time segments helps to improve the results. Another key benefit is the computational effort: once the forecast model has been trained, no additional computational effort is needed to improve the optimization results. This is achieved by intelligently allocating the computational effort based on the optimization quality prediction. Additionally, the computational cost of the training process for the ML forecast models is negligible compared to the computational cost of the scheduling by the EA. Furthermore, our evaluation suggests that simple forecasting methods with default hyper-parameters and simple input features are sufficient. Finally, our approach is independent of the number of controlled DERs, since the forecasting models are trained on the aggregated EH output time series. This underlines the scalability of our approach.

6 Conclusion and Outlook

Ensuring the reliable provision of flexibility to the electricity grid through the scheduling of DERs depends on the optimization performance. Our work shows that employing an EA for this optimization task is a viable and effective approach. By applying ML models to predict the optimization quality, the generated forecasts can, once the models are trained, be used to direct the computational effort of the EA towards challenging time segments. Consequently, our strategy eliminates the need for additional computational resources to find better solutions. We evaluate two forecast models that determine dynamic interval length assignments and thus direct the computational effort of the EA. When using the RF model, we improve the DOF of the optimization by 4.4% on average, and with the TFT by 2.7%. These improvements are statistically significant and form a basis for better flexibility provisioning by the considered DERs.

In light of our promising results, several possible directions for future work should be considered. First, to overcome the “cold start” problem, future work could investigate whether a feedback loop can be introduced to train the forecast models online. Second, it would be interesting to extend the approach by including different objective functions within the prediction of the optimization quality. Third, future work could investigate ways to include and account for uncertainty in our approach. Fourth, forecast models specifically designed for the interval length assignment and focusing on the shape instead of absolute performance could be investigated. Of particular interest is the impact of such forecasting methods on the variance of the optimization DOF and on the correlation between forecast quality and optimization quality. Additionally, the method should be applied to further EAs to investigate the generalizability of the presented approach. Finally, the transferability of the trained forecast models to further instances of EHs should be investigated to determine whether the models are beneficial in general.