1 Introduction

Basic oxygen furnace (BOF) steelmaking is the most widely used steelmaking method in the world. In this process, the CO and CO2 content curves in the off-gas profile are of critical importance. These two curves directly reflect the primary steelmaking reaction (the carbon–oxygen reaction) and serve as essential references for dynamic control in BOF steelmaking. The CO + CO2 curve is minimally affected by the reaction between CO and oxygen in the pipeline, so it is an equally important reference for dynamic BOF control. Additionally, real-time monitoring of the CO curve is crucial in the gas recovery system to prevent explosions. However, because the off-gas must be cooled or because of structural constraints within the plant, most gas composition probes are positioned some distance above the BOF vessel mouth. This delays the detection and display of the gas composition, which creates significant challenges for dynamic steelmaking control and safe gas recovery based on the off-gas profile. In some plants, this delay can reach up to 1 min. Forecasting the CO, CO2, and CO + CO2 curves in the off-gas profile not only compensates for this delay but also serves as a forecast of the carbon–oxygen reactions themselves, since these curves directly reflect those reactions. Using the forecasted curves, pre-control of the steelmaking process can be implemented. To the best of our knowledge, there are no published precedents for forecasting the off-gas profile of BOF steelmaking or for using forecasted curves from that profile to pre-control a BOF steelmaking process. This study addresses that gap by offering a new concept for forecasting off-gas profiles and for dynamic control based on forecasted curves. A method for forecasting the CO, CO2, and CO + CO2 curves in the off-gas profile is proposed, and accurate forecasting was achieved. Several approaches for dynamic pre-control based on the forecasted curves are also introduced.

In fact, the off-gas profile is a specialized time series, with each time-step containing instantaneous values for the various gas content percentages. Forecasting the off-gas profile can therefore be abstracted as a short-term time-series forecasting task. In recent years, deep-learning technologies that originated in natural language processing (NLP) have increasingly been applied to time-series tasks. Bahdanau et al. introduced the seq2seq attention-based long short-term memory (LSTM)/gated recurrent unit (GRU) model (2014) [1] for time-series processing. For processing time-series, Vaswani et al. proposed the Transformer model (2017) [2], while Tan et al. employed an improved convolutional recurrent neural network (2018) [3]. Compared to traditional machine learning models, these approaches offer higher robustness and greater capacity for handling complex time series. Additionally, with pre-trained models, deep-learning techniques facilitate easier deployment and transfer learning.

Beyond the theoretical research mentioned above, time-series forecasting techniques are also widely applied to practical problems. For example, Gasparin et al. achieved electric load forecasting (2022) [4] with deep networks, Cao et al. used LSTM for financial time-series forecasting (2020) [5], and Prakarsha et al. developed an ANN model for biomedical signal forecasting (2022) [6]. These studies demonstrate the significant application potential of deep-learning models and time-series forecasting techniques. However, applying them directly to forecasting the curves in the off-gas profile in BOF steelmaking presents substantial challenges:

  (1) Inefficient backbone: These methods primarily use naive ANN or LSTM models, which have been shown in surveys [7, 8] to perform worse than state-of-the-art (SOTA) models. To improve the backbone, newer techniques such as residual connections and attention mechanisms should be employed, and SOTA models should be tested or used as benchmarks. Additionally, some studies did not conduct ablation experiments, making it unclear whether the adopted modules actually improve accuracy.

  (2) Different data format: Some methods involve data with only a single set of curves but with many time-steps. For example, the ETT-small dataset introduced by Zhou et al. (2020) [9] consists of data collected from July 2016 to July 2018. Unlike these datasets, off-gas profile data consist of many heats, but each heat has only about a thousand time-steps. Therefore, different data pre-processing methods are required.

  (3) Different data characteristics: Unlike the data used in the aforementioned studies, the curves in the off-gas profile exhibit almost no seasonality or cyclicality. According to field experience and research [10,11,12,13], the curves in the off-gas profile are related to endpoint conditions but are subject to many unpredictable sudden changes, especially during adjustments in blowing practices and additives. Therefore, more efficient models are needed.

By achieving accurate forecasting of key indicators, the proposed method can be applied to real-time control and pre-control of the BOF process, as well as to the control of the gas recovery system. To address the above challenges and accurately forecast the CO, CO2, and CO + CO2 curves in the off-gas profile, this study undertakes the following key tasks:

  (i) The concept of off-gas profile forecasting is proposed. A statistical analysis of the off-gas profile data was conducted, and based on the data characteristics, a new data pre-processing method, termed the mixed-batch method, is introduced.

  (ii) A channel and time-step attention mechanism module was designed, along with a deep-learning algorithm that incorporates techniques such as aggregation structures, causal dilation convolution, attention mechanisms, and residual connections. Using this algorithm, accurate forecasting of the CO, CO2, and CO + CO2 curves in the off-gas profile was achieved.

  (iii) The concept of using forecasted curves for pre-control in BOF steelmaking is proposed, along with several examples. Attention weights were used to quantify the importance of channels and time-steps in the off-gas profile data.

  (iv) The methods proposed in this work were applied in external validation, and the results are presented. A corresponding forecasting tool was developed.

The rest of this paper is organized as follows. Section 2 introduces the task of off-gas profile forecasting and describes the off-gas profile data, the mixed-batch method, the targets, and the evaluation metrics. Section 3 introduces the structure of the proposed model. Section 4 presents the results. Section 5 gives examples of pre-control methods. Section 6 presents the results of external validation, and Sect. 7 concludes the paper.

2 Preliminaries

2.1 Short-Term Time-Series Forecasting Task

With a sliding window of fixed size \({N}_{x}\) and a fixed forecasting interval \(\omega\), the model receives the input \({\chi }^{t}=\left\{{x}_{1}^{t}, {x}_{2}^{t}, {x}_{3}^{t},\dots ,{x}_{{N}_{x}-1}^{t},{x}_{{N}_{x}}^{t} \mid {x}_{i}^{t} \in {\mathbb{R}}^{{d}_{x}}\right\}\) at time-step \(t\), and the target is to forecast the corresponding value \({y}^{t+\omega } \in {\mathbb{R}}^{{d}_{y}}\) at time-step \(t+\omega\). The interval between time-steps is 1 s, according to the LOMAS sampling frequency. In this work, \(\omega\) is fixed to 32 according to the BOF workshop requirements, which means that the forecasting model receives the input \({\chi }^{t}\), covering an \({N}_{x}\)-second sliding window, and forecasts the value \({y}^{t+32}\) that will occur 32 s later. Models with \({N}_{x}=8\), \({N}_{x}=16\), and \({N}_{x}=32\) were developed for comparison.
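The task definition above can be illustrated with a minimal sketch that pairs every \({N}_{x}\)-step input window with the target value \(\omega\) steps after the last input. The function name `make_samples` and the list-based data layout are assumptions made for illustration only and are not taken from the authors' implementation.

```python
from typing import List, Sequence, Tuple


def make_samples(
    series: Sequence[Sequence[float]],
    target: Sequence[float],
    n_x: int = 16,
    omega: int = 32,
) -> List[Tuple[List[Sequence[float]], float]]:
    """Pair each n_x-step input window with the target omega steps later.

    series: one row of channel values per time-step (1 s apart).
    target: the curve to forecast, aligned with `series`.
    """
    samples = []
    # t is the index of the last time-step inside the input window.
    for t in range(n_x - 1, len(series) - omega):
        window = list(series[t - n_x + 1 : t + 1])   # x_1^t ... x_{N_x}^t
        samples.append((window, target[t + omega]))  # y^{t+omega}
    return samples
```

A heat with \(T\) time-steps yields \(T-{N}_{x}-\omega +1\) samples under this layout, so a shorter input window leaves slightly more samples per heat.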

2.2 Data Description

The dataset contains independent time-series with seven channels from 5198 heats. The heats were randomly divided into a training set (60%), a validation set (20%), and a test set (20%). Each channel is a curve recorded during the descent and elevation of the oxygen lance. The curves are total off-gas flow (Flow), oxygen lance height (Lc-height), oxygen cumulative consumption (O2-blown), and the content percentages of carbon monoxide (cp-CO), carbon dioxide (cp-CO2), hydrogen (cp-H2), and oxygen (cp-O2). According to the sampling frequency of the off-gas recovery system, the time-step interval is 1 s. All channels are normalized with the mean and variance of the training set. Table 1 gives a statistical description of the values of each channel, and Fig. 1 visualizes each curve in the time-series data of a typical heat.

Table 1 Descriptive statistics of channels
Fig. 1 Visualized description of channels in a typical heat: (a) curves of cp-CO, cp-CO2, cp-H2, and cp-O2; (b) curves of Flow, Lc-height, and O2-blown
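The data split and normalization described above might be sketched as follows. The split is done by heat so that no heat is shared between sets, and the scaler statistics come from the training set only. The function names, the fixed seed, and the use of z-score scaling (subtract the mean, divide by the standard deviation) are assumptions; the text states only that the mean and variance of the training set are used.

```python
import random
from statistics import fmean, pstdev


def split_heats(heat_ids, seed=0):
    """Randomly split heat identifiers into 60% train, 20% validation, rest test."""
    ids = list(heat_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(0.6 * len(ids))
    n_val = int(0.2 * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def fit_scaler(train_values):
    """Return (mean, std) of one channel, computed on training data only."""
    mean = fmean(train_values)
    std = pstdev(train_values) or 1.0  # guard against a constant channel
    return mean, std


def normalize(values, mean, std):
    """Apply the training-set statistics to any split (assumed z-score)."""
    return [(v - mean) / std for v in values]
```

Reusing the training-set statistics for the validation and test sets keeps information from those sets out of the pre-processing step.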

2.3 Mixed-Batch Method

Because the data cover many scenarios (different heats) and each time-series is short, a sliding window of \({N}_{x}+32\) time-steps is used to process the time-series in the training and validation sets. Each time the window slides, a new subsequence is produced. The first \({N}_{x}\) time-steps of the subsequence are the inputs, and the values in the last time-step are the target. During training and validation, these subsequences were placed randomly into batches without replacement. The mixed-batch operation was not carried out for the time-series in the test set. Instead, the time-series of each heat in the test set was processed by the sliding window, and all subsequences of that heat were input to the model to forecast the whole curves. Figure 2 is a schematic diagram of the input and output of the models.

Fig. 2 Schematic diagram of proposed mixed-batch method
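A minimal sketch of the mixed-batch idea is given below, under the assumption that each subsequence can be identified by a pair (heat index, start index). All window positions from all heats are pooled, shuffled once, and cut into batches, so one batch mixes subsequences from different heats and no subsequence is drawn twice. The function name, the batch size, and the seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def mixed_batches(
    heat_lengths: Sequence[int],
    n_x: int = 16,
    omega: int = 32,
    batch_size: int = 64,
    seed: int = 0,
) -> List[List[Tuple[int, int]]]:
    """Return shuffled batches of (heat index, window start) pairs.

    A subsequence covers time-steps start .. start + n_x + omega - 1:
    the first n_x steps are the input, the last step holds the target.
    """
    window = n_x + omega
    index = [
        (heat, start)
        for heat, length in enumerate(heat_lengths)
        for start in range(length - window + 1)  # empty if the heat is too short
    ]
    random.Random(seed).shuffle(index)  # sampling without replacement
    return [index[i:i + batch_size] for i in range(0, len(index), batch_size)]
```

For the test set, the same index list would be kept in order and grouped by heat instead of shuffled, so that a whole curve can be forecast for each heat.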

2.4 Target

The forecasting targets are as follows:

Curves of cp-CO and cp-CO2. They directly represent the decarburization reactions.

Curve of the content percentage of carbon monoxide plus carbon dioxide (cp-CO + CO2). It is more stable, because the total carbon content is hardly affected by the later reaction between CO and oxygen in the pipeline. Its features are also directly related to the chemical reactions of the BOF steelmaking process.

2.5 Evaluation Metrics

The coefficient of determination \({R}^{2}=1-\frac{{\sum }_{i=1}^{n}{\left({Y}_{i}-{\widehat{Y}}_{i}\right)}^{2}}{{\sum }_{i=1}^{n}{\left({Y}_{i}-\overline{Y }\right)}^{2}}\) and the mean squared error \(\mathrm{MSE}=\frac{1}{n}{\sum }_{i=1}^{n}{\left({Y}_{i}-{\widehat{Y}}_{i}\right)}^{2}\) were taken as evaluation metrics, where \(i\) is the sample index, \({Y}_{i}\) is the actual value of the target, \({\widehat{Y}}_{i}\) is the forecasted value, \(\overline{Y }\) is the average of the actual values, and \(n\) is the number of samples.
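For reference, the two metrics can be computed directly from their definitions. The following sketch uses plain Python lists; the function names are illustrative.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between actual and forecasted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - residual sum / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot
```

Note that MSE depends on the scale of the target, so MSE values of curves with different value ranges are not directly comparable, whereas R2 is scale-free.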

3 Attention-Based Convolutional Aggregation (ABCA)

ABCA is a deep-learning model proposed for the effective forecasting of the curves of cp-CO, cp-CO2, and cp-CO + CO2. Aggregation means the combination of different functional blocks throughout a specially designed architecture. Major SOTA time-series forecasting models, such as Informer proposed by Zhou et al. [9] and SCINet [14] proposed by Liu et al. (2022), only consider features of the time domain. In contrast, ABCA forecasts off-gas profiles using both the time and frequency domains, which improves the forecasting accuracy. The model has five important parts: the input block, the basic block, the down-sampling block, the output block, and the aggregation architecture. Each part of ABCA is described in detail below.

3.1 Input Block

The input block is proposed to aggregate features of the frequency domain and the time domain and to perform initial feature extraction. In the input block, according to Eqs. 1 and 2, the input time-series (time domain) is transformed to the frequency domain with the fast Fourier transform. Equation 1 is the basic Fourier transform formula, where \(f\left(t\right)\) is an aperiodic function, \(F\left(\omega \right)\) is the representation of the function in the frequency domain, \({e}^{-i\omega t}\) is a complex exponential function, and \(\omega\) is the angular frequency. Equation 2 is the discrete form of Eq. 1 that the fast Fourier transform computes, where \({X}_{k}\) represents the signal in the frequency domain, \({x}_{n}\) represents the signal in the time domain, \(i\) is the imaginary unit, \(k\) is the frequency index, and \(N\) is the number of samples. The frequency-domain sequences are all padded to \({N}_{x}/2\) hertz and normalized.

$$F\left( \omega \right) = \int\limits_{{ - \infty }}^{\infty } {f\left( t \right)e^{{ - i\omega t}} dt}$$
(1)
$$X_{k} = \sum\limits_{{n = 0}}^{{N - 1}} {x_{n} e^{{ - i2\pi kn/N}} .}$$
(2)

Then, the input time-series and frequency-domain sequences are concatenated and embedded with a full convolution layer. Figure 3 shows the structure of an input block.

Fig. 3 Structure of the proposed input block
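The frequency-domain part of the input block can be sketched for a single channel as follows. The code evaluates Eq. 2 directly, which gives the same values that a fast Fourier transform would compute more efficiently, keeps the magnitudes of the first \({N}_{x}/2\) frequency bins, normalizes them by their maximum, and concatenates them with the time-domain window. The use of magnitudes, the max-normalization, and the function names are assumptions, since these details are not specified in the text; the convolutional embedding is omitted.

```python
import cmath
from typing import List, Sequence


def dft_magnitudes(x: Sequence[float]) -> List[float]:
    """Magnitudes |X_k| of Eq. 2 for the first N/2 frequency bins."""
    n = len(x)
    mags = []
    for k in range(n // 2):
        x_k = sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
        mags.append(abs(x_k))
    return mags


def input_features(window: Sequence[float]) -> List[float]:
    """Concatenate a time-domain window with its normalized spectrum."""
    mags = dft_magnitudes(window)
    peak = max(mags) or 1.0  # avoid division by zero for an all-zero window
    return list(window) + [m / peak for m in mags]
```

With \({N}_{x}=16\), each channel contributes 16 time-domain values and 8 frequency-domain values to the concatenated input in this sketch.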

3.2 Basic Block

The basic block is proposed as the main feature extraction block. It consists of four parts: a causal dilation convolution module, an attention mechanism module, layer normalization, and a residual connection.

The Temporal Convolutional Network (TCN) proposed by Lea et al. [15] is a time-series processing model based on a one-dimensional convolutional neural network. It introduced causal convolution and dilation convolution. Causal convolution guarantees that the model is one-way, because the output of any layer at time \(t\) depends only on the outputs of the previous layer at time \(t\) and before, which is expressed as Eq. 3, where \(p\left(x\right)\) is the final output and \(p({x}_{t}|{x}_{1},{x}_{2},\dots ,{x}_{t-1})\) is the output of the previous layers.

$$p\left(x\right)= \prod\limits_{t=1}^{T}p({x}_{t}|{x}_{1},{x}_{2},\dots ,{x}_{t-1}).$$
(3)

Dilation convolution allows the convolution input to be sampled at intervals. It introduces a dilation rate parameter, so that a convolution kernel of the same size can obtain a larger receptive field. When the input is \(x\), the convolution kernel is \(f\) with size \(k\), the dilation rate is \(d\), and the position in the time-series is \(s\), the dilation convolution is expressed as Eq. 4.

$$F\left( s \right) = \sum\limits_{{i = 0}}^{{k - 1}} {f(i) \cdot x_{{s - d \cdot i}} .}$$
(4)
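Equation 4 can be made concrete with a small single-channel sketch. Positions before the start of the sequence are treated as zeros (left padding), which is an assumption commonly used to keep the output the same length as the input; the function name is illustrative.

```python
from typing import List, Sequence


def causal_dilated_conv(
    x: Sequence[float], kernel: Sequence[float], dilation: int = 1
) -> List[float]:
    """Evaluate F(s) = sum_i f(i) * x[s - d*i] for every position s.

    Only the current and earlier inputs are read, so the operation is causal.
    """
    out = []
    for s in range(len(x)):
        acc = 0.0
        for i, weight in enumerate(kernel):
            idx = s - dilation * i
            if idx >= 0:  # indices before the sequence start count as zero
                acc += weight * x[idx]
        out.append(acc)
    return out
```

With a kernel of size \(k\) and dilation rate \(d\), one layer covers \(d(k-1)+1\) time-steps, which is how stacking layers with growing dilation rates enlarges the receptive field without enlarging the kernel.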

A causal dilation convolution module combines causal convolution and dilation convolution. The features pass sequentially through causal dilation convolutions and their accompanying weight normalization operations, ReLU activation functions, and dropout layers. Unlike in traditional causal dilation convolution layers, the residual connection is placed after the attention module and layer normalization. Figure 4 shows the structure of the modified causal dilation convolution module.

Fig. 4 Structure of modified causal dilation convolution module for off-gas forecasting

Attention Mechanism. The input to each block is a multistep, multichannel tensor. However, the causal dilation convolution module cannot determine the importance of individual time-steps and channels and weight them accordingly. Attention mechanisms of this kind have mainly been introduced for two-dimensional cases, such as object detection and image classification; specific implementations include SENet [16] and CBAM [17]. In this work, a one-dimensional attention mechanism for time-steps and channels is proposed. It has two connected parts, the time-step attention part and the channel attention part. Their input is a tensor of size c (channels) × n (time-steps).

Time-Step Attention. The input feature maps are processed by one-dimensional average pooling and maximum pooling along the channel dimension. The outputs are concatenated along the channel dimension and then processed by a one-dimensional fully connected convolution layer (CNN) that down-samples the channels. After a sigmoid operation, an attention matrix over the n time-steps is obtained.

Channel Attention. The input feature maps are processed by one-dimensional average pooling and maximum pooling along the time-step dimension. Their outputs are then expanded and squeezed by a feed-forward network that consists of one-dimensional CNN layers. The squeezed feature maps are added together and then transformed by a sigmoid function to obtain an attention matrix over the c channels. This matrix is a weight map for every channel of the original feature map. Figure 5 shows the structure of the attention modules.

Fig. 5 Structure of proposed attention modules
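The two attention parts can be sketched on a c × n feature map stored as a list of rows. This is a simplified illustration rather than the authors' implementation: the convolution in the time-step part is reduced to a kernel of size 1 with two hypothetical weights, and the feed-forward network in the channel part is a small shared two-layer network whose weight matrices `w1` and `w2` are supplied by the caller instead of being learned.

```python
import math
from typing import List, Sequence


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def time_step_attention(
    x: Sequence[Sequence[float]],
    w_avg: float = 0.5,
    w_max: float = 0.5,
    bias: float = 0.0,
) -> List[float]:
    """Return n weights in (0, 1), one per time-step.

    Average and max pooling run over the channel dimension; a size-1
    convolution (w_avg, w_max, bias) merges the two pooled rows.
    """
    weights = []
    for t in range(len(x[0])):
        column = [row[t] for row in x]
        avg_pool = sum(column) / len(column)
        max_pool = max(column)
        weights.append(sigmoid(w_avg * avg_pool + w_max * max_pool + bias))
    return weights


def channel_attention(
    x: Sequence[Sequence[float]],
    w1: Sequence[Sequence[float]],
    w2: Sequence[Sequence[float]],
) -> List[float]:
    """Return c weights in (0, 1), one per channel.

    w1 (hidden x c) and w2 (c x hidden) form a shared feed-forward network
    applied to the average-pooled and max-pooled channel descriptors.
    """
    def feed_forward(v: Sequence[float]) -> List[float]:
        hidden = [max(0.0, sum(w * vi for w, vi in zip(row, v))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    avg_pool = [sum(row) / len(row) for row in x]
    max_pool = [max(row) for row in x]
    summed = zip(feed_forward(avg_pool), feed_forward(max_pool))
    return [sigmoid(a + b) for a, b in summed]
```

Multiplying each row of the feature map by its channel weight and each column by its time-step weight applies the attention; the same weights can be read out afterwards to rank channels and time-steps by importance.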

In a basic block, the causal dilation convolution module is connected to the channel and time-step attention modules in sequence. The initial input is processed by a 1 × 1 convolution kernel, residually connected to the output of the time-step attention module, and normalized by a layer-normalization layer. Figure 6 shows the structure of a basic block.

Fig. 6 Structure of proposed basic block

3.3 Down-Sampling Block

A down-sampling block is proposed to extract higher-level abstract features and to reduce the dimension of the parallel inputs. A channel attention module, a time-step attention module, and a feed-forward convolutional neural network are arranged in sequence in it. The inputs are first concatenated and then processed by a convolutional layer for dimensionality reduction. The reduced inputs are then weighted by the attention modules and processed by the feed-forward convolutional neural network. The dimensionally reduced original input is residually connected to the output of the feed-forward convolutional neural network, and the result is normalized by a layer-normalization layer to give the final output. Figure 7 shows the structure of a down-sampling block.

Fig. 7 Structure of proposed down-sampling block

3.4 Output Block

The output block is proposed to transform the final feature map to the output. It consists of a convolution layer and a fully connected layer (feed-forward layer). The input dimension is first reduced by the convolution layer, and the result is then flattened. The flattened tensor is transformed to a single output by the fully connected layer. Figure 8 shows the structure of an output block.

Fig. 8 Structure of proposed output block

3.5 Aggregation Structure

The aggregation structure merges blocks in a parallel pyramid network to combine and extract features in each channel. With the aggregation structure, layers are combined to learn richer feature combinations that span more of the feature hierarchy. Figure 9 describes the network structure of the aggregation structure.

Fig. 9 The structure of the proposed aggregation structure. The input block integrates the features of the frequency domain and the time domain and carries out the embedding operation. After that, the embedded features are extracted and learned by specially arranged basic blocks. In the network, down-sampling blocks are used to combine and learn the outputs of basic blocks at a deeper level and to perform dimension reduction. The output of the last down-sampling block is processed by the output block to form the final forecasting value

4 Results and Analysis

4.1 Results of Comparison Experiments

First, the performance of ABCA for all targets with different input lengths (\({N}_{x}\)) is detailed in Tables 2, 3, and 4. Additionally, several SOTA time-series forecasting and classification backbones were set as benchmarks. These models include SCINet [14], DLinear proposed by Zeng et al. [18], Autoformer proposed by Wu et al. [19], and FEDformer proposed by Zhou et al. [20].

Table 2 Forecasting accuracy for different models (\({N}_{x}=8\))
Table 3 Forecasting accuracy for different models (\({N}_{x}=16\))
Table 4 Forecasting accuracy for different models (\({N}_{x}=32\))

The results in Tables 2, 3, and 4 demonstrate that the curves in the off-gas profile are forecastable, and most of the tested models achieved good results. This indicates that gas composition in the BOF steelmaking process exhibits significant autocorrelation, meaning that short-term data can effectively reflect longer-term trends. Furthermore, gas generation in the steelmaking process is stable at specific stages, allowing short-term forecasts to remain effective over the long term.

The results also show that the ABCA model consistently outperforms the other SOTA models across different forecasting scenarios, especially in forecasting cp-CO and cp-CO + CO2. This indicates that ABCA can capture complex features and local patterns in time-series, which is attributed to its aggregation structure and attention mechanisms.

According to Tables 2, 3, and 4, an input length (\({N}_{x}\)) of 16 yields the best results. A shorter input window reduces noise and focuses the model on the primary trends and signals, whereas a window that is too short carries insufficient information. In the pipeline, CO reacts with oxygen, causing instability in CO2 levels. This is the main reason for the lower accuracy in forecasting CO2 curves and the higher accuracy in forecasting CO + CO2 curves. Forecasting examples for one heat are presented in Fig. 10.

Fig. 10 Forecasting examples of ABCA for a typical heat (\({N}_{x}=16)\). (a) ABCA’s forecasting results of cp-CO, (b) ABCA’s forecasting results of cp-CO2, and (c) ABCA’s forecasting results of cp-CO + CO2

4.2 Results of Ablation Experiments

The ablation experiment was designed to assess the impact of different model configurations on forecasting accuracy, with results detailed in Tables 5, 6, 7.

Table 5 Forecasting accuracy for different models (\({N}_{x}=8\))
Table 6 Forecasting accuracy for different models (\({N}_{x}=16\))
Table 7 Forecasting accuracy for different models (\({N}_{x}=32\))

In the first variant, named causal dilation convolution-aggregation (Cdc-Agr), all attention mechanism modules were removed from the ABCA model. Another variant, referred to as naive convolutional aggregation (NCA), replaced all TCN causal dilation convolution modules with a plain convolution network block. A third configuration, denoted as Cdc, connected the causal dilation convolution modules sequentially without the aggregation structure. Additionally, a model named Attn-Cdc was created by sequentially connecting the causal dilation convolution modules to the attention mechanism modules, again without the aggregation structure.

The findings reveal that the ABCA model consistently outperforms all other ablation models across the tested configurations, which indicates that each component contributes to the forecasting accuracy. The inclusion of the aggregation structure in ABCA markedly enhances forecasting accuracy compared to models that rely solely on the attention mechanism. This is evident in the results presented in Table 5 for \({N}_{x}=8\), Table 6 for \({N}_{x}=16\), and Table 7 for \({N}_{x}=32\). The superior performance of the ABCA model underscores the effectiveness of its aggregation structure in improving forecasting accuracy over both sequential configurations and attention-based approaches.

4.3 Results of Importance Analysis

Based on the results presented, an importance analysis was conducted with \({N}_{x}=16\). This analysis involved arranging an attention module and a causal dilation convolution module in sequence to evaluate the significance of each channel and time-step within the input data. The importance is determined by the weights assigned by the attention module. The visualized result for a typical heat is shown in Fig. 11. In Fig. 11, the y-axis shows the sequential batch numbers within a single batch, while the x-axis represents channel names (channels column) and time-step numbers (time-steps columns).

Fig. 11 Results of importance analysis for channels and time-steps for a heat

Based on Fig. 11, it can be concluded that for different tasks and stages, the Flow, Lc-height, and O2-blown curves are all important, with the O2-blown curve being the most crucial. The gas flow rate (Flow) affects the oxygen content in the pipeline, which in turn affects the levels of CO and CO2 through the reaction with CO. The Lc-height and O2-blown curves are important because they characterize the blowing practice, which directly influences the key chemical reactions and temperature control in steel production and ultimately affects the generation of CO and CO2.

For the CO and CO + CO2 curves, the further into the middle of the steelmaking process, the more important the three mentioned curves become. This suggests that these three curves play a key role in influencing the mass transfer process of the primary C–O reactions. Interestingly, the forecasted curves (cp-CO, cp-CO2) themselves are the least important. This highlights the causal relationship, where changes in external factors, such as blowing practices, lead to changes in the CO and CO2 curves. Their own changes have little impact on subsequent changes.

For the importance of input time-steps at different stages, it can be observed that later time-steps are always more important than earlier ones, especially at the beginning and near the end of the steelmaking process. This is because the curves fluctuate significantly, and the later time-steps better capture the changing trends.

5 Methods of Pre-controlling BOF Steelmaking with the Forecasted Off-Gas Profile

With the forecasted off-gas profile, various pre-control methods for BOF steelmaking can be implemented. The following are examples of BOF steelmaking pre-control in different aspects.

5.1 Forecasting BOF Steelmaking Stage

Accurate judgment of the steelmaking stage is a basic requirement of many practices. BOF steelmaking can be divided into three stages according to the carbon monoxide content in the off-gas, namely, the rising, stable, and declining stages. Disturbances of the carbon–oxygen reaction should be eliminated in the rising and declining stages to reach a stable steelmaking endpoint, while adjusting operations such as feeding additives are mainly carried out in the stable stage. With off-gas profile forecasting, the timing of the stages can be determined in advance from the turning points of the forecast curves. Therefore, any adjustment operation, such as feeding additives, oxygen lance height adjustment, or bottom blowing adjustment, can be prepared in advance according to the forecast curve. Figure 12 shows an example of the steelmaking stages.

Fig. 12 A typical example of steelmaking stages, where (1) is the early (rising) stage, (2) is the middle (stable) stage, and (3) is the late (declining) stage
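One simple way to read the stage from a forecasted cp-CO curve is to look at its recent slope, as in the hypothetical sketch below. The rule, the `lookback` length, and the `slope_threshold` value are illustrative assumptions and are not taken from the study; in practice they would need to be tuned against plant data.

```python
from typing import Sequence


def classify_stage(
    forecast: Sequence[float],
    lookback: int = 5,
    slope_threshold: float = 0.5,
) -> str:
    """Label the latest forecasted point from the mean slope over `lookback` steps.

    The slope is in percentage points of cp-CO per time-step (1 s).
    """
    if len(forecast) <= lookback:
        return "unknown"  # not enough forecasted points yet
    slope = (forecast[-1] - forecast[-1 - lookback]) / lookback
    if slope > slope_threshold:
        return "rising"
    if slope < -slope_threshold:
        return "declining"
    return "stable"
```

A change of the returned label between consecutive seconds marks a candidate turning point, and because the curve is forecasted 32 s ahead, the change is seen before it appears in the measured profile.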

5.2 Forecasting Steelmaking Reactions

After the oxidation of silicon and manganese in the early (rising) stage, it is particularly important to adjust the additives and the oxygen blowing operation according to the forecasted steelmaking reactions in the middle (stable) stage. For example, carbon–oxygen reactions can be forecasted from the carbon monoxide and carbon dioxide curves. When the carbon monoxide curve in the off-gas profile rises suddenly for a few seconds, the forecasting curve responds and forecasts a sharp rise in carbon monoxide content. This indicates that the oxidation of the slag is too strong and that the stable carbon–oxygen reaction is disturbed. In this case, the system will pre-control the carbon–oxygen reaction by reducing the oxygen blowing rate, increasing the height of the oxygen lance, or adjusting additives.

5.3 Pre-identifying Raw Material Quality

Quality variation in the main raw materials changes the features of the off-gas profile in the early (rising) steelmaking stage. If the raw material contains more reducible oxide or unburnt flux than expected, failing to forecast the material quality and to pre-control the BOF system will lead to serious consequences. For example, incomplete calcination or improper sealing of the lime will result in more calcium carbonate than expected, which will produce a large amount of unexpected carbon dioxide during the rising stage of BOF steelmaking and finally lead to slopping. Fortunately, if such a problem occurs, a precipitous rise in the carbon dioxide curve will occur in the early stage. With the off-gas profile forecasting model, just a few seconds after the carbon dioxide curve starts to rise, the model will respond, forecast a sharp rise in the future carbon dioxide content, and output the corresponding forecast curve. After receiving this forecasting result, the system can adjust the oxygen blowing operation in advance and switch the lime silo in time to avoid slopping. Figure 13 shows an example analysis of typical reaction disturbances.

Fig. 13 Examples of the typical reaction disturbance and the material quality problem in off-gas profile. The carbon dioxide curve in box (1) refers to the unexpected reducible oxide or unburnt flux; the carbon monoxide curve in box (2) indicates that additives have disturbed the stable oxygen-carbon reactions, due to the added sinter; the carbon monoxide curve in box (3) identifies an explosive oxygen–carbon reaction in the middle stage of steelmaking

5.4 Reducing Response Latency

In many BOF mills, the LOMAS sampler is placed far from the mouth of the BOF vessel. The off-gas takes a long time to travel through the exhaust pipe before it is detected by the sampler and sensor, which causes latency in the BOF system response. The current model can forecast the carbon monoxide and carbon dioxide curves 32 s ahead, which greatly reduces the response latency of BOF steelmaking.

6 Results of External Validation

The ABCA model was deployed for external validation. A Python-based tool was programmed to facilitate this process. Every second, the tool receives data covering 7 channels and 16 time-steps. The input is then normalized, converted into tensors, and fed into the ABCA model for forecasting.
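The per-second input handling of such a tool could look like the following sketch, which keeps the latest 16 normalized samples in a rolling buffer and returns a complete 16 × 7 window once enough data have arrived. The class name `RollingInput`, its interface, and the z-score normalization are assumptions; the tensor conversion and the model call are left out because they depend on the deep-learning framework used.

```python
from collections import deque
from typing import List, Optional, Sequence


class RollingInput:
    """Rolling buffer of the latest n_x normalized samples (one per second)."""

    def __init__(self, means: Sequence[float], stds: Sequence[float], n_x: int = 16):
        self.means = list(means)  # per-channel statistics from the training set
        self.stds = list(stds)
        self.buffer = deque(maxlen=n_x)  # the oldest sample is dropped automatically

    def push(self, sample: Sequence[float]) -> Optional[List[List[float]]]:
        """Add one raw sample (7 channel values); return the window when full."""
        normalized = [
            (v - m) / s for v, m, s in zip(sample, self.means, self.stds)
        ]
        self.buffer.append(normalized)
        if len(self.buffer) < self.buffer.maxlen:
            return None  # still warming up at the start of a heat
        return [list(row) for row in self.buffer]
```

During the first 15 s of a heat, `push` returns `None`, so no forecast is issued until a full window is available.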

The tool's human–machine interaction (HMI) interface allows users to easily select and display both real-time and forecasted curves. In case of an unexpected event, the system includes an Alert Report feature that enables users to manually record abnormal phenomena for further analysis. Figure 14 shows the HMI interface of the off-gas forecasting system.

Fig. 14 The HMI interface of off-gas forecasting tool

After applying the ABCA model for external validation, its forecasting accuracy was recorded over 88 consecutive heats. Figure 15 presents the running charts showing the accuracy of the field production test results.

Fig. 15 The running chart of forecasting accuracy in external validation

7 Conclusion

This study forecasts the CO, CO2, and CO + CO2 curves in BOF steelmaking to eliminate the display delay of these curves and, at the same time, enables pre-control of steelmaking based on the forecasted curves. Based on the research findings, the following conclusions can be drawn:

  1. The three curves in the off-gas profile were shown to be forecastable, and the ABCA model proposed in this study demonstrates superior forecasting accuracy compared to the other benchmarks. Specifically, its R2 values for the forecast of the CO, CO2, and CO + CO2 curves are 0.9386, 0.8566, and 0.9428, respectively, while the MSE values are 47.3884, 11.9314, and 54.3583, respectively. Ablation experiments have demonstrated that the tested modifications to the ABCA model reduce its performance.

  2. The forecasted curves can address the issue of display delays, while utilizing the features of the forecasted curves enables pre-control in BOF steelmaking. Four examples are provided to illustrate these two aspects.

  3. The input curves related to off-gas flow rate and blowing practice are crucial in the forecasting process, with the most significant being the oxygen cumulative consumption curve. The later time-steps of the input curves are of greater importance in the forecasting process.

  4. In external validation, the performance of ABCA closely mirrors its results on the test set, demonstrating strong generalization capability and robustness.