1 Introduction

Radiation therapy (radiotherapy) employs high doses of radiation to kill tumor cells, serving as a vital component in the treatment of approximately two-thirds of global cancer patients (Baskar et al. 2012; Gutt et al. 2021). Since its inception, radiotherapy has been continually advanced through technological innovations such as image guidance, adaptive radiotherapy, heavy-particle therapy, and ultra-high dose rate (FLASH) radiotherapy (Park et al. 2018; Panta et al. 2012). However, in the radiotherapy treatment of thoracic and abdominal cancers, respiration-induced motion poses a significant limitation to further improving radiotherapy outcomes (Rietzel et al. 2005). Variance in organ shape and position during motion can lead to substantial errors in imaging, treatment planning, and delivery. Inaccurate knowledge of the target’s shape and trajectory often necessitates larger field margins, resulting in suboptimal dose conformation (Cai et al. 2011). Addressing these limitations requires advanced imaging technologies capable of precise observation and analysis of movement, enabling effective motion management and facilitating conformal treatment planning.

Recent advancements in four-dimensional imaging (4D-imaging) techniques, such as 4D computed tomography (4D-CT), 4D magnetic resonance imaging (4D-MRI), 4D cone-beam computed tomography (4D-CBCT), and 4D positron emission tomography (4D-PET), have increased clinical applicability and played crucial roles in managing respiratory motion (Rietzel et al. 2005; Cai et al. 2011). Compared with traditional static imaging, which captures anatomical structures during a breath-hold or at a specific time, 4D-imaging techniques can provide tumor and organ motion information throughout the respiration cycle in addition to the three-dimensional (3D) anatomical structures. This added capability offers substantial potential for enhancing the accuracy and efficiency of tumor localization. However, 4D-imaging encounters certain inherent challenges, such as insufficient spatiotemporal resolution, motion artifacts, and increased radiation doses (Zhang et al. 2021; Terpstra et al. 2023; Noid et al. 2017).

Artificial intelligence (AI) involves developing and utilizing complex computer algorithms that emulate aspects of human intelligence in tasks such as visual perception, pattern recognition, decision-making, and problem-solving, often achieving comparable or enhanced performance (Huynh et al. 2020; Li et al. 2022). In recent years, the availability of large datasets and high-performance computers has led to the emergence of more sophisticated AI agents, presenting immense potential to address unresolved challenges in medical imaging. Specifically, deep learning (DL), a subset of AI, has demonstrated remarkable capabilities in enhancing imaging quality, efficiency, and diagnostic performance (Akagi et al. 2019; Ahishakiye et al. 2021; Litjens et al. 2017). Thus, it has the potential to address many of the challenges faced in 4D-imaging for motion management and, consequently, enhance the quality of radiation therapy. Figure 1 provides an example of 4D-MRI outlining the major steps in 4D-imaging for radiation therapy and highlighting the steps in which AI is involved.

In this review article, we discuss the applications of AI in advancing 4D-imaging, with a specific emphasis on motion management. We outline the inherent challenges in current 4D-imaging practices and provide examples of how AI can address these challenges to increase efficiency, accuracy, and image quality. Additionally, we offer insight into future directions in this field. Notably, several studies have attempted real-time reconstruction of 3D CT images from two-dimensional (2D) images to track irregular motion patterns for radiotherapy (Shen et al. 2019; Montoya et al. 2022; Loÿen et al. 2023). However, this review focuses on 4D-imaging technologies and thus does not cover 2D-to-3D reconstruction. The remainder of this paper is organized as follows. Section 2 describes the search strategy and inclusion criteria for this review. Section 3 provides a concise overview of current 4D-imaging techniques. In Sect. 4, we analyze the challenges specific to each imaging modality and discuss the current progress made in leveraging AI to overcome these challenges. Section 5 discusses the remaining challenges and outlines potential future directions for advancing 4D-imaging. Section 6 concludes the review.

Fig. 1
figure 1

The workflow of AI-assisted 4D-imaging in radiation therapy, illustrated using 4D-MRI as an example. Certain steps can be performed interchangeably or concurrently. DVF, deformable vector field. 1 is from Weykamp et al. (2023). 2 is from Zhang et al. (2023)

2 Search strategy and inclusion criteria

The search was conducted in November 2023 using databases including PubMed, Google Scholar, IEEE, ScienceDirect, and Elsevier. The search terms included a comprehensive list of descriptors covering the constructs “acquisition mode”, “imaging modality”, “artificial intelligence”, and “clinical application” to ensure exhaustive coverage of the search space. Table 1 provides illustrative examples of the search terms used for each category.

This paper aims to provide a comprehensive and accurate introduction to AI’s application in 4D-imaging, with a specific focus on motion management. Studies were selected based on the following criteria: (1) primary research studies involving respiratory motion management; (2) focused on addressing the existing issues in 4D-imaging techniques or supporting the clinical application of 4D-imaging; and (3) included any type of AI algorithms, such as deep neural networks (DNNs).

Papers were excluded if they (1) did not focus on respiratory motion but instead examined other types of physiological movements (such as cardiac motion or gastrointestinal motion), tumor dynamic changes, or blood flow, (2) conducted research and experiments solely on static imaging, or (3) investigated only non-AI methods. Furthermore, we also excluded non-English papers, conference abstracts, posters, and theses for academic degrees.

Table 1 Search strategy

3 4D-imaging modalities

3.1 4D-CT

4D-CT is a powerful technique for observing internal organ motion and integrating motion information in treatment planning (Vergalasova and Cai 2020). This technique consists of a series of phase-resolved 3D-CT images, each representing a specific breathing bin of the patient’s respiratory cycle (Keall et al. 2006). 4D-CT can be acquired via either prospective or retrospective strategies (Hugo and Rosu 2012). In prospective respiratory-gated acquisition, images are obtained at specific respiratory phases. In contrast, retrospective 4D-CT can be performed using cine mode or helical mode (He et al. 2015). In cine mode, the CT scanner continuously captures multiple axial images at one position before moving to the next. This process is repeated until the entire target region is scanned. In helical mode, the table moves at a constant low speed while the CT scanner continuously acquires images, capturing numerous axial images over multiple respiratory cycles. For retrospective acquisition, a separate signal related to the patient’s breathing state must be acquired simultaneously and synchronized with image acquisition. This respiratory signal is then used to sort the partial images or the acquired projection data into the correct respiratory phase.

In 4D-CT imaging, there are two common image-binning approaches: phase binning and amplitude binning (Abdelnour et al. 2007). Phase binning associates each 3D-CT image with a specific phase or fraction of the breathing-cycle period, offering a straightforward interpretation of temporal information (the breathing cycle is equidistantly sampled) and enabling direct use in concepts such as mid-position and mid-ventilation (Werner et al. 2017). In contrast, amplitude binning assigns each 3D-CT image to bins on the basis of the full amplitude of the breathing signal. This approach can provide more detailed information on breathing motion and leads to fewer artifacts. However, amplitude binning may necessitate longer acquisition times during prospective gating because of breathing variations such as baseline drifts (Stemkens et al. 2018). In retrospective reconstruction with identical acquisition time, amplitude sorting may suffer from insufficient binning data points (Zhang et al. 2024).
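
To make the two sorting strategies concrete, the following minimal sketch (NumPy/SciPy, with a synthetic surrogate signal and illustrative bin counts; not a clinical implementation) assigns each time point of a respiratory trace to a phase bin and to an amplitude bin:

```python
# Minimal sketch of retrospective phase vs. amplitude binning of a 1-D
# respiratory surrogate. Signal, peak detection, and bin counts are
# simplified placeholders compared with clinical sorting software.
import numpy as np
from scipy.signal import find_peaks

t = np.linspace(0, 60, 1500)                      # 60 s acquisition at 25 Hz
resp = np.sin(2 * np.pi * t / 4) + 0.05 * t / 60  # ~4 s cycle, slight drift

# --- Phase binning: divide each breathing cycle into equal time fractions ---
peaks, _ = find_peaks(resp)                       # end-inhale time points
phase = np.zeros_like(resp)                       # samples outside detected
for p0, p1 in zip(peaks[:-1], peaks[1:]):         # cycles default to bin 0 here
    phase[p0:p1] = np.linspace(0, 1, p1 - p0, endpoint=False)
phase_bins = np.minimum((phase * 10).astype(int), 9)   # 10 phase bins

# --- Amplitude binning: divide the signal range into equal amplitude bands ---
edges = np.linspace(resp.min(), resp.max(), 11)        # 10 amplitude bins
amp_bins = np.clip(np.digitize(resp, edges) - 1, 0, 9)

# Each image/projection time point now carries a bin index; with baseline
# drift, amplitude bins may be unevenly populated, as discussed above.
print(np.bincount(amp_bins, minlength=10))
```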

3.2 4D-MRI

Compared with 4D-CT, 4D-MRI offers superior soft tissue contrast and does not expose patients to ionizing radiation (Harris et al. 2017; Liu et al. 2016). 4D-MRI can be implemented using either prospective or retrospective approaches. Prospective 4D-MRI is commonly achieved using 3D acquisition; because data are collected multiple times in a 3D readout, the acquired k-space data must be reordered before image reconstruction (Li et al. 2017). Most 4D-MRI techniques are retrospective: images are continuously acquired over the whole region of interest (ROI) and retrospectively sorted into respiratory phases (Yang et al. 2014). Retrospective 4D-MRI mainly adopts a multi-slice 2D acquisition approach with T2-weighted turbo spin echo (T2-TSE) or balanced steady-state free precession (bSSFP) sequences.

To reorder 4D-MRI data, three common methods are used: external surrogates, internal surrogates, and self-navigation (Kavaluus et al. 2020). External surrogates, similar to those in 4D-CT, encounter challenges in MRI such as signal saturation and synchronization issues (Stemkens et al. 2018). MRI-specific internal surrogates include pencil-beam navigators. Self-navigation can be performed in both the image and frequency domains, using 2D image navigators and changes in body surface area (Celicanin et al. 2015; Cai et al. 2011). Similar to 4D-CT, the sorting of 4D-MRI data can be done via either phase-based or amplitude-based methods.

3.3 4D-CBCT

Integrating a CBCT scanner with linear accelerators offers significant advantages for assessing tumor and organ motion during treatment. By examining patients in their treatment position directly before or during radiotherapy, CBCT enhances target localization for beam delivery (Hong et al. 2022). However, conventional 3D-CBCT cannot capture the full range of tumor motion, limiting its ability to localize moving targets accurately. To address this, 4D-CBCT has been developed in recent years as a powerful tool for providing respiration-resolved images that improve the localization of moving targets.

4D-CBCT retrospectively sorts images in the projection space, yielding subsets of projections corresponding to specific respiratory phases (Sonke et al. 2005). Each subset is then reconstructed into phase-resolved images (PRIs), resulting in a set of images. In practice, 10-phase 4D-CBCT reconstruction typically involves dividing the full projection dataset into 10 subsets, each containing approximately one-tenth of the total projections, which can lead to an extremely sparse-view CT problem for PRI reconstruction. The sparse-view nature of 4D-CBCT reconstruction leads to various artifacts, including view aliasing (streaks) and blurring at high-contrast boundaries (Leng et al. 2008). These artifacts can significantly hinder the clinical utility of 4D-CBCT images, making it essential to develop strategies to mitigate these effects and improve image quality.

3.4 4D-PET

4D-PET has been developed alongside 4D-CT to address the impact of patient motion, including respiration, on PET imaging, which affects lesion size, shape, and measured standardized uptake value (SUV) (Nehmeh et al. 2004). Two primary acquisition methods for 4D-PET are commonly used: prospective gated acquisition and retrospective stacked acquisition (Grootjans et al. 2016). In prospective gated PET, the respiratory signal is monitored, and scans are conducted only during a specific breathing state for a pre-determined time window. Retrospective acquisition uses a ‘list mode’: acquisition is continuous, and individual counts are tagged with the breathing state from a separate respiratory signal and then sorted into separate bins for each breathing state for reconstruction.

4 AI in 4D-imaging

In this review, we cover 64 studies on AI applications in 4D-imaging for respiratory motion management. Figure 2 depicts trends in AI-based studies per modality over the years and the distribution of studies per task. AI-related 4D-imaging is a relatively new research area, with the first article published in 2018. In recent years, there has been a remarkable increase in publications in this field, indicating researchers’ growing interest and recognition. Most studies have focused on CT, MRI, and CBCT, whereas studies specifically focusing on PET imaging have been comparatively limited. Image post-processing and motion estimation are extensively explored fields. Figure 3 provides a knowledge graph summarizing major challenges, researched tasks, current approaches, and strategies for each modality, which may inspire researchers in this field.

Fig. 2
figure 2

Distributions of papers per year (a) and per task (b)

Fig. 3
figure 3

Knowledge graph of the current progress of AI in 4D-imaging for motion management. DIR deformable image registration, SR super-resolution, GAN generative adversarial network

4.1 AI in 4D-CT

4D-CT imaging is an essential component of radiotherapy for treating thoracic and abdominal tumors (Madesta et al. 2024). Despite extensive research conducted in 4D-CT, challenges associated with its use in clinical applications persist. First, current CT scanners cannot cover the entire anatomical ROI within a single gantry rotation, leading to artifacts from organ movement across multiple cycles. Second, achieving accurate motion estimation in 4D-CT is challenging because of irregular motion patterns and changing air density in the lungs throughout the respiratory cycle. DL models have emerged as a powerful tool in 4D-CT, improving imaging quality, modeling motion, and enabling automatic target delineation. Since this review focuses on motion management, studies on ventilation image generation and 4D-CT generation are excluded. The related studies are categorized into three groups: artifact reduction, motion estimation, and target delineation, as shown in Table 2.

Table 2 Summary of AI-based studies in 4D-CT

4.1.1 AI in 4D-CT artifact reduction

Double structure (DS) and interpolation (INT) artifacts are commonly observed in 4D-CT data (Madesta et al. 2024), as illustrated in Fig. 4. DS artifacts result from breathing variability during acquisition, causing inconsistent representations of anatomical structures across different breathing phases, whereas INT artifacts arise due to insufficient projection data for reconstructing image slices at the desired breathing phases and couch positions. These motion artifacts can significantly degrade image quality and affect the accuracy of target volume delineation and dose calculation in radiotherapy (Mori et al. 2019). Traditional post-processing methods, such as registration-based image interpolation and graph-based structure alignment, have been employed to mitigate artifacts, but they are time-consuming and only consider DS artifacts. Recently, DNNs have been used to reduce artifacts in 4D-CT. Mori et al. (2019) initially proposed a DL-based inpainting method for DS artifacts using an autoencoder to translate artifact-affected images into artifact-free images, as shown in Fig. 5a. However, this approach had limitations, including the use of simulated artifacts, a lack of clinical evaluation, and an insufficient reduction of artifacts in 3D. On this basis, Madesta et al. (2024) developed a convolutional neural network (CNN)-based conditional inpainting model that incorporates patient-specific artifact-free images as prior information and operates on 3D patches, as depicted in Fig. 5b. Their method significantly reduced the average root mean squared error (RMSE) by 60% for DS structures and 42% for INT structures on the in-house evaluation data. Nevertheless, incorporating artifact-free images remains challenging in clinical practice. The authors selected prior images that are typically less affected by artifacts: the end-exhalation phase for DS artifacts and the temporal average CT for INT inpainting. However, even with these selections, end-exhalation phase images can still be affected by DS artifacts. Similarly, for INT artifacts, the temporal average CT becomes increasingly blurred with larger motion amplitudes, leading to the loss of fine structural details during inpainting.
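
The general idea of such prior-conditioned inpainting (Fig. 5b) can be sketched as follows; the architecture, patch size, and residual formulation below are illustrative placeholders rather than the published model:

```python
# Schematic sketch of prior-image-conditioned artifact inpainting:
# the network sees both the artifact-affected patch and an artifact-free
# prior patch and predicts a residual correction. Illustrative only.
import torch
import torch.nn as nn

class ConditionalInpainter(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        # 2 input channels: artifact-affected patch + patient-specific prior
        self.net = nn.Sequential(
            nn.Conv3d(2, ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(ch, 1, 3, padding=1),
        )

    def forward(self, corrupted, prior):
        x = torch.cat([corrupted, prior], dim=1)  # condition on the prior
        return corrupted + self.net(x)            # residual correction

model = ConditionalInpainter()
corrupted = torch.randn(1, 1, 32, 64, 64)  # dummy 3D patch (B, C, D, H, W)
prior = torch.randn(1, 1, 32, 64, 64)      # artifact-free prior, same region
target = torch.randn(1, 1, 32, 64, 64)     # artifact-free target (training)
loss = nn.functional.mse_loss(model(corrupted, prior), target)
loss.backward()
```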

Fig. 4
figure 4

Example of 4D-CT data with INT and DS artifacts. Figure reprinted from Madesta et al. 2024

Fig. 5
figure 5

Common deep learning schemes for 4D-imaging enhancement and motion estimation. a Single Image Enhancement: Enhances individual images independently. b Prior-Image Guided Enhancement: Uses a prior image to guide the enhancement of the input image. c Supervised DIR: Guided by the reference DVF. d Unsupervised DIR: Constrained by the distance between the warped and fixed images. e Motion Compensation Enhancement: Combines multiple phase images via DIR to enhance the target phase. The gray dashed line in (e) indicates that the reconstruction result can iteratively improve the DIR process in reverse, as adopted in methods such as SMEIR. DIR deformable image registration, DVF deformable vector field, STN spatial transformation network, WP warped phase

4.1.2 AI in 4D-CT motion estimation

Deformable image registration (DIR) is a promising tool for processing 4D-CT images, enabling accurate motion tracking of internal organs and fiducial markers during respiratory cycles. Fast and accurate DIR via 4D-CT assists in treatment planning, including target definition, tumor tracking, and organ-at-risk sparing. Traditional DIR methods for 4D-CT datasets minimize dissimilarity measures to find the optimal transformation mapping between two phase images. However, these methods have drawbacks such as long computational times, manual parameter tuning, and the risk of being trapped in local optima (Wei et al. 2021). Moreover, repeated application of spatial filters throughout the iteration process leads to over-smoothed motion fields and false deformation of bony structures with minimal motion. The large appearance variances and low image contrast of abdominal 4D-CT present additional challenges for accurate registration. To address these challenges, numerous DL-based studies, including supervised and unsupervised methods, have extensively explored improved DIR techniques.
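
In generic terms, DIR seeks the transformation \(\phi\) that minimizes a dissimilarity measure between the fixed image \(I_f\) and the warped moving image \(I_m \circ \phi\), balanced against a regularization term enforcing plausible deformations:

\[ \hat{\phi} = \arg\min_{\phi} \; \mathcal{D}\big(I_f,\, I_m \circ \phi\big) + \lambda\, \mathcal{R}(\phi), \]

where \(\mathcal{D}\) is, for example, the sum of squared differences or a negative cross-correlation, \(\mathcal{R}\) penalizes non-smooth DVFs, and \(\lambda\) controls the tradeoff. Supervised DL methods train a network to predict \(\phi\) directly from reference DVFs, whereas unsupervised methods adopt this same objective as the training loss.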

Supervised learning-based registration Supervised DIR involves the use of ground truth deformation vector fields (DVFs) from conventional algorithms to guide the training process, as shown in Fig. 5c. Sentker et al. (2018) developed GDL-FIRE, the first CNN-based registration method for 4D-CT, in which ground truth DVFs obtained from three traditional methods are employed. GDL-FIRE achieved a target registration error (TRE) comparable to that of traditional DIRs with a significantly reduced computation time, representing a 60-fold speed-up. Teng et al. (2021) developed a patch-based CNN for inter-phase registration, using DVFs from VelocityAI as the ground truth for training. This method proved effective not only on 4D-CT but also demonstrated robustness against artifacts on 4D-CBCT scans. Despite these promising results, manual preparation of training datasets remains laborious, subjective, and prone to error. To overcome this issue, Eppenhof and Pluim (2018) provided a solution that uses synthetic random transformations to train the network, eliminating the need to manually annotate ground truth DVFs. However, the artificial transformations may differ significantly from actual lung motion.

Unsupervised learning-based registration Unsupervised registration is highly desirable when ground truth DVFs are unavailable. This approach relies solely on pairs of moving and fixed images. In unsupervised methods, a moving image is deformed via a spatial transformer network (STN) (Jaderberg et al. 2015), and models are trained by minimizing the error between the fixed and deformed images, as depicted in Fig. 5d. However, additional regularization is necessary to improve the reliability and accuracy of DVF predictions. To address this issue, numerous studies have employed generative adversarial networks (GANs) to enforce DVF regularization and prevent unrealistic DVF predictions. Lei et al. (2019) proposed a pioneering GAN-based method for 4D-CT abdominal images, integrating a dilated inception module to extract multi-scale structural features, resulting in robust motion estimation. They further developed MS-DIRNet (Lei et al. 2020), which combines global and local registration networks with a self-attention mechanism in the generator, improving the differentiation of minimally moving structures. Fu et al. (2019) introduced a cascaded model for 4D-CT lung registration, comprising CoarseNet and FineNet. CoarseNet predicts a rough DVF on downsampled images, whereas the patch-based FineNet model predicts local lung motion on a fine-scale image. Additionally, they enhanced vessel contrast by extracting pulmonary vascular structures before registration, improving accuracy compared with conventional methods. Similarly, Yang et al. (2021) and Jiang et al. (2020) proposed multi-scale unsupervised frameworks for pulmonary 4D-CT registration, using three cascaded models at different resolutions to progressively refine DIR. The submodels are initially trained independently at each resolution and then jointly optimized in a multi-scale framework to increase end-to-end registration accuracy. The experimental results demonstrated that the proposed MANet (Yang et al. 2021) outperformed other registration methods, with an average TRE of 1.53 ± 1.02 mm on the Dir-Lab dataset. Moreover, the DVF estimation process is completed in approximately 1 s and requires no manual parameter tuning.
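
A minimal training step of such an unsupervised scheme is sketched below in PyTorch with a single-layer stand-in network; a real implementation would use a U-Net-like architecture and a more robust similarity metric:

```python
# Unsupervised DIR training step (Fig. 5d): a network predicts a DVF,
# a spatial transformer (grid_sample) warps the moving image, and the loss
# combines image similarity with DVF smoothness. Illustrative sketch only.
import torch
import torch.nn.functional as F

def warp(moving, dvf):
    """Warp a volume (B,1,D,H,W) by a DVF (B,3,D,H,W), channels = (x,y,z) voxels."""
    B, _, D, H, W = moving.shape
    base = F.affine_grid(torch.eye(3, 4).unsqueeze(0).repeat(B, 1, 1),
                         moving.shape, align_corners=True)   # identity grid
    # convert voxel displacements to normalized [-1, 1] grid offsets
    scale = torch.tensor([2 / (W - 1), 2 / (H - 1), 2 / (D - 1)])
    offset = dvf.permute(0, 2, 3, 4, 1) * scale
    return F.grid_sample(moving, base + offset, align_corners=True)

def smoothness(dvf):
    """First-order finite-difference penalty on the predicted DVF."""
    dz = (dvf[:, :, 1:] - dvf[:, :, :-1]).pow(2).mean()
    dy = (dvf[:, :, :, 1:] - dvf[:, :, :, :-1]).pow(2).mean()
    dx = (dvf[:, :, :, :, 1:] - dvf[:, :, :, :, :-1]).pow(2).mean()
    return dz + dy + dx

net = torch.nn.Conv3d(2, 3, 3, padding=1)          # stand-in for a U-Net
fixed = torch.randn(1, 1, 32, 64, 64)
moving = torch.randn(1, 1, 32, 64, 64)
dvf = net(torch.cat([fixed, moving], dim=1))       # predicted DVF
loss = F.mse_loss(warp(moving, dvf), fixed) + 0.01 * smoothness(dvf)
loss.backward()
```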

In addition, researchers have explored various approaches to incorporate prior information as regularization terms for constraints in thoracic-abdominal 4D-CT registration. Lu et al. (2021) employed recurrent networks to leverage the temporal continuities of 4D-CT, aiming to reduce the influence of artifacts in certain phases. Wei et al. (2021) introduced a U-Net-based model for intra-subject registration, achieving improvements in all ROIs, particularly for tumor volumes. Duan et al. (2023) integrated a lung segmentation network into the registration network to create a spatially adaptive regularization term, accommodating smooth and sliding motion. Furthermore, Iqbal et al. (2024) employed Jacobian regularization to prevent undesirable deformation and folding in the displacement field. Considering the complex and large motion patterns in abdominal 4D-CT, Xu et al. (2023) proposed a recursive cascaded full-resolution residual network that performs progressive registration cooperatively. Recently, Xu et al. (2023) adopted a recursive registration strategy based on ordinary differential equation (ODE) integration of voxel velocities. Their method outperformed other learning-based methods, producing the smallest TREs of 1.24 mm and 1.26 mm on two publicly available lung 4D-CT datasets, Dir-Lab and Popi. This method produced less than 0.001% unrealistic image folding (fraction of negative values in the Jacobian determinant) and processed each CT volume in under 1 s.
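
The folding metric quoted above (the fraction of voxels whose Jacobian determinant is negative) can be computed directly from a predicted DVF; the sketch below uses a dummy displacement field for illustration:

```python
# Fraction of voxels with a negative Jacobian determinant, i.e., locally
# non-invertible "folding" of the deformation x -> x + u(x).
import numpy as np

def folding_fraction(dvf, spacing=(1.0, 1.0, 1.0)):
    """dvf: (3, D, H, W) displacements in voxel units; returns folded fraction."""
    # Jacobian of the transform x + u(x) is J = I + grad(u)
    grads = [np.gradient(dvf[i], *spacing) for i in range(3)]   # du_i / dx_j
    J = np.stack([np.stack(g, axis=0) for g in grads], axis=0)  # (3, 3, D, H, W)
    J = J.transpose(2, 3, 4, 0, 1) + np.eye(3)                  # (D, H, W, 3, 3)
    det = np.linalg.det(J)
    return float(np.mean(det < 0))

dvf = 0.5 * np.random.randn(3, 32, 64, 64)   # dummy (non-smooth) field
print(f"folded voxels: {100 * folding_fraction(dvf):.4f}%")
```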

Despite the impressive results achieved by DL-based methods, they have certain limitations. A primary challenge is their heavy reliance on large amounts of training data. Another limitation is their inability to register images that are significantly different from the training images. To address this issue, one-shot learning methods for DIR have been proposed (Fechter and Baltas 2020; Zhang et al. 2021; Chi et al. 2022). These methods, such as GroupRegNet (Zhang et al. 2021), use CNNs as feature extractors to register multiple 3D-CT images. The computed transformation is used to warp the input image into a common space, and the weights are iteratively updated through backpropagation. Convergence criteria are evaluated to determine when to terminate the iterative process. Compared with other DL-based methods, GroupRegNet reduced the original TRE from 8.12 ± 4.77 mm to 1.03 ± 0.64 mm on the public Popi dataset, achieving a 44% reduction in the RMSE. However, these one-shot methods still require optimization when registering unseen images, making them similar to traditional iterative optimization methods. This can lead to overfitting and a lack of stability. Additionally, the registration process is typically slower than that of end-to-end DL methods, requiring from several minutes up to 30 min to complete (Zhang et al. 2021).

4.1.3 AI in 4D-CT tumor delineation

AI-driven automatic target segmentation in 4D-CT imaging is also an intensively investigated field. Manual delineation in each phase of 4D-CT can be time-consuming, laborious, and prone to subjective errors due to variations in tumor location caused by respiratory motion. Therefore, there is a demand for computer-aided methods for automatic, fast, and accurate tumor segmentation via 4D-CT. Li et al. (2018) utilized an Inception V3 architecture pre-trained on the ImageNet dataset to segment the gross tumor volume (GTV) on each phase and combined the results to predict the internal GTV (iGTV) for non-small cell lung cancer, demonstrating the potential of DL approaches in improving target delineation accuracy. Ma et al. (2023) explored 3D U-Net and its variants to leverage multiple phases of 4D-CT for automated delineation of the iGTV in lung cancer. Momin et al. (2021) developed a motion region CNN that automated 4D-CT lung data delineation by incorporating global and local motion estimation networks and employing a self-attention strategy. Zhou et al. (2022) proposed a patient-specific target contour prediction model for the pancreas, which achieved a high Dice similarity coefficient (DSC) of 98% for tumor positioning without the need for pancreas segmentation. Yang et al. (2024) introduced a dual-encoding network for liver tumor segmentation, yielding promising results with a mean DSC of 0.869 for GTVs and 0.882 for iGTVs. Overall, these studies highlight the viability of employing AI techniques to increase the precision and efficiency of tumor delineation in 4D-CT imaging across various types of cancer, including lung, liver, and pancreatic tumors.
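
For reference, the DSC reported throughout these studies measures the volumetric overlap between a predicted and a ground truth mask; a minimal implementation for binary masks is:

```python
# Dice similarity coefficient (DSC) for binary masks; DSC = 1 means
# perfect overlap, 0 means no overlap. Dummy square masks for illustration.
import numpy as np

def dice(pred, truth, eps=1e-8):
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum() + eps)

pred = np.zeros((64, 64), bool); pred[10:40, 10:40] = True
truth = np.zeros((64, 64), bool); truth[15:45, 15:45] = True
print(f"DSC = {dice(pred, truth):.3f}")
```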

4.2 AI in 4D-MRI

Currently, 4D-MRI is still under investigation, with challenges that need to be overcome before it can be fully adopted for clinical use (Yuan et al. 2019). One major challenge is the tradeoff between spatial and temporal resolution. To achieve a reasonable imaging time, 4D-MRI images are often heavily undersampled, resulting in low spatial resolution and motion artifacts that can blur fast-moving structures. This challenge poses difficulties in modeling DVFs from 4D-MR images for tumor tracking, especially in the abdominal region, which has complex soft anatomical variations. Recently, DL has been employed in 4D-MRI. A detailed summary of DL-based studies is presented in Table 3. Most studies in 4D-MRI have focused on alleviating the spatiotemporal tradeoff during the reconstruction and post-processing stages, whereas others have aimed to improve motion modeling accuracy despite poor image quality. These papers can be categorized into three classes: 4D-MRI reconstruction, 4D-MRI super-resolution, and motion estimation. Figure 6 presents an example of enhanced 4D-MRI results using different DL algorithms.

Fig. 6
figure 6

Visual example of low-quality 4D-MRI (a) and the super-resolved images by EDSR (b), Pixel2pixel (c), and 2.5D-cGAN (d). The selected ROI (yellow rectangle) represents the detailed features affected by respiratory motion. Figure reprinted from Zhi et al. (2023)

Table 3 Summary of AI-based studies in 4D-MRI

4.2.1 AI in 4D-MRI reconstruction

In recent years, researchers have explored various techniques to reconstruct high-quality MR images from undersampled acquisitions, such as parallel imaging and compressed sensing (CS) (Lustig et al. 2007). However, these methods have limitations in efficiently removing artifacts and noise, especially at high acceleration rates. Moreover, selecting the regularization parameters in constrained reconstruction methods is often empirical, computationally intensive, and time-consuming. DL approaches have emerged as an alternative that can bypass these issues by unrolling the iterative process and learning the parameters through network training. Several studies have proposed AI-based approaches for 4D-MRI reconstruction. Zhang et al. (2021) proposed a hybrid approach using the parallel non-Cartesian convolutional recurrent neural network (PNCRNN) for undersampled abdominal dynamic parallel MR data. The PNCRNN combines CRNN-based blocks to learn spatial and temporal redundancies, along with non-Cartesian data-consistency (DC) layers that imitate gradient descent for non-Cartesian data fidelity. The PNCRNN achieves high image quality and fast convergence within only a few iterations, and it can be combined with other unrolled networks for abdomen imaging with non-Cartesian sampling. Küstner et al. (2020) proposed a motion-corrected reconstruction network that unrolls the alternating direction method of multipliers (ADMM) algorithm via a cascaded (3+1)D U-Net to exploit spatial-temporal redundancies. They also introduced a self-supervised approach to improve the accuracy and reliability of the registration network (Küstner et al. 2022). Another study proposed the stDLNN (Wang et al. 2023), a method that combines model-based techniques with a spatial-temporal dictionary learning approach to increase the efficiency and quality of 4D-MRI reconstruction. The experimental results showed that the stDLNN outperformed other state-of-the-art (SOTA) methods in terms of reconstruction quality and computational efficiency. Furthermore, Murray et al. (2024) proposed Movienet, which exploits space-time-coil correlations and motion preservation instead of k-space data consistency, to accelerate the acquisition and reconstruction of dynamic MR images. Overall, DL approaches have demonstrated the potential to improve the speed and quality of 4D-MRI reconstruction, especially in non-Cartesian acquisitions. These techniques can reduce noise and artifacts effectively, resulting in clearer and more accurate images, even at high acceleration rates. Moreover, DL-based models can also automatically select appropriate reconstruction parameters, eliminating the need for time-consuming empirical parameter selection.
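
The core of such unrolled networks alternates between a learned regularization step and a data-consistency step that re-enforces the measured k-space samples. The sketch below illustrates this for a Cartesian single-coil case with a placeholder CNN; actual methods such as PNCRNN operate on non-Cartesian multi-coil data with recurrent blocks:

```python
# Schematic unrolled reconstruction: CNN regularization alternating with a
# hard data-consistency (DC) step. One conv layer is shared across cascades
# for brevity; real unrolled networks learn per-cascade blocks.
import torch
import torch.fft as fft

def dc_step(image, kspace_meas, mask):
    """Keep the measured k-space samples, fill unmeasured ones from `image`."""
    k = fft.fftn(image, dim=(-2, -1))
    k = torch.where(mask.bool(), kspace_meas, k)
    return fft.ifftn(k, dim=(-2, -1))

cnn = torch.nn.Conv2d(2, 2, 3, padding=1)            # stand-in regularizer
mask = (torch.rand(1, 1, 128, 128) < 0.25).float()   # ~4x undersampling
truth = torch.randn(1, 1, 128, 128, dtype=torch.cfloat)
kspace_meas = mask * fft.fftn(truth, dim=(-2, -1))   # simulated measurements

x = fft.ifftn(kspace_meas, dim=(-2, -1))             # zero-filled initial image
for _ in range(5):                                    # 5 unrolled cascades
    ri = torch.cat([x.real, x.imag], dim=1)          # real/imag as channels
    ri = ri + cnn(ri)                                 # learned residual step
    x = torch.complex(ri[:, :1], ri[:, 1:])
    x = dc_step(x, kspace_meas, mask)                # re-enforce measured data
```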

4.2.2 AI in 4D-MRI super-resolution

Super-resolution methods can address the spatial-temporal tradeoff by improving spatial resolution and reducing artifacts in 4D-MR image post-processing. These methods can be broadly categorized as single-image super-resolution and prior image-based super-resolution methods.

Single-image methods generate high-resolution images from single low-resolution inputs via image-to-image translation models. However, applying these methods to 4D-MRI faces challenges such as scarce training data and potential mismatches with the ground truth due to respiratory motion. To address these issues, researchers have developed data generation and augmentation modules. Chun et al. (2019) proposed a cascaded model with a downsampling network to generate perfectly paired low- and high-resolution data for training. Gao et al. (2023) developed a 3D GAN and proposed a novel data augmentation approach by gating into multiple respiratory states. Eldeniz et al. (2021) used unsupervised learning to minimize reconstruction artifacts by exploiting incoherent artifact patterns. Park et al. (2021) proposed an in-plane super-resolution method named ACNS, which achieves high image quality with significantly reduced computational time. However, most researchers have used 2D networks, which cannot capture the rich structural information that 3D networks could provide. Moreover, 2D-based methods only enhance the in-plane resolution, leaving the slice thickness unchanged.
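
One common way to obtain perfectly aligned training pairs, in the spirit of the downsampling modules above, is to retrospectively degrade high-resolution images, for example by truncating k-space; the following sketch is illustrative and not any specific published pipeline:

```python
# Hypothetical generation of paired low/high-resolution MR training data by
# zeroing the high-frequency periphery of k-space (low-pass truncation).
import numpy as np

def simulate_lowres(hr_image, keep_fraction=0.5):
    """Keep only the central k-space region and reconstruct the magnitude."""
    k = np.fft.fftshift(np.fft.fft2(hr_image))
    H, W = k.shape
    h, w = int(H * keep_fraction) // 2, int(W * keep_fraction) // 2
    masked = np.zeros_like(k)
    masked[H // 2 - h:H // 2 + h, W // 2 - w:W // 2 + w] = \
        k[H // 2 - h:H // 2 + h, W // 2 - w:W // 2 + w]
    return np.abs(np.fft.ifft2(np.fft.ifftshift(masked)))

hr = np.random.rand(256, 256)   # stand-in for a high-resolution slice
lr = simulate_lowres(hr)        # perfectly aligned LR/HR training pair
```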

Existing prior image-based super-resolution methods utilize multiple low-quality images or high-quality patient-specific MR images as reference images to leverage additional information regarding the fine anatomical structures. Gulamhussene et al. (2022) proposed a DL-based model that directly learns the relationship between the navigator and static volume slices, enabling high-quality 4D full-liver MRI reconstruction in near real time. Recently, transfer learning has been incorporated to address the time-consuming training for each patient and overcome domain shifting (Gulamhussene et al. 2023). Sarasaen et al. (2021) proposed a U-Net-based super-resolution model with fine-tuning using subject-specific static high-resolution MRI, resulting in high-resolution dynamic images. Terpstra et al. (2023) introduced MODEST, which uses low-dimensional subnetworks to reconstruct 4D-MRI by registering the exhale phase to every other respiratory phase using undersampled 4D-MRI and computed DVFs as input. More recently, Jafari et al. (2023) proposed GRASPNET, which sequentially leverages spatial and temporal correlations, to remove aliasing artifacts in the image domain, while achieving rapid reconstruction within seconds.

4.2.3 AI in 4D-MRI motion estimation

In addition to improving image quality from undersampled acquisitions, significant interest and challenges involve obtaining reliable 3D motion fields from compromised images with inconsistent tumor contrast, severe artifacts, and limited spatial resolution. Lv et al. (2018) introduced an unsupervised CNN-based registration method for motion analysis in abdominal images. This method outperformed non-motion-corrected and local affine registration methods in visual score and vessel sharpness, with a substantial reduction in registration time from one hour to one minute. Küstner et al. (2020, 2022) proposed an aliasing-free motion estimation method in k-space using optical flow equations, which demonstrated improved reconstruction quality compared with that of image-based motion-corrected reconstruction. Moreover, Terpstra et al. (2021) developed TEMPEST, a multi-resolution CNN for analyzing DVFs in 3D cine-MRI data. Interestingly, TEMPEST also showed promising results on a public 4D-CT dataset without any retraining, indicating its excellent generalizability.

The integration of motion estimation with reconstruction or super-resolution techniques has also been explored. Xiao et al. (2022) proposed DDEM, a dual-supervised model that mitigates the challenges of noise and artifacts in unsupervised methods by incorporating reference DVFs as supplementary constraints. Using DDEM, 4D-DVFs are computed and used to deform prior images, resulting in high-quality 4D-MRI with improved accuracy. Recently, Xiao et al. (2023) extended this method by integrating a DenseNet-based reconstruction module, demonstrating the feasibility of real-time imaging even at downsampling factors of up to 500. In addition, Zhi et al. (2023) developed a cascaded model named CoSF-Net that simultaneously enhances the DIR and image quality of 4D-MRI. It incorporates two registration submodels for coarse-to-fine registration and a 2.5D cGAN super-resolution module. The experimental results showed that CoSF-Net outperformed SOTA networks and algorithms in motion estimation and image resolution enhancement for 4D-MRI.

4.3 AI in 4D-CBCT

The use of 4D-CBCT improves both target coverage and normal tissue avoidance in thoracic IGRT (Rusanov et al. 2022). However, in 4D-CBCT, sparse-view sampling at each respiratory phase leads to noise and streak artifacts in images reconstructed with the clinical back-projection algorithm, adversely affecting target localization accuracy. As a result, improving the quality of 4D-CBCT reconstructions is essential for ensuring the precision of radiation therapy delivery. In addition, motion estimation in 4D-CBCT is impacted by streak artifacts; existing methods therefore improve motion estimation accuracy by first enhancing image quality. To avoid repetition, this review discusses these methods within the context of motion compensation-based enhancement. Table 4 reports DL-based approaches for 4D-CBCT enhancement.

Table 4 Summary of AI-based studies in 4D-CBCT

4.3.1 AI in 4D-CBCT enhancement

Over the years, various algorithms have been developed to address the intra-phase undersampling issue in 4D-CBCT. Notably, compressed-sensing (CS)-based methods have been applied to sparse-view CT/CBCT reconstruction and have demonstrated high image quality by leveraging sparsity characteristics in specific domains (e.g., the gradient domain or other transform domains) (Jiang et al. 2019). However, CS-reconstructed images may lose some fine structures and over-smooth edge information. Prior-deformation-based methods (Zhang et al. 2018; Ren et al. 2014) assume that the onboard 4D-CBCT is a deformation of the prior 4D-CT. They reconstruct high-quality 4D-CBCT by deforming the prior 4D-CT using DVFs solved under data fidelity and bending energy constraints. However, the deformation accuracy can be compromised in low-contrast regions. In addition, both CS- and prior-deformation-based algorithms require manual tuning of hyper-parameters and iterative optimization, which can take minutes or even hours to complete. Another category of 4D-CBCT reconstruction methods is motion-compensated algorithms. These methods apply motion models to deform other phases onto the target phase to overcome intra-phase undersampling. However, the inter-phase deformation accuracy is limited by the poor quality of the initial or intermediate-phase images.
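
Schematically, and with notation chosen here purely for illustration, the prior-deformation approach described above solves

\[ \hat{v} = \arg\min_{v} \; \big\| A\big(I_{\mathrm{CT}} \circ (\mathrm{id} + v)\big) - p \big\|_2^2 + \beta\, E_{\mathrm{bend}}(v), \]

where \(I_{\mathrm{CT}}\) is the corresponding prior 4D-CT phase, \(v\) is the DVF, \(A\) is the cone-beam projection operator, \(p\) denotes the measured projections of the target phase, and \(E_{\mathrm{bend}}\) is the bending energy of \(v\); the deformed prior \(I_{\mathrm{CT}} \circ (\mathrm{id} + v)\) then serves as the reconstructed 4D-CBCT phase.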

Recently, DL has also been utilized for improving the image quality of sparse-view 4D-CBCT. The existing methods generally fall into three categories: projection pre-processing, image reconstruction, and image post-processing.

Projection pre-processing Projection pre-processing methods utilize DL approaches to interpolate or synthesize unmeasured projection views, after which an analytical algorithm is adopted for reconstruction. For example, Beaudry et al. (2019) proposed a DL method to reconstruct high-quality 4D-CBCT images from sparse-view acquisitions. They estimated projection data for each respiratory bin by drawing projections from adjacent bins and linear interpolation and then fed them into a CNN model to predict full projection data. This approach effectively removed streaking artifacts and reduced noise in FDK-reconstructed images. However, DL-based projection pre-processing has not been extensively studied because the raw data of commercial scanners are usually unavailable to most researchers, and improper operations on the projection data may lead to secondary artifacts in the reconstructed images.

Image reconstruction DL approaches have been explored to enhance the image quality of 4D-CBCT images. Similar to 4D-MRI, hybrid methods integrate data fidelity, domain transformation knowledge, and image restoration into one DL framework to improve reconstruction performance. Several studies have attempted to exploit spatiotemporal correlation using DNNs as the constraint term in the objective reconstruction model. These methods usually adopt a joint learning strategy for optimization in both the projection and image domains. For example, Liu et al. (2019) incorporated a CNN-derived prior deformation motion into the iterative reconstruction framework to compensate the CBCT volume and optimized it via a variable splitting algorithm. Chen et al. (2020) adopted a proximal forward-backward splitting method in their proposed 4D-AirNet models. Hu et al. (2022) proposed a framework termed PRIOR for 4D-CBCT, which uses a well-trained neural network as the regularization constraint to improve the reconstructed image via an effective iterative strategy. These deep models have achieved promising performance by synergizing iterative and DL methods for image reconstruction. Xiao et al. (2023) developed a motion-sensitive cascaded model for real-time 4D-CBCT reconstruction. This model combines a dual attention mechanism, a residual network, and a principal component analysis model to map single projections from different breathing phases to each phase of 3D-CBCT, enabling real-time 4D-CBCT reconstruction.

Image post-processing Most studies have focused on improving 4D-CBCT via image post-processing, which aims to correct errors or artifacts in the images after reconstruction. These methods take the initially reconstructed image as input and enhance its quality to better align with fully sampled images. Numerous efforts have been devoted to using spatial-temporal information in conventional analytic or total variation (TV)-based images to mitigate streaking artifacts and noise or recover structural details. These methods can be divided into three categories: group data-driven methods, motion-compensated methods, and patient-specific prior image-guided methods.

Group data-driven methods involve training DNNs with datasets containing groups of patients to learn the mapping from initial reconstructed images to target images. Jiang et al. (2019) proposed the use of a symmetric residual CNN to increase the sharpness of edges in TV-regularized undersampled CBCT. Lee et al. (2019) constructed a residual U-Net with a wavelet-based process to remove streaking artifacts from FBP-reconstructed images. Sun et al. (2021) incorporated transfer learning to fine-tune a group-trained model with a patient-specific dataset for individual patients, demonstrating superior performance in recovering small lung textures and eliminating noise. In the above studies, researchers prepared paired training samples by simulating 4D-CBCT from ground truth 4D-CT and then trained supervised models with pixel-level loss functions (e.g., L1 loss, L2 loss). However, it is worth noting that there may be a significant difference between the simulated data and real data, which would inevitably decrease the model performance when applied in the clinical setting. In contrast, Madesta et al. (2020) proposed a self-contained method by training a CNN with pseudo-average and time-average CBCT images to suppress streaking artifacts without additional data requirements.

Recently, GAN-based models, particularly CycleGANs, have gained attention for weakly supervised and even unsupervised learning, specifically tailored to 4D-CBCT, where it is challenging to obtain perfectly matched image pairs owing to respiratory movements. Dong et al. (2022) built a CycleGAN to learn the relationship between unpaired undersampled CBCT images and high-quality CT images, with a contrastive loss function to preserve the anatomical structure in the corrected image. Usui et al. (2022) utilized CycleGAN to train unpaired thoracic 4D-CBCT images with high-quality multi-slice CT (MSCT), resulting in enhanced images with fewer artifacts and improved visibility of lung tumor regions. More recently, Zhang et al. (2021, 2022) demonstrated the effectiveness of GAN-based models in enhancing 4D-CBCT for radiomics analysis.

Motion compensation (MoCo) compensates for the respiratory motion of each phase-correlated image by employing interphase DVFs (Zhang et al. 2019), as shown in Fig. 5e. Compared with earlier MoCo methods (Zhang et al. 2019; Wang and Gu 2013), DL has improved the efficiency and accuracy of MoCo by enhancing the prior motion estimation model. For instance, Huang et al. utilized two DNNs to obtain high-quality DVFs and embedded them into the SMEIR workflow to produce refined 4D images. However, these methods still rely on the estimation of DVFs from low-quality initial images, and the performance of MoCo reconstruction heavily depends on registration accuracy. To solve this issue, researchers have proposed alternative approaches from two different perspectives. On the one hand, Zhang et al. (2023) hypothesized that high-quality initial 4D-CBCT images would improve motion estimation accuracy. Thus, they incorporated a 3D CNN to reduce the structural artifacts from initial FDK-reconstructed images and then estimated motion on the basis of the artifact-mitigated initial images, which could further restore the lost information. On the other hand, Jiang et al. (2022) developed FeaCo-DCN with deformable convolution networks (DCNs) to align adjacent phases to the target phase at the feature level instead of explicitly deriving interphase DVFs from low-quality images. The model achieved SOTA performance in the SPARE challenge with Monte-Carlo 4D-CBCT datasets. However, the image quality may degrade when applying the model to clinical 4D-CBCT scans because of noise variations. This is a common issue for DL-based methods and can be mitigated by fine-tuning the model with real projections.
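
The MoCo principle itself is simple to state: each phase image is warped onto the target phase with its interphase DVF, and the warped phases are combined so that the target phase effectively benefits from (nearly) all projections. A minimal sketch, with placeholder DVFs standing in for DIR outputs, is:

```python
# Motion-compensated (MoCo) enhancement sketch (Fig. 5e): warp every phase
# image onto the target phase and average. DVFs here are zero placeholders;
# in practice they come from a DIR algorithm.
import numpy as np
from scipy.ndimage import map_coordinates

def warp_to_target(phase_img, dvf):
    """phase_img: (D, H, W); dvf: (3, D, H, W) displacements in voxels."""
    grid = np.indices(phase_img.shape).astype(float)
    return map_coordinates(phase_img, grid + dvf, order=1, mode="nearest")

phases = [np.random.rand(16, 64, 64) for _ in range(10)]   # 10 phase images
dvfs = [np.zeros((3, 16, 64, 64)) for _ in range(10)]      # DIR outputs
target = sum(warp_to_target(img, dvf) for img, dvf in zip(phases, dvfs)) / 10
```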

Fig. 7
figure 7

Examples of 4D-CBCT images and results produced by various algorithms. A depicts the prior CT image, while B-E present the CBCT images reconstructed by the FDK, ASD-POCS, 3D U-Net, and proposed models, respectively. F shows the corresponding ground truth CBCT. The red arrows indicate image details for visual inspection. FDK Feldkamp-Davis-Kress. Figure reprinted from Jiang et al. (2021b)

Another category is the patient-specific prior image-guided method, in which the intra-patient prior image is incorporated into the phase-by-phase reconstruction process. Jiang et al. (2021b) proposed a merging-encoder CNN (MeCNN) that leverages patient-specific information from the prior CT image to enhance under-sampled image reconstruction (Fig. 7). They also introduced a dual-encoder CNN for average-image-constrained enhancement that extracts features from both the average 4D-CBCT image and the target phase image (Jiang et al. 2021a). Zhi et al. (2021) developed N-Net and its enhanced version, CycN-Net, to refine phase-resolved images by incorporating prior images reconstructed from the full 4D-CBCT projection set. In CycN-Net, five consecutive phase-resolved images and the prior image are independently encoded, and the extracted feature maps are fused during decoding to predict the target phase. The experimental results demonstrated that the CycN-Net outperformed other 4D-CBCT methods in preserving delicate structures and reducing artifacts and noise. However, the prior image may still contain blurred artifacts from CT exposure, which can result in residual artifacts in the reconstructions.

4.4 AI in 4D-PET

DL-based studies on 4D-PET imaging in the abdomen are relatively limited, with only four papers focusing on image enhancement, particularly denoising, as shown in Table 5. PET acquisition is typically completed in 10 to 20 min (Manber et al. 2015), during which patient breathing and movement cause motion artifacts that affect image quality. Conventional image post-processing methods such as Gaussian filtering (Floberg and Holden 2013) and non-local mean filtering (Dutta et al. 2013) can improve image quality to some extent, but they often result in over-smoothing in ultra-low-dose data. This limitation has led to the exploration of DL-based approaches, which can be classified into two categories: those that use only low-quality PET data as input (Gong et al. 2018; Zhou et al. 2021) and those that incorporate MR/CT images as input (Munoz et al. 2021). While both approaches have achieved superior denoising performance with static PET data, none have addressed motion estimation and denoising for respiratory-gated PET.

To address the issues of motion estimation and denoising for respiratory-gated PET, several MoCo approaches have been proposed. Similar to 4D-CBCT, these approaches typically involve initial image reconstruction gate-by-gate and image registration for motion estimation among different gates. However, noisy gated images can lead to inaccurate motion estimation, and iterative DIR is time-consuming. To address the noise, Zhou et al. (2020) proposed a siamese adversarial network (SAN) to estimate motion between pairs of low-dose gated images. They first denoised the low-dose gated images and then estimated motion on the basis of the denoised images. Building upon this work, they introduced the MDPET (Zhou et al. 2021), which combines motion correction and denoising into a unified framework. MDPET uses an RNN-based motion estimation network to leverage temporal information, and a denoising network to generate high-quality denoised PET images, as shown in Fig. 8. In addition, Li et al. (2020) proposed an unsupervised non-rigid image registration framework to estimate deformation fields in respiratory-gated images. On this basis, Li et al. (2021) developed a joint estimation method that incorporates DL-based image registration into a constrained image reconstruction algorithm. This unsupervised learning approach does not require ground truth for training, which is often unavailable.

Fig. 8
figure 8

Examples of 4D-PET and denoising results by various algorithms. The average low-dose gated images generated from different motion estimation methods are shown in the 1st row. The corresponding denoised images are shown in the 2nd row. From left to right: ground truth, U-Net denoising from the averaged image without any deformation, U-Net denoising on the averaged image based on NRB-derived deformation fields, U-Net denoising on the averaged image based on VM-derived deformation fields, U-Net denoising on the averaged image based on SAN-derived deformation fields, and the end-to-end output from MDPET. Figure reprinted from Zhou et al. (2021)

Table 5 Summary of AI-based studies in 4D-PET

5 Discussion

In the past five years, AI has made remarkable advancements in the field of 4D-imaging, leading to improvements in imaging speed and quality. Additionally, AI approaches hold significant promise for motion management, such as the use of deep models to replace traditional iterative registration for real-time tumor tracking. However, some research challenges still need to be addressed. Moving forward, it is crucial to focus on developing and optimizing DL technologies for 4D-imaging in the context of clinical practice. This section offers a comprehensive overview of the achievements and limitations in current research on AI approaches in 4D-imaging. We also discuss the remaining challenges in this field and propose future research directions.

5.1 Achieved advances

The advances achieved by AI in 4D-imaging for motion management can be summarized as follows:

Improved image quality: AI techniques have shown great promise in enhancing image quality in 4D-imaging during the reconstruction and post-processing stages. For example, DL-based algorithms have successfully reduced DS artifacts in 4D-CT, enhanced spatial resolution in 4D-MRI, suppressed streak artifacts in 4D-CBCT, and mitigated noise in 4D-PET. These improvements enhance tumor visibility, facilitating accurate target localization and treatment delivery, as well as augmenting radiomics analysis.

Accelerated acquisition and processing: AI algorithms have enabled faster acquisition and processing of 4D images. For example, a cascaded model has been proposed to reconstruct 4D-MRI at downsampling factors of up to 500, enabling real-time applications of ultra-fast 4D-MRI (Xiao et al. 2023). Moreover, data-driven methods have replaced iterative processes in image reconstruction and registration with single-step predictions, reducing the computation time from minutes to microseconds (Sentker et al. 2018; Zhang et al. 2021; Xiao et al. 2022). This acceleration greatly reduces the processing time for tasks like image reconstruction and motion estimation, thus expediting the entire workflow.

More accurate motion modeling: AI techniques have been leveraged to enhance motion estimation in 4D imaging. By addressing the challenges associated with poor image quality, such as low spatial resolution, streak artifacts, and noise, AI-driven approaches, including cascaded image refinement and motion estimation, as well as incorporating patient-specific temporal information, have the potential to improve the accuracy and robustness of motion estimation algorithms (Lu et al. 2021; Zhi et al. 2023).

Reduced manual intervention and costs: AI-based automation has reduced manual intervention and costs in 4D-imaging. Model-based reconstruction processes can be unrolled by deep networks, eliminating the need for manual parameter tuning (Liang et al. 2019). Similarly, AI-based DIR bypasses the parameter selection process of traditional registration algorithms. Additionally, AI approaches have shown promise for fully automated delineation in 4D-imaging, reducing the workload of physicians who would otherwise perform manual delineation on all breathing phases, minimizing time consumption, and potentially lowering costs.

5.2 Current limitations

Despite notable achievements, current research in 4D-imaging still has several limitations that need attention. These limitations include the following:

Limited data volume and lack of external validation: Currently, 4D-imaging has not been fully integrated into clinical practice, especially 4D-MRI, resulting in a small volume of available data. Studies often involve a limited number of patients (≤ 50) from a single institution and lack external validation. Some studies use simulated datasets instead of real patient data, potentially deviating from real-world clinical scenarios and thus hindering a comprehensive evaluation of AI methods’ accuracy, generalizability, and clinical applicability.

One-sided evaluation: Despite the extensive study of AI approaches in medical imaging, the evaluation standards remain somewhat insufficient. For instance, in reconstruction and enhancement, most studies still rely on evaluation metrics commonly used in general images, such as the RMSE and SSIM (Akagi et al. 2019; Chun et al. 2019; Zhi et al. 2023). However, these metrics may not adequately assess a model’s performance, particularly its effectiveness in addressing respiratory motion-related issues in clinical scenarios.
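
For illustration, the two reference-based metrics most commonly reported can be computed in a few lines with scikit-image; whether such pixel-wise scores capture motion-specific clinical quality is precisely the concern raised here:

```python
# RMSE and SSIM against a reference image, with dummy data standing in for
# a fully sampled reference and a reconstructed test image.
import numpy as np
from skimage.metrics import structural_similarity

ref = np.random.rand(128, 128)               # reference image
test = ref + 0.05 * np.random.randn(128, 128)

rmse = np.sqrt(np.mean((test - ref) ** 2))
ssim = structural_similarity(ref, test, data_range=ref.max() - ref.min())
print(f"RMSE = {rmse:.4f}, SSIM = {ssim:.4f}")
```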

Absence of public datasets: 4D-imaging is still a relatively new field, and there is a lack of publicly available datasets for cross-comparison. Public datasets are crucial for advancing algorithm research and facilitating translation into clinical practice. While public datasets exist for 4D-CT DIR, enabling direct comparison of the registration accuracy improvements achieved by different AI algorithms, other 4D-imaging tasks and modalities suffer from a scarcity of public datasets (Table 6). This challenge limits intuitive comparison of algorithms’ strengths and weaknesses, as studies often rely on private datasets.

Table 6 Summary of public datasets of 4D-imaging

5.3 Remaining challenges

Although significant progress has been made in AI applications of 4D-imaging for motion management, this field is still in its early stages, and several challenges remain. These challenges need to be addressed to further advance research in this area.

Limited data: As mentioned earlier, 4D-imaging studies often suffer from limited data, which poses a significant challenge for AI algorithms, especially DL models. Limited data can easily lead to overfitting, poor generalizability, and decreased performance, particularly when dealing with irregular respiratory patterns in real-life applications.

Lack of ground truth: Data-driven models in 4D-imaging struggle with the absence of ground truth data. In motion estimation, traditional methods are used to obtain the ground truth DVFs for training and validating deep models. However, these DVFs are not the true ground truth and may introduce errors. In image enhancement, obtaining high-quality reference images at the pixel level is challenging because of respiratory motion, making supervised learning models difficult to train and prone to structural distortions. Additionally, reference-based evaluation metrics may be inaccurate for evaluation.

Inability to restore details and avoid distortion: Restoring image quality from undersampled acquisitions is a critical challenge. Current studies predominantly rely on data-driven post-processing approaches. However, these approaches face two primary challenges. First, post-processing methods are incapable of creating details out of nothing (Jiang et al. 2019). Second, the use of deep models for artifact removal and denoising may introduce distortions and loss of anatomical structures due to difficulties in distinguishing specific features. Although incorporating patient-specific prior images has been identified as a potential solution, these prior images may still contain imperfections such as noise, motion artifacts, or blurring, potentially affecting the quality of enhanced images. Additionally, misalignment between static prior images and 4D-imaging can lead to unintended distortions in the enhanced results.

Insufficient motion estimation: The current level of registration accuracy in 4D-imaging remains unsatisfactory, particularly when dealing with complex motion patterns, poor contrast, extremely low spatial resolution, and various artifacts. These factors pose additional challenges to AI models in accurately extracting anatomical features for motion estimation while mitigating potential interference. Furthermore, irregular respiratory motion has yet to be thoroughly investigated.

5.4 Future directions

To advance the field of AI-based 4D-imaging and facilitate its integration into clinical practice, future research should focus on several key directions:

Reliability: It is essential to develop AI approaches that are more reliable, precise, and explainable. Researchers should explore approaches that integrate AI approaches with prior information, instead of relying solely on black-box methods. By incorporating prior knowledge and constraints, AI models can produce more interpretable and trustworthy outputs, enhancing their practical value in clinical settings.

Efficiency: While current studies primarily emphasize improving accuracy, factors such as processing time, computational cost, and memory usage should also be considered. Real-time applications of 4D-imaging in treatment workflows require low latency, high speed, and computational efficiency. Therefore, future studies should aim to develop AI models that not only achieve high accuracy but also meet the efficiency requirements of real-time applications.

Generalizability: To advance AI in 4D-imaging for clinical practice, it is necessary to move beyond proof-of-concept studies with limited single-source data or simulated data. Conducting multi-institutional studies involving diverse patient populations and imaging systems is crucial for enhancing the generalizability of AI methods. These studies provide valuable insights into the performance and limitations of AI-based 4D-imaging techniques across various clinical settings, enabling more comprehensive feedback and validation of these methods.

Clinical validation: More comprehensive clinical evaluation and validation metrics are highly needed to assess the true clinical impact beyond traditional evaluation metrics. Future research should consider factors such as the integration of 4D-imaging into the clinical workflow and treatment outcomes. This will help verify the effectiveness of AI-based 4D-imaging in real-world clinical practice, facilitating its adoption in clinical radiation therapy.

6 Conclusion

The growth of AI methods, particularly DL methods, has notably accelerated the advancement of 4D-imaging techniques. Numerous studies have demonstrated the potential of AI models to enhance the efficiency and accuracy of 4D-imaging. Moreover, AI-based approaches have facilitated the application of 4D-imaging in motion management, resulting in substantial reductions in time consumption and human intervention. Despite these remarkable achievements, limitations and challenges remain to be addressed. Future studies should focus on the reliability (or transparency), efficiency, and generalizability of the developed methods and systems to facilitate the integration of AI-based 4D-imaging into clinical practice for motion management.