1 Introduction

According to the World Health Organization (WHO), cerebrovascular diseases such as stroke are the second leading cause of death. Stroke accounts for almost 11% of all deaths among adults and frequently leaves survivors with permanent disability. Studies show that someone suffers a stroke every two seconds, and stroke causes approximately 6.24 million deaths globally (www.who.int 2022, Hewage et al. 2023, Feigin et al. 2022). The disease is associated with several risk factors, including a fast-paced, unhealthy lifestyle, high blood pressure, atherosclerosis, diabetes, hypolipoproteinaemia, tobacco smoking, and obesity. Stroke is classified into two major types: (1) ischemic stroke and (2) hemorrhagic stroke. In ischemic stroke, brain cells deprived of blood supply die, producing an infarct (cerebral infarction). The infarct passes through successive stages: (1) hyper-acute, (2) acute, (3) sub-acute, and (4) chronic.

The Alberta Stroke Programme Early CT Score (ASPECTS) provides a quantitative measure for identifying ischemic stroke. ASPECTS defines the sub-classes of stroke as (1) acute, (2) sub-acute, and (3) chronic. The irreversibly damaged tissue in the early phase of an acute infarct is referred to as the ischemic core, which cannot be determined through baseline imaging; however, histological information is helpful for confirming tissue death. The region around the ischemic core, called the “penumbra,” represents brain tissue that is still salvageable but is at risk of infarction if proper treatment is not provided (Alaya et al. 2022). The Bamford clinical classification system (https://neurovascularmedicine.com) defines subtypes of ischemic stroke, including: (1) partial anterior circulation syndrome (PACS), which affects part of the anterior circulation (middle/anterior cerebral territory); (2) lacunar syndrome (LACS), which results from small, deep, cavity-like infarcts (lacunes); and (3) posterior circulation syndrome (POCS), which disrupts the blood supply of the posterior circulation of the brain (Parmar 2018). Localizing and measuring brain lesions is necessary to diagnose functional deficits in patients. Diagnosis and treatment methods for this disease fall into two significant classes: interventional and non-interventional.
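ASPECTS is commonly described as a 10-point topographic score: ten regions of the middle cerebral artery (MCA) territory are inspected on non-contrast CT, and one point is subtracted for each region that shows early ischemic change. The sketch below illustrates only that arithmetic. The region codes follow the usual convention (caudate C, lentiform nucleus L, internal capsule IC, insular ribbon I, and cortical regions M1 to M6), while the function name and input format are hypothetical; deciding which regions are affected is the hard imaging problem addressed by the models reviewed here.

```python
# Ten MCA-territory regions scored in ASPECTS (conventional abbreviations).
ASPECTS_REGIONS = frozenset(
    {"C", "L", "IC", "I", "M1", "M2", "M3", "M4", "M5", "M6"}
)


def aspects_score(affected_regions):
    """Return 10 minus the number of regions with early ischemic change.

    `affected_regions` is a hypothetical input: the set of region codes
    that a reader (or a segmentation model) has flagged as abnormal.
    """
    affected = set(affected_regions)
    unknown = affected - ASPECTS_REGIONS
    if unknown:
        raise ValueError(f"unknown ASPECTS regions: {sorted(unknown)}")
    return len(ASPECTS_REGIONS) - len(affected)


print(aspects_score(set()))             # 10 (no early ischemic change)
print(aspects_score({"C", "I", "M2"}))  # 7 (three affected regions)
```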

Blockages in cerebral vessels require prompt and effective treatment to improve patient outcomes (He et al. 2024). Key interventions for reopening occluded vessels include thrombolysis and mechanical thrombectomy. Thrombolysis employs tissue plasminogen activator (tPA), an enzyme that converts plasminogen to plasmin, which breaks down fibrin clots; this method is most effective within 4.5 h of stroke onset. Mechanical thrombectomy physically extracts the clot using specialized devices such as stent retrievers or aspiration catheters, which are guided through the vascular system to the occlusion site. Emerging and adjunctive treatments include intra-arterial thrombolysis, neuroprotection, antiplatelet and anticoagulant therapy, and advanced imaging techniques. Recent research on neuroprotective strategies aims to minimize neuronal damage and enhance recovery by targeting critical molecular processes involved in ischemic stroke pathogenesis; modulating pathways such as NMDA receptor signaling and PI3K/Akt activation may significantly improve outcomes for stroke patients (He et al. 2024).

Non-interventional methods are those in which stroke is diagnosed and classified, and its treatment planned, using medical images such as Computed Tomography (CT), MR spectroscopy (MRS), MR angiography (MRA), and perfusion and diffusion MRI (dMRI) (Bernal, Kushibar et al.). To extract important information from these images, computer vision and medical image analysis methods perform visual perception, rendering, and interpretation of images.

Machine Learning (ML) has become a leading technology for helping medical professionals make predictions that lead to relevant decisions (Zhang 2017). Compared with other computational approaches (e.g., rule-based and expert systems), ML-based methods have shown high performance in detecting and classifying ischemic stroke from medical images. ML methods, broadly categorized as supervised and unsupervised, have been used in various research studies to segment and classify ischemic stroke. These studies focus on methods such as Logistic Regression (LR) (Landwehr et al. 2005), the Naive Bayes classifier (NB) (John and Langley 2013), Support Vector Machines (SVM) (Bentley, Ganesalingam et al. 2014), Bagging Ensembles (BaE), Boosting Ensembles (BoE), and Neural Networks (NN).

Each ML method requires one or more datasets of reasonable size for evaluation and validation. Sirsat et al. (2020) identified 84 datasets used in ischemic stroke studies employing more than 19 ML methods. Although these methods show good progress, they are limited by their reliance on hand-crafted features engineered by domain experts. Such features are subject to variability and bias in observation and data interpretation, which can arise at the level of individual predictions and observations, data subsets, or even the complete dataset. Deep learning methods such as CNNs (e.g., AlexNet, VGGNet, and ResNet) and fully connected networks extract features automatically and have provided improved support for disease diagnosis and prediction.

Neural networks, especially convolutional neural networks (CNNs), have drawn significant attention in the last decade by boosting performance in computer vision applications such as object recognition, identification, and classification. In medical image analysis, CNN architectures are mostly applied to multimodal images, mainly MRI and CT, for tasks such as pre-processing, lesion detection and segmentation, and stroke classification (Bernal et al. 2019, Chung et al. 2020, Kamnitsas et al. 2017, Zhang, Yu et al. 2023, El-Hariri 2022).

Recently, transformer-based deep learning methods have been observed to outperform conventional DL methods (Kuang et al. 2024, Ostmeier et al. 2024, Suberi et al. 2019, Xu and Ding 2023a, b). Despite extensive research on ischemic stroke classification and segmentation, there remains a gap in understanding how different deep learning methods compare in this domain. The deep learning models used for ischemic stroke detection each have advantages and limitations. For instance, AlexNet offers fast training and inference, which is highly beneficial for quick deployment in clinical environments. However, its hierarchical feature learning, which is essential for medical imaging tasks, is limited, and it often underperforms on small datasets. In contrast, VGG-16, commonly used with transfer learning, performs better at hierarchical feature extraction, especially on large datasets. However, VGG-16 tends to overfit small datasets, which can be problematic in automated, non-interventional medical diagnosis, and training it on large datasets requires substantial computational resources. With its deeper architecture (up to 152 layers), ResNet addresses the vanishing gradient problem often encountered in deep models through residual (skip) connections, making it suitable for biomedical applications such as ischemic stroke detection. These connections maintain gradient flow through the deep network, which eases training and can improve performance on medical imaging tasks. An ensemble of VGG and U-Net is reported to outperform other models in ischemic stroke prediction.
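The vanishing-gradient argument for ResNet can be made concrete with a toy calculation. The sketch below is an illustration under simplifying assumptions (scalar activations, tanh layers, identical small weights), not a ResNet implementation: it applies the chain rule through 20 layers with and without an identity skip connection. Without the skip path, each layer multiplies the gradient by a factor of at most 0.1; with it, each factor is at least 1.

```python
import math


def chain_gradient(x, weights, residual):
    """d(output)/d(input) through a stack of scalar tanh layers (toy model).

    Plain layer:    x_next = tanh(w * x)
    Residual layer: x_next = x + tanh(w * x)   (identity skip connection)
    """
    grad = 1.0
    for w in weights:
        pre = w * x
        local = w * (1.0 - math.tanh(pre) ** 2)  # derivative of tanh(w * x)
        if residual:
            local += 1.0  # the identity path contributes a derivative of exactly 1
            x = x + math.tanh(pre)
        else:
            x = math.tanh(pre)
        grad *= local  # chain rule across layers
    return grad


weights = [0.1] * 20  # 20 layers with small, illustrative weights
print(chain_gradient(0.5, weights, residual=False))  # vanishes toward 0
print(chain_gradient(0.5, weights, residual=True))   # stays at or above 1
```

The same reasoning carries over to real residual blocks, where the identity term adds an identity matrix to each block's Jacobian.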

U-Net and its variants, such as U-Net++, D-UNet, VGG-U-Net, and Inception U-Net, are particularly effective for ischemic stroke prediction and medical image segmentation. These models generalize better on small datasets and are well suited to tasks that require detailed segmentation. However, U-Net has limitations regarding its receptive field, vanishing gradients, and capturing spatial relationships. Densely connected variants such as U-Net++ improve gradient flow through dense skip pathways and enhance feature extraction, making them better suited to complex structures. Additionally, the dimension-fusion U-Net (D-UNet) improves feature representation by processing data across multiple scales and modalities, capturing both local and global context.
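The encoder-decoder structure with skip connections that characterizes U-Net can be illustrated by tracking feature-map shapes. The sketch below is a bookkeeping aid under stated assumptions: same-padded convolutions (the original U-Net used unpadded convolutions with cropping), 2 × 2 max-pooling, channel doubling at each encoder level, and an input size divisible by 2 to the power of the depth. The function name and default values are hypothetical.

```python
def unet_shapes(size, base_channels=64, depth=4):
    """Return (stage, level, spatial_size, channels) for a U-Net-style network."""
    if size % (2 ** depth) != 0:
        raise ValueError("size must be divisible by 2 ** depth")
    trace, skips = [], []
    h, ch = size, base_channels
    for level in range(depth):
        trace.append(("enc", level, h, ch))
        skips.append((h, ch))  # saved for the matching decoder level
        h //= 2                # 2 x 2 max-pooling halves the spatial size
        ch *= 2                # channels double at the next level
    trace.append(("bottleneck", depth, h, ch))
    for level in reversed(range(depth)):
        skip_h, skip_ch = skips[level]
        h *= 2                 # up-convolution doubles the spatial size
        ch //= 2               # and halves the channel count
        # Skip connection: concatenate encoder features with upsampled features.
        trace.append(("dec", level, h, ch + skip_ch))
        ch = skip_ch           # decoder convolutions reduce channels again
    return trace


for stage in unet_shapes(256):
    print(stage)
```

The concatenated channel counts in the decoder show where the skip connections reinject fine spatial detail lost during pooling.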

Because annotated medical imaging data are scarce and stroke detection benchmark datasets are limited, unsupervised models such as Restricted Boltzmann Machines (RBMs) are gaining traction. These models are typically trained with Contrastive Divergence, in which randomly initialized weights and biases are adjusted by contrasting statistics of the training data with those of short sampling-based reconstructions. Such models can provide effective solutions for tasks with limited labelled data.
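Contrastive Divergence approximates the log-likelihood gradient of an RBM by contrasting correlations measured on the data with correlations measured after a short Gibbs sampling chain. The sketch below implements one-step CD (CD-1) for a small binary RBM in plain Python. The toy data, layer sizes, learning rate, and epoch count are illustrative assumptions, and reconstruction error serves only as a rough progress indicator; an imaging application would use a vectorized library implementation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BinaryRBM:
    """Minimal binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # Weights are randomly initialized with small values; biases start at zero.
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(self.W[i][j] * h[j] for j in range(len(h))))
                for i in range(len(self.b))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        ph0 = self.hidden_probs(v0)               # positive phase (data)
        h0 = self.sample(ph0)
        v1 = self.sample(self.visible_probs(h0))  # one Gibbs step: reconstruction
        ph1 = self.hidden_probs(v1)               # negative phase (model)
        for i in range(len(v0)):
            for j in range(len(ph0)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(ph0)):
            self.c[j] += lr * (ph0[j] - ph1[j])

    def reconstruction_error(self, v):
        # Mean-field reconstruction: v -> p(h|v) -> p(v|h).
        pv = self.visible_probs(self.hidden_probs(v))
        return sum((x - p) ** 2 for x, p in zip(v, pv))


data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]  # toy binary patterns
rbm = BinaryRBM(n_visible=6, n_hidden=3)
before = sum(rbm.reconstruction_error(v) for v in data)
for _ in range(2000):
    for v in data:
        rbm.cd1_update(v)
after = sum(rbm.reconstruction_error(v) for v in data)
print(f"reconstruction error: {before:.3f} -> {after:.3f}")
```

No labels are used anywhere in the update, which is what makes this family of models attractive when annotations are scarce.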

To address the research gap noted above, there is a need to evaluate whether these methods are adequate for both classification and segmentation tasks, which involves comparing the performance of flat and hierarchical methods. A thorough study of the available datasets and performance metrics is also necessary. Lastly, a discussion of state-of-the-art methods is required to address these gaps comprehensively.

The rest of the document is organized as follows. Section 2 presents the study objectives and methodology. Section 3 describes imaging modalities, pattern identification, and clinical and commercial applications for ischemic stroke detection. Sections 4 and 5 describe the available open-source datasets, analyse the popular deep learning architectures used for segmentation and classification, and discuss the performance metrics used to evaluate them. State-of-the-art methods for binary and multi-class classification and segmentation are compared in Sect. 6. Section 7 provides recommendations for investigators, and Sects. 8 and 9 present the critical analysis and challenges, and the conclusion with future research directions.

2 Study objectives and methodology

2.1 Study objective

This study aims to review in detail and evaluate popular deep learning models that have been successfully deployed for binary and multi-class classification of the selected disease, ischemic stroke. To this end, we:

  • Discuss open-source datasets, focusing on their strengths and weaknesses.

  • Critically evaluate clinical and commercial applications for ischemic stroke detection, segmentation, and classification.

  • Compare different performance metrics and their usability from different perspectives.

  • Present and review recent advancements and future directions toward better computational solutions for the disease.

2.2 Research methodology

The data for this study were extracted from sources such as Google Scholar, IEEE Xplore, ScienceDirect, AHA Journals, PubMed, Springer, Elsevier, Wiley, and NIH. The study covers works published from 2013 to 2024. Search queries included “Ischemic stroke,” “Deep Learning and ISLES,” “Supervised and unsupervised lesion detection,” and “Neuroimaging.” A few highly relevant neuroimaging studies published before 2013 are also included. The following questions were the main factors used to select research studies:

  • Is the performance of DL methods adequate for ischemic stroke detection, classification, and segmentation?

  • What research directions and methods can produce high and clinically acceptable results?

  • Which produce better results: flat or hierarchical methods?

  • Are the available datasets sufficient for the chosen tasks?

Studies applying deep learning algorithms to neuroimaging qualified for the review. Initially, 252 publications were selected using the search strategy shown in Fig. 1. Not all collected papers met the inclusion criteria; studies with any of the following characteristics were excluded:

  1. Non-image-based stroke detection methods.

  2. Traditional machine learning methods, i.e., methods other than neural networks.

  3. Haemorrhage and other types of infarcts.

  4. Stroke treatment and recovery methods.

In the next phase, 154 of the 252 papers were retained, focusing on supervised and unsupervised learning. From these 154 articles, 10 articles were selected to extract the unique features of disease detection. In the third phase, 103 papers were selected based on the methods used, i.e., deep learning models for disease detection, segmentation, and classification.

Recommendations for researchers to address the identified issues and bridge the research gap have been extracted, carefully formulated, and clearly presented. The literature review is organized around the following research questions.

Each research question below is followed by a summary of what this review achieves in response.

• Is the performance of DL methods adequate for ischemic stroke detection, classification, and segmentation? Which key parameters are used for performance comparison? Are there recommendations for improving the performance of current methods?

A key result of this study is a performance comparison of different deep learning architectures for ischemic stroke detection, detailed in Figs. 11 and 12, which will help select the most suitable strategy for future research. Section 7 presents recommendations for improving methods and enhancing generalization, discussed in detail with a focus on key parameters.

• What research directions and methods can produce high and clinically acceptable results?

Sections 7, 8, and 9 critically discuss current research directions and methods that can produce high and clinically acceptable results.

• Can some methods operate on medical images hierarchically to extract multi-level information?

• How can methods such as Convolutional Neural Networks (CNNs), Multi-Scale Analysis, Pyramid Networks, Region-based Hierarchical Models, and Attention Mechanisms hierarchically operate on medical images to extract multi-level information?

CNN-based and global feature extraction methods extract multi-level information hierarchically, as discussed in Sects. 5 and 6.

• Are the available datasets sufficient for the chosen tasks? Which measures contribute to dataset volume, feature set extraction, and the key characteristics of ischemic stroke detection?

A critical evaluation in Sect. 4 describes how the benchmark datasets can be further enhanced in terms of feature sets and image interpretation to produce more clinically acceptable results. Section 7 highlights various recommendations regarding the benchmark datasets.

• Which parameters enable better generalization with the currently available datasets? What are the recent issues of class imbalance, which approaches resolve limited-dataset issues, and which solutions lead to enhanced generalization? How does a diverse, high-quality dataset help a model generalize better?

Sections 6, 7, 8, and 9 discuss the parameters that can improve generalization in supervised and unsupervised learning methods.

3 Background

Medical images are a good source for non-invasive ischemic stroke detection. Advanced imaging modalities provide rich information for statistical and computational analysis. These techniques include computed tomography (CT), positron emission tomography (PET), T1 MRI, T2 MRI, diffusion and perfusion MRI, diffusion tensor imaging, pH-weighted MRI, blood-brain barrier permeability MRI, and functional MRI. In the following subsections, we describe two imaging modalities, computed tomography (CT) and magnetic resonance imaging (MRI). We also discuss the selection of suitable modalities for classifying different disease stages and the unique disease-related features visible in various imaging modalities, which are essential for precise detection and treatment (Fig. 1).

Fig. 1 Search strategy

3.1 Prominent medical imaging modalities for ischemic stroke

The imaging modalities used in ischemic stroke detection include computed tomography (CT), positron emission tomography (PET), T1 MRI, T2 MRI, diffusion and perfusion MRI, diffusion tensor imaging, pH-weighted MRI, blood-brain barrier permeability MRI, and functional MRI. PET is considered the gold standard for evaluating stroke pathophysiology because it can provide semi-quantitative or quantitative hemodynamic measurements. However, it is less commonly used because of its high cost, exposure to ionizing radiation, and limited availability (Ning, Sarracino et al. 2010). MRI and CT are the most widely used modalities in ischemic stroke detection and diagnosis. CT scans are faster and yield quantifiable information on Cerebral Blood Flow (CBF), whereas MRI produces more detailed scans for accurate segmentation (Michel et al. 2007).

3.1.1 Computed tomography (CT)

Neuroimaging Computed Tomography (CT) is a first-line imaging modality for detecting acute neurological changes such as tumors, stroke or infarction, and intracranial hemorrhage. CT images can also reveal cancer, ruptured disks, aneurysms, hematomas, and many other pathologies. Axial, coronal, sagittal, median, and para-sagittal planes are the most important anatomical planes in CT. Several image acquisition modes are available, including dual-energy, axial, helical, perfusion, and cardiac-gated imaging. Image noise, radiation dose, and spatial resolution must be managed together to produce good-quality images, because at the same radiation dose smaller voxels receive fewer photons, making the image noisier. For better medical image analysis, iterative reconstruction techniques and residual denoising GANs can be used to produce high-resolution images from low-resolution images. In ischemic stroke cases, CT images describe the size, location, and vascular distribution of the infarction, help judge the salvageability of the ischemic penumbra, and clarify the pathophysiology and etiology of the stroke (Amar and Arun 2011).

3.1.2 Multimodal computed tomography (MCT)

Multimodal CT combines non-contrast CT, CT angiography, and CT perfusion scans for better visualization. Many organizations have created customized multimodal CT protocols (Kapila, Conley et al. 2011) that offer specific advantages for visualizing hyper-acute, acute, sub-acute, and chronic stroke. Multimodal CT is readily available, images both soft and hard tissues in detail, and may assess the salvageable penumbra. On the other hand, it involves ionizing radiation, possible allergic reactions, and nephrotoxicity, and it is relatively insensitive, specifically in the posterior fossa. For better diagnosis from CT images, contrast adjustment can improve visualization.
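For CT, a common form of contrast adjustment is windowing: Hounsfield unit (HU) values are clipped to a window defined by a level (centre) and a width, then scaled linearly to display intensities. The minimal sketch below assumes HU values are already available and uses a brain window of level 40 HU and width 80 HU, a commonly used setting; other windows (for example, narrower stroke windows) only change the two parameters. The function name is hypothetical.

```python
def apply_window(hu_values, level, width):
    """Map Hounsfield units to display intensities in [0, 255].

    Values below level - width/2 become 0 (black), values above
    level + width/2 become 255 (white), and values in between are
    scaled linearly.
    """
    lower = level - width / 2.0
    upper = level + width / 2.0
    out = []
    for hu in hu_values:
        clipped = min(max(hu, lower), upper)
        out.append((clipped - lower) / (upper - lower) * 255.0)
    return out


# Brain window: level 40 HU, width 80 HU (displays 0 to 80 HU).
print(apply_window([-1000, 0, 40, 80, 1000], level=40, width=80))
# [0.0, 0.0, 127.5, 255.0, 255.0]
```

A narrow window spends the full display range on a small HU interval, which is why subtle grey-white matter differences become easier to see.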

CT plays a vital role in differentiating between types of stroke and other conditions such as tumors, bleeding, and Alzheimer’s disease. For measuring acute focal neurological deficits, Non-contrast Cerebral Computed Tomography (NCCT) is preferred (von Kummer et al. 1994, Marks et al. 1999, Torbey et al. 2013). Initial signs of stroke on non-contrast CT images are cortical sulcal effacement, loss of the insular ribbon, blurring of the grey-white matter interface, obscuration of the lentiform nucleus, and a hyperdense artery. However, in most cases these signs are not clear on CT scans and overlap with normal appearances. Contrast agents help reveal hidden information that is not visible in non-contrast images. CT angiography assesses vessel enhancement after a time-optimized bolus of contrast material (Amar and Arun 2011). Dynamic CT Perfusion (CTP) imaging is used to distinguish irreversibly infarcted brain from the salvageable ischemic penumbra in stroke patients. CTP tracks the passage of an intravenous bolus of iodinated contrast agent, which increases attenuation, for 45 s. Quantitative analysis of these images yields cerebral blood volume (CBV), cerebral blood flow (CBF), mean transit time (MTT), and the delay between arterial inflow and venous outflow.
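The perfusion parameters listed above are related by the central volume principle, MTT = CBV / CBF. The worked sketch below applies it with the unit conversion made explicit (CBV in mL/100 g, CBF in mL/100 g/min, MTT in seconds). The input values are illustrative only and are not clinical thresholds; the function name is hypothetical.

```python
def mean_transit_time_s(cbv_ml_per_100g, cbf_ml_per_100g_per_min):
    """Central volume principle: MTT = CBV / CBF, returned in seconds."""
    if cbf_ml_per_100g_per_min <= 0:
        raise ValueError("CBF must be positive")
    # CBV / CBF is in minutes; multiply by 60 to convert to seconds.
    return cbv_ml_per_100g * 60.0 / cbf_ml_per_100g_per_min


print(mean_transit_time_s(4.0, 50.0))  # 4.8
print(mean_transit_time_s(4.0, 20.0))  # 12.0 (lower flow, same volume: longer transit)
```

The second call shows why a prolonged MTT is a sensitive marker of reduced perfusion: when flow drops while blood volume is maintained, transit time rises.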

3.1.3 Magnetic resonance imaging (MRI)

MRI sequences include T1, T2, Fluid-Attenuated Inversion Recovery (FLAIR), Gradient Recalled Echo (GRE), Diffusion-Weighted Imaging (DWI), and Perfusion-Weighted Imaging (PWI), together with MR angiography. MRI is highly proficient at capturing detailed features of soft tissues and is therefore superior to CT in this respect. Different MRI sequences, acquired with different radio-frequency pulse sequences, produce multi-spectral images. Popular MRI sequences used in ischemic stroke detection and analysis are T1, DWI, T2, T2*, FLAIR, the mismatch between DWI and apparent diffusion coefficient (ADC) maps, and PWI (Mezzapesa et al. 2006). The visibility of different ischemic stroke signs varies across MRI sequences; Table 1 summarizes how stroke indicators appear in different MRI modalities.

Table 1 Visualizations of stroke symptoms on MRI modalities
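The DWI/ADC mismatch relies on the apparent diffusion coefficient (ADC), which under the usual mono-exponential model is computed from a b = 0 image and a diffusion-weighted image as ADC = ln(S0 / Sb) / b. Acute infarcts typically show restricted diffusion, that is, high DWI signal with low ADC. The sketch below assumes a single b-value and noise-free signals; the function name and voxel values are hypothetical.

```python
import math


def adc_from_dwi(s0, sb, b_value):
    """Apparent diffusion coefficient from a b=0 signal and one DWI signal.

    Mono-exponential model: S_b = S_0 * exp(-b * ADC)
    =>  ADC = ln(S_0 / S_b) / b
    With b in s/mm^2, ADC is returned in mm^2/s.
    """
    if s0 <= 0 or sb <= 0 or b_value <= 0:
        raise ValueError("signals and b-value must be positive")
    return math.log(s0 / sb) / b_value


# Hypothetical voxel: simulate a DWI signal with ADC = 0.8e-3 mm^2/s at b = 1000 s/mm^2.
s0 = 1000.0
sb = s0 * math.exp(-1000 * 0.8e-3)
print(adc_from_dwi(s0, sb, 1000))  # recovers approximately 0.0008
```

Applied voxel by voxel, the same formula produces the ADC map that is read alongside DWI.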

3.2 Imaging modalities for disease stage classification

Each imaging modality provides information from a specific perspective. In MRI, for example, each sequence presents different aspects of the tissue, which may help identify the different stages of the disease, i.e., acute, sub-acute, and chronic. However, no single modality is sufficient to assess stroke in all the required detail; therefore, MRI images acquired through different sequences need to be fused to extract maximum information.
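One common way to combine sequences for a CNN is early (input-level) fusion: co-registered images from each sequence are normalized independently and stacked as input channels. The sketch below assumes the images are already co-registered and resampled to the same grid; the function names and pixel values are hypothetical, and registration itself as well as alternative late-fusion strategies are outside its scope.

```python
def zscore(image):
    """Normalize one 2D image (list of rows) to zero mean and unit variance."""
    values = [v for row in image for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5 or 1.0  # avoid division by zero on constant images
    return [[(v - mean) / std for v in row] for row in image]


def early_fusion(sequences):
    """Stack co-registered MRI sequences into a (channels, H, W) nested list.

    `sequences` maps a sequence name (e.g. "DWI", "FLAIR") to a 2D image;
    all images are assumed to be co-registered and equally sized.
    """
    names = sorted(sequences)  # fixed, reproducible channel order
    shapes = {(len(sequences[n]), len(sequences[n][0])) for n in names}
    if len(shapes) != 1:
        raise ValueError("all sequences must share the same shape")
    return names, [zscore(sequences[n]) for n in names]


names, volume = early_fusion({
    "DWI": [[100, 120], [300, 110]],
    "FLAIR": [[50, 55], [90, 52]],
})
print(names, len(volume), len(volume[0]), len(volume[0][0]))  # ['DWI', 'FLAIR'] 2 2 2
```

Normalizing each sequence separately matters because MRI intensities have no fixed physical scale, so raw values from different sequences are not directly comparable.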

Various studies have sought a suitable modality, or combination of modalities, for efficient stroke detection. For example, Davis, Robertson et al. (2006) concluded that Diffusion Weighted Imaging (DWI) is the more appropriate modality for predicting hyper-acute and other ischemic stroke stages, whereas Mohr et al. (1995) showed that T2-weighted and FLAIR images are preferable for early acute detection. However, finding the best combination of these modalities remains a research challenge (Shafaat and Sotoudeh 2022). Table 2 lists studies that focus on selecting appropriate modalities for ischemic stroke detection.

Table 2 Selection of imaging modalities in ischemic stroke detection studies

3.3 Imaging modalities and pattern recognition

The clinical progression of ischemic stroke is shown in Fig. 2, from the early hyper-acute stage to the chronic stage. Note that the early and late hyper-acute stages are less visible on CT scans.

Fig. 2 Stages of ischemic stroke

In the following subsections, we discuss differences and variations in the visual features of ischemic stroke in images obtained through different imaging modalities.

3.3.1 Hyper-acute

In the hyper-acute stage, CT scans may appear normal. Significant signs are a visible clot and early parenchymal signs that develop within one or two hours of occlusion (Jensen-Kondering et al. 2010). Figure 3 shows these early hyper-acute signs compared with a normal brain. On MRI, reduced blood flow appears as high signal intensity on T2/FLAIR images.

Fig. 3 A Normal CT head; B Clot visualization; C Early parenchymal sign

3.3.2 Acute

The acute stage spans from 24 h to one week after occlusion. Its initial signs include a hyperdense artery (or vessel), defined as any artery that appears denser than adjacent vessels. After occlusion, the concentration of red blood cells and hemoglobin in the clot raises its attenuation value (i.e., 40–80 HU), so that clots with higher attenuation appear hyperdense (Oguro et al. 2022). The second major sign of acute infarct is loss of grey/white matter differentiation, depicted in Fig. 4.
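The attenuation values quoted above are on the Hounsfield scale, which rescales each voxel's linear attenuation coefficient relative to water and air so that water maps to 0 HU and air to -1000 HU. The minimal sketch below implements that standard definition; the attenuation coefficient used for water is an illustrative assumption (the true value depends on beam energy), and the function name is hypothetical.

```python
def hounsfield(mu, mu_water, mu_air=0.0):
    """Convert a linear attenuation coefficient to Hounsfield units (HU).

    HU = 1000 * (mu - mu_water) / (mu_water - mu_air)
    Water maps to 0 HU and air to -1000 HU by construction.
    """
    return 1000.0 * ((mu - mu_water) / (mu_water - mu_air))


MU_WATER = 0.20  # illustrative value in 1/cm; depends on the X-ray beam energy
print(hounsfield(MU_WATER, MU_WATER))      # 0.0
print(hounsfield(0.0, MU_WATER))           # -1000.0
print(round(hounsfield(0.212, MU_WATER)))  # 60 (6% more attenuating than water)
```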

Fig. 4 1 Hyperdense artery (black line); 2 Normal gray-white matter; 3 Loss of gray-white matter differentiation (side A is the normal region and side B shows loss of gray/white matter differentiation)

Figure 5 presents a detailed view of the loss of gray-white matter differentiation in different regions of the brain.

Fig. 5 Loss of gray-white matter differentiation in different brain regions (insular region, basal ganglia, lentiform region)

In the early stage, the occlusion reduces blood volume, which lowers the attenuation of grey matter so that it appears isodense to white matter. Cells deprived of blood supply become dysfunctional and swell, appearing increasingly isodense. This state is called loss of grey-white matter differentiation (cytotoxic edema) (Qazi et al. 2015). Loss of grey-white matter differentiation can be seen in various brain areas, such as the basal ganglia, the lentiform nucleus, and the lateral margin of the insula (insular ribbon sign). A continuing increase in mass can displace brain tissue from its normal position, which is known as midline shift or brain herniation. Types of herniation include subfalcine, transtentorial, tonsillar, and external herniation. Figure 6 presents the different types of brain herniation.

Fig. 6 Different types of brain herniation

3.3.3 Sub-acute

The sub-acute stage spans 1 to 3 weeks after stroke onset. The mass effect increases, producing more hypodense (dark) and compressed brain regions. Figure 7 shows the brain changes caused by a sub-acute infarct.

Fig. 7 1 A hypodense wedge-shaped infarct (sub-acute); 2 Sub-acute infarct with increased mass effect: a hypodense sub-acute infarct, b compressed ventricles (increased mass effect), c normal ventricles

Haemorrhagic transformation results from persistent disruption of the blood-brain barrier. This disruption arises from alterations in pericytes, tight junctions, astrocyte end-feet, vascular remodelling, and basement membrane thickness. Haemorrhagic transformation can lead to intracerebral, subarachnoid, subdural, and intraventricular haemorrhage (Spronk et al. 2021).

After one week of hypoattenuation, swelling becomes more marked and a noticeable increase in mass effect can be perceived. Gyral enhancement (increased enhancement of the grey matter) is another sign of sub-acute infarct. The mass effect resolves after approximately 8 weeks; however, contrast enhancement of the soft tissues continues.

3.3.4 Chronic

Chronic infarct (stroke), also known as old infarct, refers to the stage from months to years after disease onset. After three weeks of occlusion, tissue breakdown begins, and the affected brain tissue is converted into cystic, CSF-filled space. The resulting enlargement of the ventricles is called hydrocephalus ex vacuo. Intracranial atherosclerosis (ICAS), caused by the build-up of fat and cholesterol (plaque), is the major reason for arterial blockage and poses a threat to the patient's life (Liebeskind and Wardlaw 2021). Figures 8 and 9 present different signs of the chronic stage of ischemic stroke.
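When images are labelled by stage, for example to annotate a dataset, the time windows described in this section can be encoded directly. The sketch below is a labelling convenience under stated assumptions, not a diagnostic rule: it treats everything before 24 h as hyper-acute (since the acute window starts at 24 h), uses one and three weeks as the later boundaries, and the function name is hypothetical. Reported stage boundaries vary between sources.

```python
def infarct_stage(hours_since_onset):
    """Map time since onset to the stage labels used in this section.

    Assumed boundaries: hyper-acute < 24 h <= acute < 1 week
    <= sub-acute < 3 weeks <= chronic.
    """
    if hours_since_onset < 0:
        raise ValueError("time since onset must be non-negative")
    if hours_since_onset < 24:
        return "hyper-acute"
    if hours_since_onset < 7 * 24:
        return "acute"
    if hours_since_onset < 21 * 24:
        return "sub-acute"
    return "chronic"


for hours in (2, 48, 10 * 24, 60 * 24):
    print(hours, infarct_stage(hours))
```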

Fig. 8 Hydrocephalus ex vacuo. 1 Normal ventricles; 2 Enlarged ventricles

Fig. 9 Multiple stages of chronic infarct. A Intracranial atherosclerosis; B Old infarct; C Leukoaraiosis; D Atrophy

4 Clinical and commercial applications on ischemic stroke detection

The concept of Computer-Aided Diagnosis (CAD) emerged in the 1980s to provide radiologists with a second opinion in routine medical practice. CAD applications were initially used for breast cancer detection (Huang 2014). With advancements in computation, especially in Artificial Intelligence, clinically acceptable applications are now available on the healthcare market. These applications help radiologists and medical practitioners understand and interpret the data at hand (Harris and Parekh 2019). The literature suggests that computer-assisted and automatic diagnosis can now perform better than average practitioners. Table 3 lists prominent commercially available software used for ischemic stroke diagnosis.

Table 3 Commercially available software for diagnosis of ischemic stroke

This section provides the background knowledge needed to understand the review and analysis in this work.

Medical images are a good source for non-invasive ischemic stroke detection. Several imaging modalities are available in healthcare for visualizing the body's interior, and advanced modalities provide more detailed data for statistical and computational analysis. Early detection of disease through medical images such as MRI supports timely treatment planning, which can save lives or improve quality of life. Timely detection and treatment of the different stages of ischemic stroke can slow the progression from acute to chronic and help speed recovery. The literature shows that no single modality can capture all relevant visual information; different imaging modalities offer unique insights into various aspects of pathology and anatomy. Some are good for extracting soft tissue information, whereas others are better for high-level feature analysis. For example, X-ray images are well suited to visualizing bone fractures, whereas Single Photon Emission Computed Tomography (SPECT) is used for cardiac imaging, bone scans, and brain imaging.

5 Open source datasets

Both open-source and proprietary datasets are used in research studies. A few prominent open-source datasets are listed in Table 4.

Table 4 Open access datasets of ischemic stroke

Table 5 lists the proprietary datasets used in the literature to study ischemic stroke:

Table 5 List of proprietary datasets used in reviewed studies

Tables 4 and 5 show that benchmark datasets are limited in size compared with proprietary datasets. Deep learning models typically have many parameters and therefore need large datasets to generalize well. Transfer learning (TL) mitigates this by adapting a model pre-trained on a large dataset to a new task, which requires only a limited-size dataset and little computation.

Moreover, datasets could be annotated more densely to represent the different stages of ischemic stroke. Available benchmark datasets lack such detailed demarcation; providing it could lead to more specific and detailed models and could also improve medical image captioning.
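The idea behind transfer learning can be sketched without a deep learning framework: a feature extractor learned elsewhere is kept frozen, and only a small new classification head is trained on the limited target data. In the sketch below, `frozen_features` is a hypothetical stand-in for a pre-trained network (in practice, the convolutional layers of a model such as VGG-16 or ResNet), and the toy data are invented for illustration.

```python
import math


def frozen_features(x):
    """Hypothetical stand-in for a pre-trained feature extractor.

    Its fixed weights are never updated during training, mimicking
    frozen pre-trained layers.
    """
    return [x[0] + x[1], x[0] - x[1]]


def train_head(samples, labels, lr=0.5, epochs=500):
    """Train only a new logistic-regression head on top of frozen features."""
    w, b = [0.0, 0.0], 0.0
    feats = [frozen_features(x) for x in samples]  # computed once: extractor is frozen
    n = len(feats)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for f, y in zip(feats, labels):
            p = 1.0 / (1.0 + math.exp(-(w[0] * f[0] + w[1] * f[1] + b)))
            gw[0] += (p - y) * f[0]  # gradient of the log-loss w.r.t. w
            gw[1] += (p - y) * f[1]
            gb += p - y
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    f = frozen_features(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0


X = [(1, 1), (2, 0.5), (0.5, 0.2), (-1, -1), (-0.5, -2), (-0.3, -0.6)]
y = [1, 1, 1, 0, 0, 0]
w, b = train_head(X, y)
accuracy = sum(predict(w, b, x) == t for x, t in zip(X, y)) / len(y)
print(accuracy)
```

Only three parameters are learned here, which is the point: when the frozen features are already informative, a small head can be fitted from very few labelled examples.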

6 Popular deep learning models

This section reviews popular architectures that have been deployed for ischemic stroke detection. Most of these models are supervised, including LeNet-5, AlexNet, GoogLeNet, ResNet, ZFNet, VGGNet, DenseNet, and variants of fully connected networks. Researchers have also used unsupervised and generative models such as Auto-Encoders (AE), Restricted Boltzmann Machines (RBM), and generative adversarial networks (GANs). One of the main advantages of neural networks is their ability to automatically extract task-relevant features (Bengio et al. 2012).

6.1 Supervised CNN architectures

The convolutional neural network (CNN) is a well-known deep learning technique that traces back to the work of K. Fukushima (Fukushima 1979). A CNN consists of multiple layers of neurons, loosely inspired by neurons in the human brain. The output of each neuron is passed through an activation function \(f\), so that each neuron responds to specific input patterns; for example, a rectified linear unit passes positive values and suppresses negative ones. CNNs have delivered strong results in diverse application areas, including medicine, engineering, robotics, the automotive industry, computer science, and business. (Fukushima 1979) proposed the two fundamental types of layers in a CNN: convolutional layers and down-sampling layers. The convolutional layer performs feature extraction, whereas the down-sampling layer reduces the spatial dimensions of the extracted feature maps so that subsequent layers can extract higher-level features with some tolerance to small shifts and distortions (Riesenhuber and Poggio 1999) (Fukushima 1986). A convolution operation can be represented mathematically as follows:

$$(f*g)(n)=\sum_{m} f(m)\,g(n-m)$$

where \(f\) is the image function and \(g\) is the filter (kernel) function.
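The convolution sum above can be evaluated directly for short 1D sequences. The sketch below is illustrative only: `convolve` implements the formula literally (treating values outside each sequence as zero), and `cross_correlate_valid` shows what most deep learning libraries actually compute in their "convolution" layers, namely sliding the kernel without flipping it. Because CNN kernels are learned, the missing flip does not change what a network can represent.

```python
def convolve(f, g):
    """Full discrete convolution (f * g)(n) = sum_m f(m) g(n - m) for finite sequences."""
    n_out = len(f) + len(g) - 1
    out = [0.0] * n_out
    for n in range(n_out):
        for m in range(len(f)):
            k = n - m
            if 0 <= k < len(g):  # g is zero outside its support
                out[n] += f[m] * g[k]
    return out


def cross_correlate_valid(f, g):
    """Slide g over f without flipping it, keeping only fully overlapping ('valid') positions."""
    return [sum(f[i + j] * g[j] for j in range(len(g))) for i in range(len(f) - len(g) + 1)]


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 0.0, -1.0]  # a simple difference (edge-detecting) filter
print(convolve(signal, kernel))               # [1.0, 2.0, 2.0, 2.0, -3.0, -4.0]
print(cross_correlate_valid(signal, kernel))  # [-2.0, -2.0]
```

Cross-correlating with `kernel` gives the same values as convolving with the reversed kernel at the valid positions.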

CNN-based models can be trained in a supervised or unsupervised manner; supervised learning refers to training the model on labelled data. A number of high-performance supervised CNN architectures have been successfully applied to stroke detection, segmentation and classification, including LeNet-5, AlexNet, GoogleNet, ResNet, ZFNet, VGGNet, DenseNet, and variants of fully convolutional networks.

6.1.1 LeNet-5

(LeCun et al. 1995) proposed the LeNet-5 architecture, which consists of five layers: three convolutional layers combined with average pooling, two fully connected layers, and a final SoftMax-based classifier. The first convolutional layer convolves the input image with six different filters and produces feature maps of size 28 × 28, as shown in Fig. 10. The two subsampling layers improve the generalization capability of the network by making the learned patterns less sensitive to small shifts in the input image. The output of the last convolutional layer is then flattened and fed to the fully connected layers.

Fig. 10

(LeCun et al. 1995) LeNet-5 Architecture
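The 28 × 28 feature-map size follows from the standard output-size rule for convolution and pooling layers. The sketch below assumes the commonly cited LeNet-5 configuration (32 × 32 input, 5 × 5 convolutions without padding, 2 × 2 subsampling with stride 2); the layer labels in `LENET5` are descriptive only.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer: floor((W - K + 2P) / S) + 1."""
    return (size - kernel + 2 * padding) // stride + 1


# Assumed LeNet-5 configuration: (layer description, kernel size, stride).
LENET5 = [("C1 conv 5x5, 6 maps", 5, 1), ("S2 subsample 2x2", 2, 2),
          ("C3 conv 5x5, 16 maps", 5, 1), ("S4 subsample 2x2", 2, 2),
          ("C5 conv 5x5, 120 maps", 5, 1)]


def trace_sizes(input_size=32):
    """Return the spatial size after each layer, starting from a square input."""
    sizes, size = [], input_size
    for name, kernel, stride in LENET5:
        size = conv_output_size(size, kernel, stride)
        sizes.append((name, size))
    return sizes


for name, size in trace_sizes():
    print(f"{name}: {size}x{size}")  # 28, 14, 10, 5, 1
```

The final 1 × 1 × 120 output of C5 is what gets flattened and passed to the fully connected layers.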

6.1.2 AlexNet

In 2012, (Krizhevsky et al. 2012) proposed the AlexNet architecture, which combines five convolutional layers, max-pooling layers, and three fully connected layers. Dropout is used to prevent overfitting, and the final fully connected layer, followed by the softmax function, produces the output class probabilities for the input data. Researchers have used the AlexNet model for Ischemic stroke detection, classification, and segmentation (Lo et al. 2021), (Talo et al. 2019).
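Dropout prevents overfitting by randomly silencing units during training. The sketch below shows the widely used "inverted" form, in which surviving activations are scaled by 1/(1 − p) during training so that nothing changes at inference time; this is an implementation choice assumed here for illustration, not necessarily the exact scaling scheme of the original AlexNet paper.

```python
import random


def dropout(activations, p, training, rng):
    """Inverted dropout: zero each unit with probability p during training and scale the
    survivors by 1/(1-p); at inference time the activations pass through unchanged."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
x = [1.0] * 10
train_out = dropout(x, p=0.5, training=True, rng=rng)   # each entry is 0.0 or 2.0
eval_out = dropout(x, p=0.5, training=False, rng=rng)   # identical to x
```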

6.1.3 VGGNet

(Simonyan and Zisserman 2014) proposed the VGG network architecture, which is built from blocks of computation. Each block contains a stack of 3 × 3 convolutional layers (two or three in VGG-16) followed by a max-pooling layer, and the depth (number of channels) of the extracted features increases from block to block. Scale jittering, a data augmentation method, increases the effective size of the dataset. The authors presented two variants of their model, VGG-16 and VGG-19; VGG-16 has thirteen convolutional layers, five pooling layers, and three fully connected layers. Various researchers have reported good performance in classifying and segmenting Ischemic stroke with the VGG-16 architecture. (Chung et al. 2020) proposed an architecture combining 3D VGG and ResNet approaches for the detection of stenosis on patch-oriented and patient-oriented data. Their work focused on rescaling important features using a 3D SE-ResNet and reported better results, i.e. an AUC score of 0.053 for the patch-based method and 0.065 for the patient-oriented method. (Chen 2022) proposed a model for acute and haemorrhagic infarction detection. In this research, transfer learning is employed: three different models, CNN-2, VGG-16, and ResNet-5, are trained with 4,7692 NCCT images. The reported accuracy of VGG-Net is 0.95. Their study suggested that vascular calibre and the size of the medical image could be supportive prognostic factors for Ischemic stroke detection.
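The VGG-16 block structure can be written as a compact configuration list, a convention used by several open-source implementations (assumed here for illustration): integers are output channels of 3 × 3 convolutions and `"M"` marks a 2 × 2 max-pooling layer. Counting the entries reproduces the thirteen convolutional and five pooling layers, and also gives the number of convolutional parameters for a 3-channel input.

```python
# VGG-16 convolutional configuration: channel counts of 3x3 convolutions, "M" = 2x2 max pooling.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def summarize(cfg, in_channels=3, kernel=3):
    """Count conv layers, pooling layers and conv parameters (weights + biases)."""
    n_conv = n_pool = params = 0
    c_in = in_channels
    for layer in cfg:
        if layer == "M":
            n_pool += 1
        else:
            params += kernel * kernel * c_in * layer + layer  # weights + one bias per filter
            c_in = layer
            n_conv += 1
    return n_conv, n_pool, params


print(summarize(VGG16_CFG))  # (13, 5, 14714688)
```

Together with the three fully connected layers this gives the sixteen weight layers that name the model; the channel count doubles after each pooling stage until it reaches 512.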

6.1.4 ResNet

Microsoft Research Asia (MSRA) proposed ResNet, a neural network architecture with 152 layers that uses residual (shortcut) connections to propagate information from earlier to later layers. ResNet delivers strong performance in image classification, detection, and localization, and the research community has proposed a number of variants of the base model for different tasks. A hybrid model known as Inception-ResNet improved the training capacity of the Inception model and produced excellent results in Ischemic stroke classification (Suberi et al. 2019). Xception ("extreme inception"), a convolutional neural network architecture based on depthwise separable convolution layers, has been applied to predict Middle Cerebral Artery (MCA) signs in acute Ischemic stroke, resulting in improved performance (Shinohara et al. 2020). The literature shows that researchers have used ResNet and its variants to classify and segment Ischemic stroke. (Hilbert et al. 2019) proposed a model to predict two types of outcome: good functional outcome after stroke, measured by the modified Rankin Scale (mRS ≤ 2), which grades the disability of stroke patients in daily life, and good reperfusion, measured by the modified Thrombolysis in Cerebral Infarction score (mTICI ≥ 2b). This work combined supervised structured receptive field neural networks (RFNN) and unsupervised auto-encoder (AE) techniques with ResNet. The reported average accuracy for reperfusion (mTICI) prediction is 0.71 (range 0.62–0.75). (Liu et al. 2018) proposed a residual-structured fully convolutional network (Res-FCN) trained with a gradient-based method on normal and diseased images separately. This method produced a mean Dice coefficient score of 0.645.
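A residual connection adds the block input back to the output of the learned transformation, y = ReLU(F(x) + x). The sketch below uses a hypothetical two-layer fully connected branch F (real ResNet blocks use convolutions and batch normalization) to show the key property: with zero weights the branch vanishes and the block reduces to ReLU(x), an identity mapping for non-negative inputs. This is one common intuition for why very deep residual stacks are easier to optimize.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(W, x):
    """Matrix-vector product W x (no bias, for brevity)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def residual_block(x, W1, W2):
    """y = ReLU(F(x) + x) with F(x) = W2 * ReLU(W1 * x); the shortcut adds the input back."""
    fx = linear(W2, relu(linear(W1, x)))
    return relu([a + b for a, b in zip(fx, x)])


x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
print(residual_block(x, zeros, zeros))        # [1.0, 0.0, 3.0]
print(residual_block(x, identity, identity))  # [2.0, 0.0, 6.0]
```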

6.1.5 Fully convolutional networks

The U-Net architecture proposed by (Ronneberger, Fischer et al. 2015) extends the fully convolutional network with a contracting path that extracts high- and low-level features and an expanding path that reconstructs the output at the desired resolution; skip connections pass features from the contracting path to the expanding path. U-Net performs well even on small datasets. (Zhang 2020) proposed the Dense Inception U-Net architecture, which combines an Inception-Residual module with a convolutional module. The UNet++ model captures coarse details with dense convolutional layers connected through nested skip connections. The literature shows that many researchers have used U-Net as a base model to design new network architectures. (Karthik et al. 2019) proposed a deeply supervised Fully Convolutional Network (FCN) with Leaky ReLU activations and achieved a mean Dice coefficient score of 0.70 for Ischemic stroke segmentation. (Zhou et al. 2019, Kadry et al. 2021) presented the dimension-fusion UNet (D-UNet), which combines 2D and 3D convolutions through a dimension transformation block for stroke lesion segmentation; it achieved a DSC of 0.5349 and a precision of 0.6331. (Kadry et al. 2021) presented the VGG-UNet architecture, which achieved a Jaccard score of 90.74, a Dice score of 95.18, and an accuracy of 98.68.
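The contracting/expanding structure and the role of the skip connections can be made concrete by tracing tensor shapes. The sketch below is a shape-level illustration under stated assumptions, not the original U-Net: it assumes padded ("same") 3 × 3 convolutions, whereas the original paper used unpadded convolutions with cropping, plus 2 × 2 pooling, channel doubling per level, and concatenation of skip features in the decoder. `unet_shapes` and its defaults are hypothetical.

```python
def unet_shapes(size=256, base=64, depth=4, n_classes=2):
    """Trace (name, height, width, channels) through a U-Net-style encoder/decoder."""
    shapes, skips = [], []
    s, c = size, base
    for level in range(depth):
        shapes.append(("enc%d" % level, s, s, c))
        skips.append((s, c))               # saved for the skip connection
        s, c = s // 2, c * 2               # pooling halves the size; next level doubles channels
    shapes.append(("bottleneck", s, s, c))
    for level in reversed(range(depth)):
        skip_s, skip_c = skips[level]
        s, c = s * 2, c // 2               # up-convolution doubles size, halves channels
        assert s == skip_s                 # shapes must match to concatenate
        shapes.append(("concat%d" % level, s, s, c + skip_c))  # skip features joined here
        shapes.append(("dec%d" % level, s, s, skip_c))         # convolutions reduce channels
    shapes.append(("output", s, s, n_classes))  # 1x1 convolution to per-pixel class scores
    return shapes


for shape in unet_shapes():
    print(shape)
```

The output has the same spatial size as the input, which is what allows per-pixel lesion masks to be predicted.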

6.1.6 GoogleNet

This model was developed by (Szegedy et al. 2015) and won ILSVRC 2014 with a top-5 error rate of 6.67%; it stacks nine inception modules. The literature shows that researchers have used GoogleNet for the prediction of Ischemic stroke. (Chen et al. 2018) presented a multimodal lesion identification framework with an attention mechanism for predicting lesion presence in 3D DWI images. In this work, extracted brain regions are fused with textual report data and used as input to two models, VGGNet and GoogleNet. Their multimodal technique achieved an accuracy of 68% with VGG-16 plus attention layers.
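An inception module runs 1 × 1, 3 × 3 and 5 × 5 convolutions and a pooling branch in parallel and concatenates their outputs along the channel axis; 1 × 1 "reduction" convolutions keep the expensive branches affordable. The arithmetic below uses the inception (3a) settings as reported for GoogleNet (192 input channels; branch widths 64, 128, 32, 32; a 16-channel reduction before the 5 × 5 branch) and ignores biases.

```python
def conv_weights(k, c_in, c_out):
    """Number of weights (biases ignored) in a k x k convolution."""
    return k * k * c_in * c_out


# 5x5 branch of inception (3a): 192 input channels, 32 output maps.
direct = conv_weights(5, 192, 32)                               # 5x5 applied directly
reduced = conv_weights(1, 192, 16) + conv_weights(5, 16, 32)    # 1x1 reduction to 16 first
print(direct, reduced)  # 153600 15872


def inception_output_channels(n1x1, n3x3, n5x5, pool_proj):
    """Parallel branch outputs are concatenated along the channel axis."""
    return n1x1 + n3x3 + n5x5 + pool_proj


print(inception_output_channels(64, 128, 32, 32))  # 256
```

The 1 × 1 reduction cuts the weights of this branch by roughly a factor of ten.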

6.1.7 Transformer

Recently, the Vision Transformer, with its self-attention mechanism, has shown notable advances over CNN architectures in medical imaging tasks such as classification and segmentation. Muhammad Ayoub et al. (Ayoub, Liao et al. 2023) proposed an end-to-end vision transformer architecture that uses self-attention to capture long-range dependencies between image patches of size 16 × 16. The patches are embedded and refined by the transformer encoder. Three inputs are then processed: a class token carrying global contextual information, a sequence of patch tokens carrying local features, and bounding box coordinates. These inputs are passed to an RNN, which learns important temporal patterns and contextual information, enhancing the feature representation and classification accuracy. Wang et al. (2022a, b) presented the METrans network, which features a multi-encoder for multiscale feature extraction using CBAM and a transformer, followed by a decoder that performs up-sampling to generate the final image.
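Self-attention lets every patch token weigh every other token, which is how long-range dependencies are captured. The sketch below computes scaled dot-product attention, softmax(QKᵀ/√d)V, for a few toy tokens. For brevity it assumes identity query/key/value projections (Q = K = V = X), whereas real transformer layers learn separate projection matrices and use multiple heads; `num_patches` shows how a 224 × 224 image (an assumed input size) becomes 196 tokens of 16 × 16 pixels.

```python
import math


def num_patches(height, width, patch=16):
    """A ViT splits the image into non-overlapping patch x patch tiles, one token each."""
    return (height // patch) * (width // patch)


def softmax(row):
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X):
    """softmax(X X^T / sqrt(d)) X with identity Q/K/V projections (an illustrative simplification)."""
    d = len(X[0])
    scores = [[sum(q * k for q, k in zip(xi, xj)) / math.sqrt(d) for xj in X] for xi in X]
    weights = [softmax(row) for row in scores]
    out = [[sum(w * xj[c] for w, xj in zip(wrow, X)) for c in range(d)] for wrow in weights]
    return out, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy patch embeddings
out, weights = self_attention(tokens)
print(num_patches(224, 224))  # 196
```

Each row of `weights` is a probability distribution over all tokens, so every output token is a weighted mixture of the whole sequence.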

6.2 Unsupervised deep learning architectures

The unavailability of large annotated datasets in the medical domain has created opportunities for unsupervised methods, and researchers are focusing on developing high-performing unsupervised methods (Armya and Abdulazeez 2021). The unsupervised deep learning architectures described in (Armya and Abdulazeez 2021) use a two-step procedure: first, a neural network is trained in an unsupervised fashion as a Denoising Auto-Encoder (DAE) or a restricted Boltzmann machine (RBM); second, manually labelled data are used for supervised fine-tuning of a regular feed-forward multi-layer network with skip connections. Commonly used unsupervised deep learning architectures are restricted Boltzmann machines, deep belief networks, auto-encoders, and generative adversarial networks.

6.2.1 Restricted Boltzmann machine (RBM)

An RBM is a generative model in which every neuron of the visible layer is connected to every neuron of the hidden layer, whereas connections within a layer are not allowed. Training adjusts the parameters (weights and biases) to increase the probability the model assigns to the training data; because there are no intra-layer connections, all units of one layer can be updated in parallel given the other layer. Exact maximum-likelihood training is very time consuming and slow, and Contrastive Divergence (CD) speeds it up: the weights and biases are initialized randomly and then updated iteratively using a short Gibbs-sampling reconstruction of the data. Some variants extend the RBM with additional connections between units, such as semi-restricted RBMs, while Temporal Restricted Boltzmann Machines (TRBM) rely on context information passed from state to state. Other variants include mean-covariance RBMs, gated RBMs, factored three-way RBMs and robust RBMs. (Bharathi et al. 2019) proposed a hybrid automatic method for acute lesion detection combining patch-based texture feature extraction (TEF) using Random Forest (RF) with k-means-based unsupervised feature learning (UFL). Their method produced a Dice coefficient (DC) of 0.886, a precision of 0.979, a recall of 0.831, and an accuracy of 0.8201. Another hybrid approach, which predicts the stroke lesion location and cerebral blood flow dynamics using a two-branch Restricted Boltzmann Machine, was proposed by (Pinto, Pereira). First, feature maps describing time-resolved perfusion and blood-flow dynamics, known as parametric Magnetic Resonance Imaging maps, are extracted. These parametric MRI maps are encoded by a Restricted Boltzmann Machine to obtain structural features, which are fused with the standard parametric maps obtained by a supervised model. Gated recurrent neural networks then gather long-range spatial context information. Their work produced a Dice score of 0.38, a Hausdorff Distance of 29.21 mm, and an Average Symmetric Surface Distance of 5.52 mm on the ISLES 2017 data.
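Contrastive Divergence can be sketched for a small Bernoulli RBM in plain Python. The sketch below implements one-step CD (CD-1) under common conventions that are assumptions here: hidden probabilities (rather than samples) are used for the gradient statistics, and a single Gibbs step produces the reconstruction. The toy `data`, learning rate and layer sizes are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    """p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij); hidden units are independent given v."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v)))) for j in range(len(c))]


def visible_probs(h, W, b):
    """p(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij h_j); visible units are independent given h."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h)))) for i in range(len(b))]


def sample(probs, rng):
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step for a Bernoulli RBM; updates W, b and c in place."""
    ph0 = hidden_probs(v0, W, c)               # positive phase, driven by the data
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, b), rng)  # one Gibbs step gives a reconstruction
    ph1 = hidden_probs(v1, W, c)               # negative phase, driven by the reconstruction
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])


rng = random.Random(42)
n_visible, n_hidden = 4, 2
W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]  # random init
b, c = [0.0] * n_visible, [0.0] * n_hidden
data = [[1, 1, 0, 0], [0, 0, 1, 1]]  # two hypothetical binary patterns
for epoch in range(100):
    for v in data:
        cd1_update(v, W, b, c, lr=0.1, rng=rng)
```

All hidden units are computed in one pass given the visible layer (and vice versa), which is the parallel update that the restricted connectivity allows.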

6.2.2 Auto encoders

An autoencoder learns, from unlabeled data, to encode its input into a compact code and to reconstruct the input from that code; the trained encoder can then be used for feature extraction. A multilayer perceptron (MLP) in auto-association mode can thus perform dimensionality reduction and data compression. The research community has proposed several variants of the basic autoencoder, including sparse, contractive, denoising, convolutional and zero-bias autoencoders. For example, (Praveen et al. 2018) presented an unsupervised feature extraction module based on a stacked sparse autoencoder for segmentation of sub-acute Ischemic stroke lesions. (Hilbert et al. 2019) presented an RFNN-ResNet architecture that combines supervised and unsupervised learning on CTA images. The model combines an autoencoder, which learns the best feature representation, with a ResNet that is then trained without updating those weights. RFNN-ResNet achieved an average accuracy of 0.65 (range 0.55–0.72) for mRS prediction and 0.71 (range 0.62–0.75) for mTICI prediction. In 2020, (Akrami et al. 2020) proposed a Variational Auto-Encoder (VAE) for brain lesion detection that was initially trained on the BRATS 2015 dataset (VAEbr), producing an accuracy of 0.9; applying transfer learning to the pre-trained VAE model (PreVAE) raised the accuracy to 0.93.
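A VAE is trained with a loss that combines reconstruction error with a Kullback-Leibler (KL) term pulling the latent distribution towards a standard normal prior; sampling uses the reparameterization trick. The sketch below shows these pieces for a diagonal Gaussian latent. The squared-error reconstruction term is an assumed choice (binary cross-entropy is also common), and the encoder and decoder networks themselves are omitted; this is not the specific model of Akrami et al.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1), keeping sampling differentiable w.r.t. mu, logvar."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) = -0.5 * sum(1 + logvar - mu^2 - exp(logvar))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def vae_loss(x, x_recon, mu, logvar):
    """Reconstruction term (sum of squared errors, an assumed choice) plus the KL regularizer."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kl_to_standard_normal(mu, logvar)


z = reparameterize([0.0, 0.0], [0.0, 0.0], random.Random(0))
print(kl_to_standard_normal([1.0], [0.0]))  # 0.5
```

For lesion detection, a model trained on healthy anatomy will typically reconstruct abnormal regions poorly, so a high reconstruction error can serve as an anomaly signal.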

6.2.3 Generative adversarial networks (GAN)

The GAN architecture was first presented by Ian Goodfellow (Goodfellow et al. 2014). A GAN consists of two major components: a generator and a discriminator. The generator works as an unsupervised model that learns the distribution of the training data in order to generate new examples from that distribution, and the discriminator works like a supervised model that classifies a given input as real or generated. GANs are mostly used for generating synthetic data, semi-supervised learning, increasing image resolution, text-to-image generation, image-to-image translation, and image in-painting; conditional GANs additionally model a closer relationship between inputs and outputs.
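The adversarial game can be summarized by the two losses the networks minimize. The sketch below computes the standard binary cross-entropy discriminator loss and the widely used non-saturating generator loss from the discriminator's output probabilities; the generator and discriminator networks are omitted and the probabilities are supplied directly.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: D should output 1 on real samples and 0 on generated ones.
    d_real, d_fake are lists of D's probabilities on real and generated samples."""
    real = -sum(math.log(p) for p in d_real) / len(d_real)
    fake = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real + fake


def generator_loss(d_fake):
    """Non-saturating generator loss: G tries to make D output 1 on its samples."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# At the theoretical equilibrium D cannot tell real from fake and outputs 0.5 everywhere.
print(round(discriminator_loss([0.5, 0.5], [0.5, 0.5]), 4))  # 1.3863 (= 2 ln 2)
print(round(generator_loss([0.5]), 4))                        # 0.6931 (= ln 2)
```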

Deep Convolutional Generative Adversarial Networks (DCGAN) are more suitable and stable models for image processing tasks (Radford et al. 2015). Other GAN variants, namely CGAN, WGAN, InfoGAN, BEGAN, and CycleGAN, respectively offer more controllable training, a remedy for the vanishing gradient problem, more interpretable sample features, better sample diversity, and no need for paired training sets. In medical applications such as medical image augmentation, medical image classification, and medical and biological image analysis, GAN variants have produced good results compared with traditional neural network methods (Chen et al. 2021). Wang (Wang et al. 2020) proposed a method in which the CTA image feature set and CTP images are concatenated to synthesize pseudo-DW images; in the segmentation task, switchable normalization and a blocked structure contributed to better results. This work achieved Dice = 0.51 ± 0.31, precision = 0.55 ± 0.36, and recall = 0.55 ± 0.34. A similar architecture named MPC-GAN (… et al. 2020) was proposed for Ischemic and haemorrhagic lesion segmentation, in which contextual information such as location, intensity variation, and distance maps is used as input to the generator and discriminator; validation is performed on NCCT images. A dense multipath UNet model is used in the generator and discriminator for segmentation. The MPC-GAN model achieved DC = 70.6 ± 12.4 for segmentation.

6.3 Performance metrics

The performance of segmentation and classification algorithms is measured with well-defined evaluation metrics. Common metrics include Accuracy, Precision, Jaccard Similarity Coefficient, Recall, Sensitivity, Specificity, Dice Similarity Coefficient, F1-Score, Average Hausdorff Distance, and Average Symmetric Surface Distance. Apart from the two distance-based metrics, these measures are computed from four counts that form the confusion matrix:

  • True Positive (TP): A diseased instance is classified as diseased.

  • True Negative (TN): A non-diseased instance is classified as non-diseased.

  • False Positive (FP): A non-diseased instance is misclassified as diseased.

  • False Negative (FN): A diseased instance is misclassified as non-diseased.

Accuracy is the proportion of correctly classified instances, both diseased and non-diseased.

$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}\qquad(1)$$

Sensitivity, also called recall or true positive rate (TPR), is the proportion of actual positive cases that are correctly predicted as positive.

$$Sensitivity=\frac{TP}{TP+FN}\qquad(2)$$

Specificity, or true negative rate (TNR), is the proportion of actual negative cases that are correctly predicted as negative.

$$Specificity=\frac{TN}{TN+FP}\qquad(3)$$

Precision, or positive predictive value (PPV), is the proportion of positive predictions that are true positives.

$$Precision\,(PPV)=\frac{TP}{TP+FP}\qquad(4)$$

In binary classification and segmentation, the F1-score and the Dice Similarity Coefficient (DSC) are the most commonly used performance metrics. The DSC measures the overlap between the ground-truth and predicted regions, which makes it particularly important for segmentation.

$$DSC=\frac{2TP}{2TP+FP+FN}\qquad(5)$$

The F1-score is the harmonic mean of precision and recall, so it combines both false positives and false negatives into a single metric. For binary predictions it is algebraically identical to the DSC.

$$F1_{score}=\frac{2}{\frac{1}{Precision}+\frac{1}{Recall}}=\frac{2\cdot Precision\cdot Recall}{Precision+Recall}\qquad(6)$$

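The Average Symmetric Surface Distance (ASSD) averages the distances from every boundary (surface) point of the segmentation \(S\) to the closest boundary point of the ground truth \(G\), and vice versa, where \(d(x,A)=\min_{a\in A}\lVert x-a\rVert\):

$$ASSD=\frac{1}{|S|+|G|}\left(\sum_{s\in S}d(s,G)+\sum_{g\in G}d(g,S)\right)$$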

The Hausdorff distance measures boundary mismatches between the segmentation \(S\) (the predicted image) and the Ground Truth \(G\). Its averaged form, the Average Hausdorff Distance (AHD), averages the directed mean distances in both directions, where \(d(x,A)=\min_{a\in A}\lVert x-a\rVert\) and \(|G|\), \(|S|\) are the numbers of points in each set:

$$AHD=\frac{1}{2}\left(\frac{1}{|G|}\sum_{g\in G}d(g,S)+\frac{1}{|S|}\sum_{s\in S}d(s,G)\right)\qquad(7)$$
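The distance-based metrics can be sketched on point sets. The code below assumes the boundary (surface) points of the segmentation and the ground truth have already been extracted as coordinate tuples (for example, in millimetres); surface extraction itself is omitted. It computes ASSD, the average Hausdorff distance, and, for comparison, the classic maximum Hausdorff distance, which reports the worst-case boundary mismatch.

```python
import math


def directed_distances(A, B):
    """Distance from each point in A to its nearest point in B: d(a, B) = min_b ||a - b||."""
    return [min(math.dist(a, b) for b in B) for a in A]


def assd(S, G):
    """Average symmetric surface distance: mean over distances pooled from both directions."""
    d = directed_distances(S, G) + directed_distances(G, S)
    return sum(d) / len(d)


def average_hausdorff(S, G):
    """Average Hausdorff distance: mean of the two directed average distances."""
    return 0.5 * (sum(directed_distances(G, S)) / len(G) + sum(directed_distances(S, G)) / len(S))


def hausdorff(S, G):
    """Classic (maximum) Hausdorff distance: the worst-case boundary mismatch."""
    return max(max(directed_distances(S, G)), max(directed_distances(G, S)))


S = [(0.0, 0.0)]              # predicted boundary points (toy example)
G = [(0.0, 0.0), (2.0, 0.0)]  # ground-truth boundary points (toy example)
print(assd(S, G), average_hausdorff(S, G), hausdorff(S, G))  # 0.666..., 0.5, 2.0
```

The toy example shows that ASSD and AHD differ when the two sets have different sizes: ASSD pools all distances, whereas AHD weights both directions equally.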

7 Prominent CNN-based methods for ischemic stroke detection

This section reviews the supervised and unsupervised learning methods for classification and segmentation tasks related to Ischemic disease.

7.1 Supervised binary classification

Classifying Ischemic stroke images into positive and negative groups is known as a binary classification task. Literature review shows that researchers have done binary classification using supervised learning to detect stroke based on different neuroimaging techniques. In 2017, (Rezaei et al. 2017) published a deep neural network-based model for modularized and residual learning on various brain diseases, obtaining 95% accuracy.

Later, in 2019, a DNN-based architecture was presented by (Soltanpour et al. 2019) for acute stroke prediction using a pixel-level classifier. In this work, information related to the texture of stroke lesions is first gathered from CTP maps, and the pixel-level classifier is used to differentiate between healthy and disease images. This network adopts a method of layer-wise hierarchical feature learning in which features are learned from higher to lower levels. The reported performance of this work is DSC = 71.3%, Recall = 73.6%, and VS = 82.1%.

Using 3D images, a CNN architecture is presented by (Öman et al. 2019) for acute infarct prediction using three different input parameters of CTA, cortical hemispheric, and a comparison of NCCT data of 60 patients. The approach chosen in this work focuses on clinically related brain sub-region segmentation by anatomical region visualization and voxel overlapping. This method achieved high performance when images were taken from different modalities, i.e., CTA + Cerebral Hemispheric + NCCT, and produced a DSC score of 0.72.

Based on our study, LeNet-based methods have shown strong performance in classifying Ischemic stroke. For example, (Gaidhani et al. 2019) proposed a classification and segmentation model for brain stroke prediction in which SegNet is used for segmentation and LeNet for classification; it achieved an accuracy of 96% on 629 images of the ATLAS dataset. The classifier-segmenter network (CSNet) proposed by (Kumar et al. 2020) combines U-Net and FractalNet. It consists of a classifier network fed with disease image slices and a segmenter network that extracts 3D spatial information for segmentation. The network achieved 83% accuracy (Dice coefficient), 79% precision, and 89% recall.

(Lo et al. 2021) proposed a DCNN architecture for detecting acute Ischemic stroke using AlexNet, Inception-v3, and 101 with a Softmax layer. The last classifier achieved an accuracy of 97.12% on NCCT images of 96 patients. The NCCT images were acquired after 6 h of stroke occurrence, which helps in early-stage Ischemic stroke detection. (Pan et al. 2021) proposed another ResNet-based method for early-stage lesion detection using the NCCT and DWI modalities. Image variations and overlapping issues are resolved by image registration. In this research, a map investigation model integrates multipath data, and post-processing removes incorrectly identified data points. Patch classification at different patch sizes gathers neighbourhood information to enhance network performance. The achieved accuracy of this approach is 75.9%. (Cai et al. 2020) used resting-state fMRI to detect the functional connectivity of brain tissues. This modality is helpful for analysing newly generated cells and for better understanding anti-NMDAR encephalitis in Ischemic stroke patients.

(Zeng 2022) presented a 3D CNN-based model for predicting the NIHSS stages of Ischemic stroke from DWI and FLAIR images acquired after the hyper-acute phase. Images obtained from different scanners are normalized to the same intensity scale with min-max and z-score normalization. The proposed model consists of nine convolutional, five max-pooling, and three fully connected layers and achieved an accuracy of 0.895. The data were also tested on DenseNet121, ResNet18, and ResNeXt models.
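Min-max and z-score normalization put images from different scanners on a common intensity scale. The sketch below applies both to flattened intensity lists; it assumes per-image normalization (whether the cited work normalized per image or per dataset is not stated here), and `scanner_a`/`scanner_b` are hypothetical intensities of the same tissue on two scanners with different gain.

```python
import statistics


def min_max_normalize(values):
    """Rescale intensities linearly to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant image: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


scanner_a = [100.0, 200.0, 300.0]
scanner_b = [1000.0, 2000.0, 3000.0]  # same pattern, ten times the intensity scale
print(min_max_normalize(scanner_a), min_max_normalize(scanner_b))  # both [0.0, 0.5, 1.0]
```

Both methods remove the global scale difference; z-scoring is less sensitive to a single extreme voxel than min-max scaling.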

Wei et al. (Wei et al. 2022) proposed a semantic segmentation-guided detector network (SGD-Net) that uses a proprietary dataset of 261 MRIs to detect acute Ischemic stroke. In this work, a two-stage model is employed for lesion segmentation and binary classification.

The authors have also presented another method called SGD-Net Plus using brain atlas images.

In 2022, (Yalçın and Vural 2022) proposed a D-Unet architecture for binary classification of Ischemic or haemorrhage stroke. The accuracy of this proposed model is 98.5% in haemorrhage and Ischemia classification.

Transformer-based models have recently shown high performance in various medical image processing tasks. Samak (Samak 2023) proposed a transformer-based multimodal network that fuses clinical and NCCT data to predict the functional outcomes of stroke. The clinical information includes gender, age, glucose level, and hypertension. This study achieved an accuracy score of 0.85.

7.2 Supervised multi-classification

Ischemic stroke can be divided into multiple classes that include hyper-acute, acute, sub-acute and chronic. During this study, it was observed that a comparatively small number of researchers focused on the multi-classification problem.

(Dutil et al. 2015) proposed a two-pathway convolutional neural network architecture for lesion segmentation, in which one pathway extracts local features and the other global features. The authors handled class imbalance effectively through two-phase training. In the first phase, the model is trained on patches extracted from multi-class data; in the second phase, to account for the imbalanced class distribution, only the output layer is trained while the weights of all other layers are frozen. The network achieved an ASSD of 8.92.

(Talo et al. 2019) presented a framework that uses the pre-trained AlexNet, VGG-16, ResNet-18, ResNet-34, and ResNet-50 models to automatically categorize MR images into classes such as normal, cerebrovascular, neoplastic, degenerative, and inflammatory diseases. AlexNet performed worse than ResNet-50. This work could be improved with simultaneous constrained matrix and tensor factorization to extract unique properties from images. (Takahashi et al. 2019) used AlexNet to identify hypoattenuation in the lentiform nucleus and insula regions, achieving an overall average sensitivity of 87.5% and an overall average specificity of 90%.

(Tripura et al. 2023) proposed BrainNet (BrN) for classifying Ischemic and Hemorrhagic stroke. The technique combines a CNN with an SVM (Support Vector Machine) and uses a dropout layer to reduce overfitting and improve generalization. The accuracy achieved in this work is 91.91%.

(Yousaf et al. 2023) proposed a DL-based multi-class disease detection method that uses the BRATS 2015 and ISLES 2015 datasets to detect Ischemic stroke lesions and brain tumours, with UNet as the base model. A feature fusion mechanism is used to enhance model performance. The reported accuracy on the chosen datasets is 99.56%.

(Lee et al. 2023) used two transfer learning models, Inception-v3 and EfficientNet-b0, and one self-derived model to classify anterior circulation infarct (ACI), posterior circulation infarct (PCI), and normal cases. The Grad-CAM technique provides visual explanations: it uses the gradients of the class score to produce a coarse localization map that highlights the image regions important for a prediction. The best performance, an accuracy of 86.3% and an F1-score of 86.2% together with a kappa score, was achieved by the Inception-v3 model.
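The Grad-CAM computation can be sketched once the activations of a chosen convolutional layer and the gradients of the class score with respect to them are available (obtaining them requires a framework's backpropagation, which is omitted here). Each feature map is weighted by the spatial mean of its gradient, the weighted maps are summed, and a ReLU keeps only regions that increase the class score. `A` and `dA` are toy inputs; in practice the map is then upsampled to the image size.

```python
def grad_cam(activations, gradients):
    """Grad-CAM map for one class.

    activations[k][i][j]: feature map k of the chosen convolutional layer.
    gradients[k][i][j]:   d(class score) / d(activations[k][i][j]).
    Weight alpha_k = spatial mean of gradients; map = ReLU(sum_k alpha_k * A_k).
    """
    h, w = len(activations[0]), len(activations[0][0])
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[max(0.0, sum(a * A[i][j] for a, A in zip(alphas, activations)))
            for j in range(w)] for i in range(h)]
    peak = max(max(row) for row in cam)
    # Normalize to [0, 1] for display; leave an all-zero map unchanged.
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


A = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]       # two 2x2 feature maps
dA = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]  # map 0 helps, map 1 hurts
print(grad_cam(A, dA))  # [[1.0, 0.0], [0.0, 1.0]]
```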

7.3 Supervised binary segmentation

Image segmentation is the task of extracting regions of interest (ROIs) from images. Researchers have developed a number of segmentation methods, including thresholding, region-based segmentation, and model-based methods. The literature shows that several research works achieve clinically acceptable performance for Ischemic stroke segmentation.
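Thresholding is the simplest of these segmentation families. As an illustration, the sketch below implements Otsu's method, a classic automatic thresholding technique chosen here as an example (it is not attributed to any of the reviewed works): it picks the threshold that maximizes the between-class variance of the intensity histogram. It assumes integer intensities in `[0, levels)`, and the toy `image` is hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return t maximizing between-class variance; pixels <= t form class 0 (background)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0.0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]            # number of pixels with intensity <= t
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to the between-class variance
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


image = [10, 10, 10, 200, 200, 200]  # toy flattened image: dark background, bright region
t = otsu_threshold(image)
mask = [1 if v > t else 0 for v in image]
print(t, mask)  # 10 [0, 0, 0, 1, 1, 1]
```

Deep learning methods replace this single global threshold with learned, spatially aware decisions, which matters for lesions whose intensity overlaps with healthy tissue.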

In 2018, (Zhang et al. 2018) proposed an automatic deep learning segmentation technique, the 3D FC-DenseNet model, which exploits multi-scale contextual information. This work is inspired by (Yu et al. 2017), who proposed a 3D U-ResNet, and by (Chen et al. 2018), who focused on VoxResNet for brain segmentation. The proposed 3D FC-DenseNet outperforms these models by nearly 2% in DSC and achieves the best sensitivity, lesion-wise precision, and F1 score. In the same year, (Liu, Cao et al. 2018) proposed a residual-structured fully convolutional network (Res-FCN) trained with a gradient-based method on normal and diseased images separately. This method produced a mean Dice coefficient score of 0.645.

(Clerigues et al. 2019) proposed an automated system for acute infarct core segmentation. They evaluated their algorithm on 94 CT and CTP images from the ISLES 2018 challenge dataset. To resolve the class imbalance issue, patch sampling with a cost-balancing loss function is used. The proposed method improves the segmentation results through regularization during training and analyses both hemispheres symmetrically through symmetric modality augmentation.

In 2020, (Chen et al. 2020) proposed a residual network for detecting acute Ischemic stroke by fusing images from different modalities of the Ischemic Stroke Lesion Segmentation (ISLES) 2015 challenge dataset. Modality fusion reduces the effect of distortion and noise and improves image quality. This work evaluates six neural networks (U-Net, EDD-Net, FCN, FC_ResNet, Res-CNN, FCN_ResNet) for disease segmentation. Res-CNN performed best, with DSC = 83.94 (± 2.46) and HD = 1.61 (± 0.24) on single-modality DWI images and DSC = 88.43 (± 2.43) on the multimodal DWI + (DWI - T2) input. In 2020, EDD-Net, a VGG-based network proposed by (Liu et al. 2020), produced many false positives for small lesions. To address this deficiency, another network, MUSCLE Net, was proposed; it focuses on precise lesion detection using small patches of the input images and performs well on both small and large lesions. The model achieved a mean DSC of 67%.

Based on U-Net, (Liu 2020b) presented DRANet for the segmentation and quantification of Ischemic stroke and white matter hyperintensities (WMH), using an attention module to extract high-quality features. DRANet achieved a DSC of 76.39 for Ischemic stroke segmentation and 72.83 for WMH segmentation. (Cao, Liu et al. 2020a, b) proposed a weakly supervised ResNet-based method that automatically detects acute Ischemic stroke and haemorrhagic infarction lesions in DWI images.

(Li et al. 2021) proposed a multiscale segmentation model based on the UNet architecture for cerebral infarction segmentation. The model combines high-level and low-level features to obtain a richer feature set that enables more accurate pixel-level localization. The Dice score of this multiscale UNet segmentation model is 0.86 ± 0.04.

El-Hariri (El-Hariri 2022) proposed a model based on 3D CNN and U-Net, a self-adapting CNN technique, for Early Ischemic Changes (EIC) segmentation on non-contrast computed tomography images, and compared the segmentations with human expert annotations. The model achieved a DSC of 39.8%. Wang (Wang 2022) proposed METrans, a network that fuses local and global feature extraction, for Ischemic stroke segmentation on the ISLES 2018 dataset. Its Dice score is 67% on ISLES data and 93% on the ATLAS dataset.

(Zhang et al. 2023) proposed the fusion prediction model CA-Unet, which combines Ischemic stroke segmentation of CTA images with prediction from text data, and uses a multi-scale loss function to address the loss of feature detail during down-sampling. An encoder (contracting path) extracts features, and an expanding path combines them into a feature map. A multi-loss joint training scheme achieves better feature fusion across multiple scales: an extra convolutional layer follows the expanding path at each of the four scales, the loss at each scale is weighted, and the weighted losses are combined into the final loss. The fusion prediction model comprises an image feature-extraction sub-model and a machine learning-based model that uses medical records and history data; it achieved an accuracy of 89.74%.
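A multi-loss joint training scheme can be sketched as a weighted sum of per-scale losses. In the sketch below, each scale has its own prediction (flattened probabilities) and a target downsampled to that scale; a soft Dice loss per scale and the weights `[0.4, 0.3, 0.2, 0.1]` are hypothetical choices for illustration, since the exact loss and weighting used by CA-Unet are not given here.

```python
def soft_dice_loss(pred, target, eps=1e-6):
    """1 - soft Dice between predicted probabilities and a binary target (flattened lists)."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


def multi_scale_loss(preds_per_scale, targets_per_scale, weights):
    """Weighted sum of per-scale losses; weights are normalized to sum to 1."""
    total_w = sum(weights)
    return sum(w / total_w * soft_dice_loss(p, t)
               for w, p, t in zip(weights, preds_per_scale, targets_per_scale))


weights = [0.4, 0.3, 0.2, 0.1]  # hypothetical: finest scale weighted most
targets = [[1, 0]] * 4
perfect = [[1.0, 0.0]] * 4
first_scale_only = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
print(multi_scale_loss(perfect, targets, weights))           # 0.0
print(multi_scale_loss(first_scale_only, targets, weights))  # about 0.6
```

Supervising every decoder scale gives the coarse layers a direct training signal, which is the motivation for joint multi-scale losses.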

Xu and Ding (2023a, b) proposed a framework that fuses a CNN and a transformer to extract local and global features simultaneously, using the AISD dataset for acute ischemic stroke segmentation. A CNN encoder first extracts features, which are refined by the Convolutional Block Attention Module (CBAM) to produce accurate feature maps. A transformer encoder then compensates for the CNN's limited receptive field and captures global dependencies. The method achieved a Dice coefficient of 58.66% for acute infarct delineation.
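CBAM refines a feature map in two steps: a channel attention map computed from average- and max-pooled channel descriptors through a shared MLP, followed by a spatial attention map computed by a 7 × 7 convolution over the channel-wise average and max maps. The NumPy sketch below shows only the forward pass with randomly initialised (hypothetical) weights to make the data flow concrete; it is not the trained module used by Xu and Ding.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """feat: (C, H, W); w1: (C // r, C); w2: (C, C // r). Returns (C,) weights."""
    def shared_mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # the same MLP (with ReLU) for both descriptors

    avg = feat.mean(axis=(1, 2))
    mx = feat.max(axis=(1, 2))
    return sigmoid(shared_mlp(avg) + shared_mlp(mx))


def conv2d_same(x, kernel):
    """Cross-correlate x (Cin, H, W) with kernel (Cin, k, k), zero 'same' padding."""
    _, h, w = x.shape
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            out += np.einsum("c,chw->hw", kernel[:, i, j], xp[:, i:i + h, j:j + w])
    return out


def spatial_attention(feat, kernel):
    """kernel: (2, 7, 7) applied to the stacked channel-wise [mean, max] maps."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])
    return sigmoid(conv2d_same(pooled, kernel))


def cbam(feat, w1, w2, kernel):
    refined = feat * channel_attention(feat, w1, w2)[:, None, None]
    return refined * spatial_attention(refined, kernel)[None, :, :]


rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 2  # hypothetical sizes and reduction ratio
feat = rng.standard_normal((C, H, W))
out = cbam(feat,
           rng.standard_normal((C // r, C)) * 0.1,
           rng.standard_normal((C, C // r)) * 0.1,
           rng.standard_normal((2, 7, 7)) * 0.1)
print(out.shape)  # (8, 16, 16)
```

Both attention maps lie in (0, 1), so CBAM only rescales features: it emphasises informative channels and locations without changing the tensor shape, which is why it can be dropped into an existing encoder.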

7.4 Supervised multi-class segmentation

Researchers have designed segmentation methods for both 2D and 3D medical images. Clèrigues et al. (2020) proposed a 3D asymmetrical auto-encoder network with residual connections, designed with the anatomy and pathophysiology of stroke lesions in mind, for acute and sub-acute lesion segmentation. First, a symmetric augmentation method is used to learn features related to the brain hemispheres. To address class imbalance, small patches (24 × 24 × 16) sampled from the ISLES 2015 data with a balanced patch sampling strategy are used as input. In this strategy, equal numbers of patches are drawn from healthy and lesion regions, and the class with fewer patches is balanced through augmentation. The proposed network achieved DSC = 0.84 ± 0.10, PPV = 0.82 ± 0.15, sensitivity = 0.89 ± 0.06, and HD = 20.7 ± 13.9. Lesion segmentation methods that support correlation studies between lesion location and chronic disability status would make such estimates even more informative for treatment decisions.
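The balanced patch sampling strategy can be sketched as below: patch centres are drawn in equal numbers from lesion and healthy voxels, using only centres around which a full 24 × 24 × 16 patch fits. This is an illustrative NumPy sketch, not the authors' code. In particular, where the original work balances the minority class through augmentation, this sketch simply resamples lesion centres with replacement, and the function names are hypothetical.

```python
import numpy as np


def valid_centres(mask, patch):
    """Voxel coordinates in `mask` around which a full patch fits in the volume."""
    half = np.array(patch) // 2
    coords = np.argwhere(mask)
    lo = half
    hi = np.array(mask.shape) - np.array(patch) + half  # inclusive upper bound
    keep = np.all((coords >= lo) & (coords <= hi), axis=1)
    return coords[keep]


def balanced_patches(volume, lesion, n_per_class, patch=(24, 24, 16), rng=None):
    """Return n_per_class lesion-centred and n_per_class healthy-centred patches."""
    rng = np.random.default_rng(rng)
    lesion = lesion.astype(bool)
    patches, labels = [], []
    # "healthy" here is every non-lesion voxel; a brain mask could restrict it further
    for label, mask in ((1, lesion), (0, ~lesion)):
        centres = valid_centres(mask, patch)
        if len(centres) == 0:
            raise ValueError(f"no valid patch centres for class {label}")
        # resample with replacement when a class has fewer centres than requested
        idx = rng.choice(len(centres), size=n_per_class,
                         replace=len(centres) < n_per_class)
        for centre in centres[idx]:
            start = centre - np.array(patch) // 2
            region = tuple(slice(s, s + p) for s, p in zip(start, patch))
            patches.append(volume[region])
            labels.append(label)
    return np.stack(patches), np.array(labels)


# Example with a synthetic volume and a cubic "lesion"
rng = np.random.default_rng(0)
volume = rng.random((40, 40, 30))
lesion = np.zeros(volume.shape, dtype=bool)
lesion[15:25, 15:25, 10:20] = True
X, y = balanced_patches(volume, lesion, n_per_class=4, rng=1)
print(X.shape, y.tolist())  # (8, 24, 24, 16) [1, 1, 1, 1, 0, 0, 0, 0]
```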

Kamnitsas et al. (2017) proposed an 11-layer-deep 3D CNN for lesion segmentation that uses small convolution kernels. Using small 3 × 3 × 3 kernels in a deep network helps counter the overfitting associated with deeper networks, large kernels, and large numbers of weights. During training, each training patch is centred on a foreground or a background voxel with 50% probability, which addresses class imbalance by balancing the samples drawn from each class. A key strength of this method is a parallel convolutional pathway that processes the input at multiple scales, capturing both local and global contextual information. An ensemble of three networks combined with a fully connected CRF achieved a DSC of 59%.
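The argument for small kernels is largely arithmetic: stacked stride-1 3 × 3 × 3 convolutions reach the same receptive field as a single larger kernel with fewer weights and more non-linearities. The short Python check below compares the two options. It counts weights only (no biases) and assumes equal input and output channel counts, which is a simplification; the channel count of 30 is hypothetical.

```python
def receptive_field(kernel: int, layers: int) -> int:
    """Receptive field (per axis) of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel - 1)


def conv3d_weights(kernel: int, layers: int, channels: int) -> int:
    """Weights (biases ignored) of stacked 3D convs with `channels` in and out."""
    return layers * kernel ** 3 * channels * channels


for kernel, layers in [(3, 2), (5, 1), (3, 3), (7, 1)]:
    print(f"{layers} x {kernel}^3: receptive field {receptive_field(kernel, layers)}, "
          f"weights {conv3d_weights(kernel, layers, channels=30)}")
```

For a single input-output channel pair, two 3³ layers need 54 weights against 125 for one 5³ layer, with the same 5-voxel receptive field; the gap widens for larger equivalent kernels.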

Siddique et al. (2022) presented another multi-class method, NVAUTO, in which SegResNet (https://monai.io/apps/auto3dseg) is trained for semantic segmentation of acute and sub-acute stroke. They used the ISLES 2022 challenge dataset, which consists of ADC and DWI images. The method achieved a Dice score of 0.824.

Most of the methods developed for ischemic stroke detection and segmentation work with small datasets. Neural network architectures usually overfit on these datasets, resulting in weak generalization when deployed. This can be mitigated by controlling the capacity of the network, i.e., the number and values of its weights. Strategies to avoid overfitting include the weight decay coefficient (Nakamura and Hong 2019), weight regularization methods (Ghiasi et al. 2018), dropping out high-value weights (Ghiasi et al. 2018), and early stopping (Prechelt 1998). Other dataset problems include class imbalance and data quality. Several studies perform intensity normalization and registration to improve neuroimage quality, which has a positive effect on ischemic stroke detection performance. Both neuroimaging and clinical factors are used in ischemic stroke detection; however, the choice of neuroimaging modality is still debated. A fusion of different MRI modalities is more likely to be effective across all stages of ischemic stroke. 2D CNNs perform comparatively better than 3D CNNs at extracting local features, whereas 3D CNNs perform better by capturing 3D information of the brain, at a higher computational cost.
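Of the strategies listed, early stopping is the simplest to illustrate. The sketch below implements a common patience-based variant that stops once the validation loss has not improved for a fixed number of epochs. Prechelt (1998) analyses several related stopping criteria, and this class does not reproduce any one of them exactly; the class name and parameters are hypothetical.

```python
class EarlyStopping:
    """Stop training when validation loss stops improving for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta  # minimum decrease counted as an improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch  # in practice, checkpoint the weights here
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Example: validation loss improves, then plateaus
val_losses = [1.00, 0.80, 0.70, 0.72, 0.71, 0.75, 0.69]
stopper = EarlyStopping(patience=3)
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        print(f"stopped at epoch {epoch}; best epoch {stopper.best_epoch}")
        break
```

The weights saved at the best epoch, not those from the final epoch, are the ones kept for evaluation.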

Table 6 presents supervised learning methods for binary and multi-class classification and segmentation of ischemic stroke.

Table 6 Supervised deep learning methods used for binary/multi-classification and segmentation

7.5 Unsupervised & semi-supervised learning methods for binary classification

An unsupervised deep learning model learns task-specific representations from neuroimaging data, using unlabelled data for feature learning and a limited amount of labelled data for fine-tuning its parameters. The literature shows that researchers have turned to unsupervised learning methods to address the limited availability of labelled data.

Pinto et al. (2021) proposed a hybrid approach that predicts stroke lesion location using a two-branch Restricted Boltzmann Machine (RBM). In this method, feature maps known as parametric MRI maps are extracted from time-resolved perfusion and blood-flow dynamics. RBM structural features are computed from these parametric MRI maps and then fused with the standard parametric maps, which are obtained through supervised learning. Gated recurrent neural networks capture long-range spatial context. The method achieved a Dice score of 0.38, a Hausdorff distance of 29.21 mm, and an average symmetric surface distance of 5.52 mm on the ISLES 2017 dataset.
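The Hausdorff distance (HD) and the average symmetric surface distance (ASSD) are both computed from the boundary voxels of the predicted and reference masks. The brute-force NumPy sketch below is intended for small masks and illustration only, since it builds a full pairwise distance matrix. The boundary definition (face-connected neighbours), the voxel spacing argument, and the function names are assumptions, and challenge toolkits may use different conventions, such as the 95th-percentile HD.

```python
import numpy as np


def boundary(mask: np.ndarray) -> np.ndarray:
    """Foreground voxels with at least one face-connected background neighbour."""
    m = mask.astype(bool)
    padded = np.pad(m, 1, constant_values=False)
    centre = tuple(slice(1, -1) for _ in range(m.ndim))
    interior = m.copy()
    for axis in range(m.ndim):
        for shift in (-1, 1):
            interior &= np.roll(padded, shift, axis=axis)[centre]
    return m & ~interior


def surface_distances(a, b, spacing=1.0):
    """Nearest-boundary distances from a to b and from b to a (in spacing units)."""
    pa = np.argwhere(boundary(a)) * np.asarray(spacing, dtype=float)
    pb = np.argwhere(boundary(b)) * np.asarray(spacing, dtype=float)
    if len(pa) == 0 or len(pb) == 0:
        raise ValueError("both masks must be non-empty")
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return d.min(axis=1), d.min(axis=0)


def hausdorff(a, b, spacing=1.0) -> float:
    d_ab, d_ba = surface_distances(a, b, spacing)
    return float(max(d_ab.max(), d_ba.max()))


def assd(a, b, spacing=1.0) -> float:
    d_ab, d_ba = surface_distances(a, b, spacing)
    return float((d_ab.sum() + d_ba.sum()) / (len(d_ab) + len(d_ba)))


# Example: two 3 x 3 squares offset by two pixels (unit spacing)
a = np.zeros((10, 10), dtype=bool)
b = np.zeros((10, 10), dtype=bool)
a[2:5, 2:5] = True
b[2:5, 4:7] = True
print(hausdorff(a, b), assd(a, b))
```

Passing the scanner's voxel size (for example, a per-axis tuple in millimetres) as `spacing` gives distances in millimetres, as in the values reported above.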

Zhao et al. (2019) proposed DPC-Net for detecting acute ischemic stroke using weakly labelled subjects, attaining a mean Dice coefficient of 0.642 and an F1 score of 0.822 on a small dataset of 150 patients. The first three layers of DPC-Net were taken from the VGG-16 network, which had a positive influence on the results.

Hilbert et al. (2019) presented RFNN-ResNet, an architecture that combines supervised and unsupervised learning on CTA images. The model combines an autoencoder and a ResNet: the first part learns the best representational features, and the ResNet is then trained without updating those weights. RFNN-ResNet achieved an average (range) accuracy of 0.65 (0.55–0.72) for mRS and 0.71 (0.62–0.75) for mTICI. Zhang, Xu et al. (2021) presented an SSD algorithm that uses multilevel features extracted by VGGNet, in a study combining three deep learning object detection models: YOLOv3, Faster R-CNN, and SSD. The model achieved a precision of 89.77%.

7.6 Unsupervised & semi-supervised learning methods for multi-classification

For multi-class classification, Olivier et al. (Olivier, Moal et al. 2019) proposed UD-Net, an architecture that predicts acute stroke, supratentorial cerebral infarct, and stroke condition after 72 h. This unsupervised learning method works in two phases. In the first phase, whole DWI images are fed to the network to generate predictions. In the second phase, the lesion region of the image is extracted as a patch, and the patch together with the whole image is fed to the classifier. The network achieved a Dice score of 0.711 ± 0.199.

In most research, deep neural networks are trained from scratch. However, this is not always feasible, especially in the medical field, because of limited data availability. Initializing a model with pre-trained weights and fine-tuning it on another dataset is known as transfer learning (TL). General features learned in the lower layers can be transferred from one application to another, which benefits classification tasks (Torrey and Shavlik 2010). Lo et al. (2021) compared different DCNN architectures, namely AlexNet, transferred AlexNet, transferred ResNet101, and transferred Inception-v3, on non-contrast computed tomography (NCCT) images, achieving accuracies above 90%.

Suberi (2019) presented transfer learning models using a CNN, VGG-16, GoogleNet, and ResNet-50 to classify stroke from the posterior part of brain CT images; ResNet-50 achieved 100% accuracy with good processing efficiency on 400 CT images. Cetinoglu et al. (2021) used transfer learning for vascular territory stroke classification on MRI data comprising 1717 positive slices, with pre-trained MobileNetV2 and EfficientNet-B0 models. MobileNetV2 achieved an F1-score of 96%.

Active learning is a semi-supervised method in which a small amount of labelled data and a large amount of unlabelled data are used to train the model. A few researchers have used both annotated and unlabelled data to obtain cerebrovascular features of ischemic stroke. Raza (2022) proposed an active learning method for extracting social determinants of health (SDOH) from text, using an NLP method that assigns a label to each token to extract clinical factors affecting disease intervention and treatment. A combination of three neural networks computes SDOH features from clinical text and labelled data: a transformer-based BiLSTM that captures complex semantic relationships in the text, combined with a CNN and BERT that extract local features and SDOH features. The method achieved an F1-score of 92.98%.
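Active learning repeatedly asks an annotator to label the unlabelled samples about which the current model is least certain. A minimal, generic sketch of this pool-based query step using predictive entropy is shown below. It assumes the model outputs class probabilities for each unlabelled sample and does not reproduce Raza's token-level NLP pipeline; the function names are illustrative.

```python
import numpy as np


def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Entropy of each row of an (n_samples, n_classes) probability matrix."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def select_queries(probs: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k most uncertain samples (highest entropy first)."""
    return np.argsort(-predictive_entropy(probs))[:k]


# Hypothetical model outputs for four unlabelled samples
probs = np.array([[0.99, 0.01],
                  [0.50, 0.50],
                  [0.70, 0.30],
                  [0.90, 0.10]])
print(select_queries(probs, k=2))  # [1 2]
```

The selected samples are labelled, added to the training set, and the model is retrained; the loop repeats until the labelling budget is spent.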

Tetteh et al. (2021) investigated reinforcement learning methods based on network surgery that produce good results on a limited dataset. Class imbalance is addressed with cascaded classification, which is a better approach for multi-label data. This work could be improved with generative adversarial methods to increase the dataset size. In this work, digitally subtracted angiography (DSA)-based collateral flow grading is obtained first, and MR perfusion images are then used for blood flow grading; finally, the two are merged to obtain better diagnostic information. A deep reinforcement learning method is adopted for predicting collateral flow grading and for automated extraction of the region of interest (ROI) in acute ischemic stroke patients. To validate the work, a denoising autoencoder (DAE), histogram of oriented gradients (HOG), and local binary patterns (LBP) are used for feature extraction, and random forest (RF), k-nearest neighbour (KNN), support vector machine (SVM), and convolutional neural network (CNN) classifiers are used to predict collateral flow grading. The automated ROI extraction achieved a performance of 0.72 (± 0.05).

7.7 Unsupervised & semi-supervised learning methods for Binary/Multi segmentation

A limited number of researchers have focused on segmenting ischemic stroke with unsupervised and semi-supervised methods. van Voorst et al. (2022) developed an ischemic and hemorrhagic stroke segmentation system using a generative adversarial network. A difference map is computed between the follow-up and NCCT lesion scans and is then used to perform lesion segmentation. The network achieved a DSC of 0.321 for infarct detection and 0.59 for hemorrhagic stroke detection.

Praveen et al. (2018) presented an unsupervised feature extraction module based on a stacked sparse autoencoder for segmenting sub-acute ischemic stroke lesions. Using an SVM as the classifier is a good choice for the limited dataset in this work, because deep learning models normally have a large number of parameters to tune on the given data, and fitting a large number of parameters on limited data can easily lead to overfitting, even with conventional machine learning methods. Increasing the dataset size can improve model performance through better generalization.
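A sparse autoencoder adds a sparsity penalty to the reconstruction loss so that each hidden unit is active for only a small fraction of inputs; a stacked sparse autoencoder trains several such layers, each on the codes of the previous one. The NumPy sketch below computes the loss of one layer using the common KL-divergence formulation of the penalty. The target activation `rho`, the weight `beta`, the layer sizes, and the random data are all hypothetical and are not taken from Praveen et al.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def kl_sparsity(hidden: np.ndarray, rho: float = 0.05, eps: float = 1e-8) -> float:
    """Sum over hidden units of KL(rho || mean activation of that unit)."""
    rho_hat = np.clip(hidden.mean(axis=0), eps, 1 - eps)
    return float(np.sum(rho * np.log(rho / rho_hat)
                        + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))))


def sparse_ae_loss(x, w_enc, b_enc, w_dec, b_dec, rho=0.05, beta=3.0) -> float:
    """Reconstruction error plus beta-weighted sparsity penalty for one layer."""
    hidden = sigmoid(x @ w_enc + b_enc)  # (n_samples, n_hidden), values in (0, 1)
    recon = hidden @ w_dec + b_dec       # linear decoder back to the input size
    mse = 0.5 * np.mean(np.sum((recon - x) ** 2, axis=1))
    return float(mse + beta * kl_sparsity(hidden, rho))


rng = np.random.default_rng(0)
x = rng.random((32, 64))  # e.g. 32 flattened 8 x 8 image patches
n_hidden = 16
loss = sparse_ae_loss(x,
                      rng.normal(0, 0.1, (64, n_hidden)), np.zeros(n_hidden),
                      rng.normal(0, 0.1, (n_hidden, 64)), np.zeros(64))
print(loss)
```

Minimising this loss by gradient descent (not shown) trains the encoder; in a stacked setup the hidden activations become the input of the next layer, and the final codes are passed to a classifier such as the SVM used in that work.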

Kaya (2023) proposed a transfer learning model for predicting ischemic and hemorrhagic stroke using ResNet34 as the backbone. For segmentation, the model achieved an Intersection over Union (IoU) of 92.01% for hemorrhagic and 82.22% for ischemic stroke. Nazari-Farsani et al. (2020) presented an unsupervised method for automated segmentation of AIS lesions, in which areas appearing bright on DWI and dark on ADC were delineated as lesions. The Crawford-Howell t-test was adopted to compare a single case against a control population. The automated segmentation achieved a mean DSC of 0.58. A support vector machine was used to discriminate between stroke and non-stroke cases, with a reported classification accuracy of 73%.
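The Crawford-Howell test compares one patient's value with a small control sample by treating the control mean and standard deviation as sample estimates: t = (x − mean_c) / (s_c · sqrt((n + 1) / n)), with n − 1 degrees of freedom. The vectorised NumPy sketch below applies it voxel-wise, as one might for an ADC map compared against co-registered control maps. The array layout and function name are assumptions, and converting t into a p-value with a Student's t distribution is omitted to keep the example dependency-free.

```python
import numpy as np


def crawford_howell_t(case, controls):
    """Voxel-wise Crawford-Howell t statistic.

    case:     (n_voxels,) values for the single patient
    controls: (n_controls, n_voxels) values for the control population
    Returns (t, degrees_of_freedom).
    """
    controls = np.asarray(controls, dtype=float)
    n = controls.shape[0]
    mean = controls.mean(axis=0)
    sd = controls.std(axis=0, ddof=1)  # sample standard deviation of the controls
    t = (np.asarray(case, dtype=float) - mean) / (sd * np.sqrt((n + 1) / n))
    return t, n - 1


# Example: one voxel, four controls with values 1..4, patient value 5
t, df = crawford_howell_t([5.0], [[1.0], [2.0], [3.0], [4.0]])
print(t, df)  # t = sqrt(3) ≈ 1.732 with 3 degrees of freedom
```

On an ADC map, voxels with strongly negative t (lower diffusion than in controls) would be lesion candidates after thresholding; the threshold is a study-specific choice.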

Ghnemat et al. (2023) proposed a mutation model integrated with a generative adversarial network (GAN) for ischemic stroke segmentation. To address limited data, synthetic data are generated through a mutation process and a distance map; the distance map preserves the brain contour and reduces the duplication of intensity values. A set of patches from each image is passed to the GAN model, which uses a shared module for segmentation and discrimination to reduce the complexity of the end-to-end model. The mutation model improves the Dice coefficient of the GAN model by 2.54%.

Kuang et al. (2020) used contextual information, which is compute-intensive and time-consuming. The method does not produce good results on a limited dataset; increasing the sample size can improve its performance.

Table 7 presents unsupervised methods for binary and multi-class classification and segmentation of ischemic stroke.

Table 7 Unsupervised learning methods used for Binary/Multi-classification and segmentation of ischemic stroke

7.8 Current trends

With advancements in research, modern deep-learning models now use CNNs for local feature extraction and employ transformers to capture long-range dependencies.

Xu and Ding (2023a, b) proposed a model that employs a CNN with spatial and channel attention to focus on the most relevant parts of the image and extract precise local features, while a transformer encoder with positional embedding captures global features. The outputs of the transformer and the CNN are fused to create an accurate feature map, and a decoder then segments AIS infarcts automatically. The fusion model achieved a Dice coefficient (DC) of 58.66% for AIS infarct segmentation.
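The transformer branch of such CNN-transformer hybrids relies on scaled dot-product self-attention, in which every spatial token attends to every other token; this is what allows it to capture global context that a CNN's local kernels miss. The single-head NumPy sketch below flattens a CNN feature map into tokens, adds a positional embedding, and applies attention. The weights and the random positional embedding are hypothetical placeholders for learned parameters, and the multi-head structure, normalization, and MLP layers of a full encoder are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over (n_tokens, d_model) inputs."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (n_tokens, n_tokens)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
C, H, W, d = 32, 8, 8, 16                     # hypothetical sizes
feature_map = rng.standard_normal((C, H, W))  # e.g. output of a CNN encoder stage
tokens = feature_map.reshape(C, H * W).T      # (64 tokens, 32 channels)
tokens = tokens + rng.standard_normal(tokens.shape) * 0.02  # stand-in positional embedding
out, attn = self_attention(tokens,
                           rng.standard_normal((C, d)) / np.sqrt(C),
                           rng.standard_normal((C, d)) / np.sqrt(C),
                           rng.standard_normal((C, d)) / np.sqrt(C))
print(out.shape, attn.shape)  # (64, 16) (64, 64)
```

The output tokens can be reshaped back into an H × W map and fused with the CNN features, for example by concatenation along the channel axis.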

Zelin Wu et al. (Wu Zhang et al. 2022) proposed the multi-scale long-range interactive and regional attention network (MLiRA-Net). In this model, an initial patch partition block uses CNNs to extract local features, while the STR block, a key strength of the model, performs multi-level scaling. The STR subsampling interactive transformer (SiTR) incorporates a dimension attention mechanism to extract channel-level features, enhancing interpretability and feature representation. Finally, the FIP uses a cascaded up-sampler to restore the original image resolution and combines the encoded and interpolated features. This model outperforms TransUNet (Chen et al. 2021).

Tingting Li et al. (Li, An et al. 2024) proposed SrSNet, a two-stage segmentation framework for stroke segmentation. This framework consists of a coarse model and a refined model, which have the advantage of integrating local and global features. The Symmetrical Attention Block (SAB) is designed to measure the differences in feature maps between various symmetrical regions, which is crucial for accurately locating ischemic strokes. SrSNet achieved a recall of 83.80% on the ISLES’22 dataset.

Hulin Kuang et al. (Kuang et al. 2024) proposed a hybrid model for segmenting acute ischemic stroke lesions that integrates parallel CNN and transformer encoders. An interaction block enables circular feature exchange between the CNN and the transformer, effectively merging the features extracted by both architectures. To focus on bilateral differences in the brain, a CNN decoder computes these differences and produces the final segmentation. The model achieved a Dice score of 61.63 ± 20.07 on the AISD dataset and a private dataset.
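Both SrSNet's symmetrical attention and the bilateral-difference decoder exploit the fact that an ischemic lesion usually affects one hemisphere, so it appears as a left-right asymmetry. The simplest form of this cue is the absolute difference between a feature map and its mirror image, sketched below in NumPy. It assumes the image has been registered so that the midsagittal plane lies at the centre of the chosen axis; the learned attention blocks and decoders in the cited models are considerably more elaborate, and the function name is illustrative.

```python
import numpy as np


def hemispheric_difference(feat: np.ndarray, axis: int = -1) -> np.ndarray:
    """|F - mirror(F)|, mirroring across the centre of `axis` (assumed midline)."""
    return np.abs(feat - np.flip(feat, axis=axis))


symmetric = np.array([[1.0, 2.0, 2.0, 1.0]])
one_sided = np.array([[0.0, 5.0, 0.0, 0.0]])  # "lesion" in one hemisphere
print(hemispheric_difference(symmetric))  # [[0. 0. 0. 0.]]
print(hemispheric_difference(one_sided))  # [[0. 5. 5. 0.]]
```

The difference map is non-zero at both the lesion and its mirrored position, so the network still has to learn which side is abnormal; the cue only tells it where to look.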

Ostmeier et al. (Ostmeier et al. 2024) proposed a deep learning model, built on a U-Net backbone, for segmenting ischemic stroke in non-contrast CT studies. In the first part of the algorithm, the target segmentation is selected by a majority vote of experts, while in the second part, segmentations from individual experts are selected at random. The model's performance surpasses the agreement between experts.
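Building a consensus reference from several expert annotations by majority vote is a simple voxel-wise operation. The sketch below assumes binary masks of equal shape and treats ties as background, which is one of several possible conventions; the function name is illustrative and not taken from Ostmeier et al.

```python
import numpy as np


def majority_vote(masks) -> np.ndarray:
    """Voxel-wise consensus: foreground where more than half of the raters agree."""
    stacked = np.stack([np.asarray(m, dtype=bool) for m in masks])
    votes = stacked.sum(axis=0)
    return votes * 2 > len(masks)  # strict majority; ties become background


experts = [np.array([1, 1, 0, 0]),
           np.array([1, 0, 0, 1]),
           np.array([0, 1, 1, 0])]
print(majority_vote(experts).astype(int))  # [1 1 0 0]
```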

Banerjee et al. (Bal, Banerjee et al. 2024) proposed a two-pathway technique for acute and sub-acute stroke segmentation. One pathway applies small 3D kernels to small patches to extract local features, requiring fewer weights for training; the other extracts global information by applying large kernels to large patches. Evaluated on the ISLES 2015 SISS and SPES datasets, the model achieved a Dice score of 0.90 ± 0.08 for acute stroke and 0.87 ± 0.10 for sub-acute stroke.

Abbaoui et al. (Abbaoui et al. 2024) used a vision transformer (ViT) for the binary classification of ischemic stroke. The model consists of transformer encoder blocks with Layer Normalization, Multi-head Attention, and Multi-Layer Perceptrons (MLPs), which capture dependencies within the images. The model achieved an accuracy of 97.59%.
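Before its encoder blocks, a ViT splits the image into fixed-size patches, flattens each patch into a vector, projects it linearly, prepends a class token, and adds positional embeddings. The NumPy sketch below shows these input steps only. The image size, patch size, embedding width, and randomly initialised weights are hypothetical, and the encoder blocks themselves are not included.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """(H, W, C) image -> (num_patches, patch * patch * C), row-major patch order."""
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # bring the two patch-grid axes to the front
    return x.reshape(-1, patch * patch * c)


def embed(image, patch, w_proj, cls_token, pos_emb):
    """Linear patch projection, class token, and additive positional embedding."""
    tokens = patchify(image, patch) @ w_proj  # (num_patches, d)
    tokens = np.vstack([cls_token[None, :], tokens])
    return tokens + pos_emb                   # (num_patches + 1, d)


rng = np.random.default_rng(0)
image = rng.random((16, 16, 1))  # e.g. a downsized single-channel CT slice
patch, d = 4, 32
n_patches = (16 // patch) ** 2
tokens = embed(image, patch,
               rng.standard_normal((patch * patch * 1, d)) * 0.02,
               np.zeros(d),
               rng.standard_normal((n_patches + 1, d)) * 0.02)
print(tokens.shape)  # (17, 32)
```

After the encoder blocks, the output at the class-token position is typically fed to a small classification head, here with two outputs for stroke versus non-stroke.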

Wang et al. (Wang et al. 2024) proposed a 3D deep learning model for delineating acute ischemic stroke, based on SwinUNETR. This model features a self-attention module as an encoder and a convolutional-based decoder. When evaluated on the AISD dataset, it achieved a Dice score of 0.619.

Table 8 presents different supervised deep learning models that have shown high performance for ischemic stroke detection.

Table 8 High-performing supervised deep learning methods and descriptions

Figure 11 presents a comparative view of different deep learning models used for Ischemic stroke detection.

Fig. 11 Supervised deep learning models' performance on ischemic stroke detection

Table 9 discusses different unsupervised deep learning models that have shown high performance for ischemic stroke detection.

Table 9 High-performing unsupervised deep learning methods and descriptions

Fig. 12 shows the performance of different unsupervised deep learning models for ischemic stroke detection.

Fig. 12 Unsupervised deep learning models' performance on ischemic stroke detection

8 Recommendations for researchers

This study analyses various approaches, applications, and deep learning techniques, as well as the critical effects of different subtypes of ischemic stroke on patient treatment. The study focuses on the detection and analysis of ischemic stroke using medical images and deep learning methods. Fully automated solutions are gradually becoming important to, and accepted by, practitioners. Many of the developed disease detection models target a single sub-class or only a few sub-classes of stroke. An integrated model is needed that can address more sub-classes of stroke at once using publicly available datasets; such a model could also support stage-wise identification of the disease. In its early phase, the system would identify the most suitable modality for disease detection, leading to a section-wise study of brain images. This section-wise study would detect early parenchymal signs, increased mass effect, depth measurements, and tissue compression, and finally determine the category of ischemic stroke, i.e., acute, sub-acute, or chronic.

In cerebrovascular disease, physical activity plays a vital role in offsetting the negative outcomes of vascular risk factors, and relevant aspects to study include age-related white matter differentiation, reduction in hypertension, geographic variation in stroke patterns due to economic circumstances, genetic variation, and differing lifestyles. These parameters can help reduce the risk of stroke and improve the recovery process. The performance of stroke prediction systems can be improved by accounting for study variations, such as gathering data from multiple scanners, and by studying the anatomical structure of brain regions and stroke pathophysiology. Identifying thrombotic and embolic stroke through pathophysiology, and correlating it with medical images, would lead to a better automated system for identifying, segmenting, and classifying stroke and its sub-types. Medical image analysis becomes more complex when the lesion is small or multiple strokes are present. Currently available public datasets do not cover all stages, types, and subtypes of stroke on a single scan. For better image analysis, spectral feature selection methods can be used to find relevant features in mixed datasets, and hierarchical feature extraction can help identify the main type of stroke and then its subtypes. Most current methods downsample early (in the encoder blocks); however, upsampling, which makes small lesions more prominent, can help extract supporting features. Methods that detect early and small lesions can improve system performance, and detecting hypoattenuation across the whole brain at different onset times, together with atlas-based findings from neuroradiology, can help produce more generalizable results.

9 Critical analysis/challenges

Anatomical information, such as the size and location of a stroke, is acquired using structural imaging methods like Magnetic Resonance Imaging (MRI) and Computed Tomography (CT). In contrast, functional methods such as functional MRI (fMRI), Positron Emission Tomography (PET), and CT perfusion (CTP) highlight altered brain activity, perfusion, or metabolic changes. Integrating structural and functional data for the diagnosis of ischemic stroke presents several challenges. Aligning datasets with different spatial resolutions is difficult, as is synchronizing static anatomical data with dynamic brain activity over time. Designing a unified framework that handles segmentation and signal processing simultaneously requires sophisticated algorithms and expertise. Additional challenges include interpretation difficulties, technical and logistical constraints, cost and accessibility issues, and the need for patient cooperation (Erdoğan et al. 2024). Structural and functional imaging modalities can also be integrated with genomic and proteomic data, which enhances the diagnostic process and reveals how diseases manifest and progress at both the molecular and anatomical levels. Adopting standardized protocols and reference parameters is challenging, but doing so not only improves the reliability of the diagnostic process but also facilitates the validation of new imaging technologies in healthcare (Van Den Brink et al. 2023). Current benchmark datasets for ischemic stroke prediction have enabled notable successes but face significant limitations due to their narrow focus on specific biomarker types. Clinical biomarkers, such as disability measures and clinical outcomes, provide valuable insights into patient recovery and prognosis, while imaging biomarkers, such as lesion outcomes, segmentation analysis, and core and penumbra separation, are crucial for understanding structural brain changes.
Molecular biomarkers, such as those related to clot detection and characterization, offer insights into the biological mechanisms underlying stroke. However, most existing datasets are siloed, focusing exclusively on clinical, imaging, or molecular data, limiting their integration scope. This lack of multi-modal datasets hinders the development of comprehensive models that could leverage the combined strengths of all biomarker types for more accurate and generalizable ischemic stroke prediction.

Regarding the current successes and limitations of deep learning, the models analysed here demonstrate high accuracy and efficiency in detecting ischemic stroke. Once trained, these models scale well and can quickly process large volumes of data, making them well suited to high-throughput screening and continuous monitoring. Incorporating transformers into deep learning models enhances their ability to capture long-range dependencies within images, enabling a more comprehensive analysis of stroke-related abnormalities. However, these models are limited by data availability and by the lack of clinical validation across multiple centers. Integrating imaging with other modalities, such as genomics or proteomics, could improve diagnostics but poses significant challenges: information of diverse types, with varying scales, resolutions, and formats, is complex and computationally intensive to integrate, and doing so requires advanced infrastructure and equipment. Additionally, the black-box nature of deep learning models raises explainability challenges, posing significant hurdles for researchers.

This review explores the recent advancements and state-of-the-art methodologies for ischemic stroke detection, segmentation, and classification, focusing on three primary objectives: critical evaluation of clinical and commercial applications for ischemic stroke detection, segmentation and classification; comparison of various performance metrics and their usability from diverse perspectives; and an examination of recent advancements and future directions in computational solutions for the disease. The analysis highlights that current commercial applications address specific disease characteristics, which can be extended to integrate additional disease characteristic parameters. These perspectives are grounded in using deep learning models for ischemic stroke classification, detection, and segmentation, particularly CNN-based methods for binary and multi-class classification, segmentation, and clinical stages of stroke with appropriate modalities.

10 Conclusion and future directions

Recent advancements in ischemic stroke detection focus on multimodal imaging techniques, such as MRI, CT, and perfusion imaging, as well as clinical and molecular biomarkers. Various deep learning models, particularly convolutional neural networks (CNNs) such as LeNet, AlexNet, VGGNet, and ResNet, as well as transformer-based models, have demonstrated significant potential for automating the detection process and efficiently analyzing limited datasets. Performance comparisons show that these CNN-based models achieve average accuracies ranging from 85% to 98% in identifying early ischemic changes, predominantly using single and multimodal imaging techniques.

Moreover, integrating vascular imaging data with clinical signs and molecular biomarkers can lead to more robust prognostic models for ischemic stroke, facilitating early detection and better prediction of potential outcomes. Looking ahead, advanced transformer-based models trained on diverse datasets, spanning various biomarkers, age groups, and imaging scanners, are expected to offer greater reliability and robustness for clinical practice. Combining stroke detection parameters is beneficial, and models should be validated across diverse cohorts to ensure robust generalization. Our research highlights that integrating structural and functional data with proteomic biomarkers (e.g., ischemia-specific proteins) could allow real-time monitoring for early signs of stroke and enable pre-emptive monitoring. From our critical evaluation of clinical and commercial applications for ischemic stroke detection, segmentation, and classification, we conclude that existing commercial applications each specialize in different aspects of the stroke disease process and primarily focus on specific stroke detection characteristics. A significant challenge is integrating all the critical parameters, such as chronic ischemia with white matter hyperintensities, acute ischemia, salvageable tissue volume, and prediction of infarction in specific brain regions, into a single system. Future efforts should aim to design new commercial applications capable of working across multiple parameters, including the prediction of infarction in specific brain regions. Addressing the challenges of integrating diverse data and improving performance across these domains requires a unified multi-task learning model with cross-parameter feature learning. Different parameters may need to be assessed using different modalities, such as MRI for chronic ischemia and CT for acute ischemia.
Multi-modal imaging integration can be achieved through transfer learning, in which models trained on one modality are adapted to another, and through transformer-based models. Additionally, federated learning enables collaborative model training across decentralized datasets while preserving patient data privacy.
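In federated learning, each site trains on its own data and only model parameters leave the site. The aggregation step of federated averaging (FedAvg), a widely used scheme, is a data-size-weighted mean of the client parameters, sketched below in NumPy. The two sites, their dataset sizes, and the parameter arrays are hypothetical, and local training, communication, and additional privacy safeguards are omitted.

```python
import numpy as np


def fed_avg(client_params, client_sizes):
    """Weighted average of per-client parameter lists.

    client_params: list over clients, each a list of numpy arrays (one per layer)
    client_sizes:  number of local training samples per client (the weights)
    """
    total = float(sum(client_sizes))
    n_layers = len(client_params[0])
    return [sum((size / total) * params[layer]
                for params, size in zip(client_params, client_sizes))
            for layer in range(n_layers)]


# Two hypothetical sites, each with one weight matrix and one bias vector
site_a = [np.ones((2, 2)), np.zeros(2)]
site_b = [3 * np.ones((2, 2)), np.ones(2)]
global_params = fed_avg([site_a, site_b], client_sizes=[100, 300])
print(global_params[0])  # every entry 0.25 * 1 + 0.75 * 3 = 2.5
print(global_params[1])  # every entry 0.75
```

The server would then send the averaged parameters back to the sites for the next round of local training, so raw patient images never need to be pooled.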

While progressive work and applications for ischemic stroke segmentation are available, identifying core and penumbra regions requires further exploration. One of the significant challenges in identifying the core region is its association with extremely low cerebral blood flow (CBF), particularly in the middle cerebral artery (MCA) territory during an MCA stroke. The identification of cortical border-zone areas, characterized by moderate CBF, is also challenging due to their location at the junctions of major cerebral arteries (e.g., between the anterior cerebral artery (ACA) and MCA or the MCA and posterior cerebral artery (PCA)). Similarly, the subcortical border, which lies between the deep and superficial branches of the MCA, requires precise analysis. Additionally, the detection of white matter regions with white matter hyperintensities (WMHs) on imaging plays a crucial role in assessing chronic ischemic stroke, underscoring the importance of advancing methodologies for these regions.

In evaluating the adequacy of deep learning (DL) methods for ischemic stroke detection, classification, and segmentation, a major limitation is the lack of multi-task models that handle chronic and acute ischemic stroke classification and detection simultaneously. For comprehensive stroke management and broader applicability, a model must be trained on the distinct features of the acute, subacute, and chronic stages of ischemic stroke. This requires addressing key challenges, including the need for a large and diverse dataset encompassing multiple modalities, such as CT for hemorrhage and MRI for ischemia. Additionally, multi-task learning with advanced transformer models is essential for handling heterogeneous data effectively. Transfer learning approaches can further enable the adaptation of knowledge across different modalities and stroke stages, enhancing the model's robustness and generalizability.

In unsupervised deep learning methods, optimization through hyper-parameter tuning can produce better results. The effect of hyper-parameter tuning can be observed with deeper and wider architectures trained on large batches (Mehmood et al. 2023). The use of hyperspectral images and maximum intensity projection causes a loss of information, which can be addressed by reducing sparsity (section-by-section and sliding volume editing). In ischemic stroke prediction, the diverse nature of arteries and the varying location of strokes across brain regions make artery patterns a useful feature that can enhance accuracy. Greater data diversity, obtained through different modalities, can improve the generalizability of trained models. Contextual information such as age, gender, group, biographical, and historical patient information can also improve model performance. Deviation-based anomaly detection methods that currently use variational autoencoders could be improved by using convolutional layers at the anomaly-score learning stage. Supervised methods that use limited pathological data to discriminate between normal and pathological image distributions can provide better results than calculating the average loss per scanned image. Using pre-trained models also presents several challenges: pre-trained models are usually trained on small datasets or on general-purpose datasets, and variable image parameters and data demographics further complicate model learning. Generative adversarial networks can help improve results, which is why they are attracting research attention. The latest AI and communication technologies, such as IoT, for remote diagnosis and monitoring can also help researchers propose better solutions for the disease.
Integrating imaging technology with other modalities, such as genomics or proteomics, could significantly improve diagnostics by providing complementary insights into the disease. However, researchers face challenges in integrating diverse types of information with varying scales, resolutions, and formats, and achieving this integration requires advanced infrastructure and equipment, which increases computational costs. Combining data from different imaging techniques, such as PET, DTI, MRI, or CT, can offer complementary information about various aspects of the disease, including vascular and metabolic changes, tissue integrity, and functional connectivity. Establishing standardized protocols across healthcare units is a significant challenge, yet such protocols are essential for ensuring the consistency and comparability of findings when novel imaging methods are used to detect abnormalities. Deep learning models with explainable features can enhance interpretability and improve diagnostic accuracy.

Moreover, the future of cerebrovascular disease management and diagnostics holds great promise with advancements in imaging technology, the integration of explainable AI algorithms and learning methods, the use of multimodal imaging techniques, the adoption of standardized protocols, and the inclusion of contextual information. These advancements can enhance diagnostic accuracy, improve treatment procedures, and facilitate better patient care. Table 10 presents a comprehensive analysis of different approaches with pros and cons.

Table 10 Comprehensive analysis of different approaches.