1 Introduction

Acute myeloid leukemia (AML) is a malignant tumor of the hematopoietic system that arises from dysregulated clonal proliferation of myeloid hematopoietic stem/progenitor cells. It is more common in adults, accounting for 80% of the total cases [1]. The pathogenesis of AML involves abnormal proliferation and differentiation of bone marrow stem cells driven by chromosomal rearrangements such as t(8;21) in core binding factor-related AML (CBF-AML) or t(15;17) in acute promyelocytic leukemia (APL). These translocations lead to the formation of the RUNX1-RUNX1T1 and PML-RARα fusion proteins, which alter the normal maturation of myeloid precursor cells and inhibit their differentiation into granulocytes, macrophages, and dendritic cells [2]. In addition to chromosomal rearrangements, gene mutations also contribute to the pathogenesis of AML, and more than 97% of cases may carry gene mutations [3]. Among them, mutations of FMS-like tyrosine kinase 3 (FLT3) and of K-Ras and N-Ras, members of the Ras viral oncogene homolog family, are common in AML [4] and can guide its classification and treatment. However, more than 50% of patients have ambiguous molecular abnormalities, so more molecular biomarkers need to be found.

The WT1 gene is located at position 13 on the short arm of chromosome 11 (11p13) and encodes a zinc finger protein with dual functions of activating and inhibiting transcription. It acts as an oncogene in hematopoietic tumors [5]. Studies have shown that WT1 is overexpressed in 85%–90% of AML patients [6]. However, whether the WT1 gene is meaningful for the diagnosis and classification of AML remains controversial, mainly with regard to the accuracy of WT1-based classification and the heterogeneity of WT1 expression among AML patients [7]. Therefore, it is necessary to explore other means to verify the accuracy of WT1 in AML classification.

With the advent of cloud computing and big data, deep learning has achieved great success in fields such as text processing and is widely used in academia and industry. Studies have shown that unsupervised learning with deep autoencoders not only improves the classification of disease features but also scales well [8]. Another study on diversified learning and the deep belief network (DBN) showed good results in classifying and predicting cancer recurrence on real data sets [9]. A DBN is a probabilistic generative model: in contrast to traditional discriminative models, it establishes a joint distribution between observed data and labels. The above research shows that a DBN can take gene sequencing data as raw input and extract features hidden in large-scale sequencing data for classification. However, deep learning algorithms have rarely been applied to AML classification. Therefore, this study first verified the classification performance of the DBN algorithm on AML and then applied it in the experiment.

This study aimed to explore the role of WT1 gene expression in AML classification and to evaluate the classification accuracy of deep learning, with the expectation of providing a basis for AML classification.

2 Materials and methods

2.1 Research subjects

AML group: A total of 121 patients diagnosed with AML by MICM classification (morphology, immunology, cytogenetics, and molecular biology) in our hospital from October 2017 to October 2019 were selected as research subjects. There were 69 males and 42 females, with an average age of (47.54 ± 29.68) years and an age range of (15–78) years. All patients were classified according to the FAB (French-American-British) diagnostic criteria, jointly developed by France, the USA, and the UK in 1976.

Control group: The control group comprised 9 non-leukemia patients (4 cases of iron deficiency anemia, 3 cases of megaloblastic anemia, and 2 cases of immune thrombocytopenic purpura) and 4 healthy hematopoietic stem cell transplantation donors, for a total of 13 cases. There were seven males and six females, with an average age of (45.82 ± 18.55) years and an age range of (18–60) years. None of the participants had infectious diseases, other malignant tumors, or rheumatic diseases.

Patients and their families were informed and signed informed consent. Patients who withdrew in the middle of the experiment and patients with severe mental disorders were excluded.

2.2 Sample collection

Following the requirements for bone marrow aspiration, 2 ml of bone marrow fluid was collected from each subject in the two groups and added to an EDTA anticoagulation tube. The bone marrow sample was mixed with erythrocyte lysate in the anticoagulation tube at a volume ratio of 1:2. The mixture was kept at room temperature for 5–10 min until the erythrocytes were fully lysed and was then centrifuged at 2000 r/min for 10 min, after which the supernatant was discarded. This operation was repeated once, and the precipitate was then washed once with PBS buffer and collected.

2.3 Real-time quantitative PCR

Trizol lysate was used to routinely extract RNA from the precipitate obtained in Sect. 2.2. The total RNA content of each sample was measured by ultraviolet spectrophotometry, with an acceptable OD260/OD280 ratio of 1.8–2.2. If the ratio was below 1.8, the sample was purified again until the ratio fell within 1.8–2.2. The RNA concentration (μg/μL) was calculated as 40 × OD260 × dilution factor/1000. In addition, 1% agarose gel electrophoresis was used to check whether the extracted RNA was contaminated.
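The concentration formula and the purity window above can be illustrated with a minimal Python sketch. The function names and the example readings are hypothetical and serve only to show the arithmetic; they are not values from this study.

```python
def rna_concentration_ug_per_ul(od260: float, dilution_factor: float) -> float:
    """RNA concentration in ug/uL: 40 x OD260 x dilution factor / 1000."""
    return 40.0 * od260 * dilution_factor / 1000.0


def purity_ok(od260: float, od280: float, low: float = 1.8, high: float = 2.2) -> bool:
    """True when OD260/OD280 lies inside the accepted 1.8-2.2 window."""
    return low <= od260 / od280 <= high


# Hypothetical reading: OD260 = 0.25, OD280 = 0.13, sample diluted 100-fold.
conc = rna_concentration_ug_per_ul(0.25, 100)
print(f"{conc:.2f} ug/uL, purity acceptable: {purity_ok(0.25, 0.13)}")
```

A sample that fails `purity_ok` would, following the text, be purified again and re-measured.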

First-strand cDNA synthesis was performed using Takara's reverse transcription kit in a 25 μL reaction system. The reaction system included 15 μL of 2 μM total RNA, 1 μL of 250 μM dNTP, 5 μL of 5 × RT buffer, 1 μL of 200 U/μL MMLV, and 2 μL of 100 mM dithiothreitol (DTT); the mixture was made up to 25 μL with sterile pre-cooled deionized water. The mixture was then placed in a Bio-Rad PCR instrument to synthesize the first-strand cDNA under the following conditions: 25 °C for 10 min, 42 °C for 60 min, and 70 °C for 15 min. The synthesized cDNA was stored at -20 °C for future use.
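As a quick arithmetic check of the reaction system above, the volume of deionized water needed to reach 25 μL can be computed from the listed components. This is only an illustrative sketch; the dictionary keys are shorthand labels.

```python
# Component volumes (uL) of the 25 uL reverse-transcription mix listed above.
components_ul = {
    "total RNA": 15.0,
    "dNTP": 1.0,
    "5x RT buffer": 5.0,
    "MMLV": 1.0,
    "DTT": 2.0,
}
FINAL_VOLUME_UL = 25.0

# Water makes up the difference between the target volume and the reagents.
water_ul = FINAL_VOLUME_UL - sum(components_ul.values())
print(f"Deionized water to add: {water_ul:.1f} uL")
```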

With cDNA as the template and ABL as the internal reference gene, real-time quantitative PCR amplification was performed. The 25 μL reaction system included 1 μL of cDNA template, 9 μL of dNTP, 12.5 μL of 10 × PCR buffer, 1 μL of primer F, 1 μL of primer R, and 0.5 μL of probe. The PCR conditions were pre-denaturation at 95 °C for 5 min, followed by 40 cycles of denaturation at 95 °C for 40 s, annealing at 60 °C for 30 s, and extension at 72 °C for 1 min, with signals collected at 59 °C. Target gene expression = target gene copy number/10^4 ABL copy number. The primers and probes were designed and synthesized by Shanghai Shenggong Biological Engineering Co., Ltd.

According to the recommendation of the European LeukemiaNet, a WT1 expression level ≥ 2.5 was defined as high expression and a level < 2.5 as low expression, and all AML patients were accordingly divided into a WT1 high-expression group and a WT1 low-expression group.
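The normalization and grouping rule can be sketched as follows. This is a hedged illustration: the expression formula is read here as target copies per 10^4 ABL copies, which is an assumption about the scaling, and the function names and example values are hypothetical.

```python
def normalized_expression(target_copies: float, abl_copies: float) -> float:
    """Target copies per 10^4 ABL copies (assumed reading of the formula)."""
    if abl_copies <= 0:
        raise ValueError("ABL copy number must be positive")
    return target_copies / abl_copies * 1e4


def wt1_group(expression_level: float, cutoff: float = 2.5) -> str:
    """Assign the WT1 group: 'high' if level >= cutoff, otherwise 'low'."""
    return "high" if expression_level >= cutoff else "low"


# Hypothetical expression levels, in the same units as the 2.5 cut-off.
levels = [0.4, 2.5, 7.9, 1.2]
groups = [wt1_group(x) for x in levels]
print(groups)
```

Note that a level exactly equal to the cut-off falls into the high-expression group, in line with the "≥ 2.5" definition.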

2.4 Gene sequencing

A genomic DNA extraction kit (Shanghai Biyuntian Co., Ltd.) was used to routinely extract DNA from the precipitate obtained in Sect. 2.2. The total DNA content of each sample was measured by ultraviolet spectrophotometry, with an acceptable OD260/OD280 ratio of 1.8–2.2. If the ratio was below 1.8, the sample was purified again until the ratio fell within 1.8–2.2. The DNA was diluted with sterile deionized water to a concentration of 100 ng/μL.

PCR amplification was performed using Takara's PCR kit in a 50 μL reaction system. The reaction system included 2 μL of total DNA, 4 μL of 250 μM dNTP, 5 μL of 10 × PCR buffer, 0.5 μL of primer F, 0.5 μL of primer R, and 0.25 μL of Taq enzyme; the mixture was made up to 50 μL with sterilized pre-cooled deionized water. The PCR conditions were pre-denaturation at 95 °C for 5 min, followed by 30 cycles of denaturation at 95 °C for 30 s, annealing at 58 °C for 1 min, and extension at 72 °C for 1 min, and a final extension at 72 °C for 1 min. The amplified target band was identified by 2% agarose gel electrophoresis. The primers were designed and synthesized by Shanghai Shenggong Biological Engineering Co., Ltd. The specific primer sequence was 5′-CGTTTTAAAACCTAAGAGTAG-3′.

The amplified PCR products were sent to Shanghai Biotech Engineering Co., Ltd., for sequencing.

2.5 DBN algorithm

The DBN model was used to extract feature information in order to verify the accuracy of classifying patients into the WT1 high-expression and WT1 low-expression groups. According to real-time quantitative PCR, the AML patients comprised 76 cases in the WT1 high-expression group and 45 cases in the WT1 low-expression group. The basic structure of the DBN model is shown in Fig. 1, and the basic parameter settings of the model are shown in Table 1.

Fig. 1 The basic structure of the DBN model

Table 1 Basic parameter settings of DBN model

The binary cross-entropy function was used as the loss function, as given in Eq. (1). The weights obtained by unsupervised pre-training of the restricted Boltzmann machine (RBM) probability model were used as the initial weights, and mini-batch gradient descent was used to fine-tune the network model. The whole procedure was divided into five steps: data preprocessing, building the DBN, extracting features from the data set, building new training and test sets, and testing the model.

$$ \text{Loss} = - \sum_{i = 1}^{n} \left[ y_{i} \log \hat{y}_{i} + \left( 1 - y_{i} \right) \log \left( 1 - \hat{y}_{i} \right) \right] $$
(1)

where Loss is the loss function, n is the number of sample attributes, y_i is the i-th attribute of sample y, and ŷ_i is the corresponding value predicted by the model.
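For illustration, the binary cross-entropy can be computed as in the following minimal sketch. It uses the standard form of the loss, with a leading minus sign and the model output as the argument of the logarithms; the clipping constant `eps` and the example values are assumptions added for numerical safety and demonstration.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Summed binary cross-entropy; eps clips predictions away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total


labels = [1, 0, 1, 1]          # hypothetical labels (1 = WT1 high expression)
good = [0.9, 0.1, 0.8, 0.7]    # predictions close to the labels
poor = [0.4, 0.6, 0.3, 0.5]    # predictions far from the labels
print(binary_cross_entropy(labels, good) < binary_cross_entropy(labels, poor))
```

Predictions closer to the labels give a smaller loss, which is the quantity that fine-tuning by gradient descent reduces.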

The evaluation indicators were accuracy (%), precision (%), recall (%), F1-score (%). Accuracy is defined as Eq. (2), precision is defined as Eq. (3), recall is defined as Eq. (4), and F1-score is defined as Eq. (5).

$$ \text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{FP} + \text{TN} + \text{FN}} $$
(2)
$$ \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} $$
(3)
$$ \text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} $$
(4)
$$ \text{F1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $$
(5)

where TP is the number of positive samples predicted as positive, FN is the number of positive samples predicted as negative, FP is the number of negative samples predicted as positive, and TN is the number of negative samples predicted as negative.
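The four indicators can be computed from a confusion matrix as in the sketch below. Accuracy is taken as the share of correct predictions, (TP + TN) over all samples. The counts in the example are hypothetical and are not the results of this study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

The guards against zero denominators are a design choice of this sketch; they return 0 when a class is never predicted or never present.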

2.6 Statistical analysis

All data were statistically analyzed using SPSS 22.0 software. Accuracy, precision, and the other classification indicators were expressed as percentages. The relative expression of WT1 was plotted in Excel based on the median and quartiles, and the t-test was used for comparison between groups. P < 0.05 indicated a statistically significant difference.
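A rough sketch of the descriptive statistics and the between-group comparison is given below, using only the Python standard library. The text does not state which t-test variant was run in SPSS, so Welch's unequal-variance form is an assumption here; the data are hypothetical, and the P value itself (which requires the t distribution) is not computed.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    n_a, n_b = len(a), len(b)
    se2 = var_a / n_a + var_b / n_b            # squared standard error
    t = (mean_a - mean_b) / math.sqrt(se2)
    df = se2 ** 2 / ((var_a / n_a) ** 2 / (n_a - 1)
                     + (var_b / n_b) ** 2 / (n_b - 1))
    return t, df


# Hypothetical relative WT1 expression values.
aml = [3.1, 5.4, 2.2, 8.7, 4.0, 6.3]
control = [0.4, 0.9, 0.6, 1.1, 0.7]

q1, median, q3 = statistics.quantiles(aml, n=4)  # quartiles for plotting
t_stat, dof = welch_t(aml, control)
print(f"AML median {median:.2f} (IQR {q1:.2f}-{q3:.2f}), t = {t_stat:.2f}, df = {dof:.1f}")
```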

3 Results

3.1 Expression of WT1 gene in AML patients

There were nine patients in the control group and 121 patients in the AML group. Figure 2 shows the relative expression of WT1 mRNA in the two groups. The relative expression of WT1 mRNA in the AML group was significantly higher than that in the control group (P < 0.05). With 2.5 as the cut-off point, AML patients with a WT1 expression level ≥ 2.5 were assigned to the WT1 high-expression group and those with a level < 2.5 to the WT1 low-expression group. Among them, 76 cases (62.8%) were in the WT1 high-expression group and 45 cases (37.2%) were in the WT1 low-expression group.

Fig. 2 WT1 gene expression in AML patients. *P < 0.05, compared with control group

3.2 DBN to recognize WT1 gene expression in AML patients

In the study, 121 samples were collected to evaluate the accuracy of the DBN in identifying the WT1 gene expression levels of AML patients. Of the 121 cases, 76 were in the WT1 high-expression group and 45 cases in the WT1 low-expression group. As shown in Fig. 3, the accuracy of DBN in classifying the WT1 gene expression in AML patients was 94.06%, the precision was 93.82%, the recall was 93.59%, and the F1-score was 93.63%.

Fig. 3 DBN recognition of WT1 gene expression in AML patients (unit: %)

The original data were pre-processed by normalizing the data features and applying one-hot encoding to obtain the training data set. This data set was divided into three parts, and three DBNs were trained separately for feature extraction. Figure 4 shows the training loss functions of the three DBNs. The results showed that none of the three DBNs used for feature extraction was overfitted.
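The one-hot encoding and three-way split can be pictured with the following sketch. It assumes that the inputs are nucleotide strings encoded over the alphabet A/C/G/T and that the split is made over consecutive samples; neither detail is specified in the text, and the sequences are hypothetical.

```python
NUCLEOTIDES = "ACGT"


def one_hot(sequence: str) -> list:
    """Flatten a DNA string into a 0/1 vector, four positions per base.

    Bases outside A/C/G/T (e.g. N) are encoded as four zeros.
    """
    vector = []
    for base in sequence.upper():
        vector.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vector


def split_in_three(samples: list) -> list:
    """Split samples into three near-equal consecutive parts."""
    size, remainder = divmod(len(samples), 3)
    parts, start = [], 0
    for i in range(3):
        end = start + size + (1 if i < remainder else 0)
        parts.append(samples[start:end])
        start = end
    return parts


# Hypothetical short sequences for illustration.
sequences = ["ACGT", "TTGA", "CCAG", "GATC", "AGGT", "TCCA", "CGTA"]
encoded = [one_hot(s) for s in sequences]
parts = split_in_three(encoded)
print([len(p) for p in parts])
```

Each part would then be passed to its own DBN for feature extraction.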

Fig. 4 Three DBN training loss functions
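To make the unsupervised pre-training step concrete, the sketch below trains a single Bernoulli RBM layer, the building block of a DBN, with one-step contrastive divergence (CD-1) and records the reconstruction error per epoch. It is a simplified illustration under stated assumptions and not the implementation used in this study: updates are applied per sample rather than per mini-batch, only one layer is shown, supervised fine-tuning is omitted, and all names, hyperparameters, and data are hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def hidden_probs(v, w, b_h):
    """P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(b_h[j] + sum(v[i] * w[i][j] for i in range(len(v))))
            for j in range(len(b_h))]


def visible_probs(h, w, b_v):
    """P(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(b_v[i] + sum(h[j] * w[i][j] for j in range(len(h))))
            for i in range(len(b_v))]


def train_rbm(data, n_hidden, epochs=20, lr=0.1, seed=0):
    """CD-1 training; returns weights, biases and per-epoch reconstruction error."""
    rng = random.Random(seed)
    n_visible = len(data[0])
    w = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
    b_v, b_h = [0.0] * n_visible, [0.0] * n_hidden
    errors = []
    for _ in range(epochs):
        total = 0.0
        for v0 in data:
            h0 = hidden_probs(v0, w, b_h)                       # positive phase
            h0_sample = [1.0 if rng.random() < p else 0.0 for p in h0]
            v1 = visible_probs(h0_sample, w, b_v)               # reconstruction
            h1 = hidden_probs(v1, w, b_h)                       # negative phase
            for i in range(n_visible):
                for j in range(n_hidden):
                    w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
                b_v[i] += lr * (v0[i] - v1[i])
            for j in range(n_hidden):
                b_h[j] += lr * (h0[j] - h1[j])
            total += sum((a - b) ** 2 for a, b in zip(v0, v1)) / n_visible
        errors.append(total / len(data))
    return w, b_v, b_h, errors


# Hypothetical binary inputs (e.g. one-hot encoded features).
toy = [[1, 0, 1, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 1, 0, 1]]
w, b_v, b_h, errors = train_rbm(toy, n_hidden=2, epochs=50)
features = [hidden_probs(v, w, b_h) for v in toy]  # extracted features
print(f"first epoch error {errors[0]:.3f}, last epoch error {errors[-1]:.3f}")
```

In a full DBN, the hidden activations of one trained RBM would serve as the input of the next layer, and the stacked weights would initialize the network before fine-tuning.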

3.3 WT1 gene expression in AML patients with different types

According to the FAB classification criteria, the 121 AML patients comprised 11 cases of M0 (9.1%), 34 cases of M1 (28.1%), 29 cases of M2 (24.0%), 9 cases of M3 (7.4%), 28 cases of M4 (23.1%), 7 cases of M5 (5.8%), 1 case of M6 (0.8%), and 2 cases of M7 (1.7%). Figure 5 shows the relative expression of WT1 mRNA in each subtype. Compared with M0, the relative expression of WT1 mRNA increased significantly in M3 patients (P < 0.05) and decreased significantly in M5 patients (P < 0.05). Compared with M1, it increased significantly in M3 (P < 0.05) and decreased significantly in M4 and M5 (P < 0.05). Compared with M2, it increased significantly in M3 (P < 0.05) and decreased significantly in M5 (P < 0.05). Compared with M3, it was significantly reduced in M4, M5, M6, and M7 (P < 0.05). Compared with M4, it decreased significantly in M5 (P < 0.05). Compared with M5, it increased significantly in M7 (P < 0.05). Overall, the relative expression of WT1 mRNA was highest in M3 patients and lowest in M5 patients.

Fig. 5 WT1 gene expression in different FAB types of AML patients
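The subtype proportions reported above follow directly from the case counts, as the short check below shows. The counts are those given in the text; the variable names are illustrative.

```python
# FAB subtype counts as reported for the 121 AML patients.
fab_counts = {"M0": 11, "M1": 34, "M2": 29, "M3": 9,
              "M4": 28, "M5": 7, "M6": 1, "M7": 2}

total = sum(fab_counts.values())
percentages = {subtype: round(100 * n / total, 1)
               for subtype, n in fab_counts.items()}

print(total)
for subtype, pct in percentages.items():
    print(f"{subtype}: {pct}%")
```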

3.4 Relationship between WT1 gene expression and FLT3 gene mutation

According to the results of gene sequencing, 36 (29.8%) of the 121 AML patients carried FLT3 gene mutations, and 85 (70.2%) did not. On this basis, all AML patients were divided into an FLT3 mutation group and an FLT3 non-mutation group. As shown in Fig. 6, the relative expression of WT1 mRNA in the FLT3 mutation group was significantly higher than that in the FLT3 non-mutation group (P < 0.05). The results of gene sequencing are shown in Fig. 7, with abnormal sequences marked by red boxes.

Fig. 6 WT1 gene expression in AML patients with FLT3 gene mutation. *P < 0.05, compared with patients without FLT3 mutation

Fig. 7 Gene sequence of FLT3 (above: FLT3 without mutations; below: FLT3 with mutations)

3.5 Relationship between WT1 gene expression and PML/RAR fusion gene

According to the results of real-time quantitative PCR, 50 (41.3%) of the 121 AML patients expressed the PML/RAR fusion gene, and 71 (58.7%) did not. On this basis, all AML patients were divided into a PML/RAR expression group and a PML/RAR non-expression group. As shown in Fig. 8, there was no significant difference in the relative expression of WT1 mRNA between the two groups. Figure 9 shows the real-time quantitative PCR results of four patients from the PML/RAR expression group (P1, P2, P3, P4) and four patients from the PML/RAR non-expression group (N1, N2, N3, N4).

Fig. 8 WT1 gene expression in AML patients with and without PML/RAR fusion gene expression

Fig. 9 Real-time quantitative PCR detection results. a PML/RAR fusion gene expression group; b PML/RAR fusion gene non-expression group

3.6 Relationship between WT1 gene expression and RAS mutation activation

According to the results of gene sequencing, the RAS gene was activated by mutation in 7 (5.8%) of the 121 AML patients and was not activated in 114 (94.2%). On this basis, all AML patients were divided into a RAS-activated group and a RAS-inactivated group. As shown in Fig. 10, there was no significant difference in the relative expression of WT1 mRNA between the two groups. Figure 11 shows the gene sequencing results, with abnormal sequences marked by red boxes.

Fig. 10 WT1 gene expression in AML patients with RAS activation

Fig. 11 Ras gene sequencing results. a Ras gene without mutations; b Ras gene with mutations

4 Discussion

Studies have shown that WT1 is overexpressed in various hematological diseases, such as aplastic anemia [10] and leukemia [11], but its expression in AML and its role in the classification of AML are still controversial. In this study, real-time quantitative PCR was used to analyze the relative expression of WT1 mRNA in bone marrow samples of 121 AML patients and nine non-leukemia patients. The relative expression of WT1 mRNA in bone marrow samples of AML patients was significantly higher than that of non-leukemia patients, which was consistent with a previous report [12] and further indicated that WT1 acts as an oncogene in AML. In addition, with 2.5 as the cut-off point, 76 AML patients (62.8%) with WT1 expression ≥ 2.5 were defined as the WT1 high-expression group, and 45 patients (37.2%) with WT1 expression < 2.5 were defined as the WT1 low-expression group; the accuracy of the DBN model in AML classification was evaluated on this basis. The DBN extracted features from and classified the WT1 gene sequencing data. Its accuracy in classifying WT1 expression levels was 94.06%, the precision was 93.82%, the recall was 93.59%, and the F1-score was 93.63%, indicating that the DBN model classified WT1 expression levels with high accuracy. This is consistent with previous research on breast cancer, which reported that a structured deep learning model can classify breast cancer from pathological images with an average accuracy of 93.2% [13]. It has been reported that the DBN algorithm can be used not only for disease classification but also for classifying complex disease gene expression data to facilitate clinical differentiation [14]. DBN has also been introduced into the diagnosis of benign and malignant lung nodules, where its classification accuracy, sensitivity, and specificity were 95.3%, 92.5%, and 93.2%, respectively, and the area under the ROC curve was 0.921, outperforming the traditional algorithm and assisting doctors in diagnosis [15]. Other researchers proposed a DBN-based classification model for cytochrome P450 2C9 (CYP2C9) inhibition. The semi-supervised learning of the DBN was used to learn essential features, avoiding manual feature extraction. Under the same conditions, the DBN had clear advantages over SVM and ANN, with an average classification accuracy of 80.6%, a sensitivity (SE) of 86.9%, and a specificity (SP) of 66.2%, close to the results of this study, which is of positive significance for drug screening and new drug development [16]. These data and reports show that deep learning has significant advantages in cancer classification and can provide rapid, accurate, and effective support for clinical classification, diagnosis, treatment, and prognosis.

In this study, the relative expression of WT1 mRNA was highest in M3 patients and lowest in M5 patients. Both M3 and M5 are acute non-lymphocytic leukemias. It is speculated that the abnormal expression of WT1 in these two types of AML may be related to chromatin changes induced by the retinoic acid receptor α (RARα) fusion protein. Studies have shown that the RARα gene can fuse with the promyelocytic leukemia (PML) gene on chromosome 15 to form a fusion protein (PML-RARα), which inhibits the differentiation and maturation of promyelocytes. In addition, after delocalization of PML, hundreds of fine particles can form in the nucleus and cytoplasm, resulting in reduced cell apoptosis and increased cell proliferation [17], which induces M3-type AML. However, the subsequent analysis of the relationship between PML-RARα and WT1 showed no significant difference in the relative expression of WT1 mRNA between patients with and without PML/RAR fusion gene expression. This result is inconsistent with the conclusion of Zhang et al. (2017) [18], who found that high WT1 expression was associated with internal tandem duplication mutations of fms-like tyrosine kinase-3 and with the bcr-3 subtype of PML-RARα. The discrepancy may be due to the small sample size of this study. The relationship between the PML-RARα fusion gene and the WT1 gene in AML requires further study.

The two-hit model of leukemia pathogenesis has contributed significantly to the study of leukemia pathogenesis and related gene mutations, and it provides a conceptual framework for classifying AML-related gene mutations [19]. In this model, type-I mutations activate proliferative pathways, and type-II mutations impair normal hematopoietic differentiation; both types must be present for leukemia to develop. FLT3, K-Ras, and N-Ras mutations are all type-I mutations. The results of this study showed no significant difference in the relative expression of WT1 mRNA between the RAS-activated group and the RAS-inactivated group. Huang et al. (2020) [20] reported that all NRAS mutations and 65% of KRAS mutations were located at codons 12, 13, and 61 and that WT1 mutations were enriched in patients with RAS mutations, which is inconsistent with the present result. Moreover, the relative expression of WT1 mRNA in patients with FLT3 mutation was significantly higher than that in patients without FLT3 mutation, indicating that FLT3 mutation and WT1 overexpression frequently coexist in AML patients, which is of great significance for molecular combination therapy in AML. However, few studies have addressed this relationship, and more are needed in the future.

5 Conclusion

In this study, the DBN algorithm was used to classify features of the WT1 gene, and the role of WT1 in AML classification was then explored. The following conclusions were drawn: (1) the accuracy of the DBN in the classification and recognition of AML reached 94.06%, indicating that the DBN has good prospects; (2) high expression of WT1 combined with FLT3 mutation is indicative for the diagnosis and treatment of AML; (3) the relative expression of WT1 mRNA is highest in M3 and lowest in M5. However, few studies have applied the DBN algorithm to WT1 classification, so comparable results are lacking. Nevertheless, the results of this study can still provide guidance for the diagnosis and treatment of AML.