
ICAT-net: a lightweight neural network with optimized coordinate attention and transformer mechanisms for earthquake detection and phase picking

The Journal of Supercomputing

Abstract

Seismic signal detection is a crucial technology for enhancing the efficiency of earthquake early warning systems. However, existing deep-learning-based seismic detection models are often impractical in resource-constrained seismic monitoring environments because of their high computational demands. To address this issue, this study employs space-to-depth convolution when downsampling seismic signal sequences, effectively minimizing the loss of fine-grained feature information. Concurrently, we leverage a coordinate attention module to enhance the model's ability to recognize spatial features in seismic signal sequences. To reduce computational cost, we map the keys and values of the transformer architecture to a lower-dimensional subspace, significantly decreasing the demand for computational resources. Concatenation operations between the encoder and decoder allow the model to retain rich contextual information and progressively restore the spatial resolution of the signal during decoding. Building on these components, we propose the integrated coordinate attention and transformer network (ICAT-net), an efficient multi-task network designed to simultaneously handle seismic sequence recognition and phase picking. ICAT-net combines local feature relationships with long-range dependency modeling to meet the requirements of multi-task learning. Experimental results demonstrate that ICAT-net requires only 4.743168 G floating-point operations (FLOPs) and 0.260755 M parameters, while performing excellently on seismic waveform detection (DET), P-wave phase picking (Ppk), and S-wave phase picking (Spk). These advantages make ICAT-net particularly suitable for deployment in resource-constrained environments, providing a valuable solution for earthquake monitoring and disaster risk assessment.


Data availability

The STEAD dataset is available at https://github.com/smousavi05/STEAD. The code implementing the ICAT-net model can be retrieved from https://github.com/lee524/ICAT-net.

References

1. Shearer PM (2019) Introduction to seismology. Cambridge University Press

2. Mousavi SM, Ellsworth WL, Zhu W, Chuang LY, Beroza GC (2020) Earthquake transformer-an attentive deep-learning model for simultaneous earthquake detection and phase picking. Nat Commun 11(1):3952

3. Bormann P (ed) (2012) New manual of seismological observatory practice (NMSOP-2). IASPEI, GFZ German Research Centre for Geosciences. https://doi.org/10.2312/GFZ.NMSOP-2

4. Allen RV (1978) Automatic earthquake recognition and timing from single traces. Bull Seismol Soc Am 68(5):1521–1532

5. Gibbons SJ, Ringdal F, Kværna T (2008) Detection and characterization of seismic phases using continuous spectral estimation on incoherent and partially coherent arrays. Geophys J Int 172(1):405–421

6. Akhouayri E-S, Agliz D, Atmani A et al (2014) Automatic detection and picking of P-wave arrival in locally stationary noise using cross-correlation. Digit Signal Process 26:87–100

7. Sleeman R, Van Eck T (1999) Robust automatic P-phase picking: an on-line implementation in the analysis of broadband seismogram recordings. Phys Earth Planet Inter 113(1–4):265–275

8. Panagiotakis C, Kokinou E, Vallianatos F (2008) Automatic P-phase picking based on local-maxima distribution. IEEE Trans Geosci Remote Sens 46(8):2280–2287

9. Saragiotis CD, Hadjileontiadis LJ, Panas SM (2002) PAI-S/K: a robust automatic seismic P phase arrival identification scheme. IEEE Trans Geosci Remote Sens 40(6):1395–1404

10. Li Y, Wang Y, Lin H, Zhong T (2018) First arrival time picking for microseismic data based on DWSW algorithm. J Seismol 22:833–840

11. Gao L, Liu D, Luo GF, Song GJ, Min F (2021) First-arrival picking through fuzzy c-means and robust locally weighted regression. Acta Geophys 69:1623–1636

12. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

13. Bochkovskiy A, Wang C-Y, Liao H-YM (2020) YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934

14. Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767

15. Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A et al (2020) Language models are few-shot learners. Adv Neural Inf Process Syst 33:1877–1901

16. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929

17. Zacarias-Morales N, Hernández-Nolasco JA, Pancardo P (2023) Full single-type deep learning models with multihead attention for speech enhancement. Appl Intell 53(17):20561–20576

18. Mousavi SM, Sheng Y, Zhu W, Beroza GC (2019) STanford EArthquake Dataset (STEAD): a global data set of seismic signals for AI. IEEE Access 7:179464–179476

19. Ni Y, Hutko A, Skene F, Denolle M, Malone S, Bodin P, Hartog R, Wright A (2023) Curated Pacific Northwest AI-ready seismic dataset. Seismica 2(1). https://doi.org/10.26443/seismica.v2i1.368

20. Zhao M, Xiao Z, Chen S, Fang L (2022) DiTing: a large-scale Chinese seismic benchmark dataset for artificial intelligence in seismology. Earthq Sci 35:1–11

21. Chen Y, Zhang G, Bai M, Zu S, Guan Z, Zhang M (2019) Automatic waveform classification and arrival picking based on convolutional neural network. Earth Space Sci 6(7):1244–1261

22. Niu H, Gong Z, Ozanich E, Gerstoft P, Wang H, Li Z (2019) Deep-learning source localization using multi-frequency magnitude-only data. J Acoust Soc Am 146(1):211–222

23. Kriegerowski M, Petersen GM, Vasyura-Bathke H, Ohrnberger M (2019) A deep convolutional neural network for localization of clustered earthquakes based on multistation full waveforms. Seismol Res Lett 90(2A):510–516

24. Zhu W, Beroza GC (2019) PhaseNet: a deep-neural-network-based seismic arrival-time picking method. Geophys J Int 216(1):261–273

25. Li S, Yang X, Cao A, Wang C, Liu Y, Liu Y, Niu Q (2023) Seismogram transformer: a generic deep learning backbone network for multiple earthquake monitoring tasks. arXiv preprint arXiv:2310.01037

26. Perol T, Gharbi M, Denolle M (2018) Convolutional neural network for earthquake detection and location. Sci Adv 4(2):1700578

27. Wang J, Xiao Z, Liu C, Zhao D, Yao Z (2019) Deep learning for picking seismic arrival times. J Geophys Res Solid Earth 124(7):6612–6624

28. Gentili S, Michelini A (2006) Automatic picking of P and S phases using a neural tree. J Seismol 10:39–63

29. Zhao Y, Takano K (1999) An artificial neural network approach for broadband seismic phase picking. Bull Seismol Soc Am 89(3):670–680

30. Mousavi SM, Zhu W, Sheng Y, Beroza GC (2019) CRED: a deep residual network of convolutional and recurrent units for earthquake signal detection. Sci Rep 9(1):10267

31. Hou Q, Zhang L, Cheng M-M, Feng J (2020) Strip pooling: rethinking spatial pooling for scene parsing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 4003–4012

32. Khan W, Raj K, Kumar T, Roy AM, Luo B (2022) Introducing Urdu digits dataset with demonstration of an efficient and robust noisy decoder-based pseudo example generator. Symmetry 14(10):1976

33. Si X, Wu X, Sheng H, Zhu J, Li Z (2024) SeisCLIP: a seismology foundation model pre-trained by multi-modal data for multi-purpose seismic feature extraction. IEEE Trans Geosci Remote Sens

34. Münchmeyer J, Bindi D, Leser U, Tilmann F (2021) Earthquake magnitude and location estimation from real time seismic waveforms with a transformer network. Geophys J Int 226(2):1086–1104

35. Stepnov A, Chernykh V, Konovalov A (2021) The Seismo-Performer: a novel machine learning approach for general and efficient seismic phase recognition from local earthquakes in real time. Sensors 21(18):6290

36. Sunkara R, Luo T (2022) No more strided convolutions or pooling: a new CNN building block for low-resolution images and small objects. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, pp 443–459

37. Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 13713–13722

38. Wang S, Li BZ, Khabsa M, Fang H, Ma H (2020) Linformer: self-attention with linear complexity. arXiv preprint arXiv:2006.04768

39. Wu H, Xiao B, Codella N, Liu M, Dai X, Yuan L, Zhang L (2021) CvT: introducing convolutions to vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 22–31

40. Caruana R (1997) Multitask learning. Mach Learn 28:41–75

41. Xiao T, Singh M, Mintun E, Darrell T, Dollár P, Girshick R (2021) Early convolutions help transformers see better. Adv Neural Inf Process Syst 34:30392–30400

42. Kyurkchiev N, Markov S (2015) Sigmoid functions: some approximation and modelling aspects. LAP LAMBERT Academic Publishing, Saarbrücken

43. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778

44. Zhang H, Ma C, Pazzi V, Li T, Casagli N (2020) Deep convolutional neural network for microseismic signal detection and classification. Pure Appl Geophys 177:5781–5797

45. Choi S, Lee B, Kim J, Jung H (2024) Deep-learning-based seismic-signal P-wave first-arrival picking detection using spectrogram images. Electronics 13(1):229

46. Sang Y, Peng Y, Lu M, Zhao C, Li L, Ma T (2023) SeisDeNet: an intelligent seismic data denoising network for the Internet of Things. J Cloud Comput 12(1):34

47. Cha J, Cho BR, Sharp JL (2013) Rethinking the truncated normal distribution. Int J Exp Des Process Optim 3(4):327–363

48. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980

49. Smith LN (2017) Cyclical learning rates for training neural networks. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, pp 464–472

50. Davis J, Goadrich M (2006) The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, pp 233–240

51. Yacouby R, Axman D (2020) Probabilistic extension of precision, recall, and F1 score for more thorough evaluation of classification models. In: Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, pp 79–91

52. Mean absolute error (2016). Retrieved September 19, 2016

53. Lee DK, In J, Lee S (2015) Standard deviation and standard error of the mean. Korean J Anesthesiol 68(3):220–223
Download references

Acknowledgements

This research was supported in part by the National Natural Science Foundation of China under Grant No. 62271208, associated with South China University of Technology. Additionally, this study received funding from the Major Key Project of Peng Cheng Laboratory under Grant No. PCL2023A09.

Author information


Contributions

Xue-Ning Li: Conceptualization, methodology, software, validation, data curation, writing—original draft. Fang-Jiong Chen: Conceptualization, methodology. Ye-Ping Lai: Resources, project administration. Peng Tang: Supervision, writing—review & editing. Xiao-Jun Liang: Funding acquisition, writing—review & editing.

Corresponding author

Correspondence to Fang-Jiong Chen.

Ethics declarations

Conflict of interest

The authors certify that there is no conflict of interest with any individual or organization for this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Upsampling block

Figure 8 illustrates the workflow of the upsampling module. After processing by the ICAT module, the feature matrix first passes through a one-dimensional transposed convolution layer (ConvTranspose1d). This layer performs upsampling: transposed convolution extends the sequence length of the features, expanding the temporal dimension of one-dimensional signals or time series data. The feature matrix is then fed into a one-dimensional convolution layer (Conv1d) with a variable kernel size K and a stride of 1. By selecting an appropriate kernel size K, this layer adapts to feature extraction at various scales and effectively integrates contextual information within the sequence, enhancing the model's ability to represent features at different temporal scales. Next, the data pass through a one-dimensional batch normalization layer (BatchNorm1d), which stabilizes training by normalizing each batch, improves model performance, and helps prevent overfitting. Finally, the data undergo a nonlinear transformation via the Gaussian error linear unit (GELU) activation function, which enhances the model's ability to capture and represent complex data patterns, providing a rich feature representation for downstream tasks. A code sketch of this block follows the figure caption below.

Fig. 8: Upsampling architecture
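
For concreteness, here is a minimal PyTorch sketch of the block just described, assuming a stride-2 transposed convolution that doubles the sequence length and 'same' padding on the stride-1 convolution; the stride, kernel sizes, and channel widths are illustrative assumptions, not the published configuration.

```python
import torch
import torch.nn as nn

class UpSamplingBlock(nn.Module):
    """Sketch of the Fig. 8 block: ConvTranspose1d -> Conv1d(K, stride 1)
    -> BatchNorm1d -> GELU. Hyperparameters here are assumed values."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 7):
        super().__init__()
        # Transposed convolution doubles the sequence length (stride 2 is assumed).
        self.up = nn.ConvTranspose1d(in_channels, out_channels,
                                     kernel_size=2, stride=2)
        # Stride-1 convolution with variable kernel size K integrates context
        # at the restored resolution; 'same' padding keeps the length unchanged.
        self.conv = nn.Conv1d(out_channels, out_channels,
                              kernel_size=kernel_size, stride=1,
                              padding=kernel_size // 2)
        self.bn = nn.BatchNorm1d(out_channels)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, length) feature matrix from the ICAT module
        return self.act(self.bn(self.conv(self.up(x))))

# Example: a 64-channel, 3000-sample feature map upsampled to 6000 samples.
x = torch.randn(8, 64, 3000)
y = UpSamplingBlock(64, 32)(x)   # -> (8, 32, 6000)
```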

Appendix B Training details

In this experiment, two NVIDIA GeForce RTX 3090 GPUs and an Intel Core i9-10900X CPU were used for computation, and training took a total of 8 hours. We implemented ICAT-net in PyTorch and applied a series of carefully designed initialization and optimization techniques to ensure that the model effectively learns the complex features of seismic wave signals. During training, we configured the parameters based on the settings used in the SeisT [25] method. The early stages of training are critical for stable and effective convergence: proper initialization helps prevent vanishing or exploding gradients and stabilizes weight updates at the start of training [47]. Accordingly, the weights of the BatchNorm layers were set to 1 and their biases were initialized to 0. This configuration gives the model nonzero gradients at the start of training, facilitating the onset of learning. To train ICAT-net, we used binary cross-entropy (BCE) as the loss function.
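
A minimal sketch of this initialization scheme is shown below. The truncated-normal weight initialization and its standard deviation are assumptions suggested by the citation of [47]; the BatchNorm settings (weight 1, bias 0) follow the description above.

```python
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    """Illustrative initialization; truncated-normal weights are an assumption."""
    if isinstance(module, (nn.Conv1d, nn.ConvTranspose1d, nn.Linear)):
        nn.init.trunc_normal_(module.weight, std=0.02)  # std=0.02 is assumed
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.BatchNorm1d):
        nn.init.ones_(module.weight)   # scale (gamma) = 1, per the text
        nn.init.zeros_(module.bias)    # shift (beta) = 0, per the text

# model.apply(init_weights)           # applied recursively to all submodules
# criterion = nn.BCELoss()            # binary cross-entropy training loss
```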

In terms of optimization strategy, we selected the Adam optimizer [48], an adaptive learning rate algorithm that adjusts the learning rate of each parameter based on the history of its updates. This allows the model to learn at different step sizes in different regions of the parameter space, which is crucial for deep networks. Combined with the cyclical learning rate scheduler [49], the learning rate progressively decreases from \(1 \times 10^{-3}\) to \(8 \times 10^{-5}\) over a cycle and then repeats. This periodic adjustment balances the model's exploration (large parameter updates to escape local minima or saddle points) against exploitation (small updates to refine the current solution). To prevent overfitting and ensure generalizability, we implemented an early stopping strategy: if the validation loss does not decrease over 20 consecutive training epochs, training is terminated early. This strategy avoids wasting resources on ineffective learning while preserving the model state at its best performance. A sketch of this training loop follows.
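
The sketch below shows how the three pieces (Adam, a cyclical learning rate, and patience-based early stopping) fit together in PyTorch. Only the learning-rate bounds and the 20-epoch patience come from the text; `train_one_epoch`, `evaluate`, the data loaders, the cycle length, and the epoch cap are hypothetical placeholders.

```python
import torch

# `model`, `train_loader`, `val_loader`, `train_one_epoch`, and `evaluate`
# are hypothetical placeholders for this sketch.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=8e-5, max_lr=1e-3,
    step_size_up=2000,          # cycle half-length in batches (assumed)
    cycle_momentum=False)       # required for Adam, which has no momentum buffer

best_val, patience, bad_epochs = float("inf"), 20, 0
for epoch in range(200):        # epoch cap is assumed
    train_one_epoch(model, train_loader, optimizer, scheduler)  # scheduler.step() per batch
    val_loss = evaluate(model, val_loader)
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "best.pt")  # keep the best-performing state
    else:
        bad_epochs += 1
        if bad_epochs >= patience:   # stop after 20 stagnant epochs
            break
```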

Following the EQTransformer [2] configuration, this study sets the detection thresholds for P-waves (primary, longitudinal waves) and S-waves (secondary, transverse waves) at 0.3, while the threshold for detecting seismic events is set at 0.5. These thresholds are chosen to balance detection rate against false alarm rate, ensuring that the model maintains high sensitivity while accurately excluding non-seismic signals. In the probability distribution of the model's output, the arrival time of a phase is determined by locating the peak of the distribution: if the peak probability exceeds the corresponding threshold, the model marks that time point as a phase arrival, as sketched below.
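
A hedged sketch of this thresholded peak-picking rule using `scipy.signal.find_peaks`: the function name, the one-pick-per-trace simplification, and the 0.01 s sampling interval are assumptions; the 0.3 and 0.5 thresholds follow the text.

```python
import numpy as np
from scipy.signal import find_peaks

def pick_phases(p_prob, s_prob, det_prob, dt=0.01):
    """Pick P/S arrivals from per-sample output probabilities.
    A phase arrival is the peak of the probability curve, accepted
    only if the peak exceeds its threshold (0.3 for P/S, 0.5 for events)."""
    picks = {}
    for name, prob, thr in (("P", p_prob, 0.3), ("S", s_prob, 0.3)):
        peaks, props = find_peaks(prob, height=thr)       # peaks above threshold
        if len(peaks):
            best = peaks[np.argmax(props["peak_heights"])]
            picks[name] = best * dt                       # sample index -> seconds
    picks["event_detected"] = bool(np.max(det_prob) >= 0.5)
    return picks
```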

Appendix C Evaluation

Different evaluation metrics are crucial for assessing a model’s performance in specific tasks. This study utilizes multiple statistical metrics to comprehensively evaluate the model’s performance, including precision (\({\text {Pr}}\)) [50], recall (\({\text {Re}}\)) [50], F1 score (F1) [51], mean absolute error (\({\text {MAE}}\)) [52], standard deviation (\({\text {Std}}\)) [53], and mean error (\({\text {Mean}}\)) [53], defined as follows:

$$\begin{aligned} \text {Pr} = \frac{T_{\text {p}}}{F_{\text {p}} + T_{\text {p}}} \end{aligned}$$
(C1)
$$\begin{aligned} \text {Re} = \frac{T_{\text {p}}}{F_{\text {n}} + T_{\text {p}}} \end{aligned}$$
(C2)
$$\begin{aligned} F1 = \frac{2 \times \text {Pr} \times \text {Re}}{\text {Pr} + \text {Re}} \end{aligned}$$
(C3)
$$\begin{aligned} \text {Mean} = \frac{1}{N} \sum _{i=1}^{N} (y_i - {\hat{y}}_i) \end{aligned}$$
(C4)
$$\begin{aligned} \text {Std} = \sqrt{\frac{1}{N} \sum _{i=1}^{N} (y_i - {\hat{y}}_i)^2} \end{aligned}$$
(C5)
$$\begin{aligned} \text {MAE} = \frac{1}{N} \sum _{i=1}^{N} |y_i - {\hat{y}}_i| \end{aligned}$$
(C6)

Specifically, these evaluation metrics are defined as follows: \(T_{\text {p}}\) is the number of true positives, \(F_{\text {p}}\) the number of false positives, and \(F_{\text {n}}\) the number of false negatives; N is the total number of samples, \(y_i\) is the true label of sample i, and \({\hat{y}}_i\) is its predicted value. In phase picking tasks, this study counts a sample as a true positive when its residual falls within an error tolerance of \(\delta < 0.1\) s, in order to assess model performance accurately. This criterion isolates picks with small residuals and reduces the number of false positives. A sketch of these computations is given below.
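
The sketch below assembles Eqs. (C1)-(C6) for a phase picking task. Treating out-of-tolerance picks as false positives and missed arrivals as false negatives is an illustrative assumption about the bookkeeping; the formulas themselves follow the definitions above.

```python
import numpy as np

def picking_metrics(residuals_s, n_missed, tol=0.1):
    """residuals_s: arrival-time residuals (y_i - y_hat_i, seconds) of all
    declared picks; n_missed: arrivals with no pick; tol: 0.1 s tolerance."""
    residuals_s = np.asarray(residuals_s, dtype=float)
    tp_mask = np.abs(residuals_s) < tol                  # within 0.1 s -> true positive
    tp, fp, fn = int(tp_mask.sum()), int((~tp_mask).sum()), int(n_missed)
    pr = tp / (fp + tp) if fp + tp else 0.0              # (C1) precision
    re = tp / (fn + tp) if fn + tp else 0.0              # (C2) recall
    f1 = 2 * pr * re / (pr + re) if pr + re else 0.0     # (C3) F1 score
    mean = residuals_s.mean()                            # (C4) mean error
    std = np.sqrt(np.mean(residuals_s ** 2))             # (C5), as defined above
    mae = np.abs(residuals_s).mean()                     # (C6) mean absolute error
    return pr, re, f1, mean, std, mae
```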

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, XN., Chen, FJ., Lai, YP. et al. ICAT-net: a lightweight neural network with optimized coordinate attention and transformer mechanisms for earthquake detection and phase picking. J Supercomput 81, 191 (2025). https://doi.org/10.1007/s11227-024-06664-y
