Abstract
Seismic signal detection is a crucial technology for enhancing the efficiency of earthquake early warning systems. However, existing deep learning-based seismic signal detection models are often impractical in resource-constrained seismic monitoring environments because of their high computational resource demands. To address this issue, this study employs spatial-depth convolution in the downsampling of seismic signal sequences, effectively minimizing the loss of fine-grained feature information. Concurrently, we leverage the coordinate attention module to enhance the model’s ability to recognize spatial features in seismic signal sequences. To reduce computational cost, we map the keys and values in the transformer architecture to a lower-dimensional subspace, significantly decreasing the demand for computational resources. By concatenating feature maps between the encoder and decoder, the model retains rich contextual information and progressively restores the spatial resolution of the signal during decoding. Building on these components, we propose the integration of coordinate attention and transformer network (ICAT-net), an efficient multi-task network designed to simultaneously handle seismic sequence recognition and phase picking. ICAT-net combines local feature modeling with long-range dependency processing to meet the requirements of multi-task learning. Experimental results demonstrate that ICAT-net requires only 4.743168 GFLOPs (floating-point operations) and 0.260755 M parameters, while performing strongly on seismic waveform detection (DET), P-wave phase picking (Ppk), and S-wave phase picking (Spk). These advantages make ICAT-net particularly suitable for deployment in resource-constrained environments, providing a valuable solution for earthquake monitoring and disaster risk assessment.







Data availability
The STEAD dataset is available at https://github.com/smousavi05/STEAD. The code implementing the ICAT-net model is available at https://github.com/lee524/ICAT-net.
References
Shearer PM (2019) Introduction to seismology. Cambridge University Press
Mousavi SM, Ellsworth WL, Zhu W, Chuang LY, Beroza GC (2020) Earthquake transformer: an attentive deep-learning model for simultaneous earthquake detection and phase picking. Nat Commun 11(1):3952
Bormann P (Ed) (2012) New manual of seismological observatory practice (NMSOP-2). IASPEI, GFZ German Research Centre for Geosciences. https://doi.org/10.2312/GFZ.NMSOP-2
Allen RV (1978) Automatic earthquake recognition and timing from single traces. Bull Seismol Soc Am 68(5):1521–1532
Gibbons SJ, Ringdal F, Kværna T (2008) Detection and characterization of seismic phases using continuous spectral estimation on incoherent and partially coherent arrays. Geophys J Int 172(1):405–421
Akhouayri E-S, Agliz D, Atmani A et al (2014) Automatic detection and picking of p-wave arrival in locally stationary noise using cross-correlation. Digit Signal Process 26:87–100
Sleeman R, Van Eck T (1999) Robust automatic p-phase picking: an on-line implementation in the analysis of broadband seismogram recordings. Phys Earth Planet Inter 113(1–4):265–275
Panagiotakis C, Kokinou E, Vallianatos F (2008) Automatic P-phase picking based on local-maxima distribution. IEEE Trans Geosci Remote Sens 46(8):2280–2287
Saragiotis CD, Hadjileontiadis LJ, Panas SM (2002) PAI-S/K: a robust automatic seismic P phase arrival identification scheme. IEEE Trans Geosci Remote Sens 40(6):1395–1404
Li Y, Wang Y, Lin H, Zhong T (2018) First arrival time picking for microseismic data based on the DWSW algorithm. J Seismol 22:833–840
Gao L, Liu D, Luo GF, Song GJ, Min F (2021) First-arrival picking through fuzzy c-means and robust locally weighted regression. Acta Geophysica 69:1623–1636
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
Bochkovskiy A, Wang C-Y, Liao H-YM (2020) YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934
Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767
Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A et al (2020) Language models are few-shot learners. Adv Neural Info Process Syst 33:1877–1901
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, et al (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929
Zacarias-Morales N, Hernández-Nolasco JA, Pancardo P (2023) Full single-type deep learning models with multihead attention for speech enhancement. Appl Intell 53(17):20561–20576
Mousavi SM, Sheng Y, Zhu W, Beroza GC (2019) STanford EArthquake Dataset (STEAD): a global data set of seismic signals for AI. IEEE Access 7:179464–179476
Ni Y, Hutko A, Skene F, Denolle M, Malone S, Bodin P, Hartog R, Wright A (2023) Curated Pacific Northwest AI-ready Seismic Dataset. Seismica 2(1). https://doi.org/10.26443/seismica.v2i1.368
Zhao M, Xiao Z, Chen S, Fang L (2022) DiTing: a large-scale Chinese seismic benchmark dataset for artificial intelligence in seismology. Earthq Sci 35:1–11
Chen Y, Zhang G, Bai M, Zu S, Guan Z, Zhang M (2019) Automatic waveform classification and arrival picking based on convolutional neural network. Earth Space Sci 6(7):1244–1261
Niu H, Gong Z, Ozanich E, Gerstoft P, Wang H, Li Z (2019) Deep-learning source localization using multi-frequency magnitude-only data. J Acoust Soc Am 146(1):211–222
Kriegerowski M, Petersen GM, Vasyura-Bathke H, Ohrnberger M (2019) A deep convolutional neural network for localization of clustered earthquakes based on multistation full waveforms. Seismol Res Lett 90(2A):510–516
Zhu W, Beroza GC (2019) PhaseNet: a deep-neural-network-based seismic arrival-time picking method. Geophys J Int 216(1):261–273
Li S, Yang X, Cao A, Wang C, Liu Y, Liu Y, Niu Q (2023) Seismogram transformer: a generic deep learning backbone network for multiple earthquake monitoring tasks. arXiv preprint arXiv:2310.01037
Perol T, Gharbi M, Denolle M (2018) Convolutional neural network for earthquake detection and location. Sci Adv 4(2):1700578
Wang J, Xiao Z, Liu C, Zhao D, Yao Z (2019) Deep learning for picking seismic arrival times. J Geophys Res: Solid Earth 124(7):6612–6624
Gentili S, Michelini A (2006) Automatic picking of p and s phases using a neural tree. J Seismol 10:39–63
Zhao Y, Takano K (1999) An artificial neural network approach for broadband seismic phase picking. Bull Seismol Soc Am 89(3):670–680
Mousavi SM, Zhu W, Sheng Y, Beroza GC (2019) CRED: a deep residual network of convolutional and recurrent units for earthquake signal detection. Sci Rep 9(1):10267
Hou Q, Zhang L, Cheng M-M, Feng J (2020) Strip pooling: rethinking spatial pooling for scene parsing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 4003–4012
Khan W, Raj K, Kumar T, Roy AM, Luo B (2022) Introducing urdu digits dataset with demonstration of an efficient and robust noisy decoder-based pseudo example generator. Symmetry 14(10):1976
Si X, Wu X, Sheng H, Zhu J, Li Z (2024) SeisCLIP: a seismology foundation model pre-trained by multi-modal data for multi-purpose seismic feature extraction. IEEE Trans Geosci Remote Sens
Münchmeyer J, Bindi D, Leser U, Tilmann F (2021) Earthquake magnitude and location estimation from real time seismic waveforms with a transformer network. Geophys J Int 226(2):1086–1104
Stepnov A, Chernykh V, Konovalov A (2021) The seismo-performer: a novel machine learning approach for general and efficient seismic phase recognition from local earthquakes in real time. Sensors 21(18):6290
Sunkara R, Luo T (2022) No more strided convolutions or pooling: a new cnn building block for low-resolution images and small objects. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, pp 443–459
Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 13713–13722
Wang S, Li BZ, Khabsa M, Fang H, Ma H (2020) Linformer: self-attention with linear complexity. arXiv preprint arXiv:2006.04768
Wu H, Xiao B, Codella N, Liu M, Dai X, Yuan L, Zhang L (2021) CvT: introducing convolutions to vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 22–31
Caruana R (1997) Multitask learning. Mach Learn 28:41–75
Xiao T, Singh M, Mintun E, Darrell T, Dollár P, Girshick R (2021) Early convolutions help transformers see better. Adv Neural Info Process Syst 34:30392–30400
Kyurkchiev N, Markov S (2015) Sigmoid functions: some approximation and modelling aspects. LAP LAMBERT Academic Publishing, Saarbrücken
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778
Zhang H, Ma C, Pazzi V, Li T, Casagli N (2020) Deep convolutional neural network for microseismic signal detection and classification. Pure Appl Geophys 177:5781–5797
Choi S, Lee B, Kim J, Jung H (2024) Deep-learning-based seismic-signal p-wave first-arrival picking detection using spectrogram images. Electronics 13(1):229
Sang Y, Peng Y, Lu M, Zhao C, Li L, Ma T (2023) Seisdenet: an intelligent seismic data denoising network for the internet of things. J Cloud Comput 12(1):34
Cha J, Cho BR, Sharp JL (2013) Rethinking the truncated normal distribution. Int J Exp Des Process Optim 3(4):327–363
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Smith LN (2017) Cyclical learning rates for training neural networks. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp 464–472. IEEE
Davis J, Goadrich M (2006) The relationship between precision-recall and roc curves. In: Proceedings of the 23rd International Conference on Machine Learning, pp 233–240
Yacouby R, Axman D (2020) Probabilistic extension of precision, recall, and f1 score for more thorough evaluation of classification models. In: Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, pp 79–91
Mean absolute error (2016) Retrieved September 19, 2016
Lee DK, In J, Lee S (2015) Standard deviation and standard error of the mean. Korean J Anesthesiol 68(3):220–223
Acknowledgements
This research was supported in part by the National Natural Science Foundation of China under Grant No. 62271208, associated with South China University of Technology. Additionally, this study received funding from the Major Key Project of Peng Cheng Laboratory under Grant No. PCL2023A09.
Author information
Contributions
Xue-Ning Li: Conceptualization, methodology, software, validation, writing—original draft, data curation. Fang-Jiong Chen: Conceptualization, methodology. Ye-Ping Lai: Resources, project administration. Peng Tang: Supervision, writing—review & editing. Xiao-Jun Liang: Funding acquisition, writing—review & editing.
Ethics declarations
Conflict of interest
The authors certify that there is no conflict of interest with any individual or organization for this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A Upsampling block
Figure 8 illustrates the detailed workflow of the upsampling module. After processing by the ICAT module, the feature matrix first flows into a one-dimensional transposed convolution layer (ConvTranspose1d). This layer performs the upsampling: the transposed convolution extends the sequence length of the features and is used to expand the feature dimensions of one-dimensional signals or time series data. The feature matrix is then fed into a one-dimensional convolution layer (Conv1d) with a convolution kernel of variable size K and a stride of 1. This layer adapts to feature extraction at various scales; by selecting an appropriate kernel size K, it effectively integrates contextual information within the sequence, enhancing the model’s ability to represent features at different temporal scales in time series data. Next, the data pass through a one-dimensional batch normalization layer (BatchNorm1d), which stabilizes training by normalizing each batch, improves model performance, and helps prevent overfitting. Finally, the data undergo a nonlinear transformation via the Gaussian error linear unit (GELU) activation function, which enhances the model’s ability to capture and represent complex data patterns, providing a rich feature representation for downstream tasks.
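For concreteness, the following is a minimal PyTorch sketch of an upsampling block with this ConvTranspose1d → Conv1d → BatchNorm1d → GELU structure. The channel counts, the kernel/stride of the transposed convolution, and the default kernel size K are illustrative assumptions rather than values taken from the paper.

```python
import torch
import torch.nn as nn

class UpsamplingBlock(nn.Module):
    """Sketch of the upsampling block described above (assumed parameters)."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 7):
        super().__init__()
        # Transposed convolution: kernel 2 with stride 2 doubles the length (assumed)
        self.up = nn.ConvTranspose1d(in_channels, out_channels,
                                     kernel_size=2, stride=2)
        # Stride-1 convolution with variable kernel size K, 'same'-style padding
        self.conv = nn.Conv1d(out_channels, out_channels,
                              kernel_size=kernel_size, stride=1,
                              padding=kernel_size // 2)
        self.bn = nn.BatchNorm1d(out_channels)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, length)
        x = self.up(x)      # (batch, out_channels, 2 * length)
        x = self.conv(x)    # integrates contextual information
        x = self.bn(x)      # stabilizes training
        return self.act(x)  # nonlinear transformation

# Example: upsample a batch of 16-channel feature sequences of length 750
x = torch.randn(8, 16, 750)
y = UpsamplingBlock(16, 8, kernel_size=7)(x)  # -> shape (8, 8, 1500)
```

With an odd K, padding of kernel_size // 2 keeps the sequence length unchanged after the stride-1 convolution, so the temporal resolution is controlled entirely by the transposed convolution.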
Appendix B Training details
In this experiment, two NVIDIA GeForce RTX 3090 GPUs and an Intel Core i9-10900X CPU were used for computation, and the experiment ran for a total of 8 hours. We implemented the training of ICAT-net in PyTorch and applied a series of carefully designed initialization and optimization techniques to ensure that the model effectively learns the complex features of seismic wave signals, configuring the training parameters based on the settings used in the SeisT [25] method. The early stages of training are critical for stable and effective convergence: proper initialization helps prevent vanishing or exploding gradients and stabilizes weight updates from the outset, facilitating a smooth learning process. Accordingly, layer weights were initialized from a truncated normal distribution [47]. Additionally, the weights of the BatchNorm layers were set to 1 and their biases to 0; this configuration gives the model nonzero gradients at the start of training, facilitating the onset of learning. To train ICAT-net, we used binary cross-entropy (BCE) as the loss function.
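A minimal sketch of such an initialization is given below, assuming truncated-normal weights for convolutional and linear layers (the standard deviation of 0.02 is an assumed value, not one reported in the paper) together with the stated BatchNorm settings and BCE loss; the small Sequential model is a stand-in for ICAT-net.

```python
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    # Truncated-normal initialization for weight matrices [47];
    # std = 0.02 is an assumed hyperparameter.
    if isinstance(module, (nn.Conv1d, nn.ConvTranspose1d, nn.Linear)):
        nn.init.trunc_normal_(module.weight, std=0.02)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    # BatchNorm layers: weights set to 1, biases set to 0, as stated above.
    elif isinstance(module, nn.BatchNorm1d):
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)

# Stand-in model for illustration; ICAT-net itself is defined in the released code.
model = nn.Sequential(nn.Conv1d(3, 8, 3, padding=1), nn.BatchNorm1d(8), nn.GELU())
model.apply(init_weights)

# Binary cross-entropy loss on the output probability traces.
criterion = nn.BCELoss()
```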
In terms of optimization strategy, we selected the Adam optimizer [48], an adaptive learning-rate algorithm that adjusts the learning rate of each parameter based on the history of its updates. This allows the model to learn at different step sizes in different regions of the parameter space, which is crucial for deep networks. Combined with a cyclic learning rate scheduler [49], the learning rate progressively decreases from \(1 \times 10^{-3}\) to \(8 \times 10^{-5}\) over one cycle and then repeats. This periodic adjustment balances exploration (large-step updates that help escape local minima or saddle points) and exploitation (small-step updates that refine the current solution). To prevent overfitting and ensure generalizability, we implemented an early stopping strategy: if the loss on the validation set does not decrease over 20 consecutive training epochs, training is terminated early. This avoids wasting resources on ineffective learning while preserving the model at its best observed performance.
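The PyTorch sketch below wires these pieces together, reusing the stand-in model from the previous sketch. The cycle step size, the epoch budget, and the train_one_epoch/evaluate helpers are hypothetical placeholders; note that cycle_momentum must be disabled because Adam carries no momentum entry in its parameter groups.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer,
    base_lr=8e-5, max_lr=1e-3,  # bounds stated above
    step_size_up=2000,          # iterations per half cycle (assumed)
    mode="triangular",          # triangular sweep between the bounds (assumed)
    cycle_momentum=False,       # required when using Adam
)

best_loss, patience, stale_epochs = float("inf"), 20, 0
for epoch in range(200):                           # epoch budget (assumed)
    train_one_epoch(model, optimizer, scheduler)   # hypothetical helper
    val_loss = evaluate(model)                     # hypothetical helper
    if val_loss < best_loss:
        best_loss, stale_epochs = val_loss, 0
        torch.save(model.state_dict(), "best_icat_net.pt")  # keep best weights
    else:
        stale_epochs += 1
        if stale_epochs >= patience:               # early stopping
            break
```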
Based on the EQTransformer [2] configuration, this study sets the detection thresholds for P-waves (primary, longitudinal waves) and S-waves (secondary, transverse waves) at 0.3, while the threshold for detecting seismic events is set at 0.5. These thresholds are chosen to balance detection rates against false alarm rates, ensuring that the model maintains high sensitivity while accurately excluding non-seismic signals. In the probability distribution of the model’s output, the arrival time of a phase is determined by locating the peak of the distribution. Specifically, if the peak probability exceeds the corresponding threshold, the model marks that time point as a potential seismic phase arrival.
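A sketch of this thresholding and peak-picking rule is shown below, assuming NumPy arrays of per-sample probabilities and a 100 Hz sampling rate (the rate used by STEAD waveforms); the helper name and sample interval are illustrative assumptions.

```python
import numpy as np

def pick_phases(det, p_prob, s_prob, dt=0.01,
                det_thresh=0.5, phase_thresh=0.3):
    """Declare an event when the detection trace exceeds 0.5, and place a
    phase arrival at the peak of the P/S probability trace if that peak
    exceeds 0.3. dt is the sample interval in seconds (assumed 100 Hz)."""
    result = {"event": bool((det >= det_thresh).any())}
    for name, prob in (("P", p_prob), ("S", s_prob)):
        peak = int(np.argmax(prob))  # locate the peak of the distribution
        result[name] = peak * dt if prob[peak] >= phase_thresh else None
    return result

# Example with synthetic probability traces of length 6000 (60 s at 100 Hz)
rng = np.random.default_rng(0)
det = rng.random(6000) * 0.2
det[2000:3000] = 0.9          # simulated event window
p = rng.random(6000) * 0.1
p[2100] = 0.8                 # simulated P-arrival peak
s = rng.random(6000) * 0.1
s[2600] = 0.7                 # simulated S-arrival peak
print(pick_phases(det, p, s))  # {'event': True, 'P': 21.0, 'S': 26.0}
```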
Appendix C Evaluation
Different evaluation metrics are crucial for assessing a model’s performance on specific tasks. This study utilizes multiple statistical metrics to comprehensively evaluate the model’s performance, including precision (\({\text {Pr}}\)) [50], recall (\({\text {Re}}\)) [50], F1 score (F1) [51], mean absolute error (\({\text {MAE}}\)) [52], standard deviation (\({\text {Std}}\)) [53], and mean error (\({\text {Mean}}\)) [53], defined as follows:

\[ {\text {Pr}} = \frac{T_{{\text {p}}}}{T_{{\text {p}}} + F_{{\text {p}}}}, \qquad {\text {Re}} = \frac{T_{{\text {p}}}}{T_{{\text {p}}} + F_{{\text {n}}}}, \qquad {\text {F1}} = \frac{2 \cdot {\text {Pr}} \cdot {\text {Re}}}{{\text {Pr}} + {\text {Re}}} \]

\[ {\text {MAE}} = \frac{1}{N}\sum _{i=1}^{N} \left| y_i - {\hat{y}}_i \right| , \qquad {\text {Mean}} = \frac{1}{N}\sum _{i=1}^{N} \left( y_i - {\hat{y}}_i \right) , \qquad {\text {Std}} = \sqrt{\frac{1}{N}\sum _{i=1}^{N} \left( (y_i - {\hat{y}}_i) - {\text {Mean}} \right) ^2} \]
Specifically, \(T_{{\text {p}}}\) denotes the number of true positives, \(F_{{\text {p}}}\) the number of false positives, and \(F_{{\text {n}}}\) the number of false negatives; N is the total number of samples, \(y_i\) is the true label of sample i, and \({\hat{y}}_i\) is the predicted value for sample i. In the phase picking tasks, this study counts a sample as a true positive when its picking residual lies within \(\delta < 0.1\,s\) in order to assess model performance accurately. This criterion rewards picks with small residuals and reduces the number of false positives.
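As an illustration, the following sketch computes these metrics for a phase-picking task under the stated \(\delta < 0.1\,s\) criterion. The NaN convention for missed picks, the function name, and the choice to summarize residuals over all detected picks are simplifying assumptions.

```python
import numpy as np

def picking_metrics(true_times, pred_times, delta=0.1):
    """A pick is a true positive when its residual is within delta = 0.1 s;
    Pr, Re, and F1 follow from the TP/FP/FN counts, while MAE, Mean, and
    Std summarize the residuals of the detected picks. Missed picks are
    encoded as NaN in pred_times (an assumed convention)."""
    true_times, pred_times = np.asarray(true_times), np.asarray(pred_times)
    detected = ~np.isnan(pred_times)
    residuals = pred_times[detected] - true_times[detected]
    tp = int(np.sum(np.abs(residuals) < delta))   # residual within delta
    fp = int(np.sum(np.abs(residuals) >= delta))  # pick too far from truth
    fn = int(np.sum(~detected))                   # phase missed entirely
    pr = tp / (tp + fp) if tp + fp else 0.0
    re = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pr * re / (pr + re) if pr + re else 0.0
    return {"Pr": pr, "Re": re, "F1": f1,
            "MAE": float(np.mean(np.abs(residuals))),
            "Mean": float(np.mean(residuals)),
            "Std": float(np.std(residuals))}

# Four true arrivals; one pick off by 0.15 s, one phase missed
print(picking_metrics([1.0, 2.0, 3.0, 4.0], [1.02, 1.85, 3.05, np.nan]))
# -> Pr ≈ 0.67, Re ≈ 0.67, F1 ≈ 0.67, plus residual statistics
```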
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, XN., Chen, FJ., Lai, YP. et al. ICAT-net: a lightweight neural network with optimized coordinate attention and transformer mechanisms for earthquake detection and phase picking. J Supercomput 81, 191 (2025). https://doi.org/10.1007/s11227-024-06664-y