Abstract
Seismic signal detection is a crucial technology for enhancing the efficiency of earthquake early warning systems. However, existing deep learning-based seismic signal detection models are often impractical in resource-constrained seismic monitoring environments because of their high computational resource demands. To address this issue, this study employs spatial-depth convolution in the downsampling of seismic signal sequences, effectively minimizing the loss of fine-grained feature information. Concurrently, we leverage the coordinate attention module to enhance the model’s ability to recognize spatial features in seismic signal sequences. To reduce computational cost, we map the keys and values in the transformer architecture to a lower-dimensional subspace, significantly decreasing the demand for computational resources. By concatenating feature maps between the encoder and decoder, the model retains rich contextual information and progressively restores the spatial resolution of the signal during decoding. Building on these components, we propose the integration of coordinate attention and transformer network (ICAT-net), an efficient multi-task network designed to simultaneously handle seismic sequence recognition and phase picking. ICAT-net combines local feature modeling with long-range dependency processing to meet the requirements of multi-task learning. Experimental results demonstrate that ICAT-net requires only 4.743168 GFLOPs (floating-point operations) and 0.260755 M parameters, while performing strongly on seismic waveform detection (DET), P-wave phase picking (Ppk), and S-wave phase picking (Spk). These advantages make ICAT-net particularly suitable for deployment in resource-constrained environments, providing a valuable solution for earthquake monitoring and disaster risk assessment.







Data availability
The STEAD dataset is available at https://github.com/smousavi05/STEAD. The code implementing the ICAT-net model is available at https://github.com/lee524/ICAT-net.
References
Shearer PM (2019) Introduction to seismology. Cambridge University Press
Mousavi SM, Ellsworth WL, Zhu W, Chuang LY, Beroza GC (2020) Earthquake transformer: an attentive deep-learning model for simultaneous earthquake detection and phase picking. Nat Commun 11(1):3952
Bormann P (Ed) (2012) New manual of seismological observatory practice (NMSOP-2). IASPEI, GFZ German Research Centre for Geosciences. https://doi.org/10.2312/GFZ.NMSOP-2
Allen RV (1978) Automatic earthquake recognition and timing from single traces. Bull Seismol Soc Am 68(5):1521–1532
Gibbons SJ, Ringdal F, Kværna T (2008) Detection and characterization of seismic phases using continuous spectral estimation on incoherent and partially coherent arrays. Geophys J Int 172(1):405–421
Akhouayri E-S, Agliz D, Atmani A et al (2014) Automatic detection and picking of p-wave arrival in locally stationary noise using cross-correlation. Digit Signal Process 26:87–100
Sleeman R, Van Eck T (1999) Robust automatic p-phase picking: an on-line implementation in the analysis of broadband seismogram recordings. Phys Earth Planet Inter 113(1–4):265–275
Panagiotakis C, Kokinou E, Vallianatos F (2008) Automatic P-phase picking based on local-maxima distribution. IEEE Trans Geosci Remote Sens 46(8):2280–2287
Saragiotis CD, Hadjileontiadis LJ, Panas SM (2002) PAI-S/K: a robust automatic seismic P phase arrival identification scheme. IEEE Trans Geosci Remote Sens 40(6):1395–1404
Li Y, Wang Y, Lin H, Zhong T (2018) First arrival time picking for microseismic data based on the DWSW algorithm. J Seismol 22:833–840
Gao L, Liu D, Luo GF, Song GJ, Min F (2021) First-arrival picking through fuzzy c-means and robust locally weighted regression. Acta Geophysica 69:1623–1636
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
Bochkovskiy A, Wang C-Y, Liao H-YM (2020) YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934
Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767
Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A et al (2020) Language models are few-shot learners. Adv Neural Info Process Syst 33:1877–1901
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, et al (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929
Zacarias-Morales N, Hernández-Nolasco JA, Pancardo P (2023) Full single-type deep learning models with multihead attention for speech enhancement. Appl Intell 53(17):20561–20576
Mousavi SM, Sheng Y, Zhu W, Beroza GC (2019) STanford EArthquake Dataset (STEAD): a global data set of seismic signals for AI. IEEE Access 7:179464–179476
Ni Y, Hutko A, Skene F, Denolle M, Malone S, Bodin P, Hartog R, Wright A (2023) Curated Pacific Northwest AI-ready Seismic Dataset. Seismica 2(1). https://doi.org/10.26443/seismica.v2i1.368
Zhao M, Xiao Z, Chen S, Fang L (2022) DiTing: a large-scale Chinese seismic benchmark dataset for artificial intelligence in seismology. Earthq Sci 35:1–11
Chen Y, Zhang G, Bai M, Zu S, Guan Z, Zhang M (2019) Automatic waveform classification and arrival picking based on convolutional neural network. Earth Space Sci 6(7):1244–1261
Niu H, Gong Z, Ozanich E, Gerstoft P, Wang H, Li Z (2019) Deep-learning source localization using multi-frequency magnitude-only data. J Acoust Soc Am 146(1):211–222
Kriegerowski M, Petersen GM, Vasyura-Bathke H, Ohrnberger M (2019) A deep convolutional neural network for localization of clustered earthquakes based on multistation full waveforms. Seismol Res Lett 90(2A):510–516
Zhu W, Beroza GC (2019) PhaseNet: a deep-neural-network-based seismic arrival-time picking method. Geophys J Int 216(1):261–273
Li S, Yang X, Cao A, Wang C, Liu Y, Liu Y, Niu Q (2023) Seismogram transformer: a generic deep learning backbone network for multiple earthquake monitoring tasks. arXiv preprint arXiv:2310.01037
Perol T, Gharbi M, Denolle M (2018) Convolutional neural network for earthquake detection and location. Sci Adv 4(2):1700578
Wang J, Xiao Z, Liu C, Zhao D, Yao Z (2019) Deep learning for picking seismic arrival times. J Geophys Res: Solid Earth 124(7):6612–6624
Gentili S, Michelini A (2006) Automatic picking of p and s phases using a neural tree. J Seismol 10:39–63
Zhao Y, Takano K (1999) An artificial neural network approach for broadband seismic phase picking. Bull Seismol Soc Am 89(3):670–680
Mousavi SM, Zhu W, Sheng Y, Beroza GC (2019) CRED: a deep residual network of convolutional and recurrent units for earthquake signal detection. Sci Rep 9(1):10267
Hou Q, Zhang L, Cheng M-M, Feng J (2020) Strip pooling: rethinking spatial pooling for scene parsing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 4003–4012
Khan W, Raj K, Kumar T, Roy AM, Luo B (2022) Introducing urdu digits dataset with demonstration of an efficient and robust noisy decoder-based pseudo example generator. Symmetry 14(10):1976
Si X, Wu X, Sheng H, Zhu J, Li Z (2024) SeisCLIP: a seismology foundation model pre-trained by multi-modal data for multi-purpose seismic feature extraction. IEEE Trans Geosci Remote Sens
Münchmeyer J, Bindi D, Leser U, Tilmann F (2021) Earthquake magnitude and location estimation from real time seismic waveforms with a transformer network. Geophys J Int 226(2):1086–1104
Stepnov A, Chernykh V, Konovalov A (2021) The seismo-performer: a novel machine learning approach for general and efficient seismic phase recognition from local earthquakes in real time. Sensors 21(18):6290
Sunkara R, Luo T (2022) No more strided convolutions or pooling: a new cnn building block for low-resolution images and small objects. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, pp 443–459
Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 13713–13722
Wang S, Li BZ, Khabsa M, Fang H, Ma H (2020) Linformer: self-attention with linear complexity. arXiv preprint arXiv:2006.04768
Wu H, Xiao B, Codella N, Liu M, Dai X, Yuan L, Zhang L (2021) CvT: introducing convolutions to vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 22–31
Caruana R (1997) Multitask learning. Mach Learn 28:41–75
Xiao T, Singh M, Mintun E, Darrell T, Dollár P, Girshick R (2021) Early convolutions help transformers see better. Adv Neural Info Process Syst 34:30392–30400
Kyurkchiev N, Markov S (2015) Sigmoid functions: some approximation and modelling aspects. LAP LAMBERT Academic Publishing, Saarbrücken
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778
Zhang H, Ma C, Pazzi V, Li T, Casagli N (2020) Deep convolutional neural network for microseismic signal detection and classification. Pure Appl Geophys 177:5781–5797
Choi S, Lee B, Kim J, Jung H (2024) Deep-learning-based seismic-signal p-wave first-arrival picking detection using spectrogram images. Electronics 13(1):229
Sang Y, Peng Y, Lu M, Zhao C, Li L, Ma T (2023) Seisdenet: an intelligent seismic data denoising network for the internet of things. J Cloud Comput 12(1):34
Cha J, Cho BR, Sharp JL (2013) Rethinking the truncated normal distribution. Int J Exp Des Process Optim 3(4):327–363
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Smith LN (2017) Cyclical learning rates for training neural networks. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp 464–472. IEEE
Davis J, Goadrich M (2006) The relationship between precision-recall and roc curves. In: Proceedings of the 23rd International Conference on Machine Learning, pp 233–240
Yacouby R, Axman D (2020) Probabilistic extension of precision, recall, and f1 score for more thorough evaluation of classification models. In: Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, pp 79–91
Mean absolute error (2016) Retrieved September 19, 2016
Lee DK, In J, Lee S (2015) Standard deviation and standard error of the mean. Korean J Anesthesiol 68(3):220–223
Acknowledgements
This research was supported in part by the National Natural Science Foundation of China under Grant No. 62271208, associated with South China University of Technology. Additionally, this study received funding from the Major Key Project of Peng Cheng Laboratory under Grant No. PCL2023A09.
Author information
Contributions
Xue-Ning Li: Conceptualization, methodology, software, validation, writing—original draft, data curation. Fang-Jiong Chen: Conceptualization, methodology. Ye-Ping Lai: Resources, project administration. Peng Tang: Supervision, writing—review & editing. Xiao-Jun Liang: Funding acquisition, writing—review & editing.
Ethics declarations
Conflict of interest
The authors certify that there is no conflict of interest with any individual or organization for this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A Upsampling block
Figure 8 illustrates the detailed workflow of the upsampling module. After processing by the ICAT module, the feature matrix first flows into a one-dimensional transposed convolution layer (ConvTranspose1d). This layer performs the upsampling: the transposed convolution extends the sequence length of the features and is used to expand the feature dimensions of one-dimensional signals or time series data. The feature matrix is then fed into a one-dimensional convolution layer (Conv1d) with a convolution kernel of variable size K and a stride of 1. This layer adapts to feature extraction at various scales; by selecting an appropriate kernel size K, it effectively integrates contextual information within the sequence, enhancing the model’s ability to represent features at different temporal scales in time series data. Next, the data pass through a one-dimensional batch normalization layer (BatchNorm1d), which stabilizes training by normalizing each batch, improves model performance, and helps prevent overfitting. Finally, the data undergo a nonlinear transformation via the Gaussian error linear unit (GELU) activation function, which enhances the model’s ability to capture and represent complex data patterns, providing a rich feature representation for downstream tasks.
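For concreteness, the following is a minimal PyTorch sketch of an upsampling block with this ConvTranspose1d → Conv1d → BatchNorm1d → GELU structure. The channel counts, the kernel/stride of the transposed convolution, and the default kernel size K are illustrative assumptions rather than values taken from the paper.

```python
import torch
import torch.nn as nn

class UpsamplingBlock(nn.Module):
    """Sketch of the upsampling block described above (assumed parameters)."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 7):
        super().__init__()
        # Transposed convolution: kernel 2 with stride 2 doubles the length (assumed)
        self.up = nn.ConvTranspose1d(in_channels, out_channels,
                                     kernel_size=2, stride=2)
        # Stride-1 convolution with variable kernel size K, 'same'-style padding
        self.conv = nn.Conv1d(out_channels, out_channels,
                              kernel_size=kernel_size, stride=1,
                              padding=kernel_size // 2)
        self.bn = nn.BatchNorm1d(out_channels)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, length)
        x = self.up(x)      # (batch, out_channels, 2 * length)
        x = self.conv(x)    # integrates contextual information
        x = self.bn(x)      # stabilizes training
        return self.act(x)  # nonlinear transformation

# Example: upsample a batch of 16-channel feature sequences of length 750
x = torch.randn(8, 16, 750)
y = UpsamplingBlock(16, 8, kernel_size=7)(x)  # -> shape (8, 8, 1500)
```

With an odd K, padding of kernel_size // 2 keeps the sequence length unchanged after the stride-1 convolution, so the temporal resolution is controlled entirely by the transposed convolution.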
Appendix B Training details
In this experiment, two NVIDIA GeForce RTX 3090 GPUs and an Intel Core i9-10900X CPU were used for computation, and the experiment ran for a total of 8 hours. We implemented the training of ICAT-net in PyTorch and applied a series of carefully designed initialization and optimization techniques to ensure that the model effectively learns the complex features of seismic wave signals, configuring the training parameters based on the settings used in the SeisT [25] method. The early stages of training are critical for stable and effective convergence: proper initialization helps prevent vanishing or exploding gradients and stabilizes weight updates from the outset, facilitating a smooth learning process. Accordingly, layer weights were initialized from a truncated normal distribution [47]. Additionally, the weights of the BatchNorm layers were set to 1 and their biases to 0; this configuration gives the model nonzero gradients at the start of training, facilitating the onset of learning. To train ICAT-net, we used binary cross-entropy (BCE) as the loss function.
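A minimal sketch of such an initialization is given below, assuming truncated-normal weights for convolutional and linear layers (the standard deviation of 0.02 is an assumed value, not one reported in the paper) together with the stated BatchNorm settings and BCE loss; the small Sequential model is a stand-in for ICAT-net.

```python
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    # Truncated-normal initialization for weight matrices [47];
    # std = 0.02 is an assumed hyperparameter.
    if isinstance(module, (nn.Conv1d, nn.ConvTranspose1d, nn.Linear)):
        nn.init.trunc_normal_(module.weight, std=0.02)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    # BatchNorm layers: weights set to 1, biases set to 0, as stated above.
    elif isinstance(module, nn.BatchNorm1d):
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)

# Stand-in model for illustration; ICAT-net itself is defined in the released code.
model = nn.Sequential(nn.Conv1d(3, 8, 3, padding=1), nn.BatchNorm1d(8), nn.GELU())
model.apply(init_weights)

# Binary cross-entropy loss on the output probability traces.
criterion = nn.BCELoss()
```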
In terms of optimization strategy, we selected the Adam optimizer [48], an adaptive learning-rate algorithm that adjusts the learning rate of each parameter based on the history of its updates. This allows the model to learn at different step sizes in different regions of the parameter space, which is crucial for deep networks. Combined with a cyclic learning rate scheduler [49], the learning rate progressively decreases from \(1 \times 10^{-3}\) to \(8 \times 10^{-5}\) over one cycle and then repeats. This periodic adjustment balances exploration (large-step updates that help escape local minima or saddle points) and exploitation (small-step updates that refine the current solution). To prevent overfitting and ensure generalizability, we implemented an early stopping strategy: if the loss on the validation set does not decrease over 20 consecutive training epochs, training is terminated early. This avoids wasting resources on ineffective learning while preserving the model at its best observed performance.
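The PyTorch sketch below wires these pieces together, reusing the stand-in model from the previous sketch. The cycle step size, the epoch budget, and the train_one_epoch/evaluate helpers are hypothetical placeholders; note that cycle_momentum must be disabled because Adam carries no momentum entry in its parameter groups.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer,
    base_lr=8e-5, max_lr=1e-3,  # bounds stated above
    step_size_up=2000,          # iterations per half cycle (assumed)
    mode="triangular",          # triangular sweep between the bounds (assumed)
    cycle_momentum=False,       # required when using Adam
)

best_loss, patience, stale_epochs = float("inf"), 20, 0
for epoch in range(200):                           # epoch budget (assumed)
    train_one_epoch(model, optimizer, scheduler)   # hypothetical helper
    val_loss = evaluate(model)                     # hypothetical helper
    if val_loss < best_loss:
        best_loss, stale_epochs = val_loss, 0
        torch.save(model.state_dict(), "best_icat_net.pt")  # keep best weights
    else:
        stale_epochs += 1
        if stale_epochs >= patience:               # early stopping
            break
```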
Based on the EQTransformer [2] configuration, this study sets the detection thresholds for P-waves (primary, longitudinal waves) and S-waves (secondary, transverse waves) at 0.3, while the threshold for detecting seismic events is set at 0.5. These thresholds are chosen to balance detection rates against false alarm rates, ensuring that the model maintains high sensitivity while accurately excluding non-seismic signals. In the probability distribution of the model’s output, the arrival time of a phase is determined by locating the peak of the distribution. Specifically, if the peak probability exceeds the corresponding threshold, the model marks that time point as a potential seismic phase arrival.
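A sketch of this thresholding and peak-picking rule is shown below, assuming NumPy arrays of per-sample probabilities and a 100 Hz sampling rate (the rate used by STEAD waveforms); the helper name and sample interval are illustrative assumptions.

```python
import numpy as np

def pick_phases(det, p_prob, s_prob, dt=0.01,
                det_thresh=0.5, phase_thresh=0.3):
    """Declare an event when the detection trace exceeds 0.5, and place a
    phase arrival at the peak of the P/S probability trace if that peak
    exceeds 0.3. dt is the sample interval in seconds (assumed 100 Hz)."""
    result = {"event": bool((det >= det_thresh).any())}
    for name, prob in (("P", p_prob), ("S", s_prob)):
        peak = int(np.argmax(prob))  # locate the peak of the distribution
        result[name] = peak * dt if prob[peak] >= phase_thresh else None
    return result

# Example with synthetic probability traces of length 6000 (60 s at 100 Hz)
rng = np.random.default_rng(0)
det = rng.random(6000) * 0.2
det[2000:3000] = 0.9          # simulated event window
p = rng.random(6000) * 0.1
p[2100] = 0.8                 # simulated P-arrival peak
s = rng.random(6000) * 0.1
s[2600] = 0.7                 # simulated S-arrival peak
print(pick_phases(det, p, s))  # {'event': True, 'P': 21.0, 'S': 26.0}
```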
Appendix C Evaluation
Different evaluation metrics are crucial for assessing a model’s performance on specific tasks. This study utilizes multiple statistical metrics to comprehensively evaluate the model’s performance, including precision (\({\text {Pr}}\)) [50], recall (\({\text {Re}}\)) [50], F1 score (F1) [51], mean absolute error (\({\text {MAE}}\)) [52], standard deviation (\({\text {Std}}\)) [53], and mean error (\({\text {Mean}}\)) [53], defined as follows:

\[ {\text {Pr}} = \frac{T_{{\text {p}}}}{T_{{\text {p}}} + F_{{\text {p}}}}, \qquad {\text {Re}} = \frac{T_{{\text {p}}}}{T_{{\text {p}}} + F_{{\text {n}}}}, \qquad {\text {F1}} = \frac{2 \cdot {\text {Pr}} \cdot {\text {Re}}}{{\text {Pr}} + {\text {Re}}} \]

\[ {\text {MAE}} = \frac{1}{N}\sum _{i=1}^{N} \left| y_i - {\hat{y}}_i \right| , \qquad {\text {Mean}} = \frac{1}{N}\sum _{i=1}^{N} \left( y_i - {\hat{y}}_i \right) , \qquad {\text {Std}} = \sqrt{\frac{1}{N}\sum _{i=1}^{N} \left( (y_i - {\hat{y}}_i) - {\text {Mean}} \right) ^2} \]
Specifically, \(T_{{\text {p}}}\) denotes the number of true positives, \(F_{{\text {p}}}\) the number of false positives, and \(F_{{\text {n}}}\) the number of false negatives; N is the total number of samples, \(y_i\) is the true label of sample i, and \({\hat{y}}_i\) is the predicted value for sample i. In the phase picking tasks, this study counts a sample as a true positive when its picking residual lies within \(\delta < 0.1\,s\) in order to assess model performance accurately. This criterion rewards picks with small residuals and reduces the number of false positives.
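As an illustration, the following sketch computes these metrics for a phase-picking task under the stated \(\delta < 0.1\,s\) criterion. The NaN convention for missed picks, the function name, and the choice to summarize residuals over all detected picks are simplifying assumptions.

```python
import numpy as np

def picking_metrics(true_times, pred_times, delta=0.1):
    """A pick is a true positive when its residual is within delta = 0.1 s;
    Pr, Re, and F1 follow from the TP/FP/FN counts, while MAE, Mean, and
    Std summarize the residuals of the detected picks. Missed picks are
    encoded as NaN in pred_times (an assumed convention)."""
    true_times, pred_times = np.asarray(true_times), np.asarray(pred_times)
    detected = ~np.isnan(pred_times)
    residuals = pred_times[detected] - true_times[detected]
    tp = int(np.sum(np.abs(residuals) < delta))   # residual within delta
    fp = int(np.sum(np.abs(residuals) >= delta))  # pick too far from truth
    fn = int(np.sum(~detected))                   # phase missed entirely
    pr = tp / (tp + fp) if tp + fp else 0.0
    re = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pr * re / (pr + re) if pr + re else 0.0
    return {"Pr": pr, "Re": re, "F1": f1,
            "MAE": float(np.mean(np.abs(residuals))),
            "Mean": float(np.mean(residuals)),
            "Std": float(np.std(residuals))}

# Four true arrivals; one pick off by 0.15 s, one phase missed
print(picking_metrics([1.0, 2.0, 3.0, 4.0], [1.02, 1.85, 3.05, np.nan]))
# -> Pr ≈ 0.67, Re ≈ 0.67, F1 ≈ 0.67, plus residual statistics
```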
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, XN., Chen, FJ., Lai, YP. et al. ICAT-net: a lightweight neural network with optimized coordinate attention and transformer mechanisms for earthquake detection and phase picking. J Supercomput 81, 191 (2025). https://doi.org/10.1007/s11227-024-06664-y