
A hidden semi-Markov model for chart pattern matching in financial time series

Soft Computing · Methodologies and Application

Abstract

Many pattern matching approaches have been applied to financial time series to detect chart patterns and predict price trends. In this paper, we propose an extended hidden semi-Markov model for chart pattern matching (HSMM-CP). In our approach, a hidden semi-Markov model is trained and a Viterbi algorithm is used to detect chart patterns. The proposed approach not only simplifies the traditional way of training an HSMM but also reduces potential biases in parameter initialisation. We compare the proposed model with current approaches on a set of templates selected from 53 chart patterns. Experiments on a synthetic dataset show that the proposed approach achieves the highest average accuracy and recall among the compared pattern matching approaches. Specifically, the HSMM-CP approach achieves the highest accuracy for the “Triangles, Ascending”, “Head-and-Shoulders Tops”, “Triple Tops” and “Cup with Handle” patterns. Moreover, the experimental results show that HSMM-CP performs significantly better than the other approaches in distinguishing patterns with similar shapes, such as “Head-and-Shoulders Tops” and “Triple Tops”. Experiments are also conducted on a real dataset comprising the historical prices of several stocks.
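As a rough illustration of the decoding step mentioned above, the sketch below implements textbook explicit-duration HSMM Viterbi decoding: each hidden state emits a whole segment of observations whose length is drawn from a per-state duration distribution, and self-transitions are excluded. The Gaussian emission model, the parameter layout and the function names are assumptions chosen for illustration; this is not the authors' HSMM-CP model, and it does not cover training.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    """Log density of N(mean, std^2) at x (assumed emission model)."""
    var = std * std
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def hsmm_viterbi(
    obs: Sequence[float],
    means: Sequence[float],
    stds: Sequence[float],
    log_pi: Sequence[float],
    log_a: Sequence[Sequence[float]],
    log_dur: Sequence[Sequence[float]],
) -> Tuple[float, List[Tuple[int, int, int]]]:
    """Explicit-duration HSMM Viterbi (hypothetical helper).

    log_dur[j][d - 1] is the log-probability that state j lasts d steps;
    every state is assumed to share the same maximum duration D.
    Returns (best log-likelihood, [(state, start, end_exclusive), ...]).
    """
    T, N, D = len(obs), len(means), len(log_dur[0])
    # Prefix sums of emission log-likelihoods: cum[j][t] = sum_{s < t} log b_j(obs[s]).
    cum = [[0.0] * (T + 1) for _ in range(N)]
    for j in range(N):
        for t, x in enumerate(obs):
            cum[j][t + 1] = cum[j][t] + gaussian_logpdf(x, means[j], stds[j])
    # delta[t][j]: best score of obs[:t] whose last segment is state j ending at t.
    delta = [[NEG_INF] * N for _ in range(T + 1)]
    back = [[(0, -1)] * N for _ in range(T + 1)]  # (segment duration, previous state)
    for t in range(1, T + 1):
        for j in range(N):
            for d in range(1, min(D, t) + 1):
                start = t - d
                if start == 0:
                    entry, prev = log_pi[j], -1  # first segment: initial probability
                else:
                    entry, prev = max(
                        (delta[start][i] + log_a[i][j], i) for i in range(N)
                    )
                score = entry + log_dur[j][d - 1] + cum[j][t] - cum[j][start]
                if score > delta[t][j]:
                    delta[t][j] = score
                    back[t][j] = (d, prev)
    best, j = max((delta[T][j], j) for j in range(N))
    if best == NEG_INF:
        raise ValueError("no valid segmentation under the given parameters")
    # Backtrack segment by segment from the end of the sequence.
    segments, t = [], T
    while t > 0:
        d, prev = back[t][j]
        segments.append((j, t - d, t))
        t, j = t - d, prev
    return best, segments[::-1]


if __name__ == "__main__":
    # Two-state left-to-right toy model: a "low" leg followed by a "high" leg.
    _, segs = hsmm_viterbi(
        [0.0, 0.0, 0.0, 10.0, 10.0], [0.0, 10.0], [1.0, 1.0],
        [0.0, NEG_INF],
        [[NEG_INF, 0.0], [NEG_INF, NEG_INF]],
        [[math.log(0.2)] * 5, [math.log(0.2)] * 5],
    )
    print(segs)  # → [(0, 0, 3), (1, 3, 5)]
```

In a chart-pattern setting one could, for instance, order the states left to right so that each state stands for one leg of the pattern; the decoded segments then mark where each leg starts and ends.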


Figs. 1–13 (images not included in this preview)


Acknowledgements

This research was funded by the Research Committee of the University of Macau, Grants MYRG2017-00029-FST and MYRG2016-00148-FST.

Author information

Corresponding author

Correspondence to Yain-Whar Si.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by V. Loia.

Appendices

Appendix A: How to set thresholds for the TB, ED and DTW approaches

Fig. 14

Similarity calculation results using the TB approach on the four datasets with different sub-sequence lengths. The y-axis denotes the similarity, and the x-axis denotes the case ID. The red crosses are the similarities calculated for the 50 positive cases, and the green crosses are those calculated for the 50 negative cases. The blue lines are the thresholds, which remain constant across the four lengths.

Fig. 15

Similarity calculation results using the ED approach on the four datasets with different sub-sequence lengths. The y-axis denotes the similarity, and the x-axis denotes the case ID. The red crosses are the similarities calculated for the 50 positive cases, and the green crosses are those calculated for the 50 negative cases. The blue lines are the thresholds, which increase with the sub-sequence length.

Fig. 16

Similarity calculation results using the DTW approach on the four datasets with different sub-sequence lengths. The y-axis denotes the similarity, and the x-axis denotes the case ID. The red crosses are the similarities calculated for the 50 positive cases, and the green crosses are those calculated for the 50 negative cases. The blue lines are the thresholds, which increase with the sub-sequence length.
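The ED and DTW distances plotted in Figs. 15 and 16, together with one simple way of picking a threshold that separates the positive from the negative cases, can be sketched as follows. The absolute-difference DTW cost, the midpoint rule for the threshold and all function names are assumptions for illustration; the article does not state them, and its reported thresholds (e.g. 20, 40, 85 and 143 for ED) look hand-picked rather than computed by this rule.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Point-wise Euclidean distance; both series must have the same length."""
    if len(a) != len(b):
        raise ValueError("ED requires equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping, absolute-difference cost (assumed)."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def separating_threshold(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Hypothetical rule: midpoint between the largest positive and smallest negative distance."""
    hi_pos, lo_neg = max(pos), min(neg)
    if hi_pos >= lo_neg:
        raise ValueError("positive and negative distances overlap")
    return (hi_pos + lo_neg) / 2
```

Because ED compares points one to one, a template would have to be resampled to the sub-sequence length before use, whereas DTW can align series of different lengths; that difference is a property of the two measures, not a detail reported in the article.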

We use the H&S-T pattern as an example to illustrate how the thresholds were set in the experiment. We began by generating four datasets, one for each sub-sequence length (19, 43, 85 and 127), each containing one hundred time series: the first fifty are H&S-T positive time series and the last fifty are randomly generated negative time series. We then used the TB, ED and DTW approaches to calculate the similarity between each time series in the four datasets and the H&S-T template. Since the first 50 series are positive cases, their distances had to be smaller than those of the last 50, and the threshold was chosen to separate the two groups.

As shown in Fig. 14a–d, the threshold of the TB approach remained fixed as the length of the time series increased; for the H&S-T pattern, the TB threshold is \(\theta =0.1\). In contrast, the thresholds of the ED approach (Fig. 15a–d) and the DTW approach (Fig. 16a–d) increased with the length of the time series, so we modelled each of them as a linear function of length.

For the ED approach, the thresholds in Fig. 15a–d are 20, 40, 85 and 143 for lengths of 19, 43, 85 and 127, respectively. Regressing the threshold on length gives a slope \(\alpha =1.1417\) and an intercept \(\beta =-6.2079\), so the ED threshold for a sub-sequence of length \(n\) is \(\alpha n+\beta\). For the DTW approach, the thresholds in Fig. 16a–d are 6, 10, 22 and 34 for the same four lengths, and the regression gives a slope \(\gamma =0.2649\) and an intercept \(\varepsilon =-0.1457\), so the DTW threshold is \(\gamma n+\varepsilon\).
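The regression described above can be checked with an ordinary least-squares fit of the four thresholds against length. The sketch below assumes plain OLS (the article only says "regressed"); the helper name `fit_line` is hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y ≈ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


lengths = [19, 43, 85, 127]
alpha, beta = fit_line(lengths, [20, 40, 85, 143])  # ED thresholds from Fig. 15
gamma, eps = fit_line(lengths, [6, 10, 22, 34])     # DTW thresholds from Fig. 16
print(f"ED:  alpha={alpha:.4f}, beta={beta:.4f}")   # ED:  alpha=1.1417, beta=-6.2079
print(f"DTW: gamma={gamma:.4f}, epsilon={eps:.4f}")  # DTW: gamma=0.2649, epsilon=-0.1457
```

Rounded to four decimals, this reproduces α = 1.1417, β = -6.2079, γ = 0.2649 and ε = -0.1457, which suggests the reported coefficients are an ordinary least-squares fit to the four per-length thresholds.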

Appendix B: Experimental settings for a synthetic dataset

The experimental settings are shown in Table 16.

Table 16

In the experiment conducted to distinguish H&S-T and Trip-T patterns, the setting was (Distinguish, 100, 115, 7, 0.1, 1.1417, -6.2079, 0.2649, -0.1457). Distinguish is a dataset containing 50 H&S-T and 50 Trip-T time series. Because the H&S-T patterns were designed as the positive cases, the threshold settings for the TB, ED and DTW approaches match those of the H&S-T pattern recognition experiment.
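Assuming, as in Appendix A, that a sub-sequence is accepted as an H&S-T match when its TB, ED or DTW distance falls below the threshold, the last five entries of the Table 16 setting translate into the decision rule sketched below. Only the numeric parameters come from the article; the function names and the strict "below the threshold" comparison are assumptions, and the other tuple entries (100, 115, 7) are left unused because their meaning is not stated in this excerpt.

```python
# Threshold parameters for the H&S-T template (Appendix A / Table 16).
THETA_TB = 0.1                    # TB: constant threshold
ALPHA, BETA = 1.1417, -6.2079     # ED: threshold = ALPHA * n + BETA
GAMMA, EPSILON = 0.2649, -0.1457  # DTW: threshold = GAMMA * n + EPSILON


def threshold(method: str, n: int) -> float:
    """Hypothetical helper: matching threshold for a sub-sequence of length n."""
    if method == "TB":
        return THETA_TB
    if method == "ED":
        return ALPHA * n + BETA
    if method == "DTW":
        return GAMMA * n + EPSILON
    raise ValueError(f"unknown method: {method}")


def is_match(method: str, distance: float, n: int) -> bool:
    """Assumed rule: accept the sub-sequence when its distance is below the threshold."""
    return distance < threshold(method, n)
```

For example, at n = 100 the linear models give an ED threshold of about 107.96 and a DTW threshold of about 26.34, while the TB threshold stays at 0.1.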


Cite this article

Wan, Y., Si, YW. A hidden semi-Markov model for chart pattern matching in financial time series. Soft Comput 22, 6525–6544 (2018). https://doi.org/10.1007/s00500-017-2703-7

  • DOI: https://doi.org/10.1007/s00500-017-2703-7
