Generalized regression model for sequence matching and clustering

Lei, Hansheng; Govindaraju, Venu

doi:10.1007/s10115-006-0008-8

Regular Paper
Published: 09 May 2006

Volume 12, pages 77–94, (2007)

Knowledge and Information Systems

Abstract

Linear relations have been found to be valuable in rule discovery for stocks, for example: if stock X goes up by a, stock Y will go down by b. Traditional linear regression models the linear relation between two sequences faithfully. However, if a user requires stocks to be clustered into groups whose sequences have high linearity or similarity with each other, it is prohibitively expensive to compare the sequences one by one. In this paper, we present the generalized regression model (GRM), which matches the linearity of multiple sequences at a time. GRM also gives strong heuristic support for graceful and efficient clustering. Experiments on stocks in the NASDAQ market mined interesting clusters of stock trends efficiently.
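
To make the baseline concrete, the following is a minimal sketch of the traditional two-sequence linear regression the abstract refers to, together with the one-by-one comparison it calls expensive. It is not the GRM itself, whose details are not given on this page. The function names `linear_fit` and `pairwise_r2`, the use of R² as the linearity score, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def linear_fit(x, y):
    """Least-squares fit y ~ slope * x + intercept; also returns R^2.

    R^2 is used here as an illustrative linearity score (an assumption,
    not necessarily the measure used in the paper).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # centered sequences
    yc = y - y.mean()
    sxx = np.dot(xc, xc)
    syy = np.dot(yc, yc)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("a constant sequence has no defined linearity")
    sxy = np.dot(xc, yc)
    slope = sxy / sxx
    intercept = y.mean() - slope * x.mean()
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


def pairwise_r2(sequences):
    """Compare every pair of sequences one by one.

    For n sequences this needs n * (n - 1) / 2 regressions, which is the
    cost that grows quickly as the number of stocks increases.
    """
    n = len(sequences)
    scores = np.eye(n)  # each sequence is perfectly linear with itself
    for i in range(n):
        for j in range(i + 1, n):
            _, _, r2 = linear_fit(sequences[i], sequences[j])
            scores[i, j] = scores[j, i] = r2
    return scores


# Hypothetical toy data: stock_y falls by 2 whenever stock_x rises by 1.
stock_x = [1.0, 2.0, 3.0, 4.0, 5.0]
stock_y = [8.0, 6.0, 4.0, 2.0, 0.0]
stock_z = [1.0, 3.0, 2.0, 5.0, 4.0]

slope, intercept, r2 = linear_fit(stock_x, stock_y)
scores = pairwise_r2([stock_x, stock_y, stock_z])
```

Here the fitted slope expresses a rule of the form "if X goes up by a, Y goes down by b", and the symmetric `scores` matrix is what a clustering step would have to compute in full under the one-by-one approach.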

Author information

Authors and Affiliations

Department of Computer Science and Computer Information Systems, University of Texas at Brownsville, Brownsville, TX, 78520, USA
Hansheng Lei
Computer Science and Engineering Department, The State University of New York at Buffalo, Amherst, NY, USA
Venu Govindaraju

Corresponding author

Correspondence to Hansheng Lei.

Additional information

Hansheng Lei received his BE from Ocean University of China in 1998, his MS from the University of Science and Technology of China in 2001, and his Ph.D. from the University at Buffalo, the State University of New York, in February 2006, all in computer science. He is currently an assistant professor in the CS/CIS Department, University of Texas at Brownsville. His research interests include biometrics, pattern recognition, machine learning, and data mining.

Venu Govindaraju is a professor of Computer Science and Engineering at the University at Buffalo (UB), State University of New York. He received his B.Tech. (Honors) from the Indian Institute of Technology (IIT), Kharagpur, India, in 1986, and his Ph.D. in Computer Science from UB in 1992. His research focuses on pattern recognition applications in the areas of biometrics and digital libraries.

About this article

Cite this article

Lei, H., Govindaraju, V. Generalized regression model for sequence matching and clustering. Knowl Inf Syst 12, 77–94 (2007). https://doi.org/10.1007/s10115-006-0008-8

Received: 23 January 2005
Revised: 12 August 2005
Accepted: 13 December 2005
Published: 09 May 2006
Issue Date: May 2007
DOI: https://doi.org/10.1007/s10115-006-0008-8
