Efficient Algorithms for Segmentation of Item-Set Time Series

Chundi, Parvathi; Rosenkrantz, Daniel J.

doi:10.1007/978-1-4020-9688-4_10

Abstract

We propose a special type of time series, which we call an item-set time series, to facilitate the temporal analysis of software version histories, email logs, stock market data, etc. In an item-set time series, each observed data value is a set of discrete items. We formalize the concept of an item-set time series and present efficient algorithms for segmenting a given item-set time series. Segmentation of a time series partitions the time series into a sequence of segments where each segment is constructed by combining consecutive time points of the time series. Each segment is associated with an item set that is computed from the item sets of the time points in that segment, using a function which we call a measure function. We then define a concept called the segment difference, which measures the difference between the item set of a segment and the item sets of the time points in that segment. The segment difference values are required to construct an optimal segmentation of the time series. We describe novel and efficient algorithms to compute segment difference values for each of the measure functions described in the paper. We outline a dynamic programming based scheme to construct an optimal segmentation of the given item-set time series. We use the item-set time series segmentation techniques to analyze the temporal content of three different data sets—Enron email, stock market data, and a synthetic data set. The experimental results show that an optimal segmentation of item-set time series data captures much more temporal content than a segmentation constructed based on the number of time points in each segment, without examining the item set data at the time points, and can be used to analyze different types of temporal data.
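The pipeline outlined in the abstract (measure function, segment difference, dynamic programming) can be illustrated with a minimal sketch. The abstract does not give the formal definitions, so everything concrete here is an assumption made for illustration: the measure functions are taken to be plain set union and intersection, the segment difference is taken to be the total size of the symmetric differences between the segment's item set and each time point's item set, and all function names are hypothetical. The segment differences are computed naively, so this sketch does not reproduce the efficient algorithms the chapter describes.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

ItemSet = FrozenSet[str]
Measure = Callable[[Sequence[ItemSet]], ItemSet]


def union_measure(points: Sequence[ItemSet]) -> ItemSet:
    """Assumed measure function: items present at any time point of the segment."""
    return frozenset().union(*points)


def intersection_measure(points: Sequence[ItemSet]) -> ItemSet:
    """Assumed measure function: items present at every time point of the segment."""
    return frozenset.intersection(*points)


def segment_difference(points: Sequence[ItemSet], measure: Measure) -> int:
    """Assumed definition: sum of symmetric-difference sizes between the
    segment's item set and the item set of each time point in the segment."""
    segment_items = measure(points)
    return sum(len(segment_items ^ p) for p in points)


def optimal_segmentation(
    series: Sequence[ItemSet], k: int, measure: Measure
) -> Tuple[int, List[Tuple[int, int]]]:
    """Split `series` into k consecutive segments with minimum total
    segment difference. Returns (cost, [(start, end), ...]) with end exclusive."""
    n = len(series)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of time points")

    # diff[i][j] is the segment difference of series[i:j], computed naively.
    diff = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            diff[i][j] = segment_difference(series[i:j], measure)

    inf = float("inf")
    # cost[s][j]: best cost of covering the first j time points with s segments.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0
    for s in range(1, k + 1):
        for j in range(s, n + 1):
            for i in range(s - 1, j):  # i is where the last segment starts
                candidate = cost[s - 1][i] + diff[i][j]
                if candidate < cost[s][j]:
                    cost[s][j] = candidate
                    back[s][j] = i

    # Walk the back-pointers from the end to recover segment boundaries.
    bounds: List[Tuple[int, int]] = []
    j = n
    for s in range(k, 0, -1):
        i = back[s][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return cost[k][n], bounds
```

For example, calling `optimal_segmentation(series, 2, union_measure)` on a list of `frozenset` values returns the minimum total difference together with the half-open index ranges of the two segments. Swapping in `intersection_measure` changes only how each segment's item set is derived, not the dynamic program.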

Author information

Authors and Affiliations

Computer Science Department, University of Nebraska at Omaha, Omaha, NE, 68106, USA
Parvathi Chundi
Computer Science Department, SUNY at Albany, Albany, NY, 12222, USA
Daniel J. Rosenkrantz

Corresponding author

Correspondence to Parvathi Chundi.

Editor information

Editors and Affiliations

Department of Computer Science, University at Albany—SUNY, 1400 Washington Avenue, Albany, NY, 12222, USA
S. S. Ravi
Bradley Dept. Electrical and Computer Engineering, Virginia Tech, 302 Whittemore (0111), Blacksburg, VA, 24061, USA
Sandeep K. Shukla

About this chapter

Cite this chapter

Chundi, P., Rosenkrantz, D.J. (2009). Efficient Algorithms for Segmentation of Item-Set Time Series. In: Ravi, S.S., Shukla, S.K. (eds) Fundamental Problems in Computing. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-9688-4_10

DOI: https://doi.org/10.1007/978-1-4020-9688-4_10
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-9687-7
Online ISBN: 978-1-4020-9688-4
eBook Packages: Computer Science, Computer Science (R0)
