Abstract
Identifying temporal information about topics in a document set typically involves constructing a time decomposition of the time period associated with the document set. In earlier work, we formulated several metrics on a time decomposition, such as size, information loss, and variability, and gave dynamic-programming-based algorithms to construct time decompositions that are optimal with respect to these metrics. Computing information loss values for all subintervals of the time period is central to the computation of optimal time decompositions. This paper proposes several algorithms that help construct an optimal time decomposition more efficiently. More efficient, parallelizable algorithms for computing loss values are described. An efficient top-down greedy heuristic for constructing a time decomposition is also presented, along with experiments that study its performance. Although the lossy time decompositions constructed by the greedy heuristic are suboptimal, they seem to be better than the widely used uniform-length decompositions.
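To make the three ingredients named in the abstract concrete (a table of loss values for all subintervals, a dynamic program over that table, and a top-down greedy heuristic), the following is a minimal sketch under stated assumptions, not the paper's algorithms. The abstract does not define its information loss measure, so the sketch substitutes a hypothetical stand-in: the sum of squared deviations of a numeric series within each interval. The function names `loss_table`, `optimal_decomposition`, and `greedy_decomposition` are illustrative, and the greedy rule shown (apply the single split that most reduces total loss) is one plausible reading of "top-down greedy".

```python
from itertools import accumulate


def loss_table(values):
    """Stand-in loss for every subinterval [i, j) of the base intervals.

    ASSUMPTION: loss = sum of squared deviations from the interval mean.
    The paper's information loss measure is not given in the abstract.
    """
    n = len(values)
    s = [0.0] + list(accumulate(values))                    # prefix sums
    sq = [0.0] + list(accumulate(v * v for v in values))    # prefix sums of squares
    loss = {}
    for i in range(n):
        for j in range(i + 1, n + 1):
            total = s[j] - s[i]
            loss[i, j] = (sq[j] - sq[i]) - total * total / (j - i)
    return loss


def optimal_decomposition(n, k, loss):
    """Dynamic program: minimum total loss split of [0, n) into exactly k intervals."""
    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                cand = best[m - 1][i] + loss[i, j]
                if cand < best[m][j]:
                    best[m][j] = cand
                    back[m][j] = i
    # Walk the back-pointers to recover the intervals.
    parts, j = [], n
    for m in range(k, 0, -1):
        i = back[m][j]
        parts.append((i, j))
        j = i
    return parts[::-1], best[k][n]


def greedy_decomposition(n, k, loss):
    """Top-down greedy: repeatedly apply the split that reduces total loss most."""
    parts = [(0, n)]
    while len(parts) < k:
        best_gain, best_split = None, None
        for idx, (i, j) in enumerate(parts):
            for c in range(i + 1, j):
                gain = loss[i, j] - loss[i, c] - loss[c, j]
                if best_gain is None or gain > best_gain:
                    best_gain, best_split = gain, (idx, c)
        if best_split is None:  # every interval already has length 1
            break
        idx, c = best_split
        i, j = parts[idx]
        parts[idx:idx + 1] = [(i, c), (c, j)]
    return parts, sum(loss[p] for p in parts)
```

Under these assumptions the greedy result can never have lower total loss than the dynamic-programming result for the same number of intervals, which mirrors the abstract's remark that the greedy decompositions are suboptimal. Each entry of the loss table depends only on its own subinterval, so the table could in principle be filled in parallel.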
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chundi, P., Zhang, R., Rosenkrantz, D.J. (2005). Efficient Algorithms for Constructing Time Decompositions of Time Stamped Documents. In: Andersen, K.V., Debenham, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2005. Lecture Notes in Computer Science, vol 3588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11546924_50
DOI: https://doi.org/10.1007/11546924_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28566-3
Online ISBN: 978-3-540-31729-6
eBook Packages: Computer Science, Computer Science (R0)