Abstract
With the accumulation of electronically stored documents, acquiring valuable knowledge from notable trends in technical terms has attracted attention as a topic in text mining. To support the discovery of key topics that appear as key terms in such temporal textual datasets, we propose a method based on temporal patterns in several data-driven indices for text mining. The method consists of automatic term extraction from the given documents, three importance indices, temporal pattern extraction using temporal clustering, and trend detection based on the linear trends of the cluster centroids. In the empirical studies, we apply the three importance indices to the titles of two academic conferences in the field of artificial intelligence, treating the titles as the sets of documents. After extracting the temporal patterns of the automatically extracted terms, we discuss the trends of the terms, including recent burst words, among the titles of the conferences.
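The last step named in the abstract, trend detection from the linear trends of cluster centroids, can be illustrated with a minimal sketch. It assumes that the terms have already been grouped into clusters and that each term carries one index value per time period. The function names, the tolerance `tol`, the labels "rising", "falling", and "flat", and the sample values are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def centroid(series_list):
    """Point-wise mean of equal-length time series (one value per period)."""
    return [mean(values) for values in zip(*series_list)]


def linear_trend(values):
    """Least-squares slope and intercept of values against time steps 0..n-1."""
    xs = range(len(values))
    x_bar = mean(xs)
    y_bar = mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def label_trend(slope, tol=0.05):
    """Map a slope to a coarse label; tol is an assumed threshold."""
    if slope > tol:
        return "rising"
    if slope < -tol:
        return "falling"
    return "flat"


# Hypothetical index values per period for the terms in two clusters.
clusters = {
    "A": [[0.0, 1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0]],
    "B": [[4.0, 3.0, 2.0, 1.0]],
}

trends = {
    name: label_trend(linear_trend(centroid(members))[0])
    for name, members in clusters.items()
}
print(trends)
```

Fitting the line to the centroid rather than to each term gives one trend per temporal pattern, which matches the abstract's description of trends being read from the centroids.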
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abe, H., Tsumoto, S. (2011). Evaluating a Temporal Pattern Detection Method for Finding Research Keys in Bibliographical Data. In: Peters, J.F., et al. Transactions on Rough Sets XIV. Lecture Notes in Computer Science, vol 6600. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21563-6_1
DOI: https://doi.org/10.1007/978-3-642-21563-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21562-9
Online ISBN: 978-3-642-21563-6
eBook Packages: Computer Science (R0)