DOI: 10.1145/2339530.2339758
research-article

On nested palindromes in clickstream data

Published: 12 August 2012

Abstract

In this paper we discuss an interesting and useful property of clickstream data. A visit often includes repeated views of the same page. We show that in three real datasets, sampled from the websites of technology and consulting groups and a news broadcaster, the majority of page repetitions occur in a very specific structure, namely as nested palindromes. This can be explained by the widespread use of two features available in any web browser: the "refresh" and "back" buttons. Among the types of patterns that can be mined from sequence data, many either stumble when symbol repetitions are involved or fail to capture interesting aspects of those repetitions. To remedy this, we characterize the palindromic structures and discuss possible ways of making use of them. One way is to pre-process the sequence data by explicitly inserting these structures, in order to obtain richer output from conventional mining algorithms. Another is to use the information directly to analyze certain aspects of the website under study. We also provide the simple linear-time algorithm that we developed to identify and extract the structures from our data.
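To make the idea concrete, the sketch below shows one way "refresh" and "back" clicks could give rise to nested palindromes, and how a single pass might label them. It is not the algorithm from the paper, which the abstract does not describe; the function name `label_clicks`, the three labels, and the assumption that a back click always returns to the immediately preceding page in the history are all illustrative choices.

```python
from typing import Hashable, List, Sequence, Tuple


def label_clicks(visit: Sequence[Hashable]) -> List[Tuple[Hashable, str]]:
    """Label each page view in a visit as 'forward', 'refresh' or 'back'.

    Hypothetical helper: it simulates a browser history with a stack and
    makes one pass over the visit, so it runs in linear time.
    """
    history: List[Hashable] = []  # simulated browser history (a stack)
    labelled: List[Tuple[Hashable, str]] = []
    for page in visit:
        if history and page == history[-1]:
            # Same page as the current one: treated as a refresh (A A).
            kind = "refresh"
        elif len(history) >= 2 and page == history[-2]:
            # Same page as the one before the current one: treated as a
            # back click (A B A), so the current page leaves the history.
            history.pop()
            kind = "back"
        else:
            # Anything else is treated as following a link to a new page.
            history.append(page)
            kind = "forward"
        labelled.append((page, kind))
    return labelled


if __name__ == "__main__":
    # The visit A B C B D B A contains the palindromes B C B and B D B
    # nested inside the outer palindrome A ... A.
    for page, kind in label_clicks(list("ABCBDBA")):
        print(page, kind)
```

Under these assumptions, the visit A B C B D B A is labelled forward, forward, forward, back, forward, back, back. Keeping only the "forward" views would give a repetition-free path, which is one simple form of the pre-processing mentioned above.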

Cited By

  • (2017) Moodle-based data mining potentials of MOOC systems at the University of Szeged. 2017 40th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), pages 755-760. DOI: 10.23919/MIPRO.2017.7973523. Online publication date: May-2017
  • (2016) Mining MOOC Clickstreams: Video-Watching Behavior vs. In-Video Quiz Performance. IEEE Transactions on Signal Processing, 64(14), pages 3677-3692. DOI: 10.1109/TSP.2016.2546228. Online publication date: 15-Jul-2016
  • (2012) Detection of HTTP-GET attack with clustering and information theoretic measurements. Proceedings of the 5th international conference on Foundations and Practice of Security, pages 45-61. DOI: 10.1007/978-3-642-37119-6_4. Online publication date: 25-Oct-2012

Published In

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2012
1616 pages
ISBN:9781450314626
DOI:10.1145/2339530
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. backtrack
  2. palindrome
  3. refresh
  4. web usage mining

Qualifiers

  • Research-article

Conference

KDD '12
Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
