Fast Construction of Generalized Suffix Trees over a Very Large Alphabet

Chen, Zhixiang; Fowler, Richard; Fu, Ada Wai-Chee; Wang, Chunyue

doi:10.1007/3-540-45071-8_30

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2697)

Included in the following conference series:

International Computing and Combinatorics Conference

Abstract

The work in this paper is motivated by real-world problems such as mining frequent traversal path patterns from very large Web logs. Generalized suffix trees over a very large alphabet can be used to solve such problems. However, traditional algorithms such as the Weiner, Ukkonen and McCreight algorithms do not sufficiently guarantee practicality, because of the large size of both the alphabet and the set of strings in those real-world problems. Two new algorithms are designed for fast construction of generalized suffix trees over a very large alphabet, and their performance is analyzed in comparison with the well-known Ukkonen algorithm. It is shown that these two algorithms perform better and handle large alphabets and large string sets well.
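
To make the data structure in the abstract concrete, here is a minimal Python sketch of a generalized suffix tree over a large integer alphabet, assuming each Web session is a list of integer page IDs. It inserts every suffix naively (quadratic time) and keys child edges with a hash map, so a node's size does not grow with the alphabet. It is neither the Ukkonen algorithm nor either of the two algorithms proposed in the paper; the names `GeneralizedSuffixTree` and `count_containing` and the sample sessions are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    # Edge label into this node, stored as the slice seqs[sid][start:end]
    # rather than a copied list, so each edge costs O(1) space.
    sid: int = -1
    start: int = 0
    end: int = 0
    # Children keyed by the first symbol of their edge label. A hash map keeps
    # child lookup O(1) expected however large the alphabet is, whereas a
    # fixed array per node would need one slot per possible symbol.
    children: dict[int, Node] = field(default_factory=dict)
    # IDs of the input sequences that contain this node's path label.
    owners: set[int] = field(default_factory=set)


class GeneralizedSuffixTree:
    """Naive O(total length^2) generalized suffix tree over integer symbols.

    Every suffix of every sequence is inserted one at a time. No end markers
    are used, so a suffix may end at an internal node (and a few nodes can
    have a single child); the substring counts below still hold.
    """

    def __init__(self, seqs: list[list[int]]) -> None:
        self.seqs = [list(s) for s in seqs]
        self.root = Node()
        for sid, seq in enumerate(self.seqs):
            for i in range(len(seq)):
                self._insert_suffix(sid, i)

    def _insert_suffix(self, sid: int, i: int) -> None:
        seq = self.seqs[sid]
        node, j = self.root, i
        node.owners.add(sid)
        while j < len(seq):
            key = seq[j]
            child = node.children.get(key)
            if child is None:
                # No edge starts with this symbol: hang the rest as a new leaf.
                node.children[key] = Node(sid, j, len(seq), owners={sid})
                return
            src = self.seqs[child.sid]
            k = child.start
            # Walk along the edge while the symbols agree.
            while k < child.end and j < len(seq) and src[k] == seq[j]:
                k += 1
                j += 1
            if k < child.end:
                # Stopped inside the edge: split it at position k.
                mid = Node(child.sid, child.start, k, owners=child.owners | {sid})
                child.start = k
                mid.children[src[k]] = child
                node.children[key] = mid
                if j < len(seq):
                    mid.children[seq[j]] = Node(sid, j, len(seq), owners={sid})
                return
            # Consumed the whole edge: descend and keep matching.
            child.owners.add(sid)
            node = child

    def count_containing(self, pattern: list[int]) -> int:
        """Number of input sequences that contain `pattern` contiguously."""
        node, j = self.root, 0
        while j < len(pattern):
            child = node.children.get(pattern[j])
            if child is None:
                return 0
            src = self.seqs[child.sid]
            k = child.start
            while k < child.end and j < len(pattern):
                if src[k] != pattern[j]:
                    return 0
                k += 1
                j += 1
            node = child
        return len(node.owners)


# Each list is one (hypothetical) user session of page IDs.
sessions = [
    [101, 205, 307, 205, 307],
    [205, 307, 412],
    [999_999, 101, 205],
]
gst = GeneralizedSuffixTree(sessions)
print(gst.count_containing([205, 307]))  # 2
print(gst.count_containing([307, 205]))  # 1
```

Counting how many sessions contain a given path is one plausible way such a tree supports frequent traversal path mining; the linear-time constructions named in the abstract build the same kind of tree without the quadratic insertion cost.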

References

J. Borges and M. Levene, Data mining of user navigation patterns, MS99, 1999.
A.G. Buchner and M.D. Mulvenna, Discovering internet marketing intelligence through online analytical web usage mining, ACM SIGMOD Record, pages 54–61, Dec. 1998.
L. Catledge and J. Pitkow, Characterizing browsing behaviors on the World Wide Web, Computer Networks and ISDN Systems, 27, 1995.
Z. Chen, R. Fowler, and A. Fu, Linear time algorithms for finding maximal forward references, Proc. of the IEEE Intl. Conf. on Info. Tech.: Coding & Computing (ITCC 2003), 2003.
Z. Chen, R. Fowler, A. Fu, and C. Wang, Linear and sublinear time algorithms for mining frequent traversal path patterns from very large Web logs, Proceedings of the Seventh International Database Engineering and Applications Symposium, 2003.
Z. Chen, A. Fu, and F. Tong, Optimal algorithms for finding user access sessions from very large Web logs, Advances in Knowledge Discovery and Data Mining/PAKDD’02, Lecture Notes in Computer Science 2336, pages 290–296, 2002. (Full version will appear in Journal of World Wide Web: Internet and Information Systems, 2003.)
M.S. Chen, J.S. Park, and P.S. Yu, Efficient data mining for path traversal patterns, IEEE Transactions on Knowledge and Data Engineering, 10(2), pages 209–221, 1998.
D. Gusfield, Algorithms on Strings, Trees, and Sequences, Cambridge University Press, 1997.
E. Hunt, M.P. Atkinson and R.W. Irving, A database index to large biological sequences, Proceedings of the 27th International Conference on Very Large Data Bases, pages 139–148, 2001.
R. Kosala and H. Blockeel, Web mining research: A survey, SIGKDD Explorations, 2(1), pages 1–15, 2000.
F. Masseglia, P. Poncelet, and R. Cicchetti, An efficient algorithm for Web usage mining, Networking and Information Systems Journal, 2(5–6), pages 571–603, 1999.
J. Pitkow and P. Pirolli, Mining longest repeating subsequences to predict World Wide Web surfing, Proc. of the Second USENIX Symposium on Internet Technologies & Systems, pages 11–14, 1999.
E.M. McCreight, A space-economical suffix tree construction algorithm, Journal of the ACM, 23(2), pages 262–272, 1976.
C. Shahabi, A.M. Zarkesh, J. Adibi, and V. Shah, Knowledge discovery from user’s web page navigation, Proceedings of the Seventh IEEE Intl. Workshop on Research Issues in Data Engineering (RIDE), pages 20–29, 1997.
Z. Su, Q. Yang, Y. Lu, and H. Zhang, WhatNext: A prediction system for Web requests using N-gram sequence models, Proc. of the First International Conference on Web Information Systems Engineering, pages 200–207, 2000.
E. Ukkonen, On-line construction of suffix trees, Algorithmica, 14(3), pages 249–260, 1995.
P. Weiner, Linear pattern matching algorithms, Proc. of the 14th IEEE Annual Symp. on Switching and Automata Theory, pages 1–11, 1973.

Author information

Authors and Affiliations

Department of Computer Science, University of Texas-Pan American, Edinburg, TX, 78539, USA
Zhixiang Chen, Richard Fowler & Chunyue Wang
Department of Computer Science, Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Ada Wai-Chee Fu

Editor information

Editors and Affiliations

Department of Computer Science, University of Texas at Austin, One University Station, C0500, Austin, TX, 78712, USA
Tandy Warnow
Department of Computer Science, Montana State University, EPS 357, Bozeman, MT, 59717, USA
Binhai Zhu

About this paper

Cite this paper

Chen, Z., Fowler, R., Fu, A.WC., Wang, C. (2003). Fast Construction of Generalized Suffix Trees over a Very Large Alphabet. In: Warnow, T., Zhu, B. (eds) Computing and Combinatorics. COCOON 2003. Lecture Notes in Computer Science, vol 2697. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45071-8_30

DOI: https://doi.org/10.1007/3-540-45071-8_30
Published: 24 June 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40534-4
Online ISBN: 978-3-540-45071-9
eBook Packages: Springer Book Archive
