Analysis of the Statistical Characteristics in Mining of Frequent Sequences

  • Conference paper

Part of the book series: Advances in Soft Computing (AINSC, volume 31)

Abstract

The paper deals with the search for and analysis of subsequences in very long sequences (texts, DNA sequences, etc.). A new algorithm for mining frequent sequences, ProMFS, is proposed and investigated. It is based on estimated probabilistic-statistical characteristics of the occurrence of sequence elements and of their order. The algorithm builds a new, much shorter sequence and makes decisions about the main sequence based on the results of analyzing the shorter one.
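The abstract only names the kind of statistics ProMFS relies on. As a rough illustration, and not the authors' algorithm, the Python sketch below estimates two such characteristics for a symbol sequence: the relative frequency of each element, and the probability that one element immediately follows another. The function names and the choice of first-order (adjacent) successor statistics are illustrative assumptions; the characteristics actually used in the paper, and how ProMFS derives its shorter sequence from them, may differ.

```python
from collections import Counter


def element_probabilities(seq):
    """Estimate P(element) as its relative frequency in seq.

    Assumption: plain relative frequency; the paper's estimator may differ.
    """
    counts = Counter(seq)
    n = len(seq)
    return {e: c / n for e, c in counts.items()}


def successor_probabilities(seq):
    """Estimate P(b immediately follows a) from adjacent pairs in seq.

    Assumption: first-order (distance-1) order statistics only.
    """
    pair_counts = Counter(zip(seq, seq[1:]))
    first_counts = Counter(seq[:-1])  # how often each element has a successor
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}


def count_occurrences(seq, pattern):
    """Count contiguous, possibly overlapping occurrences of pattern in seq."""
    m = len(pattern)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == pattern)


if __name__ == "__main__":
    dna = "ACGTACGTTACG"
    print(element_probabilities(dna))
    print(successor_probabilities(dna))
    print(count_occurrences(dna, "ACG"))
```

Here a candidate's frequency is counted over contiguous, possibly overlapping occurrences. If frequent sequences are defined with gaps allowed between elements, the counting step would have to change accordingly.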

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tumasonis, R., Dzemyda, G. (2005). Analysis of the Statistical Characteristics in Mining of Frequent Sequences. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol 31. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32392-9_39

  • DOI: https://doi.org/10.1007/3-540-32392-9_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25056-2

  • Online ISBN: 978-3-540-32392-1

  • eBook Packages: Engineering, Engineering (R0)