Intelligent Sequential Mining Via Alignment: Optimization Techniques for Very Large DB

Conference paper in Advances in Knowledge Discovery and Data Mining (PAKDD 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4426)

Abstract

The sheer volume of results produced by traditional support-based frequent sequential pattern mining methods has led to increasing interest in new intelligent mining methods that find more meaningful and compact results. One such approach is consensus sequential pattern mining based on sequence alignment, which has been successfully applied in various areas. However, the current approach to consensus sequential pattern mining has a running time that is quadratic in the database size, which limits its application to very large databases. In this paper, we introduce two optimization techniques that reduce the running time significantly. First, we determine the theoretical bound for the precision of the proximity matrix and use it to reduce the time spent calculating the full matrix. Second, we use a sample-based iterative clustering method that allows us to use a much faster k-means clustering method, with only a minor increase in memory consumption and negligible loss in accuracy.
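To make the quadratic cost concrete, here is a minimal Python sketch, assuming each sequence is a list of itemsets and that two sequences are compared by a weighted edit (alignment) distance with a normalized set-difference replacement cost. That cost is modeled loosely on ApproxMAP-style alignment, and the exact formula is an assumption, not taken from this paper. `proximity_matrix` performs N(N-1)/2 alignments, which is the quadratic term the abstract refers to. `assign_to_sampled_seeds` is a hypothetical simplification that seeds k clusters from a random sample and then needs only about N*k alignments; it illustrates why sampling helps but is not the paper's sample-based iterative k-means procedure, and the precision-bound technique is not reproduced here.

```python
import random
from itertools import combinations
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]
Seq = Sequence[Itemset]


def replace_cost(x: Itemset, y: Itemset) -> float:
    """Normalized set difference |x ^ y| / (|x| + |y|), a value in [0, 1].

    Assumption: an ApproxMAP-style itemset cost. Inserting or deleting a
    whole itemset is charged 1 in seq_distance below.
    """
    total = len(x) + len(y)
    return len(x ^ y) / total if total else 0.0


def seq_distance(s: Seq, t: Seq) -> float:
    """Alignment (weighted edit) distance, normalized by the longer sequence."""
    n, m = len(s), len(t)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = float(i)  # delete the first i itemsets of s
    for j in range(1, m + 1):
        dp[0][j] = float(j)  # insert the first j itemsets of t
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + 1.0,  # delete s[i-1]
                dp[i][j - 1] + 1.0,  # insert t[j-1]
                dp[i - 1][j - 1] + replace_cost(s[i - 1], t[j - 1]),  # align the pair
            )
    longest = max(n, m)
    return dp[n][m] / longest if longest else 0.0


def proximity_matrix(db: List[Seq]) -> List[List[float]]:
    """Full symmetric matrix: N*(N-1)/2 alignments, i.e. quadratic in len(db)."""
    n = len(db)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = seq_distance(db[i], db[j])
    return d


def assign_to_sampled_seeds(db: List[Seq], k: int, sample_size: int,
                            rng: random.Random) -> List[int]:
    """Hypothetical simplification: take k seeds from a random sample and
    assign every sequence to its nearest seed (about N*k alignments)."""
    if not k <= sample_size <= len(db):
        raise ValueError("need k <= sample_size <= len(db)")
    sample = rng.sample(range(len(db)), sample_size)
    seeds = [db[i] for i in sample[:k]]
    return [min(range(k), key=lambda c: seq_distance(s, seeds[c])) for s in db]


if __name__ == "__main__":
    A, B, C = frozenset({"A"}), frozenset({"B"}), frozenset({"C"})
    db = [[A, B], [A, B, C], [C], [C, B]]
    for row in proximity_matrix(db):
        print([round(v, 2) for v in row])
    print(assign_to_sampled_seeds(db, k=2, sample_size=3, rng=random.Random(0)))
```

With N sequences, building the full matrix costs N(N-1)/2 calls to `seq_distance`, whereas assigning against k sampled seeds costs N*k calls, so the gap widens linearly with N when k is fixed.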

Editor information

Zhi-Hua Zhou, Hang Li, Qiang Yang

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Kum, HC., Chang, J.H., Wang, W. (2007). Intelligent Sequential Mining Via Alignment: Optimization Techniques for Very Large DB. In: Zhou, ZH., Li, H., Yang, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71701-0_62

  • DOI: https://doi.org/10.1007/978-3-540-71701-0_62

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71700-3

  • Online ISBN: 978-3-540-71701-0

  • eBook Packages: Computer Science, Computer Science (R0)
