Extrapolation Prefix Tree for Data Stream Mining Using a Landmark Model

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7448)

Abstract

Since the introduction of FP-growth, there has been extensive research into extending its use to data streams and incremental mining. The task is particularly challenging in a data stream environment because the stream is unbounded and multiple scans of the data must be avoided. In this paper, we propose an algorithm, Extrapolation Prefix Tree, that extracts frequent itemsets using a landmark windowing scheme. The algorithm uses a prefix tree structure to store arriving transactions but, unlike previous approaches, estimates the structure of the tree in the next block of data based on the arrival pattern of items appearing in transactions that arrive in the current block. Our experiments show that Extrapolation-Tree significantly outperforms the CP-Tree, both in the number of updates and in the execution time required to keep the tree current, while maintaining a compact tree.
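
To make the prefix-tree idea concrete, the following is a minimal sketch of a generic frequency-ordered prefix tree. It is not the Extrapolation Prefix Tree itself, since the abstract does not specify how the extrapolation is computed. All names (`Node`, `PrefixTree`, `frequency_order`) and the toy block of transactions are hypothetical. As a naive stand-in for the paper's estimate, the sketch simply orders items by their frequency in the current block, and shows that such an ordering can produce a more compact tree than a fixed alphabetical order.

```python
from collections import Counter


class Node:
    """One prefix-tree node: an item, a support count, and child nodes."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


class PrefixTree:
    """Prefix tree whose paths follow a fixed item order (hypothetical helper)."""

    def __init__(self, order):
        # order: list of items, highest priority first (assumed to be given).
        self.rank = {item: i for i, item in enumerate(order)}
        self.root = Node()

    def insert(self, transaction):
        # Sort items by rank; items missing from the order go last, by name.
        unknown = len(self.rank)
        items = sorted(set(transaction),
                       key=lambda x: (self.rank.get(x, unknown), x))
        node = self.root
        for item in items:
            node = node.children.setdefault(item, Node(item))
            node.count += 1

    def size(self):
        # Number of nodes, excluding the root.
        stack, total = [self.root], 0
        while stack:
            node = stack.pop()
            total += len(node.children)
            stack.extend(node.children.values())
        return total


def frequency_order(block):
    """Items sorted by descending frequency in the block (ties by name)."""
    counts = Counter(item for t in block for item in set(t))
    return [item for item, _ in
            sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


# Toy block of transactions (made up for illustration).
block = [["a", "b"], ["b", "c"], ["b", "c", "d"], ["b"]]

order = frequency_order(block)
tree = PrefixTree(order)
alphabetical = PrefixTree(sorted({i for t in block for i in t}))
for t in block:
    tree.insert(t)
    alphabetical.insert(t)

print(order)                # ['b', 'c', 'a', 'd']
print(tree.size())          # 4
print(alphabetical.size())  # 5
```

In this toy block, placing the frequent item `b` at the top lets all four transactions share one prefix, so the frequency-ordered tree needs four nodes where the alphabetical tree needs five. A streaming algorithm has to choose the order before the next block arrives, which is where an estimate of future item frequencies would come in.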

References

  1. Cheung, W., Zaiane, O.: Incremental mining of frequent patterns without candidate generation or support constraint. In: Proceedings of Seventh International Database Engineering and Applications Symposium, pp. 111–116 (2003)

  2. Hoeffding, W.: Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58(301), 13–30 (1963)

  3. Koh, J.-L., Shieh, S.-F.: An Efficient Approach for Maintaining Association Rules Based on Adjusting FP-Tree Structures. In: Lee, Y., Li, J., Whang, K.-Y., Lee, D. (eds.) DASFAA 2004. LNCS, vol. 2973, pp. 417–424. Springer, Heidelberg (2004)

  4. Leung, C.K.S., Khan, Q.I., Li, Z., Hoque, T.: CanTree: a canonical-order tree for incremental frequent-pattern mining. Knowledge and Information Systems 11, 287–311 (2007)

  5. Li, H.F., Lee, S.Y., Shan, M.K.: Online mining (recently) maximal frequent itemsets over data streams. In: Proceedings of the 15th International Workshop on Research Issues in Data Engineering: Stream Data Mining and Applications, RIDE 2005, pp. 11–18. IEEE Computer Society, Washington, DC (2005)

  6. Li, H.F., Shan, M.K., Lee, S.Y.: DSM-FI: an efficient algorithm for mining frequent itemsets in data streams. Knowledge and Information Systems 17, 79–97 (2008)

  7. Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: Proceedings of the 28th International Conference on Very Large Data Bases, VLDB 2002, pp. 346–357. VLDB Endowment (2002)

  8. Tanbeer, S.K., Ahmed, C.F., Jeong, B.-S., Lee, Y.-K.: CP-Tree: A Tree Structure for Single-Pass Frequent Pattern Mining. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) PAKDD 2008. LNCS (LNAI), vol. 5012, pp. 1022–1027. Springer, Heidelberg (2008)

  9. Yu, J.X., Chong, Z., Lu, H., Zhang, Z., Zhou, A.: A false negative approach to mining frequent itemsets from high speed transactional data streams. Information Sciences 176(14), 1986–2015 (2006)

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Koh, Y.S., Pears, R., Dobbie, G. (2012). Extrapolation Prefix Tree for Data Stream Mining Using a Landmark Model. In: Cuzzocrea, A., Dayal, U. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2012. Lecture Notes in Computer Science, vol 7448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32584-7_28

  • DOI: https://doi.org/10.1007/978-3-642-32584-7_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32583-0

  • Online ISBN: 978-3-642-32584-7

  • eBook Packages: Computer Science, Computer Science (R0)
