Indexing Evolving Databases for Itemset Mining

Baralis, Elena; Cerquitelli, Tania; Chiusano, Silvia

doi:10.1007/978-3-540-77623-9_18

Part of the book series: Studies in Computational Intelligence (SCI, volume 109)

Summary

Research in data mining initially focused on efficient algorithms for the computationally intensive knowledge extraction task (i.e., itemset mining). The data to be analyzed was typically extracted from the DBMS and stored in binary files. Approaches proposed for mining flat-file data require large amounts of memory and do not scale efficiently to large databases. Better memory management could be achieved by integrating the data mining algorithm into the kernel of the database management system. Furthermore, most data mining algorithms deal with "static" datasets (i.e., datasets that do not change over time). This chapter presents a novel index, called I-Forest, to support data mining on evolving databases, whose content is periodically updated through the insertion (or deletion) of data blocks. I-Forest is a covering index that represents transactional blocks in a succinct form and supports different kinds of analysis. Time and support constraints (e.g., "analyze frequent quarterly data") may be enforced during the extraction phase. The I-Forest index has been implemented in the PostgreSQL open source DBMS and exploits its physical-level access methods. Experiments on both sparse and dense data distributions show the efficiency of the proposed approach, whose performance is always comparable with, and for low support thresholds better than, that of the Prefix-Tree algorithm accessing static data from a flat file.
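To make the block-oriented idea concrete, here is a minimal in-memory sketch, assuming a deliberately simplified model: each inserted data block is stored as its own compact structure (identical transactions merged with a multiplicity count), blocks can be added or dropped independently, and a query names the blocks to analyze (the time constraint) plus a relative minimum support (the support constraint). This is not the I-Forest layout, which the chapter implements inside PostgreSQL; the class `BlockForest`, its methods, and the block ids are hypothetical, and the level-wise Apriori-style counting is swapped in for illustration rather than taken from the chapter.

```python
from collections import Counter
from itertools import combinations


class BlockForest:
    """Hypothetical, simplified stand-in for a block-based itemset index.

    Each data block is kept separately; identical transactions (after sorting
    their items) are merged into one entry with a multiplicity count.
    This is NOT the I-Forest structure described in the chapter.
    """

    def __init__(self):
        self.blocks = {}  # block_id -> Counter(sorted item tuple -> count)

    def insert_block(self, block_id, transactions):
        # Merge duplicate transactions so the block is stored succinctly.
        self.blocks[block_id] = Counter(tuple(sorted(set(t))) for t in transactions)

    def delete_block(self, block_id):
        # Dropping a block leaves every other block's structure untouched.
        del self.blocks[block_id]

    def frequent_itemsets(self, block_ids, min_support):
        """Apriori-style level-wise mining restricted to the chosen blocks.

        min_support is a relative threshold in (0, 1] over the selected blocks.
        Returns a dict mapping frozenset itemsets to absolute support counts.
        """
        # Time constraint: only the requested blocks take part in the analysis.
        data = Counter()
        for b in block_ids:
            data.update(self.blocks[b])
        min_count = min_support * sum(data.values())

        result = {}
        # Level-1 candidates: every single item occurring in the selected blocks.
        candidates = {frozenset([i]) for t in data for i in t}
        k = 1
        while candidates:
            counts = Counter()
            for t, mult in data.items():
                items = set(t)
                for c in candidates:
                    if c <= items:
                        counts[c] += mult  # weight by merged-transaction count
            # Support constraint: keep only itemsets reaching the threshold.
            frequent = {c: n for c, n in counts.items() if n >= min_count}
            result.update(frequent)
            # Join frequent k-itemsets into (k+1)-candidates whose k-subsets
            # are all frequent (Apriori pruning).
            k += 1
            joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
            candidates = {
                c for c in joined
                if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            }
        return result


if __name__ == "__main__":
    idx = BlockForest()
    idx.insert_block("2007-Q1", [["a", "b"], ["a", "c"], ["a", "b", "c"]])
    idx.insert_block("2007-Q2", [["b", "c"], ["a", "b"], ["b"]])
    # "Analyze frequent half-year data": both quarterly blocks, 50% support.
    print(idx.frequent_itemsets(["2007-Q1", "2007-Q2"], min_support=0.5))
```

Keeping one structure per block means that inserting or deleting a block touches only that block, while a time-constrained query simply selects which blocks participate; how the chapter's index actually achieves this on disk is described in the full text.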

References

R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In VLDB, 1994.
R. Agrawal, T. Imielinski, and A. Swami. Database mining: A performance perspective. IEEE Trans. Knowl. Data Eng., 5(6), 1993.
Y. Aumann, R. Feldman, and O. Lipshtat. Borders: An efficient algorithm for association generation in dynamic databases. In JIIS, vol. 12, 1999.
E. Baralis, T. Cerquitelli, and S. Chiusano. Index Support for Frequent Itemset Mining in a Relational DBMS. In ICDE, 2005.
M. Botta, J.-F. Boulicaut, C. Masson, and R. Meo. A comparison between query languages for the extraction of association rules. In DaWaK, 2002.
S. Chaudhuri, V. Narasayya, and S. Sarawagi. Efficient evaluation of queries with mining predicates. In IEEE ICDE, 2002.
W. Cheung and O. R. Zaiane. Incremental mining of frequent patterns without candidate generation or support constraint. In IDEAS, pp. 111–116, July 2003.
D. W.-L. Cheung, J. Han, V. Ng, and C. Y. Wong. Maintenance of discovered association rules in large databases: An incremental updating technique. In ICDE, pp. 106–114. IEEE Computer Society, 1996.
L. Dumitriu. Interactive mining and knowledge reuse for the closed-itemset incremental-mining problem. SIGKDD Explorations, 3(2):28–36, 2002.
M. El-Hajj and O. R. Zaiane. Inverted matrix: Efficient discovery of frequent items in large datasets in the context of interactive mining. In ACM SIGKDD, 2003.
FIMI. http://fimi.cs.helsinki.fi.
V. Ganti, J. Gehrke, and R. Ramakrishnan. DEMON: Mining and monitoring evolving data. IEEE Trans. Knowl. Data Eng., 13(1):50–63, 2001.
V. Ganti, J. E. Gehrke, and R. Ramakrishnan. Mining data streams under block evolution. SIGKDD Explorations, 3(2), 2002.
B. Goethals and M. J. Zaki. FIMI'03: Workshop on frequent itemset mining implementations, November 2003.
G. Grahne and J. Zhu. Efficiently using prefix-trees in mining frequent itemsets. In FIMI, November 2003.
J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In ACM SIGMOD, 2000.
R. Meo, G. Psaila, and S. Ceri. A tightly-coupled architecture for data mining. In IEEE ICDE, 1998.
A. Pietracaprina and D. Zandolin. Mining frequent itemsets using patricia tries. In FIMI, 2003.
Postgres. http://www.postgresql.org.
G. Ramesh, W. Maniatty, and M. Zaki. Indexing and data access methods for database mining. In DMKD, 2002.
A. Veloso, W. Meira Jr., M. De Carvalho, B. Possas, S. Parthasarathy, and M. J. Zaki. Mining frequent itemsets in evolving databases. In SDM, 2002.

Author information

Authors and Affiliations

Dipartimento di Automatica e Informatica, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Torino, Italy
Elena Baralis, Tania Cerquitelli & Silvia Chiusano

Editor information

Editors and Affiliations

Harrow School of Computer Science, The University of Westminster, Watford Road Northwick Park, London, HA1 3TP, UK
Panagiotis Chountas
School of Informatics, The University of Manchester, Oxford Road, Manchester, M13 9PL, UK
Ilias Petrounias
Systems Research Institute, Polish Academy of Sciences, Ul. Newelska 6, 01-447, Warsaw, Poland
Janusz Kacprzyk

About this chapter

Cite this chapter

Baralis, E., Cerquitelli, T., Chiusano, S. (2008). Indexing Evolving Databases for Itemset Mining. In: Chountas, P., Petrounias, I., Kacprzyk, J. (eds) Intelligent Techniques and Tools for Novel System Architectures. Studies in Computational Intelligence, vol 109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77623-9_18

DOI: https://doi.org/10.1007/978-3-540-77623-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77621-5
Online ISBN: 978-3-540-77623-9
eBook Packages: Engineering, Engineering (R0)
