Summary
Research in data mining initially focused on defining efficient algorithms for the computationally intensive knowledge extraction task (i.e., itemset mining). The data to be analyzed was (possibly) extracted from the DBMS and stored in binary files. Approaches proposed for mining flat-file data require a large amount of memory and do not scale efficiently to large databases. Memory management could be improved by integrating the data mining algorithm into the kernel of the database management system. Furthermore, most data mining algorithms deal with “static” datasets (i.e., datasets that do not change over time). This chapter presents a novel index, called I-Forest, to support data mining activities on evolving databases, whose content is periodically updated through the insertion (or deletion) of data blocks. I-Forest is a covering index that represents transactional blocks in a succinct form and supports different kinds of analysis. Time and support constraints (e.g., “analyze frequent quarterly data”) may be enforced during the extraction phase. The I-Forest index has been implemented in the PostgreSQL open source DBMS and exploits its physical-level access methods. Experiments, run on both sparse and dense data distributions, show the efficiency of the proposed approach, which is always comparable with, and for low support thresholds faster than, the Prefix-Tree algorithm accessing static data on flat files.
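To make the setting concrete, the following is a minimal sketch of itemset extraction over an evolving, block-organized dataset with a time constraint (which blocks to analyze) and a support constraint. It is not the I-Forest index and does not reflect its physical structure or the PostgreSQL integration; the class `BlockStore`, its methods, and the block labels are hypothetical names introduced only for illustration, and the counting is a naive enumeration rather than an index-based method.

```python
from collections import Counter
from itertools import combinations


class BlockStore:
    """Toy in-memory store of transactional blocks (hypothetical, for illustration)."""

    def __init__(self):
        # Maps a block label (e.g. a quarter) to its list of transactions.
        self.blocks = {}

    def insert_block(self, label, transactions):
        """Add a new data block, e.g. the transactions of one time period."""
        self.blocks[label] = [frozenset(t) for t in transactions]

    def delete_block(self, label):
        """Drop a data block that is no longer part of the database."""
        self.blocks.pop(label, None)

    def frequent_itemsets(self, labels, min_support, max_size=2):
        """Return itemsets (up to max_size items) whose relative support,
        computed over the selected blocks only, is at least min_support."""
        selected = [t for label in labels for t in self.blocks.get(label, [])]
        counts = Counter()
        for transaction in selected:
            items = sorted(transaction)
            for size in range(1, max_size + 1):
                counts.update(combinations(items, size))
        total = len(selected)
        return {
            itemset: count / total
            for itemset, count in counts.items()
            if count / total >= min_support
        }


store = BlockStore()
store.insert_block("Q1", [{"a", "b"}, {"a", "c"}])
store.insert_block("Q2", [{"a", "b"}, {"b", "c"}])

# Time constraint: analyze both quarters; support constraint: at least 50%.
both_quarters = store.frequent_itemsets(["Q1", "Q2"], min_support=0.5)

# The database evolves: the oldest block is deleted, then Q2 is analyzed alone.
store.delete_block("Q1")
latest_only = store.frequent_itemsets(["Q2"], min_support=0.6)
```

In this sketch every query rescans the selected blocks, which is the cost that a succinct, persistent per-block representation such as the one described in the chapter is meant to reduce.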
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Baralis, E., Cerquitelli, T., Chiusano, S. (2008). Indexing Evolving Databases for Itemset Mining. In: Chountas, P., Petrounias, I., Kacprzyk, J. (eds) Intelligent Techniques and Tools for Novel System Architectures. Studies in Computational Intelligence, vol 109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77623-9_18
DOI: https://doi.org/10.1007/978-3-540-77623-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77621-5
Online ISBN: 978-3-540-77623-9
eBook Packages: Engineering, Engineering (R0)