High Performance Data Mining and Knowledge Discovery

Skillicorn, David; Talia, Domenico

doi:10.1007/3-540-48311-X_203

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1685)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

Many, perhaps most, organizations use computers when they interact with their customers. As a result, and almost by accident, many organizations have accumulated huge amounts of data about such interactions. Over the past five to ten years, they have increasingly tried to use this data for commercial advantage. This process began by accumulating transaction data into data warehouses, where it could be made available for decision support and retrospective analysis. The effectiveness of such analysis largely depends on the ability of individuals to formulate queries that will reveal key facts about the organization and its customers.

Increasingly, both the volume of data and its complexity have taken the problem beyond the ability of any individual to analyze. Data mining is the automated analysis of large volumes of data, looking for relationships and knowledge that are implicit in the data and are ‘interesting’ in the sense of impacting an organization’s practice. Research and development work in the area of knowledge discovery and data mining concerns the study and definition of techniques, methods, and tools for the extraction of novel, useful, and implicit patterns from data. It builds on machine learning, database technology, and statistics, but is distinguished by problems of scale: the data involved is so large that most applications tend to use conceptually straightforward, but carefully optimized, algorithms.

There is a natural confluence between parallel computation and data mining. For researchers in parallel computation, data mining is an application area that is growing in importance, and that introduces interesting new problems (irregularity, data representation and storage, multiple parallelization strategies, symbolic computation) that have not been so critical in scientific and numerical computing. For organizations that want to use data mining in their day-to-day work, parallel computation offers increased performance, which in turn may translate into commercial advantage. When data mining tools are implemented on high-performance parallel computers, they can analyze massive databases in a reasonable time. Faster processing also means that users can experiment with more models to understand complex data. High performance makes it practical for users to analyze greater quantities of data. Larger databases, in turn, yield improved predictions.

Data mining, even sequentially, is not yet mature, and many of the existing applications are relatively unsophisticated. Nevertheless, it seemed useful to explore the fledgling projects that are looking at the connections between parallel computing and data mining. This track has assembled a small number of papers that describe such research experiences.

The first paper, “Mining of Association Rules in Very Large Databases: A Structured Parallel Approach” by Becuzzi, Coppola, and Vanneschi, presents a case study that implements a parallel version of the Apriori association rule algorithm in the skeleton-based language SkIE.
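
For readers unfamiliar with Apriori, the following is a minimal Python sketch of one common way to parallelize it: the transactions are partitioned, each worker counts candidate itemsets in its own partition, and the partial counts are summed before the next level of candidates is generated. It is not the SkIE implementation from the paper; the function names (`count_candidates`, `parallel_apriori`), the `workers` parameter, and the use of a thread-backed pool as a stand-in for real parallel processors are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from multiprocessing.dummy import Pool  # thread-backed pool, a stand-in for parallel workers


def count_candidates(args):
    """Count how many transactions in one partition contain each candidate itemset."""
    partition, candidates = args
    counts = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:  # candidate is a subset of the transaction
                counts[candidate] += 1
    return counts


def parallel_apriori(transactions, min_support, workers=2):
    """Return {itemset: support count} for every itemset meeting min_support.

    Hypothetical helper: each level's support counting is spread over
    `workers` partitions and the partial counts are then summed.
    """
    transactions = [frozenset(t) for t in transactions]
    # Round-robin split of the data; each worker only ever sees its own slice.
    partitions = [transactions[i::workers] for i in range(workers)]
    # Level 1 candidates: every single item that occurs in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    with Pool(workers) as pool:
        while candidates:
            partial = pool.map(count_candidates, [(p, candidates) for p in partitions])
            totals = sum(partial, Counter())  # global reduction of the local counts
            level = {c: n for c, n in totals.items() if n >= min_support}
            frequent.update(level)
            k += 1
            # Join step: merge frequent (k-1)-itemsets into k-itemsets.
            joined = {a | b for a in level for b in level if len(a | b) == k}
            # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
            candidates = {
                c for c in joined
                if all(frozenset(s) in level for s in combinations(c, k - 1))
            }
    return frequent


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter"},
               {"bread", "milk", "butter"}, {"milk"}]
    for itemset, support in parallel_apriori(baskets, min_support=2).items():
        print(sorted(itemset), support)
```

Only the counting step is distributed here; candidate generation stays sequential, which keeps the sketch short but is just one of several possible parallelization strategies.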

Editor information

Editors and Affiliations

ENSEEIHT, 2, Rue Camichel, F-31071, Toulouse Cedex 7, France
Patrick Amestoy, Philippe Berger, Michel Daydé & Daniel Ruiz
CERFACS, 42, Av. Gaspard Coriolis, F-31057, Toulouse Cedex 1, France
Iain Duff, Valérie Frayssé & Luc Giraud

Cite this paper

Skillicorn, D., Talia, D. (1999). High Performance Data Mining and Knowledge Discovery. In: Amestoy, P., et al. Euro-Par’99 Parallel Processing. Euro-Par 1999. Lecture Notes in Computer Science, vol 1685. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48311-X_203

DOI: https://doi.org/10.1007/3-540-48311-X_203
Published: 06 August 1999
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66443-7
Online ISBN: 978-3-540-48311-3
eBook Packages: Springer Book Archive