A Toolbox Approach to Flexible and Efficient Data Mining

Nielsen, Ole M.; Christen, Peter; Hegland, Markus; Semenova, Tatiana; Hancock, Timothy

doi:10.1007/3-540-45357-1_16

Conference paper
First Online: 01 January 2001

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2035)

Abstract

This paper describes a flexible and efficient toolbox, based on the scripting language Python, that handles common tasks in data mining. Using either a relational database or flat files, the toolbox gives the user a uniform view of a data collection. Two core features of the toolbox are caching of database queries and parallel execution of collections of independent queries. The toolbox provides a number of routines for basic data mining tasks, on top of which the user can add further functions, mainly specific to the domain and the data collection, for complex and time-consuming data mining tasks.
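The two core features named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, which this page does not show: the class `CachedQueryRunner`, the helper `build_demo_db`, the `visits` table, and the use of `sqlite3` with a thread pool are all assumptions chosen only to make the idea concrete.

```python
import os
import sqlite3
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


class CachedQueryRunner:
    """Hypothetical helper: caches query results and runs
    independent queries concurrently. Not the toolbox's real API."""

    def __init__(self, db_path):
        self.db_path = db_path
        self.cache = {}      # maps (sql, params) -> list of result rows
        self.db_hits = 0     # queries actually sent to the database
        self._lock = threading.Lock()

    def query(self, sql, params=()):
        key = (sql, tuple(params))
        with self._lock:
            if key in self.cache:
                return self.cache[key]
        # One short-lived connection per query, so no connection
        # object is ever shared between threads.
        conn = sqlite3.connect(self.db_path)
        try:
            rows = conn.execute(sql, params).fetchall()
        finally:
            conn.close()
        with self._lock:
            self.cache[key] = rows
            self.db_hits += 1
        return rows

    def query_many(self, queries, workers=4):
        """Run a collection of independent (sql, params) queries in parallel."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(lambda q: self.query(*q), queries))


def build_demo_db(path):
    """Create a tiny illustrative table (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE visits (patient INTEGER, cost REAL)")
    conn.executemany(
        "INSERT INTO visits VALUES (?, ?)",
        [(1, 10.0), (1, 5.0), (2, 7.5)],
    )
    conn.commit()
    conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        build_demo_db(db_path)
        runner = CachedQueryRunner(db_path)
        sql = "SELECT SUM(cost) FROM visits WHERE patient = ?"
        queries = [(sql, (1,)), (sql, (2,))]
        print(runner.query_many(queries))  # both queries go to the database
        print(runner.query_many(queries))  # both answers come from the cache
        print(runner.db_hits)
```

In this sketch the cache key is the query text together with its parameters, so repeating a query skips the database entirely. A real toolbox would also need a policy for invalidating cached results when the underlying data changes, which is left out here.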

Author information

Authors and Affiliations

Australian National University, Canberra, ACT 0200, Australia
Ole M. Nielsen, Peter Christen, Markus Hegland & Tatiana Semenova
James Cook University, Townsville, QLD 4811, Australia
Timothy Hancock

Editor information

Editors and Affiliations

Dept. of Computer Science and Information Systems, The University of Hong Kong, Pokfulam, Hong Kong, China
David Cheung
CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra, ACT 2601, Australia
Graham J. Williams
Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave., Kowloon, Hong Kong, China
Qing Li

About this paper

Cite this paper

Nielsen, O.M., Christen, P., Hegland, M., Semenova, T., Hancock, T. (2001). A Toolbox Approach to Flexible and Efficient Data Mining. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_16

DOI: https://doi.org/10.1007/3-540-45357-1_16
Published: 11 April 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41910-5
Online ISBN: 978-3-540-45357-4
eBook Packages: Springer Book Archive
