Abstract
This paper describes a flexible and efficient toolbox, based on the scripting language Python, that handles common tasks in data mining. Using either a relational database or flat files, the toolbox gives the user a uniform view of a data collection. Two core features of the toolbox are caching of database queries and parallelism within a collection of independent queries. Our toolbox provides a number of routines for basic data mining tasks, on top of which the user can add further functions, mainly domain- and data-collection-dependent, for complex and time-consuming data mining tasks.
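The two core features named in the abstract, query caching and parallelism across independent queries, can be illustrated with a minimal sketch. This is not the toolbox's own code: the function names `cached_query` and `run_independent`, the on-disk pickle cache, and the use of SQLite as a stand-in relational database are all assumptions made for illustration only.

```python
import hashlib
import os
import pickle
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def cached_query(db_path, sql, cache_dir):
    """Run `sql` against the SQLite file at `db_path` (hypothetical helper).

    The result is stored on disk under a key derived from the database path
    and the query text, so repeating the same query skips the database.
    """
    key = hashlib.sha256(f"{db_path}\n{sql}".encode("utf-8")).hexdigest()
    cache_file = os.path.join(cache_dir, key + ".pkl")

    # Cache hit: return the stored rows without touching the database.
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as fh:
            return pickle.load(fh)

    # Cache miss: each call opens its own connection, so worker threads
    # never share a connection object.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(sql).fetchall()
    finally:
        conn.close()

    # Assumption: the queries in one batch are distinct, so no two workers
    # write the same cache file at the same time.
    with open(cache_file, "wb") as fh:
        pickle.dump(rows, fh)
    return rows


def run_independent(db_path, queries, cache_dir, workers=4):
    """Evaluate independent queries concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda q: cached_query(db_path, q, cache_dir), queries))
```

Calling `run_independent(db_path, queries, cache_dir)` twice with the same arguments would read the second batch from the cache files rather than from the database. The sketch never invalidates a cached result when the underlying data changes, which a real system would need to handle.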
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nielsen, O.M., Christen, P., Hegland, M., Semenova, T., Hancock, T. (2001). A Toolbox Approach to Flexible and Efficient Data Mining. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_16
DOI: https://doi.org/10.1007/3-540-45357-1_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41910-5
Online ISBN: 978-3-540-45357-4
eBook Packages: Springer Book Archive