A Toolbox Approach to Flexible and Efficient Data Mining

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2035)

Abstract

This paper describes a flexible and efficient toolbox, based on the scripting language Python, for handling common tasks in data mining. Whether the data reside in a relational database or in flat files, the toolbox gives the user a uniform view of the data collection. Two core features of the toolbox are the caching of database queries and the parallel evaluation of collections of independent queries. The toolbox provides routines for basic data mining tasks, on top of which the user can add further functions (mainly dependent on the domain and the data collection) for complex and time-consuming data mining tasks.
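The abstract names three mechanisms: a uniform view over databases and flat files, caching of query results, and parallel evaluation of independent queries. The sketch below illustrates how these could fit together in Python. It is not the authors' toolbox API: the names `load_csv` and `QueryCache` are hypothetical, SQLite stands in for whatever relational database the toolbox targets, the cache is a simple in-memory dictionary keyed on the SQL text and parameters, and threads from `concurrent.futures` stand in for whatever parallel mechanism the paper actually uses.

```python
import csv
import os
import sqlite3
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def load_csv(csv_path, db_path, table):
    """Hypothetical helper: copy a flat CSV file into a SQLite table so that
    flat files and databases can be queried through the same SQL interface."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join(f'"{c}"' for c in header)
        marks = ", ".join("?" for _ in header)
        conn = sqlite3.connect(db_path)
        with conn:  # commits the transaction on success
            conn.execute(f'CREATE TABLE "{table}" ({cols})')
            conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
        conn.close()


class QueryCache:
    """Hypothetical sketch: memoize SQL query results and evaluate
    independent queries in parallel. Not the toolbox's actual API."""

    def __init__(self, db_path):
        self.db_path = db_path
        self._cache = {}               # (sql, params) -> list of result rows
        self._lock = threading.Lock()  # guards _cache and misses across threads
        self.misses = 0                # queries actually sent to the database

    def query(self, sql, params=()):
        key = (sql, tuple(params))
        with self._lock:
            if key in self._cache:
                return self._cache[key]
        # Open a connection per call: by default a sqlite3 connection may
        # only be used in the thread that created it.
        conn = sqlite3.connect(self.db_path)
        try:
            rows = conn.execute(sql, params).fetchall()
        finally:
            conn.close()
        with self._lock:
            self.misses += 1
            self._cache[key] = rows
        return rows

    def query_many(self, queries, max_workers=4):
        """Evaluate independent (sql, params) pairs concurrently; results
        are returned in the same order as the input queries."""
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(lambda q: self.query(*q), queries))


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    csv_path = os.path.join(tmp, "claims.csv")
    with open(csv_path, "w", newline="") as f:
        f.write("state,amount\nNSW,10\nVIC,20\nNSW,5\n")
    db_path = os.path.join(tmp, "claims.db")
    load_csv(csv_path, db_path, "claims")
    qc = QueryCache(db_path)
    sql = "SELECT COUNT(*) FROM claims WHERE state = ?"
    print(qc.query_many([(sql, ("NSW",)), (sql, ("VIC",))]))  # [[(2,)], [(1,)]]
    qc.query(sql, ("NSW",))  # served from the cache, no new database access
    print(qc.misses)  # 2
```

Loading the flat file into a table is only one way to obtain a uniform view; an adapter that answers the same queries directly from the file would serve equally well. Caching pays off when exploratory analyses repeat the same queries, and independent queries parallelize naturally because none depends on another's result.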

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nielsen, O.M., Christen, P., Hegland, M., Semenova, T., Hancock, T. (2001). A Toolbox Approach to Flexible and Efficient Data Mining. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_16

  • DOI: https://doi.org/10.1007/3-540-45357-1_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41910-5

  • Online ISBN: 978-3-540-45357-4

  • eBook Packages: Springer Book Archive