
Towards a Methodology for Data Mining Project Development: The Importance of Abstraction

  • Chapter

Part of the book series: Studies in Computational Intelligence (SCI, volume 118)

Summary

Standards such as CRISP-DM, SEMMA and PMML are making data mining processes easier. Nevertheless, to date, projects are still developed more as an art than as a science. Because there is no standard methodology, their results are difficult to understand, evaluate and compare. In this chapter, we propose such a methodology, based on RUP (the Rational Unified Process) and CRISP-DM, and concentrate on the project conception phase, whose goal is to determine a feasible project plan.




Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

González-Aranda, P., Menasalvas, E., Millán, S., Ruiz, C., Segovia, J. (2008). Towards a Methodology for Data Mining Project Development: The Importance of Abstraction. In: Lin, T.Y., Xie, Y., Wasilewska, A., Liau, CJ. (eds) Data Mining: Foundations and Practice. Studies in Computational Intelligence, vol 118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78488-3_10


  • DOI: https://doi.org/10.1007/978-3-540-78488-3_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78487-6

  • Online ISBN: 978-3-540-78488-3

  • eBook Packages: Engineering, Engineering (R0)
