Summary
Standards such as CRISP-DM, SEMMA, and PMML are making data mining processes easier. To date, however, projects are still developed more as an art than as a science, and because there is no standard methodology, results are difficult to understand, evaluate, and compare. In this chapter, we propose such a methodology, based on RUP and CRISP-DM, and concentrate on the project conception phase, whose goal is to determine a feasible project plan.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
González-Aranda, P., Menasalvas, E., Millán, S., Ruiz, C., Segovia, J. (2008). Towards a Methodology for Data Mining Project Development: The Importance of Abstraction. In: Lin, T.Y., Xie, Y., Wasilewska, A., Liau, CJ. (eds) Data Mining: Foundations and Practice. Studies in Computational Intelligence, vol 118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78488-3_10
DOI: https://doi.org/10.1007/978-3-540-78488-3_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78487-6
Online ISBN: 978-3-540-78488-3
eBook Packages: Engineering, Engineering (R0)