Towards a Methodology for Data Mining Project Development: The Importance of Abstraction

González-Aranda, P.; Menasalvas, E.; Millán, S.; Ruiz, Carlos; Segovia, J.

doi:10.1007/978-3-540-78488-3_10

Part of the book series: Studies in Computational Intelligence (SCI, volume 118)

Summary

Standards such as CRISP-DM, SEMMA, and PMML are making data mining processes easier. Nevertheless, to date, projects are still developed more as an art than as a science, which makes it difficult to understand, evaluate, and compare results, because there is no standard methodology. In this chapter, we propose such a methodology, based on RUP and CRISP-DM, and concentrate on the project conception phase, in which a feasible project plan is determined.

Author information

Authors and Affiliations

Universidad Politécnica de Madrid, Madrid, Spain
P. González-Aranda, E. Menasalvas, Carlos Ruiz & J. Segovia
Universidad del Valle, Cali, Colombia
S. Millán

Editor information

Editors and Affiliations

Department of Computer Science, San Jose State University, San Jose, CA, 95192, USA
Tsau Young Lin
Department of Computer Science and Information Systems, Kennesaw State University, Building 11, Room 3060 1000 Chastain Road, Kennesaw, GA, 30144, USA
Ying Xie
Department of Computer Science, The University at Stony Brook, Stony Brook, New York, 11794-4400, USA
Anita Wasilewska
Institute of Information Science, Academia Sinica, No 128, Academia Road, Section 2 Nankang, Taipei, 11529, Taiwan
Churn-Jung Liau

About this chapter

Cite this chapter

González-Aranda, P., Menasalvas, E., Millán, S., Ruiz, C., Segovia, J. (2008). Towards a Methodology for Data Mining Project Development: The Importance of Abstraction. In: Lin, T.Y., Xie, Y., Wasilewska, A., Liau, CJ. (eds) Data Mining: Foundations and Practice. Studies in Computational Intelligence, vol 118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78488-3_10

DOI: https://doi.org/10.1007/978-3-540-78488-3_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78487-6
Online ISBN: 978-3-540-78488-3
eBook Packages: Engineering, Engineering (R0)
