Abstract
As knowledge discovery (KD) matures and enters the mainstream, the onus is on technology developers to provide it in a deployable, embeddable form. This transition from a stand-alone technology, controlled by a knowledgeable few, to a widely accessible and usable one will require the development of standards. These standards need to address various aspects of KD, ranging from the process of applying the technology in a business environment, so as to make that process more transparent and repeatable, through to the representation of the knowledge generated and support for application developers. The large variety of data and model formats that researchers and practitioners have to deal with, and the lack of procedural support in KD, have prompted a number of standardization efforts in recent years, led by industry and supported by the KD community at large. This paper provides an overview of the most prominent of these standards and uses example applications to highlight how they relate to each other.
Cite this article
Anand, S.S., Grobelnik, M., Herrmann, F. et al. Knowledge discovery standards. Artif Intell Rev 27, 21–56 (2007). https://doi.org/10.1007/s10462-008-9067-4
DOI: https://doi.org/10.1007/s10462-008-9067-4