Abstract
With massive amounts of data stored in databases, mining information and knowledge from databases has become an important research issue. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging information services, such as data warehousing and on-line services over the Internet, also call for data mining and knowledge discovery techniques to better understand user behavior, improve the services provided, and increase business opportunities. In response to this demand, this article provides a comprehensive survey of recently developed data mining and knowledge discovery techniques and introduces some real application systems. Finally, it lists some problems and challenges for further research.
Author information
Fan Jianhua is a Ph.D. candidate in the Department of Computer Science, Nanjing Communications Engineering Institute. His current research interests include data mining and C3I systems.
Li Deyi graduated from the Department of Electronic Engineering, Southeast University, in 1967, and received his Ph.D. degree in computer science from Heriot-Watt University, Edinburgh, in 1983. He is presently a Professor and the Chief Engineer at the Institute of Beijing Electronic System Engineering. His research interests include data mining, fuzzy control, system simulation and C3I systems.
Cite this article
Fan, J., Li, D. An overview of data mining and knowledge discovery. J. of Comput. Sci. & Technol. 13, 348–368 (1998). https://doi.org/10.1007/BF02946624