An overview of data mining and knowledge discovery

Journal of Computer Science and Technology

Abstract

With massive amounts of data stored in databases, mining information and knowledge from databases has become an important research issue. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information-providing services, such as data warehousing and on-line services over the Internet, also call for data mining and knowledge discovery techniques to better understand user behavior, improve the services provided, and increase business opportunities. In response to this demand, this article provides a comprehensive survey of recently developed data mining and knowledge discovery techniques and introduces some real application systems. It concludes by listing some problems and challenges for further research.

Author information

Fan Jianhua is a Ph.D. candidate in the Department of Computer Science, Nanjing Communications Engineering Institute. His current research interests include data mining and C3I systems.

Li Deyi graduated from the Department of Electronic Engineering, Southeast University in 1967, and received his Ph.D. degree in computer science from Heriot-Watt University, Edinburgh in 1983. He is presently a Professor and the Chief Engineer in the Institute of Beijing Electronic System Engineering. His research interests include data mining, fuzzy control, system simulation and C3I systems.

About this article

Cite this article

Fan, J., Li, D. An overview of data mining and knowledge discovery. J. of Comput. Sci. & Technol. 13, 348–368 (1998). https://doi.org/10.1007/BF02946624
