On Text Mining Algorithms for Automated Maintenance of Hierarchical Knowledge Directory

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4092)

Abstract

This paper presents a series of text-mining algorithms for managing a knowledge directory, one of the most crucial problems in constructing knowledge management systems today. In future systems, the constructed directory, in which knowledge objects are automatically classified, should evolve as the knowledge collection grows or its usage changes, so that it continues to provide a good indexing service. One challenging issue is how to combine manual and automatic organization facilities so that a user can flexibly organize acquired knowledge into a hierarchical structure over time. To this end, I propose three algorithms that utilize text mining technologies: semi-supervised classification, semi-supervised clustering, and automatic directory building. Experiments on controlled document collections show that the proposed approach significantly supports the hierarchical organization of large electronic knowledge bases with minimal human effort.
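The abstract does not spell out how the semi-supervised classification step works, so the following is only a generic illustration of the idea, not the paper's algorithm. It is a minimal sketch of self-training with a multinomial naive Bayes classifier: a few manually labeled documents seed the model, and unlabeled documents are then assigned their predicted category and folded back into training. The function names (`train_nb`, `predict`, `self_train`), the toy documents, and the category labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, vocab):
    """Fit a multinomial naive Bayes model with Laplace (add-one) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for words, label in zip(docs, labels):
        word_counts[label].update(words)
    model = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        log_prior = math.log(class_counts[c] / len(labels))
        log_lik = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
        model[c] = (log_prior, log_lik)
    return model


def predict(model, words):
    """Return the category with the highest log-posterior score."""
    scores = {
        c: log_prior + sum(log_lik[w] for w in words if w in log_lik)
        for c, (log_prior, log_lik) in model.items()
    }
    return max(scores, key=scores.get)


def self_train(labeled, unlabeled, rounds=5):
    """Self-training: label the unlabeled documents with the current model,
    retrain on labeled + guessed labels, and stop once the guesses are stable."""
    docs = [d for d, _ in labeled]
    labels = [l for _, l in labeled]
    vocab = {w for d in docs + unlabeled for w in d}
    model = train_nb(docs, labels, vocab)
    guesses = None
    for _ in range(rounds):
        new_guesses = [predict(model, d) for d in unlabeled]
        if new_guesses == guesses:
            break
        guesses = new_guesses
        model = train_nb(docs + unlabeled, labels + guesses, vocab)
    return model


# Toy data (hypothetical): two labeled seeds and four unlabeled documents.
labeled = [
    ("stock market price".split(), "finance"),
    ("football match goal".split(), "sports"),
]
unlabeled = [
    "market bond yield".split(),
    "price bond".split(),
    "goal keeper save".split(),
    "match keeper".split(),
]

model = self_train(labeled, unlabeled)
print(predict(model, ["keeper"]), predict(model, ["yield"]))
```

In this toy run the words "keeper" and "yield" never occur in the labeled seeds, so a classifier trained on the seeds alone has no evidence for them; after self-training, the unlabeled documents supply that evidence. A soft-assignment variant (expectation-maximization over class probabilities) is a common alternative to the hard labels used here.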

This research was supported by the University of Seoul, Korea, in 2005.



© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, H.J. (2006). On Text Mining Algorithms for Automated Maintenance of Hierarchical Knowledge Directory. In: Lang, J., Lin, F., Wang, J. (eds) Knowledge Science, Engineering and Management. KSEM 2006. Lecture Notes in Computer Science, vol 4092. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11811220_18

  • DOI: https://doi.org/10.1007/11811220_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37033-8

  • Online ISBN: 978-3-540-37035-2

  • eBook Packages: Computer Science, Computer Science (R0)
