On Text Mining Algorithms for Automated Maintenance of Hierarchical Knowledge Directory

Kim, Han-joon

doi:10.1007/11811220_18


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4092)

Included in the following conference series:

International Conference on Knowledge Science, Engineering and Management


Abstract

This paper presents a series of text-mining algorithms for managing a knowledge directory, one of the most crucial problems in constructing knowledge management systems today. In future systems, the constructed directory, in which knowledge objects are automatically classified, should evolve so that it keeps providing a good indexing service as the knowledge collection grows or its usage changes. One challenging issue is how to combine manual and automatic organization facilities so that a user can flexibly organize acquired knowledge into a hierarchical structure over time. To this end, I propose three algorithms that use text-mining techniques: semi-supervised classification, semi-supervised clustering, and automatic directory building. Experiments on controlled document collections show that the proposed approach significantly supports the hierarchical organization of a large electronic knowledge base with minimal human effort.
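The abstract names semi-supervised classification but does not spell out the algorithm. As a generic illustration of the idea (a few user-filed documents plus many unfiled ones), the sketch below uses the well-known EM scheme with a multinomial naive Bayes classifier. This is an assumption chosen for illustration, not the author's method; the function names (`train_nb`, `posterior`, `semi_supervised_nb`, `classify`), the smoothing constant `alpha`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter

def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """M-step: fit multinomial naive Bayes from (possibly soft) class weights.

    docs    -- list of token lists
    weights -- one dict {class: weight} per document (hard labels use 1.0/0.0)
    alpha   -- Laplace smoothing constant (hypothetical default)
    """
    prior = {c: alpha for c in classes}
    word_counts = {c: Counter() for c in classes}
    total = {c: 0.0 for c in classes}
    for doc, w in zip(docs, weights):
        tf = Counter(doc)
        for c, p in w.items():
            prior[c] += p
            for term, n in tf.items():
                word_counts[c][term] += p * n  # fractional counts for soft labels
                total[c] += p * n
    z = sum(prior.values())
    log_prior = {c: math.log(prior[c] / z) for c in classes}
    v = len(vocab)
    log_lik = {
        c: {t: math.log((word_counts[c][t] + alpha) / (total[c] + alpha * v)) for t in vocab}
        for c in classes
    }
    return log_prior, log_lik

def posterior(doc, log_prior, log_lik, classes):
    """E-step for one document: P(class | doc) under the current model."""
    scores = {c: log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])
              for c in classes}
    m = max(scores.values())  # subtract the max log score for numerical stability
    exp = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}

def semi_supervised_nb(labeled, unlabeled, iterations=10):
    """EM loop: train on labeled data only, then repeatedly soft-label the
    unlabeled documents and retrain on both sets."""
    classes = sorted({y for _, y in labeled})
    vocab = {t for d, _ in labeled for t in d} | {t for d in unlabeled for t in d}
    docs_l = [d for d, _ in labeled]
    w_l = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    model = train_nb(docs_l, w_l, classes, vocab)
    for _ in range(iterations):
        w_u = [posterior(d, *model, classes) for d in unlabeled]          # E-step
        model = train_nb(docs_l + unlabeled, w_l + w_u, classes, vocab)  # M-step
    return model, classes

def classify(doc, model, classes):
    """Assign a document to the most probable class."""
    probs = posterior(doc, *model, classes)
    return max(probs, key=probs.get)

# Hypothetical toy data: two filed documents and three unfiled ones.
labeled = [("ball game team score".split(), "sports"),
           ("stock market price trade".split(), "finance")]
unlabeled = ["team won the game".split(), "market price fell".split(),
             "score goal team".split()]
model, classes = semi_supervised_nb(labeled, unlabeled)
print(classify("goal team".split(), model, classes))  # prints "sports"
```

Labeled documents keep fixed hard labels, while unlabeled documents contribute fractional counts. Here "goal" never occurs in a labeled document, yet it becomes associated with the sports class through the unlabeled text. In a directory setting, the labeled set would presumably be the documents a user has already filed by hand.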

This research was supported by the University of Seoul, Korea, in 2005.



Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Seoul, Korea
Han-joon Kim


Editor information

Editors and Affiliations

IRIT, UPS, F-31062 Toulouse Cédex 9, France
Jérôme Lang
Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
Fangzhen Lin
Guangxi Normal University, Guilin, China
Ju Wang


About this paper

Cite this paper

Kim, Hj. (2006). On Text Mining Algorithms for Automated Maintenance of Hierarchical Knowledge Directory. In: Lang, J., Lin, F., Wang, J. (eds) Knowledge Science, Engineering and Management. KSEM 2006. Lecture Notes in Computer Science, vol 4092. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11811220_18


DOI: https://doi.org/10.1007/11811220_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37033-8
Online ISBN: 978-3-540-37035-2
eBook Packages: Computer Science (R0)
