Abstract
This paper presents a series of text-mining algorithms for managing a knowledge directory, one of the most crucial problems in building knowledge management systems today. In future systems, the directory, in which knowledge objects are automatically classified, should evolve so that it keeps providing a good indexing service as the knowledge collection grows or its usage changes. One challenging issue is how to combine manual and automatic organization facilities so that a user can flexibly organize acquired knowledge into a hierarchical structure over time. To this end, I propose three algorithms that use text-mining technologies: semi-supervised classification, semi-supervised clustering, and automatic directory building. Experiments on controlled document collections show that the proposed approach significantly supports the hierarchical organization of large electronic knowledge bases with minimal human effort.
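The abstract names semi-supervised classification as one of the three algorithms but does not specify the underlying model. As a minimal sketch, assuming a multinomial naive Bayes classifier trained with EM over a few manually classified documents plus many unclassified ones (a common semi-supervised text classification technique, not necessarily the author's algorithm), the idea of letting unlabeled knowledge objects refine a directory classifier can be illustrated as follows. All function names, class labels, and toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch: semi-supervised multinomial naive Bayes trained with EM.
# This illustrates the general technique, not the paper's exact algorithm.


def train(docs_weights, classes, vocab):
    """M-step: estimate class priors and Laplace-smoothed word probabilities.

    docs_weights is a list of (word-count Counter, {class: weight}) pairs;
    labeled documents carry weight 1.0 for their class, unlabeled ones carry
    their current posterior probabilities.
    """
    prior = {c: 1.0 for c in classes}  # add-one smoothing on class priors
    word_counts = {c: defaultdict(float) for c in classes}
    totals = {c: 0.0 for c in classes}
    for counts, weights in docs_weights:
        for c, w in weights.items():
            prior[c] += w
            for word, n in counts.items():
                word_counts[c][word] += w * n
                totals[c] += w * n
    z = sum(prior.values())
    log_prior = {c: math.log(prior[c] / z) for c in classes}
    v = len(vocab)
    log_cond = {
        c: {w: math.log((word_counts[c][w] + 1.0) / (totals[c] + v)) for w in vocab}
        for c in classes
    }
    return log_prior, log_cond


def posterior(counts, log_prior, log_cond):
    """E-step for one document: P(class | document), normalized via log-sum-exp."""
    scores = {}
    for c in log_prior:
        s = log_prior[c]
        for word, n in counts.items():
            if word in log_cond[c]:  # words outside the vocabulary are ignored
                s += n * log_cond[c][word]
        scores[c] = s
    m = max(scores.values())
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def semi_supervised_nb(labeled, unlabeled, iterations=10):
    """Fit on labeled (tokens, label) pairs, then refine with unlabeled token lists."""
    labeled_counts = [(Counter(tokens), {label: 1.0}) for tokens, label in labeled]
    unlabeled_counts = [Counter(tokens) for tokens in unlabeled]
    classes = sorted({label for _, label in labeled})
    vocab = set()
    for counts, _ in labeled_counts:
        vocab.update(counts)
    for counts in unlabeled_counts:
        vocab.update(counts)
    model = train(labeled_counts, classes, vocab)  # initial model: labeled data only
    for _ in range(iterations):
        soft = [(counts, posterior(counts, *model)) for counts in unlabeled_counts]
        model = train(labeled_counts + soft, classes, vocab)
    return model


def classify(tokens, model):
    """Assign a document to the most probable directory node."""
    post = posterior(Counter(tokens), *model)
    return max(post, key=post.get)


# Toy usage: two manually classified documents and three unclassified ones.
labeled = [("database query index".split(), "db"),
           ("neural network training".split(), "ml")]
unlabeled = ["query optimization index".split(),
             "network gradient training".split(),
             "index btree query".split()]
model = semi_supervised_nb(labeled, unlabeled)
print(classify("btree index".split(), model))
```

Here "btree" never occurs in a labeled document; EM learns its association with the "db" node from the unlabeled document "index btree query", which is the kind of gain from unclassified knowledge objects that semi-supervised classification aims for.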
This research was supported by the University of Seoul, Korea, in 2005.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, Hj. (2006). On Text Mining Algorithms for Automated Maintenance of Hierarchical Knowledge Directory. In: Lang, J., Lin, F., Wang, J. (eds) Knowledge Science, Engineering and Management. KSEM 2006. Lecture Notes in Computer Science, vol 4092. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11811220_18
DOI: https://doi.org/10.1007/11811220_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37033-8
Online ISBN: 978-3-540-37035-2
eBook Packages: Computer Science, Computer Science (R0)