Abstract
In this work we developed a new automatic hypertext construction method based on a proposed text mining approach. Our method applies the self-organizing map algorithm to cluster the flat text documents in a training corpus and generate two maps. We then use these maps to identify the sources and destinations of important hyperlinks within the training documents. The constructed hyperlinks are inserted into the training documents to translate them into hypertext form, and the translated documents form the new corpus. Incoming documents can also be translated into hypertext form and added to the corpus through the same approach. Our method was tested on a set of flat text documents collected from several newswire sites. Although we used only Chinese text documents, our approach can be applied to any document that can be transformed into a set of indexed terms.
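The clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each document has already been converted to a fixed-length numeric term vector, and the function names (train_som, best_matching_unit, cluster_documents), the grid size, and the learning-rate and radius schedules are all hypothetical choices made for illustration. The sketch trains a small self-organizing map and groups documents by the map unit closest to them; the construction of the two maps and the selection of hyperlinks are not covered.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return the grid position whose weight vector is closest to `vector`."""
    return min(
        weights,
        key=lambda pos: sum((w - x) ** 2 for w, x in zip(weights[pos], vector)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small SOM on equal-length term vectors (illustrative settings)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One randomly initialised weight vector per unit of the rows x cols grid.
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # linearly decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)   # shrinking neighbourhood radius
        for vector in vectors:
            bmu = best_matching_unit(weights, vector)
            for pos, w in weights.items():
                grid_dist2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                influence = math.exp(-grid_dist2 / (2.0 * radius ** 2))
                # Pull the unit's weights toward the input, scaled by its
                # distance from the best matching unit on the grid.
                for i in range(dim):
                    w[i] += lr * influence * (vector[i] - w[i])
    return weights


def cluster_documents(weights, vectors):
    """Group document indices by their best matching unit."""
    clusters = {}
    for index, vector in enumerate(vectors):
        clusters.setdefault(best_matching_unit(weights, vector), []).append(index)
    return clusters


# Hypothetical toy corpus: rows are documents, columns are indexed terms.
docs = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
som = train_som(docs, rows=2, cols=2)
print(cluster_documents(som, docs))
```

Documents that land on the same unit can be treated as related. In a setting like the one the abstract describes, such groupings would be one plausible starting point for choosing hyperlink destinations.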
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, HC., Lee, CH. (2001). Automatic Hypertext Construction through a Text Mining Approach by Self-Organizing Maps. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_14
DOI: https://doi.org/10.1007/3-540-45357-1_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41910-5
Online ISBN: 978-3-540-45357-4
eBook Packages: Springer Book Archive