Automatic Hypertext Construction through a Text Mining Approach by Self-Organizing Maps

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2035)

Abstract

In this work we developed a new automatic hypertext construction method based on a proposed text mining approach. Our method applies the self-organizing map algorithm to cluster flat text documents in a training corpus and generate two maps. We then use these maps to identify the sources and destinations of important hyperlinks within the training documents. The constructed hyperlinks are inserted into the training documents to convert them into hypertext form, and the converted documents form the new corpus. Incoming documents can also be converted into hypertext form and added to the corpus through the same approach. Our method has been tested on a set of flat text documents collected from several newswire sites. Although we only used Chinese text documents, our approach can be applied to any document that can be transformed into a set of index terms.
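The clustering step described in the abstract can be illustrated with a minimal sketch. It trains a tiny self-organizing map on binary term vectors and then derives two maps from the trained weights. The abstract does not say what the two maps contain, so the sketch assumes one map groups documents and the other groups terms; the function names, the toy vocabulary, the grid size, and the rule that assigns each term to the neuron with the largest weight for it are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def best_matching_unit(weights, v):
    """Index of the neuron whose weight vector is closest to v (squared Euclidean)."""
    return min(
        range(len(weights)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(weights[j], v)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns one weight vector per neuron."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)   # neighborhood shrinks over time
        for v in vectors:
            br, bc = divmod(best_matching_unit(weights, v), cols)
            for j, w in enumerate(weights):
                r, c = divmod(j, cols)
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * radius * radius))  # Gaussian neighborhood
                for k in range(dim):
                    w[k] += lr * h * (v[k] - w[k])
    return weights


def build_maps(weights, vectors, vocab):
    """Assumed labeling: documents go to their best-matching neuron,
    terms go to the neuron with the largest weight for that term."""
    doc_map = {}
    for i, v in enumerate(vectors):
        doc_map.setdefault(best_matching_unit(weights, v), []).append(i)
    word_map = {}
    for k, term in enumerate(vocab):
        j = max(range(len(weights)), key=lambda n: weights[n][k])
        word_map.setdefault(j, []).append(term)
    return doc_map, word_map


# Hypothetical toy corpus: each row marks which vocabulary terms a document contains.
vocab = ["market", "stock", "bank", "match", "team", "goal"]
docs = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]

weights = train_som(docs, rows=2, cols=2)
doc_map, word_map = build_maps(weights, docs, vocab)
print("documents per neuron:", doc_map)
print("terms per neuron:", word_map)
```

Each document and each term lands on exactly one neuron, so the two dictionaries can be read side by side. How the authors turn such maps into hyperlink sources and destinations is not described in the abstract and is not modeled here.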

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, HC., Lee, CH. (2001). Automatic Hypertext Construction through a Text Mining Approach by Self-Organizing Maps. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_14

  • DOI: https://doi.org/10.1007/3-540-45357-1_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41910-5

  • Online ISBN: 978-3-540-45357-4

  • eBook Packages: Springer Book Archive