Using SOFM to Improve Web Site Text Content

Ríos, Sebastían A.; Velásquez, Juan D.; Vera, Eduardo S.; Yasuda, Hiroshi; Aoki, Terumasa

doi:10.1007/11539117_88

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3611)

Included in the following conference series:

International Conference on Natural Computation

Abstract

We introduce a new method to improve web site text content by identifying the most relevant free text in the web pages. To capture variations in web page text, we collect pages over a period of time. The text content of each page is then transformed into a feature vector and used as input to a clustering algorithm, a self-organizing feature map (SOFM), which groups the vectors by common text content. In each cluster, a centroid and its neighbor vectors are extracted. Then, using a reverse clustering analysis, the pages represented by each vector are reviewed in order to find those with similar text content. The proposed method was tested on a real web site, showing the effectiveness of the approach.
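The pipeline outlined in the abstract (page text to feature vectors, SOFM clustering, then mapping neurons back to pages) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the authors' implementation: the TF-IDF weighting, the cosine-similarity matching rule, the 2x2 map, the learning-rate and radius schedules, the toy page texts, and all function names. The collection of pages over a period of time is not modeled.

```python
import math
import random
from collections import Counter

def tfidf_vectors(pages):
    """Turn each page's text into a TF-IDF vector over the shared vocabulary."""
    tokenized = [page.lower().split() for page in pages]
    vocab = sorted({w for words in tokenized for w in words})
    doc_freq = Counter(w for words in tokenized for w in set(words))
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        total = max(len(words), 1)  # guard against empty pages
        vectors.append([counts[w] / total * math.log(len(pages) / doc_freq[w])
                        for w in vocab])
    return vocab, vectors

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is most similar to x."""
    return max(weights, key=lambda pos: cosine(weights[pos], x))

def train_sofm(vectors, rows=2, cols=2, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOFM; hyperparameters are illustrative."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Initialize every neuron with a copy of a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(vectors))
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = 1 - epoch / epochs           # linear decay factor
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.5)
        for x in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(weights, x)
            for pos, w in weights.items():
                d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights

def pages_by_neuron(weights, vectors):
    """Reverse step: group page indices by the neuron that represents them."""
    clusters = {}
    for idx, x in enumerate(vectors):
        clusters.setdefault(best_matching_unit(weights, x), []).append(idx)
    return clusters

# Hypothetical page texts, used only to exercise the sketch.
pages = [
    "credit card fees and credit limits",
    "credit card rewards and annual fees",
    "branch opening hours and branch locations",
    "find branch locations and opening hours",
]
vocab, vectors = tfidf_vectors(pages)
weights = train_sofm(vectors)
clusters = pages_by_neuron(weights, vectors)
for neuron, members in sorted(clusters.items()):
    print(neuron, [pages[i] for i in members])
```

Inspecting the pages grouped under each neuron is the point at which a site editor could review which text the cluster has in common; how the resulting groups look depends on the assumed parameters and on the input pages.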

Author information

Authors and Affiliations

Research Center for Advanced Science and Technology, University of Tokyo
Sebastían A. Ríos, Hiroshi Yasuda & Terumasa Aoki
Department of Industrial Engineering, University of Chile
Juan D. Velásquez
Center for Collaborative Research, University of Tokyo
Eduardo S. Vera
On leave from Department of Computer Science, University of Chile
Eduardo S. Vera

Editor information

Editors and Affiliations

School of Electrical and Electronic Engineering, Nanyang Technological University, Block S1, Nanyang Avenue, 639798, Singapore
Lipo Wang
School of Software, Sun Yat-Sen University, 510275, Guangzhou, China
Ke Chen
School of Computer Engineering, Nanyang Technological University, BLK N4, 2b-39, Nanyang Avenue, 639798, Singapore
Yew Soon Ong

About this paper

Cite this paper

Ríos, S.A., Velásquez, J.D., Vera, E.S., Yasuda, H., Aoki, T. (2005). Using SOFM to Improve Web Site Text Content. In: Wang, L., Chen, K., Ong, Y.S. (eds) Advances in Natural Computation. ICNC 2005. Lecture Notes in Computer Science, vol 3611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11539117_88

DOI: https://doi.org/10.1007/11539117_88
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28325-6
Online ISBN: 978-3-540-31858-3
eBook Packages: Computer Science, Computer Science (R0)
