Using SOFM to Improve Web Site Text Content

  • Conference paper
Advances in Natural Computation (ICNC 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3611)

Abstract

We introduce a new method to improve web site text content by identifying the most relevant free text in the web pages. To understand the variations in web page text, we collect pages over a period of time. The text content of each page is then transformed into a feature vector and used as input to a clustering algorithm, the self-organizing feature map (SOFM), which groups the vectors by common text content. From each cluster, the centroid and its neighboring vectors are extracted. Then, using a reverse clustering analysis, the pages represented by each vector are reviewed in order to find those with similar content. The proposed method was tested on a real web site, with results that support the effectiveness of the approach.
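
The pipeline outlined in the abstract (page text to feature vectors, SOFM clustering, then mapping cluster units back to pages) can be illustrated with a minimal sketch. It is not the authors' implementation: the TF-IDF weighting, the 2x2 grid, the Gaussian neighbourhood, the decay schedules, and all function names and sample pages below are assumptions chosen for illustration only.

```python
import math
import random
from collections import Counter, defaultdict


def tfidf_vectors(pages):
    """Turn {page_id: text} into {page_id: unit-length TF-IDF vector}.

    The weighting scheme is an assumption; the abstract does not specify one.
    """
    tokens = {pid: text.lower().split() for pid, text in pages.items()}
    vocab = sorted({w for words in tokens.values() for w in words})
    n_pages = len(pages)
    doc_freq = Counter(w for words in tokens.values() for w in set(words))
    vectors = {}
    for pid, words in tokens.items():
        counts = Counter(words)
        vec = [counts[w] * math.log(1 + n_pages / doc_freq[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors[pid] = [x / norm for x in vec]
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(grid, vec):
    """Return the grid coordinate whose weight vector is closest to vec."""
    return min(grid, key=lambda node: sq_dist(grid[node], vec))


def train_sofm(data, rows=2, cols=2, epochs=200, seed=0):
    """Train a small rectangular SOFM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = 0.5 * frac + 0.01                      # decaying learning rate
        radius = max(rows, cols) / 2 * frac + 0.5   # shrinking neighbourhood
        for vec in rng.sample(data, len(data)):     # shuffled pass over data
            bmu = best_matching_unit(grid, vec)
            for node, weights in grid.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))
                for i in range(dim):
                    weights[i] += lr * h * (vec[i] - weights[i])
    return grid


def reverse_clusters(grid, vectors):
    """Map each winning unit back to the page ids it represents."""
    clusters = defaultdict(list)
    for pid, vec in vectors.items():
        clusters[best_matching_unit(grid, vec)].append(pid)
    return dict(clusters)


# Hypothetical page snapshots, used only to exercise the sketch.
pages = {
    "courses.html": "course schedule course enrollment fees",
    "admission.html": "enrollment fees admission course",
    "contact.html": "phone address email office",
}
vectors = tfidf_vectors(pages)
grid = train_sofm(list(vectors.values()))
clusters = reverse_clusters(grid, vectors)
```

Here `clusters` maps each winning grid unit to the pages it represents, which is the starting point for reviewing pages that share text content. A real study would use a larger map and the vectors of pages collected over the whole observation period.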

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ríos, S.A., Velásquez, J.D., Vera, E.S., Yasuda, H., Aoki, T. (2005). Using SOFM to Improve Web Site Text Content. In: Wang, L., Chen, K., Ong, Y.S. (eds) Advances in Natural Computation. ICNC 2005. Lecture Notes in Computer Science, vol 3611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11539117_88

  • DOI: https://doi.org/10.1007/11539117_88

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28325-6

  • Online ISBN: 978-3-540-31858-3

  • eBook Packages: Computer Science, Computer Science (R0)
