Abstract
We introduce a new method to improve web site text content by identifying the most relevant free text in the site's web pages. To understand how web page text varies over time, we collect pages over a period of time. The text content of each page is then transformed into a feature vector, which is used as input to a clustering algorithm, the Self-Organizing Feature Map (SOFM), that groups the vectors by common text content. From each cluster, the centroid and its neighboring vectors are extracted. Then, using a reverse clustering analysis, the pages represented by each of these vectors are reviewed to find the text content they have in common. Finally, the proposed method was tested on a real web site, demonstrating the effectiveness of this approach.
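The pipeline in the abstract (page text to feature vectors, SOFM clustering, centroid and neighbor extraction, and a reverse mapping back to pages) can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that are not taken from the paper. Pages arrive as plain strings, with the collection over time left out. Vectors use tf-idf weighting on whitespace tokens. The map is a small rectangular grid with Euclidean distance, a Gaussian neighborhood, and a linearly decaying learning rate. "Reverse clustering" is read here as mapping each map unit back to the pages whose vectors it wins. The function names `tfidf_vectors`, `train_sofm`, `best_unit`, and `reverse_cluster` and the sample pages are hypothetical.

```python
import math
import random
from collections import Counter


def tfidf_vectors(pages):
    """Turn each page's free text into an L2-normalized tf-idf vector (assumed weighting)."""
    docs = [page.lower().split() for page in pages]  # naive whitespace tokens
    vocab = sorted({w for doc in docs for w in doc})
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    n = len(docs)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[w] * math.log(n / df[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by 0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_unit(weights, v):
    """Grid position (row, col) of the neuron closest to vector v."""
    return min(weights, key=lambda unit: sq_dist(weights[unit], v))


def train_sofm(vectors, rows=2, cols=2, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map and return {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)              # learning rate decays linearly
        sigma = sigma0 * (1 - frac) + 0.5  # neighborhood radius shrinks
        for v in rng.sample(vectors, len(vectors)):  # shuffled presentation order
            bmu = best_unit(weights, v)
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = lr * math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += h * (v[i] - w[i])  # pull the neuron toward the input
    return weights


def reverse_cluster(pages, vectors, weights, vocab, k=2, top_terms=3):
    """Map each non-empty unit back to its k pages nearest the unit's weight
    vector (used here as the cluster centroid) and its top-weighted terms."""
    members = {}
    for idx, v in enumerate(vectors):
        members.setdefault(best_unit(weights, v), []).append(idx)
    report = {}
    for unit, idxs in members.items():
        centroid = weights[unit]
        nearest = sorted(idxs, key=lambda i: sq_dist(vectors[i], centroid))[:k]
        terms = sorted(range(len(vocab)), key=lambda j: -centroid[j])[:top_terms]
        report[unit] = {"pages": [pages[i] for i in nearest],
                        "terms": [vocab[j] for j in terms]}
    return report


# Hypothetical sample pages, for illustration only.
pages = [
    "admission requirements for the engineering program",
    "engineering program admission deadlines",
    "library opening hours and services",
    "library services for students",
]
vocab, vectors = tfidf_vectors(pages)
weights = train_sofm(vectors)
report = reverse_cluster(pages, vectors, weights, vocab)
for unit, info in sorted(report.items()):
    print(unit, info["terms"], info["pages"])
```

In this sketch each neuron's weight vector stands in for the cluster centroid, and its highest-weighted terms hint at the text its pages share. Swapping `sq_dist` for a cosine-based distance would be a natural variation for text vectors.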
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ríos, S.A., Velásquez, J.D., Vera, E.S., Yasuda, H., Aoki, T. (2005). Using SOFM to Improve Web Site Text Content. In: Wang, L., Chen, K., Ong, Y.S. (eds) Advances in Natural Computation. ICNC 2005. Lecture Notes in Computer Science, vol 3611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11539117_88
DOI: https://doi.org/10.1007/11539117_88
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28325-6
Online ISBN: 978-3-540-31858-3
eBook Packages: Computer Science, Computer Science (R0)