Abstract
Measuring web page similarity is a core problem in web content detection and classification. In this paper, we first define webpage visual blocks and then propose a method that uses these blocks to measure web page similarity. Experiments show that our method can effectively measure the level of similarity between different types of webpages.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wei, Y., Wang, B., Liu, Y., Lv, F. (2014). Research on Webpage Similarity Computing Technology Based on Visual Blocks. In: Huang, H., Liu, T., Zhang, HP., Tang, J. (eds) Social Media Processing. SMP 2014. Communications in Computer and Information Science, vol 489. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45558-6_17
DOI: https://doi.org/10.1007/978-3-662-45558-6_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45557-9
Online ISBN: 978-3-662-45558-6
eBook Packages: Computer Science, Computer Science (R0)