Abstract
With the advancement of internet technology, web page segmentation, which aims to divide web pages into semantically coherent units, has become increasingly crucial for web-related applications. Conventional purely visual web page segmentation approaches, which depend on traditional edge detection, face challenges in generalizing across complex web pages. Recently, the Segment Anything Model (SAM) represents remarkable visual understanding and segmentation abilities. This inspires us that SAM can also demonstrate great potential in Web Page Segmentation. However, due to the lack of web-specific training data, its direct adaptation to web page segmentation domain has been hindered. To address this challenge, we propose WebSAM-Adapter, an effective adaptation of SAM, featuring a three-module architecture specifically tailored for web page segmentation with minimal additional trainable parameters. First, we propose a patch embedding tune module for adjusting the frozen patch embedding features, which is crucial for modifying the distribution of the original model. Second, an edge components tune module is designed to learn significant structural features within each web page. Finally, the outputs of these specialized modules are sent into our key Adapter module, which employs a lightweight multi-layer perceptron (MLP) to amalgamate these enriched features and generate webpage-specific knowledge. To the best of our knowledge, our method is the first successful adaptation of a large visual model like SAM to web page segmentation. Empirical evaluations on the comprehensive Webis-WebSeg-20 dataset demonstrate our model’s state-of-the-art performance.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Kiesel, J., Kneist, F., Meyer, L., Komlossy, K., Stein, B., Potthast, M.: Web page segmentation revisited: evaluation framework and dataset. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 3047–3054 (2020)
Cai, D., He, X., Wen, J.-R., Ma, W.-Y.: Block-level link analysis. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 440–447 (2004)
Bing, L., Guo, R., Lam, W., Niu, Z.-Y., Wang, H.: Web page segmentation with structured prediction and its application in web page classification. In: Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 767–776 (2014)
Akpinar, M.E., Yesilada, Y.: Vision based page segmentation algorithm: extended and perceived success. In: Sheng, Q.Z., Kjeldskov, J. (eds.) ICWE 2013. LNCS, vol. 8295, pp. 238–252. Springer, Cham (2013). https://doi.org/10.1007/978-3-319-04244-2_22
Saar, T., Dumas, M., Kaljuve, M., Semenenko, N.: Browserbite: cross-browser testing via image processing. Softw. Pract. Exp. 46(11), 1459–1477 (2016)
Mahajan, S., Abolhassani, N., McMinn, P., Halfond, W.G.: Automated repair of mobile friendly problems in web pages. In: Proceedings of the 40th International Conference on Software Engineering, pp. 140–150 (2018)
Geng, G.-G., Lee, X.-D., Zhang, Y.-M.: Combating phishing attacks via brand identity and authorization features. Secur. Commun. Netw. 8(6), 888–898 (2015)
Cormier, M., Cohen, R., Mann, R., Rahim, K., Wang, D.: A robust vision-based framework for screen readers. In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) ECCV 2014. LNCS, vol. 8927, pp. 555–569. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16199-0_39
Cormier, M., Moffatt, K., Cohen, R., Mann, R.: Purely vision-based segmentation of web pages for assistive technology. Comput. Vis. Image Underst. 148, 46–66 (2016)
Sanoja, A., Gançarski, S.: Block-o-matic: a web page segmentation framework. In: 2014 International Conference on Multimedia Computing and Systems (ICMCS), pp. 595–600. IEEE (2014)
Vineel, G.: Web page dom node characterization and its application to page segmentation. In: 2009 IEEE International Conference on Internet Multimedia Services Architecture and Applications (IMSAA), pp. 1–6. IEEE (2009)
Chen, Y., Ma, W.-Y., Zhang, H.-J.: Detecting web page structure for adaptive viewing on small form factor devices. In: Proceedings of the 12th International Conference on World Wide Web, pp. 225–233 (2003)
Rajkumar, K., Kalaivani, V.: Dynamic web page segmentation based on detecting reappearance and layout of tag patterns for small screen devices. In: 2012 International Conference on Recent Trends in Information Technology, pp. 508–513. IEEE (2012)
Cai, D., Yu, S., Wen, J.-R., Ma, W.-Y.: Vips: a vision-based page segmentation algorithm (2003)
Zeleny, J., Burget, R., Zendulka, J.: Box clustering segmentation: a new method for vision-based web page preprocessing. Inf. Process. Manag. 53(3), 735–750 (2017)
Bajammal, M., Mesbah, A.: Page segmentation using visual adjacency analysis. arXiv preprint arXiv:2112.11975 (2021)
Andrew, J., Ferrari, S., Maurel, F., Dias, G., Giguet, E.: Web page segmentation for non visual skimming. In: The 33rd Pacific Asia Conference on Language, Information and Computation (PACLIC 33) (2019)
Manabe, T., Tajima, K.: Extracting logical hierarchical structure of html documents based on headings. In: Proceedings of the VLDB Endowment, pp. 1606–1617 (2015). http://dx.doi.org/10.14778/2824032.2824058
Cao, J., Mao, B., Luo, J.: A segmentation method for web page analysis using shrinking and dividing. Int. J. Parallel Emergent Distrib. Syst. 25(2), 93–104 (2010)
Cormer, M., Mann, R., Moffatt, K., Cohen, R.: Towards an improved vision-based web page segmentation algorithm. In: 2017 14th Conference on Computer and Robot Vision (CRV), pp. 345–352. IEEE (2017)
Kirillov, A., et al.: Segment anything. arXiv preprint arXiv:2304.02643 (2023)
Ma, J., Wang, B.: Segment anything in medical images. arXiv preprint arXiv:2304.12306 (2023)
Wu, J., et al.: Medical sam adapter: adapting segment anything model for medical image segmentation. arXiv preprint arXiv:2304.12620 (2023)
Shaharabany, T., Dahan, A., Giryes, R., Wolf, L.: Autosam: adapting sam to medical images by overloading the prompt encoder. arXiv preprint arXiv:2306.06370 (2023)
Chen, K., et al.: Rsprompter: learning to prompt for remote sensing instance segmentation based on visual foundation model. arXiv preprint arXiv:2306.16269 (2023)
Chen, T., et al.: Sam fails to segment anything?-sam-adapter: adapting sam in underperformed scenes: Camouflage, shadow, and more. arXiv preprint arXiv:2304.09148 (2023)
Tang, L., Xiao, H., Li, B.: Can sam segment anything? when sam meets camouflaged object detection. arXiv preprint arXiv:2304.04709 (2023)
Zaken, E.B., Ravfogel, S., Goldberg, Y.: Bitfit: simple parameter-efficient fine-tuning for transformer-based masked language-models. arXiv preprint arXiv:2106.10199 (2021)
Liu, W., Shen, X., Pun, C.-M., Cun, X.: Explicit visual prompting for low-level structure segmentations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19 434–19 445 (2023)
He, X., Li, C., Zhang, P., Yang, J., Wang, X.E.: Parameter-efficient model adaptation for vision transformers. arXiv preprint arXiv:2203.16329 (2022)
Chen, S., et al.: Adaptformer: adapting vision transformers for scalable visual recognition. Adv. Neural Inf. Process. Syst. 35, 16 664–16 678 (2022)
Chen, Z., et al.: Vision transformer adapter for dense predictions. arXiv preprint arXiv:2205.08534 (2022)
Dosovitskiy, A., et al.: An image is worth 16\(\times \)16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). Cornell University - arXiv (2016)
Amigó, E., Gonzalo, J., Artiles, J., Verdejo, F.: A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf. Retr. 461–486 (2009). https://doi.org/10.1007/s10791-008-9066-8
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Chen, K., et al.: Mmdetection: open mmlab detection toolbox and benchmark. arXiv Computer Vision and Pattern Recognition (2019)
Kiesel, J., Meyer, L., Kneist, F., Stein, B., Potthast, M.: An empirical comparison of web page segmentation algorithms. In: Hiemstra, D., Moens, M.-F., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds.) ECIR 2021. LNCS, vol. 12657, pp. 62–74. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72240-1_5
Acknowledgements
This work was financially supported by the Joint Research Fund of China Pacific Insurance (Group) Co. and SJTU-Artificial Intelligence Institute.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ren, B., Qian, Z., Sun, Y., Gao, C., Zhang, C. (2024). WebSAM-Adapter: Adapting Segment Anything Model for Web Page Segmentation. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14608. Springer, Cham. https://doi.org/10.1007/978-3-031-56027-9_27
Download citation
DOI: https://doi.org/10.1007/978-3-031-56027-9_27
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56026-2
Online ISBN: 978-3-031-56027-9
eBook Packages: Computer ScienceComputer Science (R0)