Authors:
Ciheng Zhang (1); Decky Aspandi (2) and Steffen Staab (2,3)
Affiliations:
(1) Institute of Industrial Automation and Software Engineering, University of Stuttgart, Stuttgart, Germany
(2) Institute for Parallel and Distributed Systems, University of Stuttgart, Stuttgart, Germany
(3) Web and Internet Science, University of Southampton, Southampton, U.K.
Keyword(s):
Eye-Gaze Saliency, Image Translation, Visual Attention.
Abstract:
The World Wide Web, with websites and webpages as its main interface, facilitates the dissemination of important information. It is therefore crucial to optimize webpage design for better user interaction, which is primarily done by analyzing users’ behavior, especially their eye-gaze locations on a webpage. However, gathering these data is labor- and time-intensive. In this work, we enable automatic eye-gaze estimation from webpage screenshots by curating a unified dataset consisting of webpage screenshots, eye-gaze heatmaps, and website layout information in the form of image and text masks. Our curated dataset allows us to propose a deep learning-based model that leverages both the webpage screenshot and content information (the spatial locations of images and text), which are combined through an attention mechanism for effective eye-gaze prediction. Our experiments show the benefit of careful fine-tuning on our unified dataset for improving the accuracy of eye-gaze predictions. We further observe that our model focuses on targeted areas (images and text) to achieve accurate predictions of eye-gaze regions. Finally, comparison with alternative approaches shows that our method achieves state-of-the-art results, establishing a benchmark for the webpage-based eye-gaze prediction task.
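The fusion described in the abstract, combining screenshot features with image and text masks through attention, can be illustrated with a minimal sketch. This is an assumption-laden toy in NumPy, not the authors' actual architecture: the function name, the per-pixel softmax over the two mask channels, and the channel-mean readout are all illustrative choices.

```python
import numpy as np

def attention_fuse(screenshot_feat, image_mask, text_mask):
    """Illustrative sketch (not the paper's exact model): gate screenshot
    features with a spatial attention map derived from content masks,
    then emit a normalized gaze heatmap.

    screenshot_feat: H x W x C feature map extracted from the screenshot
    image_mask, text_mask: H x W spatial masks of image/text content
    """
    # Stack the two content masks as per-pixel attention logits (H x W x 2).
    logits = np.stack([image_mask, text_mask], axis=-1)
    # Softmax over the two mask channels at each pixel.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    # Attention map: weighted emphasis on image- and text-covered pixels.
    attn = weights[..., 0] * image_mask + weights[..., 1] * text_mask
    # Modulate the screenshot features by the attention map.
    gated = screenshot_feat * attn[..., None]
    # Collapse channels and normalize so the heatmap sums to 1.
    heat = gated.mean(axis=-1)
    heat = heat - heat.min()
    total = heat.sum()
    return heat / total if total > 0 else heat
```

In a trained model, the attention weights would of course be learned rather than computed directly from the masks; the sketch only shows how mask-derived attention can spatially reweight screenshot features before the heatmap readout.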