Abstract
Posters are widely used as a powerful tool for communication. They are highly informative yet are normally viewed for only about 3 s, which calls for efficient and effective information delivery. It is thus important to know where people look when viewing posters. Saliency models can be of great help where expensive and time-consuming eye-tracking experiments are not an option. However, current datasets for saliency model training mainly consist of natural scenes, which makes research on saliency models for posters difficult. To address this problem, we collected 1700 high-quality posters together with their eye-tracking data, with each image viewed by 15 participants. This could serve as the groundwork for future research on saliency prediction for posters. Notably, posters are rich in text (e.g. title, slogan, description paragraph). The various types of text serve different functions, making some relatively more important than others. Nevertheless, this difference is largely neglected in current studies, where researchers place the same emphasis on all text regions, and the problem is especially crucial for saliency models of posters. Our further analysis of the eye-tracking results, with a focus on text, offers some insights into this issue.
Keywords
- Visual saliency prediction
- Eye-tracking dataset of posters
- Relative importance of different text regions
1 Introduction
The poster is a popular and rather unique means of information delivery. Unlike photographs or artworks, which are more likely to be examined carefully, posters are normally viewed for only around 3 s [1]. Moreover, they are often rich in content, which requires a higher level of semantic understanding. All of this calls for efficient, high-quality information delivery, and it is thus important to know where people pay more attention when they look at posters.
Along with the progress of machine learning, studies on visual saliency models using neural networks have gained strong momentum in recent years. Visual saliency models can predict viewers' fixations, offering help where expensive, tedious and time-consuming eye-tracking experiments are not an option [2]. Most saliency models deal with predicting fixation points in natural scenes, and only a few other studies focus on other image types such as webpages and infographics [3]. This is partly because current datasets used for saliency model training consist mostly of natural scenes. MIT300, which contains 300 natural images, is one of the most frequently used datasets, and the same researchers established MIT1003, which consists of 1003 landscape and portrait images [4, 5]. Other datasets include NUSEF (758 images, many of them emotion-evoking) [6], AVA (over 250,000 photographs) [7], CAT2000 (4000 images from 20 categories) [8], etc. Shen and Zhao [9] provided one of the very few datasets for webpages, which contains 149 webpages with eye-tracking data from 11 participants. To our knowledge, there is not yet such a dataset of posters. It is therefore important to introduce a poster dataset as the groundwork for saliency models of posters.
Compared to natural scenes, posters are rich in semantics, which requires a higher level of cognition. Bottom-up and top-down processing are the two cognitive mechanisms of the human visual system: the bottom-up mechanism refers to the instinctive and automatic deployment of attention, while the top-down mechanism is driven by subjective factors such as prior knowledge and personal interest [10]. Many studies have looked into how to incorporate top-down semantic features into saliency models. High-level semantic features such as faces, people, objects and text have been employed by researchers. Among them, the face is found to be the most important semantic feature, followed by text and other pop-out elements [11].
It is important to note that posters are especially rich in text (e.g. title, slogan, description paragraph), making it necessary to put special emphasis on this semantic feature. The various types of text serve different functions, making some relatively more important than others. However, current saliency models incorporate text as a training feature by using a text detector, which assigns the same weight to every piece of text and thus results in significant inaccuracy when predicting saliency for text regions [12]. There are two levels of reasons why different texts attract attention differently: the lower-level reason concerns typography, which leads to bottom-up saliency differences and is largely decided by the type of text; the higher-level one concerns the top-down understanding of the words themselves. In this study we focused on the former and tried to alleviate the influence of the latter. Our analysis of the eye-tracking results of our dataset, with text as the focus, aims to offer some insights into how text influences saliency.
2 Large-Scale Eye-Tracking
2.1 Dataset Gathering
Our dataset consists of 1700 posters from a wide range of fields such as food, cosmetics, electronics, jewelry and automobiles. This breadth was chosen to cover more design patterns and to reduce the impact of our participants' personal preferences or prior knowledge. The posters were then divided into three categories based on the ratio of graphics to text: the first contains almost no readable text (group A), the second contains only a title or slogan (group B), and the remaining posters form the third (group C). The number of images in each category is almost equal. To reduce top-down semantic influence, we only included posters in Chinese for groups B and C, and posters that featured copywriting were not chosen at all. After screening for clarity, editing out watermarks and resizing, the images from the three categories were mixed and distributed evenly into four groups.
2.2 Eye-Tracking Experiment
The 60 participants (29 male, 31 female) were students between the ages of 18 and 25. They were divided into four groups, so that each image was viewed by 15 participants. All viewers sat at a distance of about 50 cm from a screen with a resolution of 1280 × 1024 and were asked to keep their heads still. A Tobii T60 eye tracker was used to record eye movements, and calibration was checked before each run. During the free-viewing task, each image was shown for three seconds.
3 Data Analysis
To statistically study the saliency pattern of our dataset, we chose one group of images and used tools provided by Tobii Studio, such as heatmaps and areas of interest, to analyze the eye-tracking data. Further investigation into the differences in the statistical results of the three categories of images reveals how various text elements interact with each other and how they impact other elements.
3.1 Heatmap
Center-Bias
The term center-bias refers to the phenomenon that most fixation points tend to be located in the center of an image during a free-viewing task. Center-bias has been widely demonstrated in eye-tracking studies using datasets of natural scenes [5, 13]. It is often attributed to the preference of photographers, who are likely to place objects of interest in the center of the image. In turn, viewers form the habit of searching for important content near the central area of an image [14]. The experimental setting, in which viewers directly face the center of the screen, may also contribute [15].
It is interesting to note that center-bias still applies to our dataset of posters. Figure 1 shows the mean heatmap of all posters within the group. The average heatmap has a spindle-shaped cluster of fixation points in the central region, supporting the center-bias hypothesis. However, the tendency towards the center is not as strong as in previous work on datasets of natural scenes [5]. This may be because posters are naturally more informative and efficient than natural scenes: more scattered elements, each serving a specific function, are intentionally designed to attract viewers' attention.
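For concreteness, a mean heatmap like the one in Fig. 1 can be approximated by accumulating fixation points into a pixel grid and smoothing with a Gaussian kernel. The following is a minimal sketch under our own assumptions; the function name, the σ value and the data layout are illustrative choices, not the Tobii Studio pipeline used in the experiment:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_heatmap(fixations, height=1024, width=1280, sigma=35):
    """Accumulate (x, y) fixation points into a pixel grid and blur with
    a Gaussian; sigma roughly models foveal extent in pixels (an
    illustrative value, not one reported in the paper)."""
    grid = np.zeros((height, width), dtype=np.float64)
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < height and 0 <= xi < width:
            grid[yi, xi] += 1.0
    heat = gaussian_filter(grid, sigma=sigma)
    return heat / heat.max() if heat.max() > 0 else heat

# The mean heatmap over a poster group is simply the average of the
# per-poster maps (dummy fixation data shown here).
per_poster = [fixation_heatmap([(640, 512), (700, 480)]),
              fixation_heatmap([(300, 200), (640, 520)])]
mean_map = np.mean(np.stack(per_poster), axis=0)
```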
We then looked at the saliency differences between our three categories of stimuli. We found that the fixation region is most concentrated for group A and least concentrated for group C. Figure 2(A) shows the average saliency map for the three categories. Group A shows more tightly clustered fixation areas, forming a clear spindle-shaped fixation region near the center of the image, whereas for group C the fixation area is rather diffuse. Statistically, for group A 41.1% of salient areas fall in the central 25% region, while for group C the figure is 34.2%. According to the gray-level plot, group A has a larger number of black pixels (more than 500 K), a smaller number of gray pixels, and a rather smooth decreasing slope (see Fig. 2(B)). In comparison, group C has fewer black pixels (around 350 K) and more gray pixels, and its decreasing slope fluctuates. The difference in the concentration of salient areas may stem from the fact that designers tend to place the most important and eye-catching element in the center when designing posters with only graphics and a logo (group A). Without other distracting elements, viewers are naturally drawn to the center. As the number of dispersed text elements increases, viewers' fixations become more scattered.
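The central-concentration statistic above can be reproduced along the following lines. Note that reading the "central 25% region" as a centered rectangle covering a quarter of the image area is our assumption, since the exact definition is not spelled out:

```python
import numpy as np

def central_saliency_fraction(saliency, area_fraction=0.25):
    """Fraction of total saliency mass inside a centered rectangle
    covering `area_fraction` of the image area (our assumed reading
    of the 'central 25% region')."""
    h, w = saliency.shape
    scale = area_fraction ** 0.5      # side length scales with sqrt(area)
    ch, cw = int(h * scale), int(w * scale)
    top, left = (h - ch) // 2, (w - cw) // 2
    center = saliency[top:top + ch, left:left + cw]
    return float(center.sum() / saliency.sum())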
To analyze whether fixations are consistent across different people, we measured the entropy of the saliency maps averaged across all viewers. The entropy histograms show a Gaussian-like distribution (see Fig. 2(C)). Group A exhibits a lower level of entropy than group B, and group C has the highest entropy. For images with various text elements (group C), viewers' choices of where to look tend to differ based on subjective top-down factors such as personal interest and prior knowledge. For images with fewer places to look, however, viewers' decisions tend to be driven by bottom-up factors, which act essentially the same on everyone.
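A standard way to quantify this dispersion, and presumably close to what the histograms in Fig. 2(C) measure, is the Shannon entropy of a saliency map treated as a probability distribution over pixels. This is a sketch; the paper does not state its exact formula:

```python
import numpy as np

def saliency_entropy(saliency):
    """Shannon entropy (in bits) of a saliency map normalized to a
    probability distribution; higher entropy means fixations are
    spread more evenly across the image."""
    p = saliency.astype(np.float64).ravel()
    p = p / p.sum()
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())
```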
Specific Salient Element
From observation of the saliency maps, we noticed that viewers tend to focus on specific elements such as face, object and text.
A. Face
Our saliency maps show a strong bias for people to focus on faces, and the same applies to the faces of animals, personified objects and even sculptures (see Fig. 3). Within a face, fixations are most likely to fall on the eyes, nose and lips. When several faces appear, people tend to focus on the faces in the middle, and possibly on the more attractive face.
B. Object
Objects can also catch a lot of attention (see Fig. 4). Objects placed in the center are more likely to be looked at than those scattered around. Interestingly, there seems to be a tendency for fixations to fall on objects presented by a spokesperson.
C. Text
Text is very common in posters and is designed very carefully, as it can convey information both effectively and efficiently. Our saliency maps clearly show that text areas always rank high on the fixation list (see Fig. 5). This could be attributed to the fact that we are almost instinctively driven to look at text and try to understand it. However, not every piece of text enjoys the same degree of saliency, owing to differences in text type; this is investigated further in the next part. Additionally, there are two interesting points to note: text-on-object can often draw a lot of attention, to which both text saliency and object saliency may contribute; and the name of a spokesperson reliably attracts attention. The latter could be explained by the location of the text, which is normally near the person's face, making the shift of attention easy. The drive to know the spokesperson's name is another possible reason.
3.2 Area of Interest
To quantitatively study how people fixate on specific elements, we hand-labeled our dataset. The elements chosen are face, object, logo and text. Text is further categorized into title, subtitle, description line, description paragraph, slogan and text-on-object, so that we can study in detail how different types of text influence overall visual saliency.
Time to First Fixation
Time to first fixation reflects the order in which viewers fixate on areas of interest. As shown in Fig. 6(A), all the areas in question fall into three tiers: object and face first catch attention at around 0.6 s; then, at around 0.9 s, title and text-on-object take the focus; after that come the remaining elements such as description paragraph, logo, subtitle and slogan, between 1.2 s and 1.4 s. This shows that object and face catch viewers' attention straight away, with title and text-on-object next. The early fixation of the title could be explained by its typography: as usually the most important piece of text in a poster, a title is normally large, placed in the center of the image, and set in a carefully chosen font. As for text-on-object, the saliency of the object itself and the instinctive drive to identify the object by reading the text on it may both play a role. Among the remaining text elements, the description paragraph and description line seem the least attractive to viewers, as they are often not very interesting. By contrast, a logo, though small and very often placed in a corner of the poster, is always emphasized through color, font, etc.
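For reference, time to first fixation can be computed from a time-ordered fixation sequence and an AOI as sketched below. Treating AOIs as axis-aligned rectangles and the (t, x, y) record layout are our assumptions; Tobii Studio computes this metric internally:

```python
def time_to_first_fixation(fixations, aoi):
    """Return the onset time of the first fixation inside the AOI, or
    None if the AOI was never fixated. `fixations` is a time-ordered
    list of (t, x, y) tuples; `aoi` is (left, top, right, bottom)."""
    left, top, right, bottom = aoi
    for t, x, y in fixations:
        if left <= x <= right and top <= y <= bottom:
            return t
    return None
```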
Further analysis of the average time to first fixation for the three groups reveals the differences between them (see Fig. 6(B)). As the number of text elements increases, there is a tendency toward delayed time to first fixation for object, face, text-on-object, logo and slogan. The reason may be that as the amount of text increases, people spend more time deciding where to look. For title and subtitle, however, time to first fixation seems to be brought forward, as these elements may be the most intentionally designed to catch attention among all the elements in a poster.
Observation Length
Observation length measures the degree to which an area holds attention. It is the total time a person spends looking within an area of interest, starting from a fixation inside the area and ending with a fixation outside it. Figure 7(A) shows the average observation length for the posters. Object, face and title are the top three areas, holding attention for a fairly long period of time. The observation lengths of the remaining text elements are mostly similar, with the description paragraph a bit longer than the others. It is noticeable that the order of fixation and the length of fixation go hand in hand, meaning that an area that attracts attention first is also more likely to hold attention longer. Object, text-on-object and description paragraph, however, are exceptions. For object, the rather long observation length may arise because viewers are actually spending time on text-on-object, which is hard to separate; for text-on-object, although it catches attention very quickly, it may not hold it for long because the words are easily understood; for the description paragraph, the larger amount of information it contains requires more time to understand.
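Given the definition above, observation length can be approximated by summing the durations of fixations that land inside the AOI, as in this sketch. The record layout and rectangular AOIs are assumptions, and Tobii Studio's built-in metric may differ in edge cases such as brief excursions out of the area:

```python
def observation_length(fixations, aoi):
    """Approximate observation length as the summed duration of all
    fixations inside the AOI. `fixations` is a time-ordered list of
    (t, duration, x, y) tuples; `aoi` is (left, top, right, bottom)."""
    left, top, right, bottom = aoi
    return sum(dur for t, dur, x, y in fixations
               if left <= x <= right and top <= y <= bottom)
```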
Figure 7(B) compares the observation lengths across the poster groups. The observation lengths of face, object, text-on-object and logo are clearly much longer for group A than for groups B and C. This is because, with more distracting text elements in groups B and C, viewers tend to search for more information instead of dwelling on the first few areas of interest.
Participant%
Participant% refers to the ratio of participants who fixated on a specific area of interest. It strongly affects the intensity of an area in a saliency map.
As shown in Fig. 8(A), among all the text-related areas, title ranks first (89.4%), followed by text-on-object (67.3%), description paragraph (55.8%), subtitle (50.8%) and description line (47.4%), and finally logo (40.2%) and slogan (39.2%). This reveals the relative importance of the different text elements and could help determine the weights of those elements in studies of saliency prediction networks. A comparison of participant% across the three groups (see Fig. 8(B)) shows that fewer people fixate on text-on-object and logo as the number of text elements increases.
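Participant% is straightforward to compute once per-participant fixation streams are available; this sketch uses the same assumed record layout and rectangular AOIs as above:

```python
def participant_pct(fixations_by_participant, aoi):
    """Percentage of participants with at least one fixation inside
    the AOI. `fixations_by_participant` maps a participant id to a
    list of (t, x, y) fixations; `aoi` is (left, top, right, bottom)."""
    left, top, right, bottom = aoi
    hit = sum(
        any(left <= x <= right and top <= y <= bottom for _, x, y in fixs)
        for fixs in fixations_by_participant.values()
    )
    return 100.0 * hit / len(fixations_by_participant)
```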
4 Discussion
Our eye-tracking dataset of 1700 posters, with category annotations based on the ratio of graphics to text, allowed us to quantitatively inspect the overall saliency pattern of posters with a focus on text. Subsequent between-group studies helped us understand how texts interact with each other and with other elements in posters.
We learned that center-bias still holds for posters, though it is weaker and rather spindle-shaped. As the number of text elements increases, fixations become more dispersed and people's choices of where to look tend to differ. Observation of the heatmaps shows that face, object and text are the specific elements people focus on, and different types of text are found to attract attention differently. All the elements mentioned above were labeled as areas of interest, with text further categorized into title, subtitle, description line, description paragraph, slogan and text-on-object. Analysis of time to first fixation shows that object and face are the first elements to draw attention, followed by title and text-on-object. However, as the amount of text increases, people take longer to fixate on object and face, since they need more time to decide where to look. Observation length reveals the degree to which an area holds attention, and the result mostly goes hand in hand with the order of fixation, with face, object and title remaining the top three. There is a strong tendency for people to spend less time on face and object when more text elements are present. The ratio of participants who fixated on a specific region (participant%) is distributed almost evenly from title (89.4%) down to slogan (39.2%). Fewer people focus on text-on-object and logo when there are more text elements. The relative importance of the different types of text could inform the weights of text features in saliency model studies.
5 Conclusion
In this paper we made the following contributions. We provided a dataset of 1700 high-quality posters spanning a wide range of fields, with eye-tracking data collected from 15 participants for each image. To our knowledge, no such dataset of posters previously existed, so ours could serve as the foundation for research on visual saliency models for posters. Statistical analysis of the heatmaps and areas of interest reveals the overall saliency pattern of posters, in particular the relative importance of different text regions and how they influence the saliency of other elements. This could inform saliency prediction for posters.
For future work, it would be worthwhile to develop saliency prediction models for posters, as this is a relatively novel field with strong practical value. Going a step further, research could be done on automatic evaluation systems for posters.
References
1. Hutton, S.B., Nolte, S.: The effect of gaze cues on attention to print advertisements. Appl. Cogn. Psychol. 25(6), 887–892 (2011)
2. Xu, P., Ehinger, K.A., Zhang, Y., et al.: TurkerGaze: crowdsourcing saliency with webcam based eye tracking. arXiv preprint arXiv:1504.06755 (2015)
3. Kim, N.W., et al.: BubbleView: an interface for crowdsourcing image importance maps and tracking visual attention. ACM Trans. Comput. Hum. Interact. 24(5), 1–40 (2017)
4. Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations. MIT Technical Report (2012)
5. Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2106–2113. IEEE (2009)
6. Ramanathan, S., Katti, H., Sebe, N., Kankanhalli, M., Chua, T.-S.: An eye fixation database for saliency detection in images. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part IV. LNCS, vol. 6314, pp. 30–43. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_3
7. Murray, N., Marchesotti, L., Perronnin, F.: AVA: a large-scale database for aesthetic visual analysis. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2408–2415. IEEE (2012)
8. Borji, A., Itti, L.: CAT2000: a large scale fixation dataset for boosting saliency research. arXiv preprint arXiv:1505.03581 (2015)
9. Shen, C., Zhao, Q.: Webpage saliency. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part VII. LNCS, vol. 8695, pp. 33–46. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10584-0_3
10. Frintrop, S., Rome, E., Christensen, H.I.: Computational visual attention systems and their cognitive foundations: a survey. ACM Trans. Appl. Percept. 7(1), 1–39 (2010)
11. Kümmerer, M., Theis, L., Bethge, M.: Deep Gaze I: boosting saliency prediction with feature maps trained on ImageNet. arXiv preprint arXiv:1411.1045 (2014)
12. Bylinskii, Z., Recasens, A., Borji, A., Oliva, A., Torralba, A., Durand, F.: Where should saliency models look next? In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016, Part V. LNCS, vol. 9909, pp. 809–824. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_49
13. Kruthiventi, S.S., Ayush, K., Babu, R.V.: DeepFix: a fully convolutional neural network for predicting human eye fixations. IEEE Trans. Image Process. 26(9), 4446–4456 (2017)
14. Tseng, P.H., Carmi, R., Cameron, I.G., Munoz, D.P., Itti, L.: Quantifying center bias of observers in free viewing of dynamic natural scenes. J. Vis. 9(7), 4 (2009)
15. Zhang, L., Tong, M.H., Marks, T.K., Shan, H., Cottrell, G.W.: SUN: a Bayesian framework for saliency using natural statistics. J. Vis. 8(7), 32 (2008)