Abstract
In usability studies involving eye-tracking, quantitative analysis of gaze data requires information about so-called scene occurrences. Scene occurrences are time segments during which the application user interface remains largely static, so gaze events (e.g., fixations) can be mapped to particular areas of interest (user interface elements). Scene occurrences typically start and end with user interface changes such as page-to-page transitions, menu expansions, overlay prompts, etc. Normally, one would record such changes programmatically through application logging, yet in many studies this is not possible. For example, in early-prototype mobile-app testing, only a camera recording of a smart device screen is often available as evidence. In such cases, analysts must manually annotate the recordings. To reduce the need for manual annotation of scene occurrences, we present an image processing method for segmenting user interface video recordings. The method exploits specific properties of user interface recordings, which differ greatly from real-world video shots (for which many segmentation methods exist). The core of our method lies in applying the SSIM and SIFT similarity metrics to video frames (with several pre-processing and filtering procedures). The main advantage of our method is that it requires no training data apart from a single screenshot example for each scene (to which the recording frames are compared). The method is also able to work with user finger overlays, which are always present in mobile device recordings. We evaluate the accuracy of our method over recordings from several real-life studies and compare it with other image similarity techniques.
References
Banovic N, Grossman T, Matejka J, Fitzmaurice G (2012) Waken: reverse engineering usage information and interface structure from software videos. In: Proceedings of the 25th annual ACM symposium on user interface software and technology, UIST ’12. ACM, New York, pp 83–92. https://doi.org/10.1145/2380116.2380129
Bao L, Li J, Xing Z, Wang X, Zhou B (2015) scvRipper: video scraping tool for modeling developers’ behavior using interaction data. In: Proceedings of the 37th international conference on software engineering - volume 2, ICSE ’15. IEEE Press, Piscataway, pp 673–676. http://dl.acm.org/citation.cfm?id=2819009.2819134
Chang TH, Yeh T, Miller RC (2010) GUI testing using computer vision. In: Proceedings of the SIGCHI conference on human factors in computing systems, CHI ’10. ACM, New York, pp 1535–1544. https://doi.org/10.1145/1753326.1753555
Ciresan DC, Meier U, Masci J, Maria Gambardella L, Schmidhuber J (2011) Flexible, high performance convolutional neural networks for image classification. In: IJCAI Proceedings-international joint conference on artificial intelligence, vol 22, Barcelona, p 1237
Denoue L, Carter S, Cooper M (2016) Docugram: turning screen recordings into documents. In: Proceedings of the 2016 ACM symposium on document engineering, DocEng ’16. ACM, New York, pp 185–188. https://doi.org/10.1145/2960811.2967154
Dixon M, Fogarty J (2010) Prefab: implementing advanced behaviors using pixel-based reverse engineering of interface structure. In: Proceedings of the SIGCHI conference on human factors in computing systems, CHI ’10. ACM, New York, pp 1525–1534. https://doi.org/10.1145/1753326.1753554
Dixon M, Laput G, Fogarty J (2014) Pixel-based methods for widget state and style in a runtime implementation of sliding widgets. In: Proceedings of the SIGCHI conference on human factors in computing systems, CHI ’14. ACM, New York, pp 2231–2240. https://doi.org/10.1145/2556288.2556979
Duchowski AT (2007) Eye tracking methodology: theory and practice. Springer-Verlag New York, Inc., Secaucus
Givens P, Chakarov A, Sankaranarayanan S, Yeh T (2013) Exploring the internal state of user interfaces by combining computer vision techniques with grammatical inference. In: Proceedings of the 2013 international conference on software engineering, ICSE ’13. IEEE Press, Piscataway, pp 1165–1168. http://dl.acm.org/citation.cfm?id=2486788.2486951
Haralick RM, Sternberg SR, Zhuang X (1987) Image analysis using mathematical morphology. IEEE Trans Pattern Anal Mach Intell 4:532–550
Holmqvist K, Nyström M, Andersson R, Dewhurst R, Jarodzka H, van de Weijer J (2011) Eye tracking: a comprehensive guide to methods and measures. OUP Oxford. https://books.google.sk/books?id=5rIDPV1EoLUC
Koch G, Zemel R, Salakhutdinov R (2015) Siamese neural networks for one-shot image recognition. In: ICML Deep learning workshop, vol 2
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94
Mendi E, Bayrak C (2010) Shot boundary detection and key frame extraction using salient region detection and structural similarity. In: Proceedings of the 48th annual southeast regional conference, ACM SE ’10. ACM, New York, pp 66:1–66:4. https://doi.org/10.1145/1900008.1900096
Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9(1):62–66
Pongnumkul S, Dontcheva M, Li W, Wang J, Bourdev L, Avidan S, Cohen MF (2011) Pause-and-play: automatically linking screencast video tutorials with applications. In: Proceedings of the 24th annual ACM symposium on user interface software and technology, UIST ’11. ACM, New York, pp 135–144. https://doi.org/10.1145/2047196.2047213
Priya GGL, Domnic S (2010) Video cut detection using dominant color features. In: Proceedings of the first international conference on intelligent interactive technologies and multimedia, IITM ’10. ACM, New York, pp 130–134. https://doi.org/10.1145/1963564.1963586
Razavian AS, Azizpour H, Sullivan J, Carlsson S (2014) CNN features off-the-shelf: an astounding baseline for recognition. In: 2014 IEEE Conference on computer vision and pattern recognition workshops (CVPRW). IEEE, pp 512–519
Tahaghoghi SMM, Williams HE, Thom JA, Volkmer T (2005) Video cut detection using frame windows. In: Proceedings of the twenty-eighth Australasian conference on computer science - volume 38, ACSC ’05. Australian Computer Society, Inc., Darlinghurst, pp 193–199. http://dl.acm.org/citation.cfm?id=1082161.1082183
Tao D, Cheng J, Song M, Lin X (2016) Manifold ranking-based matrix factorization for saliency detection. IEEE Trans Neural Netw Learn Syst 27(6):1122–1134
Tonomura Y, Akutsu A, Otsuji K, Sadakata T (1993) VideoMAP and VideoSpaceIcon: tools for anatomizing video content. In: Proceedings of the INTERACT ’93 and CHI ’93 conference on human factors in computing systems, CHI ’93. ACM, New York, pp 131–136. https://doi.org/10.1145/169059.169117
Truong BT, Dorai C, Venkatesh S (2000) New enhancements to cut, fade, and dissolve detection processes in video segmentation. In: Proceedings of the eighth ACM international conference on multimedia, MULTIMEDIA ’00. ACM, New York, pp 219–227. https://doi.org/10.1145/354384.354481
Vinyals O, Blundell C, Lillicrap T, Wierstra D, et al. (2016) Matching networks for one shot learning. In: Advances in neural information processing systems, pp 3630–3638
Wang R, Tao D (2016) Non-local auto-encoder with collaborative stabilization for image restoration. IEEE Trans Image Process 25(5):2117–2129. https://doi.org/10.1109/TIP.2016.2541318
Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612. https://doi.org/10.1109/TIP.2003.819861
Yang X, Liu W, Tao D, Cheng J (2017) Canonical correlation analysis networks for two-view image recognition. Inf Sci 385(C):338–352. https://doi.org/10.1016/j.ins.2017.01.011
Yeh T, Chang TH, Miller RC (2009) Sikuli: using GUI screenshots for search and automation. In: Proceedings of the 22nd annual ACM symposium on user interface software and technology, UIST ’09. ACM, New York, pp 183–192. https://doi.org/10.1145/1622176.1622213
Acknowledgements
This work was partially supported by the Scientific Grant Agency of the Slovak Republic, grant No. VG 1/0646/15, the Slovak Research and Development Agency under the contract No. APVV-15-0508 and was created with the support of the Ministry of Education, Science, Research and Sport of the Slovak Republic within the Research and Development Operational Programme for the project “University Science Park of STU Bratislava”, ITMS 26240220084, co-funded by the ERDF.
Cite this article
Simko, J., Vrba, J. Screen recording segmentation to scenes for eye-tracking analysis. Multimed Tools Appl 78, 2401–2425 (2019). https://doi.org/10.1007/s11042-018-6369-7