
Screen recording segmentation to scenes for eye-tracking analysis

Published in: Multimedia Tools and Applications

Abstract

In usability studies involving eye-tracking, quantitative analysis of gaze data requires information about so-called scene occurrences. Scene occurrences are time segments during which the application user interface remains more or less static, so that gaze events (e.g., fixations) can be mapped to particular areas of interest (user interface elements). Scene occurrences typically start and end with user interface changes such as page-to-page transitions, menu expansions, overlay prompts, etc. Normally, one would record such changes programmatically through application logging, yet in many studies this is not possible. For example, in early-prototype mobile-app testing, often only a camera recording of the smart device screen is available as evidence. In such cases, analysts must manually annotate the recordings. To reduce the need for manual annotation of scene occurrences, we present an image processing method for segmenting user interface video recordings. The method exploits specific properties of user interface recordings, which differ greatly from real-world video shots (for which many segmentation methods exist). The core of our method lies in applying the SSIM and SIFT similarity metrics to video frames, together with several pre-processing and filtering procedures. The main advantage of our method is that it requires no training data apart from a single screenshot example of each scene (to which the recording frames are compared). The method can also handle user finger overlays, which are always present in mobile device recordings. We evaluate the accuracy of our method on recordings from several real-life studies and compare it with other image similarity techniques.
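As a rough illustration of the comparison the abstract describes, the sketch below scores a single video frame against one reference screenshot per scene using SSIM and SIFT. It is not the authors' implementation: the preprocessing, the score combination, and all thresholds are assumed placeholders, and the helper names (preprocess, classify_frame, etc.) are hypothetical; only standard OpenCV (cv2) and scikit-image calls are used.

```python
# Minimal sketch (assumptions, not the paper's method): score a frame against
# one reference screenshot per scene with SSIM and SIFT, pick the best scene.
import cv2
from skimage.metrics import structural_similarity as ssim

sift = cv2.SIFT_create()
matcher = cv2.BFMatcher()  # default L2 norm, suitable for SIFT descriptors

def preprocess(img, size=(480, 270)):
    """Grayscale and resize so frames and screenshots share one resolution."""
    return cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), size)

def ssim_score(frame_gray, shot_gray):
    """Structural similarity between a preprocessed frame and screenshot."""
    return ssim(frame_gray, shot_gray, data_range=255)

def sift_score(frame_gray, shot_gray, ratio=0.75):
    """Fraction of screenshot SIFT keypoints matched in the frame (Lowe's
    ratio test); local features tolerate partial occlusion such as a finger."""
    kp_s, des_s = sift.detectAndCompute(shot_gray, None)
    _, des_f = sift.detectAndCompute(frame_gray, None)
    if not kp_s or des_s is None or des_f is None:
        return 0.0
    matches = matcher.knnMatch(des_s, des_f, k=2)
    good = [p for p in matches
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) / len(kp_s)

def classify_frame(frame, scene_screenshots, threshold=0.5):
    """Return the most similar scene name, or None if nothing passes the
    (placeholder) combined-similarity threshold."""
    frame_gray = preprocess(frame)
    best_name, best_score = None, 0.0
    for name, shot in scene_screenshots.items():
        shot_gray = preprocess(shot)
        score = 0.5 * ssim_score(frame_gray, shot_gray) \
              + 0.5 * sift_score(frame_gray, shot_gray)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

Labelling sampled frames this way and merging consecutive frames with the same label into time intervals would approximate the scene occurrences onto which gaze events are mapped; the actual method additionally applies pre-processing and filtering steps not shown here.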





Acknowledgements

This work was partially supported by the Scientific Grant Agency of the Slovak Republic, grant No. VG 1/0646/15, and the Slovak Research and Development Agency under contract No. APVV-15-0508, and was created with the support of the Ministry of Education, Science, Research and Sport of the Slovak Republic within the Research and Development Operational Programme for the project "University Science Park of STU Bratislava", ITMS 26240220084, co-funded by the ERDF.

Author information

Corresponding author

Correspondence to Jakub Simko.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Simko, J., Vrba, J. Screen recording segmentation to scenes for eye-tracking analysis. Multimed Tools Appl 78, 2401–2425 (2019). https://doi.org/10.1007/s11042-018-6369-7

