DOI: 10.1145/3167132.3167236
research-article

Text extraction and retrieval from smartphone screenshots: building a repository for life in media

Published: 09 April 2018

ABSTRACT

Daily engagement in life experiences is increasingly interwoven with mobile device use. Screen capture at the scale of seconds is being used in behavioral studies and to implement "just-in-time" health interventions. The increasing psychological breadth of digital information will continue to make the actual screens that people view a preferred, if not required, source of data about life experiences. Effective and efficient Information Extraction and Retrieval from digital screenshots is a crucial prerequisite to the successful use of screen data. In this paper, we present the experimental workflow we used to: (i) pre-process a unique collection of screen captures, (ii) extract unstructured text embedded in the images, (iii) organize image text and metadata based on a structured schema, (iv) index the resulting document collection, and (v) enable Image Retrieval through a dedicated vertical search engine application. The adopted procedure integrates different open-source libraries for traditional image processing, Optical Character Recognition (OCR), and Image Retrieval. Our aim is to assess whether and how state-of-the-art methodologies can be applied to this novel data set. We show how combining OpenCV-based pre-processing modules with a Long Short-Term Memory (LSTM) based release of Tesseract OCR, without ad hoc training, achieved 74% character-level accuracy on the extracted text. Further, we used the processed repository as the basis for a dedicated Image Retrieval system, ready for immediate use by behavioral and prevention scientists. We discuss issues of Text Information Extraction and Retrieval that are particular to the screenshot image case and suggest important directions for future work.
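
To make the workflow concrete, the sketch below chains open-source components of the kind described above: OpenCV for pre-processing, the LSTM-based release of Tesseract (driven here through pytesseract) for text extraction, and a BM25-scored index for retrieval. It is a minimal illustration under stated assumptions rather than the paper's implementation: Whoosh stands in for whatever search backend was actually used, and the folder layout, schema fields, and metadata values are hypothetical.

    # Illustrative sketch only, not the authors' code. OpenCV handles grayscale
    # conversion and Otsu binarization, pytesseract runs Tesseract's LSTM engine
    # (--oem 1), and Whoosh provides a BM25-scored index as a stand-in backend.
    # Paths, field names, and metadata values below are hypothetical.
    import os
    import cv2
    import pytesseract
    from whoosh import index, scoring
    from whoosh.fields import Schema, ID, TEXT, STORED
    from whoosh.qparser import QueryParser

    def extract_text(image_path):
        """Pre-process a screenshot and run Tesseract's LSTM recognizer on it."""
        img = cv2.imread(image_path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # drop color channels
        _, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # Otsu binarization
        # --oem 1 selects the LSTM engine; --psm 6 assumes a single uniform text block
        return pytesseract.image_to_string(binary, config="--oem 1 --psm 6")

    # Structured schema holding the extracted text plus screenshot metadata
    schema = Schema(path=ID(stored=True, unique=True),
                    text=TEXT(stored=True),
                    captured_at=STORED)

    os.makedirs("screen_index", exist_ok=True)
    ix = index.create_in("screen_index", schema)

    writer = ix.writer()
    for name in os.listdir("screenshots"):  # hypothetical screenshot folder
        writer.add_document(path=name,
                            text=extract_text(os.path.join("screenshots", name)),
                            captured_at="2017-01-01T00:00:00")  # placeholder metadata
    writer.commit()

    # Vertical search: rank screenshots against a free-text query with BM25
    with ix.searcher(weighting=scoring.BM25F()) as searcher:
        query = QueryParser("text", ix.schema).parse("grocery list")
        for hit in searcher.search(query, limit=10):
            print(hit["path"], hit.score)

In Tesseract 4, the --oem 1 flag selects the LSTM recognizer rather than the legacy engine, which corresponds to the LSTM-based release used "without ad hoc training" in the abstract; the binarization step and the page segmentation mode (--psm) are where most screenshot-specific tuning would happen.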

Published in

SAC '18: Proceedings of the 33rd Annual ACM Symposium on Applied Computing
April 2018, 2327 pages
ISBN: 9781450351911
DOI: 10.1145/3167132

Copyright © 2018 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 9 April 2018


Qualifiers

research-article

Acceptance Rates

Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
