ABSTRACT
Daily engagement in life experiences is increasingly interwoven with mobile device use. Screen capture at the scale of seconds is being used in behavioral studies and to implement "just-in-time" health interventions. The increasing psychological breadth of digital information will continue to make the actual screens that people view a preferred, if not required, source of data about life experiences. Effective and efficient Information Extraction and Retrieval from digital screenshots is a crucial prerequisite to successful use of screen data. In this paper, we present the experimental workflow we followed to: (i) pre-process a unique collection of screen captures, (ii) extract unstructured text embedded in the images, (iii) organize image text and metadata based on a structured schema, (iv) index the resulting document collection, and (v) allow for Image Retrieval through a dedicated vertical search engine application. The adopted procedure integrates different open source libraries for traditional image processing, Optical Character Recognition (OCR), and Image Retrieval. Our aim is to assess whether and how state-of-the-art methodologies can be applied to this novel data set. We show how combining OpenCV-based pre-processing modules with a Long Short-Term Memory (LSTM) based release of Tesseract OCR, without ad hoc training, led to 74% character-level accuracy of the extracted text. Further, we used the processed repository as the baseline for a dedicated Image Retrieval system, for immediate use by behavioral and prevention scientists. We discuss issues of Text Information Extraction and Retrieval that are particular to the screenshot image case and suggest important directions for future work.
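The workflow described above (OpenCV pre-processing, Tesseract OCR, and character-level evaluation) can be illustrated with a minimal sketch. The `ocr_screenshot` wrapper and the Otsu-binarization choice are assumptions for illustration, not the paper's exact configuration; character-level accuracy is computed here as one minus the Levenshtein edit distance normalized by the length of the reference transcription.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def char_accuracy(reference: str, hypothesis: str) -> float:
    """Character-level accuracy: 1 - normalized edit distance."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return 1.0 - levenshtein(reference, hypothesis) / len(reference)


def ocr_screenshot(path: str) -> str:
    """Hypothetical pre-processing + OCR wrapper (steps i-ii).

    Requires opencv-python and pytesseract; the Otsu-threshold
    pre-processing step is an illustrative assumption.
    """
    import cv2          # optional dependency, imported lazily
    import pytesseract  # optional dependency, imported lazily

    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary)
```

Given a ground-truth transcription, `char_accuracy(truth, ocr_screenshot(img_path))` yields the kind of character-level score reported above; for instance, `char_accuracy("abcd", "abcx")` is 0.75.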
- Thomas M. Breuel. 2008. The OCRopus open source OCR system. In Electronic Imaging 2008. International Society for Optics and Photonics, 68150F.
- Rafael C. Carrasco. 2014. An open-source OCR evaluation tool. In Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage. ACM, 179--184.
- Agnese Chiatti, Xiao Yang, Mimi Brinberg, M. J. Cho, Anupriya Gagneja, Nilam Ram, Byron Reeves, and C. Lee Giles. 2017. Text Extraction from Smartphone Screenshots to Archive in situ Media Behavior. In Proceedings of the 9th International Conference on Knowledge Capture (K-CAP 2017). ACM.
- Henrique Batista da Silva, Raquel Pereira de Almeida, Gabriel Barbosa da Fonseca, Carlos Caetano, Dario Vieira, Zenilton K. Gonçalves do Patrocínio, Jr., Arnaldo de Albuquerque Araújo, and Silvio Jamil F. Guimarães. 2016. Video Similarity Search by Using Compact Representations. In Proceedings of the 31st Annual ACM Symposium on Applied Computing (SAC '16). ACM, New York, NY, USA, 80--83.
- Andreia V. Faria, Kenichi Oishi, Shoko Yoshida, Argye Hillis, Michael I. Miller, and Susumu Mori. 2015. Content-based image retrieval for brain MRI: An image-searching engine and population-based analysis to utilize past clinical data for future diagnosis. NeuroImage: Clinical 7 (2015), 367--376.
- Wenyi Huang, Dafang He, Xiao Yang, Zihan Zhou, Daniel Kifer, and C. Lee Giles. 2016. Detecting Arbitrary Oriented Text in the Wild with a Visual Attention Model. In Proceedings of the 2016 ACM on Multimedia Conference (MM '16). ACM, 551--555.
- Itseez. 2015. Open Source Computer Vision Library. https://github.com/itseez/opencv.
- Byung K. Jung, Sung Y. Shin, Wei Wang, Hyung D. Choi, and Jeong K. Pack. 2014. Similar MRI Object Retrieval Based on Modified Contour to Centroid Triangulation with Arc Difference Rate. In Proceedings of the 29th Annual ACM Symposium on Applied Computing (SAC '14). ACM, New York, NY, USA, 31--32.
- V. I. Levenshtein. 1966. Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady 10 (Feb. 1966), 707.
- Yi Lu. 1995. Machine printed character segmentation: An overview. Pattern Recognition 28, 1 (1995), 67--80.
- Million Meshesha and C. V. Jawahar. 2008. Matching word images for content-based retrieval from printed document images. International Journal of Document Analysis and Recognition (IJDAR) 11, 1 (Oct. 2008), 29--38.
- N. Ram, D. Conroy, A. L. Pincus, A. Lorek, A. H. Rebar, M. J. Roche, J. Morack, M. Coccia, J. Feldman, and D. Gerstorf. 2014. Examining the interplay of processes across multiple time-scales: Illustration with the Intraindividual Study of Affect, Health, and Interpersonal Behavior (iSAHIB). Research in Human Development 11, 2 (2014), 142--160.
- Byron Reeves, Nilam Ram, Thomas N. Robinson, James J. Cummings, Lee Giles, Jennifer Pan, Agnese Chiatti, MJ Cho, Katie Roehrick, Xiao Yang, Anupriya Gagneja, Miriam Brinberg, Daniel Muise, Yingdan Lu, Mufan Luo, Andrew Fitzgerald, and Leo Yeykelis. 2017. Screenomics: A Framework to Capture and Analyze Personal Life Experiences and the Ways that Technology Shapes Them. In review (2017).
- Stephen E. Robertson, Steve Walker, Susan Jones, Micheline M. Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at TREC-3. NIST Special Publication Sp 109 (1995), 109.
- Julius Schöning, Patrick Faion, and Gunther Heidemann. 2015. Semi-automatic Ground Truth Annotation in Videos: An Interactive Tool for Polygon-based Object Annotation and Segmentation. In Proceedings of the 8th International Conference on Knowledge Capture (K-CAP 2015). ACM, New York, NY, USA, Article 17, 4 pages.
- Ray Smith. 2007. An overview of the Tesseract OCR engine. In Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Vol. 2. IEEE, 629--633.
- Kamrul Hasan Talukder and Tania Mallick. 2014. Connected component based approach for text extraction from color image. In 17th International Conference on Computer and Information Technology (ICCIT 2014). IEEE, 204--209.
- Oeivind Due Trier and Anil K. Jain. 1995. Goal-directed evaluation of binarization methods. IEEE Transactions on Pattern Analysis and Machine Intelligence 17, 12 (1995), 1191--1201.
- Dan Vanderkam. n.d. localturk. https://github.com/danvk/localturk.
- Kai Wang and Serge Belongie. 2010. Word spotting in the wild. In European Conference on Computer Vision. Springer, 591--604.
- Tao Wang, David J. Wu, Adam Coates, and Andrew Y. Ng. 2012. End-to-end text recognition with convolutional neural networks. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR '12). IEEE, 3304--3308.
- Qixiang Ye and David Doermann. 2015. Text detection and recognition in imagery: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 37, 7 (2015), 1480--1500.
- T. Yeh, T. Chang, and R. C. Miller. 2009. Sikuli: using GUI screenshots for search and automation. In Proceedings of the 22nd Annual ACM Symposium on User Interface Software and Technology. ACM, 183--192.
- L. Yeykelis, J. J. Cummings, and B. Reeves. 2014. Multitasking on a single device: Arousal and the frequency, anticipation, and prediction of switching between media content on a computer. Journal of Communication 64, 1 (2014), 167--192.