ABSTRACT
Daily engagement in life experiences is increasingly interwoven with mobile device use. Screen capture at the scale of seconds is being used in behavioral studies and to implement "just-in-time" health interventions. The increasing psychological breadth of digital information will continue to make the actual screens that people view a preferred, if not required, source of data about life experiences. Effective and efficient Information Extraction and Retrieval from digital screenshots is a crucial prerequisite to successful use of screen data. In this paper, we present the experimental workflow we followed to: (i) pre-process a unique collection of screen captures, (ii) extract unstructured text embedded in the images, (iii) organize image text and metadata based on a structured schema, (iv) index the resulting document collection, and (v) allow for Image Retrieval through a dedicated vertical search engine application. The adopted procedure integrates different open source libraries for traditional image processing, Optical Character Recognition (OCR), and Image Retrieval. Our aim is to assess whether and how state-of-the-art methodologies can be applied to this novel data set. We show how combining OpenCV-based pre-processing modules with a Long Short-Term Memory (LSTM) based release of Tesseract OCR, without ad hoc training, led to 74% character-level accuracy of the extracted text. Further, we used the processed repository as the baseline for a dedicated Image Retrieval system, for immediate use by behavioral and prevention scientists. We discuss issues of Text Information Extraction and Retrieval that are particular to the screenshot image case and suggest important directions for future work.
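The workflow described above (OpenCV pre-processing, Tesseract OCR, and character-level evaluation) can be illustrated with a minimal sketch. The `ocr_screenshot` wrapper and the Otsu-binarization choice are assumptions for illustration, not the paper's exact configuration; character-level accuracy is computed here as one minus the Levenshtein edit distance normalized by the length of the reference transcription.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def char_accuracy(reference: str, hypothesis: str) -> float:
    """Character-level accuracy: 1 - normalized edit distance."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return 1.0 - levenshtein(reference, hypothesis) / len(reference)


def ocr_screenshot(path: str) -> str:
    """Hypothetical pre-processing + OCR wrapper (steps i-ii).

    Requires opencv-python and pytesseract; the Otsu-threshold
    pre-processing step is an illustrative assumption.
    """
    import cv2          # optional dependency, imported lazily
    import pytesseract  # optional dependency, imported lazily

    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary)
```

Given a ground-truth transcription, `char_accuracy(truth, ocr_screenshot(img_path))` yields the kind of character-level score reported above; for instance, `char_accuracy("abcd", "abcx")` is 0.75.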
- Thomas M. Breuel. 2008. The OCRopus open source OCR system. In Electronic Imaging 2008. International Society for Optics and Photonics, 68150F.
- Rafael C. Carrasco. 2014. An open-source OCR evaluation tool. In Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage. ACM, 179--184.
- Agnese Chiatti, Xiao Yang, Mimi Brinberg, M. J. Cho, Anupriya Gagneja, Nilam Ram, Byron Reeves, and C. Lee Giles. 2017. Text Extraction from Smartphone Screenshots to Archive in situ Media Behavior. In Proceedings of the 9th International Conference on Knowledge Capture (K-CAP 2017). ACM.
- Henrique Batista da Silva, Raquel Pereira de Almeida, Gabriel Barbosa da Fonseca, Carlos Caetano, Dario Vieira, Zenilton K. Gonçalves do Patrocínio, Jr., Arnaldo de Albuquerque Araújo, and Silvio Jamil F. Guimarães. 2016. Video Similarity Search by Using Compact Representations. In Proceedings of the 31st Annual ACM Symposium on Applied Computing (SAC '16). ACM, New York, NY, USA, 80--83.
- Andreia V. Faria, Kenichi Oishi, Shoko Yoshida, Argye Hillis, Michael I. Miller, and Susumu Mori. 2015. Content-based image retrieval for brain MRI: An image-searching engine and population-based analysis to utilize past clinical data for future diagnosis. NeuroImage: Clinical 7 (2015), 367--376.
- Wenyi Huang, Dafang He, Xiao Yang, Zihan Zhou, Daniel Kifer, and C. Lee Giles. 2016. Detecting Arbitrary Oriented Text in the Wild with a Visual Attention Model. In Proceedings of the 2016 ACM on Multimedia Conference (MM '16). ACM, 551--555.
- Itseez. 2015. Open Source Computer Vision Library. https://github.com/itseez/opencv.
- Byung K. Jung, Sung Y. Shin, Wei Wang, Hyung D. Choi, and Jeong K. Pack. 2014. Similar MRI Object Retrieval Based on Modified Contour to Centroid Triangulation with Arc Difference Rate. In Proceedings of the 29th Annual ACM Symposium on Applied Computing (SAC '14). ACM, New York, NY, USA, 31--32.
- V. I. Levenshtein. 1966. Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady 10 (Feb. 1966), 707.
- Yi Lu. 1995. Machine printed character segmentation: An overview. Pattern Recognition 28, 1 (1995), 67--80.
- Million Meshesha and C. V. Jawahar. 2008. Matching word images for content-based retrieval from printed document images. International Journal of Document Analysis and Recognition (IJDAR) 11, 1 (Oct. 2008), 29--38.
- N. Ram, D. Conroy, A. L. Pincus, A. Lorek, A. H. Rebar, M. J. Roche, J. Morack, M. Coccia, J. Feldman, and D. Gerstorf. 2014. Examining the interplay of processes across multiple time-scales: Illustration with the Intraindividual Study of Affect, Health, and Interpersonal Behavior (iSAHIB). Research in Human Development 11, 2 (2014), 142--160.
- Byron Reeves, Nilam Ram, Thomas N. Robinson, James J. Cummings, Lee Giles, Jennifer Pan, Agnese Chiatti, MJ Cho, Katie Roehrick, Xiao Yang, Anupriya Gagneja, Miriam Brinberg, Daniel Muise, Yingdan Lu, Mufan Luo, Andrew Fitzgerald, and Leo Yeykelis. 2017. Screenomics: A Framework to Capture and Analyze Personal Life Experiences and the Ways that Technology Shapes Them. In review (2017).
- Stephen E. Robertson, Steve Walker, Susan Jones, Micheline M. Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at TREC-3. NIST Special Publication Sp 109 (1995), 109.
- Julius Schöning, Patrick Faion, and Gunther Heidemann. 2015. Semi-automatic Ground Truth Annotation in Videos: An Interactive Tool for Polygon-based Object Annotation and Segmentation. In Proceedings of the 8th International Conference on Knowledge Capture (K-CAP 2015). ACM, New York, NY, USA, Article 17, 4 pages.
- Ray Smith. 2007. An overview of the Tesseract OCR engine. In Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Vol. 2. IEEE, 629--633.
- Kamrul Hasan Talukder and Tania Mallick. 2014. Connected component based approach for text extraction from color image. In 17th International Conference on Computer and Information Technology (ICCIT 2014). IEEE, 204--209.
- Oeivind Due Trier and Anil K. Jain. 1995. Goal-directed evaluation of binarization methods. IEEE Transactions on Pattern Analysis and Machine Intelligence 17, 12 (1995), 1191--1201.
- Dan Vanderkam. n.d. localturk. https://github.com/danvk/localturk.
- Kai Wang and Serge Belongie. 2010. Word spotting in the wild. In European Conference on Computer Vision. Springer, 591--604.
- Tao Wang, David J. Wu, Adam Coates, and Andrew Y. Ng. 2012. End-to-end text recognition with convolutional neural networks. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR '12). IEEE, 3304--3308.
- Qixiang Ye and David Doermann. 2015. Text detection and recognition in imagery: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 37, 7 (2015), 1480--1500.
- T. Yeh, T. Chang, and R. C. Miller. 2009. Sikuli: using GUI screenshots for search and automation. In Proceedings of the 22nd Annual ACM Symposium on User Interface Software and Technology. ACM, 183--192.
- L. Yeykelis, J. J. Cummings, and B. Reeves. 2014. Multitasking on a single device: Arousal and the frequency, anticipation, and prediction of switching between media content on a computer. Journal of Communication 64, 1 (2014), 167--192.