Understanding mobile document capture and correcting orientation errors

https://doi.org/10.1016/j.ijhcs.2017.03.004

Highlights

  • We study how users capture documents with a mobile camera through an online survey (n=106) and an in-lab experiment (n=16).

  • We analyze the erroneous orientation problem during document photo capturing.

  • We propose a technique for inferring the user's document capture intention.

  • We devise two methods for correcting orientation errors during document capture.

  • The proposed methods correct more than 80% of orientation errors.

Abstract

Smartphone cameras are increasingly used for document capture in daily life. To understand user behaviors, we performed two studies: (1) an online survey (n=106) to understand general smartphone camera usage behaviors related to information capture, as well as participants' experiences of orientation errors, and (2) a controlled lab study (n=16) to understand detailed document capture behaviors and to identify patterns in orientation errors. According to our online survey, 79.30% of the respondents reported experiencing orientation errors during document capture. In addition, our lab study showed that more than 90% of landscape capture tasks resulted in incorrect orientation. To solve this problem, we systematically analyzed user behavior during document capture (e.g., video sequences and photographs taken or hand grip used) and propose a novel solution called ScanShot, which detects the moment of document capture to help users correct orientation errors. ScanShot tracks the direction of gravity during document capture and monitors the user's rotational or tilting movements to update the orientation automatically. Our results confirm that ScanShot detects document capture with 93.44% accuracy; in addition, our orientation update mechanism can reduce orientation errors by 92.85% using a gyroscope (for rotation) and 81.60% using an accelerometer (for micro-tilts).

Introduction

People use the built-in camera on their smartphone for various reasons, ranging from personal reflection to social experiences and functional tasks (e.g., Okabe, 2006; Gye, 2007; Lux et al., 2010). Research has shown that camera phones are often used for capturing functional images, such as printed images or writing, for later reference (Kindberg et al., 2004, 2005). Our work considers this type of document capture, which is increasingly common in daily life (e.g., capturing magazine or newspaper articles) (Brown and Sellen, 2000). While fixed scanners are still widely used in office settings, smartphone-based document capture allows users to capture documents instantly at any time and in any location, which has dramatically influenced our document capture behaviors and the management of personal information (Doermann et al., 2003).

Document capture is typically done by configuring the angle of the smartphone camera into a top-down (or bird's-eye) view. However, as some readers may have experienced, orientation errors are often found in the captured images. We discovered that this type of problem originates from the inferred orientation of the phone being different from the capturing orientation of the user (hand posture). Recent smartphones have four orientation modes in a 2D space (just as in a picture frame on a wall), namely, portrait, upside down, landscape left (rotating the device to the left), and landscape right (rotating the device to the right), as shown in Fig. 1. For a given capturing orientation, there are three incorrect modes resulting in orientation errors during document capture from a top-down angle.
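The relationship between the four modes can be made concrete with a small sketch. The degree values assigned to each mode and the function names are illustrative assumptions, not an encoding taken from the study; the sketch only shows that, for any intended orientation, three of the four modes produce an error and that each error corresponds to a rotation offset.

```python
from enum import Enum


class Orientation(Enum):
    """The four 2D orientation modes. The degree values are an assumed encoding."""
    PORTRAIT = 0
    LANDSCAPE_LEFT = 90
    UPSIDE_DOWN = 180
    LANDSCAPE_RIGHT = 270


def correction_angle(inferred: Orientation, intended: Orientation) -> int:
    """Rotation offset (in degrees, under the assumed encoding) between the
    orientation the phone inferred and the one the user intended.
    A result of 0 means there is no orientation error."""
    return (intended.value - inferred.value) % 360


def incorrect_modes(intended: Orientation) -> list[Orientation]:
    """All modes that would yield an orientation error for this capture."""
    return [mode for mode in Orientation if mode is not intended]
```

For example, a capture intended as landscape left but recorded as portrait has a non-zero offset, and `incorrect_modes` always returns three modes.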

We began with two preliminary studies, an online survey (n=106) and an in-lab experiment (n=16), with the goal of understanding document capture behavior and orientation errors in depth. The online survey investigated users' real-world experiences of capturing information with a smartphone camera, and the in-lab experiment examined how people capture information with a smartphone camera through an analysis of recorded video and sensor data. The online survey provides an overall picture of the contexts in which smartphone users capture information, including documents. Its findings differ from prior work in that they cover real-world cases in a setting where smartphones are ubiquitous and are specific to information capture. The in-lab experiment systematically shows how people capture information in terms of hand grips and behavioral sequences, and details the orientation errors, which turned out to be more severe in landscape mode.

Our video analysis of document capture during the in-lab experiment helped us propose ScanShot, a novel method for detecting document capture. Our approach is composed of two steps. First, ScanShot detects a document capture (when the body of the phone is placed parallel to the ground) by monitoring the gravity direction using an accelerometer. Second, if a document capture is detected, ScanShot attempts to update the orientation automatically. To achieve this, we propose two different approaches. One approach is to detect rotation events by analyzing the recorded gyroscope data. When a phone is turned on (or a camera app is launched), ScanShot continuously records the gyroscope data. As soon as a document capture is detected, it examines the recorded gyroscope data to check whether any significant rotation occurred beforehand. The other approach is to infer the current orientation by observing the user's micro-tilting of the device while capturing a document. As users hold their phone parallel to the ground, they tend to tilt it slightly inwards (because they try to see the screen), a phenomenon we call micro-tilting. We previously showed that micro-tilting behavior can be captured by monitoring the accelerometer and applying a machine learning model. To validate the efficacy of ScanShot, we carefully designed our algorithms and configured their parameters. Our evaluation showed that document capture moments can be detected with an accuracy of 92.5% and that automatic correction achieves an accuracy of 92.85% (gyroscope) and 81.60% (accelerometer).
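The first step, detecting that the phone is held parallel to the ground from the gravity direction, can be sketched as follows. This is a minimal illustration rather than the ScanShot implementation: the function names and the 20-degree threshold are hypothetical, and the sketch assumes a single accelerometer sample whose z axis is perpendicular to the screen.

```python
import math


def tilt_from_horizontal_deg(ax: float, ay: float, az: float) -> float:
    """Angle in degrees between the screen plane and the ground, computed
    from one accelerometer sample (any consistent unit).
    0 means the phone lies flat; 90 means it is held upright."""
    in_plane = math.hypot(ax, ay)  # gravity component within the screen plane
    # abs(az) treats face-up and face-down as equally flat (a simplification).
    return math.degrees(math.atan2(in_plane, abs(az)))


def is_document_capture(ax: float, ay: float, az: float,
                        max_tilt_deg: float = 20.0) -> bool:
    """True when the phone is close enough to parallel with the ground.
    max_tilt_deg is a hypothetical threshold, not a value from the paper."""
    return tilt_from_horizontal_deg(ax, ay, az) <= max_tilt_deg
```

A sample such as `(0.0, 0.0, 9.81)` is classified as a top-down capture, whereas `(0.0, 9.81, 0.0)`, a phone held upright, is not.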

The key contributions of this paper are summarized as follows:

  • We conducted an online survey (n=106) to investigate real-world user behaviors for capturing information using a smartphone camera and users' awareness of orientation issues.

  • We performed an in-lab experiment (n=16) to study detailed user interactions, such as hand grips and behavioral sequences, using video and sensor recordings.

  • Based on the in-lab experiment, we describe a technique for inferring the user's document capture intention using an accelerometer. Our method identifies a document capture with an accuracy of 92.50%.

  • We propose two methods for correcting orientation errors that occur during document capture when using a smartphone camera. The first method fixes the orientation errors by tracking the user's rotational movement using a gyroscope. The second method monitors the user's tilting behavior at the time of document capture to infer the correct orientation mode.

  • We discuss several important issues including the generalizability of our algorithm and its integration into existing systems. In addition, we discuss practical design implications such as the design of context-aware services for document capture, investigating diverse hand grip positions for mobile interaction design, and increasing the awareness of viewfinder UI indicators.
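The gyroscope-based method listed above can be illustrated with a short sketch that integrates recorded angular velocity and snaps the result to quarter turns. It is written under stated assumptions and is not the algorithm evaluated in the paper: the samples are assumed to be z-axis angular velocities in rad/s at a fixed interval, counterclockwise rotation is assumed positive, and the function names and the 45-degree threshold are hypothetical.

```python
import math


def accumulated_rotation_deg(z_rates: list[float], dt: float) -> float:
    """Integrate z-axis angular velocity (rad/s), sampled every dt seconds,
    into a total rotation angle in degrees."""
    return math.degrees(sum(rate * dt for rate in z_rates))


def quarter_turns(z_rates: list[float], dt: float,
                  min_rotation_deg: float = 45.0) -> int:
    """Signed number of 90-degree turns implied by the recorded rotation.
    Rotations smaller than min_rotation_deg (a hypothetical threshold)
    are treated as noise and yield 0."""
    angle = accumulated_rotation_deg(z_rates, dt)
    if abs(angle) < min_rotation_deg:
        return 0
    return round(angle / 90.0)


def apply_quarter_turns(orientation_deg: int, turns: int) -> int:
    """Update an orientation expressed in degrees (0, 90, 180, 270)."""
    return (orientation_deg + 90 * turns) % 360
```

With one second of samples at pi/2 rad/s, `quarter_turns` reports a single quarter turn, which `apply_quarter_turns` then adds to the orientation recorded before the phone was laid flat.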

The remainder of this paper is organized as follows. We start with a thorough review of related studies in Section 2. In Section 3, we describe the preliminary studies conducted to understand user behaviors regarding document capture using a mobile phone. Based on the insights acquired from these studies, in Section 4 we describe the design of ScanShot, which corrects erroneous orientations by monitoring the gyroscope and accelerometer sensors. In Section 5, we describe the performance of the two proposed methods. In Section 6, we discuss generalizability and integration issues and suggest several practical design implications. Finally, we provide some concluding remarks in Section 7.

Section snippets

Document capture using a camera

As the quality, accessibility, and functionality of digital devices improve, such tools are increasingly being used for information capture and storage. Brown and Sellen (2000) investigated the use of information capture in work settings to understand the motives and reasons behind the use of digital cameras for alternative purposes. They asked two groups of people in a workspace environment to capture any kind of information and document-based information, respectively, and interviewed them

Methodology

To understand user behaviors, we performed two user studies: (1) an online survey to understand smartphone camera usage for information capture and participants' experiences of orientation errors, and (2) a controlled lab study to understand detailed document capture behaviors and to identify patterns of orientation errors.

ScanShot design

Our preliminary user study guided the design of a novel solution called ScanShot, which detects document capture and helps the device correct any orientation errors. The key concept underlying ScanShot is that we can reliably detect whether a user is taking a top-down shot by tracking an accelerometer. Once a top-down shot is detected, we then analyze data from multiple sensors to infer the current orientation mode. We propose two methods for correcting an orientation error, one using a gyroscope to

Evaluation

We evaluated ScanShot using the data collected in our in-lab study, because the complete sensor data were recorded while participants captured documents. We first analyzed the accuracy of the document capture detection, and then evaluated the efficacy of the two orientation correction methods, namely our rotation- and tilt-based solutions. Our evaluation focused on the overall performance, user variance, parameter sensitivity, and an analysis of misclassified instances. For

Discussion

In this paper, we proposed ScanShot, which identifies the moment of document capture and corrects any orientation errors. For document capture, we propose a simple accelerometer-based detector, and for rotation correction, we propose two approaches, namely rotation event detection and micro-tilting detection. Our experiment showed that these approaches can reduce rotation errors significantly.
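The idea behind micro-tilt detection can be illustrated with a simple rule-based stand-in. The paper applies a machine learning model to accelerometer data; the sketch below is not that model. It assumes that the far edge of the phone is raised slightly towards the user, that accelerometer readings follow an Android-style axis convention (x to the right of the screen, y towards its top), and that the function name and the 0.3 threshold are hypothetical.

```python
import math
from typing import Optional


def infer_orientation_from_tilt(ax: float, ay: float,
                                min_in_plane: float = 0.3) -> Optional[str]:
    """Guess the capture orientation from the in-plane gravity components
    of a nearly flat phone. Returns None when the tilt is too small to
    decide. Axis signs and the threshold are assumptions."""
    if math.hypot(ax, ay) < min_in_plane:
        return None  # phone is too flat: no usable micro-tilt
    if abs(ay) >= abs(ax):
        # Tilt is mainly about the x axis: top or bottom edge is raised.
        return "portrait" if ay > 0 else "upside down"
    # Tilt is mainly about the y axis: left or right edge is raised.
    return "landscape left" if ax > 0 else "landscape right"
```

A learned classifier would replace the fixed comparisons with a decision boundary fitted to recorded samples, which is presumably more robust to individual differences in grip than this fixed rule.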

Because our experiment only considered in-lab conditions, we will next

Conclusion

We analyzed the problem of orientation errors in document capture using a smartphone camera. We investigated the orientation error rates, the hand grips used for capturing a document, and the skew angle of captured documents. Based on the user study, we proposed ScanShot, which automatically detects document capture in order to update the orientation. ScanShot supports these features solely with the use of built-in motion sensors, namely an accelerometer and gyroscope. Our evaluation

Acknowledgements

This work was partly supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. B0101-15-1272, Development of Device Collaborative Giga-Level Smart Cloudlet Technology), and by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Korea government (MSIP) (No. NRF-2015R1D1A1A01059497).

References (32)

  • Johnson, J., 2010. Designing With the Mind in Mind: Simple Guide to Understanding User Interface Design Rules.
  • Ahmed, S., Kise, K., Iwamura, M., Liwicki, M., Dengel, A., 2013. Automatic ground truth generation of camera captured...
  • Bao, L., Intille, S. S., 2004. Activity recognition from user-annotated acceleration data. In: Proceedings of the...
  • Beyer, H., et al., 1997. Contextual Design: Defining Customer-centered Systems.
  • Brown, B.A.T., Sellen, A.J., O'Hara, K.P., 2000. A diary study of information capture in working life. In: Proceedings...
  • Bulling, A., Blanke, U., Schiele, B., 2014. A tutorial on human activity recognition using body-worn inertial sensors....
  • Cheng, L.-P., Hsiao, F.-I., Liu, Y.-T., Chen, M. Y., 2012. iRotate: automatic screen rotation based on face...
  • Cheng, L.P., Lee, M.H., Wu, C.Y., Hsiao, F.I., Liu, Y.T., Liang, H.S., Chiu, Y.C., Lee, M.S., Chen, M.Y., 2013....
  • Doermann, D., Liang, J., Li, H., 2003. Progress in camera-based document image analysis. In: Document Analysis and...
  • Goel, M., Wobbrock, J., Patel, S., 2012. Gripsense: using built-in sensors to detect hand posture and pressure on...
  • Gye, L., 2007. Picture this: the impact of mobile camera phones on personal photographic practices. Cont.: J....
  • Hinckley, K., Pierce, J., Sinclair, M., Horvitz, E., 2000. Sensing techniques for mobile interaction. In: Proceedings...
  • Hinds, S.C., Fisher, J.L., D'Amato, D.P., 1990. A document skew detection method using run-length encoding and the...
  • Hwang, S., Bianchi, A., Wohn, K.y., 2013. Vibpress: estimating pressure input using vibration absorption on mobile...
  • Kim, K.E., Chang, W., Cho, S.J., Shim, J., Lee, H., Park, J., Lee, Y., Kim, S., 2006. Hand grip pattern recognition for...
  • Kindberg, T., et al., 2004. How and why people use camera phones. HP Lab. Tech. Rep. HPL.