Understanding mobile document capture and correcting orientation errors
Introduction
People use the built-in cameras on their smartphones for various reasons, ranging from personal reflection to social experiences and functional tasks (e.g., Okabe, 2006; Gye, 2007; Lux et al., 2010). Research has shown that camera phones are often used to capture functional images, such as printed images or writing, for later reference (Kindberg et al., 2004; Kindberg et al., 2005). Our work considers this type of document capture, which is increasingly common in daily life (e.g., capturing magazine or newspaper articles) (Brown and Sellen, 2000). While fixed scanners are still widely used in office settings, smartphone-based document capture allows users to capture documents instantly at any time and in any location, which has dramatically influenced document capture behaviors and the management of personal information (Doermann et al., 2003).
Document capture is typically done by configuring the angle of the smartphone camera into a top-down (or bird's-eye) view. However, as some readers may have experienced, orientation errors are often found in the captured images. We discovered that this type of problem originates from the inferred orientation of the phone being different from the capturing orientation of the user (hand posture). Recent smartphones have four orientation modes in a 2D space (just as in a picture frame on a wall), namely, portrait, upside down, landscape left (rotating the device to the left), and landscape right (rotating the device to the right), as shown in Fig. 1. For a given capturing orientation, there are three incorrect modes resulting in orientation errors during document capture from a top-down angle.
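To make the four modes and the three possible error cases concrete, the following is a minimal sketch in Python. The enum values (degrees of rotation from portrait) and the helper names are assumptions introduced for illustration; they are not taken from any platform API or from the paper.

```python
from enum import Enum


class Orientation(Enum):
    # Hypothetical encoding: rotation from portrait in degrees.
    PORTRAIT = 0
    LANDSCAPE_RIGHT = 90
    UPSIDE_DOWN = 180
    LANDSCAPE_LEFT = 270


def incorrect_modes(capturing):
    """Modes that would yield an orientation error for a given capturing orientation."""
    return [mode for mode in Orientation if mode is not capturing]


def mode_offset_deg(inferred, capturing):
    """Angular difference between the mode the phone inferred and the mode the user held."""
    return (capturing.value - inferred.value) % 360


# The phone inferred portrait while the user was actually holding it in landscape left.
offset = mode_offset_deg(Orientation.PORTRAIT, Orientation.LANDSCAPE_LEFT)
wrong = incorrect_modes(Orientation.LANDSCAPE_LEFT)
```

An offset of zero means the inferred and capturing orientations agree; any other value corresponds to one of the three incorrect modes.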
We began with two preliminary studies, an online survey (n=106) and an in-lab experiment (n=16), with the goal of understanding document capture behavior and orientation errors in depth. The research questions for the two studies were 1) what real-world experiences users have when capturing information with a smartphone camera (online survey), and 2) how people capture information with a smartphone camera, as revealed by analysis of the recorded video and sensor data (in-lab experiment). The online survey provides an overall picture of the contexts in which smartphone users capture information, including documents. Its findings differ from prior work in that they cover real-world cases in which smartphones are ubiquitous and are specific to information capture. The in-lab experiment systematically shows how people capture information in terms of hand grips and behavioral sequences, and details the orientation errors, which turned out to be more severe in landscape mode.
Our video analysis of document capture during the in-lab experiment led us to propose ScanShot, a novel method for detecting document capture and correcting orientation errors. Our approach is composed of two steps. First, ScanShot detects a document capture (when the body of the phone is held parallel to the ground) by monitoring the gravity direction using an accelerometer. Second, if a document capture is detected, ScanShot attempts to update the orientation automatically. To achieve this, we propose two different approaches. One approach is to detect rotation events by analyzing recorded gyroscope data. Once the phone is turned on (or the camera app is launched), ScanShot continuously records gyroscope data. As soon as a document capture is detected, it examines the recorded data to check whether any significant rotation occurred beforehand. The other approach is to infer the current orientation by observing the user's micro-tilting of the device while capturing a document. When users hold their phone parallel to the ground, they tend to tilt it slightly inwards in order to see the screen, a phenomenon we call micro-tilting. We previously showed that micro-tilting behavior can be captured by monitoring the accelerometer and applying a machine learning model. To validate the efficacy of ScanShot, we carefully designed our algorithms and configured their parameters. Our evaluation showed that document capture moments can be detected with an accuracy of 92.5%, and that automatic rotation correction achieves an accuracy of 92.85% (gyroscope) and 81.60% (accelerometer).
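The two steps described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the implementation evaluated in the paper: the 20-degree tolerance, the function names, and the Android-style sensor axis convention are all assumptions, and the tilt function is a simple sign-based heuristic standing in for the machine learning model mentioned in the text.

```python
import math

# Hypothetical threshold, chosen only for illustration.
TOP_DOWN_TOLERANCE_DEG = 20.0


def is_top_down(accel, tolerance_deg=TOP_DOWN_TOLERANCE_DEG):
    """Step 1: is the phone body roughly parallel to the ground, screen up?

    `accel` is an (x, y, z) accelerometer reading in m/s^2. We assume the
    Android convention, where a phone lying flat with its screen up reads
    roughly (0, 0, +9.81).
    """
    ax, ay, az = accel
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0.0:
        return False
    # True when gravity lies within `tolerance_deg` of the device z axis.
    return az / norm >= math.cos(math.radians(tolerance_deg))


def quarter_turns(gyro_z, dt):
    """Step 2, gyroscope approach: net z-axis rotation snapped to 90-degree steps.

    `gyro_z` holds angular velocities in rad/s sampled every `dt` seconds.
    Returns 0..3 quarter turns (counter-clockwise under the assumed
    right-handed axis convention).
    """
    net_deg = math.degrees(sum(w * dt for w in gyro_z))
    return round(net_deg / 90.0) % 4


def infer_from_tilt(accel):
    """Step 2, tilt approach: guess the mode from the lowest edge of the phone.

    A sign-based stand-in for a learned model. The mapping from signs to
    mode names is an assumption.
    """
    ax, ay, _ = accel
    if abs(ay) >= abs(ax):
        return "portrait" if ay > 0 else "upside_down"
    return "landscape_left" if ax > 0 else "landscape_right"


# Example: one second of rotation at pi/2 rad/s, then a nearly flat reading.
turns = quarter_turns([math.pi / 2] * 100, dt=0.01)
reading = (0.1, 0.8, 9.7)
mode = infer_from_tilt(reading) if is_top_down(reading) else None
```

Plain integration of gyroscope samples accumulates drift over long recordings, so a practical version would likely restrict the integration to a short window before the detected capture moment.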
The key contributions of this paper are summarized as follows:
- We conducted an online survey (n=106) to investigate real-world user behaviors for capturing information using a smartphone camera and users' awareness of orientation issues.
- We performed an in-lab experiment (n=16), including video and sensor recordings, to study detailed user interactions such as hand grips and behavioral sequences.
- Based on the in-lab experiment, we describe a technique for inferring the user's document capture intention using an accelerometer. Our method identifies a document capture with an accuracy of 92.50%.
- We propose two methods for correcting orientation errors that occur during document capture when using a smartphone camera. The first method fixes orientation errors by tracking the user's rotational movement using a gyroscope. The second method monitors the user's tilting behavior at the time of document capture to infer the correct orientation mode.
- We discuss several important issues, including the generalizability of our algorithm and its integration into existing systems. In addition, we discuss practical design implications such as the design of context-aware services for document capture, the investigation of diverse hand grip positions for mobile interaction design, and ways to increase awareness of viewfinder UI indicators.
The remainder of this paper is organized as follows. We start with a thorough review of related studies in Section 2. In Section 3, we describe the preliminary studies conducted to understand user behaviors regarding document capture using a mobile phone. Based on the insights acquired from these user studies, in Section 4 we describe the design of ScanShot, which corrects erroneous orientations by monitoring the gyroscope and accelerometer sensors. In Section 5, we report the performance of the two proposed methods. In Section 6, we discuss generalizability and integration issues and suggest several practical design implications. Finally, we provide concluding remarks in Section 7.
Section snippets
Document capture using a camera
As the quality, accessibility, and functionality of digital devices improve, such tools are increasingly being used for information capture and storage. Brown and Sellen (2000) investigated the use of information capture in work settings to understand the motive and reasons behind the use of digital cameras for alternative purposes. They asked two groups of people in a workspace environment to capture any kinds of information and document-based information respectively, and interviewed them
Methodology
To understand user behaviors, we performed two user studies: (1) an online survey to understand smartphone camera usage for information capture and participants' experiences of orientation errors, and (2) a controlled lab study to understand detailed document capture behaviors and to identify patterns of orientation errors.
ScanShot design
Our preliminary user study guided us in designing a novel solution called ScanShot for detecting document capture and helping the device correct any orientation errors. The key concept underlying ScanShot is that we can reliably detect whether a user is taking a top-down shot by monitoring the accelerometer. Once a top-down shot is detected, we then analyze data from multiple sensors to infer the current orientation mode. We propose two methods for correcting an orientation error, one using a gyroscope to
Evaluation
We evaluated ScanShot using the data collected in our earlier in-lab study, because the complete sensor data during document capture were recorded there. We first analyzed the accuracy of the document capture detection, and then evaluated the efficacy of the two orientation correction methods, namely our rotation- and tilt-based solutions. Our evaluation focused on the overall performance, user variance, parameter sensitivity, and an analysis of misclassified instances. For
Discussion
In this paper, we proposed ScanShot, which identifies the moment of document capture and corrects any orientation errors. For document capture, we propose a simple accelerometer-based detector, and for rotation correction, we propose two approaches, namely rotation event detection and micro-tilting detection. Our experiment showed that these approaches can reduce rotation errors significantly.
Because our experiment only considered in-lab conditions, we will next
Conclusion
We analyzed the problem of orientation errors in document capture using a smartphone camera. We investigated the rates of orientation errors, the hand grips used for capturing a document, and the skew angle of captured documents. Based on the user study, we proposed ScanShot, which automatically detects document capture and updates the orientation accordingly. ScanShot supports these features solely with the use of built-in motion sensors, namely an accelerometer and a gyroscope. Our evaluation
Acknowledgements
This work was partly supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. B0101-15-1272, Development of Device Collaborative Giga-Level Smart Cloudlet Technology), and by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Korea government (MSIP) (No. NRF-2015R1D1A1A01059497).
References (32)
- Designing With the Mind in Mind: Simple Guide to Understanding User Interface Design Rules (2010)
- Ahmed, S., Kise, K., Iwamura, M., Liwicki, M., Dengel, A., 2013. Automatic ground truth generation of camera captured...
- Bao, L., Intille, S. S., 2004. Activity recognition from user-annotated acceleration data. In: Proceedings of the...
- et al. Contextual Design: Defining Customer-centered Systems (1997)
- Brown, B.A.T., Sellen, A.J., O'Hara, K.P., 2000. A diary study of information capture in working life. In: Proceedings...
- Bulling, A., Blanke, U., Schiele, B., 2014. A tutorial on human activity recognition using body-worn inertial sensors....
- Cheng, L.-P., Hsiao, F.-I., Liu, Y.-T., Chen, M. Y., 2012. iRotate: automatic screen rotation based on face...
- Cheng, L.P., Lee, M.H., Wu, C.Y., Hsiao, F.I., Liu, Y.T., Liang, H.S., Chiu, Y.C., Lee, M.S., Chen, M.Y., 2013....
- Doermann, D., Liang, J., Li, H., 2003. Progress in camera-based document image analysis. In: Document Analysis and...
- Goel, M., Wobbrock, J., Patel, S., 2012. Gripsense: using built-in sensors to detect hand posture and pressure on...
- How and why people use camera phones. HP Lab. Tech. Rep. HPL