ABSTRACT
People frequently capture photos with their smartphones, and some are starting to capture images of documents. However, the quality of captured document images is often lower than expected, even when the capture application post-processes the image to improve it. To improve image quality before post-processing, we developed the Smart Document Capture (SmartDCap) application, which gives users real-time feedback about the likely quality of a captured image. The quality measures assess the sharpness and framing of a page or of regions on a page, such as a set of one or more columns, a part of a column, a figure, or a table. With our approach, while the user adjusts the camera position, the application automatically determines when to take a picture of a document to produce a good-quality result. We performed a subjective evaluation comparing SmartDCap with the Android Ice Cream Sandwich (ICS) camera application; we also used raters to evaluate the quality of the captured images. Our results indicate that users find SmartDCap as easy to use as the standard ICS camera application, and that images captured using SmartDCap are, on average, sharper and better framed than images captured using the ICS camera application.
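As a rough illustration of the capture decision described above, the following Python sketch triggers a capture only when a preview frame is both sharp enough and well framed. It is not SmartDCap's implementation: the variance-of-Laplacian sharpness score, the margin and fill rules for framing, and every threshold and function name here are assumptions chosen for illustration.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale rows, pixel values in [0, 255]


def laplacian_variance(img: Image) -> float:
    """Variance of a 4-neighbour Laplacian over interior pixels.

    A common stand-in for sharpness (assumed here, not taken from the paper):
    blurred frames have weak edges and therefore a low variance.
    """
    h, w = len(img), len(img[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x]
                   + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            responses.append(lap)
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_well_framed(region: Tuple[int, int, int, int],
                   frame_w: int, frame_h: int,
                   min_margin: int = 1, min_fill: float = 0.5) -> bool:
    """Check a detected region (left, top, right, bottom; right/bottom exclusive).

    Hypothetical rule: the region must keep a margin from every frame edge
    (so it is not cut off) and must fill enough of the frame (so it is not tiny).
    """
    left, top, right, bottom = region
    inside = (left >= min_margin and top >= min_margin
              and right <= frame_w - min_margin
              and bottom <= frame_h - min_margin)
    fill = ((right - left) * (bottom - top)) / (frame_w * frame_h)
    return inside and fill >= min_fill


def should_capture(img: Image, region: Tuple[int, int, int, int],
                   sharpness_threshold: float = 100.0) -> bool:
    """Return True when the frame passes both illustrative quality checks."""
    h, w = len(img), len(img[0])
    sharp_enough = laplacian_variance(img) >= sharpness_threshold
    return sharp_enough and is_well_framed(region, w, h)
```

In a live preview loop, a function like `should_capture` would be evaluated on each frame while the user moves the camera, and the picture would be taken on the first frame that passes; detecting the page region itself is outside the scope of this sketch.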
Index Terms
- SmartDCap: semi-automatic capture of higher quality document images from a smartphone