
1 Introduction

Due to recent developments in digital technology, much research has been carried out on improving usability by connecting digital technology with the real world. For instance, in the fields of NUI and TUI, it is possible to capture real-world actions and provide feedback based on those actions. However, considering our daily activities, there are still few cases in which our real-world actions transition smoothly into digital processing. With this in mind, we focus on the general paper medium (newspapers, books and other publications) and consider the following scenarios: (1) when discovering content, a figure or a photo that you like, marking it and capturing it with a camera or scanning it; (2) classifying paper media containing various writings and related articles and filing them together; (3) when trying to check the meaning of a word, marking it and looking it up on a computer or smartphone. Based on these cases, in this paper we propose a system that applies digital processing by using the natural human action of “writing something on a paper medium” as a trigger, so that real-world actions transition smoothly into digital processing. Concretely, a pen is used on the paper medium to (1) draw “\(\lceil \rfloor \)” at the diagonal ends of a desired region to save it, (2) write the same character on the upper-left side of each “\(\lceil \rfloor \)” to associate those regions, and (3) enclose with “\(\Box \)” an English word whose Japanese meaning the user does not know. Using such analog actions on the paper medium as input allows users to enjoy the benefits of the digital platform during casual use of the paper medium (Fig. 1).

Fig. 1.

(Left) Favorite figures, pictures or paragraphs enclosed by “\(\lceil \rfloor \)”, (middle) English word enclosed by “\(\Box \)” to search for its translated meaning, (right) similar characters (“1” is written in this case) written on the upper-left side of the regions enclosed by “\(\lceil \rfloor \)” to associate the regions.

2 Related Work

2.1 Link Between Paper Medium and Digital Data

In the field of seamless integration of digital and analog media, a number of studies have focused on the link between the paper medium and digital data. Koike et al. [1] and Do-Lenh et al. [2] have developed systems that link the real world with digital data: when an object such as a book with a marker pasted on it is placed on a table, the system projects digital information related to the book in its vicinity. Although these systems are similar to our research in terms of connecting the paper medium with digital data, they require markers to be pasted and digital data to be prepared beforehand, neither of which is required for our system. Sangsubhan and Tanaka [8] have developed an idea-generation support system that automatically digitizes data written on a paper medium with a digital pen. Our research likewise focuses on how the digital data are used after analog data have been digitized: depending on the written cues, digitized English words can be looked up for their meanings and kept for later study, or multiple pieces of data can be grouped together for combined browsing.

2.2 Extracting Written Cues

Nakai et al. [4] have proposed a method of extracting written cues on a paper medium by comparing an image of the paper with its original digital version and detecting the positions of the cues. Iwata et al. [5], by attaching a miniature camera to the tip of a pen, detect the positions of written cues without scanning, and extract the cues using a subtraction technique. However, these systems require a digitized version of the data printed on the paper medium, which our system does not, and their recognition processes are computationally heavy and time consuming. Stevens et al. [12] have developed a high-performance system for extracting written cues by restricting the colors that may be used for the cues. Guo and Ma [10] and Zheng et al. [11] have developed systems that extract written cues directly from the paper medium on which the cues were made; however, these systems can only extract handwritten letters, not handwritten lines or figures. Our system uses the color information of written cues for detection so that it can run on smartphones, whose processing power is significantly lower than that of PCs, and achieves high-precision extraction of written cues. In addition, by allowing users to choose the color used for extraction, the system provides flexibility.

3 Data and Commands Arrangement Design

In this section, we introduce how the system is used and how it processes the data. The system has two modes.

  • Recognition mode

  • View mode

In recognition mode, the system applies digital processing by recognizing the cues written by the user. In view mode, the user can browse the digital data stored by the recognition mode.

3.1 Recognition Mode

To use this system, the user holds a pen and a smartphone, holding the phone over the paper while writing or after finishing writing. This system recognizes three types of written cues (made by handwritten entries).

(1). Enclosure by \(\varvec{\lceil \rfloor }\) . The user likes an image or paragraph on the paper and would like to save it. When the user draws “\(\lceil \rfloor \)” at the diagonal ends of the desired region with a pen, “\(\lceil \)” is enclosed by a red square and “\(\rfloor \)” by a green square on the smartphone, so the user can confirm that the system has recognized “\(\lceil \rfloor \)” (Fig. 2(a)). By touching the region, the rectangular area enclosed by “\(\lceil \rfloor \)” is saved on the smartphone as digital data (Fig. 2(b)).

Fig. 2.

(a) By drawing “\(\lceil \rfloor \)” and touching the region the user would like to save, (b) the region will be saved as digital data.

(2). Drawing the same character. By writing a character with a pen on the upper-left side of the “\(\lceil \rfloor \)” used in (1)., associations can be made between regions designated by the same character. The character is enclosed by a yellow square on the smartphone, so the user can confirm that the system has recognized it. In Fig. 3(a) the character “1” is written; by touching the region, it is automatically associated with the other region where the same character (i.e. “1”) is written (Fig. 3(b), (c)). This function is useful, for example, when the user wants to save a figure and its description as a set: the user can then read the description while looking at the figure on the smartphone.

(3). Enclosure by “ \(\varvec{\Box }\) ”. The user does not understand the meaning of an English word on the paper. When the user encloses the word with “\(\Box \)” with a pen, the word is enclosed by a blue rectangle on the smartphone, so the user can confirm that the system has recognized “\(\Box \)” (Fig. 4(a)). Touching the region prompts the system to show the translated (Japanese) meaning on the upper-left side of the display (Fig. 4(b)). In addition, the word enclosed by “\(\Box \)” and the corresponding translation are saved on the smartphone.

Fig. 3.

(a) By writing “1” on the upper-left side of the “\(\lceil \rfloor \)” and (b) touching one of the regions (the lower figure of (a) in this case), (c) the other region associated with it will be displayed.

Fig. 4.

(a) By drawing “\(\Box \)” and touching the word, (b) the translated (Japanese) meaning is displayed on the upper-left side of the screen.

Selection of region by touching on screen. The regions saved using the above methods can be selected by touching them on the smartphone. When a region created with (1). is selected, the digital image of that region is displayed on the screen. When a region created with (2). is selected, the digital images associated with the region are displayed, and one of them is shown enlarged. When a region created with (3). is selected, the translated meaning of the word within the region is displayed on the top-left side of the screen.

Registration of the pen. Since different users use different pens, the system allows any pen to be used for making cues. The process for registering a pen is as follows. While the system is running, the user scribbles something on a scrap of paper with the desired marker pen (Fig. 5(a)). Touching the pen icon in the upper-left corner of the screen turns the icon yellow and puts the system into the pen-registration state. Then, when the user touches the scribbled region on the smartphone screen (Fig. 5(b)), the system extracts the color of the ink (Fig. 5(c)) and registers it as the default color for future recognition and extraction.
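As a rough illustration of this registration step (a minimal sketch, not the actual iOS implementation; the window size and tolerance values are assumptions, since the paper does not state them), the ink color could be sampled around the touched point and turned into an HSV range used for later extraction:

```python
import cv2
import numpy as np

def register_pen_color(frame_bgr, touch_x, touch_y, window=15, tol=(10, 60, 60)):
    """Sample pixels around the touched point and derive an HSV range for
    later written-cue extraction. Window size and tolerances are illustrative."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    h, w = hsv.shape[:2]
    x0, x1 = max(0, touch_x - window), min(w, touch_x + window)
    y0, y1 = max(0, touch_y - window), min(h, touch_y + window)
    patch = hsv[y0:y1, x0:x1].reshape(-1, 3).astype(np.float32)
    # Use the median so stray white-paper pixels do not skew the ink color.
    center = np.median(patch, axis=0)
    lower = np.clip(center - np.array(tol), 0, 255).astype(np.uint8)
    upper = np.clip(center + np.array(tol), 0, 255).astype(np.uint8)
    return lower, upper  # registered as the default color for extraction
```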

Fig. 5.

(a) Scribbling something on a scrap of paper with the desired marker pen, (b) by touching the pen icon and the scribbled region, (c) the system extracts the color of the ink

3.2 View Mode

Since the data extracted using the written cues are saved on the smartphone, they can be accessed and browsed at any time. The data may consist of figures, pictures, images of paragraphs, and English words with their translations, and are divided into two categories: “IMAGE” and “WORD”. The “IMAGE” category allows users to collectively browse the extracted figures, pictures or paragraphs (Fig. 6(a)). Image data that have been grouped together using the same letter show a yellow triangle mark in the upper-right corner and the grouping letter in the upper-left corner, and can be browsed as a set of grouped images (Fig. 6(b)). By selecting the “WORD” category, English words previously extracted and looked up can be rechecked together with their meanings (Fig. 6(c)).

Fig. 6.

(a) “IMAGE” category shows images list saved by recognition mode, (b) images list grouped by “1”, (c) “WORD” category shows words list saved by recognition mode

4 Implementation

This system has been implemented as an application running on iOS. In this section, we explain the implementation of the “recognition mode” described in the previous section.

4.1 Recognition of Written Cues

In this mode, simple shapes such as “\(\lceil \rfloor \)” or “\(\Box \)”, as well as the letters used for grouping, can be extracted. In addition, taking the processing power of smartphones into account, we use extraction of a specific color to detect written cues, which reduces the processing load and improves accuracy.

4.2 Color Extraction

The HSV color space, which is relatively robust to changes in lighting, is used for color extraction. The written cues extracted based on the specified color are distinguished as a separate region from the main text, and then binarized for the shape-recognition process explained below.
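A minimal sketch of this step, assuming OpenCV-style processing (the paper does not name the image-processing library used on iOS): the registered ink-color range is applied as an HSV mask, giving a binary image of the written cues.

```python
import cv2

def extract_written_cues(frame_bgr, lower_hsv, upper_hsv):
    """Separate pen strokes of the registered color from the printed text.
    Returns a binary image in which cue pixels are white (255)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower_hsv, upper_hsv)  # keep only the registered ink color
    mask = cv2.medianBlur(mask, 3)                 # suppress isolated noise pixels
    return mask                                    # already binary: cue pixels are 255
```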

4.3 Shape Recognition

After written cues have been detected, recognition is performed. To recognize the written shape in the image obtained from the smartphone camera, template matching is used as the recognition algorithm. The digital processing performed differs depending on the detected shape.
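The matching step could look like the following sketch, which assumes OpenCV's normalized cross-correlation template matching; the template images and score threshold are illustrative assumptions.

```python
import cv2

def find_shape(binary_img, template, threshold=0.7):
    """Locate a cue shape (e.g. a corner mark) in the binarized cue image.
    Returns the upper-left coordinate of the best match, or None if the
    match score is below the (illustrative) threshold."""
    result = cv2.matchTemplate(binary_img, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    return max_loc if max_val >= threshold else None
```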

4.4 Processing for the Shape “\(\lceil \rfloor \)”

Using template matching, the coordinates of the upper left positions of “\(\lceil \)” and “\(\rfloor \)” are obtained. A rectangular region is determined using these coordinates, and then cut out from the RGB image obtained from the camera.
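Under the same assumptions, the cropping step might look like this; how the match positions of the two corner marks relate to the edges of the region is an illustrative choice.

```python
def crop_marked_region(frame_bgr, top_left_corner, bottom_right_corner, template_size):
    """Cut the rectangle delimited by the "⌈" and "⌋" marks out of the camera frame.
    The two corners are the upper-left match positions of the templates;
    template_size is the (width, height) of the templates."""
    x0, y0 = top_left_corner
    x1, y1 = bottom_right_corner
    tw, th = template_size
    # The "⌋" match position is its upper-left corner, so extend by the
    # template size to reach the lower-right edge of the marked region.
    return frame_bgr[y0:y1 + th, x0:x1 + tw]
```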

4.5 Processing for Written Letters

In order to recognize letters, the region containing the written letter is cut out as image data, as follows. First, the rectangular region delimited by “\(\lceil \)” and “\(\rfloor \)” is obtained. Then, a 30 px rectangular region containing the letter is cut out, with the upper-left corner of that rectangle starting 15 px above and 20 px to the left of the upper-left corner of “\(\lceil \)”. Using this image as a template and performing template matching against letter images obtained later, the letters are compared; when similar letters are found, the system groups the regions enclosed by “\(\lceil \)” and “\(\rfloor \)” that are associated with those letters.
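A sketch of this step follows; the 30 px region size and the 15 px/20 px offsets are taken from the text above, while the matching threshold is an illustrative assumption.

```python
import cv2

LETTER_SIZE = 30                 # size of the cut-out letter region (from the text)
OFFSET_UP, OFFSET_LEFT = 15, 20  # offsets from the "⌈" corner (from the text)

def cut_letter_region(frame_bgr, corner_xy):
    """Cut the 30 px region expected to contain the grouping letter,
    starting 15 px above and 20 px to the left of the "⌈" corner."""
    x, y = corner_xy
    x0, y0 = max(0, x - OFFSET_LEFT), max(0, y - OFFSET_UP)
    return frame_bgr[y0:y0 + LETTER_SIZE, x0:x0 + LETTER_SIZE]

def same_letter(letter_a, letter_b, threshold=0.6):
    """Decide whether two letter images show the same character by template
    matching one against the other; the threshold is an assumption."""
    a = cv2.cvtColor(letter_a, cv2.COLOR_BGR2GRAY)
    b = cv2.cvtColor(letter_b, cv2.COLOR_BGR2GRAY)
    score = cv2.matchTemplate(a, b, cv2.TM_CCOEFF_NORMED).max()
    return score >= threshold
```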

4.6 Processing for the Shape “\(\Box \)”

Using template matching, the coordinates of the upper-left corner of “\(\Box \)” are obtained, and by applying these coordinates to the RGB image obtained from the camera, the English word enclosed within the “\(\Box \)” is cut out. The text is then extracted from the resulting image data using OCR.

4.7 Text Extraction Using OCR

In this mode, OCR (optical character recognition) is used to extract text from image data, using the “tesseract-ios” OCR library. However, when there is noise in the image background or the image is blurred, the accuracy tends to drop significantly. Therefore, the images used in the system are magnified, converted to grayscale, sharpened, and contrast-enhanced in order to improve the accuracy.
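A sketch of this preprocessing, assuming an OpenCV pipeline and using pytesseract as a desktop stand-in for the tesseract-ios library named above; the magnification factor, sharpening kernel and Tesseract options are illustrative assumptions.

```python
import cv2
import numpy as np
import pytesseract  # desktop stand-in for the tesseract-ios library used on iOS

def ocr_word(word_img_bgr):
    """Magnify, grayscale, sharpen and contrast-stretch the cropped word image
    before handing it to Tesseract, as described above."""
    img = cv2.resize(word_img_bgr, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    sharpen_kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
    sharp = cv2.filter2D(gray, -1, sharpen_kernel)
    contrast = cv2.normalize(sharp, None, 0, 255, cv2.NORM_MINMAX)
    return pytesseract.image_to_string(contrast, config="--psm 8").strip()
```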

4.8 Translation of English Words

To look up the meaning of the word extracted using OCR, an English-Japanese dictionary web service called “Dejizo” is used. By including the English word in the request URL when accessing Dejizo, the corresponding translation is returned in an XML file.
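The lookup could be sketched as below. The endpoint URL, query-parameter name and XML structure are placeholders, not the actual Dejizo API, which is not specified in the paper; only the overall flow (word in the request URL, translation returned as XML) follows the description above.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Placeholder endpoint: the real Dejizo request URL and parameter names
# would have to be taken from the service's API documentation.
DICTIONARY_ENDPOINT = "https://example.com/dejizo-search"

def look_up(word):
    """Include the English word in the request URL, fetch the XML response,
    and pull the translated text out of it (element names are assumptions)."""
    url = DICTIONARY_ENDPOINT + "?" + urllib.parse.urlencode({"Word": word})
    with urllib.request.urlopen(url) as resp:
        root = ET.fromstring(resp.read())
    # Concatenate all text nodes as a simple way to get at the translation.
    return " ".join(t.strip() for t in root.itertext() if t.strip())
```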

4.9 Data Processing

The figures, pictures, paragraphs, English words and corresponding meanings obtained in the recognition mode are stored in the iPhone storage. Figures, pictures and paragraphs are stored as image files; the metadata of those image files, the English words extracted for translation, and the corresponding translations are stored as XML files. While the recognition mode is in use, the stored data are imported into the application memory to reduce processing time: when saving data, the data in application memory are compared with those in storage to determine whether they are identical. The same data are also used when browsing in the “view mode”.
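As an illustration of what the stored metadata could look like (the file layout, element and attribute names are assumptions; the paper only states that images are saved as image files and that metadata, words and translations are saved as XML):

```python
import xml.etree.ElementTree as ET

def save_metadata(path, entries):
    """Write per-capture metadata to an XML file. Element and attribute names
    are illustrative; each entry describes either a saved image or a word."""
    root = ET.Element("captures")
    for e in entries:
        item = ET.SubElement(root, "capture", kind=e["kind"])       # "image" or "word"
        ET.SubElement(item, "file").text = e.get("file", "")        # saved image file name
        ET.SubElement(item, "group").text = e.get("group", "")      # grouping letter, if any
        ET.SubElement(item, "word").text = e.get("word", "")        # extracted English word
        ET.SubElement(item, "translation").text = e.get("translation", "")
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
```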

5 Evaluation

A preliminary experiment was performed to evaluate the recognition accuracy of the recognition mode and the discrimination accuracy of the generated data.

5.1 Experiment Outline

We evaluated the following: the recognition and discrimination accuracy for “\(\lceil \)” and “\(\rfloor \)”, the discrimination accuracy for the letters written at the upper-left corner of “\(\lceil \)”, the recognition accuracy for “\(\Box \)”, and the discrimination accuracy for the enclosed region (the English word). In addition, three types of marker pens were used to evaluate how recognition differs with the color of the pen. A thesis written in English was used as the paper medium, and three university students aged 21–22 participated in the experiments.

Fig. 7.

(a) “\(\lceil \rfloor \)” recognition accuracy, (b) “\(\lceil \rfloor \)” discrimination accuracy, (c) “\(\Box \)” recognition accuracy, (d) “\(\Box \)” discrimination accuracy

The color of the graph represents the color of the ink of the marker pen used. “\(\lceil \rfloor \) recognition accuracy” represents the recognition accuracy for “\(\lceil \)” and “\(\rfloor \)”, and “\(\lceil \rfloor \) discrimination accuracy” represents the discrimination accuracy for the data obtained (such as paragraphs, figures, etc.). Similarly, “\(\Box \) recognition accuracy” represents the recognition accuracy for “\(\Box \)”, and “\(\Box \) discrimination accuracy” represents the discrimination accuracy for the data obtained (English word).

Fig. 8.

The discrimination accuracy for the letter written by (a) participant A, (b) participant B, (c) participant C

The color of the graph represents the color of the ink of the marker pen used. Each graph represents the discrimination accuracy for the letter written on the upper left corner of “\(\lceil \)”.

5.2 Considerations

From the graphs it can be seen that the recognition and discrimination accuracies for the red and blue pens were similar, while those for the yellow pen were lower. The reason is presumably the low contrast of yellow ink on white paper. From this it can be inferred that, for the system to work properly, the color of the ink used for writing cues needs to contrast strongly with the color of the paper used as the medium. With the red and blue pens, recognition of “\(\lceil \rfloor \)” and “\(\Box \)” was accurate (Fig. 7(a), (c)), and the data such as paragraphs and figures obtained using “\(\lceil \rfloor \)” also showed high discrimination accuracy (Fig. 7(b)). On the other hand, the English words obtained from “\(\Box \)” showed a discrimination accuracy of about 60 % (Fig. 7(d)), which appears to be limited by the accuracy of the OCR text extraction. In this system, several image-processing steps are applied to the image data to increase the discrimination accuracy; for a real-world deployment, the discrimination accuracy for “\(\Box \)” needs to be improved further. Nevertheless, the ability to extract an English word enclosed within “\(\Box \)” with an accuracy of 60 % could become the foundation of a real-world implementation of a paper-medium-based system. Regarding the discrimination accuracy for letters, participant A achieved 100 % accuracy, whereas the accuracy for participants B and C was lower (Fig. 8). The decrease for participant C may have been caused by the participant's inability to reproduce the subtle intricacies of a star symbol (Fig. 8(c)), and for participant B, the similarity between the two symbols (Fig. 8(b)) may have caused confusion in the recognition process and reduced accuracy. Although simple letters and symbols gave accurate results, the system needs to be further improved to accommodate complex symbols and to differentiate between similar-looking but different symbols.

6 Discussion and Conclusion

In this research, we have developed a system that performs digital processing on data obtained through written cues on the “general paper medium”, such as books, newspapers and other publications. Using only a smartphone and by writing cues on the paper medium, users can save desired figures, pictures or paragraphs from the physical paper, associate them with each other, or look up the meanings of English words. The saved and associated data can be browsed on the smartphone at any time, and since the written cues remain on the paper medium, they can also be revisited on the physical paper. Moreover, because the system requires only a smartphone and a pen in addition to the paper medium, and the digital processing is triggered by the natural action of writing something on paper, it is highly accessible and easy to use for general users. From the preliminary experiments, we obtained high recognition and discrimination accuracy depending on the pen used and the written cues. However, for the extraction of English words, further work is needed to improve the processing method and raise the accuracy. In addition, accurate results require thick marker pens whose ink color contrasts strongly with the color of the paper, which limits the system's flexibility to some extent. Furthermore, while the current system requires the source text to be in English, we want to extend it to accommodate other languages such as Japanese.