
1 Introduction

Nowadays, more and more printed books are accompanied by electronic resources, including videos, audio, games, augmented reality and other mobile apps. However, accessing most of these electronic resources is not very convenient, as the association between printed books and electronic resources is not automatically available [1]. Take accessing an accompanying video as an example: one must first find the video file corresponding to the book, open it in a video player, and then repeatedly fast-forward or rewind to locate the exact position relevant to a certain book page. Besides the fact that this task may take an adult several minutes to complete, it is often a challenge for very young children and elderly people. There is a pressing need to associate printed books with their accompanying electronic resources so that these resources can be accessed quickly and conveniently.

The major issue in associating printed books with accompanying electronic resources is to automatically identify book pages. Once a page is identified, it can be mapped to its corresponding video/audio position or to a certain scenario of an app/game via a table/database [2]. Existing page identification methods can be generally divided into three categories: Optical IDentification (OID), Page Identifier (PI), and Computer Vision (CV) methods.

The OID-based method usually relies on a device called a “talking pen” [3], which reads and identifies invisible codes printed with infrared-reflective ink. As OID-based methods can discriminate about 600,000 different codes, a vast number of pages can be identified reliably. However, this technology is hard to popularize because it requires expensive ink and specially made hardware.

The PI-based method identifies book pages by recognizing an additional page identifier printed on each page. Jeong et al. [4] print a specially designed page identifier on each page of a book and identify book pages by comparing the characteristics of the captured page identifier with a database. Baik [5] regards the two-dimensional code as an ambient media gate to the digital world, and has developed a “scan-to-watch” application that accesses TV programs by scanning two-dimensional codes on printed materials. Although PI-based technology can be easily integrated into mobile apps and page identifiers are quite robust to recognize, it has two disadvantages: (1) Page identifiers more or less detract from the aesthetics of books; (2) Books without printed page identifiers cannot be handled by PI-based methods.

The CV-based method treats page identification as an image retrieval problem, i.e. taking an image of a printed book page and then finding the most similar reference image in a registration dataset in which each reference image has already been mapped to a book page. Iwata et al. [6] use four-directional feature fields to identify book covers for a small-scale library system. Tsai et al. [7] employ Speeded Up Robust Features (SURF) to recognize CD covers. Chae et al. [2] use a mobile phone to take sequences of images of printed materials, and then retrieve the reference image from a database by a keypoint-based matching and tracking method. CV-based methods do not require additional identifiers to be printed on books, so they can be used to identify pages of any book. However, the identification accuracy of most existing CV-based methods cannot provide a satisfactory user experience. Recently, convolutional neural networks have made impressive progress in many fields of computer vision, including image retrieval [8]. This progress makes it possible to improve the performance of CV-based book page identification.

This paper presents a book page identification method based on convolutional neural networks (CNNs). As collecting and labelling millions of book page images to train a CNN is time-consuming, a pipeline is proposed that makes a CNN trained on a task-unrelated dataset usable for book page identification. Experimental results on a challenging testing dataset show that the proposed book page identification method achieves a top-5 hit rate of 98.93%.

2 The Proposed Method

As shown in Fig. 1, the pipeline of the proposed book page identification method has five building blocks: (1) An image segmentation module to separate the book page from the background; (2) An image correction module to correct geometry and color distortions; (3) A feature extraction module to extract discriminative image features with a pre-trained CNN; (4) A feature compression module to reduce feature dimensions for speed; and (5) A feature matching module to calculate the similarity between the query image and each reference image and then find the most similar reference image. In the offline phase, each reference image only needs to be processed by the feature extraction module and the feature compression module to obtain a compressed feature code. The feature codes of all reference images are stored in a matrix. In the online phase, a query image is processed by all five modules.

Fig. 1. The pipeline of the proposed book page identification method.

2.1 Book Page Segmentation

The background seriously affects the performance of book page identification, as the abundant visual information it contains may be encoded into the feature code by the CNN. Therefore, the book page needs to be separated from the background.

Many interactive image segmentation algorithms [9, 10] have been proposed in the last decade. Given a bounding box drawn around an object of interest, these algorithms can separate the object from the background. However, the bounding-box-drawing interaction degrades the user experience. Although some image segmentation algorithms [11, 12] can initialize the bounding box of an object of interest automatically, none of them provides real-time processing speed on mainstream smart phones and other consumer electronics.

In this subsection, a coarse-to-fine strategy is proposed to segment the book page from the background fully automatically at real-time processing speed. As illustrated in Fig. 2, the proposed image segmentation algorithm consists of three steps: (1) Coarse segmentation, which segments the book page at the pixel level using a fixed bounding box initialization; (2) Bounding box re-initialization, which provides a more accurate bounding box for fine segmentation; (3) Fine segmentation, which produces the final result.

Fig. 2. Coarse-to-fine image segmentation. (a) The procedure of the proposed fully automatic image segmentation algorithm. (b) An original query image with a fixed bounding box initialization. (c) Image segmentation result using the initial bounding box in (b). (d) A new bounding box re-initialized after coarse segmentation. (e) Fine image segmentation result using the bounding box in (d).

A color-histogram-based Bayes classifier is employed for coarse segmentation. Let \( H_{O}(b) \) and \( H_{B}(b) \) denote the b-th bin of the non-normalized histograms computed over the initial bounding box (O) and its surrounding background region (B), respectively. Additionally, let \( b_{x} \) denote the bin assigned to the pixel \( I(x) \) at location x. Bayes' rule [13] is applied to obtain the object likelihood:

$$ p(x \in O \mid O, B, b_{x}) \approx \frac{p(b_{x} \mid x \in O)\, p(x \in O)}{\sum\nolimits_{\Omega \in \{ O, B\}} p(b_{x} \mid x \in \Omega)\, p(x \in \Omega)} $$
(1)

In particular, the likelihood terms in (1) are estimated directly from the color histograms, i.e. \( p(b_{x} \mid x \in O) \approx H_{O}(b_{x})/|O| \) and \( p(b_{x} \mid x \in B) \approx H_{B}(b_{x})/|B| \). Furthermore, the prior probability can be approximated as \( p(x \in O) \approx |O|/(|O| + |B|) \). Thus, the Bayes classifier simplifies to:

$$ p(x \in O \mid O, B, b_{x}) \approx \frac{H_{O}(b_{x})}{H_{O}(b_{x}) + H_{B}(b_{x})} $$
(2)

The pixels of the book page can be coarsely separated from the background by (2) at very low computational cost. Then, a new bounding box is fitted to the coarse segmentation result using least-squares approximation. Finally, the book page is segmented by the DenseCut algorithm [10], a high-quality image segmentation technique that processes about 15 images per second on general consumer electronics.
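The coarse segmentation step can be sketched as follows, assuming an RGB uint8 query image and a fixed initial bounding box. The 16-bin joint RGB histogram, the 0.5 likelihood threshold, and the simple axis-aligned box refit are illustrative assumptions (the paper fits the new bounding box by least-squares approximation), and the final DenseCut refinement is omitted.

```python
import numpy as np

def coarse_segment(img, box, bins=16):
    """Per-pixel object likelihood via Eq. (2): H_O(b_x) / (H_O(b_x) + H_B(b_x)).

    img : H x W x 3 uint8 RGB query image
    box : (x0, y0, x1, y1) fixed initial bounding box assumed to contain the page
    """
    # Quantize each pixel into a joint RGB histogram bin index b_x.
    q = (img.astype(np.int64) * bins) // 256                   # per-channel bin in [0, bins)
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]    # H x W joint bin indices

    x0, y0, x1, y1 = box
    inside = np.zeros(img.shape[:2], dtype=bool)
    inside[y0:y1, x0:x1] = True

    # Non-normalized histograms over the bounding box (O) and its surrounding region (B).
    n_bins = bins ** 3
    h_o = np.bincount(idx[inside], minlength=n_bins).astype(np.float64)
    h_b = np.bincount(idx[~inside], minlength=n_bins).astype(np.float64)

    # Eq. (2); the small epsilon avoids division by zero for empty bins.
    likelihood = h_o[idx] / (h_o[idx] + h_b[idx] + 1e-9)
    return likelihood > 0.5                                    # coarse foreground mask

def refit_box(mask):
    """Re-initialize the bounding box from the coarse mask (assumes a non-empty mask)."""
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max() + 1, ys.max() + 1
```

The refined bounding box would then be passed to DenseCut [10] for the fine segmentation.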

2.2 Image Correction

There are mainly two kinds of distortions in the original query images: geometry distortion and color distortion. If these distortions are not corrected, the performance of book page identification suffers significantly. In this subsection, geometry distortion and color distortion are corrected in a single pass. As illustrated in Fig. 3, the geometrically distorted book page in the original query image is converted to a square one by perspective transformation, and meanwhile the distorted colors of the book page are corrected to those that would appear under a canonical light source by chromatic adaptation.

Fig. 3. An example of image distortion correction. (a) An original query image, in which the book page is distorted in both geometry and color. (b) A quadrilateral is fitted around the contour of the segmented book page and is used to correct the geometry distortion by perspective transformation. (c) The ambient illumination is estimated from the original image and is used to correct the colors of all pixels of the book page. (d) The corrected image of the book page.

Since it is difficult to hold a handheld camera exactly facing the plane of a book page, the rectangular book page in a query image is usually distorted into a quasi-quadrilateral, mainly due to perspective projection. Thus, a perspective transformation is used to convert the quasi-quadrilateral book page into a square one. Let \( (x_{s}, y_{s}) \) denote a point in the corrected image. The perspective transformation maps \( (x_{s}, y_{s}) \) back to its corresponding point \( (x_{q}, y_{q}) \) in the original image:

$$ \left\{ \begin{array}{l} x_{q} = \dfrac{a_{11} x_{s} + a_{21} y_{s} + a_{31}}{a_{13} x_{s} + a_{23} y_{s} + a_{33}} \\[2.2ex] y_{q} = \dfrac{a_{12} x_{s} + a_{22} y_{s} + a_{32}}{a_{13} x_{s} + a_{23} y_{s} + a_{33}} \end{array} \right. $$
(3)

where \( \{ a_{11}, a_{12}, a_{13}; a_{21}, a_{22}, a_{23}; a_{31}, a_{32}, a_{33} = 1 \} \) are the elements of the 3 × 3 transformation matrix. This transformation matrix needs to be estimated from image cues.

To compute the transformation matrix, at least four point correspondences between the original image and the corrected image need to be established. To this end, a quadrilateral enclosing the contour of the segmented book page is fitted using least-squares approximation (see Fig. 3(b)). This yields four point correspondences, i.e. \( \{ (Q_{0}, S_{0}), (Q_{1}, S_{1}), (Q_{2}, S_{2}), (Q_{3}, S_{3}) \} \) in Fig. 3, which are substituted into (3) to determine the transformation matrix.
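Assuming the four corners of the fitted quadrilateral are available in a consistent order, the geometry correction can be sketched with OpenCV as below; the 224-pixel output size is chosen here to match the CNN input and is an assumption rather than something fixed by the text.

```python
import cv2
import numpy as np

def correct_perspective(img, quad, size=224):
    """Warp the quasi-quadrilateral page region to a size x size square.

    quad : 4 x 2 array with corners Q0..Q3 of the fitted quadrilateral, ordered to
           match the square corners S0..S3 (top-left, top-right, bottom-right, bottom-left).
    """
    src = np.asarray(quad, dtype=np.float32)
    dst = np.array([[0, 0], [size - 1, 0],
                    [size - 1, size - 1], [0, size - 1]], dtype=np.float32)
    # Solves for the 3 x 3 matrix of Eq. (3) from the four point correspondences.
    m = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(img, m, (size, size))
```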

Color distortion in the original query image is mainly caused by the ambient illumination. Once the ambient illumination is estimated, the query image can be corrected to an image that appears to have been recorded under a canonical illumination using chromatic adaptation [14]:

$$ \left[ \begin{array}{c} R_{s} \\ G_{s} \\ B_{s} \end{array} \right] = \left[ \begin{array}{ccc} \frac{1}{\sqrt{3}\, R_{e}} & 0 & 0 \\ 0 & \frac{1}{\sqrt{3}\, G_{e}} & 0 \\ 0 & 0 & \frac{1}{\sqrt{3}\, B_{e}} \end{array} \right] \left[ \begin{array}{c} R_{q} \\ G_{q} \\ B_{q} \end{array} \right] $$
(4)

where \( [R_{q}, G_{q}, B_{q}]^{T} \) and \( [R_{s}, G_{s}, B_{s}]^{T} \) are the pixel colors in the original query image and the corrected image, respectively, and \( [R_{e}, G_{e}, B_{e}]^{T} \) is the ambient illumination, which needs to be estimated.

Computational color constancy [14, 15] is a powerful tool for estimating the ambient illumination from a single image. Considering the trade-off between illumination estimation accuracy and computational efficiency, the gray-edge computational color constancy algorithm [14] is adopted. This algorithm assumes that the average edge difference in a scene is achromatic. Based on this hypothesis, the ambient illumination is estimated as:

$$ \left[ \begin{array}{c} R_{e} \\ G_{e} \\ B_{e} \end{array} \right] = \frac{1}{C} \left[ \begin{array}{c} \left( \sum_{x \in [0,w),\, y \in [0,h)} \left( \nabla R_{q}(x,y) \right)^{p} \right)^{1/p} \\ \left( \sum_{x \in [0,w),\, y \in [0,h)} \left( \nabla G_{q}(x,y) \right)^{p} \right)^{1/p} \\ \left( \sum_{x \in [0,w),\, y \in [0,h)} \left( \nabla B_{q}(x,y) \right)^{p} \right)^{1/p} \end{array} \right] $$
(5)

where w and h are the width and height of the original query image, \( \nabla(\cdot) \) denotes the gradient map of the original query image, p is a parameter, and C is a normalization coefficient. In the implementation, p is set to 5.

Once the perspective transformation matrix is computed and the ambient illumination is estimated, the geometry distortion is corrected using (3) and the color distortion is corrected using (4) in a single pass.
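A minimal sketch of the gray-edge estimate of (5) and the diagonal correction of (4) is given below, assuming an RGB image. Using the Sobel gradient magnitude, omitting the Gaussian pre-smoothing of [14], and choosing C so that the illuminant estimate has unit norm are simplifying assumptions.

```python
import cv2
import numpy as np

def gray_edge_illuminant(img, p=5):
    """Estimate the ambient illumination [R_e, G_e, B_e] via Eq. (5)."""
    img = img.astype(np.float64)
    e = np.zeros(3)
    for c in range(3):
        gx = cv2.Sobel(img[..., c], cv2.CV_64F, 1, 0, ksize=3)
        gy = cv2.Sobel(img[..., c], cv2.CV_64F, 0, 1, ksize=3)
        grad = np.sqrt(gx ** 2 + gy ** 2)          # gradient magnitude per channel
        e[c] = np.sum(grad ** p) ** (1.0 / p)      # Minkowski p-norm of the edges
    return e / np.linalg.norm(e)                   # normalization coefficient C

def correct_color(img, e):
    """Apply the diagonal (von Kries) correction of Eq. (4)."""
    scale = 1.0 / (np.sqrt(3.0) * e)
    corrected = img.astype(np.float64) * scale     # broadcasts over the color channels
    return np.clip(corrected, 0, 255).astype(np.uint8)
```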

2.3 Feature Extraction

Recently, CNNs have achieved impressive progress in many fields of computer vision, including image retrieval. The most direct approach to book page identification would be to collect a dataset of book page images and use it to train a CNN, so that book pages could be identified by the trained CNN in an end-to-end manner. However, training such a CNN requires a dataset containing millions of labelled book page images, and collecting and labelling such a large-scale dataset is time-consuming.

Many studies [8, 16,17,18] have shown qualitative evidence that the features emerging in the upper layers of CNNs trained for object classification can serve as good descriptors for other unrelated tasks such as image retrieval. Inspired by these works, pre-trained object classification CNNs are investigated for book page identification in this paper. After exploring the accuracy and speed trade-off of several pre-trained CNNs with different architectures [8, 16,17,18,19,20], the VGG Fast version (VGG-F) convolutional neural network [17] is adopted to identify book pages.

The architecture of the VGG-F CNN is illustrated in Fig. 4. It consists of 5 convolutional layers (conv1-5) and 3 fully-connected layers (full6-8). The conv1 layer employs 64 kernels of size 11 × 11 × 3 to filter the 224 × 224 × 3 color input images with a stride of 4 pixels. The conv2 layer takes as input the output of the conv1 layer and filters it with 256 kernels of size 5 × 5 × 64. The conv3, conv4 and conv5 layers all have 256 convolution kernels of size 3 × 3 × 256. A max-pooling unit follows the convolution unit in layers conv1, conv2 and conv5, but not in layers conv3 and conv4. Each of the 5 convolutional layers includes a Rectified Linear Unit (ReLU). The fully-connected layers full6 and full7 are regularized using dropout and have 4096 neurons each. The last layer, full8, is the output layer and acts as a multi-way soft-max object classifier. The ILSVRC dataset [21], which contains 1.2 million training images of 1000 object categories, is used to train the VGG-F CNN.

Fig. 4. The architecture of the CNN used in this paper.

The 4096-dimensional vector output by the full7 layer is extracted as the feature code for book page identification. To save computation, the full8 layer of the trained VGG-F CNN is removed when extracting feature codes from book page images. In the offline phase, all reference images in the book page database are resized to 224 × 224 pixels and fed into the trained CNN one by one to extract feature codes, which are stored in a matrix. In the online phase, a feature code is also extracted from the image output by the image correction module.
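The VGG-F weights are distributed in MatConvNet format rather than through a Python framework, so the sketch below substitutes torchvision's pretrained AlexNet, which has a comparable architecture and the same 4096-dimensional penultimate layer; this stand-in, the torchvision API (version 0.13 or later), and the preprocessing constants are assumptions, not the paper's exact setup.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Stand-in for VGG-F: remove the final classification layer (the full8 analogue)
# so the network outputs the 4096-D activation of its penultimate layer (full7 analogue).
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.classifier = torch.nn.Sequential(*list(model.classifier.children())[:-1])
model.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_code(path):
    """Return the 4096-D feature code of one (corrected) page image."""
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        return model(preprocess(img).unsqueeze(0)).squeeze(0).numpy()
```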

2.4 Feature Compression

To identify a book page, the similarities between the feature code of the query image and the feature codes of all reference images need to be calculated. As each feature code extracted by the CNN is a 4096-dimensional vector, computing these similarities is inefficient. The most direct way to improve efficiency is to reduce the dimensionality of the feature codes. Babenko et al. [8] use Principal Component Analysis (PCA) to compress feature codes extracted by CNNs, and obtain good content-based image retrieval performance while greatly reducing the computational cost. Encouraged by this work, PCA is employed to compress the 4096-dimensional feature codes for speed.

Denote the feature code extracted from an image by a vector \( {\mathbf{X}}_{i} \). Suppose there are m reference images in the book page database; all of their feature codes form a 4096 × m matrix \( {\mathbf{M}} = [{\mathbf{X}}_{1} \, {\mathbf{X}}_{2} \cdots {\mathbf{X}}_{m}] \). The covariance matrix \( {\varvec{\Sigma}} \) of \( {\mathbf{M}} \) is then calculated, and the eigenvector matrix \( {\mathbf{U}} \) is obtained by the Singular Value Decomposition (SVD) of \( {\varvec{\Sigma}} \), i.e. \( {\mathbf{U}} = {\text{SVD}}({\varvec{\Sigma}}) \). After that, the compression matrix \( {\mathbf{U}}_{d} \) is formed by selecting the first d eigenvectors of \( {\mathbf{U}} \). Finally, a 4096-dimensional feature code \( {\mathbf{X}} \) can be compressed to d dimensions by:

$$ {\tilde{\mathbf{X}}} = {\mathbf{U}}_{d}^{T} {\mathbf{X}} $$
(6)

where \( {\tilde{\mathbf{X}}} \) is the compressed feature code.

In the offline phase, the feature codes of all the reference images are compressed using (6) and stored in a matrix. In the online phase, the feature code of the query image is also compressed to d dimensions.
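A sketch of the compression step, fitting the matrix U_d on the reference feature codes and applying Eq. (6), is given below. Mean-centering the codes before computing the covariance is standard PCA practice and is assumed here, although the text does not state it explicitly.

```python
import numpy as np

def fit_pca(codes, d=128):
    """codes: m x 4096 matrix of reference feature codes (one row per reference image)."""
    mean = codes.mean(axis=0)
    cov = np.cov(codes, rowvar=False)     # 4096 x 4096 covariance (np.cov centers the data)
    u, _, _ = np.linalg.svd(cov)          # columns of u are eigenvectors, sorted by eigenvalue
    return u[:, :d], mean                 # compression matrix U_d and the mean

def compress(code, u_d, mean):
    """Eq. (6): project a 4096-D feature code down to d dimensions."""
    return u_d.T @ (code - mean)
```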

2.5 Feature Matching

Two kinds of search methods, i.e. exhaustive search [8] and hashing-based search [22, 23], are often employed in the field of image retrieval. In most existing hashing-based methods, the feature codes of images are first encoded into binary hash codes by projection and quantization steps, and then the Hamming distance is used to calculate the distances between the query image and each reference image. However, hashing-based methods are not appropriate for applications with a relatively small number of reference images, as the computational cost of generating hash codes may outweigh the savings in distance computation. Another risk of hashing-based methods is that they sometimes produce sub-optimal binary hash codes, which degrade the retrieval performance [24].

Taking account of the trade-off between computational cost and retrieval accuracy, exhaustive search is adopted in this paper. The procedure of exhaustive search is straightforward: (1) Compute the similarity between the query image and each reference image. (2) Rank all the reference images according to their similarities to the query image. (3) Select the k top-ranking reference images as the retrieval results.

Cosine similarity is adopted to measure the similarity, as it experimentally achieves the best performance. Assume that \( {\tilde{\mathbf{X}}}_{i} \) is the compressed feature code extracted from the query image and \( {\tilde{\mathbf{X}}}_{j} \) is the compressed feature code extracted from a reference image; the similarity between these two images is then measured by:

$$ S_{i,j} = \frac{ {\tilde{\mathbf{X}}}_{i} {\tilde{\mathbf{X}}}_{j}^{T} }{ \sqrt{ {\tilde{\mathbf{X}}}_{i} {\tilde{\mathbf{X}}}_{i}^{T} } \sqrt{ {\tilde{\mathbf{X}}}_{j} {\tilde{\mathbf{X}}}_{j}^{T} } } $$
(7)

In (7), the term \( \sqrt{ {\tilde{\mathbf{X}}}_{i} {\tilde{\mathbf{X}}}_{i}^{T} } \) can be ignored, as dropping it does not change the ranking of the reference images, and the term \( \sqrt{ {\tilde{\mathbf{X}}}_{j} {\tilde{\mathbf{X}}}_{j}^{T} } \) can be computed offline. Thus, the similarity can be redefined to reduce the computational load while preserving the ranking:

$$ \tilde{S}_{i,j} = p_{j} \left( {\tilde{\mathbf{X}}}_{i} {\tilde{\mathbf{X}}}_{j}^{T} \right) $$
(8)

where \( p_{j} = 1/\sqrt{ {\tilde{\mathbf{X}}}_{j} {\tilde{\mathbf{X}}}_{j}^{T} } \) is computed offline. In this way, only (d + 1) multiplications and d additions are required to match each reference image in the online phase.
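A sketch of the exhaustive search with the precomputed normalization terms of Eq. (8); the function and variable names are illustrative.

```python
import numpy as np

def build_index(compressed_codes):
    """Offline: stack the compressed reference codes and precompute p_j = 1 / ||X_j||."""
    ref = np.asarray(compressed_codes, dtype=np.float64)   # m x d matrix of reference codes
    p = 1.0 / np.linalg.norm(ref, axis=1)                  # m precomputed normalization terms
    return ref, p

def top_k(query_code, ref, p, k=5):
    """Online: score every reference page with Eq. (8) and return the k best indices."""
    scores = p * (ref @ query_code)                        # (d + 1) multiplications per reference
    return np.argsort(-scores)[:k]
```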

3 Experiments

In this section, the proposed book page identification method is extensively evaluated. The experiments were conducted on a smart phone with an eight-core processor (4 × 2.3 GHz + 4 × 1.8 GHz) and 4 GB RAM, to validate that the proposed book page identification method and book-eResource association system can run on general consumer hardware. The core algorithms of the proposed book page identification method are implemented in optimized multithreaded C++ code.

To evaluate the proposed book page identification method and book-eResource association, a testing dataset covering 4568 book pages was collected. For each book page, a reference image was captured by a flatbed scanner, and 4 to 8 query images were taken arbitrarily with the cameras of different smart phones. As a result, the testing dataset contains 4568 reference images and 25112 query images. When taking the query images, factors including geometry distortion, color distortion, highlights, image blur, and cluttered backgrounds were taken into account to simulate severe usage situations.

The top-k hit rate is adopted as the metric for quantitative evaluation:

$$ \gamma_{k} = \frac{{N_{k} }}{N} $$
(9)

where N is the total number of tests, and \( N_{k} \) is the number of times that the correct reference image is among the first k reference images considered most probable by the book page identification method.
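The metric can be computed as in the sketch below, assuming a list of ground-truth reference indices and the corresponding ranked retrieval results; the argument names are illustrative.

```python
def hit_rate(ranked_results, ground_truth, k=5):
    """Eq. (9): fraction of queries whose correct page appears in the top k results.

    ranked_results : list of sequences of reference indices, best match first
    ground_truth   : list with the correct reference index for each query
    """
    hits = sum(gt in ranked[:k] for ranked, gt in zip(ranked_results, ground_truth))
    return hits / len(ground_truth)
```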

3.1 Overall Performance

Some exemplar results of the proposed book page identification method are illustrated in Fig. 5. The results in Fig. 5(a) and (b) show that the proposed method can discriminate similar book pages. In Fig. 5(c), the proposed method does not suffer from the image blur and the large highlighted area in the query image. The results in Fig. 5(d) show that the proposed method is robust to a “bad” image segmentation result caused by a cluttered background. The results in Fig. 5(e) demonstrate that the proposed method can tolerate an imperfectly corrected image. The results in Fig. 5(f) and (g) show that the proposed method is not sensitive to the orientation of the corrected images. In short, the proposed book page identification method achieves satisfactory performance under severe usage situations including cluttered backgrounds, image blur, highlights, geometry distortion and color distortion.

Fig. 5. Exemplar results of the proposed book page identification method. The correct answers are marked with red rectangles. (Color figure online)

The proposed book page identification method is compared with the state-of-the-art end-to-end CNN-based image retrieval method [8]. The CNNs used in the two methods are pre-trained on the same ILSVRC dataset [21], and both methods compress the feature codes to 128 dimensions in this experiment. The quantitative comparison results are shown in Table 1. The proposed book page identification method achieves a top-5 hit rate of 98.93%, while the end-to-end CNN [8] only achieves a top-5 hit rate of 55.49%.

Table 1. The hit rates of the proposed method and the end-to-end method

3.2 Effectiveness of the Proposed Pipeline

This experiment is designed to validate the effectiveness of the proposed pipeline. During this experiment, the image correction module is first removed, and then the image segmentation module is also removed from the pipeline. To avoid interference, the feature codes are not compressed in this experiment. The hit rates after removing these two modules are shown in Table 2. The experimental results show that the performance of book page identification degrades noticeably when the image correction and image segmentation modules are removed from the pipeline.

Table 2. The hit rates after removing modules from the pipeline

3.3 Performance of Different Feature Code Compression

This experiment evaluates the performance of the feature codes after PCA compression to different dimensions. The top-1 to top-5 hit rates for different PCA compression rates are illustrated in Fig. 6. The results demonstrate that the feature codes extracted by the CNN can be compressed to 128 dimensions with only a slight loss of performance.

Fig. 6. The hit rates of the proposed book page identification method using different PCA compression rates.

3.4 Computation Time

The computation time depends on the size of the query image and the dimensionality of the feature codes. When measuring the computation time in this experiment, the input query image is resized to 400 × 400 pixels for segmentation, the corrected image size is set to 224 × 224 pixels, and the feature codes are compressed to 128 dimensions. The average computation time of the entire pipeline for a query image is 430 milliseconds (ms). Within this time, image segmentation takes 46 ms, image correction takes 23 ms, feature code extraction takes 342 ms, feature code compression takes 2 ms, and feature matching takes 17 ms (when searching among 4568 reference images).

4 Conclusions

This paper has presented a CNN-based book page identification method for associating printed books with electronic resources. A pipeline has been proposed that makes a CNN trained for another, unrelated task usable for book page identification. The pipeline has five building blocks: an image segmentation module, an image correction module, a CNN-based feature extraction module, a feature compression module, and a feature matching module. Under this pipeline, a CNN trained on a task-unrelated dataset can extract effective and robust features for book page identification. The proposed book page identification method has achieved a top-5 hit rate of 98.93% on a challenging testing dataset.