DOI: 10.1145/1815330.1815379
research-article

A kernel-based approach to document retrieval

Published: 09 June 2010

Abstract

In this paper we tackle the problem of document image retrieval by combining a similarity measure between documents with the probability that a given document belongs to a certain class. The class membership probability is computed using Support Vector Machines in conjunction with a similarity-measure-based kernel applied to structural document representations. In the presented experiments, we use different document representations, both visual and structural, and apply them to a database of historical documents. We show that our method based on similarity kernels outperforms the usual distance-based retrieval.
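
As a rough illustration of the idea in the abstract, the following Python sketch ranks documents by mixing a similarity score with a class membership probability. It is not the authors' implementation, and everything specific in it is an assumption: the kernel form exp(-gamma * d), the Euclidean stand-in distance, the weighted-sum combination with the hypothetical parameter `alpha`, and the hard-coded probabilities. In the paper the probabilities come from a Support Vector Machine, and the abstract does not state how the two quantities are combined.

```python
import math


def euclidean(x, y):
    """Stand-in distance between two feature vectors (assumption, not the paper's measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def similarity_kernel(distance, gamma=1.0):
    """Build a similarity function k(x, y) = exp(-gamma * d(x, y)) from a distance d.

    This is one common way to derive a kernel from a distance; whether the paper
    uses this exact form is not stated in the abstract.
    """
    def kernel(x, y):
        return math.exp(-gamma * distance(x, y))
    return kernel


def rank(query, database, class_prob, kernel, alpha=0.5):
    """Return database indices ordered from best to worst match.

    class_prob[i] is an externally estimated probability that document i belongs
    to the class of interest (hypothetical input; e.g. from a probabilistic SVM).
    alpha weights similarity against class probability (hypothetical parameter).
    """
    scored = []
    for i, doc in enumerate(database):
        score = alpha * kernel(query, doc) + (1.0 - alpha) * class_prob[i]
        scored.append((score, i))
    # Highest score first; ties broken by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored]


if __name__ == "__main__":
    database = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]   # toy feature vectors
    class_prob = [0.1, 0.6, 0.9]                       # made-up probabilities
    query = (0.9, 0.0)
    k = similarity_kernel(euclidean)
    print(rank(query, database, class_prob, k, alpha=1.0))  # similarity only
    print(rank(query, database, class_prob, k, alpha=0.5))  # combined score
```

With `alpha=1.0` the ranking reduces to plain similarity-based retrieval. Lowering `alpha` lets a document that is farther from the query but likely to belong to the right class move up the list.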

Published In

ACM Other conferences
DAS '10: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
June 2010
490 pages
ISBN: 9781605587738
DOI: 10.1145/1815330
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. document retrieval
  2. query-by-example
  3. similarity measure based kernels
  4. support vector machines

Qualifiers

  • Research-article

Conference

DAS '10

Cited By

  • (2016) Document image retrieval based on texture features and similarity fusion. 2016 International Conference on Image and Vision Computing New Zealand (IVCNZ), pages 1-6. DOI: 10.1109/IVCNZ.2016.7804437. Online publication date: Nov-2016
  • (2016) A brief review of document image retrieval methods: Recent advances. 2016 International Joint Conference on Neural Networks (IJCNN), pages 3500-3507. DOI: 10.1109/IJCNN.2016.7727648. Online publication date: Jul-2016
  • (2016) Document Image Retrieval Based on Texture Features: A Recognition-Free Approach. 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pages 1-7. DOI: 10.1109/DICTA.2016.7797033. Online publication date: Nov-2016
  • (2014) Multimodal page classification in administrative document image streams. International Journal on Document Analysis and Recognition, 17(4):331-341. DOI: 10.1007/s10032-014-0225-8. Online publication date: 1-Dec-2014
  • (2014) Page Similarity and Classification. Handbook of Document Image Processing and Recognition, pages 223-253. DOI: 10.1007/978-0-85729-859-1_7. Online publication date: 30-Apr-2014
