Abstract
We present a new method for grouping letters and words into homogeneous font families without explicitly recognising the font. This analysis, based on a pattern redundancy technique, lets us extract part of the logical information carried by the typographic features of words. Once the typographic families have been differentiated, grouped and compared, we know:
- the cardinality of each family;
- its weight, slant and size relative to the other families.
Studying how the typographic families are organised, together with their relative characteristics, allows us to classify the families according to their logical significance and thus, where possible, to formulate hypotheses about the logical meaning of each family. Comparing the constructed families with the learned grammar then validates or corrects these hypotheses, and labels the families for which no hypothesis could be formulated. The main advantage of the method we have developed is that each process depends only on the image; it does not depend on the document type or on a font database. The method can therefore be applied to any document type, especially complex and typographically rich documents. A further advantage is that our text markers will be used to describe the document in HTML.
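The abstract does not spell out the grouping algorithm (the authors rely on a pattern redundancy technique), so the following is only a minimal sketch of the second half of the idea: grouping words into families and describing each family by its cardinality and its weight, slant and size relative to the others. It rests on stated assumptions that are not from the paper: each word already carries three measured features (the hypothetical fields `stroke_width`, `slant_deg` and `x_height`), families are formed by a simple greedy tolerance clustering swapped in for the pattern redundancy method, the tolerances in `TOL` are arbitrary, and the most populous family is taken as the reference (presumed body text).

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Word:
    # Hypothetical per-word measurements, assumed to come from an earlier image-analysis step.
    text: str
    stroke_width: float  # mean stroke thickness in pixels (proxy for weight)
    slant_deg: float     # dominant stroke angle from vertical, in degrees (proxy for slant)
    x_height: float      # x-height in pixels (proxy for size)


@dataclass
class Family:
    words: list = field(default_factory=list)

    def centroid(self):
        # Mean feature vector of the family: (weight, slant, size).
        return (mean(w.stroke_width for w in self.words),
                mean(w.slant_deg for w in self.words),
                mean(w.x_height for w in self.words))


# Assumed tolerances: stroke px, slant degrees, x-height px.
TOL = (0.5, 4.0, 1.5)


def group_into_families(words, tol=TOL):
    """Greedy tolerance clustering (a stand-in, not the paper's pattern redundancy method)."""
    families = []
    for w in words:
        feats = (w.stroke_width, w.slant_deg, w.x_height)
        for fam in families:
            c = fam.centroid()
            if all(abs(f - m) <= t for f, m, t in zip(feats, c, tol)):
                fam.words.append(w)
                break
        else:
            # No existing family is close enough on every feature: start a new one.
            families.append(Family([w]))
    return families


def describe(families, tol=TOL):
    """Cardinality plus weight/slant/size of each family relative to the largest one."""
    ref = max(families, key=lambda f: len(f.words))  # assumed to be body text
    rw, rs, rh = ref.centroid()
    rows = []
    for fam in families:
        w, s, h = fam.centroid()
        rows.append({
            "cardinality": len(fam.words),
            "weight": "bolder" if w > rw + tol[0] else "lighter" if w < rw - tol[0] else "same",
            "slant": "more slanted" if abs(s) > abs(rs) + tol[1] else "same",
            "size": "larger" if h > rh + tol[2] else "smaller" if h < rh - tol[2] else "same",
        })
    return rows


words = [
    Word("The", 1.0, 0.0, 10.0),
    Word("method", 1.1, 0.5, 10.2),
    Word("groups", 0.9, -0.5, 9.9),
    Word("Abstract", 2.2, 0.0, 14.0),  # bold, larger: heading-like
    Word("et", 1.0, 12.0, 10.0),        # italic-like
]
for row in describe(group_into_families(words)):
    print(row)
```

In this toy example the three plain words form the reference family, while the bold, larger word and the slanted word each form their own family. Labels such as "bolder and larger than body text" are the kind of relative cue from which a logical hypothesis (for example, "heading") could then be formulated and checked against a grammar.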
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Duffy, L., Lebourgeois, F., Emptoz, H. (1997). Composite document analysis by means of typographic characteristics. In: Murshed, N.A., Bortolozzi, F. (eds) Advances in Document Image Analysis. BSDIA 1997. Lecture Notes in Computer Science, vol 1339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63791-5_14
DOI: https://doi.org/10.1007/3-540-63791-5_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63791-2
Online ISBN: 978-3-540-69646-9
eBook Packages: Springer Book Archive