Abstract
In this paper, we consider pattern recognition of paper documents based on grammatical inference (GI), for classes of structured documents such as summaries, dictionaries, bibliographic databases and encyclopaedias. In this task, the inference engine takes as input a set of individual example documents and outputs a set of rules that recognise similar documents. We place GI in an algebraic framework in which rewrite rules define the process of generalisation. The algorithm discussed here is implemented in an ongoing document-handling project in which paper documents are typographically tagged and then recognised. One current application of this project is to extract the physical and logical structures of a given set of paper documents and then reorganise them into a machine-readable form such as HTML.
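The abstract does not give the rewrite system itself, so the following is only a rough illustration of the general idea of "infer rules from examples, then recognise similar documents". It assumes that each example document has already been reduced to a sequence of typographic tags (the tag names `bold`, `italic` and `plain` are hypothetical). It generalises each example with two illustrative rewrite rules, `a a → a+` and `a+ a → a+`, merges examples that share the same tag skeleton, and recognises new documents by regular-expression matching. The rules, function names and tag vocabulary are assumptions made for this sketch and are not the authors' algebraic framework.

```python
import re
from typing import List, Tuple

# A pattern token: (typographic tag, repeatable?). (t, True) stands for "t+".
# Tags are assumed not to contain spaces.
Token = Tuple[str, bool]
Rule = List[Token]


def generalise(example: List[str]) -> Rule:
    """Normalise one tagged example with the illustrative rewrite rules
    a a -> a+ and a+ a -> a+. A single left-to-right pass reaches the same
    normal form as applying them exhaustively: every run of two or more
    identical tags becomes one repeatable token."""
    pattern: Rule = []
    for tag in example:
        if pattern and pattern[-1][0] == tag:
            pattern[-1] = (tag, True)  # rewrite "a a" or "a+ a" into "a+"
        else:
            pattern.append((tag, False))
    return pattern


def infer(examples: List[List[str]]) -> List[Rule]:
    """Infer a rule set: examples whose normal forms share the same tag
    skeleton are merged, with "a" and "a+" generalising to "a+"."""
    rules: List[Rule] = []
    for example in examples:
        pattern = generalise(example)
        skeleton = [tag for tag, _ in pattern]
        for i, rule in enumerate(rules):
            if [tag for tag, _ in rule] == skeleton:
                rules[i] = [(tag, m1 or m2) for (tag, m1), (_, m2) in zip(rule, pattern)]
                break
        else:  # no existing rule has this skeleton: keep it as a new alternative
            rules.append(pattern)
    return rules


def to_regex(rule: Rule) -> str:
    """Translate a rule into a regular expression over space-terminated tags."""
    return "".join(
        f"(?:{re.escape(tag)} )+" if many else f"{re.escape(tag)} "
        for tag, many in rule
    )


def recognise(rules: List[Rule], document: List[str]) -> bool:
    """A document is recognised if its tag sequence matches any inferred rule."""
    text = "".join(tag + " " for tag in document)
    return any(re.fullmatch(to_regex(rule), text) for rule in rules)


if __name__ == "__main__":
    # Hypothetical dictionary entries: headword, part of speech, definitions.
    examples = [
        ["bold", "italic", "plain"],
        ["bold", "italic", "plain", "plain", "plain"],
    ]
    rules = infer(examples)
    print(rules)  # [[('bold', False), ('italic', False), ('plain', True)]]
    print(recognise(rules, ["bold", "italic", "plain", "plain"]))  # True
    print(recognise(rules, ["italic", "plain"]))  # False
```

In this toy setting, an entry tagged `bold italic plain` and one tagged `bold italic plain plain plain` generalise to the single rule `bold italic plain+`, which then accepts any entry with one or more plain-text definitions but rejects an entry that lacks the bold headword.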
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Saidi, A.S., Tayeb-bey, S. (1998). Grammatical inference in document recognition. In: Honavar, V., Slutzki, G. (eds) Grammatical Inference. ICGI 1998. Lecture Notes in Computer Science, vol 1433. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0054074
DOI: https://doi.org/10.1007/BFb0054074
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64776-8
Online ISBN: 978-3-540-68707-8
eBook Packages: Springer Book Archive