Abstract
The EMILE 4.1 toolbox is intended to help researchers analyze the grammatical structure of free text. The basic theoretical concepts behind the EMILE algorithm are expressions and contexts. The idea is that expressions of the same syntactic type can be substituted for each other in the same context. By performing a large statistical cluster analysis on the sentences of the text, EMILE tries to identify traces of expressions that stand in this substitutability relation. If there is enough statistical evidence for the existence of a grammatical type, EMILE creates such a type. Fundamental notions in the EMILE 4.1 algorithm are the so-called characteristic expressions and contexts. An expression of type T is characteristic for T if it only appears in contexts of type T. The notions of characteristic contexts and expressions boost the learning capacities of the EMILE 4.1 algorithm. The EMILE algorithm is relatively scalable: it can easily analyze texts of up to 100,000 sentences on a workstation. The EMILE tool has been used in various domains, including biomedical research [Adriaans, 2001b] and the identification of ontologies and semantic learning [Adriaans et al., 1993].
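The relation between expressions and contexts described above can be illustrated with a small Python sketch. It is not the EMILE 4.1 implementation: the function names, the exhaustive enumeration of splits, and the `min_shared` threshold (a crude stand-in for the statistical cluster analysis) are all assumptions made for illustration only.

```python
from collections import defaultdict


def context_expression_pairs(sentence):
    """Yield every (context, expression) split of a sentence.

    A context is a (left, right) pair of word tuples; the expression is the
    non-empty contiguous span of words that fills the gap between them.
    """
    words = tuple(sentence.split())
    n = len(words)
    for i in range(n):
        for j in range(i + 1, n + 1):
            yield (words[:i], words[j:]), words[i:j]


def contexts_by_expression(sentences):
    """Map each expression to the set of contexts it was observed in."""
    table = defaultdict(set)
    for sentence in sentences:
        for context, expression in context_expression_pairs(sentence):
            table[expression].add(context)
    return table


def substitutable(table, a, b, min_shared=2):
    """Hypothetical evidence test: a and b share at least min_shared contexts."""
    return len(table.get(a, set()) & table.get(b, set())) >= min_shared


def is_characteristic(table, expression, type_contexts):
    """An expression is characteristic for a type if it was seen at least once
    and every context it appears in belongs to that type's contexts."""
    seen = table.get(expression, set())
    return bool(seen) and seen <= type_contexts


sentences = [
    "John likes tea",
    "John likes coffee",
    "Mary likes tea",
    "Mary likes coffee",
]
table = contexts_by_expression(sentences)

# Contexts of a hypothetical type "drink", taken from one of its members.
drink_contexts = set(table[("tea",)])
```

On this toy corpus, `("tea",)` and `("coffee",)` share the contexts "John likes _" and "Mary likes _", so `substitutable` accepts them as candidates for the same type, while `("John",)` and `("tea",)` share none. A real system would replace the fixed threshold with a statistical criterion over a much larger text.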
References
Adriaans, P. (2001a). Learning shallow context-free languages under simple distributions. In Copestake, A. and Vermeulen, K., editors, Algebras, Diagrams and Decisions in Language, Logic and Computation. CSLI/CUP.
Adriaans, P. (2001b). Semantic induction with EMILE, opportunities in bioinformatics. In van der Vet, P. et al., editors, TWLT19, Information Extraction in Molecular Biology, Proceedings Twente Workshop on Language Technology 19, ESF Scientific Programme on Integrated Approaches for Functional Genomics, Enschede, pages 1–6. Universiteit Twente, Faculteit Informatica.
Adriaans, P., Janssen, S., and Nomden, E. (1993). Effective identification of semantic categories in curriculum texts by means of cluster analysis. In Adriaans, P., editor, ECML-93, European Conference on Machine Learning, Workshop notes Machine Learning Techniques and Text Analysis, Vienna, Austria, pages 37–44. Department of Medical Cybernetics and Artificial Intelligence, University of Vienna in cooperation with the Austrian Research Institute for Artificial Intelligence.
Adriaans, W. P. (1992). Language Learning from a Categorial Perspective. PhD thesis, Universiteit van Amsterdam.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Adriaans, P., Vervoort, M. (2002). The EMILE 4.1 Grammar Induction Toolbox. In: Adriaans, P., Fernau, H., van Zaanen, M. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2002. Lecture Notes in Computer Science, vol 2484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45790-9_24
DOI: https://doi.org/10.1007/3-540-45790-9_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44239-4
Online ISBN: 978-3-540-45790-9
eBook Packages: Springer Book Archive