Automatic Specialized vs. Non-specialized Sentence Differentiation

da Cunha, Iria; Cabré, M. Teresa; SanJuan, Eric; Sierra, Gerardo; Torres-Moreno, Juan Manuel; Vivaldi, Jorge

doi:10.1007/978-3-642-19437-5_22


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6609)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

Compiling corpora of Languages for Specific Purposes (LSP) is fraught with difficulties, mainly the time and human effort it requires, because it is not easy to distinguish specialized from non-specialized text. The aim of this work is to study the automatic differentiation of specialized and non-specialized sentences. The experiments are carried out on two corpora of sentences extracted from specialized and non-specialized texts: one on economics (academic publications and newspaper articles) and another on sexuality (academic publications and texts from forums and blogs). First, we show the feasibility of the task using a statistical n-gram classifier. Then, we show that grammatical features can also be used to classify sentences from the first (economics) corpus. For this purpose, we use association rule mining.
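The abstract does not describe how the statistical n-gram classifier works. As a rough illustration only, the Python sketch below shows one common form of n-gram classification: train one word-bigram language model per class and assign a sentence to the class whose model gives it the highest log-probability. The bigram order, add-one (Laplace) smoothing, whitespace tokenization, equal class priors, and the names `BigramModel`, `train` and `classify` are all assumptions for this sketch, not the authors' method. The toy sentences are invented and do not come from the authors' corpora.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase, split on whitespace, and add sentence boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]


class BigramModel:
    """Add-one smoothed word-bigram language model for a single class."""

    def __init__(self, sentences, vocab_size):
        self.vocab_size = vocab_size
        self.context_counts = Counter()  # counts of w_{i-1}
        self.bigram_counts = Counter()   # counts of (w_{i-1}, w_i)
        for sentence in sentences:
            tokens = tokenize(sentence)
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    def log_prob(self, sentence):
        """Sum of log P(w_i | w_{i-1}) with Laplace smoothing."""
        tokens = tokenize(sentence)
        total = 0.0
        for prev, word in zip(tokens[:-1], tokens[1:]):
            numerator = self.bigram_counts[(prev, word)] + 1
            denominator = self.context_counts[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def train(labelled):
    """labelled: dict mapping a class label to its list of training sentences."""
    vocab = {tok for sents in labelled.values() for s in sents for tok in tokenize(s)}
    vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen words
    return {label: BigramModel(sents, vocab_size) for label, sents in labelled.items()}


def classify(models, sentence):
    """Pick the class whose model scores the sentence highest (equal priors assumed)."""
    return max(models, key=lambda label: models[label].log_prob(sentence))


# Invented toy data, for illustration only.
models = train({
    "specialized": [
        "the elasticity of demand depends on the marginal cost",
        "we estimate the marginal cost with a panel regression",
    ],
    "non-specialized": [
        "prices went up again at the market today",
        "my neighbours say the shops are too expensive",
    ],
})
print(classify(models, "we estimate the elasticity of demand"))  # → specialized
```

With realistic corpora one would likely add class priors, a better smoothing method and proper tokenization. The per-class language model design is shown here only because it makes the "n-gram" part of the classifier explicit.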



Author information

Authors and Affiliations

Institut Universitari de Lingüística Aplicada, UPF, Roc Boronat 138, E-08018 Barcelona, Spain
Iria da Cunha, M. Teresa Cabré & Jorge Vivaldi
Grupo de Ingeniería Lingüística, Instituto de Ingeniería, UNAM, Torre de Ingeniería, Basamento, Ciudad Universitaria, México D.F. 04510, Mexico
Iria da Cunha, Gerardo Sierra & Juan Manuel Torres-Moreno
Laboratoire Informatique d’Avignon, UAPV, 339 chemin des Meinajaries, Cedex 9, BP91228, 84911, Avignon, France
Iria da Cunha, Eric SanJuan & Juan Manuel Torres-Moreno
Département de génie informatique, École Polytechnique de Montréal, CP 6079, Succ. Centre Ville, H3C 3A7, Montréal, Québec, Canada
Juan Manuel Torres-Moreno

Authors

Iria da Cunha
M. Teresa Cabré
Eric SanJuan
Gerardo Sierra
Juan Manuel Torres-Moreno
Jorge Vivaldi

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico
Alexander Gelbukh


About this paper

Cite this paper

da Cunha, I., Cabré, M.T., SanJuan, E., Sierra, G., Torres-Moreno, J.M., Vivaldi, J. (2011). Automatic Specialized vs. Non-specialized Sentence Differentiation. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol. 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_22


DOI: https://doi.org/10.1007/978-3-642-19437-5_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19436-8
Online ISBN: 978-3-642-19437-5
eBook Packages: Computer Science, Computer Science (R0)
