Automatic Pattern Generator of Natural Language Text Applied in Public Health

  • Conference paper
Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2015)

Abstract

A huge number of scientific articles is currently available, covering a wide variety of topics such as medicine, technology, economics, and finance. Scientific papers report results of scientific interest and also present the evaluation and interpretation of relevant arguments. Because these papers are produced at a high rate, it is feasible to analyze how people write in a given domain. Within natural language processing, there are different approaches to analyzing large text corpora. Identifying patterns with semantic elements in a text lets us classify and examine the corpus, which facilitates the interpretation and management of information by computers. At present, semiautomatic or automatic ways to generate natural language patterns are either unavailable or quite complicated. This paper shows how a tool developed for this research is tested in the public health domain. The results, obtained by means of the tool and aided by graphs, provide the groups of words that are used (to determine whether they come from a specific vocabulary), the most common grammatical categories, the most repeated words in the domain, the patterns found, and the frequency of those patterns. The selected public health corpus contains 800 papers on different topics related to genetics, including mutations, genetic deafness, DNA, trinucleotide, and suppressor genes, among others. An ontology of public health provides the basis of the study.
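
The kinds of results listed in the abstract (most repeated words, most common grammatical categories, patterns found, and their frequency) can be illustrated with a minimal sketch. This is not the tool described in the paper: the toy sentences, the tag names (`DET`, `NOUN`, `VERB`), the function names, and the choice to treat a pattern as a run of three consecutive grammatical categories are all assumptions made only for illustration.

```python
from collections import Counter

# Toy, hand-tagged sentences as (word, grammatical category) pairs.
# Illustrative assumption only: not data from the paper's 800-paper corpus.
TAGGED_SENTENCES = [
    [("the", "DET"), ("gene", "NOUN"), ("causes", "VERB"), ("deafness", "NOUN")],
    [("the", "DET"), ("mutation", "NOUN"), ("affects", "VERB"),
     ("the", "DET"), ("gene", "NOUN")],
    [("a", "DET"), ("mutation", "NOUN"), ("causes", "VERB"), ("deafness", "NOUN")],
]


def word_frequencies(sentences):
    """Count how often each word occurs across all sentences."""
    return Counter(word for sent in sentences for word, _ in sent)


def category_frequencies(sentences):
    """Count how often each grammatical category occurs."""
    return Counter(tag for sent in sentences for _, tag in sent)


def pattern_frequencies(sentences, n=3):
    """Count every run of n consecutive grammatical categories.

    Treating a "pattern" as a category n-gram is a simplifying assumption.
    """
    counts = Counter()
    for sent in sentences:
        tags = [tag for _, tag in sent]
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    print("Most repeated words:", word_frequencies(TAGGED_SENTENCES).most_common(3))
    print("Most common categories:", category_frequencies(TAGGED_SENTENCES).most_common())
    print("Most frequent patterns:", pattern_frequencies(TAGGED_SENTENCES).most_common(2))
```

In this toy sample the sequence `DET NOUN VERB` appears once in each of the three sentences, so it is the most frequent pattern. A real corpus would first need a part-of-speech tagger, and grouping words by vocabulary would require an ontology lookup; both are outside this sketch.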



Acknowledgements

The authors thank the AGO2 project, funded by the Ministry of Education of Spain, for aiding the author in the research and production of this paper.

The research leading to these results has received funding from the European Union’s Seventh Framework Program (FP7/2007-2013) for Crystal – Critical System Engineering Acceleration Joint Undertaking under grant agreement No 332830 and from specific national programs and/or funding authorities.

Corresponding author

Correspondence to Anabel Fraga.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Fraga, A., Llorens, J., Parra, E., Moreno, V. (2016). Automatic Pattern Generator of Natural Language Text Applied in Public Health. In: Fred, A., Dietz, J., Aveiro, D., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2015. Communications in Computer and Information Science, vol 631. Springer, Cham. https://doi.org/10.1007/978-3-319-52758-1_21

  • DOI: https://doi.org/10.1007/978-3-319-52758-1_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52757-4

  • Online ISBN: 978-3-319-52758-1

  • eBook Packages: Computer Science, Computer Science (R0)
