Extracting Location Names from Unstructured Italian Texts Using Grammar Rules and MapReduce

  • Conference paper
Information and Software Technologies (ICIST 2016)

Abstract

Named entity recognition aims at locating elements in a given text and classifying them into pre-defined categories, such as the names of persons, organisations, and locations, or quantities. This paper proposes an approach for recognising location names by extracting them from unstructured Italian-language texts. We put forward the use of the MapReduce framework for this task, since it is more robust than a classical analysis when the data are unknown and it helps parallelise processing, which is essential for large amounts of data.
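
The abstract does not spell out the grammar rules or the job layout, so the following is only a minimal sketch of how rule-based extraction could be phrased as map and reduce steps. It runs in a single process rather than on Hadoop, and everything in it is an assumption made for illustration: the single rule (a capitalised word right after a locative preposition), the preposition list, and the names `map_document`, `reduce_counts` and `extract_locations`.

```python
import re
from collections import defaultdict
from itertools import chain

# Hypothetical rule, NOT taken from the paper: a capitalised word that
# directly follows a locative preposition is a candidate location name.
# The preposition is matched case-insensitively, the candidate is not.
CANDIDATE = re.compile(r"\b(?i:a|ad|in|da|presso|verso)\s+([A-Z]\w+)")


def map_document(text):
    """Map step: emit (candidate_location, 1) for every rule match."""
    return [(match.group(1), 1) for match in CANDIDATE.finditer(text)]


def reduce_counts(pairs):
    """Shuffle and reduce steps: group pairs by key and sum the counts."""
    totals = defaultdict(int)
    for name, count in pairs:
        totals[name] += count
    return dict(totals)


def extract_locations(documents):
    """Apply the map step to each document, then reduce all emitted pairs."""
    mapped = chain.from_iterable(map(map_document, documents))
    return reduce_counts(mapped)


# Made-up sample sentences, used only to exercise the sketch.
documents = [
    "Marco vive a Catania e lavora in Sicilia.",
    "Il treno da Palermo arriva a Catania.",
]
print(extract_locations(documents))
# → {'Catania': 2, 'Sicilia': 1, 'Palermo': 1}
```

Because each document is mapped independently, the map calls could in principle be distributed across workers, which is the property the abstract relies on for large inputs. A single rule like this one is far too coarse for real text (it would also accept personal names after "da", for instance).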

Acknowledgements

This work has been partially supported by project PRIME “Piattaforma di Reasoning Integrata, Multimedia, Esperta” funded by Regione Sicilia within PO FESR Sicilia 2007/2013 framework, and FIR project “Organizzazione e trattamento di trascrizioni e testi in scenari di security”, code 375E90.

Corresponding author

Correspondence to Christian Napoli.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Napoli, C., Tramontana, E., Verga, G. (2016). Extracting Location Names from Unstructured Italian Texts Using Grammar Rules and MapReduce. In: Dregvaite, G., Damasevicius, R. (eds) Information and Software Technologies. ICIST 2016. Communications in Computer and Information Science, vol 639. Springer, Cham. https://doi.org/10.1007/978-3-319-46254-7_48

  • DOI: https://doi.org/10.1007/978-3-319-46254-7_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46253-0

  • Online ISBN: 978-3-319-46254-7

  • eBook Packages: Computer Science (R0)
