Abstract
Named entity recognition aims to locate elements in a given text and classify them into predefined categories, such as person, organisation and location names, or quantities. This paper proposes an approach that recognises location names by extracting them from unstructured Italian-language texts. We advocate using the MapReduce framework for this task, since it is more robust than a classical analysis when the data are unknown and helps parallelise the processing, which is essential for large amounts of data.
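To make the MapReduce idea concrete, the sketch below simulates the map, shuffle and reduce phases in a single Python process. It rests on a loud assumption: the grammar rule used here (an Italian preposition such as "a", "in" or "da" followed by one or more capitalised words) and the names `LOCATION_PREPOSITIONS`, `map_phase`, `shuffle`, `reduce_phase` and `run_job` are hypothetical illustrations, not the rule set or code of the paper. On a real Hadoop cluster each phase would run distributed across nodes rather than in one loop.

```python
import re
from collections import defaultdict
from itertools import chain

# HYPOTHETICAL rule set: a few Italian prepositions that often precede a
# place name ("a Catania", "in Sicilia", "da Palermo"). This is an
# illustrative assumption, not the grammar rules used by the authors.
LOCATION_PREPOSITIONS = {"a", "in", "da", "verso", "presso",
                         "nel", "nella", "dal", "dalla", "al", "alla"}

# Word tokens (Unicode-aware by default for str patterns in Python 3).
TOKEN = re.compile(r"\w+")


def map_phase(doc_id, text):
    """Map step: emit (candidate_location, 1) for each rule match.

    doc_id is the input key, as in Hadoop-style map functions; it is unused.
    """
    tokens = TOKEN.findall(text)
    for i, tok in enumerate(tokens[:-1]):
        if tok.lower() in LOCATION_PREPOSITIONS:
            # Collect the run of capitalised words after the preposition.
            name = []
            j = i + 1
            while j < len(tokens) and tokens[j][0].isupper():
                name.append(tokens[j])
                j += 1
            if name:
                yield " ".join(name), 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: count occurrences of each candidate location."""
    return key, sum(values)


def run_job(documents):
    """Run map -> shuffle -> reduce over a {doc_id: text} mapping."""
    mapped = chain.from_iterable(map_phase(d, t) for d, t in documents.items())
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = {
        "d1": "Il treno parte da Catania e arriva a Palermo.",
        "d2": "Abito a Catania, in via Etnea.",
    }
    print(run_job(docs))  # {'Catania': 2, 'Palermo': 1}
```

Because each document is mapped independently, the map step parallelises naturally; the rule also shows its limits, since "in via Etnea" yields nothing (the word after "in" is lowercase), which is why a real system would need a richer rule set.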
Acknowledgements
This work has been partially supported by the project PRIME “Piattaforma di Reasoning Integrata, Multimedia, Esperta” (Integrated, Multimedia, Expert Reasoning Platform), funded by Regione Sicilia within the PO FESR Sicilia 2007/2013 framework, and by the FIR project “Organizzazione e trattamento di trascrizioni e testi in scenari di security” (Organisation and processing of transcripts and texts in security scenarios), code 375E90.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Napoli, C., Tramontana, E., Verga, G. (2016). Extracting Location Names from Unstructured Italian Texts Using Grammar Rules and MapReduce. In: Dregvaite, G., Damasevicius, R. (eds) Information and Software Technologies. ICIST 2016. Communications in Computer and Information Science, vol 639. Springer, Cham. https://doi.org/10.1007/978-3-319-46254-7_48
DOI: https://doi.org/10.1007/978-3-319-46254-7_48
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46253-0
Online ISBN: 978-3-319-46254-7
eBook Packages: Computer Science, Computer Science (R0)