Abstract
The rapid development of scientific and technical fields has led to an enormous number of scientific and technical articles being published. Systems have also been proposed that provide users with useful information extracted from these articles. Such systems must extract information from a massive number of documents quickly and reliably. However, legacy parsers such as the Stanford parser and Enju cannot handle a large number of documents, because they analyze a wide range of sentence context during parsing and therefore take a long time to run. In this paper, we report on the development of a parser based on MapReduce, a distributed and parallel programming model. Our parser achieves about nineteen times better performance than one of the state-of-the-art legacy parsers.
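As a rough illustration of how sentence parsing can be cast in the MapReduce model, the following minimal sketch simulates the map, shuffle, and reduce phases in a single process. It is not the authors' implementation: the function names (`toy_parse`, `map_document`, `shuffle`, `reduce_document`, `run_job`) are hypothetical, `toy_parse` is only a placeholder standing in for a real syntactic parser, and a thread pool stands in for the worker nodes of a cluster.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def toy_parse(sentence):
    """Placeholder for a real syntactic parser (assumption): flat bracketing."""
    return "(S " + " ".join(sentence.split()) + ")"


def map_document(record):
    """Map step: split one document into sentences and parse each on its own."""
    doc_id, text = record
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Emit (key, value) pairs; the sentence index lets the reducer restore order.
    return [(doc_id, (idx, toy_parse(s))) for idx, s in enumerate(sentences)]


def shuffle(mapped):
    """Group emitted pairs by key, as a MapReduce framework would do."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_document(doc_id, values):
    """Reduce step: sort by sentence index and collect the parses per document."""
    return doc_id, [parse for _, parse in sorted(values)]


def run_job(documents, workers=4):
    """Run the simulated job; threads stand in for cluster workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_document, documents.items()))
    return dict(reduce_document(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    docs = {"d1": "Parsers are slow. MapReduce helps.", "d2": "One sentence only."}
    for doc_id, parses in run_job(docs).items():
        print(doc_id, parses)
```

The point of the sketch is that each sentence is parsed independently in the map step, so the work can be spread over many machines; the reduce step only reassembles the results per document.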
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Um, JH., Jeong, CH., Choi, SP., Lee, S., Jung, H. (2014). Fast Big Textual Data Parsing in Distributed and Parallel Computing Environment. In: Park, J., Adeli, H., Park, N., Woungang, I. (eds) Mobile, Ubiquitous, and Intelligent Computing. Lecture Notes in Electrical Engineering, vol 274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40675-1_41
DOI: https://doi.org/10.1007/978-3-642-40675-1_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40674-4
Online ISBN: 978-3-642-40675-1
eBook Packages: Engineering, Engineering (R0)