Abstract
This paper presents a large-scale language model built from large text corpora that are generated daily. The model uses Hadoop in a cloud environment and is intended to improve the performance of a human–robot interaction system. Our large-scale trigram language model, consisting of 800 million trigram counts, was successfully implemented with a new approach that combines a representative cloud service (Amazon EC2) with a representative distributed processing framework (Hadoop). To adapt the language model, we extracted trigram counts using Hadoop MapReduce. Extracting trigram counts from a large corpus of 200 million words of Twitter text, which is roughly the volume of Twitter text generated per day, is estimated to take three hours on six servers.
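Trigram count extraction fits the MapReduce pattern naturally. A mapper emits each trigram with a count of 1, the framework groups identical keys, and a reducer sums the counts per key. The following is a minimal, single-process sketch of that data flow in plain Python. It does not use Hadoop. The function names (`map_trigrams`, `shuffle`, `reduce_counts`, `count_trigrams`) are hypothetical. Lowercased whitespace tokenization without sentence-boundary markers is an assumption, since the abstract does not describe the authors' preprocessing or job configuration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Trigram = Tuple[str, str, str]


def map_trigrams(line: str) -> Iterator[Tuple[Trigram, int]]:
    """Mapper: emit (trigram, 1) for every trigram in one line of text.

    Assumption (hypothetical preprocessing): lowercase, split on whitespace,
    no sentence-boundary tokens.
    """
    tokens = line.lower().split()
    for i in range(len(tokens) - 2):
        yield (tokens[i], tokens[i + 1], tokens[i + 2]), 1


def shuffle(pairs: Iterable[Tuple[Trigram, int]]) -> Dict[Trigram, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[Trigram, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: Trigram, values: List[int]) -> Tuple[Trigram, int]:
    """Reducer: sum the partial counts collected for one trigram."""
    return key, sum(values)


def count_trigrams(lines: Iterable[str]) -> Dict[Trigram, int]:
    """Run map, shuffle, and reduce locally over an iterable of text lines."""
    intermediate = (pair for line in lines for pair in map_trigrams(line))
    grouped = shuffle(intermediate)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    corpus = ["the robot said hello", "the robot said goodbye"]
    for trigram, n in sorted(count_trigrams(corpus).items()):
        print(" ".join(trigram), n)
    # robot said goodbye 1
    # robot said hello 1
    # the robot said 2
```

On a real Hadoop cluster, the mapper and reducer run as separate tasks across servers, and the grouping step is performed by the framework. Hadoop also lets a combiner, which here could reuse the reducer's summing logic, pre-aggregate counts on each mapper and so reduce shuffle traffic. The abstract does not say whether the authors used one. For scale, the abstract's estimate of 200 million words in three hours on six servers works out to roughly 18,500 words per second for the whole cluster, or about 3,100 words per second per server.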
Copyright information
© 2013 Springer Science+Business Media Dordrecht
About this paper
Cite this paper
Jung, DY. et al. (2013). Implementation of a Large-Scale Language Model in a Cloud Environment for Human–Robot Interaction. In: Park, J., Barolli, L., Xhafa, F., Jeong, HY. (eds) Information Technology Convergence. Lecture Notes in Electrical Engineering, vol 253. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6996-0_101
DOI: https://doi.org/10.1007/978-94-007-6996-0_101
Published:
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-6995-3
Online ISBN: 978-94-007-6996-0
eBook Packages: Engineering, Engineering (R0)