Implementation of a Large-Scale Language Model in a Cloud Environment for Human–Robot Interaction

Jung, Dae-Young; Lee, Hyuk-Jun; Park, Sung-Yong; Koo, Myoung-Wan; Kim, Ji-Hwan; Park, Jeong-sik; Jeon, Hyung-Bae; Lee, Yun-Keun

doi:10.1007/978-94-007-6996-0_101

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 253)

Abstract

This paper presents a large-scale language model, built with Hadoop in a cloud environment from large text corpora that are generated daily, with the aim of improving the performance of a human–robot interaction system. Our large-scale trigram language model, consisting of 800 million trigram counts, was successfully implemented through a new approach that combines a representative cloud service (Amazon EC2) with a representative distributed processing framework (Hadoop). To adapt the language model, we extracted trigram counts using Hadoop MapReduce. Extracting trigram counts from a 200-million-word corpus of Twitter text, approximately the volume of Twitter text generated per day, is estimated to take three hours on six servers.
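
The trigram count extraction mentioned in the abstract fits the usual MapReduce pattern: a map step emits each trigram with a count of 1, and a reduce step sums the counts per trigram. The following is a minimal single-process sketch of that pattern in Python, not the authors' Hadoop job. The function names, the whitespace tokenization, and the lowercasing are assumptions made for illustration, and sentence-boundary markers are omitted.

```python
from itertools import groupby


def map_trigrams(line):
    """Map step: emit (trigram, 1) for every three consecutive tokens in a line.

    Whitespace tokenization and lowercasing are illustrative assumptions.
    """
    tokens = line.lower().split()
    for i in range(len(tokens) - 2):
        yield " ".join(tokens[i:i + 3]), 1


def reduce_counts(pairs):
    """Reduce step: sum the values per key.

    Sorting stands in for the shuffle phase that groups identical keys.
    """
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, sum(value for _, value in group)


def count_trigrams(lines):
    """Run the map and reduce steps locally over an iterable of text lines."""
    pairs = [kv for line in lines for kv in map_trigrams(line)]
    return dict(reduce_counts(pairs))


if __name__ == "__main__":
    corpus = ["the robot said hello", "the robot said goodbye"]
    for trigram, count in sorted(count_trigrams(corpus).items()):
        print(f"{trigram}\t{count}")
```

In a distributed setting the map and reduce functions would run on separate workers, with the framework handling the grouping of keys between them. The local sort here only imitates that grouping.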

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Sogang University, Seoul, South Korea
Dae-Young Jung, Hyuk-Jun Lee, Sung-Yong Park, Myoung-Wan Koo & Ji-Hwan Kim
Department of Intelligent Robot Engineering, Mokwon University, Daejon, South Korea
Jeong-sik Park
Electronics and Telecommunications Research Institute, Daejon, South Korea
Hyung-Bae Jeon & Yun-Keun Lee

Corresponding author

Correspondence to Ji-Hwan Kim.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Seoul University of Science and Technology (SeoulTech), Seoul, Korea, Republic of (South Korea)
James J. (Jong Hyuk) Park
Dept. Information & Communication Engineering, Fukuoka Institute of Technology Fac. Information Engineering, Fukuoka, Japan
Leonard Barolli
Departament De Llenguatges I Sistemes Informàtics, Universitat Politècnica De Catalunya, Barcelona, Spain
Fatos Xhafa
Humanitas College, Kyung Hee University, Seoul, Korea, Republic of (South Korea)
Hwa-Young Jeong

About this paper

Cite this paper

Jung, DY. et al. (2013). Implementation of a Large-Scale Language Model in a Cloud Environment for Human–Robot Interaction. In: Park, J., Barolli, L., Xhafa, F., Jeong, HY. (eds) Information Technology Convergence. Lecture Notes in Electrical Engineering, vol 253. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6996-0_101

DOI: https://doi.org/10.1007/978-94-007-6996-0_101
Published: 14 July 2013
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-6995-3
Online ISBN: 978-94-007-6996-0
eBook Packages: Engineering, Engineering (R0)