Implementation of a Large-Scale Language Model in a Cloud Environment for Human–Robot Interaction

  • Conference paper
Information Technology Convergence

Abstract

This paper presents a large-scale language model, built with Hadoop in a cloud environment, for large text corpora that are generated daily, with the aim of improving the performance of a human–robot interaction system. Our trigram language model, consisting of 800 million trigram counts, was implemented using a representative cloud service (Amazon EC2) and a representative distributed processing framework (Hadoop). We extracted trigram counts with Hadoop MapReduce to adapt the language model. We estimate that six servers need three hours to extract trigram counts from a 200-million-word corpus of Twitter texts, which approximates the amount of Twitter text generated daily.
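The trigram count extraction mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' Hadoop implementation: it simulates the map and reduce steps in a single Python process, and the function names (`map_trigrams`, `reduce_counts`, `count_trigrams`) and the lowercase, whitespace-based tokenization are assumptions made only for illustration.

```python
from collections import Counter
from itertools import chain


def map_trigrams(line):
    """Map step: emit (trigram, 1) for every three-token window in one line.

    Tokenization (lowercasing, whitespace split) is an assumption of this sketch.
    """
    tokens = line.lower().split()
    return [(tuple(tokens[i:i + 3]), 1) for i in range(len(tokens) - 2)]


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each distinct trigram."""
    totals = Counter()
    for trigram, count in pairs:
        totals[trigram] += count
    return totals


def count_trigrams(lines):
    """Run the map step on each line, then reduce all emitted pairs."""
    return reduce_counts(chain.from_iterable(map_trigrams(line) for line in lines))
```

In a real MapReduce job the framework would group the emitted pairs by key and distribute them across reducers; here a single `Counter` plays that role. Calling `count_trigrams` on a list of text lines returns a mapping from token triples to their counts.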

Author information

Corresponding author

Correspondence to Ji-Hwan Kim.

Copyright information

© 2013 Springer Science+Business Media Dordrecht

About this paper

Cite this paper

Jung, DY. et al. (2013). Implementation of a Large-Scale Language Model in a Cloud Environment for Human–Robot Interaction. In: Park, J., Barolli, L., Xhafa, F., Jeong, HY. (eds) Information Technology Convergence. Lecture Notes in Electrical Engineering, vol 253. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6996-0_101

  • DOI: https://doi.org/10.1007/978-94-007-6996-0_101

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-007-6995-3

  • Online ISBN: 978-94-007-6996-0

  • eBook Packages: Engineering, Engineering (R0)
