Scene Understanding Based on Sound and Text Information for a Cooking Support Robot

Kojima, Ryosuke; Sugiyama, Osamu; Nakadai, Kazuhiro

doi:10.1007/978-3-319-19066-2_64

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9101)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

We address noise-robust “auditory scene understanding” for a robot, defined as extracting 6W (What, When, Where, Who, Why, hoW) information about the surrounding environment. Although such robots have been studied in the field of robot audition, only the first four Ws have been in scope; “why” and “how” have not. This paper therefore focuses on extracting “how” information, specifically in cooking scenes, with the aim of realizing a cooking support robot. In this setting, “how” information is regarded as a cooking procedure, and we construct sound-based cooking procedure recognition from two models. One is a conventional statistical model, the Gaussian Mixture Model (GMM), used as an acoustic model to recognize cooking sound events such as stirring and cutting. The other is a Hierarchical Hidden Markov Model (HHMM), used as a recipe model to recognize a sequence of cooking events, i.e., a cooking procedure. We constructed a prototype system for cooking recipe and procedure recognition. Preliminary results showed that the proposed GMM-HHMM-based system outperformed a conventional GMM-HMM-based system in noise robustness for cooking recipe recognition, and that the system was able to correct misrecognized cooking sound events by using the recipe model in cooking procedure recognition.
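
The idea of correcting misrecognized sound events with a recipe model can be illustrated with a small sketch. It is not the system described in the abstract: it replaces the HHMM with a flat, left-to-right HMM over three events, and the per-frame log-likelihoods (standing in for GMM acoustic scores), the event names, and the transition probabilities are all invented for illustration.

```python
import math

NEG_INF = float("-inf")
EVENTS = ["cut", "stir", "fry"]  # hypothetical cooking sound events

# Hypothetical per-frame log-likelihoods log p(x_t | event),
# standing in for the scores a GMM acoustic model would produce.
FRAME_LOGLIK = [
    {"cut": -1.0, "stir": -3.0, "fry": -4.0},
    {"cut": -1.2, "stir": -2.5, "fry": -4.0},
    {"cut": -3.0, "stir": -2.0, "fry": -1.8},  # noisy frame: "fry" wins in isolation
    {"cut": -4.0, "stir": -1.0, "fry": -2.5},
    {"cut": -4.0, "stir": -3.0, "fry": -1.0},
]

# Hypothetical recipe model: cut, then stir, then fry (left-to-right only).
START = {"cut": 1.0}
TRANS = {
    "cut": {"cut": 0.6, "stir": 0.4},
    "stir": {"stir": 0.6, "fry": 0.4},
    "fry": {"fry": 1.0},
}


def safe_log(p):
    """Log of a probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0.0 else NEG_INF


def framewise(frame_loglik):
    """Pick the best-scoring event for each frame, ignoring the recipe."""
    return [max(frame, key=frame.get) for frame in frame_loglik]


def viterbi(frame_loglik, start, trans, events):
    """Most likely event sequence under the recipe's transition constraints."""
    score = {e: safe_log(start.get(e, 0.0)) + frame_loglik[0][e] for e in events}
    backptrs = []
    for frame in frame_loglik[1:]:
        new_score, ptr = {}, {}
        for e in events:
            # Best predecessor for event e at this frame.
            prev = max(
                events,
                key=lambda p: score[p] + safe_log(trans.get(p, {}).get(e, 0.0)),
            )
            step = safe_log(trans.get(prev, {}).get(e, 0.0))
            new_score[e] = score[prev] + step + frame[e]
            ptr[e] = prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final event.
    path = [max(events, key=score.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    print("frame-wise:", framewise(FRAME_LOGLIK))
    print("with recipe:", viterbi(FRAME_LOGLIK, START, TRANS, EVENTS))
```

With these toy numbers, frame-wise classification labels the third frame as "fry", which breaks the recipe order, while decoding under the recipe constraints relabels it as "stir". A hierarchical model would add further levels above this one (for example, choosing among several recipes), which the sketch leaves out.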

Author information

Authors and Affiliations

Graduate School of Information Science and Engineering, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku Tokyo, Japan
Ryosuke Kojima, Osamu Sugiyama & Kazuhiro Nakadai

Corresponding author

Correspondence to Ryosuke Kojima.

Editor information

Editors and Affiliations

Texas State University, San Marcos, Texas, USA
Moonis Ali
Dongguk University, Seoul, Korea, Republic of (South Korea)
Young Sig Kwon
Dongguk University, Seoul, Korea, Republic of (South Korea)
Chang-Hwan Lee
Dongguk University, Seoul, Korea, Republic of (South Korea)
Juntae Kim
Seoul National University, Seoul, Korea, Republic of (South Korea)
Yongdai Kim

About this paper

Cite this paper

Kojima, R., Sugiyama, O., Nakadai, K. (2015). Scene Understanding Based on Sound and Text Information for a Cooking Support Robot. In: Ali, M., Kwon, Y., Lee, CH., Kim, J., Kim, Y. (eds) Current Approaches in Applied Artificial Intelligence. IEA/AIE 2015. Lecture Notes in Computer Science, vol 9101. Springer, Cham. https://doi.org/10.1007/978-3-319-19066-2_64

DOI: https://doi.org/10.1007/978-3-319-19066-2_64
Published: 01 May 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19065-5
Online ISBN: 978-3-319-19066-2
eBook Packages: Computer Science (R0)