
Personalized Navigation that Links Speaker’s Ambiguous Descriptions to Indoor Objects for Low Vision People

  • Conference paper
Universal Access in Human-Computer Interaction. Access to Media, Learning and Assistive Environments (HCII 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12769)

Abstract

Indoor navigation systems guide a user to his/her specified destination. However, current navigation systems face challenges when a user provides an ambiguous description of the destination. This commonly happens with visually impaired people or with those who are unfamiliar with a new environment. For example, in an office, a low-vision person may ask the navigator, “Take me to where I can take a rest.” The navigator may recognize each object in the office (e.g., a desk) but may not recognize at which location the user can take a rest. To bridge this gap in environmental understanding between low-vision people and a navigator, we propose a personalized interactive navigation system that links a user's ambiguous descriptions to indoor objects. We build a navigation system that automatically detects and describes objects in the environment with neural-network models. Further, we personalize the navigation by re-training the recognition models on previous interactive dialogues, which may contain correspondences between the user's understanding and the visual images or shapes of objects. In addition, we use a GPU cloud to cover the computational cost, and we smooth the navigation by locating the user's position with Visual SLAM. We also discuss further research on customizable navigation with multi-aspect perceptions of disabilities and the limitations of AI-assisted recognition.
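To make the core linking idea concrete, the sketch below is our own minimal illustration (not code from the paper): it ranks detected indoor objects against an ambiguous spoken request by word overlap between the request and each object's generated caption. The detector and captioner outputs are hard-coded placeholders standing in for YOLOv4 detections and image-captioning results.

```python
# Minimal sketch (not the authors' implementation): pick the indoor object whose
# generated caption best matches an ambiguous spoken request, using simple word
# overlap as the matching score.

def score(query: str, caption: str) -> float:
    """Fraction of query words that also appear in the caption."""
    q, c = set(query.lower().split()), set(caption.lower().split())
    return len(q & c) / max(len(q), 1)

# Hypothetical outputs of the detection + captioning pipeline (placeholders).
detected = {
    "chair":  "a chair where a person can sit and take a rest",
    "desk":   "a desk with a monitor on it",
    "locker": "a metal locker for storing belongings",
}

query = "take me to where I can take a rest"
best = max(detected, key=lambda name: score(query, detected[name]))
print(f"navigate toward: {best}")  # expected: chair
```

In the actual system, the captions would come from the neural captioning model, and the matching would be adapted from the user's previous dialogues rather than computed by word overlap.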


Notes

  1. YOLOv4, https://github.com/Tianxiaomo/pytorch-YOLOv4.

  2. Image Captioning, https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning.

  3. OpenVSLAM, https://github.com/xdspacelab/openvslam.

  4. OpenCV camera calibration, https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html.

  5. Google Glass Enterprise Edition 2, https://www.google.com/glass/tech-spec.

  6. Checkerboard, https://markhedleyjones.com/projects/calibration-checkerboard-collection.

  7. Google Cloud Speech-to-Text, https://cloud.google.com/speech-to-text.
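Notes 4 and 6 point to OpenCV's calibration module and a printable checkerboard collection. The following is a minimal sketch of the standard OpenCV checkerboard calibration workflow, under our own assumptions (a hypothetical image folder and a 9x6 inner-corner pattern), showing how the head-mounted camera's intrinsics could be estimated before configuring Visual SLAM; it is an illustration, not the authors' exact procedure.

```python
import glob

import cv2
import numpy as np

# Sketch of checkerboard camera calibration with OpenCV (assumed workflow).
PATTERN = (9, 6)  # inner corners of the printed checkerboard (assumption)

# 3D coordinates of the checkerboard corners in the board's own plane (z = 0).
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib/*.jpg"):  # hypothetical folder of checkerboard shots
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

assert obj_points, "no checkerboard corners were detected"
ret, K, dist, _, _ = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("camera matrix:\n", K)
print("distortion coefficients:", dist.ravel())
```

The resulting camera matrix and distortion coefficients are the kind of parameters a monocular Visual SLAM configuration (e.g., OpenVSLAM, Note 3) typically expects.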

References

  1. Ahmetovic, D., Guerreiro, J., Ohn-Bar, E., Kitani, K.M., Asakawa, C.: Impact of expertise on interaction preferences for navigation assistance of visually impaired individuals. In: Proceedings of the 16th Web For All 2019 Conference - Personalizing the Web, W4A 2019, San Francisco, May 13–15, pp. 31:1–31:9. ACM (2019)

  2. Ahmetovic, D., Mascetti, S., Bernareggi, C., Guerreiro, J., Oh, U., Asakawa, C.: Deep learning compensation of rotation errors during navigation assistance for people with visual impairments or blindness. ACM Trans. Access. Comput. 12(4), 19:1–19:19 (2020)

  3. Ahmetovic, D., Sato, D., Oh, U., Ishihara, T., Kitani, K., Asakawa, C.: Recog: supporting blind people in recognizing personal objects. In: Bernhaupt, R., et al. (eds.) CHI 2020: CHI Conference on Human Factors in Computing Systems, Honolulu, April 25–30, pp. 1–12. ACM (2020)

  4. Bochkovskiy, A., Wang, C., Liao, H.M.: YOLOV4: optimal speed and accuracy of object detection. CoRR abs/2004.10934 (2020)

  5. Giudice, N.A., Guenther, B.A., Kaplan, T.M., Anderson, S.M., Knuesel, R.J., Cioffi, J.F.: Use of an indoor navigation system by sighted and blind travelers: performance similarities across visual status and age. ACM Trans. Access. Comput. 13(3), 11:1–11:27 (2020)

  6. Guerreiro, J., Ahmetovic, D., Sato, D., Kitani, K., Asakawa, C.: Airport accessibility and navigation assistance for people with visual impairments. In: Brewster, S.A., Fitzpatrick, G., Cox, A.L., Kostakos, V. (eds.) Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI 2019, Glasgow, 04–09, May, p. 16. ACM (2019)

  7. Guerreiro, J., Sato, D., Asakawa, S., Dong, H., Kitani, K.M., Asakawa, C.: Cabot: designing and evaluating an autonomous navigation robot for blind people. In: Bigham, J.P., Azenkot, S., Kane, S.K. (eds.) The 21st International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS 2019, Pittsburgh, 28–30, October, pp. 68–82. ACM (2019)

  8. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Las Vegas, NV, USA, 27–30 June, pp. 770–778. IEEE Computer Society (2016)

  9. Idrees, A., Iqbal, Z., Ishfaq, M.: An efficient indoor navigation technique to find optimal route for blinds using QR codes. CoRR abs/2005.14517 (2020)

  10. Jabnoun, H., Hashish, M.A., Benzarti, F.: Mobile assistive application for blind people in indoor navigation. In: Jmaiel, M., Mokhtari, M., Abdulrazak, B., Aloulou, H., Kallel, S. (eds.) ICOST 2020. LNCS, vol. 12157, pp. 395–403. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-51517-1_36

  11. Kayukawa, S., Ishihara, T., Takagi, H., Morishima, S., Asakawa, C.: Blindpilot: a robotic local navigation system that leads blind people to a landmark object. In: Bernhaupt, R., et al. (eds.) Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, CHI 2020, Honolulu, 25–30 April, pp. 1–9. ACM (2020)

  12. Kuriakose, B., Shrestha, R., Sandnes, F.E.: Smartphone navigation support for blind and visually impaired people - a comprehensive analysis of potentials and opportunities. In: Antona, M., Stephanidis, C. (eds.) HCII 2020. LNCS, vol. 12189, pp. 568–583. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-49108-6_41

  13. Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

  14. Liu, W., et al.: SSD: single shot multiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2

  15. Ohn-Bar, E., Guerreiro, J., Kitani, K., Asakawa, C.: Variability in reactions to instructional guidance during smartphone-based assisted navigation of blind users. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2(3), 131:1–131:25 (2018)

  16. Plikynas, D., Zvironas, A., Gudauskis, M., Budrionis, A., Daniusis, P., Sliesoraityte, I.: Research advances of indoor navigation for blind people: a brief review of technological instrumentation. IEEE Instrum. Meas. Mag. 23(4), 22–32 (2020)

  17. Sato, D., et al.: NavCog3 in the wild: large-scale blind indoor navigation assistant with semantic features. ACM Trans. Access. Comput. 12(3), 14:1–14:30 (2019)

  18. Sherstinsky, A.: Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. CoRR abs/1808.03314 (2018)

  19. Sumikura, S., Shibuya, M., Sakurada, K.: OpenVSLAM: a versatile visual SLAM framework. In: Proceedings of the 27th ACM International Conference on Multimedia MM 2019, pp. 2292–2295 (2019)

  20. Xu, K., et al.: Show, attend and tell: Neural image caption generation with visual attention. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6–11 July, JMLR Workshop and Conference Proceedings, vol. 37, pp. 2048–2057. JMLR.org (2015)

  21. Younis, A., Li, S., Jn, S., Hai, Z.: Real-time object detection using pre-trained deep learning models mobilenet-SSD. In: ICCDE 2020: The 6th International Conference on Computing and Data Engineering, Sanya, China, 4–6 January, pp. 44–48. ACM (2020)


Acknowledgement

This work was supported by the Japan Science and Technology Agency (JST CREST: JPMJCR19F2). Research Representative: Prof. Yoichi Ochiai, University of Tsukuba, Japan.

Author information

Corresponding author

Correspondence to Jun-Li Lu.

A Neural Networks used in Recognition Models

We describe how we trained our recognition models, shown in Fig. 4, as follows. For detecting objects, we used the YOLOv4 model [4]; eight object classes were trained in the demonstration: “electric fan”, “monitor”, “chair”, “locker”, “door”, “microwave”, “blackboard”, and “desk”. For describing objects in an environment, we used a typical image-captioning model [20]. In the demonstration, sentences of user descriptions were attached to the images of several objects; the user's spoken sentences were transcribed with the Google Speech-to-Text API (Note 7). Note that we applied transfer learning to the image-captioning model, since a basic ability to produce textual descriptions of common visual images is needed: we continued training the image-captioning model from weights pre-trained on Microsoft COCO [13].

Fig. 4. Neural networks used in recognition models.
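As a concrete illustration of the transfer-learning step above, the sketch below freezes a pre-trained visual encoder and continues training only the caption decoder on a few personalized (image, description) pairs. The tiny randomly initialized modules and toy tensors are stand-ins for the COCO-pretrained encoder and the attention-based captioning decoder of Note 2; the sketch shows the fine-tuning pattern, not the authors' actual training code.

```python
import torch
from torch import nn, optim

# Hedged sketch of the transfer-learning step: keep the visual encoder frozen and
# fine-tune only the caption decoder on a few personalized image/description pairs.
# All modules and tensors below are toy stand-ins, not the paper's models.
torch.manual_seed(0)
vocab_size, feat_dim, hidden = 50, 64, 32

encoder = nn.Linear(128, feat_dim)    # stand-in for a COCO-pretrained CNN encoder
decoder = nn.LSTM(feat_dim, hidden, batch_first=True)  # stand-in caption decoder
head = nn.Linear(hidden, vocab_size)  # predicts the next caption word

for p in encoder.parameters():        # transfer learning: freeze the encoder,
    p.requires_grad = False           # adapt only the decoder and output head

optimizer = optim.Adam(list(decoder.parameters()) + list(head.parameters()), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Toy personalized data: 4 object images, captions of length 6 (word indices).
images = torch.randn(4, 128)
captions = torch.randint(0, vocab_size, (4, 6))

for epoch in range(5):                # a few fine-tuning passes
    feats = encoder(images).unsqueeze(1).repeat(1, 6, 1)  # image feature per step
    out, _ = decoder(feats)
    loss = criterion(head(out).reshape(-1, vocab_size), captions.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: fine-tuning loss {loss.item():.3f}")
```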

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Lu, JL. et al. (2021). Personalized Navigation that Links Speaker’s Ambiguous Descriptions to Indoor Objects for Low Vision People. In: Antona, M., Stephanidis, C. (eds) Universal Access in Human-Computer Interaction. Access to Media, Learning and Assistive Environments. HCII 2021. Lecture Notes in Computer Science, vol 12769. Springer, Cham. https://doi.org/10.1007/978-3-030-78095-1_30

  • DOI: https://doi.org/10.1007/978-3-030-78095-1_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78094-4

  • Online ISBN: 978-3-030-78095-1

  • eBook Packages: Computer Science, Computer Science (R0)
