How See the Colorful Scenery?: The Color-Centered Descriptive Text Generation for the Visually Impaired in Japan

Nishimura, Chieko; Kondo, Naruya; Murakami, Takahito; Torii, Maya; Niwa, Ryogo; Ochiai, Yoichi

doi:10.1007/978-3-031-06417-3_75

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1580)

Included in the following conference series:

International Conference on Human-Computer Interaction

Abstract

It is difficult for visually impaired people to understand landscapes and visual arts. Since the 2010s, technologies that convert visual information into tactile or auditory information have been investigated to improve visually impaired people's understanding of content. Against this background, applications that describe a user's surroundings using image recognition technology have been developed.

However, these applications currently generate little description of color, which is insufficient for visually impaired people to enjoy landscapes and visual arts. Therefore, in this research, we added further explanations to the generated descriptions and tested the resulting viewing experience with visually impaired people, focusing on avoiding elements that confuse the user through an excessive number of features.

First, by analyzing Japanese audio descriptions, we found that colors are described more intensively than shapes and sizes, and we generated additional descriptions of colors in the implemented system.

Next, we evaluated the implemented scene-description system in an experiment with 17 visually impaired participants. As a result, there was feedback that it became possible to imagine colors. Nevertheless, there was also feedback that the explanations were difficult to understand. This feedback fell into three types: (1) unknown or abstract words, (2) unnatural combinations of words, and (3) a large shift from what was first imagined.

As a future prospect, we aim to better assist visually impaired people in understanding landscapes in detail and instantly, by building a corpus based on explanatory texts for people with disabilities, such as audio guides, and by generating sentences that take into account the affinity between words and the order of the generated sentences.
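
The corpus analysis mentioned in the abstract (comparing how often colors, shapes, and sizes are described) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the category word lists, the sample sentences, and the function name `count_categories` are hypothetical and are not taken from the paper. The paper analyzed Japanese audio descriptions, which would need a morphological tokenizer; simple English word matching is swapped in here to keep the example self-contained.

```python
from collections import Counter
import re

# Hypothetical category lexicons; the word lists used in the paper are not given here.
LEXICON = {
    "color": {"red", "blue", "green", "white", "golden"},
    "shape": {"round", "square", "curved"},
    "size": {"large", "small", "tall"},
}


def count_categories(sentences):
    """Count how many word tokens in the sentences belong to each category."""
    counts = Counter({name: 0 for name in LEXICON})
    for sentence in sentences:
        # Lowercase and keep alphabetic tokens only (an English-only simplification).
        for token in re.findall(r"[a-z]+", sentence.lower()):
            for name, words in LEXICON.items():
                if token in words:
                    counts[name] += 1
    return counts


# Invented example sentences, not drawn from any real audio description.
SAMPLE = [
    "A red bridge crosses the blue river.",
    "Golden leaves cover the small round pond.",
    "White clouds drift over a tall green hill.",
]

counts = count_categories(SAMPLE)
print(dict(counts))
```

On a real corpus, the same per-category counts could be normalized by the total number of tokens so that descriptions of different lengths remain comparable.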


Notes

1. Python 3.7.12.
2. Scikit-learn version 1.0.1.
3. Chiba Institute of Technology, http://captions.stair.center/.


Acknowledgment

We are grateful to the laboratory members for their useful comments. This work is supported by CREST and AIP Challenge from the JST.

Author information

Authors and Affiliations

Research and Development Center for Digital Nature, University of Tsukuba, Tsukuba, Japan
Chieko Nishimura, Naruya Kondo, Takahito Murakami, Maya Torii, Ryogo Niwa & Yoichi Ochiai

Corresponding author

Correspondence to Chieko Nishimura.

Editor information

Editors and Affiliations

University of Crete and Foundation for Research and Technology – Hellas (FORTH), Heraklion, Crete, Greece
Constantine Stephanidis
Foundation for Research and Technology – Hellas (FORTH), Heraklion, Crete, Greece
Margherita Antona
Foundation for Research and Technology – Hellas (FORTH), Heraklion, Crete, Greece
Stavroula Ntoa

About this paper

Cite this paper

Nishimura, C., Kondo, N., Murakami, T., Torii, M., Niwa, R., Ochiai, Y. (2022). How See the Colorful Scenery?: The Color-Centered Descriptive Text Generation for the Visually Impaired in Japan. In: Stephanidis, C., Antona, M., Ntoa, S. (eds) HCI International 2022 Posters. HCII 2022. Communications in Computer and Information Science, vol 1580. Springer, Cham. https://doi.org/10.1007/978-3-031-06417-3_75

DOI: https://doi.org/10.1007/978-3-031-06417-3_75
Published: 16 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06416-6
Online ISBN: 978-3-031-06417-3
eBook Packages: Computer Science, Computer Science (R0)
