Abstract
It is difficult for visually impaired people to appreciate landscapes and visual arts. Since the 2010s, technologies that convert visual information into tactile or auditory information have been investigated to improve visually impaired people’s understanding of content. Against this background, applications that describe a user’s surroundings using image recognition technology have been developed.
However, these applications currently generate little description of color, which is insufficient for visually impaired people to enjoy landscapes and visual arts. Therefore, in this research, we added explanations of color to the generated descriptions and studied the viewing experience of visually impaired people, focusing on avoiding elements that confuse the user through an excessive amount of features.
First, by analyzing Japanese audio descriptions, we found that colors are described more intensively than shapes and sizes, and we generated additional descriptions of colors in the implemented system.
Next, we evaluated the implemented scene description system by conducting an experiment with 17 visually impaired participants. As a result, there was feedback that it became possible to imagine colors. Nevertheless, there was also feedback that the explanations were difficult to understand. This feedback fell into three types: (1) unknown or abstract words, (2) unnatural combinations of words, and (3) a large shift from the initial imagination.
As a future prospect, we aim to better assist visually impaired people in understanding landscapes in detail and instantly, by building a corpus based on explanatory texts for people with disabilities, such as audio guides, and by generating sentences that take into account the affinity between words and the order of the generated sentences.
Notes
1. Python 3.7.12.
2. Scikit-learn version 1.0.1.
3. Chiba Institute of Technology, http://captions.stair.center/.
Acknowledgment
We are grateful to the laboratory members for their useful comments. This work is supported by CREST and AIP Challenge from the JST.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Nishimura, C., Kondo, N., Murakami, T., Torii, M., Niwa, R., Ochiai, Y. (2022). How See the Colorful Scenery?: The Color-Centered Descriptive Text Generation for the Visually Impaired in Japan. In: Stephanidis, C., Antona, M., Ntoa, S. (eds) HCI International 2022 Posters. HCII 2022. Communications in Computer and Information Science, vol 1580. Springer, Cham. https://doi.org/10.1007/978-3-031-06417-3_75
DOI: https://doi.org/10.1007/978-3-031-06417-3_75
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06416-6
Online ISBN: 978-3-031-06417-3
eBook Packages: Computer Science, Computer Science (R0)