
How See the Colorful Scenery?: The Color-Centered Descriptive Text Generation for the Visually Impaired in Japan

  • Conference paper
HCI International 2022 Posters (HCII 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1580)


Abstract

It is difficult for the visually impaired to understand landscapes and visual arts. Since the 2010s, technologies for converting visual information into tactile or auditory information have been investigated to improve content understanding for the visually impaired. Against this background, applications that describe the user’s surroundings using image recognition technology have been developed.

However, these applications currently generate little description of color, which is insufficient for visually impaired people to enjoy landscapes and visual arts. Therefore, in this research, we added supplementary color explanations and experimented on the viewing experience of the visually impaired, focusing on avoiding elements that confuse the user through an excessive number of features.

First, by analyzing Japanese audio descriptions, we found that colors are described more intensively than shapes and sizes, and we generated additional descriptions of colors in the implemented system.
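
The kind of corpus analysis described here can be illustrated with a minimal sketch. The following Python snippet is our own illustration, not the authors' code: the term lists and the sample sentences are placeholders standing in for a real audio-description corpus, and it simply counts how many descriptions mention color, shape, or size terms.

```python
# Minimal sketch (not the authors' code): counting how often color, shape,
# and size terms appear in Japanese audio-description sentences.
# The term lists and the sample sentences are illustrative placeholders.
from collections import Counter

COLOR_TERMS = ["赤", "青", "緑", "黄色", "白", "黒"]   # red, blue, green, yellow, white, black
SHAPE_TERMS = ["丸い", "四角い", "細長い", "三角"]      # round, square, elongated, triangular
SIZE_TERMS = ["大きい", "小さい", "広い", "高い"]       # big, small, wide, tall

# Placeholder descriptions standing in for a real audio-description corpus.
descriptions = [
    "青い空の下に広い緑の野原が広がっている",   # "A wide green field spreads under a blue sky"
    "白い建物の前に赤い車が止まっている",       # "A red car is parked in front of a white building"
    "丸いテーブルの上に黄色い花がある",         # "There is a yellow flower on a round table"
]

def mention_count(lines, terms):
    """Number of lines that mention at least one term from the category."""
    return sum(1 for line in lines if any(t in line for t in terms))

counts = Counter({
    "color": mention_count(descriptions, COLOR_TERMS),
    "shape": mention_count(descriptions, SHAPE_TERMS),
    "size": mention_count(descriptions, SIZE_TERMS),
})
for category, n in counts.most_common():
    print(f"{category}: {n} of {len(descriptions)} descriptions")
```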

Next, we evaluated the implemented scene-describing system in an experiment with 17 visually impaired participants. As a result, some participants reported that it became possible to imagine colors. Nevertheless, others found the explanations difficult to understand, giving three types of opinions: (1) unknown or abstract words, (2) unnatural combinations of words, and (3) a large shift from the initial mental image.

As a future prospect, we aim to better assist the visually impaired in understanding landscapes in detail and instantly by building a corpus based on explanatory texts for people with disabilities, such as audio guides, and by generating sentences that take into account word-to-word affiliation and the ordering of the generated sentences.
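
As one way to picture the last point, the following is a minimal sketch, under our own assumptions, of how a corpus-trained language model could prefer naturally ordered word sequences: a bigram model with add-one smoothing is trained on a tiny placeholder corpus and used to score candidate descriptions. It is an illustration only, not the authors' planned method, and it uses whitespace-tokenized English for brevity.

```python
# Minimal sketch: rank candidate descriptions with a bigram language model
# so that word ordering and word-to-word affinity influence the choice.
# The tiny corpus and candidates below are illustrative placeholders.
from collections import Counter
import math

def train_bigram(sentences):
    """Count unigrams and bigrams from whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    lp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        lp += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return lp

corpus = ["the sky is pale blue", "the field is bright green", "the sky is deep blue"]
uni, bi = train_bigram(corpus)
candidates = ["the sky is deep blue", "blue deep is sky the"]
best = max(candidates, key=lambda s: log_prob(s, uni, bi, len(uni)))
print(best)  # prefers the naturally ordered candidate
```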


Notes

  1. Python 3.7.12.

  2. Scikit-learn version 1.0.1 (see the illustrative sketch below).

  3. Chiba Institute of Technology, http://captions.stair.center/.
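
The notes above name Python 3.7.12 and scikit-learn 1.0.1 but not how scikit-learn is used. The sketch below is therefore only an assumed illustration of one plausible role for it in a color-centered describer: k-means clustering of pixel values to find an image's dominant colors, which are then mapped to the nearest entry in a small named palette. The palette, the file name scene.jpg, and the name-mapping step are all our own assumptions.

```python
# Assumed illustration, not the authors' implementation: find an image's
# dominant colors with k-means and map each to the nearest named color.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

# Hypothetical reference palette of named colors (RGB).
PALETTE = {
    "red": (220, 50, 47),
    "blue": (38, 139, 210),
    "green": (64, 160, 43),
    "yellow": (240, 200, 50),
    "white": (245, 245, 245),
    "black": (20, 20, 20),
}

def dominant_colors(image_path, k=3):
    """Return the k dominant RGB colors of an image via k-means clustering."""
    pixels = np.asarray(Image.open(image_path).convert("RGB"), dtype=float)
    pixels = pixels.reshape(-1, 3)[::10]  # subsample pixels for speed
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    return km.cluster_centers_

def nearest_color_name(rgb):
    """Map an RGB triple to the closest named color in the reference palette."""
    names, values = zip(*PALETTE.items())
    dists = np.linalg.norm(np.array(values) - np.array(rgb), axis=1)
    return names[int(np.argmin(dists))]

if __name__ == "__main__":
    for center in dominant_colors("scene.jpg", k=3):  # scene.jpg is a placeholder path
        print(nearest_color_name(center), center.round().astype(int))
```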


Acknowledgment

We are grateful to the laboratory members for their useful comments. This work is supported by CREST and the AIP Challenge from JST.

Author information


Corresponding author

Correspondence to Chieko Nishimura.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Nishimura, C., Kondo, N., Murakami, T., Torii, M., Niwa, R., Ochiai, Y. (2022). How See the Colorful Scenery?: The Color-Centered Descriptive Text Generation for the Visually Impaired in Japan. In: Stephanidis, C., Antona, M., Ntoa, S. (eds) HCI International 2022 Posters. HCII 2022. Communications in Computer and Information Science, vol 1580. Springer, Cham. https://doi.org/10.1007/978-3-031-06417-3_75


  • DOI: https://doi.org/10.1007/978-3-031-06417-3_75


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06416-6

  • Online ISBN: 978-3-031-06417-3

  • eBook Packages: Computer Science, Computer Science (R0)
