Systematic survey of anything-to-text recognition and constructing its framework in language learning

Hwang, Wu-Yuin; Nguyen, Van-Giap; Purba, Siska Wati Dewi

doi:10.1007/s10639-022-11112-6

Published: 27 May 2022

Volume 27, pages 12273–12299 (2022)

Education and Information Technologies

Abstract

Recognition technology is widely used to support language learning, so a framework is needed to guide the integration of anything-to-text recognition technology, such as speech-to-text, image-to-text, body movement-to-text, emotion-to-text, and location-to-text recognition, into learning designs. In this study, we review articles on anything-to-text recognition in language learning published from 2011 to 2020 and propose an anything-to-text recognition framework. A total of 48 articles passed the selection process. Most of them focused on English language learning and recruited university students as participants. In addition, most aimed to foster learners' listening skills, and very few addressed writing skills. Speech-to-text recognition was commonly used to support speaking and listening skills, and image-to-text recognition was usually used to support reading and listening skills. Body movement-to-text, emotion-to-text, and location-to-text recognition technologies were rarely used, although they also have the potential to support language learning. Based on these findings, an anything-to-text recognition framework should consist of three layers: learning representations, recognition accuracy, and learning effects, each considered with regard to learners' needs and imaginations in language learning supported by recognition technologies. The study also highlights research trends and provides suggestions for researchers in this field.

Data availability

The data that support the findings of this study are available on request from the corresponding author, Siska Wati Dewi Purba.

Author information

Authors and Affiliations

Graduate Institute of Network Learning Technology, National Central University, Taoyuan City, Taiwan
Wu-Yuin Hwang & Van-Giap Nguyen
Master of Educational Technology, Universitas Pelita Harapan, Tangerang, Indonesia
Siska Wati Dewi Purba

Corresponding author

Correspondence to Siska Wati Dewi Purba.

Ethics declarations

Competing interests

The authors report there are no competing interests to declare.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

Table 9 Recognition technologies and their environments

Appendix 2

Table 10 Recognition accuracy survey

Appendix 3

Table 11 Level of matching learner needs

About this article

Cite this article

Hwang, W.Y., Nguyen, V.G. & Purba, S.W.D. Systematic survey of anything-to-text recognition and constructing its framework in language learning. Educ Inf Technol 27, 12273–12299 (2022). https://doi.org/10.1007/s10639-022-11112-6

Received: 16 February 2022
Accepted: 13 May 2022
Published: 27 May 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s10639-022-11112-6
