Web System Prototype based on speech recognition to construct medical reports in Brazilian Portuguese
Introduction
Advances in processing power and storage capacity have made it possible to build increasingly complex systems. This evolution contributes, for example, to the development of Automatic Speech Recognition (ASR) systems [1], which aim to recognize spoken words and convert them into written text [2].
These systems have been applied in the medical field for different purposes, for example:
- Improving reading skills in children with Down syndrome [3];
- Predicting Parkinson's disease [4];
- Spoken interaction with patients with post-traumatic stress disorder [5];
- Communication support for people with speech dysfunction [6];
- Referring a patient to a medical expert according to complaints verbally reported by the patient [7];
- Evaluating speech intelligibility in patients with oral diseases [8].
In hospitals, ASRs have been used in different ways. In particular, ASRs have been used in medical offices during consultations to improve data collection [9]. In addition, ASRs have been employed as a translation system to serve immigrants [10] and as a tool to prepare radiology reports faster than conventional approaches [11].
Despite the advances in the accuracy of these systems, there are still limitations, such as noise and the absence of some words in the recognition vocabulary used for ASR system training. Besides these limitations, these systems should also work properly under varied conditions [2], [12] and deal with variations regarding the speaker's voice, pronunciation and environment [13].
In this scenario, the Laboratory of Bioinformatics (LABI) at the Western Paraná State University (UNIOESTE), campus of Foz do Iguaçu/Paraná, in partnership with the Department of Coloproctology at the State University of Campinas (UNICAMP), has investigated the use of ASR systems in the medical field. To this end, a Web System Prototype (WSP) was developed to generate medical reports by means of ASRs. In this work, a medical report is a text written by an expert that contains exam results [14].
As part of building the WSP, this study evaluated the performance of two ASRs in the medical field: Google Web Speech API [15] and Microsoft Bing Speech API [16]. Afterwards, both ASRs were integrated into the WSP to generate medical reports from transcribed speech on different computational platforms. Evaluating the ASRs after their integration into the WSP would be expected to yield similar accuracy, since the inputs and the speech recognition process would be the same.
However, an expert should review the transcriptions produced by the ASRs integrated into the WSP, because transcription errors can change the meaning of a sentence. In the medical context especially, such errors can have serious consequences for the patient.
This work differs from [11] and [9] in several respects: (1) no commercial license is required to use the selected ASR in the WSP; (2) our prototype is flexible, making it possible to integrate other ASRs; (3) the prototype keeps the history of each medical report, allowing the user to access all changes made to it.
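Points (2) and (3) can be illustrated with a minimal sketch. It is not the WSP's actual design: the names `SpeechRecognizer`, `FakeRecognizer`, `Revision`, `MedicalReport` and `dictate` are hypothetical, and the recognizer is a stand-in that returns canned text instead of calling a real ASR service.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Protocol


class SpeechRecognizer(Protocol):
    """Hypothetical adapter interface: any backend that turns audio into text."""

    def transcribe(self, audio: bytes) -> str: ...


class FakeRecognizer:
    """Stand-in backend for illustration; a real adapter would call an ASR service."""

    def __init__(self, canned_text: str) -> None:
        self.canned_text = canned_text

    def transcribe(self, audio: bytes) -> str:
        return self.canned_text


@dataclass(frozen=True)
class Revision:
    """One saved version of a report (hypothetical fields)."""

    text: str
    author: str
    saved_at: datetime


@dataclass
class MedicalReport:
    """Keeps every saved version so that earlier wording stays accessible."""

    revisions: List[Revision] = field(default_factory=list)

    def save(self, text: str, author: str) -> None:
        # Append instead of overwriting, so the full change history is preserved.
        self.revisions.append(Revision(text, author, datetime.now(timezone.utc)))

    @property
    def current_text(self) -> str:
        return self.revisions[-1].text if self.revisions else ""


def dictate(report: MedicalReport, recognizer: SpeechRecognizer,
            audio: bytes, author: str) -> None:
    """Transcribe one audio clip and store the extended report as a new revision."""
    transcript = recognizer.transcribe(audio)
    report.save((report.current_text + " " + transcript).strip(), author)


report = MedicalReport()
dictate(report, FakeRecognizer("exame sem alterações"), b"", author="dr_a")
report.save("Exame sem alterações.", author="dr_a")  # manual correction by the expert
```

Because `dictate` depends only on the `transcribe` method, another ASR backend could be swapped in without touching the report logic, and the expert's correction is stored as a second revision rather than replacing the raw transcription.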
This work is organized as follows. Section 2 presents related work. Section 3 describes materials and methods. Section 4 reports and discusses experimental results. Finally, Section 5 concludes this work.
Related work
This section highlights related work that used ASR to generate medical reports in hospitals.
A commercial ASR (Precision Reporting version 10.7; GE Healthcare) was used in [11] to support the preparation of radiology reports in English. The system was deployed in a 150-bed community hospital from May to July 2011. According to the authors, using ASR during this period reduced the report preparation time from 24 hours to about one hour.
To improve
Materials and methods
This section presents the procedures used to collect and process the audio files. It also describes the method, technologies and tools used to develop the WSP, and reports how the ASRs were evaluated.
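This excerpt does not state which metric was used to evaluate the ASRs. As an illustration only, the sketch below computes the word error rate (WER), a common measure of transcription accuracy, assuming a reference transcript is available for each audio file. The function name and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One wrong word out of five reference words.
wer = word_error_rate(
    "paciente apresenta pólipo no cólon",
    "paciente apresenta pólipo no colo",
)
```

A lower WER means a more accurate transcription. This simple version lowercases the text and splits on whitespace, so punctuation handling would need to be agreed on before comparing systems.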
Results and discussion
The results of the evaluation of the ASRs and the implementation of the WSP are described and discussed below.
Conclusion
This study compared the performance of two ASRs in the medical field. After conducting an experimental evaluation, it was found that Google's ASR performs better than Microsoft's ASR.
We conclude that using the evaluated ASRs for preliminary tests in the medical field is feasible, as shown by the construction of the WSP to generate medical reports. This recommendation is based on the satisfactory speech recognition performance achieved by the systems integrated with the WSP. In addition,
Conflict of interest
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.
Acknowledgements
We would like to acknowledge the EurekaSD project – Enhancing University Research and Education in Areas Useful for Sustainable Development, the Araucária Foundation - Brazil - for the Support of the Scientific and Technological Development of Paraná through a Research and Technological Productivity Scholarship for H.D. Lee (grant 534/2014), and the Coordination for the Improvement of Higher Education Personnel - Brazil (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES) -
References (24)
- et al. Implementation of speech recognition in a community-based radiology practice: effect on report turnaround times. Journal of the American College of Radiology (2014)
- et al. A pilot study of the use of emerging computer technologies to improve the effectiveness of reading and writing therapies in children with Down syndrome. British Journal of Educational Technology (2017)
- et al. Word accuracy and dynamic time warping to assess intelligibility deficits in patients with parkinsons disease
- et al. An adaptive dialogue system for assessing post traumatic stress disorder
- et al. Speech disabilities in adults and the suitable speech recognition software tools – a review
- et al. Mobile personal healthcare mediated by virtual humans
- et al. Oral squamous cell carcinoma of the tongue: prospective and objective speech evaluation of patients undergoing surgical therapy. Journal of the Sciences and Specialities of the Head and Neck (2016)
- Improving Speech Recognition Accuracy for Clinical Conversations, Master's Dissertation (2012)
- Performance of a new speech translation device in translating verbal recommendations of medication action plans for patients with diabetes. Journal of Diabetes Science and Technology