Web System Prototype based on speech recognition to construct medical reports in Brazilian Portuguese

https://doi.org/10.1016/j.ijmedinf.2018.10.010

Highlights

  • Integration of two ASR technologies into a WSP to generate medical reports.

  • Evaluation of Google's and Microsoft's ASR in Brazilian Portuguese.

  • Google's ASR showed better accuracy compared to Microsoft's ASR.

Abstract

The overall purpose of automatic speech recognition systems is to enable interaction between humans and electronic devices through speech. For example, speech captured from a user through a microphone can be transcribed into text. In general, such systems should be able to overcome adversities such as noise, communication channel variability, the speaker's age and accent, speech rate, concurrent speech from other speakers and spontaneous speech. Despite this challenging scenario, this study aims to develop a Web System Prototype to generate medical reports through automatic speech recognition in Brazilian Portuguese. The prototype was developed by applying a Software Engineering technique named Delivery in Stage. While applying this technique, we integrated the Google Web Speech API and the Microsoft Bing Speech API into the prototype to increase the number of compatible platforms. These automatic speech recognition systems were evaluated individually on the task of transcribing a medical text dictated by 30 volunteers. Recognition performance was measured by Word Error Rate. The Google system achieved an error rate of 12.30%, which was statistically significantly better (p-value < 0.0001) than the Microsoft system's 17.68%. This work allowed us to conclude that these automatic speech recognition systems are compatible with the prototype and can be used in the medical field. The findings also suggest that, besides supporting the construction of medical reports, the Web System Prototype can be useful for purposes such as recording physicians’ notes during a clinical procedure.

Introduction

Important technological advances, in particular the great increase in processing power and storage capacity, have made it possible to build increasingly complex systems. This evolution contributes, for example, to the development of Automatic Speech Recognition (ASR) systems [1], which aim to recognize spoken words and convert them into written form [2].

These systems have been applied in the medical field for different purposes, for example:

  • Reading skills improvement in children with Down Syndrome [3];

  • Parkinson's disease prediction [4];

  • Speech-based interaction with patients with post-traumatic stress disorder [5];

  • Communication support for people with speech dysfunction [6];

  • Referring a patient to a medical expert, according to complaints verbally reported by the patient [7];

  • Speech intelligibility evaluation for patients with oral diseases [8].

In hospitals, ASRs have been used in different ways. In particular, ASRs have been used in medical offices during consultations to improve data collection [9]. In addition, ASRs have been employed as a translation system to serve immigrants [10] and as a tool to prepare radiology reports faster than conventional approaches [11].

Despite the advances in the accuracy of these systems, there are still limitations, such as noise and the absence of some words in the recognition vocabulary used for ASR system training. Besides these limitations, these systems should also work properly under varied conditions [2], [12] and deal with variations regarding the speaker's voice, pronunciation and environment [13].

In this scenario, the Laboratory of Bioinformatics (LABI) at the Western Paraná State University (UNIOESTE), campus of Foz do Iguaçu/Paraná, in partnership with the Department of Coloproctology at the State University of Campinas (UNICAMP), has investigated the validation of ASR systems in the medical field. To this end, a Web System Prototype (WSP) was developed to generate medical reports by means of ASRs. In this work, a medical report is a text written by an expert that contains exam results [14].

As part of building the WSP, this study evaluated the performance of two ASRs in the medical field: Google Web Speech API [15] and Microsoft Bing Speech API [16]. Afterwards, both ASRs were integrated into the WSP to generate medical reports from transcribed speech on different computational platforms. It is noteworthy that evaluating the ASRs after their integration into the WSP would yield similar accuracy, since the inputs and the speech recognition process would be the same.

However, it is important that an expert review the transcriptions produced by the ASRs integrated into the WSP, because transcription errors can change the meaning of a sentence. In the medical context especially, such errors can have serious consequences for the patient.

This work differs from [9] and [11] in several respects: (1) no commercial license is required to use the selected ASRs in the WSP; (2) our prototype is flexible, making it possible to integrate other ASRs; (3) the prototype manages the history of medical reports, allowing the user to access all changes made to the reports.
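Points (2) and (3) can be illustrated with a minimal Python sketch. It is not the WSP's implementation, which this text does not detail: `SpeechRecognizer`, `FixedTextRecognizer`, `RecognizerRegistry` and `ReportHistory` are hypothetical names, and the stand-in engine returns canned text instead of calling Google's or Microsoft's service.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class SpeechRecognizer(Protocol):
    """Hypothetical interface that any pluggable ASR engine would satisfy."""

    name: str

    def transcribe(self, audio: bytes, language: str) -> str:
        ...


@dataclass
class FixedTextRecognizer:
    """Stand-in engine for illustration: ignores the audio, returns canned text."""

    name: str
    canned_text: str

    def transcribe(self, audio: bytes, language: str) -> str:
        return self.canned_text


@dataclass
class RecognizerRegistry:
    """Maps engine names to engines so another ASR can be plugged in by name."""

    engines: Dict[str, SpeechRecognizer] = field(default_factory=dict)

    def register(self, engine: SpeechRecognizer) -> None:
        self.engines[engine.name] = engine

    def transcribe(self, engine_name: str, audio: bytes, language: str = "pt-BR") -> str:
        return self.engines[engine_name].transcribe(audio, language)


@dataclass
class ReportHistory:
    """Append-only list of report versions, so every change stays accessible."""

    versions: List[str] = field(default_factory=list)

    def save(self, text: str) -> int:
        self.versions.append(text)
        return len(self.versions)  # 1-based version number

    def current(self) -> str:
        return self.versions[-1]


# Hypothetical usage: two interchangeable engines and a report edited once.
registry = RecognizerRegistry()
registry.register(FixedTextRecognizer("engine_a", "exame sem alterações"))
registry.register(FixedTextRecognizer("engine_b", "exame sem alteração"))

history = ReportHistory()
history.save(registry.transcribe("engine_a", b""))
history.save(history.current() + " significativas")  # expert's manual edit
```

Keeping the engines behind one small interface is one way the rest of a system could stay unchanged when a recognizer is added or replaced; the append-only history keeps earlier versions available for the expert review discussed above.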

This work is organized as follows. Section 2 presents related work. Section 3 describes materials and methods. Section 4 reports and discusses experimental results. Finally, Section 5 concludes this work.

Related work

This section highlights related work that used ASR to generate medical reports in hospitals.

A commercial ASR (Precision Reporting version 10.7; GE Healthcare) was used in [11] to support the preparation of radiology reports in English. The system was implemented in a community hospital with 150 beds, from May to July 2011. According to the authors, using ASR during this period reduced the report preparation time from 24 hours to about one hour.

To improve

Materials and methods

This section presents the procedures used to collect and process audio files. It also describes the method, technologies and tools used to develop the WSP, and reports the evaluation of the ASRs.
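The abstract names Word Error Rate (WER) as the evaluation measure. As a rough sketch of how WER is commonly computed (substitutions, deletions and insertions from a word-level edit distance, divided by the number of reference words), the Python function below may help. The function name is hypothetical, and the lowercasing and whitespace tokenization are assumptions that may differ from the authors' preprocessing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumption: case-insensitive comparison and whitespace tokenization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (row-by-row Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up sentences for illustration only; they are not taken from the study.
example = word_error_rate(
    "o paciente apresenta dor abdominal",
    "o paciente apresenta a dor",
)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.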

Results and discussion

The results of the evaluation of the ASRs and the implementation of the WSP are described and discussed below.
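The abstract reports that the difference between the two systems was statistically significant, but this excerpt does not name the test that was used. Purely as an illustration, the sketch below applies an exact paired sign-flip permutation test, a technique swapped in here, to per-speaker error rates. The function name and the input values are hypothetical, and exact enumeration is practical only for small samples; with 30 speakers one would sample random sign flips instead.

```python
from itertools import product
from typing import Sequence


def paired_sign_flip_p_value(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided exact p-value for the mean paired difference between a and b."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))

    # Under the null hypothesis each paired difference is equally likely to
    # have either sign, so enumerate all 2**n sign assignments.
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        statistic = abs(sum(s * d for s, d in zip(signs, diffs)))
        if statistic >= observed - 1e-12:  # tolerance for float rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Hypothetical per-speaker WER percentages for five speakers (not study data).
engine_a = [10, 12, 9, 14, 11]
engine_b = [15, 18, 13, 20, 17]
p_value = paired_sign_flip_p_value(engine_a, engine_b)
```

A paired design fits this kind of comparison because each volunteer's recording can be transcribed by both systems, so speaker-specific effects such as accent or speech rate cancel out in the per-speaker differences.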

Conclusion

This study compared the performance of two ASRs in the medical field. After conducting an experimental evaluation, it was found that Google's ASR performs better than Microsoft's ASR.

We conclude that using the evaluated ASRs for preliminary tests in the medical field is feasible, as verified by the construction of the WSP to generate medical reports. This recommendation is based on the satisfactory speech recognition performance achieved by the systems integrated with the WSP. In addition,

Conflict of interest

We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

Acknowledgements

We would like to acknowledge the EurekaSD project – Enhancing University Research and Education in Areas Useful for Sustainable Development, the Araucária Foundation - Brazil - for the Support of the Scientific and Technological Development of Paraná through a Research and Technological Productivity Scholarship for H.D. Lee (grant 534/2014), and the Coordination for the Improvement of Higher Education Personnel - Brazil (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES) -

References (24)

  • L.M. Prevedello et al. Implementation of speech recognition in a community-based radiology practice: effect on report turnaround times. Journal of the American College of Radiology (2014).

  • V.G. Felix et al. A pilot study of the use of emerging computer technologies to improve the effectiveness of reading and writing therapies in children with Down syndrome. British Journal of Educational Technology (2017).

  • J.C. Vasquez-Correa et al. Word accuracy and dynamic time warping to assess intelligibility deficits in patients with parkinsons disease.

  • A. Papangelis et al. An adaptive dialogue system for assessing post traumatic stress disorder.

  • V. Balaji et al. Speech disabilities in adults and the suitable speech recognition software tools – a review.

  • A. Leuski et al. Mobile personal healthcare mediated by virtual humans.

  • M. Riemann et al. Oral squamous cell carcinoma of the tongue: prospective and objective speech evaluation of patients undergoing surgical therapy. Journal of the Sciences and Specialities of the Head and Neck (2016).

  • B. Gür. Improving Speech Recognition Accuracy for Clinical Conversations, Master's Dissertation (2012).

  • R.W. Soller et al. Performance of a new speech translation device in translating verbal recommendations of medication action plans for patients with diabetes. Journal of Diabetes Science and Technology (2012).