Testing Tactics to Localize De-Identification

Grouin, Cyril; Rosier, Arnaud; Dameron, Olivier; Zweigenbaum, Pierre

doi:10.3233/978-1-60750-044-5-735

Abstract

Renewed interest in de-identification (also known as “anonymisation”) has led to the development of a series of systems in the United States that perform very well on challenge test sets. De-identification, however, needs to be tuned to local documents and their specificities. We address two issues raised in this context. First, tuning is generally performed by language engineers, who should not have to work on identified text. We therefore perform a first, coarse de-identification step inside the hospital. Second, to set up a de-identification system for new documents in a language other than English, here French patient reports, we tested two methods: the first adapts an existing US de-identifier built for English; the second develops a new system that applies the same methods. The first method involved localizing patterns designed for English, which proved cumbersome and did not quickly yield good performance. With a similar effort, the second method obtained much better results. Evaluated on a set of 23 randomly selected texts from a corpus of 21,749 clinical texts, it obtained 83% recall and 92% precision.
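For readers unfamiliar with the two evaluation measures, the following minimal sketch shows how recall and precision can be computed for a de-identifier. It assumes exact matching of annotated spans, which the abstract does not specify; the `Span` type, the `precision_recall` function, and the sample annotations are hypothetical and are not taken from the evaluation reported above.

```python
from typing import Set, Tuple

# A span is (start offset, end offset, category). This representation is
# an assumption made for illustration only.
Span = Tuple[int, int, str]


def precision_recall(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float]:
    """Return (precision, recall) under exact span-and-category matching."""
    true_positives = len(gold & predicted)
    # Precision: share of predicted spans that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold (reference) spans that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations, not data from the paper.
gold = {(0, 6, "NAME"), (20, 30, "DATE"), (45, 59, "PHONE"), (70, 75, "NAME")}
predicted = {(0, 6, "NAME"), (20, 30, "DATE"), (45, 59, "PHONE"), (80, 85, "NAME")}

precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In de-identification, recall is usually the more critical of the two measures, since every missed span is identifying information left in the text, whereas a precision error only removes a non-identifying token.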

Contact

IOS Press Copyright 2024
