Intelligent analysis of clinical time series: an application in the diabetes mellitus domain

https://doi.org/10.1016/S0933-3657(00)00052-X

Abstract

This paper describes the application of a method for the intelligent analysis of clinical time series in the diabetes mellitus domain. Such a method is based on temporal abstractions and relies on the following steps: (i) ‘pre-processing’ of raw data through the application of suitable filtering techniques; (ii) ‘extraction’ from the pre-processed data of a set of abstract episodes (temporal abstractions); and (iii) ‘post-processing’ of temporal abstractions; the post-processing phase results in a new set of features that embeds high level information on the patient dynamics. The derived features set is used to obtain new knowledge through the application of machine learning algorithms. The paper describes in detail the application of this methodology and presents some results obtained on simulated data and on a data-set of four diabetic patients monitored for >1 year.

Introduction

Intelligent data analysis (IDA) is a new research field, mainly concerned with developing and applying methods that automatically transform data into information by exploiting the background knowledge available on the domain [3], [22]. This approach seems particularly useful in medicine, where attaching a precise meaning to the data often depends on recognizing the ‘context’ in which the data were collected. The use of knowledge is also important for properly analyzing data in situations in which uncertainty plays a major role and/or when little data is available for ethical or cost reasons; exploiting knowledge allows useful information to be extracted from each single datum. IDA can therefore be viewed as a fundamental step in the knowledge management process within hospital information systems: it provides methods for exploiting existing explicit knowledge to transform data into information. As a natural consequence, IDA draws on methodological contributions from several disciplines, from AI to Bayesian statistics and from cognitive science to mathematical modeling.

In this paper, we describe the application of IDA techniques to the problem of analyzing and interpreting time series (TS) coming from the long-term monitoring of chronic patients. The analysis of multi-variate TS is a ubiquitous problem in science and represents a crucial challenge in biomedical applications, such as clinical monitoring, where several parameters must be examined simultaneously to understand the patient’s overall situation. This rather complex task has traditionally been addressed with descriptive and inferential statistical techniques [13]; within the IDA context, an AI-based methodology, known as temporal abstractions (TAs), has been proposed and successfully exploited in several application domains [4], [16], [21], [28]. The principle of TA methods is to move from a time-point to an interval-based representation of the monitoring data: the time-stamped raw data are aggregated into intervals on the basis of a certain number of conditions, corresponding to the definition of a particular ‘abstract’ episode.

Therefore, if we look for different episodes, we obtain an ‘abstract description’ of (multi-variate) time-stamped data, that contains the patterns considered useful for a correct interpretation of the dynamics of the system under observation. A detailed presentation of this methodology can be found in Ref. [28].
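The move from time points to intervals can be illustrated with a minimal sketch. It is not the TA mechanism of Ref. [28]: the function name, the `Episode` type, the `max_gap` persistence rule and the 180 mg/dl threshold in the usage note are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Episode:
    label: str
    start: float  # time stamp of the first sample in the episode
    end: float    # time stamp of the last sample in the episode


def state_abstraction(
    samples: Sequence[Tuple[float, float]],
    label: str,
    condition: Callable[[float], bool],
    max_gap: float,
) -> List[Episode]:
    """Aggregate time-stamped (time, value) samples into intervals.

    Consecutive samples that satisfy `condition` are merged into one
    episode, provided they are at most `max_gap` time units apart
    (an assumed persistence rule, not taken from the paper).
    """
    episodes: List[Episode] = []
    start = prev = None
    for t, value in samples:
        holds = condition(value)
        # Close the open episode when the condition fails or the gap is too long.
        if start is not None and (not holds or t - prev > max_gap):
            episodes.append(Episode(label, start, prev))
            start = None
        if holds:
            if start is None:
                start = t
            prev = t
    if start is not None:
        episodes.append(Episode(label, start, prev))
    return episodes
```

For instance, with readings expressed as (hour, mg/dl) pairs, `state_abstraction(readings, "high", lambda v: v > 180, max_gap=12)` would return one ‘high’ episode for each run of above-threshold readings no more than 12 h apart.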

Once a collection of TAs has been obtained, the TAs provide a powerful high level description of the patient’s behavior. They can be used to ‘mine’ data collected over time, performing analysis at different levels of abstraction, aggregation and granularity.

In many application domains, a robust application of TAs may be hampered by the presence of noise in the data; in these situations, the derived abstractions can be highly dependent on the values of the parameters that define the episodes.

Several authors have, therefore, found it useful to ‘pre-process’ the data in order to obtain more robust abstract episode calculations. For example, noise reduction techniques have been applied to the original TS in combination with TAs in the monitoring of intensive care unit patients [16]. Of course, pre-processing may correspond to classical data validation and outlier detection, but it may also pursue more complex goals, such as the extraction of trends or periodic patterns from raw data.
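As a simple illustration of noise reduction before abstraction, the sketch below applies a running median. This is a generic smoother chosen here for brevity; it is not the filtering technique used in this paper or in Ref. [16], and the function name and window size are assumptions.

```python
from statistics import median
from typing import List, Sequence


def running_median(values: Sequence[float], window: int = 3) -> List[float]:
    """Smooth a series with a centered running median.

    The window is truncated at both ends of the series, so the output
    has the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed
```

On the series `[100, 102, 300, 104, 106]`, the isolated value 300 is replaced by 104, so a threshold-based episode would no longer be triggered by that single outlier.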

Statistics may also be used to ‘post-process’ TAs, in order to obtain high level summaries of the patient dynamics over a certain monitoring period. For example, it would be easy to know how many ‘increasing’ episodes a certain variable had during a patient’s follow-up, their average duration and the percentage of the overall period that they cover. From this perspective, the post-processing of TAs can be viewed as the computation of abstracted descriptive statistics: the number, duration and type of TA episodes can be considered a summary of the time series at an abstract level. An interesting research direction, therefore, is to investigate whether these summaries could also be used to automatically learn some characteristics of the dynamic behavior of the patient under study. Describing the patient characteristics at an abstract level should make it possible to extract regularities and identify similarities that could be difficult or nearly impossible to derive from raw data.
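The three summaries mentioned above (number of episodes, average duration, percentage of the monitoring period) can be computed as in the following minimal sketch. The function name and dictionary keys are hypothetical, and the episodes are assumed to be non-overlapping (start, end) pairs lying inside the monitoring period.

```python
from typing import Dict, Sequence, Tuple


def summarize_episodes(
    episodes: Sequence[Tuple[float, float]],
    period_start: float,
    period_end: float,
) -> Dict[str, float]:
    """Compute abstracted descriptive statistics for one type of episode."""
    period = period_end - period_start
    if period <= 0:
        raise ValueError("monitoring period must have positive length")
    durations = [end - start for start, end in episodes]
    total = sum(durations)
    return {
        "count": len(durations),
        "mean_duration": total / len(durations) if durations else 0.0,
        # Assumes non-overlapping episodes; overlaps would be double-counted.
        "percent_of_period": 100.0 * total / period,
    }
```

For three ‘increasing’ episodes lasting 10, 5 and 15 days within a 100-day follow-up, the summary is a count of 3, a mean duration of 10 days and 30% of the period. One such dictionary per variable and episode type gives a feature vector of the kind that could be passed to a learning algorithm.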

On the basis of the above considerations, we can summarize the basic steps of TA-based analysis in chronic patient monitoring through the general scheme shown in Fig. 1.

To summarize the scheme: from raw data, we obtain a pre-processed data set through the application of suitable filtering techniques; by applying TA mechanisms, we obtain a new ‘abstract level’ of episodes which, thanks to proper presentation and visualization techniques, can help the user in ‘interpreting’ the patient’s behavior. The TA-based post-processing yields new features that embed high level information on the patient’s dynamics; such features can be used to derive ‘new’ knowledge through the application of machine learning algorithms.

The aim of this paper is to present the application of the above scheme to the problem of analyzing data coming from home monitoring of type I diabetic patients. We describe the methods used in each step and present some results. In particular, we report an assessment study done on a simulated patient and show the results obtained on four real patients, who have been monitored for >1 year at the Policlinico S. Matteo Hospital of Pavia. These patients were enrolled within the telemedicine project ‘Telematic Management of Insulin Dependent Diabetes Mellitus’ (T-IDDM), funded by the European Commission. T-IDDM has been devoted to providing patients and physicians with an information technology infrastructure for better diabetes management. In this project, the physician relies on a set of distributed web services, provided by a medical workstation. The approach described in this paper is part of the data analysis and visualization tools that are linked with the data-management and decision support modules of the whole system. For further details see Ref. [5].

Section snippets

The application domain: a short summary

Diabetes mellitus is one of the major chronic diseases in industrialized countries. Its prevalence (≈5%) in the European population and its related costs are pushing health care institutions to improve the quality of treatment; interestingly, information technology has been recognized as one of the potential means of obtaining such improvement [23]. In particular, insulin-dependent (IDDM) patients (≈10% of the total diabetic population) are required to undergo intensive

Intelligent data analysis in diabetes

As stated in the introduction, the goal of this paper is to show how the overall process of TA-based data analysis works when applied to a real problem. As shown in Ref. [4], several TAs can be derived for the analysis of IDDM patients’ data; however, in this paper we will describe in detail only the TAs useful to obtain the high level summaries that we exploited in the knowledge extraction step (Fig. 1).

We will start by describing the pre-processing step, performed in our setting by relying on a

Assessment

To assess the TA-based analysis process described in Fig. 1 we performed three different evaluations:

  1. Evaluation of the utility of pre-processing to extract trends and cycles on a (real) test case. With this first evaluation, we aimed to compare the performance of the combined approach presented here with that of single-method approaches (pre-processing or TAs only).

  2. Evaluation of the post-processing techniques on simulated data. Since the post-processing results cannot be compared with gold standard

Comparison with related approaches

As mentioned in Section 1, a large number of approaches have been presented for the analysis of data coming from diabetic patients’ home monitoring. A number of such approaches have been devoted to the prediction of BGL time series [2], [23], [24], while a few were oriented to an overall interpretation of the patient’s behavior [12], [20], [29], including some commercial products, like Camit-Pro™ or Eurotouch™. While the difference in our approach with respect to the former class of such system

Conclusions — strengths and limitations

In Section 1 of this paper, we have presented a general multi-step methodology for exploiting TAs in the context of chronic patients’ monitoring. Such an approach has been applied to data coming from the home monitoring of diabetic patients.

The overall approach presents several novel aspects:

  • TAs are seen as a core tool for performing IDA on TS data. They are used not only for extracting information from the data, but also to generate high level summaries of the patient’s dynamic behavior. Such

Acknowledgements

We gratefully acknowledge Dr Giuseppe d’Annunzio and Dr Stefano Fiocchi for providing their support and medical knowledge. We sincerely thank Alberto Riva, without whom this work could not have been done. Finally, we thank the anonymous reviewers for their help in improving the paper. This work is part of the project T-IDDM (HC 1047), funded by the European Commission.

References (31)

  • Bellazzi R, Larizza C, Riva A. Temporal abstractions for interpreting chronic patients monitoring data, Int J Intell...
  • Bellazzi R, Magni P, De Nicolao G. Dynamic probabilistic networks for modelling and identifying dynamic systems: a MCMC...
  • Bellazzi R, Magni P, Larizza C, De Nicolao G, Riva A, Stefanelli M. Mining biomedical time series by combining...
  • Bellazzi R, et al. Intelligent analysis of clinical time series by combining structural filtering and temporal abstractions

  • Bellazzi R, Magni P, De Nicolao G. Bayesian analysis of blood glucose time series from diabetes home monitoring,...
