BY-NC-ND 3.0 license Open Access Published by De Gruyter March 5, 2015

Data Stream Quality Evaluation for the Generation of Alarms in the Health Domain

  • Saúl Fagúndez, Joaquín Fleitas and Adriana Marotta

Abstract

The use of sensors has grown enormously in recent years, and sensors have become a valuable tool in many different areas. In this kind of scenario, the quality of data becomes an extremely important issue; however, this topic has received little attention, and only a few existing works focus on it. In this paper, we present a proposal for managing data streams from sensors that are installed in patients’ homes in order to monitor their health. The proposal focuses on processing the sensors’ data streams while taking data quality into account. To achieve this, we propose a data quality model for this kind of data stream and an architecture for the monitoring system. Moreover, our work introduces a mechanism for avoiding false alarms generated by data quality problems.

Mathematics Subject Classification: 00-02; 97R30; 97R50

1 Introduction

The use of sensors has grown enormously in recent years, and sensors have become a valuable tool in many different areas, such as weather forecasting, driving assistance, water level and quality monitoring, smart homes, and health monitoring. Sensors produce data streams, which in general have a simple structure but are generated at a very high rate. In this kind of scenario, the quality of data becomes an extremely important issue, especially in cases where critical decisions must be made on the basis of the obtained data. However, this topic has received little attention, and only a few existing works focus on it.

Health monitoring through sensors is sometimes used in the care of elderly people. Different kinds of sensors are installed in their homes and on the patients themselves in order to monitor their behavior and their vital signs (blood pressure and temperature) [7–10]. Behavior is important, for example, in the case of patients with Alzheimer’s disease. Data provided by the sensors are transmitted directly to the hospital, so that the patient can be monitored without having to travel from one place to another. The data received at the hospital are continuously evaluated by a monitoring system, which generates alarms when suspicious data are detected.

This work is situated in the context described in the previous paragraph, and focuses on processing the sensors’ data streams, taking into account data quality (DQ). In order to achieve this, a DQ model for the health sensors’ data streams, an architecture for the monitoring system, and a mechanism for health alarm and DQ alarm generation are proposed.

The main contribution of this work is the proposal of a DQ model specific to data streams coming from home and on-patient sensors.

DQ is represented by quality dimensions, each one representing a different quality aspect [3, 19, 21, 22]. Our approach is based on a DQ meta-model with three concepts: a quality dimension, which captures one aspect of DQ; a quality factor, which represents a particular aspect of a DQ dimension; and a quality metric, which defines the criteria for measuring a quality factor. Metrics may be applied to data objects at different granularity levels, e.g., a data item or a set of data items. The DQ dimensions used throughout this work are Accuracy, Completeness, Consistency, and Freshness, which are based on well-known concepts about which there is general consensus in the DQ literature [3, 19, 21, 22].

According to Ref. [12], a data stream is a continuous and ordered sequence of elements that arrive in real time. Dynamic queries are applied to data streams through data windows, which capture certain portions of the data in the stream. A window may be logically defined, by the number of elements it contains, or physically defined, by its duration, i.e., it contains the data that arrive at the stream during a certain time period [13, 18]. A Data Stream Management System (DSMS) provides a data model for managing dynamic data streams and continuous queries, which are queries that process data as they arrive [1, 2, 6, 11].
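The two kinds of window can be illustrated with a minimal Python sketch. It assumes non-overlapping (tumbling) windows and readings represented as `(timestamp, value)` pairs; both choices, as well as the function names, are assumptions made for illustration and are not prescribed by the cited works.

```python
from typing import Iterable, Iterator, List, Tuple

Reading = Tuple[float, float]  # (timestamp in seconds, sensor value)


def count_windows(stream: Iterable[Reading], size: int) -> Iterator[List[Reading]]:
    """Logical windows: emit a window every `size` elements.

    A trailing partial window is not emitted in this sketch.
    """
    window: List[Reading] = []
    for reading in stream:
        window.append(reading)
        if len(window) == size:
            yield window
            window = []


def time_windows(stream: Iterable[Reading], duration: float) -> Iterator[List[Reading]]:
    """Physical windows: emit the elements that arrive in each `duration`-second period.

    Periods in which nothing arrives produce empty windows.
    """
    window: List[Reading] = []
    window_end = None
    for ts, value in stream:
        if window_end is None:
            window_end = ts + duration  # the first period starts at the first element
        while ts >= window_end:
            yield window
            window = []
            window_end += duration
        window.append((ts, value))
    if window:
        yield window


stream = [(10, 2.5), (20, 2.5), (30, 2.0), (40, 2.0), (50, 0.3), (60, 0.4)]
print(list(count_windows(stream, size=3)))
print(list(time_windows(stream, duration=30)))
```

With one reading every 10 seconds, a count-based window of size 3 and a time-based window of 30 seconds select the same elements; they differ as soon as readings are missing or arrive irregularly.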

The rest of the paper is organized as follows. Section 2 refers to the related work. Section 3 presents the proposed system architecture. Section 4 focuses on the DQ model and its application. Section 5 presents an approach for alarm generation. Section 6 presents the conclusions and future work.

2 Related Work

As we previously stated, some work has been done in the area of sensor data stream quality. In Ref. [14], the author states that quality restrictions in this kind of data must not be ignored and should be carefully managed so that an exhaustive evaluation can be done. This is especially important in applications that consume sensor data directly, where the quality of the data becomes a critical issue. In some other applications, data from sensors are stored in a database in order to be processed later. In these cases, DQ is still essential for decision making supported by data. In Ref. [15], a data streaming meta-model is proposed in order to allow the propagation of DQ information toward the corresponding business application. The authors focus their analysis on the accuracy and completeness quality dimensions. Later, in Ref. [16], a more complete quality model is presented (five DQ dimensions are managed), and the impact of data stream processing operators on DQ is analyzed.

In Ref. [5], the authors present a model that is based on an intuitive notion of the completeness of sensor data. They measure the quantity of data that arrive at a consuming point and compare it with the maximum possible quantity of data at that point.
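As a rough illustration of this notion, completeness can be read as the ratio between the number of readings received and the maximum number that could have been received. The sketch below is only one possible reading of that idea; the function name and the cap at 1.0 are assumptions, not the formulation of Ref. [5].

```python
def completeness(received: int, expected_max: int) -> float:
    """Ratio of received readings to the maximum possible number of readings."""
    if expected_max <= 0:
        raise ValueError("expected_max must be positive")
    # Capped at 1.0 so that duplicated deliveries cannot exceed full completeness.
    return min(received / expected_max, 1.0)


# A sensor that reports every 10 seconds can deliver at most 60 readings in 10 minutes.
print(completeness(received=45, expected_max=60))
```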

In Ref. [20], some mechanisms for reducing energy consumption in sensor networks are proposed. These mechanisms assure a certain level of DQ, so that they provide a balance between energy efficiency and quality of aggregate data. They propose a metric, called relative error metric, which measures how close the exact answer and the approximate answer are. An approximate answer is one where some sensors fail to send their current reading or decide not to send it. Concerning quality dimensions, we note that they measure accuracy over aggregate data, and they also measure freshness as the response time.

A probabilistic approach is used in Ref. [17] for evaluating the quality of sensor data – modeling the uncertainty in sensor readings. DQ is managed at the different levels of data processing, from the sensor data values to the high-level situation detection.

Finally, an event-based solution for improving DQ in health systems is proposed in Ref. [4]. Event processing techniques are proposed for monitoring data streams exchanged between health organizations and detecting quality problems. They focus on two quality aspects: data consistency with respect to statistical data, and duplicate detection (of laboratory orders). They use alerts to notify detected problems.

Our work proposes a DQ model that is specific to the health monitoring context. It manages a broad set of quality dimensions and distinguishes different factors within them, which allows a more detailed and complete study of DQ. In addition to defining the DQ model, this work introduces a mechanism for avoiding false health alarms generated by DQ problems.

3 Health Monitoring System

We consider a smart home with three rooms (a bedroom, a kitchen, and a bathroom) and a person with Alzheimer’s disease. Each room is equipped with two ultrasound distance sensors that measure the distance from the sensor to an object. When the person is in the room, the sensors report the distance to that person. We also have two on-body sensors: a blood pressure sensor and a thermometer.

At the same time, there is a system that receives and manages data from the sensors in order to detect whether the person shows certain variations in his behavior or in his vital signs. It is a real-time, autonomous system that is able to analyze the data streams coming from different sensors and to send alarms in predefined situations.

3.1 Proposed Architecture

The proposed architecture consists of different components that are shown in Figure 1. The user’s access point to the system is the Monitoring component, where he should first define the requirements, quality parameters, and alarms needed in his particular context. Then, the Middleware is responsible for managing the execution of distributed and dynamic queries.

Figure 1: Architecture.

The Data Quality Manager is responsible for measuring the quality of data obtained from the queries and enriching the data with quality values. This module interacts with the Middleware, so that the Middleware is able to return the data window enriched with quality values to the Monitoring component. A database containing historical blood pressure data is maintained and queried by the Data Quality Manager component in order to evaluate the accuracy of pressure values, and by the Monitoring component in order to evaluate the situation of the patient.

The Monitoring component is responsible for carrying out the monitoring of the person at home. Some of its functions are to control the temperature and blood pressure of the patient and know in which room of the house the patient is located. It includes a system of alarms that are activated according to the parameters set and the information obtained from sensors.

The Data Processing component has the functionality of managing information obtained from the Middleware and returning the result to the Monitoring component.

4 DQ Model and Management

Several DQ issues must be considered in the proposed scenario:

(i) Possible errors in locating the person in the house because of wrong sensor measurements.

(ii) Absence of sensor measurements during a predefined time period; this problem affects all types of sensors.

(iii) Blood pressure values that are higher than the expected normal values according to the person’s historical blood pressure data. This may be due to a health problem in the patient (in which case an alarm should be sent; see Section 5) or to a DQ problem.

(iv) For both blood pressure and temperature sensor data, there is a maximum and a minimum valid value that should be respected.

(v) Adequate sensor measurement rate. Increasing the sensor measurement rate increases the energy cost and the network traffic; therefore, this rate should balance the data frequency needed against the energy and traffic that the system can support.

Taking into account the previously described problems and in order to manage them, we define a DQ model that specifies a set of metrics to be applied to the involved data. Table 1 shows the defined DQ model. A metric is defined for each quality factor applied to a kind of sensor; for example, the Dist-Prec metric is defined for the Precision factor applied to distance sensor data.

Table 1: DQ Model.

| Dimension | Factor | Sensor | Metric |
|---|---|---|---|
| Accuracy | Precision | Distance | Dist-Prec: Satisfaction of minimum distance supported by the sensor; result type: (0 … 1) |
| Accuracy | Semantic accuracy | Pressure | Pres-SAcc: Satisfaction of a maximum threshold calculated from historical personal pressure data; result type: (0 … 1) |
| Completeness | Density | Distance | Dist-Dens: Non-null values percentage; result type: (0 … 1) |
| Freshness | Currency | Distance, temperature, pressure | Dist-Curr, Temp-Curr, Pres-Curr: Time period between two consecutive data windows issued by a sensor; result type: integer (seconds) |
| Consistency | Domain integrity | Distance, temperature, pressure | Dist-Dom, Temp-Dom, Pres-Dom: Correspondence of the sensor values to a predefined domain: positive integer in a predefined range; result type: (0 … 1) |
| Consistency | Intersensor consistency | Distance | Dist-Cons: Consistency between the distance values of four different sensors at a certain moment; result type: (0 … 1) |

Each distance sensor has a minimum value from which it can measure a distance (its precision). Dist-Prec verifies whether the sensor values of a data window satisfy this minimum (issue i). Pres-SAcc evaluates whether blood pressure values are outside the expected values, in which case there can be a DQ problem or a situation that deserves special attention (issue iii). Dist-Dens is applied to distance sensors because a minimum quantity of non-null sensor values is needed for calculating a person’s location in a room (issue i). Each sensor should issue data with a minimum frequency that is defined in the system; the metrics for the Currency factor are used to verify that this requirement is satisfied (issues ii and v). The metrics for the Domain Integrity factor check whether the sensor values belong to certain integer ranges, which are defined in the system (issue iv). Dist-Cons is applied to several data windows coming from several different distance sensors at the same time. Its objective is to measure whether the values of the distance sensors of two different rooms are consistent, i.e., they do not show the presence of the person in both rooms at the same time (issue i).

The granularity of the defined metrics is the data window, except in the case of Dist-Cons, where the metric is associated with several sensor windows at the same time. Note that the Accuracy and Domain Integrity metrics compute their result as the proportion of window values that satisfy the required condition.
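A minimal Python sketch of two of these metrics follows, under stated assumptions. For Domain Integrity, null values are left out of the proportion, which is consistent with Table 2 (a window containing a null still has Domain integrity 1) but is not stated explicitly in the model. For Dist-Cons, the sketch assumes that an earlier step has already turned the distance readings into the set of rooms reporting presence at each instant; that step, the function names, and the value returned for an all-null window are hypothetical.

```python
from typing import Optional, Sequence, Set


def domain_integrity(values: Sequence[Optional[float]], low: float, high: float) -> float:
    """Proportion of non-null window values that fall inside [low, high]."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0  # arbitrary choice for an all-null window (assumption)
    return sum(low <= v <= high for v in present) / len(present)


def intersensor_consistency(rooms_with_presence: Sequence[Set[str]]) -> float:
    """Proportion of instants at which at most one room reports the person."""
    if not rooms_with_presence:
        raise ValueError("at least one instant is required")
    consistent = sum(len(rooms) <= 1 for rooms in rooms_with_presence)
    return consistent / len(rooms_with_presence)


# One out-of-range value among three readings.
print(domain_integrity([2.0, 0.3, 7.0], low=0.0, high=4.0))
# The person appears in the bedroom and the kitchen at the second instant.
print(intersensor_consistency([{"Be"}, {"Be", "K"}, {"K"}, set()]))
```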

The Data Quality Manager component attaches DQ information to the data streams. Figure 2 shows the conceptual schema corresponding to the data window and the DQ information attached to it. (This does not apply to the metric Dist-Cons, which is calculated by the Data Processing component.)

Figure 2: Data Window with DQ Information.

Example: Consider the distance sensors in the home’s rooms. Each data stream sent from the Middleware to the Data Processing component has the format shown in Table 2.

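Table 2: Distance Data Stream with Quality Information.

| Window | Timestamps | Distance values | Precision | Density | Currency | Domain integrity |
|---|---|---|---|---|---|---|
| 1 | 10, 20, 30 | 2.5, 2.5, 2 | 1 | 1 | 2 | 1 |
| 2 | 40, 50, 60 | 2, 0.3, 0.4 | 0.3 | 1 | 2 | 1 |
| 3 | 70, 80, 90 | 2.5, 2.5, and one null value | 1 | 0.7 | 2 | 1 |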

Quality values are calculated using the proposed quality model applying the respective quality metric for distance sensors over the data windows. In this example, in the second data window, there is a problem of sensor precision as some of its values are lower than the minimum for the sensor, so Precision = 0.3. Meanwhile, in the third window, there is a null value, so Density = 0.7.
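The Precision and Density values of this example can be reproduced with a short Python sketch. The minimum measurable distance (`MIN_DISTANCE = 0.5`) and the position of the null value inside the third window are assumptions chosen for illustration; the example only states that some values of the second window are below the sensor minimum and that the third window contains a null.

```python
from typing import List, Optional

MIN_DISTANCE = 0.5  # hypothetical minimum distance supported by the sensor

windows: List[List[Optional[float]]] = [
    [2.5, 2.5, 2.0],
    [2.0, 0.3, 0.4],
    [2.5, 2.5, None],  # the position of the null within the window is assumed
]


def precision(window: List[Optional[float]], min_distance: float) -> float:
    """Proportion of non-null values that reach the sensor's minimum distance."""
    present = [v for v in window if v is not None]
    if not present:
        return 0.0
    return sum(v >= min_distance for v in present) / len(present)


def density(window: List[Optional[float]]) -> float:
    """Proportion of non-null values in the window."""
    return sum(v is not None for v in window) / len(window)


for number, window in enumerate(windows, start=1):
    print(number, round(precision(window, MIN_DISTANCE), 1), round(density(window), 1))
```

Rounded to one decimal place, the sketch yields Precision 1, 0.3, 1 and Density 1, 1, 0.7 for the three windows, matching the values in Table 2.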

The Data Processing component integrates data from all distance sensors of the rooms, detecting where the person is located in the home and calculating associated quality information. Table 3 shows an example of the generated data stream. We consider a range of 10 min and the rooms of the house: bedroom (Be), kitchen (K), and bathroom (B). In the table, we can see that the system returns the location of the patient, and the corresponding quality values, using a window of size 3.

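Table 3: Data Stream Generated by the Data Processing Component.

| Window | Timestamps | Location | Precision | Density | Currency | Domain integrity | Intersensor consistency |
|---|---|---|---|---|---|---|---|
| 1 | 10, 20, 30 | Be, Be, Be | 0.8 | 0.8 | 1 | 1 | 1 |
| 2 | 40, 50, 60 | K, K, K | 1 | 0.7 | 0.8 | 0.8 | 1 |
| 3 | 70, 80, 90 | B, B, B | 0.9 | 0.8 | 1 | 0.9 | 1 |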

5 Alarm Generation

The system’s main function is to monitor the person at home using the installed sensors. This is achieved through the analysis of the data streams, considering the parameters set by the user as well as the quality of the data. Depending on this analysis, different outputs are obtained: if sensor errors are detected, DQ alarms are generated, whereas if potential patient health problems are detected, health alarms are generated.

In the following, we present two examples of possible situations that generate alarms.

Situation 1:

In order to detect where the person is located in the house, two distance sensors are placed in each room so that the system can get the position of the person.

  • If information is missing from one or both sensors (metric Dist-Dens), then the system returns a DQ alarm that indicates the DQ problem encountered.

  • If the person is located in two rooms simultaneously (metric Dist-Cons), then the system returns a DQ alarm that indicates the DQ problem encountered.

  • Otherwise, if two distance sensors locate the person in a room, and within a preestablished time period two other sensors detect the person in another room, and this behavior is repeated for another preestablished time period, then there could be a risk of agitation of the person, so the system returns a health alarm.

Figure 3 shows the dynamics of the system components and how they interact to send health alarms in this use case. If the patient moves from one room to another in a period of time shorter than a predefined parameter (change-period), this is considered a potential problem; if this behavior is repeated over a time period given by a second parameter (agit-period), the system should send an alarm indicating that the patient is suffering agitation. Figure 4 shows the algorithm.

Figure 3: Situation 1: Agitation of the Person.

Figure 4: Situation 1: Agitation Determination Algorithm.

The Monitoring system considers that the patient is probably agitated when he is detected in different rooms of the house in a short time period. Two parameters are set by the system user: change-period and agit-period. Change-period indicates a maximum for the time period that elapsed between the detections of the person in two different rooms. Agit-period indicates a minimum for the time period during which the patient is continuously changing his location in the house. Figure 4 presents the algorithm for determining if a health alarm caused by the person’s agitation is generated.
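The decision logic of Situation 1 can be sketched in Python as follows. This is not the algorithm of Figure 4 but one possible interpretation of the description above: a room change counts as rapid when the person stayed in the previous room for at most `change_period`, and agitation is reported when an uninterrupted run of rapid changes lasts at least `agit_period`. The function names and the `min_quality` threshold used to raise DQ alarms are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Detection = Tuple[float, str]  # (timestamp in seconds, room label)


def agitation_detected(detections: Sequence[Detection],
                       change_period: float, agit_period: float) -> bool:
    """True if rapid room changes continue for at least `agit_period` seconds."""
    prev_room: Optional[str] = None
    entered_at = 0.0                    # when the person entered the current room
    run_start: Optional[float] = None   # start of the current run of rapid changes
    for ts, room in detections:
        if prev_room is None:
            prev_room, entered_at = room, ts
            continue
        if room == prev_room:
            continue
        if ts - entered_at <= change_period:
            if run_start is None:
                run_start = entered_at
            if ts - run_start >= agit_period:
                return True
        else:
            run_start = None            # a long stay interrupts the run
        prev_room, entered_at = room, ts
    return False


def evaluate_situation_1(detections: Sequence[Detection], dist_dens: float,
                         dist_cons: float, change_period: float,
                         agit_period: float, min_quality: float = 1.0) -> Optional[str]:
    """DQ problems are reported first, so they cannot trigger a false health alarm."""
    if dist_dens < min_quality:
        return "DQ alarm: Dist-Dens"
    if dist_cons < min_quality:
        return "DQ alarm: Dist-Cons"
    if agitation_detected(detections, change_period, agit_period):
        return "health alarm: agitation"
    return None


moves = [(0, "Be"), (10, "K"), (20, "B"), (30, "Be"), (40, "K")]
print(evaluate_situation_1(moves, dist_dens=1.0, dist_cons=1.0,
                           change_period=15, agit_period=30))
```

Checking the DQ metrics before the health condition reflects the order of the three cases listed for Situation 1: a health alarm is only considered when no DQ problem has been found.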

Situation 2:

In order to detect high or low blood pressure in the patient, the system uses an on-body pressure sensor. This sensor periodically measures the patient’s pressure, and the system additionally uses historical pressure data to compare the values and detect a health problem.

  • If there is no information from the sensor in a predefined time period (metric Pres-Curr), then the system returns a DQ alarm that indicates the DQ problem encountered.

  • If information measured from the sensor is out of the parameters indicated in the system (metric Pres-Dom), then the system returns a DQ alarm that indicates the DQ problem encountered.

  • Otherwise, if the measured values exceed the maximum historical pressure value of the patient (obtained from the historical pressure database) or fall below the minimum historical pressure value, and if this behavior remains for a certain time, then this situation indicates high/low blood pressure of the patient and the system returns a health alarm.

Figure 5 shows the dynamics of the system components and how they interact to send health alarms in this use case.

Figure 5: Situation 2: High/Low Blood Pressure.

The Monitoring system considers that the patient has high blood pressure if the measured blood pressure exceeds the maximum historical pressure for a given number of consecutive measurements. Likewise, it considers that the patient has low blood pressure if the measured value falls below the minimum historical pressure for the same number of consecutive measurements. One parameter set by the system user, period_prec, indicates the number of consecutive measurements required to report a blood pressure problem.
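The rule described above can be sketched in Python as follows. The sketch is an illustration under stated assumptions rather than the algorithms of Figures 6 and 7: each reading is a single number, the historical minimum and maximum are passed in directly instead of being queried from the historical database, and the names other than `period_prec` (including `max_gap` and the domain bounds) are hypothetical.

```python
from typing import Optional, Sequence


def pressure_problem(readings: Sequence[float], hist_min: float,
                     hist_max: float, period_prec: int) -> Optional[str]:
    """Return "high" or "low" after `period_prec` consecutive out-of-history readings."""
    high_run = low_run = 0
    for value in readings:
        high_run = high_run + 1 if value > hist_max else 0
        low_run = low_run + 1 if value < hist_min else 0
        if high_run >= period_prec:
            return "high"
        if low_run >= period_prec:
            return "low"
    return None


def evaluate_situation_2(readings: Sequence[float], seconds_since_last_window: float,
                         max_gap: float, domain_low: float, domain_high: float,
                         hist_min: float, hist_max: float,
                         period_prec: int) -> Optional[str]:
    """DQ checks (Pres-Curr, Pres-Dom) come first; a health alarm is considered last."""
    if seconds_since_last_window > max_gap:
        return "DQ alarm: Pres-Curr"
    if any(not (domain_low <= v <= domain_high) for v in readings):
        return "DQ alarm: Pres-Dom"
    problem = pressure_problem(readings, hist_min, hist_max, period_prec)
    if problem is not None:
        return f"health alarm: {problem} blood pressure"
    return None


print(evaluate_situation_2([120, 150, 151, 152], seconds_since_last_window=5,
                           max_gap=60, domain_low=40, domain_high=250,
                           hist_min=90, hist_max=140, period_prec=3))
```

Because a value outside the valid domain is reported as a DQ alarm before the historical comparison is made, an implausible reading does not produce a health alarm in this sketch.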

Figures 6 and 7 present algorithms for determining if a health alarm caused by the person’s high/low blood pressure is generated.

Figure 6: Situation 2: High Blood Pressure Determination Algorithm.

Figure 7: Situation 2: Low Blood Pressure Determination Algorithm.

6 Conclusions

In this paper, we present a proposal for managing data streams from sensors that are installed in patients’ homes in order to monitor their health.

A set of possible problems in the sensors’ data is described and, taking into account these problems, a DQ model is proposed. In addition, an architecture for the system that is in charge of processing sensor data is proposed.

Finally, an approach for the generation of health alarms and DQ alarms is presented by means of two examples. We emphasize the importance of communicating DQ errors and their details. DQ alarms are the result of applying DQ metrics, which provide information about the kind of error and the degree of the problem. The user of the system thus has the elements needed to identify the data errors and take actions to correct and prevent them. The message is sent as a DQ alarm, which differentiates it from any other error message. Health alarms, in contrast, are intended to notify of a probable health problem in the patient.

The DQ model presented in this work was specifically designed for a particular context with particular kinds of sensors. However, this proposal can be seen as a step toward the definition of a general DQ model for sensor data streams.

This work is part of two ongoing postgraduate theses, which study in greater depth the most appropriate DQ dimensions and metrics for sensor data streams and explore the implementation of quality metrics using the particular features of DSMSs. In this context, implementations of the proposed solutions are in progress.


Corresponding author: Saúl Fagúndez, Universidad de la República, Facultad de Ingeniería, Instituto de Computación, Julio Herrera y Reissig 565, Código Postal 11.300, Montevideo, Uruguay

Bibliography

[1] A. Arasu, S. Babu and J. Widom, The CQL Continuous Query Language: Semantic Foundations and Query Execution, 2003.

[2] B. Babcock, S. Babu, M. Datar, R. Motwani and J. Widom, Models and Issues in Data Stream Systems, 2002. doi:10.1145/543613.543615

[3] C. Batini and M. Scannapieco, Data Quality: Concepts, Methodologies and Techniques, Springer-Verlag, Berlin, 2006.

[4] A. Berry and Z. Milosevic, Real-time analytics for legacy data streams in health: monitoring health data quality, in: Enterprise Distributed Object Computing Conference (EDOC), 2013 17th IEEE International, pp. 91–100, 2013. doi:10.1109/EDOC.2013.19

[5] J. Bitwas, F. Naumann and Q. Qiu, Assessing the Completeness of Sensor Data, in: 11th International Conference, DASFAA, Singapore, 2006. doi:10.1007/11733836_50

[6] D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul and S. Zdonik, Monitoring streams: a new class of data management applications, in: Proceedings of the 28th International Conference on Very Large Databases, pp. 215–226, VLDB Endowment, 2002. doi:10.1016/B978-155860869-6/50027-5

[7] Center for Future Health, Smart Medical Home Research Laboratory, University of Rochester, Retrieved on August 20, 2009 from http://www.futurehealth.rochester.edu/smart;home/, 2005.

[8] M. Chan, E. Campo, D. Estève and J. Y. Fourniols, Smart homes – current features and future perspectives, Maturitas 64 (2009), 90–97. doi:10.1016/j.maturitas.2009.07.014

[9] D. Cook, J. Augusto and V. R. Jakkula, Ambient intelligence: technologies, applications, and opportunities, Pervasive and Mobile Computing 5 (2009), 277–298. doi:10.1016/j.pmcj.2009.04.001

[10] R. V. Duchêne, F. Noury, N. Bajolle and L. Demongeot, Health “smart” home: information technology for patients at home, Telemedicine Journal and e-Health 8 (2002), 395–409. doi:10.1089/15305620260507530

[11] A. K. Elmagarmid, Stream Data Management. http://www.springeronline.com, Accessed August, 2013.

[12] L. Golab and M. Tamer Özsu, Issues in data stream management, ACM SIGMOD Record 32 (2003), 5–14. doi:10.1145/776985.776986

[13] G. Hebrail, Data stream management and mining, in: Mining Massive Data Sets for Security, pp. 89–102, Paris, 2008.

[14] A. Klein, Incorporating Quality Aspects in Sensor Data Streams, ACM first PhD, 2007. doi:10.1145/1316874.1316888

[15] A. Klein and W. Lehner, Representing data quality for streaming and static data, in: IEEE 23rd International Conference, Istanbul, 2007. doi:10.1109/ICDEW.2007.4400967

[16] A. Klein and W. Lehner, Representing data quality in sensor data streaming environments, Journal of Data and Information Quality 1 (2009), 10:1–10:28. doi:10.1145/1577840.1577845

[17] C. Kuka and D. Nicklas, Quality matters: supporting quality-aware pervasive applications by probabilistic data stream management, in: Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems, pp. 1–12, New York, 2014. doi:10.1145/2611286.2611292

[18] L. Petit, C. Labbé and C. L. Roncancio, An algebraic window model for data stream management, in: Proceedings of the Ninth ACM International Workshop on Data Engineering for Wireless and Mobile Access, pp. 17–24, ACM, New York, NY, USA, 2010. doi:10.1145/1850822.1850826

[19] L. Pipino, Y. W. Lee and R. Y. Wang, Data quality assessment, Communications of ACM 45 (2002), 211–218. doi:10.1145/505248.506010

[20] A. L. P. K. C. M. A. Sharaf, J. Beaver, A. Labrinidis and K. Chrysanthis, Balancing energy efficiency and quality of aggregate data in sensor networks, The VLDB Journal – The International Journal on Very Large Data Bases 13 (2004), 384–403. doi:10.1007/s00778-004-0138-0

[21] D. M. Strong, Y. W. Lee and Y. Wang, Data quality in context, Communications of ACM 40 (1997), 103–110. doi:10.1145/253769.253804

[22] R. Y. Wang and D. M. Strong, Beyond accuracy: what data quality means to data consumers, Journal of Management Information Systems 12 (1996), 5–33. doi:10.1080/07421222.1996.11518099

Received: 2014-10-31
Published Online: 2015-3-5
Published in Print: 2015-8-1

©2015 by De Gruyter

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Downloaded on 18.5.2024 from https://www.degruyter.com/document/doi/10.1515/jisys-2014-0166/html