Analysis of incomplete and inconsistent clinical survey data

Arslanturk, Suzan; Siadat, Mohammad-Reza; Ogunyemi, Theophilus; Killinger, Kim; Diokno, Ananias

doi:10.1007/s10115-015-0850-7

Regular Paper
Published: 08 July 2015

Knowledge and Information Systems, Volume 46, pages 731–750 (2016)

Suzan Arslanturk ORCID: orcid.org/0000-0002-4554-4373¹,
Mohammad-Reza Siadat¹,
Theophilus Ogunyemi²,
Kim Killinger³ &
Ananias Diokno³

Abstract

Clinical data from survey trials are commonly incomplete and inconsistent, for several reasons. Inconsistent data occur when more than one set of mutually exclusive alternative questions is answered. One objective of this study was to identify and eliminate inconsistent data as an important preprocessing step for data mining. We define three types of incomplete data: missing data due to skip pattern (SPMD), undetermined missing data (UMD), and genuine missing data (GMD). Identifying the type of missing data is another important objective, because not all types of missing data can be treated the same way. This goal cannot be achieved manually on large data sets from complex surveys, since each subject must be processed individually. The analyses are carried out in a mathematical framework that exploits the graph-theoretic structure inherent in the questionnaire. An undirected graph is built from mutually inconsistent responses, together with its complement graph. Responses that are not in the largest maximal clique of the complement graph are considered inconsistent. This guarantees that as few responses as possible are removed while the remaining ones are mutually consistent. Further, all potential paths in the questionnaire's graph are considered, based on each subject's responses, to identify each type of incomplete data. Experiments were conducted on MESA data. The results show 15.4% GMD, 9.8% SPMD, 12.9% UMD, and 0.021% inconsistent data. The approach has two further uses: (a) the SPMD can be used for data stratification, and (b) the inconsistent data can be used for noise estimation. The proposed method is a preprocessing prerequisite for any data mining of clinical survey data.
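The clique step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the response labels, the inconsistent pairs, and the function names (`complement_graph`, `maximal_cliques`, `split_responses`) are all hypothetical, and the maximal cliques are enumerated with the standard Bron-Kerbosch algorithm with pivoting, which the abstract does not name. When two cliques tie for largest, this sketch simply keeps the first one found, which is also an assumption.

```python
from itertools import combinations


def complement_graph(nodes, inconsistent_pairs):
    """Adjacency sets of the complement graph: an edge joins two
    responses that are NOT listed as mutually inconsistent."""
    bad = {frozenset(pair) for pair in inconsistent_pairs}
    adj = {n: set() for n in nodes}
    for u, v in combinations(nodes, 2):
        if frozenset((u, v)) not in bad:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def maximal_cliques(adj):
    """Yield every maximal clique (Bron-Kerbosch with pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield set(r)  # r cannot be extended, so it is maximal
            return
        # Pivot on the node with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def split_responses(nodes, inconsistent_pairs):
    """Return (kept, dropped): the largest mutually consistent set of
    responses, and the responses outside it."""
    adj = complement_graph(nodes, inconsistent_pairs)
    largest = max(maximal_cliques(adj), key=len)
    return largest, set(nodes) - largest


# Hypothetical subject: denies the symptom (Q1=no) but also answers two
# follow-up questions that only apply when the symptom is present.
responses = ["Q1=no", "Q2=daily", "Q3=pads", "Q4=age"]
conflicts = [("Q1=no", "Q2=daily"), ("Q1=no", "Q3=pads")]
kept, dropped = split_responses(responses, conflicts)
print(sorted(kept), sorted(dropped))
```

In this toy case the complement graph has two maximal cliques, one with two responses and one with three, so only the single response outside the larger clique is flagged. Enumerating maximal cliques is exponential in the worst case, but the graph is built per subject and is bounded by the number of questions in the survey.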

Acknowledgments

The project described was supported by Grant Number R01AG038673 from the National Institutes of Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute on Aging or the National Institutes of Health.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Oakland University, 540 Engineering Center, 2200 Squirrel Rd., Rochester, MI, 48309, USA
Suzan Arslanturk & Mohammad-Reza Siadat
Department of Mathematics and Statistics, Oakland University, Rochester, MI, 48309, USA
Theophilus Ogunyemi
Department of Urology, William Beaumont Hospital, Royal Oak, MI, 48073, USA
Kim Killinger & Ananias Diokno

Corresponding author

Correspondence to Suzan Arslanturk.

Additional information

This work was supported in part by NIH Grant# R01AG038673.

About this article

Cite this article

Arslanturk, S., Siadat, MR., Ogunyemi, T. et al. Analysis of incomplete and inconsistent clinical survey data. Knowl Inf Syst 46, 731–750 (2016). https://doi.org/10.1007/s10115-015-0850-7

Received: 26 July 2014
Revised: 14 February 2015
Accepted: 07 June 2015
Published: 08 July 2015
Issue Date: March 2016
DOI: https://doi.org/10.1007/s10115-015-0850-7
