
Applying Domain Knowledge for Data Quality Assessment in Dermatology

  • Conference paper
Intelligent Decision Technologies 2017 (IDT 2017)

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 73)


Abstract

The Dermatology Clinic at the Clinical Center of Vojvodina, Novi Sad, Serbia, has actively collected data on patients’ treatment, health insurance and examinations. These data were stored in comma-separated values (CSV) documents. Because many fields in these documents contain free-form text or allow null values, many records are inconsistent with the real-world state they describe. There is currently a strong need for an analytic system that can analyze these data and discover relevant patterns. Since such a system requires clean and accurate data, the quality of the data must first be assessed. Therefore, a data quality system should be designed and built with the goal of identifying inaccurate records so that they can be aligned with the real-world state. In our approach to data quality assessment, domain knowledge about the data is used to define rules, which are then used to evaluate data quality. In this paper, we present the architecture of a data quality system used to define and apply these rules. A domain expert first defines the rules, which are then applied to the data to determine how many records violate them and to identify the exact anomalies in those records. We also present a case study in which we applied this data quality system to the data collected by the Dermatology Clinic.
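The abstract describes rules that a domain expert defines and that are then applied to CSV records in order to count the records that violate them and pinpoint the exact anomalies. The paper's actual rule language and architecture are not reproduced here. What follows is a minimal Python sketch under stated assumptions: the column names (`patient_id`, `gender`, `insurance_no`, `diagnosis`), the formats being checked, and the helpers `not_empty`, `matches` and `assess` are all hypothetical illustrations, not the clinic's schema or the authors' implementation.

```python
import csv
import io
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# A CSV record as produced by csv.DictReader; missing trailing fields come back as None.
Record = Dict[str, Optional[str]]


@dataclass(frozen=True)
class Rule:
    """Hypothetical domain rule: a readable description plus a predicate on one record."""
    description: str
    check: Callable[[Record], bool]


def value(record: Record, field: str) -> str:
    """Return the trimmed field value, treating a missing or None value as ''."""
    return (record.get(field) or "").strip()


def not_empty(field: str) -> Rule:
    """Hypothetical rule: the field must contain some non-blank text."""
    return Rule(f"{field} must not be empty", lambda r: value(r, field) != "")


def matches(field: str, pattern: str) -> Rule:
    """Hypothetical rule: the whole (trimmed) field must match a regular expression."""
    regex = re.compile(pattern)
    return Rule(
        f"{field} must match {pattern}",
        lambda r: regex.fullmatch(value(r, field)) is not None,
    )


def assess(rows: Iterable[Record], rules: List[Rule]) -> Tuple[int, Dict[int, List[str]]]:
    """Apply every rule to every record.

    Returns the number of records violating at least one rule and, for each such
    record (keyed by row number, counting the header as row 1), the descriptions
    of the rules it violates.
    """
    anomalies: Dict[int, List[str]] = {}
    for row_no, record in enumerate(rows, start=2):
        failed = [rule.description for rule in rules if not rule.check(record)]
        if failed:
            anomalies[row_no] = failed
    return len(anomalies), anomalies


# Illustrative data only: columns and formats are assumptions, not the clinic's data.
SAMPLE_CSV = """patient_id,gender,insurance_no,diagnosis
1,F,12345678901,L40.0
2,X,,L20
3,M,98765432109,psoriasis
"""

RULES = [
    not_empty("patient_id"),
    matches("gender", r"[MF]"),
    matches("insurance_no", r"\d{11}"),               # assumed 11-digit insurance number
    matches("diagnosis", r"[A-Z]\d{2}(\.\d{1,2})?"),  # assumed ICD-10-style diagnosis code
]

if __name__ == "__main__":
    count, anomalies = assess(csv.DictReader(io.StringIO(SAMPLE_CSV)), RULES)
    print(f"{count} record(s) violate at least one rule")
    for row_no, failed in anomalies.items():
        print(f"  row {row_no}: {'; '.join(failed)}")
```

Keeping each rule as a description plus a predicate lets the report name exactly which rule each record breaks. This mirrors the two outputs the abstract mentions: the number of non-conforming records and the specific anomalies in each of them.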



Acknowledgements

The research presented in this paper was supported by the Ministry of Education, Science, and Technological Development of the Republic of Serbia under Grant III-44010. The authors are most grateful to the Clinical Center of Vojvodina for providing the data set and for its valuable support throughout the study.

Author information

Corresponding author

Correspondence to Nemanja Igić.


Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Igić, N., Terzić, B., Matić, M., Ivančević, V., Luković, I. (2018). Applying Domain Knowledge for Data Quality Assessment in Dermatology. In: Czarnowski, I., Howlett, R., Jain, L. (eds) Intelligent Decision Technologies 2017. IDT 2017. Smart Innovation, Systems and Technologies, vol 73. Springer, Cham. https://doi.org/10.1007/978-3-319-59424-8_14


  • DOI: https://doi.org/10.1007/978-3-319-59424-8_14


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59423-1

  • Online ISBN: 978-3-319-59424-8

  • eBook Packages: Engineering, Engineering (R0)
