
A Policy-Based Cleansing and Integration Framework for Labour and Healthcare Data

Interactive Knowledge Discovery and Data Mining in Biomedical Informatics

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8401))

Abstract

Public administrations and healthcare organizations collect large amounts of data. Integrating the data scattered across several information systems can facilitate the comprehension of complex scenarios and support the activities of decision makers.

Unfortunately, the quality of information system archives is very poor, as widely reported in the existing literature. Data cleansing is one of the most frequently used data improvement techniques. Data can be cleansed in several ways; the optimal choice, however, depends strictly on the integration and analysis processes to be performed. Therefore, the design of a data analysis process should consider the data integration, cleansing, and analysis activities holistically. In the existing literature, however, data integration and cleansing issues have mostly been addressed in isolation.

In this paper we describe how a model-based cleansing framework is extended to also address integration activities. The combined approach facilitates the rapid prototyping, development, and evaluation of data pre-processing activities. Furthermore, the combined use of formal methods and visualization techniques strongly empowers the data analyst, who can effectively evaluate how cleansing and integration activities affect the data analysis. An example focusing on labour and healthcare data integration is presented.




Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Boselli, R., Cesarini, M., Mercorio, F., Mezzanzanica, M. (2014). A Policy-Based Cleansing and Integration Framework for Labour and Healthcare Data. In: Holzinger, A., Jurisica, I. (eds) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. Lecture Notes in Computer Science, vol 8401. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43968-5_8


  • DOI: https://doi.org/10.1007/978-3-662-43968-5_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-43967-8

  • Online ISBN: 978-3-662-43968-5

  • eBook Packages: Computer Science (R0)
