A Policy-Based Cleansing and Integration Framework for Labour and Healthcare Data

Boselli, Roberto; Cesarini, Mirko; Mercorio, Fabio; Mezzanzanica, Mario

doi:10.1007/978-3-662-43968-5_8


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8401)

3615 Accesses
4 Citations

Abstract

Public administrations and healthcare organizations collect large amounts of data. Integrating the data scattered across several information systems can facilitate the comprehension of complex scenarios and support the activities of decision makers.

Unfortunately, as widely reported in the existing literature, the quality of information system archives is very poor. Data cleansing is one of the most frequently used data improvement techniques. Data can be cleansed in several ways; however, the optimal choice strictly depends on the integration and analysis processes to be performed. Therefore, the design of a data analysis process should consider data integration, cleansing, and analysis activities holistically. However, the existing literature has mostly addressed data integration and cleansing issues in isolation.
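The claim that the best cleansing choice depends on the downstream analysis can be illustrated with a small, hypothetical example. The Python sketch below assumes employment spells whose cessation date may be missing and compares two made-up cleansing policies: imputing the end date from the next spell, or discarding the open spell. The names `impute_end`, `drop_open`, and `employed_days` are illustrative assumptions, not the chapter's method.

```python
from datetime import date, timedelta
from typing import Optional

# A spell is (start, end); end=None marks a missing cessation event,
# i.e. the inconsistency that a cleansing policy has to resolve.
Spell = tuple[date, Optional[date]]


def impute_end(spells: list[Spell]) -> list[tuple[date, date]]:
    """Hypothetical policy A: close an open spell the day before the next one starts."""
    ordered = sorted(spells, key=lambda s: s[0])
    cleansed = []
    for i, (start, end) in enumerate(ordered):
        if end is None:
            if i + 1 == len(ordered):
                continue  # no later spell to bound it: drop the trailing open spell
            end = ordered[i + 1][0] - timedelta(days=1)
        cleansed.append((start, end))
    return cleansed


def drop_open(spells: list[Spell]) -> list[tuple[date, date]]:
    """Hypothetical policy B: discard every spell whose cessation event is missing."""
    return sorted((s, e) for s, e in spells if e is not None)


def employed_days(spells: list[tuple[date, date]], first: date, last: date) -> int:
    """Indicator: number of days in [first, last] covered by at least one spell."""
    covered = set()
    for start, end in spells:
        day = max(start, first)
        while day <= min(end, last):
            covered.add(day)
            day += timedelta(days=1)
    return len(covered)


if __name__ == "__main__":
    spells = [(date(2012, 1, 1), None), (date(2012, 7, 1), date(2012, 12, 31))]
    window = (date(2012, 1, 1), date(2012, 12, 31))
    print(employed_days(impute_end(spells), *window))  # 366
    print(employed_days(drop_open(spells), *window))   # 184
```

Both policies remove the inconsistency, yet the yearly indicator differs by about half a year, which is why a cleansing policy is hard to choose without knowing the analysis it feeds.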

In this paper, we describe how a model-based cleansing framework is extended to also address integration activities. The combined approach facilitates the rapid prototyping, development, and evaluation of data pre-processing activities. Furthermore, the combined use of formal methods and visualization techniques strongly empowers data analysts, who can effectively evaluate how cleansing and integration activities affect the data analysis. An example focusing on labour and healthcare data integration is shown.
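To make the idea of a model-based check over integrated data more concrete, the following Python sketch merges two hypothetical per-person event streams (labour and healthcare) into one time-ordered sequence and replays it through a small finite-state model, flagging events that have no valid transition. The event kinds, the joint `(employed, hospitalised)` state, and the functions `integrate` and `check` are assumptions made for illustration; the chapter's own framework, which relies on formal methods, is not reproduced here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional

State = tuple[bool, bool]  # (employed, hospitalised): hypothetical joint state


@dataclass(frozen=True)
class Event:
    when: date
    source: str  # hypothetical source names: "labour" or "health"
    kind: str    # "start", "cease", "admit" or "discharge"


# Hypothetical transition model: each event maps a state to the next one,
# or to None when the event is not allowed in that state.
TRANSITIONS: dict[str, Callable[[State], Optional[State]]] = {
    "start": lambda s: (True, s[1]) if not s[0] else None,
    "cease": lambda s: (False, s[1]) if s[0] else None,
    "admit": lambda s: (s[0], True) if not s[1] else None,
    "discharge": lambda s: (s[0], False) if s[1] else None,
}


def integrate(labour: list[Event], health: list[Event]) -> list[Event]:
    """Merge the two per-person event streams into one time-ordered sequence."""
    return sorted(labour + health, key=lambda e: (e.when, e.source))


def check(events: list[Event], initial: State = (False, False)) -> list[tuple[int, Event]]:
    """Replay events through the model and return those with no valid transition."""
    state = initial
    violations = []
    for i, event in enumerate(events):
        nxt = TRANSITIONS[event.kind](state)
        if nxt is None:
            violations.append((i, event))  # flag it and keep the current state
        else:
            state = nxt
    return violations


if __name__ == "__main__":
    labour = [Event(date(2013, 1, 10), "labour", "start"),
              Event(date(2013, 3, 1), "labour", "start")]  # second start, no cessation
    health = [Event(date(2013, 2, 1), "health", "admit"),
              Event(date(2013, 2, 10), "health", "discharge")]
    for index, event in check(integrate(labour, health)):
        print(index, event.when, event.kind)  # 3 2013-03-01 start
```

The sketch only detects inconsistencies; deciding how to repair each flagged event is a separate cleansing step.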



Author information

Authors and Affiliations

Department of Statistics and Quantitative Methods, University of Milano Bicocca, Milan, Italy
Roberto Boselli, Mirko Cesarini & Mario Mezzanzanica
CRISP Research Centre, University of Milano Bicocca, Milan, Italy
Roberto Boselli, Mirko Cesarini, Fabio Mercorio & Mario Mezzanzanica


Editor information

Editors and Affiliations

Research Unit Human-Computer Interaction, Austrian IBM Watson Think Group, Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Auenbruggerplatz 2/V, 8036, Graz, Austria
Andreas Holzinger
IBM Life Sciences Discovery Centre, TECHNA for the Advancement of Technology for Health, Princess Margaret Cancer Centre, University Health Network, TMDT Room 11-314, 101 College Street, M5G 1L7, Toronto, ON, Canada
Igor Jurisica


About this chapter

Cite this chapter

Boselli, R., Cesarini, M., Mercorio, F., Mezzanzanica, M. (2014). A Policy-Based Cleansing and Integration Framework for Labour and Healthcare Data. In: Holzinger, A., Jurisica, I. (eds) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. Lecture Notes in Computer Science, vol 8401. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43968-5_8


DOI: https://doi.org/10.1007/978-3-662-43968-5_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43967-8
Online ISBN: 978-3-662-43968-5
eBook Packages: Computer Science, Computer Science (R0)
