Abstract
Large amounts of data are collected by public administrations and healthcare organizations. Integrating the data scattered across several information systems can facilitate the comprehension of complex scenarios and support the activities of decision makers.
Unfortunately, the quality of information system archives is very poor, as widely stated in the existing literature. Data cleansing is one of the most frequently used data improvement techniques. Data can be cleansed in several ways; the optimal choice, however, depends strictly on the integration and analysis processes to be performed. Therefore, the design of a data analysis process should consider the data integration, cleansing, and analysis activities holistically. However, in the existing literature, data integration and cleansing issues have mostly been addressed in isolation.
In this paper we describe how a model-based cleansing framework is extended to also address integration activities. The combined approach facilitates the rapid prototyping, development, and evaluation of data pre-processing activities. Furthermore, the combined use of formal methods and visualization techniques strongly empowers the data analyst, who can effectively evaluate how cleansing and integration activities affect the data analysis. An example focusing on labour and healthcare data integration is shown.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Boselli, R., Cesarini, M., Mercorio, F., Mezzanzanica, M. (2014). A Policy-Based Cleansing and Integration Framework for Labour and Healthcare Data. In: Holzinger, A., Jurisica, I. (eds) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. Lecture Notes in Computer Science, vol 8401. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43968-5_8
DOI: https://doi.org/10.1007/978-3-662-43968-5_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43967-8
Online ISBN: 978-3-662-43968-5
eBook Packages: Computer Science, Computer Science (R0)