Abstract
In many application domains, particularly healthcare, access to individual data points is limited, while aggregated data, such as means and standard deviations, are widely available. This limitation results from many factors, including privacy laws that prevent clinicians and scientists from freely sharing individual patient data, the inability to share proprietary business data, and inadequate data collection methods. Consequently, traditional machine learning methods cannot be applied to model construction. The problem is especially important when a study compares multiple datasets, each derived from a different open-access publication in which data are reported in aggregated form. This chapter describes the problem of learning models from aggregated data and contrasts it with traditional learning from individual examples. It presents a method of rule induction from such data, as well as an application of this method to constructing predictive models for diagnosing liver complications of metabolic syndrome, one of the most common chronic diseases in humans. Other possible applications of the method are also discussed.
Keywords
- Metabolic Syndrome
- Aggregate Data
- Clinical Decision Support System
- Inductive Logic Programming
- Rule Induction
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Wojtusiak, J., Baranova, A. (2011). Model Learning from Published Aggregated Data. In: Biba, M., Xhafa, F. (eds) Learning Structure and Schemas from Documents. Studies in Computational Intelligence, vol 375. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22913-8_17
DOI: https://doi.org/10.1007/978-3-642-22913-8_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22912-1
Online ISBN: 978-3-642-22913-8
eBook Packages: Engineering, Engineering (R0)