Abstract
The provision of high-quality food data presents challenges for developers of health apps. There is no standardized data source that covers all food products available in Europe. Commercial data sources are expensive and do not permit long-term storage, whereas community-maintained open data sources often contain inconsistent, duplicate, and incomplete data. This thesis presents methods to load data from multiple sources into a central food data warehouse via an extract, transform, and load (ETL) process and to improve the data quality. Data profiling is used to detect inconsistencies and duplicates. With the help of machine learning methods and ontologies, data is completed and checked for plausibility against similar datasets. Via a dedicated API, a usage context can be sent to the central food data warehouse together with the search term to be queried. The API returns the food data results, which have been checked against that context, and indicates whether the quality of the result data is sufficient for it. All developed methods are tested using linearly sampled test data.
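The abstract describes the context-aware query only at a high level. The Python sketch below shows one way such a check could work. It assumes that a usage context is expressed as a set of nutrient fields a result must carry. It also assumes that plausibility is tested with one simple rule: protein, fat, and carbohydrate per 100 g cannot be negative or sum to more than 100 g. `FoodRecord`, `CONTEXT_REQUIREMENTS`, `is_plausible`, `query`, and the context names are hypothetical illustrations, not the authors' API or data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FoodRecord:
    """One product row in a hypothetical central food data warehouse."""
    name: str
    nutrients: dict[str, float] = field(default_factory=dict)  # kcal or grams per 100 g
    allergens_declared: bool = False


# Hypothetical usage contexts: the nutrient fields a result must carry
# before its quality counts as sufficient for that context.
CONTEXT_REQUIREMENTS: dict[str, set[str]] = {
    "calorie_tracking": {"energy_kcal"},
    "diabetes_diary": {"energy_kcal", "carbohydrate", "sugars"},
    "allergy_diary": set(),  # checked via allergens_declared instead
}

MACROS = ("protein", "fat", "carbohydrate")


def is_plausible(rec: FoodRecord) -> bool:
    """Macronutrients per 100 g can neither be negative nor sum to more than 100 g."""
    values = [rec.nutrients.get(m, 0.0) for m in MACROS]
    return all(v >= 0 for v in values) and sum(values) <= 100.0


def query(records: list[FoodRecord], search_word: str, context: str) -> list[dict]:
    """Return records matching search_word, each with a context-specific quality verdict."""
    required = CONTEXT_REQUIREMENTS[context]
    results = []
    for rec in records:
        if search_word.lower() not in rec.name.lower():
            continue
        # Fields the context needs but this record lacks.
        missing = sorted(required - set(rec.nutrients))
        sufficient = not missing and is_plausible(rec)
        if context == "allergy_diary":
            sufficient = sufficient and rec.allergens_declared
        results.append({"name": rec.name, "sufficient": sufficient, "missing": missing})
    return results


# Illustrative values only, not taken from any real food database.
records = [
    FoodRecord("Whole milk", {"energy_kcal": 64, "protein": 3.3, "fat": 3.6,
                              "carbohydrate": 4.8, "sugars": 4.8}),
    FoodRecord("Milk chocolate", {"energy_kcal": 535, "protein": 7.6, "fat": 30.0}),
]
for hit in query(records, "milk", "diabetes_diary"):
    print(hit)
```

Here the same search term yields one result that is sufficient for the diabetes context and one that is flagged as missing carbohydrate and sugar values. Keeping the requirements as data rather than code means a new usage context can be added without changing the query logic.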
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Muenzberg, A., Sauer, J., Hein, A., Roesch, N. (2022). Machine Learning and Context-Based Approaches to Get Quality Improved Food Data. In: Yang, XS., Sherratt, S., Dey, N., Joshi, A. (eds) Proceedings of Sixth International Congress on Information and Communication Technology. Lecture Notes in Networks and Systems, vol 236. Springer, Singapore. https://doi.org/10.1007/978-981-16-2380-6_37
DOI: https://doi.org/10.1007/978-981-16-2380-6_37
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-2379-0
Online ISBN: 978-981-16-2380-6
eBook Packages: Intelligent Technologies and Robotics (R0)