Abstract
The Internet of Things (IoT) is ushering in the era of connected environments, i.e., networks of physical objects embedded with sensors and software that connect and exchange data with other devices and systems. The huge amount of data produced by such systems calls for solutions that reduce the amount of data being handled and transmitted over the network. In this study, we investigate data deduplication as a prominent pre-processing method that can address this challenge. Data deduplication techniques have traditionally been developed for data storage and data warehousing applications, and aim at identifying and eliminating redundant data items. Only a few recent approaches have been designed for sensor networks and connected environments, and existing solutions mostly rely on crisp thresholds and provide minimal-to-no expert control over the deduplication process, disregarding the domain expert’s needs in defining redundancy. In this paper, we propose a new approach for Fuzzy Redundancy Elimination for Data Deduplication (FREDD) in a connected environment. We use simple natural language rules to represent domain knowledge and expert preferences regarding data duplication boundaries. We then apply pattern codes and fuzzy reasoning to detect duplicate data items at the outermost edge (sensor node) level of the network. This reduces the time required to hard-code the deduplication process, while adapting to the domain expert’s needs for different data sources and applications. Experiments on a real-world dataset highlight our solution’s potential and its improvement over existing solutions.
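To make the idea of expert rules plus fuzzy reasoning more concrete, the following is a minimal sketch of fuzzy duplicate detection at a sensor node. It is not the paper’s method. The rule syntax, the trapezoidal term boundaries in `DIFFERENCE_TERMS`, the output levels in `REDUNDANCY_LEVELS`, the zero-order Sugeno (weighted-average) inference and the 0.7 decision threshold are all hypothetical choices made for illustration. The pattern codes mentioned above are not modeled. Each new reading is compared against the last reading that was kept (i.e., transmitted).

```python
import math

# Hypothetical expert-defined fuzzy sets over the absolute difference between
# two readings (in the sensor's unit, e.g., degrees Celsius).
# Each tuple (a, b, c, d) defines a trapezoidal membership function.
DIFFERENCE_TERMS = {
    "small": (0.0, 0.0, 0.2, 0.5),
    "medium": (0.2, 0.5, 1.0, 1.5),
    "large": (1.0, 1.5, math.inf, math.inf),
}

# Hypothetical crisp value for each output term (zero-order Sugeno consequents).
REDUNDANCY_LEVELS = {"high": 1.0, "medium": 0.5, "low": 0.0}

# Hypothetical expert rules written in a restricted natural-language form.
RULES = [
    "IF difference IS small THEN redundancy IS high",
    "IF difference IS medium THEN redundancy IS medium",
    "IF difference IS large THEN redundancy IS low",
]


def trapezoid(x, a, b, c, d):
    """Membership degree: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def parse_rule(rule):
    """Return (input term, output term) from 'IF difference IS <t> THEN redundancy IS <u>'."""
    tokens = rule.split()
    if len(tokens) != 8 or tokens[0] != "IF" or tokens[4] != "THEN":
        raise ValueError(f"Unsupported rule: {rule!r}")
    return tokens[3], tokens[7]


def redundancy_degree(previous, current):
    """Weighted average of rule outputs, weighted by each rule's firing strength."""
    diff = abs(current - previous)
    num = den = 0.0
    for rule in RULES:
        in_term, out_term = parse_rule(rule)
        strength = trapezoid(diff, *DIFFERENCE_TERMS[in_term])
        num += strength * REDUNDANCY_LEVELS[out_term]
        den += strength
    return num / den if den > 0 else 0.0


def deduplicate(readings, threshold=0.7):
    """Keep a reading only if it is not judged redundant w.r.t. the last kept one."""
    kept = []
    for value in readings:
        if kept and redundancy_degree(kept[-1], value) >= threshold:
            continue  # redundant: would not be transmitted
        kept.append(value)
    return kept


print(deduplicate([20.0, 20.1, 20.3, 21.0, 21.05, 23.0]))  # [20.0, 21.0, 23.0]
```

Because the term boundaries overlap, a difference of 0.3 is partly "small" and partly "medium". Its redundancy is then a graded value (about 0.83) rather than the all-or-nothing outcome of a crisp threshold. A domain expert could change the behavior by editing `RULES` and `DIFFERENCE_TERMS` without touching the inference code.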
Notes
1. Microgram per cubic meter (µg/m³).
2. We adopt a three-layer architecture: i) a Web API layer that allows client-side applications to communicate with the server (e.g., to request data); ii) a Business Logic layer where FREDD’s main decision-making processes are implemented; and iii) a Data Access layer where data storage and retrieval take place.
3.
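The three-layer architecture in note 2 can be sketched as follows. This is a minimal sketch under stated assumptions, not FREDD’s implementation. The class and method names (`DataAccessLayer`, `BusinessLogicLayer`, `WebApiLayer`, `post_reading`, `get_readings`) are hypothetical. Storage is an in-memory dictionary, and the "API" returns plain dictionaries instead of serving HTTP. The redundancy test is injected as a function, so a fuzzy check could be plugged in. The usage example uses a crisp placeholder only to keep the sketch short.

```python
from typing import Callable, Dict, List


class DataAccessLayer:
    """Hypothetical Data Access layer: stores and retrieves readings (in memory here)."""

    def __init__(self) -> None:
        self._store: Dict[str, List[float]] = {}

    def save(self, sensor_id: str, value: float) -> None:
        self._store.setdefault(sensor_id, []).append(value)

    def load(self, sensor_id: str) -> List[float]:
        return list(self._store.get(sensor_id, []))


class BusinessLogicLayer:
    """Hypothetical Business Logic layer: decides whether a reading is kept."""

    def __init__(self, dal: DataAccessLayer,
                 is_redundant: Callable[[float, float], bool]) -> None:
        self.dal = dal
        self.is_redundant = is_redundant  # pluggable redundancy test

    def ingest(self, sensor_id: str, value: float) -> bool:
        history = self.dal.load(sensor_id)
        if history and self.is_redundant(history[-1], value):
            return False  # redundant with the last stored reading: dropped
        self.dal.save(sensor_id, value)
        return True

    def readings(self, sensor_id: str) -> List[float]:
        return self.dal.load(sensor_id)


class WebApiLayer:
    """Hypothetical Web API layer: client-facing entry points returning dict payloads."""

    def __init__(self, logic: BusinessLogicLayer) -> None:
        self.logic = logic

    def post_reading(self, sensor_id: str, value: float) -> dict:
        stored = self.logic.ingest(sensor_id, value)
        return {"sensor": sensor_id, "value": value, "stored": stored}

    def get_readings(self, sensor_id: str) -> dict:
        return {"sensor": sensor_id, "readings": self.logic.readings(sensor_id)}


# Example wiring with a crisp placeholder redundancy test (hypothetical).
api = WebApiLayer(BusinessLogicLayer(DataAccessLayer(),
                                     lambda prev, cur: abs(cur - prev) < 0.5))
for v in (20.0, 20.2, 21.0):
    api.post_reading("node-1", v)
print(api.get_readings("node-1"))  # {'sensor': 'node-1', 'readings': [20.0, 21.0]}
```

Each layer talks only to the layer directly below it. The storage backend or the redundancy test can therefore be swapped without changing the API surface.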
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yakhni, S., Tekli, J., Mansour, E., Chbeir, R. (2023). Fuzzy Data Deduplication at Edge Nodes in Connected Environments. In: Younas, M., Awan, I., Grønli, TM. (eds) Mobile Web and Intelligent Information Systems. MobiWIS 2023. Lecture Notes in Computer Science, vol 13977. Springer, Cham. https://doi.org/10.1007/978-3-031-39764-6_8
DOI: https://doi.org/10.1007/978-3-031-39764-6_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-39763-9
Online ISBN: 978-3-031-39764-6
eBook Packages: Computer Science; Computer Science (R0)