Fuzzy Data Deduplication at Edge Nodes in Connected Environments

  • Conference paper
Mobile Web and Intelligent Information Systems (MobiWIS 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13977)

Abstract

The Internet of Things (IoT) is ushering in the era of connected environments, i.e., networks of physical objects that are embedded with sensors and software, and that connect and exchange data with other devices and systems. The huge amount of data produced by such systems calls for solutions that reduce the volume of data being handled and transmitted over the network. In this study, we investigate data deduplication as a prominent pre-processing method that can address this challenge. Data deduplication techniques have traditionally been developed for data storage and data warehousing applications, and aim at identifying and eliminating redundant data items. A few recent approaches have been designed for sensor networks and connected environments, yet existing solutions mostly rely on crisp thresholds and provide minimal-to-no expert control over the deduplication process, disregarding the domain expert's needs in defining redundancy. In this study, we propose a new approach for Fuzzy Redundancy Elimination for Data Deduplication (FREDD) in a connected environment. We use simple natural language rules to represent domain knowledge and expert preferences regarding data duplication boundaries. We then apply pattern codes and fuzzy reasoning to detect duplicate data items at the outermost edge (sensor node) level of the network. This reduces the time required to hard-code the deduplication process, while adapting to the domain expert's needs for different data sources and applications. Experiments on a real-world dataset highlight our solution's potential and its improvement over existing solutions.
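
To illustrate the contrast between crisp thresholds and fuzzy duplication boundaries, the following is a minimal Python sketch. It is not the FREDD implementation: the membership function, the single rule, the numeric boundaries, and all names (`falling_shoulder`, `Reading`, `duplicate_degree`, `deduplicate`, `cut`) are assumptions chosen for illustration, and the pattern codes mentioned above are not modelled.

```python
from dataclasses import dataclass


def falling_shoulder(x: float, full: float, zero: float) -> float:
    """Membership is 1.0 up to `full`, then falls linearly to 0.0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


@dataclass
class Reading:
    value: float      # e.g. temperature in degrees Celsius (assumed unit)
    timestamp: float  # seconds since start of capture (assumed unit)


def duplicate_degree(prev: Reading, new: Reading) -> float:
    """Degree in [0, 1] to which `new` duplicates `prev`.

    Hypothetical expert rule:
      IF value difference is SMALL AND time gap is SHORT THEN DUPLICATE.
    The fuzzy AND is modelled with min; boundaries below are illustrative.
    """
    diff_small = falling_shoulder(abs(new.value - prev.value), full=0.2, zero=1.0)
    gap_short = falling_shoulder(new.timestamp - prev.timestamp, full=30.0, zero=120.0)
    return min(diff_small, gap_short)


def deduplicate(readings, cut=0.5):
    """Keep a reading only if its duplicate degree, measured against the
    last kept reading, is below the expert-chosen `cut`."""
    kept = []
    for r in readings:
        if not kept or duplicate_degree(kept[-1], r) < cut:
            kept.append(r)
    return kept


readings = [
    Reading(20.0, 0.0),
    Reading(20.1, 10.0),   # close in value and time: full duplicate
    Reading(20.5, 20.0),   # partially "small" difference: degree 0.625
    Reading(21.5, 30.0),   # large difference: not a duplicate
    Reading(21.5, 200.0),  # same value but long gap: not a duplicate
]
kept = deduplicate(readings)
print([(r.value, r.timestamp) for r in kept])
```

Under these assumed boundaries, a reading that differs by 0.5 from the last kept value is a duplicate to degree 0.625 rather than being forced to one side of a hard threshold; changing `cut` or the membership boundaries changes the outcome without rewriting the deduplication logic.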

Notes

  1. Microgram per cubic meter (µg/m³).

  2. We adopt a three-layer architecture: i) a Web API layer that allows client-side applications to communicate with the server to request data, etc.; ii) a Business Logic layer where FREDD's main decision-making processes are implemented; and iii) a Data Access layer where data storage and retrieval take place.

  3. http://sigappfr.acm.org/projects/fredd/

Author information

Corresponding author

Correspondence to Joe Tekli.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Yakhni, S., Tekli, J., Mansour, E., Chbeir, R. (2023). Fuzzy Data Deduplication at Edge Nodes in Connected Environments. In: Younas, M., Awan, I., Grønli, TM. (eds) Mobile Web and Intelligent Information Systems. MobiWIS 2023. Lecture Notes in Computer Science, vol 13977. Springer, Cham. https://doi.org/10.1007/978-3-031-39764-6_8

  • DOI: https://doi.org/10.1007/978-3-031-39764-6_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-39763-9

  • Online ISBN: 978-3-031-39764-6

  • eBook Packages: Computer Science, Computer Science (R0)
