
Graph-Based Data Deduplication in Mobile Edge Computing Environment

  • Conference paper
  • Service-Oriented Computing (ICSOC 2021)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 13121)


Abstract

Mobile edge computing (MEC) extends cloud computing by deploying edge servers with computing and storage resources at base stations in users' geographic proximity. The networked edge servers in an area constitute an edge storage system (ESS), in which edge servers cooperate to serve the users in that area. However, the potential of ESSs is limited by edge servers' constrained storage resources, which stem from their limited physical size. A straightforward way to tackle this challenge is to reduce data redundancy in the ESS. The unique characteristics and constraints of the MEC environment, e.g., edge servers' geographic coverage and distribution, render conventional data deduplication techniques designed for cloud storage systems ill-suited. In this paper, we make the first attempt to study this novel Edge Data Deduplication (EDDE) problem. First, we model it as a constrained optimization problem that aims to maximize the data deduplication ratio under a latency constraint by exploiting the collaboration between edge servers. Then, we prove that the EDDE problem is \(\mathcal{NP}\)-hard and propose an approach named EDDE-O that solves it optimally based on integer programming. To accommodate large-scale EDDE scenarios, we propose a \((\ln\alpha + 1)\)-approximation algorithm, namely EDDE-A, that finds sub-optimal EDDE solutions efficiently. The results of extensive experiments conducted on a widely used dataset demonstrate that EDDE-O and EDDE-A solve the EDDE problem effectively and efficiently, significantly outperforming four representative approaches.
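The abstract does not spell out the EDDE model, so the sketch below is only an illustration under stated assumptions: edge servers form an undirected graph, the latency constraint is approximated by a maximum hop count `max_hops`, and, for a single data item, every edge server that originally held a copy must still reach a retained copy within that bound. Under these assumptions, choosing which copies to keep is a set-cover instance, and the classic greedy set-cover rule carries an \(H(\alpha) \le \ln\alpha + 1\) guarantee, where \(\alpha\) is the largest number of servers a single copy can serve. That bound has the same form as EDDE-A's, but `within_hops` and `greedy_dedup` are hypothetical names, not the authors' implementation.

```python
from collections import deque
from typing import Dict, List, Set

# Assumption: undirected edge-server graph as a symmetric adjacency list.
Graph = Dict[int, List[int]]


def within_hops(graph: Graph, src: int, max_hops: int) -> Set[int]:
    """Return the servers reachable from src in at most max_hops hops (BFS), src included."""
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the (assumed) hop-based latency bound
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return seen


def greedy_dedup(graph: Graph, replicas: Set[int], max_hops: int) -> Set[int]:
    """Greedy set cover: keep a subset of the existing copies so that every
    replica holder can fetch the data from a kept copy within max_hops."""
    # coverage[s]: replica holders that could be served by the copy on s
    coverage = {s: within_hops(graph, s, max_hops) & replicas for s in replicas}
    uncovered = set(replicas)
    kept: Set[int] = set()
    while uncovered:
        # Pick the copy serving the most still-unserved holders; iterating in
        # sorted order makes max() break ties toward the smallest server id.
        best = max(sorted(coverage), key=lambda s: len(coverage[s] & uncovered))
        kept.add(best)
        uncovered -= coverage[best]  # always shrinks: each holder covers itself
    return kept


if __name__ == "__main__":
    # Seven edge servers on a line (0-1-2-3-4-5-6), each holding a copy.
    path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
    kept = greedy_dedup(path, set(range(7)), max_hops=1)
    print(sorted(kept))  # [1, 4, 5]
    print(7 / len(kept))  # assumed ratio: original copies / retained copies
```

On this 7-server line, three copies remain, so the deduplication ratio under the assumed definition (original copies divided by retained copies) is 7/3. A weighted shortest-path latency could replace the hop count without changing the greedy rule.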


Notes

  1. Multiple data items can be deduplicated individually and independently.

  2. If the edge server covering a user does not have the data requested by the user, it retrieves the data from other edge servers. Thus, for ease of exposition, we refer to edge servers instead of users here.

  3. https://www.ibm.com/analytics/cplex-optimizer.

  4. https://www.gurobi.com/products/gurobi-optimizer/.

  5. https://github.com/swinedge/eua-dataset.
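Notes 3 and 4 point to the CPLEX and Gurobi solvers, and the abstract states that EDDE-O is based on integer programming. Assuming the problem for a single data item is viewed as a set cover (an assumption, not necessarily the paper's exact model), one integer program that such a solver could handle is sketched below. Here \(R\) is the set of edge servers holding a copy, \(d(u,s)\) is the latency between servers \(u\) and \(s\), \(L\) is the latency bound, and \(x_s = 1\) means the copy on \(s\) is kept.

```latex
\begin{aligned}
\min_{x}\quad & \sum_{s \in R} x_s \\
\text{s.t.}\quad & \sum_{s \in R \,:\, d(u,s) \le L} x_s \ge 1 && \forall u \in R, \\
& x_s \in \{0, 1\} && \forall s \in R.
\end{aligned}
```

Because \(|R|\) is fixed, minimizing the number of retained copies maximizes a deduplication ratio defined as \(|R| / \sum_{s \in R} x_s\).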


Acknowledgement

We thank the anonymous reviewers for their helpful feedback. This work is supported by the National Science Foundation of China under Grant No. 62032008.

Author information

Corresponding author

Correspondence to Song Wu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Luo, R., Jin, H., He, Q., Wu, S., Zeng, Z., Xia, X. (2021). Graph-Based Data Deduplication in Mobile Edge Computing Environment. In: Hacid, H., Kao, O., Mecella, M., Moha, N., Paik, Hy. (eds) Service-Oriented Computing. ICSOC 2021. Lecture Notes in Computer Science, vol. 13121. Springer, Cham. https://doi.org/10.1007/978-3-030-91431-8_31

  • DOI: https://doi.org/10.1007/978-3-030-91431-8_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91430-1

  • Online ISBN: 978-3-030-91431-8

  • eBook Packages: Computer Science, Computer Science (R0)
