Abstract
Mobile edge computing (MEC) extends cloud computing by deploying edge servers with computing and storage resources at base stations in close geographic proximity to users. The networked edge servers in an area constitute an edge storage system (ESS), in which edge servers cooperate to provide services for the users in that area. However, the potential of ESSs is limited by edge servers’ constrained storage resources, a consequence of their limited physical size. A straightforward way to tackle this challenge is to reduce data redundancy in the ESS. The unique characteristics and constraints of the MEC environment, e.g., edge servers’ geographic coverage and distribution, render conventional data deduplication techniques designed for cloud storage systems obsolete. In this paper, we make the first attempt to study this novel Edge Data Deduplication (EDDE) problem. First, we model it as a constrained optimization problem that aims to maximize the data deduplication ratio under a latency constraint by taking advantage of the collaboration between edge servers. Then, we prove that the EDDE problem is \(\mathcal{NP}\)-hard and propose an approach named EDDE-O that solves the EDDE problem optimally based on integer programming. To accommodate large-scale EDDE scenarios, we propose a \((\ln \alpha + 1)\)-approximation algorithm, namely EDDE-A, to find sub-optimal EDDE solutions efficiently. The results of extensive experiments conducted on a widely-used dataset demonstrate that EDDE-O and EDDE-A solve the EDDE problem effectively and efficiently, significantly outperforming four representative approaches.
Notes
- 1.
Multiple data items can be deduplicated individually and independently.
- 2.
If the edge server covering a user does not have the data requested by that user, it retrieves the data from other edge servers. Thus, we refer to edge servers instead of users here for ease of exposition.
- 3.
- 4.
- 5.
Acknowledgement
We thank the anonymous reviewers for their helpful feedback. This work is supported by the National Science Foundation of China under Grant No. 62032008.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Luo, R., Jin, H., He, Q., Wu, S., Zeng, Z., Xia, X. (2021). Graph-Based Data Deduplication in Mobile Edge Computing Environment. In: Hacid, H., Kao, O., Mecella, M., Moha, N., Paik, Hy. (eds) Service-Oriented Computing. ICSOC 2021. Lecture Notes in Computer Science, vol 13121. Springer, Cham. https://doi.org/10.1007/978-3-030-91431-8_31
DOI: https://doi.org/10.1007/978-3-030-91431-8_31
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91430-1
Online ISBN: 978-3-030-91431-8
eBook Packages: Computer Science, Computer Science (R0)