Abstract
In this work, we consider the problem of summarizing a data stream through an item-based summary built from core items. We focus on an IoT setting, where computing such summaries on the edge devices, instead of emitting the whole data stream, can drastically reduce network traffic and speed up further processing. The core items of a data stream are the items with the highest values for a given monotone submodular utility function. To create stream summaries, we propose the SoftSieving approach, which supports parallel processing with low memory consumption and short execution time while attaining acceptable utility gain. Through extensive experiments with real-world datasets, we show the suitability of our approach and its superiority over state-of-the-art competitors.
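To make the notion of selecting items under a monotone submodular utility more concrete, the following is a minimal sketch of a generic single-threshold streaming selection. It is not the SoftSieving algorithm, whose details the abstract does not give. The utility (`coverage`), the function `threshold_summary`, the parameters `k` and `threshold`, and the toy sensor readings are all assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, Set


def coverage(summary: List[FrozenSet[str]]) -> int:
    """Hypothetical utility: number of distinct features covered.

    Set coverage is a standard example of a monotone submodular function.
    """
    covered: Set[str] = set()
    for item in summary:
        covered |= item
    return len(covered)


def threshold_summary(
    stream: Iterable[FrozenSet[str]], k: int, threshold: int
) -> List[FrozenSet[str]]:
    """Keep an item if its marginal gain reaches `threshold`, up to k items.

    One pass over the stream; memory is bounded by the k stored items.
    """
    summary: List[FrozenSet[str]] = []
    current = 0  # utility of the summary built so far
    for item in stream:
        if len(summary) >= k:
            break
        gain = coverage(summary + [item]) - current
        if gain >= threshold:
            summary.append(item)
            current += gain
    return summary


# Toy stream of readings, each described by the features it reports (illustrative data).
stream = [
    frozenset({"temp", "humidity"}),
    frozenset({"temp"}),
    frozenset({"co", "smoke"}),
    frozenset({"humidity", "co"}),
    frozenset({"light", "motion", "lpg"}),
]
summary = threshold_summary(stream, k=3, threshold=2)
```

In this sketch, items that add little new coverage (the second and fourth readings) are skipped, so only the summary, rather than the whole stream, would need to leave the device.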
Acknowledgments
This work has been partially funded by the German Federal Ministry of Education and Research under grant number 28DE113C18 (DigiVine). The responsibility for the content of this publication lies with the authors.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Gjurovski, D., Heidemann, J., Michel, S. (2022). Summarizing Edge-Device Data via Core Items. In: Chiusano, S., Cerquitelli, T., Wrembel, R. (eds) Advances in Databases and Information Systems. ADBIS 2022. Lecture Notes in Computer Science, vol 13389. Springer, Cham. https://doi.org/10.1007/978-3-031-15740-0_11
DOI: https://doi.org/10.1007/978-3-031-15740-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15739-4
Online ISBN: 978-3-031-15740-0
eBook Packages: Computer Science, Computer Science (R0)