Abstract
This chapter introduces cyber security researchers to key concepts in the data streaming and sketching literature that are relevant to Adaptive Cyber Defense (ACD) and Moving Target Defense (MTD). We begin by surveying the challenges that arise in big data settings. Particular attention is paid to the need for compact representations of large datasets and for algorithms that are robust to changes in the underlying dataset. We then summarize the key research and tools developed in the data stream and sketching literature, with a focus on practical applications. Finally, we present several concrete extensions to problems related to the ACD applications discussed throughout this book, with a focus on improving scalability.
Acknowledgements
The work presented in this chapter was supported by the Army Research Office under grant W911NF-13-1-0421.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Priest, B.W., Cybenko, G., Singh, S., Albanese, M., Liu, P. (2019). Online and Scalable Adaptive Cyber Defense. In: Jajodia, S., Cybenko, G., Liu, P., Wang, C., Wellman, M. (eds) Adversarial and Uncertain Reasoning for Adaptive Cyber Defense. Lecture Notes in Computer Science, vol 11830. Springer, Cham. https://doi.org/10.1007/978-3-030-30719-6_10
DOI: https://doi.org/10.1007/978-3-030-30719-6_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30718-9
Online ISBN: 978-3-030-30719-6
eBook Packages: Computer Science (R0)