Abstract
We address the problem of sketching the Hamming distance of data streams. We develop Fixable Sketches, which compare data streams or files and restore the differences between them. Our contribution: for two streams with Hamming distance bounded by k, we give a sketch of size O(k log n) with O(log n) processing time per new stream element, and we show how to restore all locations where the two streams differ in time linear in the sketch size. The probability of error is less than 1/n.
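To make the idea of a sketch that both compares two streams and restores their difference concrete, the following toy example handles only the simplest case: binary streams of equal length whose Hamming distance is at most 1. It is not the construction of this paper, which covers general k and is randomized. It is a deterministic illustration that swaps in a plain XOR of positions. The names `sketch` and `differing_position`, and the 1-based indexing, are assumptions made for this illustration.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical toy sketch: (XOR of 1-based positions holding a 1, parity of ones).
# Each component fits in O(log n) bits for a stream of length n.
Sketch = Tuple[int, int]


def sketch(stream: Iterable[int]) -> Sketch:
    """Process a binary stream one element at a time, updating the sketch."""
    pos_xor = 0
    parity = 0
    for i, bit in enumerate(stream, start=1):
        if bit:
            pos_xor ^= i   # fold the position of every 1 into the sketch
            parity ^= 1    # track whether the number of ones is odd
    return pos_xor, parity


def differing_position(a: Sketch, b: Sketch) -> Optional[int]:
    """Return the 1-based position where the two streams differ.

    Assumes the streams have equal length and Hamming distance at most 1.
    Returns None when the sketches indicate identical streams.
    """
    if a[1] == b[1]:
        # Under the distance <= 1 promise, equal parity means no difference.
        return None
    # Positions where both streams hold a 1 cancel out; only the single
    # differing position survives the XOR.
    return a[0] ^ b[0]


if __name__ == "__main__":
    x = [0, 1, 1, 0, 1, 0, 0, 1]
    y = list(x)
    y[5] ^= 1  # flip the bit at 1-based position 6
    print(differing_position(sketch(x), sketch(y)))
    print(differing_position(sketch(x), sketch(x)))
```

Only the two sketches are needed to locate the difference, not the streams themselves. Extending this to k differences with the size and error bounds stated in the abstract needs the machinery developed in the paper.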
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Porat, E., Lipsky, O. (2007). Improved Sketching of Hamming Distance with Error Correcting. In: Ma, B., Zhang, K. (eds) Combinatorial Pattern Matching. CPM 2007. Lecture Notes in Computer Science, vol 4580. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73437-6_19
DOI: https://doi.org/10.1007/978-3-540-73437-6_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73436-9
Online ISBN: 978-3-540-73437-6
eBook Packages: Computer Science, Computer Science (R0)