Abstract
Hierarchical Clustering is widely used in Machine Learning and Data Mining. It stores bit-vectors in the nodes of a k-ary tree, usually without trying to compress them. We suggest a data compression application of hierarchical clustering with a double usage of the xoring operations defining the Hamming distance used in the clustering process, extending it also to be used to transform the vector in one node into a more compressible form, as a function of the vector in the parent node. Compression is then achieved by run-length encoding, followed by optional Huffman coding, and we show how the compressed file may be processed directly, without decompression.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
References
Arthur, D., Vassilvitskii, S.: \(k\)-means++: the advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, New Orleans, Louisiana, USA, January 7–9, 2007, pp. 1027–1035, 2007
Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 404–417. Springer, Heidelberg (2006). https://doi.org/10.1007/11744023_32
Bookstein, A., Klein, S.T.: Compression of correlated bit-vectors. Inf. Syst. 16(4), 387–400 (1991)
Burrows, M. and Wheeler, D.J.: A block sorting lossless data compression algorithm. Technical report, Digital Equipment Corporation, SRC-RR-124:1–18 (1994)
Calonder, M., Lepetit, V., Strecha, C., Fua, P.: BRIEF: binary robust independent elementary features. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 778–792. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_56
Choueka, Y., Fraenkel, A.S., Klein, S.T., Segal, E.: Improved hierarchical bit-vector compression in document retrieval systems. In: SIGIR 1986, Proceedings of the 9th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Pisa, Italy, 8–10 September 1986, pp. 88–96 (1986)
Claude, F., Nicholson, P.K., Seco, D.: Differentially encoded search trees. In: 2012 Data Compression Conference, pp. 357–366 (2012)
Fraenkel, A.S., Klein, S.T.: Novel compression of sparse bit-strings – preliminary report. In: Apostolico, A., Galil, Z. (eds.) Combinatorial Algorithms on Words. NATO ASI Series (Series F: Computer and Systems Sciences), vol. 12, pp. 169–183. Springer, Heidelberg (1985). https://doi.org/10.1007/978-3-642-82456-2_12
Fukunaga, K., Narendra, P.M.: A branch and bound algorithms for computing \(k\)-nearest neighbors. IEEE Trans. Comput. 24(7), 750–753 (1975)
Gálvez-López, D., Tardós, J.D.: Bags of binary words for fast place recognition in image sequences. IEEE Trans. Robot. 28(5), 1188–1197 (2012)
Klein, S.T., Shapira, D.: Compressed pattern matching in JPEG images. Int. J. Found. Comput. Sci. 17(6), 1297–1306 (2006)
Klein, S.T., Shapira, D.: Compressed matching for feature vectors. Theor. Comput. Sci. 638, 52–62 (2016)
Leutenegger, S., Chli, M., Siegwart, R.Y.: BRISK: binary robust invariant scalable keypoints. In: 2011 IEEE International Conference on Computer Vision (ICCV), pp. 2548–2555. IEEE (2011)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
Muja, M., Lowe, D.G.: Fast approximate nearest neighbors with automatic algorithm configuration. In: VISAPP International Conference on Computer Vision Theory and Applications, pp. 331–340 (2009)
Mur-Artal, R., Montiel, J.M.M., Tardos, J.D.: ORB-SLAM: a versatile and accurate monocular slam system. IEEE Trans. Robot. 31(5), 1147–1163 (2015)
Nistér, D., Stewénius, H.: Scalable recognition with a vocabulary tree. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2006, New York, NY, USA, 17–22 June 2006, pp. 2161–2168 (2006)
Rokach, L., Maimon, O.: Clustering methods. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 321–352. Springer, Heidelberg (2005). https://doi.org/10.1007/0-387-25465-X_15
Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: an efficient alternative to SIFT or SURF. In: IEEE International Conference on Computer Vision, ICCV 2011, Barcelona, Spain, 6–13 November 2011, pp. 2564–2571 (2011)
Trzcinski, T., Lepetit, V., Fua, P.: Thick boundaries in binary space and their influence on nearest-neighbor search. Pattern Recogn. Lett. 33(16), 2173–2180 (2012)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Baruch, G., Klein, S.T., Shapira, D. (2018). Applying Compression to Hierarchical Clustering. In: Marchand-Maillet, S., Silva, Y., Chávez, E. (eds) Similarity Search and Applications. SISAP 2018. Lecture Notes in Computer Science(), vol 11223. Springer, Cham. https://doi.org/10.1007/978-3-030-02224-2_12
Download citation
DOI: https://doi.org/10.1007/978-3-030-02224-2_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02223-5
Online ISBN: 978-3-030-02224-2
eBook Packages: Computer ScienceComputer Science (R0)