Applying Compression to Hierarchical Clustering

  • Conference paper
Similarity Search and Applications (SISAP 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11223)

Abstract

Hierarchical clustering is widely used in machine learning and data mining. When applied to binary data, it stores bit-vectors in the nodes of a k-ary tree, usually without any attempt to compress them. We propose a data compression application of hierarchical clustering that makes double use of the XOR operations defining the Hamming distance in the clustering process: beyond measuring distances, they also transform the vector in each node into a more compressible form, as a function of the vector in its parent node. Compression is then achieved by run-length encoding, optionally followed by Huffman coding, and we show how the compressed file can be processed directly, without decompression.
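The abstract describes the pipeline only at a high level, so the following Python sketch illustrates the general idea under our own assumptions; it is not the authors' implementation. We assume bit-vectors are plain lists of 0/1, that each child is XORed with its parent (so agreeing bits become 0 and similar vectors yield sparse output), and that the result is run-length encoded as alternating run lengths starting with a (possibly empty) run of 0s. Huffman coding of the run lengths is omitted. All function names (`xor_with_parent`, `rle_encode`, `rle_decode`, `distance_from_runs`) are hypothetical.

```python
from typing import List


def xor_with_parent(child: List[int], parent: List[int]) -> List[int]:
    """Transform a node's bit-vector relative to its parent.

    Bits on which child and parent agree become 0, so a child close to
    its parent (small Hamming distance) yields a sparse, mostly-zero vector.
    XOR is its own inverse, so applying it again with the parent restores
    the child.
    """
    if len(child) != len(parent):
        raise ValueError("vectors must have equal length")
    return [c ^ p for c, p in zip(child, parent)]


def hamming(a: List[int], b: List[int]) -> int:
    """Hamming distance: the number of 1-bits in a XOR b."""
    return sum(xor_with_parent(a, b))


def rle_encode(bits: List[int]) -> List[int]:
    """Encode as alternating run lengths, starting with a run of 0s.

    The first run may have length 0 if the vector starts with a 1.
    """
    runs: List[int] = []
    current, length = 0, 0
    for b in bits:
        if b == current:
            length += 1
        else:
            runs.append(length)
            current, length = b, 1
    runs.append(length)
    return runs


def rle_decode(runs: List[int]) -> List[int]:
    """Invert rle_encode: even-indexed runs are 0s, odd-indexed runs are 1s."""
    bits: List[int] = []
    for i, r in enumerate(runs):
        bits.extend([i % 2] * r)
    return bits


def distance_from_runs(runs: List[int]) -> int:
    """Hamming distance between child and parent, read off the encoding.

    Odd-indexed runs are runs of 1s, so their total length equals the
    popcount of child XOR parent; no decompression is needed.
    """
    return sum(runs[1::2])


if __name__ == "__main__":
    parent = [1, 0, 1, 1, 0, 0, 1, 0]
    child = [1, 0, 1, 0, 0, 0, 1, 1]
    runs = rle_encode(xor_with_parent(child, parent))
    print(runs)                      # [3, 1, 3, 1]
    print(distance_from_runs(runs))  # 2
    # Reconstruct the child from its parent and the stored runs.
    print(xor_with_parent(rle_decode(runs), parent) == child)  # True
```

The `distance_from_runs` helper shows, in a simplified form, why processing without decompression is plausible: the distance to the parent is just the total length of the 1-runs in the stored encoding. The actual encoding and compressed-domain operations used in the paper may differ.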

Notes

  1. http://cocodataset.org/#download

Author information

Corresponding author

Correspondence to Shmuel Tomi Klein.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Baruch, G., Klein, S.T., Shapira, D. (2018). Applying Compression to Hierarchical Clustering. In: Marchand-Maillet, S., Silva, Y., Chávez, E. (eds) Similarity Search and Applications. SISAP 2018. Lecture Notes in Computer Science, vol. 11223. Springer, Cham. https://doi.org/10.1007/978-3-030-02224-2_12

  • DOI: https://doi.org/10.1007/978-3-030-02224-2_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02223-5

  • Online ISBN: 978-3-030-02224-2

  • eBook Packages: Computer Science, Computer Science (R0)
