DOI: 10.1145/2820783.2820802
research-article

BACR: set similarities with lower bounds and application to spatial trajectories

Published: 03 November 2015

ABSTRACT

This paper proposes BACR, a length-independent feature representation of sets of strings based on Bloom filters, for similarity search in databases. Further, we show how a Z-curve-based discretization of geospatial trajectories can be used to search for similar trajectories in large databases. In addition to the known estimates of the sizes of the union and the intersection of sets from their Bloom filters, we propose a way to calculate an upper bound for the intersection and a lower bound for the union of sets. Consequently, we show that the Jaccard distance and many other similarity measures admit a lower bound, which makes exact similarity search on large databases of this type feasible. Finally, we show that the Jaccard distance is incompatible with the union of sets, and we replace it with a suitable alternative so that even collections of sets of strings can be represented by a single BACR feature vector, at least for similarity search applications. The algorithms are motivated by real-world examples and thoroughly evaluated.
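The abstract does not spell out the discretization itself. A minimal sketch, assuming a scheme similar in spirit to Geohash: longitude and latitude are each quantized to a fixed number of bits, the bits are interleaved into a Z-order (Morton) code, and a trajectory becomes the set of cell codes it visits, written as strings. The helper names (`quantize`, `z_order`, `trajectory_to_cells`) and the 16-bit resolution are hypothetical choices for illustration, not the paper's parameters.

```python
def quantize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map a value in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    cells = 1 << bits
    idx = int((value - lo) / (hi - lo) * cells)
    return min(max(idx, 0), cells - 1)  # clamp so that value == hi stays in range


def z_order(x: int, y: int, bits: int) -> int:
    """Interleave the bits of x and y: x occupies even positions, y odd positions."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def trajectory_to_cells(points, bits: int = 16) -> set[str]:
    """Hypothetical helper: discretize (lat, lon) points into a set of Z-curve cell ids."""
    width = (2 * bits + 3) // 4  # hex digits needed for a 2*bits-bit code
    cells = set()
    for lat, lon in points:
        x = quantize(lon, -180.0, 180.0, bits)
        y = quantize(lat, -90.0, 90.0, bits)
        # fixed-width hex keeps every cell id a string of equal length
        cells.add(format(z_order(x, y, bits), f"0{width}x"))
    return cells
```

Repeated samples inside one cell collapse to a single set element, so densely sampled stretches of a trajectory do not inflate the set that is later inserted into a Bloom filter.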
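The cardinality estimate referred to as already known can be sketched with the standard formula: a filter of m bits built with k hash functions and showing X set bits holds roughly n ≈ -(m/k) ln(1 - X/m) elements. Applied to the bitwise OR of two filters it estimates the union, and inclusion-exclusion then estimates the intersection. The bounds below are an illustrative derivation of our own, not necessarily the paper's construction, and they assume the exact set sizes |A| and |B| are stored next to the filters: every bit set in A's filter but not in B's must come from an element of A \ B, and each element sets at most k bits, so |A \ B| ≥ ⌈popcount(A ∧ ¬B)/k⌉. This caps |A ∩ B|, and because the Jaccard similarity grows with the intersection when |A| and |B| are fixed, it yields a lower bound on the Jaccard distance. The class and function names, the salted SHA-256 hashing, and m = 1024, k = 4 are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter over strings; the bit array is held in a Python int."""

    def __init__(self, m: int = 1024, k: int = 4):
        self.m, self.k, self.bits, self.size = m, k, 0, 0

    def _positions(self, item: str):
        # Illustrative choice: k positions from salted SHA-256 digests.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    @classmethod
    def from_set(cls, items, m: int = 1024, k: int = 4):
        bf = cls(m, k)
        unique = set(items)
        for item in unique:
            for p in bf._positions(item):
                bf.bits |= 1 << p
        bf.size = len(unique)  # assumption: exact set size is stored with the filter
        return bf


def popcount(x: int) -> int:
    return bin(x).count("1")


def estimate_cardinality(bits: int, m: int, k: int) -> float:
    """Standard estimate n = -(m/k) * ln(1 - X/m) from the number X of set bits."""
    x = popcount(bits)
    if x >= m:
        return math.inf  # saturated filter: no finite estimate
    return -(m / k) * math.log(1 - x / m)


def estimate_union_intersection(a, b):
    """Estimate |A u B| from the OR of the filters, then |A n B| by inclusion-exclusion."""
    union = estimate_cardinality(a.bits | b.bits, a.m, a.k)
    inter = estimate_cardinality(a.bits, a.m, a.k) + estimate_cardinality(b.bits, b.m, b.k) - union
    return union, max(inter, 0.0)


def intersection_upper_bound(a, b) -> int:
    """|A n B| <= |A| - ceil(popcount(A & ~B) / k), and symmetrically for B."""
    only_a = math.ceil(popcount(a.bits & ~b.bits) / a.k)
    only_b = math.ceil(popcount(b.bits & ~a.bits) / a.k)
    return max(0, min(a.size - only_a, b.size - only_b))


def jaccard_distance_lower_bound(a, b) -> float:
    """Jaccard similarity I / (|A| + |B| - I) is maximal at the intersection upper bound."""
    i_max = intersection_upper_bound(a, b)
    u_min = a.size + b.size - i_max
    return 0.0 if u_min == 0 else 1.0 - i_max / u_min
```

In a filter-and-refine search, for example, a candidate whose lower bound already exceeds the current k-th best distance could be discarded without loading its exact set. The bound loosens as k grows, because each unmatched bit is charged to as few elements as possible.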


        Published in

          ACM Conferences
          SIGSPATIAL '15: Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems
          November 2015
          646 pages
          ISBN: 9781450339674
          DOI: 10.1145/2820783

          Copyright © 2015 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States



          Acceptance Rates

          SIGSPATIAL '15 paper acceptance rate: 38 of 212 submissions, 18%. Overall acceptance rate: 220 of 1,116 submissions, 20%.
