ABSTRACT
This paper proposes BACR, a length-independent feature representation of sets of strings based on Bloom filters, for similarity search in databases. Further, we show how a Z-curve-based discretization of geospatial trajectories can be used to search for similar trajectories in large databases. In addition to the already-known estimation of the size of the union and the intersection of sets from Bloom filters, we propose a way to calculate an upper bound for the intersection and a lower bound for the union of sets. Consequently, we show that the Jaccard distance and many other similarity measures admit a lower bound, which makes exact similarity search on large databases of this type feasible. Finally, we show that the Jaccard distance is incompatible with the union of sets and replace it with an appropriate measure, so that even collections of sets of strings can be represented by a single BACR feature vector, at least for similarity search applications. The algorithms are motivated by real-world examples and thoroughly evaluated.
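The following Python sketch strings the abstract's ideas together under explicit assumptions, and is not the construction used for BACR itself. A trajectory is discretized into Z-order (Morton) cells by interleaving the bits of quantized longitude and latitude. The resulting cell IDs are inserted as strings into a Bloom filter. Set sizes are estimated with the standard fill-ratio formula, and a Jaccard-distance lower bound follows from one simple observation: an element sets at most k bits, so the union contains at least ⌈popcount(OR)/k⌉ elements. The sketch assumes the exact set sizes are stored alongside each filter. The filter length `M`, hash count `K`, the blake2b double-hashing scheme and all function names are hypothetical choices made for illustration.

```python
import hashlib
import math

M = 256  # filter length in bits (hypothetical; a power of two)
K = 3    # hash positions per element (hypothetical)


def z_cell(lat: float, lon: float, bits: int = 16) -> str:
    """Z-order (Morton) cell ID: quantized lon goes to even bits, lat to odd bits."""
    scale = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * scale)
    y = int((lat + 90.0) / 180.0 * scale)
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return format(code, f"0{2 * bits}b")  # fixed-length bit string


def trajectory_cells(points, bits: int = 16) -> set[str]:
    """Discretize a trajectory [(lat, lon), ...] into the set of cells of its sample points."""
    return {z_cell(lat, lon, bits) for lat, lon in points}


def positions(s: str, m: int = M, k: int = K) -> list[int]:
    """k bit positions via double hashing; an odd step keeps them distinct if m is a power of two and k <= m."""
    d = hashlib.blake2b(s.encode("utf-8"), digest_size=16).digest()
    h1 = int.from_bytes(d[:8], "little")
    h2 = int.from_bytes(d[8:], "little") | 1
    return [(h1 + i * h2) % m for i in range(k)]


def bloom(strings, m: int = M, k: int = K) -> int:
    """Bloom filter of a set of strings, stored as a Python int bitmask."""
    bits = 0
    for s in strings:
        for p in positions(s, m, k):
            bits |= 1 << p
    return bits


def popcount(bits: int) -> int:
    return bin(bits).count("1")


def estimate_size(bits: int, m: int = M, k: int = K) -> float:
    """Standard fill-ratio estimate n ~ -(m/k) * ln(1 - t/m) for t set bits."""
    t = popcount(bits)
    if t >= m:
        return math.inf  # saturated filter: no finite estimate
    return -(m / k) * math.log(1 - t / m)


def estimate_intersection(a: int, b: int, m: int = M, k: int = K) -> float:
    """Inclusion-exclusion on the estimates: |A| + |B| - |A u B|."""
    return estimate_size(a, m, k) + estimate_size(b, m, k) - estimate_size(a | b, m, k)


def jaccard_distance_lower_bound(a: int, size_a: int, b: int, size_b: int, k: int = K) -> float:
    """Sound lower bound on 1 - |A n B| / |A u B|, given the exact sizes |A| and |B|."""
    # Each element sets at most k bits, so |A u B| >= ceil(popcount(a | b) / k);
    # trivially also |A u B| >= max(|A|, |B|).
    union_lb = max(-(-popcount(a | b) // k), size_a, size_b)
    if union_lb == 0:
        return 0.0  # both sets are empty
    # |A n B| = |A| + |B| - |A u B| <= |A| + |B| - union_lb
    inter_ub = size_a + size_b - union_lb
    return 1.0 - inter_ub / union_lb


# Usage: two short, overlapping walks (illustrative coordinates)
walk_a = [(52.5200, 13.4050), (52.5210, 13.4070), (52.5220, 13.4090)]
walk_b = [(52.5210, 13.4070), (52.5220, 13.4090), (52.5230, 13.4110)]
cells_a, cells_b = trajectory_cells(walk_a), trajectory_cells(walk_b)
print(jaccard_distance_lower_bound(bloom(cells_a), len(cells_a), bloom(cells_b), len(cells_b)))
```

Because such a bound never exceeds the true distance, a filter-and-refine search can discard every candidate whose lower bound already exceeds the current best distance without computing the exact distance, and still return exact results.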
Index Terms
- BACR: set similarities with lower bounds and application to spatial trajectories