ABSTRACT
We propose a novel statistical approach, called Statistically Discriminative Sub-trajectory Mining (Stat-DSM), for evaluating the statistical significance (reliability) of findings in the discriminative sub-trajectory mining problem. Given two groups of trajectories, the goal is to extract moving patterns in the form of sub-trajectories that occur statistically significantly more often in one group than in the other. An advantage of the Stat-DSM method is that the statistical significance of the extracted sub-trajectories is properly controlled, in the sense that the probability of finding a false discriminative sub-trajectory is smaller than a specified significance threshold α (e.g., 0.05). We conduct experiments on real-world datasets to demonstrate the effectiveness of the Stat-DSM method.
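The kind of guarantee described above can be illustrated with a minimal sketch. It is not the Stat-DSM algorithm itself: it assumes the candidate sub-trajectories have already been enumerated and counted, tests each one with a one-sided Fisher exact test, and substitutes a plain Bonferroni correction to keep the probability of any false discovery below α. The function names, the pattern labels, and the counts are hypothetical.

```python
from math import comb


def fisher_one_sided(k1, n1, k2, n2):
    """One-sided Fisher exact p-value.

    k1 of n1 trajectories in group 1 and k2 of n2 trajectories in group 2
    contain the pattern. Returns the probability of seeing k1 or more
    occurrences in group 1 if the k1 + k2 occurrences were spread at
    random over all n1 + n2 trajectories (hypergeometric null).
    """
    total = k1 + k2
    denom = comb(n1 + n2, total)
    upper = min(n1, total)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible tables contribute nothing to the sum.
    tail = sum(comb(n1, x) * comb(n2, total - x) for x in range(k1, upper + 1))
    return tail / denom


def significant_patterns(counts, n1, n2, alpha=0.05):
    """Keep patterns whose p-value passes a Bonferroni-corrected threshold.

    counts maps a pattern label to (k1, k2). Bonferroni is an assumption
    made for this sketch, chosen for simplicity; it is generally more
    conservative than specialised corrections.
    """
    if not counts:
        return {}
    threshold = alpha / len(counts)  # one test per candidate pattern
    result = {}
    for name, (k1, k2) in counts.items():
        p = fisher_one_sided(k1, n1, k2, n2)
        if p <= threshold:
            result[name] = p
    return result


# Hypothetical counts: 10 trajectories per group, two candidate patterns.
candidates = {"A": (9, 1), "B": (5, 4)}
found = significant_patterns(candidates, n1=10, n2=10, alpha=0.05)
print(sorted(found))
```

With these made-up counts, only pattern A passes the corrected threshold of 0.025. Dividing α by the number of candidates becomes very conservative when the number of candidate sub-trajectories is large, which is the difficulty that specialised multiple testing corrections aim to address.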
Index Terms
- Statistically Discriminative Sub-trajectory Mining with Multiple Testing Correction