ABSTRACT
In recent years, computer vision research has attracted wide interest. However, the image data available for training is often limited, so effective methods are needed to expand datasets from existing images. In this paper, we study methods for generating additional training data from existing datasets and compare the performance of detectors trained on datasets produced by each method. The first method performs sampling based on the statistical properties of feature descriptors: for every feature, we assume an underlying probability density function (PDF) exists, approximate that PDF from the existing training examples, and sample new training examples from the approximation. The second method simply expands the existing dataset by flipping each training example along its axis of symmetry. We use the Locally Adaptive Regression Kernel (LARK) feature because it is robust to illumination changes and noise. Our experimental results demonstrate that an expanded training dataset is not always preferable, even when the expanded dataset includes all of the original training data.
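The two expansion strategies described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation: it assumes each feature dimension is modeled as an independent Gaussian fitted to the existing descriptors (one simple choice of approximated PDF), and that training examples are images whose axis of symmetry is vertical. The function names `sample_new_examples` and `flip_examples` are hypothetical.

```python
import numpy as np

def sample_new_examples(features, n_new, rng=None):
    """Approximate a PDF for the feature descriptors and draw new examples.

    Assumption: each dimension is an independent Gaussian, so the
    approximated PDF is fully described by per-dimension mean and std.
    `features` is an (n_examples, n_dims) array of descriptors.
    """
    rng = np.random.default_rng(rng)
    mu = features.mean(axis=0)                 # per-dimension mean
    sigma = features.std(axis=0, ddof=1)       # per-dimension sample std
    return rng.normal(mu, sigma, size=(n_new, features.shape[1]))

def flip_examples(images):
    """Mirror each training image about its vertical symmetry axis.

    `images` is an (n, height, width) array; the flip reverses columns.
    """
    return images[:, :, ::-1]
```

A richer PDF estimate (e.g. kernel density estimation) could replace the per-dimension Gaussian without changing the overall pipeline: fit to the existing examples, then draw as many synthetic descriptors as needed.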
Sampling May Not Always Increase Detector Performance: A Study on Collecting Training Examples