
A Survey of Dataset Refinement for Problems in Computer Vision Datasets

Published: 09 April 2024

Abstract

Large-scale datasets have played a crucial role in the advancement of computer vision. However, they often suffer from problems such as class imbalance, noisy labels, dataset bias, and high resource costs, which can degrade model performance and reduce trustworthiness. Following the growing advocacy for data-centric research, a variety of data-centric solutions have been proposed to address these dataset problems. They improve dataset quality by re-organizing the data, a process we call dataset refinement. In this survey, we provide a comprehensive and structured overview of recent advances in dataset refinement for problematic computer vision datasets. First, we summarize and analyze the various problems encountered in large-scale computer vision datasets. We then classify dataset refinement algorithms into three categories according to the refinement process: data sampling, data subset selection, and active learning. In addition, we organize these methods by the data problems they address and provide a systematic comparative description. We point out that the three types of dataset refinement have distinct advantages and disadvantages for different dataset problems, which should inform the choice of data-centric method for a particular research objective. Finally, we summarize the current literature and outline potential directions for future research.
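
To make the three refinement families named above concrete, the following is a minimal illustrative sketch (not taken from the survey; the synthetic data, mocked loss values, and helper names such as `balanced_undersample` are hypothetical) showing class-balanced undersampling (data sampling), small-loss filtering (data subset selection), and least-confidence querying (active learning) on a synthetic imbalanced, noisily labeled dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-class label set: 900 majority vs. 100 minority samples,
# with ~10% of labels flipped to simulate label noise.
y = np.concatenate([np.zeros(900, dtype=int), np.ones(100, dtype=int)])
noise_mask = rng.random(1000) < 0.10
y_noisy = np.where(noise_mask, 1 - y, y)

# 1) Data sampling: class-balanced undersampling of the majority class.
def balanced_undersample(labels, rng):
    classes, counts = np.unique(labels, return_counts=True)
    n_min = counts.min()
    keep = [rng.choice(np.flatnonzero(labels == c), size=n_min, replace=False)
            for c in classes]
    return np.concatenate(keep)

balanced_idx = balanced_undersample(y_noisy, rng)

# 2) Data subset selection: keep the small-loss fraction of samples, a common
#    heuristic for filtering noisy labels (per-sample losses are mocked here).
per_sample_loss = rng.random(1000) + noise_mask * 1.5  # noisy labels -> larger loss
keep_fraction = 0.8
subset_idx = np.argsort(per_sample_loss)[: int(keep_fraction * len(per_sample_loss))]

# 3) Active learning: query the samples whose (mock) predicted probabilities
#    are closest to 0.5, i.e. least-confidence / maximum-uncertainty sampling.
p_positive = rng.random(1000)                  # stand-in for model predictions
uncertainty = 1.0 - np.abs(p_positive - 0.5) * 2.0
query_idx = np.argsort(-uncertainty)[:50]      # 50 most uncertain samples to label

print(len(balanced_idx), len(subset_idx), len(query_idx))  # subset sizes per strategy
```

In practice the three families plug into training differently: sampling typically reweights or resamples the data every epoch, subset selection fixes a coreset once or periodically, and active learning alternates between labeling queried samples and retraining.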


Cited By

  • (2024) Training cost analysis on a large computer vision defect detection model for ceramics. 2024 47th International Spring Seminar on Electronics Technology (ISSE), 1–6. DOI: 10.1109/ISSE61612.2024.10603525. Online publication date: 15-May-2024.
  • (2024) WHY: Perspective: POZE—A Multidisciplinary Framework of Life. Human Leadership for Humane Technology, 1–101. DOI: 10.1007/978-3-031-67823-3_1. Online publication date: 21-Sep-2024.


Published In

ACM Computing Surveys, Volume 56, Issue 7
July 2024
1006 pages
EISSN: 1557-7341
DOI: 10.1145/3613612

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 April 2024
Online AM: 10 October 2023
Accepted: 26 September 2023
Revised: 24 April 2023
Received: 14 October 2022
Published in CSUR Volume 56, Issue 7

Author Tags

  1. Dataset refinement
  2. data sampling
  3. subset selection
  4. active learning

Qualifiers

  • Research-article

Funding Sources

  • National Key R&D Project
  • National Natural Science Foundation of China
  • Hubei Key R&D
  • CAAI-Huawei MindSpore Open Fund

Article Metrics

  • Downloads (Last 12 months): 426
  • Downloads (Last 6 weeks): 57
Reflects downloads up to 25 Feb 2025

