research-article

Unsupervised Learning of Human Action Categories in Still Images with Deep Representations

Authors:

Xiaoqiang Lu

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 15, Issue 4

Article No.: 112, Pages 1 - 20

https://doi.org/10.1145/3362161

Published: 16 December 2019

Abstract

In this article, we propose a novel method for unsupervised learning of human action categories in still images. In contrast to previous methods, the proposed method exploits distinctive action information directly from unlabeled image databases and learns, without supervision, discriminative deep representations that distinguish different actions. As a result, action image collections can be used without manual annotation. Specifically, (i) because discriminative deep representations are difficult to learn without supervision, the proposed method builds a training dataset with surrogate labels from the unlabeled dataset and then learns discriminative representations by iteratively alternating between updating the convolutional neural network (CNN) parameters and updating the surrogate training dataset; (ii) to exploit the discriminative information among different action categories, the training batches used to update the CNN parameters are built from triplet groups, and a triplet loss function is used to update the CNN parameters; and (iii) to learn more discriminative deep representations, a Random Forest classifier is used to update the surrogate training dataset, from which more informative triplet groups can then be built. Extensive experiments on four benchmark datasets demonstrate the effectiveness of the proposed method.
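As an illustration only, the following Python sketch shows one common way to build triplet groups from surrogate labels and score them with a triplet loss. It assumes the standard hinge formulation max(0, d(a, p) - d(a, n) + margin) with squared Euclidean distance, as used for example in FaceNet. The article's exact loss, sampling strategy and margin are not given here. The function names, the dictionary-based inputs, uniform random sampling and the margin value 0.2 are all hypothetical choices. Plain Python lists stand in for CNN features so the sketch runs without dependencies. In the described method, the embeddings would come from the CNN and the surrogate labels would be refreshed by the Random Forest step.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[int, int, int]


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def build_triplets(labels: Dict[int, int], num_triplets: int,
                   rng: random.Random) -> List[Triplet]:
    """Sample (anchor, positive, negative) image indices from surrogate labels.

    `labels` maps an image index to its surrogate label (a hypothetical
    input format). Anchor and positive share a label; the negative does not.
    Sampling is uniform here, which may differ from the article's strategy.
    """
    by_label: Dict[int, List[int]] = {}
    for idx, lab in labels.items():
        by_label.setdefault(lab, []).append(idx)
    # Only labels with at least two images can supply an anchor/positive pair.
    eligible = [lab for lab, members in by_label.items() if len(members) >= 2]
    if not eligible or len(by_label) < 2:
        raise ValueError("need >= 2 surrogate labels and one with >= 2 images")
    triplets: List[Triplet] = []
    for _ in range(num_triplets):
        lab = rng.choice(eligible)
        anchor, positive = rng.sample(by_label[lab], 2)
        other = rng.choice([o for o in by_label if o != lab])
        negative = rng.choice(by_label[other])
        triplets.append((anchor, positive, negative))
    return triplets


def triplet_loss(embeddings: Dict[int, Vector], triplets: List[Triplet],
                 margin: float = 0.2) -> float:
    """Mean of max(0, d(a, p) - d(a, n) + margin) over all triplets."""
    total = 0.0
    for a, p, n in triplets:
        d_ap = squared_distance(embeddings[a], embeddings[p])
        d_an = squared_distance(embeddings[a], embeddings[n])
        total += max(0.0, d_ap - d_an + margin)
    return total / len(triplets)


if __name__ == "__main__":
    rng = random.Random(0)
    surrogate = {0: 0, 1: 0, 2: 1, 3: 1}  # toy surrogate labels
    feats = {0: [0.0, 0.0], 1: [0.1, 0.0], 2: [1.0, 1.0], 3: [1.0, 0.9]}
    batch = build_triplets(surrogate, num_triplets=8, rng=rng)
    print(triplet_loss(feats, batch))  # prints 0.0: the two groups are well separated
```

Triplet sampling is kept separate from the loss. In an alternating scheme like the one summarized above, the triplets can then be rebuilt whenever the surrogate labels change, while the loss itself stays fixed.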


Index Terms

Unsupervised Learning of Human Action Categories in Still Images with Deep Representations
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations
      2. Computer vision tasks
        Activity recognition and understanding
  2. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
    2. Machine learning approaches
      1. Classification and regression trees

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 15, Issue 4

November 2019

322 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3376119

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 December 2019

Accepted: 01 September 2019

Revised: 01 June 2019

Received: 01 August 2018

Published in TOMM Volume 15, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

Key Research Program of Frontier Sciences, CAS
National Key R&D Program of China
CAS “Light of West China” Program
National Natural Science Foundation of China
Young Top-notch Talent Program of Chinese Academy of Sciences

Bibliometrics

Article Metrics

Total Citations: 5

Total Downloads: 266

Downloads (Last 12 months): 8

Downloads (Last 6 weeks): 0

Reflects downloads up to 28 Feb 2025

Citations

Cited By

Cai Z, Fan Q, Li L, Yu L, Li C (2024). An efficient Meta-VSW method for ship behaviors recognition and application. Ocean Engineering 311 (118870). DOI: 10.1016/j.oceaneng.2024.118870. Online publication date: Nov-2024.
https://doi.org/10.1016/j.oceaneng.2024.118870
Liang S, Ma W, Xie C (2023). Relation with Free Objects for Action Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 20:2 (1-19). DOI: 10.1145/3617596. Online publication date: 26-Aug-2023.
https://dl.acm.org/doi/10.1145/3617596
Ghimire A, Kakani V, Kim H (2023). SSRT: A Sequential Skeleton RGB Transformer to Recognize Fine-Grained Human-Object Interactions and Action Recognition. IEEE Access 11 (51930-51948). DOI: 10.1109/ACCESS.2023.3278974. Online publication date: 2023.
https://doi.org/10.1109/ACCESS.2023.3278974
Güldal S (2021). Unsupervised Machine Learning Algorithm to Solve Knight Covering Problem for 6 by 6 Board. Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi 8:15 (414-426). DOI: 10.54365/adyumbd.980660. Online publication date: 31-Dec-2021.
https://doi.org/10.54365/adyumbd.980660
Sun B, Yuan N, Li S, Wu S, Wang N (2021). Human behaviour recognition with mid-level representations for crowd understanding and analysis. IET Image Processing 15:14 (3414-3424). DOI: 10.1049/ipr2.12147. Online publication date: 25-Feb-2021.
https://doi.org/10.1049/ipr2.12147
