ABSTRACT
Computer Vision problems such as object detection, object tracking, and action recognition have traditionally been addressed with Statistical Pattern Recognition techniques. Support vector machines (SVMs), regression, and neural networks are examples of classical statistical techniques that have been used quite effectively in many Computer Vision applications.
Nevertheless, some approaches have used more complex data structures, notably graphs, to solve Computer Vision tasks. In terms of performance, however, they have not enjoyed the same success as techniques based on vector representations. The first part of this talk will present some of these proposals in the context of object tracking [1], people re-identification [3], and action recognition [2]. A graph representation is proposed in [1] to deal with the occlusion problem. The representation is based on a graph pyramid: each moving region is represented at different levels of resolution, with one graph per level. In the association phase between moving objects in two consecutive frames, the algorithm compares the topmost levels of each pyramid. If the outcome of this comparison is sufficient to assign a label to each node, the tracking algorithm stops. If instead some ambiguities arise (as is the case when two objects overlap), the comparison is repeated using the next levels of the pyramids until a consistent labelling is found. The purpose of re-identification (re-id) is to identify people coming back into the field of view of a camera, or to recognize an individual across different cameras in a distributed network. At the heart of the process is a comparison between signatures drawn from the probe and gallery sets. In [3], graphs are used to represent people's appearance, and the comparison is performed by means of graph kernels. Finally, action recognition is a classification problem in which each video representing an action has to be assigned the correct action label. In [2], we proposed to represent videos as graph sequences and introduced a model inspired by bag-of-words techniques to classify a sequence.
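The coarse-to-fine association described for [1] can be illustrated with a minimal sketch. It is not the algorithm of [1]: the names `level_distance` and `associate`, the `margin` threshold, and the representation of each pyramid level as a plain list of numeric node attributes are all assumptions made for illustration, and the toy distance stands in for a real graph comparison.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a graph pyramid: a list of levels, coarsest first.
# Each level is a list of numeric node attributes instead of a real graph.
Pyramid = List[List[float]]


def level_distance(a: List[float], b: List[float]) -> float:
    """Toy dissimilarity between two levels (assumption, not a graph matcher)."""
    a, b = sorted(a), sorted(b)
    shared = sum(abs(x - y) for x, y in zip(a, b))
    return shared + abs(len(a) - len(b))  # penalise differing node counts


def associate(
    previous: Dict[str, Pyramid], current: Pyramid, margin: float = 0.5
) -> Optional[Tuple[str, int]]:
    """Return (label, level used), or None if every level stays ambiguous."""
    if not previous:
        return None
    depth = min([len(current)] + [len(p) for p in previous.values()])
    for level in range(depth):
        scored = sorted(
            (level_distance(p[level], current[level]), label)
            for label, p in previous.items()
        )
        # Accept the best match only if it clearly beats the runner-up.
        if len(scored) == 1 or scored[1][0] - scored[0][0] >= margin:
            return scored[0][1], level
        # Otherwise the match is ambiguous: descend to the next, finer level.
    return None


# Two tracked objects that look alike at the coarsest level, as when they overlap.
previous = {"A": [[1.0], [1.0, 2.0]], "B": [[1.2], [5.0, 6.0]]}
current = [[1.1], [1.1, 2.1]]
print(associate(previous, current))
```

In this toy case the coarsest level cannot separate the two candidates, so the decision is taken one level down. Most of the work is done on the small topmost graphs, and finer levels are consulted only when an ambiguity has to be resolved.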
Recently, graphs have gained a lot of attention in the Computer Vision community thanks to their use as input data for deep learning techniques. Graph Neural Networks have demonstrated their effectiveness in solving Computer Vision problems, and in some cases recent proposals have bridged the gap between statistical and structural pattern recognition. The second part of the talk will be devoted to illustrating some of these examples [4, 5, 6]. Starting from the Computer Vision applications already mentioned (object tracking, action recognition), we will discuss the new proposals based on Deep Learning with graphs and the open problems in this context.
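As a rough illustration of how deep learning models operate on graph data, the sketch below performs one round of neighbourhood aggregation, the basic operation behind message-passing Graph Neural Networks. It is not the architecture of [4], [5], or [6]: the function name `message_passing_step` is hypothetical, and the learned weight matrices and non-linearities of a real layer are deliberately left out so that only the aggregation step remains.

```python
from typing import Dict, List, Tuple


def message_passing_step(
    features: Dict[str, List[float]], edges: List[Tuple[str, str]]
) -> Dict[str, List[float]]:
    """One round of mean aggregation over an undirected graph (illustrative only)."""
    neighbours: Dict[str, List[str]] = {node: [] for node in features}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    updated = {}
    for node, own in features.items():
        # Each node averages its own feature vector with those of its neighbours.
        group = [own] + [features[n] for n in neighbours[node]]
        updated[node] = [sum(values) / len(group) for values in zip(*group)]
    return updated


# A three-node path graph a - b - c with one feature per node.
features = {"a": [0.0], "b": [2.0], "c": [4.0]}
edges = [("a", "b"), ("b", "c")]
print(message_passing_step(features, edges))
```

Stacking several such rounds lets information travel along longer paths in the graph, so the structure of the graph shapes the vector representation of each node.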
[1] Donatello Conte, Pasquale Foggia, Jean-Michel Jolion, and Mario Vento. 2006. A graph-based, multi-resolution algorithm for tracking objects in presence of occlusions. Pattern Recognition 39, 4 (2006), 562–572.
[2] Xavier Cortés, Donatello Conte, and Hubert Cardot. 2018. Bags of graphs for human action recognition. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR). Springer, 429–438.
[3] Amal Mahboubi, Luc Brun, and Donatello Conte. 2018. A structural approach to Person Re-identification problem. In 2018 24th International Conference on Pattern Recognition (ICPR). IEEE, 1616–1621.
[4] Akshay Rangesh, Pranav Maheshwari, Mez Gebre, Siddhesh Mhatre, Vahid Ramezani, and Mohan M. Trivedi. 2021. TrackMPNN: A message passing graph neural architecture for multi-object tracking. arXiv preprint arXiv:2101.04206 (2021).
[5] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. 2020. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems 32, 1 (2020), 4–24.
[6] Sijie Yan, Yuanjun Xiong, and Dahua Lin. 2018. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Thirty-Second AAAI Conference on Artificial Intelligence.
Index Terms
- Graphs in Computer Vision then and now: how Deep Learning has reinvigorated Structural Pattern Recognition