Abstract
Small group detection has become one of the most important steps in crowd scene analysis, with several applications in video surveillance (e.g. detecting, preventing and predicting dangerous situations). A Standing Conversational Group (a.k.a. F-Formation) is a kind of small group whose stationary members interact through social signals (i.e. non-verbal expressions). State-of-the-art methods have reported encouraging results; however, they are based on complex theories, are difficult to reproduce, and have high computational complexity. In this paper, we propose a new method for detecting F-Formations in an image. We introduce a new representation and a clustering method, basing our solution on fuzzy relation theory. The performance of our proposal is evaluated and compared against other reported methods over one synthetic and two real-world databases. The experimental results show the effectiveness of our proposal.
1 Introduction
In Pattern Recognition and Computer Vision, several works focus on automating scene analysis [1, 2], drawing on social, biological and psychological theories. These works have shown that crowded scenes are composed of small groups of people, and that the behavior of each group is given by the interactions among its members [3].
Small group detection is an important step in scene analysis, allowing higher levels of semantic interpretation to be obtained. It has several applications in video surveillance, such as anomaly detection and video classification [4,5,6,7].
A Standing Conversational Group (a.k.a. F-Formation) is a kind of small group that has attracted great interest in the scientific community. An F-Formation is composed of stationary people who interact through social signals (i.e. non-verbal expressions) [1]. Besides, while interacting, the people form patterns of space and orientation among themselves. Moreover, they have equal and exclusive access to a space inside the F-Formation [8].
An F-Formation is composed of three spaces: O-space, P-space and R-space (see Fig. 1(a)). The O-space is an empty space surrounded by people oriented toward it. It is the most important space, because most of the algorithms reported in the literature are based on it. The P-space surrounds the O-space, while the R-space is the complement of the P-space.
An F-Formation can take different geometrical forms: L-shape, Face-to-Face, Side-by-Side and Circular (see Figs. 1(b), (c), (d) and (e), respectively). When the number of people in an F-Formation is greater than two, the F-Formation commonly takes the circular form.
Several approaches have been proposed to detect F-Formations [9,10,11,12]. They use features such as the people's positions on the ground floor and their head orientations.
The first approach is based on the Hough transform [9, 10], where an accumulator space is created and local maxima are found through a voting strategy. Each local maximum represents an O-space center, to which the nearby people are assigned.
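The voting idea can be sketched as follows. This is a minimal illustration of the accumulator strategy, not the implementation of [9, 10]; the grid cell size, vote length and vote threshold below are illustrative assumptions.

```python
import math
from collections import defaultdict

def hough_o_space_centers(people, r=60.0, cell=40.0, min_votes=2):
    """Accumulate each person's vote point (their position projected a
    distance r along their orientation) into a coarse grid and return
    the cells reaching the vote threshold as candidate O-spaces.
    `people` is a list of (x, y, theta) tuples, theta in radians."""
    acc = defaultdict(list)                      # grid cell -> voter ids
    for k, (x, y, theta) in enumerate(people):
        vx = x + r * math.cos(theta)             # vote point, as in [9]
        vy = y + r * math.sin(theta)
        acc[(int(vx // cell), int(vy // cell))].append(k)
    # local maxima are approximated here as cells with enough votes
    return [ids for ids in acc.values() if len(ids) >= min_votes]

# Two people facing each other vote into (roughly) the same cell.
groups = hough_o_space_centers([(0, 0, 0.0), (120, 0, math.pi)])
```

In the real accumulator the cell with the most votes is taken as an O-space center and the voters assigned to it form one F-Formation.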
Other approaches are based on graph theory [13], where people and their relations are represented by vertices and edges, respectively. In this way, F-Formation detection is reduced to the maximal clique (i.e. dominant set) detection problem [11, 14].
The methods proposed in [12, 15] are based on game theory [16], where F-Formation detection is reduced to a clustering problem over an evolutionary environment.
The aforementioned approaches have high computational complexity, are based on complex theories and are difficult to reproduce. In the literature, little effort has been devoted to reducing these difficulties; to the best of our knowledge, only the authors of [17] attempt to solve them. However, their method requires a large number of parameters and its group detection is not automatic. Furthermore, it is designed for detecting F-Formations in sequences of images. Based on [17], we propose a solution that reduces the number of parameters and detects F-Formations automatically.
The main contributions of this paper are: (1) a new method for detecting F-Formations in an image, (2) a new image representation, in which a membership function for computing social relations between people is introduced, and (3) an automatic clustering for associating people with their O-space.
The outline of this paper is as follows. In Sect. 2, some basic concepts are provided. Section 3 describes the proposed method. The experimental results are discussed in Sect. 4. Finally, conclusions and some future directions are presented in Sect. 5.
2 Basic Concepts
In this section, we present the concepts required for understanding our proposal.
Definition 1
(Fuzzy relation). Let X and Y be two sets, a fuzzy relation from X to Y is a membership function \(\rho :X \times Y \rightarrow [0,1]\). If \(X=Y\), then \(\rho \) is named fuzzy relation on X.
By Definition 1, the similarity relation can be defined as follows.
Definition 2
(Similarity relation (by Zadeh [18])). The fuzzy relation \(\rho \) on X is a similarity relation if for all \(x,y \in X\) the following properties are fulfilled:
-
Reflexivity: \(\rho (x,x) = 1\)
-
Symmetry: \(\rho (x,y) = \rho (y,x)\)
-
Transitivity: \(\rho (x,y) \ge \displaystyle \max _{z \in X} \left\{ \min \left\{ \rho (x,z),\rho (z,y) \right\} \right\} \).
Sometimes, a fuzzy relation is represented by a matrix, known as a fuzzy matrix (see Definition 3).
Definition 3
(Fuzzy matrix). A matrix M is an \(m \times n\) fuzzy matrix if each cell of M has a value in the interval [0, 1].
Given a fuzzy matrix M representing a similarity relation, we define an F-Formation as follows:
Definition 4
(F-Formation). An F-Formation is a set of connected cell indexes of M whose corresponding cell values are greater than a given \(\alpha \in [0,1]\).
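The properties of Definition 2 can be checked mechanically, and an arbitrary reflexive, symmetric fuzzy matrix can be turned into a similarity relation through the standard max-min transitive closure (this machinery is used later in Sect. 3.2). The following is a plain-Python sketch of both operations:

```python
def maxmin_compose(A, B):
    """Max-min composition of two square fuzzy matrices."""
    n = len(A)
    return [[max(min(A[i][k], B[k][j]) for k in range(n))
             for j in range(n)] for i in range(n)]

def transitive_closure(M):
    """Repeat R <- max(R, R o R) until the max-min transitivity
    of Definition 2 holds; this converges in finitely many steps."""
    R = [row[:] for row in M]
    while True:
        C = maxmin_compose(R, R)
        N = [[max(R[i][j], C[i][j]) for j in range(len(R))]
             for i in range(len(R))]
        if N == R:
            return R
        R = N

def is_similarity(M, eps=1e-9):
    """Check reflexivity, symmetry and max-min transitivity."""
    n = len(M)
    refl = all(abs(M[i][i] - 1.0) < eps for i in range(n))
    sym = all(abs(M[i][j] - M[j][i]) < eps
              for i in range(n) for j in range(n))
    C = maxmin_compose(M, M)
    trans = all(M[i][j] >= C[i][j] - eps
                for i in range(n) for j in range(n))
    return refl and sym and trans
```

For instance, a reflexive symmetric matrix with \(\rho (x,y)=0.8\), \(\rho (y,z)=0.6\) and \(\rho (x,z)=0\) violates transitivity; its closure raises \(\rho (x,z)\) to 0.6.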
3 The Proposed Method
Given a database of images with people's positions on the floor and their orientations (i.e. head or body orientations), our proposal carries out two steps: (1) building a representation in which relations between people are modeled as fuzzy relations, later codified in a fuzzy matrix (see Sect. 3.1), and (2) clustering the fuzzy relations to detect F-Formations (see Sect. 3.2).
3.1 Representation
Let \((x_k, y_k)\) and \(\sigma _k\) be the position and the orientation of a person \(p_k\), and let \(v_k = [x_k + r \cdot \cos {\sigma _k}, y_k + r \cdot \sin {\sigma _k}]\) be its vote point [9], where r is the vote length. The visual-field interaction between two people \(p_i\) and \(p_j\), \(i,j \in [1,k]\), is computed by their frustum interception (i.e. the vote point interception).
In [17], the authors proposed an idea based on vote points, where each person's frustum (see Footnote 1) is represented by a single vote point. However, this idea relies on the assumption of perfect alignment [9], under which some F-Formation detections could be missed (see Fig. 2(a)). For this reason, we propose an alternative to the idea of [17], representing each person's frustum by three vote points. In this way, we avoid the assumption of perfect alignment (see Fig. 2(b)).
According to [17], a valid frustum interception between \(p_i\) and \(p_j\), \(i,j \in [1,k]\), is determined by the following two rules: (1) both vote points \(v_i\) and \(v_j\) must be on the same side of the segment d, and (2) the distance between \(p_i\) and \(p_j\) (i.e. the length of d) must be greater than the distance between \(v_i\) and \(v_j\). Notice that these rules are accomplished only for \(r = d/2\).
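The two rules can be sketched as follows. This is a hedged illustration with a single vote point per person and \(r = d/2\); the paper's actual Algorithm 1 uses three vote points per frustum and Eqs. 1-7, which are not reproduced in this text.

```python
import math

def vote_point(x, y, theta, r):
    """Position projected a distance r along the orientation theta."""
    return (x + r * math.cos(theta), y + r * math.sin(theta))

def valid_interception(pi, ti, pj, tj):
    """Check the two frustum-interception rules of [17], with the
    vote length fixed at r = d/2 (d = inter-person distance):
    (1) both vote points lie on the same side of the segment pi-pj,
    (2) the vote points are closer together than the people are."""
    d = math.dist(pi, pj)
    if d == 0:
        return True
    r = d / 2.0
    vi = vote_point(pi[0], pi[1], ti, r)
    vj = vote_point(pj[0], pj[1], tj, r)
    # signed side of a point w.r.t. the directed segment pi -> pj
    def side(v):
        return ((pj[0] - pi[0]) * (v[1] - pi[1])
                - (pj[1] - pi[1]) * (v[0] - pi[0]))
    same_side = side(vi) * side(vj) >= 0       # 0 = on the segment itself
    return same_side and math.dist(vi, vj) < d

# Face-to-face people intercept; back-to-back people do not.
facing = valid_interception((0, 0), 0.0, (120, 0), math.pi)
backs = valid_interception((0, 0), math.pi, (120, 0), 0.0)
```

For two people facing each other, both vote points fall near the midpoint of d, so both rules hold; for people facing away, the vote points end up farther apart than the people, violating rule (2).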
In our proposal (see Algorithm 1), for each pair of people \(p_i\) and \(p_j\) we compute their vote points \(v_{i,l}\) and \(v_{j,l}\), \(l \in [1,3]\), by Eqs. 1, 2 and 3. For searching a valid frustum interception, we use Eqs. 6 and 7 after building a matrix X whose elements are the distances between the vote points.
For computing the social relation between two people, we propose the membership function \(\mu _{i,j}\) (see Eq. 8), whose h values are taken from Hall's theory [19], which characterizes people's social interactions by physical distances. The value of u is the minimum among the positive elements of X.
When all elements of X are negative, \(\mu _{i,j} = 0\); when \(i = j\) (i.e. \(d = 0\)), \(\mu _{i,j} = 1\). Notice that \(\mu _{i,j}\) is a fuzzy relation on the set of people, and our representation is a fuzzy matrix M holding the \(\mu _{i,j}\) values.
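Since Eq. 8 itself is not reproduced in this text, the following stand-in only illustrates the boundary behavior described above (\(\mu = 1\) at \(u = 0\), \(\mu = 0\) when X has no positive element), with an assumed linear decay toward the Hall distance h; the actual Eq. 8 may use a different decay.

```python
def membership(u, h):
    """Illustrative stand-in for Eq. 8 (not reproduced in the text):
    a linear decay in the minimum positive vote-point distance u,
    vanishing at the Hall-theory range h. u is None when the matrix
    X has no positive element (no valid frustum interception)."""
    if u is None:
        return 0.0
    return max(0.0, 1.0 - u / h)
```

Whatever its exact shape, \(\mu _{i,j}\) maps each pair of people into [0, 1], so the matrix M of all pairwise values is a fuzzy matrix in the sense of Definition 3.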
3.2 Clustering
We propose Algorithm 2 for clustering, which uses the ClusteringRF algorithm [17] to transform the input fuzzy matrix M into a similarity relation matrix \(M'\) (i.e. a fuzzy relation fulfilling the reflexivity, symmetry and transitivity properties) [18] and to generate a partition \(C_ \alpha = \{c_1, \ldots , c_k\}\) by an \(\alpha \)-cut.
For determining the number of clusters (i.e. the number of F-Formations within an image), we use a naive average of scores and select the \(C_ \alpha \) with the maximal \(w_\alpha \) value. Notice that \(|C_ \alpha |\) is the number of clusters, \(|c_k|\) is the number of elements in cluster k and \(M'(i,j)\) is a value of the fuzzy matrix \(M'\).
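The selection can be sketched as follows. Because \(M'\) is a similarity relation, every \(\alpha \)-cut induces an equivalence partition; the score \(w_\alpha \) below (per-cluster average of within-cluster similarities, averaged over clusters, with singletons scoring 0) is an assumption standing in for the exact score of Algorithm 2, which is not reproduced in this text.

```python
def alpha_cut_partition(Mp, alpha):
    """Partition induced by the alpha-cut of a similarity matrix Mp:
    i and j share a cluster iff Mp[i][j] >= alpha (an equivalence
    relation once Mp fulfils Definition 2)."""
    n, seen, clusters = len(Mp), set(), []
    for i in range(n):
        if i not in seen:
            c = [j for j in range(n) if Mp[i][j] >= alpha]
            seen.update(c)
            clusters.append(c)
    return clusters

def best_partition(Mp):
    """Sweep the distinct values of Mp as alpha-cuts and keep the
    partition maximising the (assumed) naive average score w_alpha."""
    best, best_w = None, -1.0
    for alpha in sorted({v for row in Mp for v in row if 0.0 < v <= 1.0}):
        C = alpha_cut_partition(Mp, alpha)
        w = 0.0
        for c in C:
            pairs = [(i, j) for i in c for j in c if i != j]
            if pairs:                       # singleton clusters score 0
                w += sum(Mp[i][j] for i, j in pairs) / len(pairs)
        w /= len(C)
        if w > best_w:
            best, best_w = C, w
    return best
```

On a similarity matrix with two tight pairs that are weakly related to each other, the sweep picks the \(\alpha \)-cut separating the two pairs rather than one big cluster or all singletons.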
4 Experimental Results
In this section, we present the experimental evaluation of our proposed method, comparing its results against the best results reported in the literature over two real-world databases (Coffee Break [9] and GDet [9]) and one synthetic database (Synth [9]).
4.1 Databases
The Coffee Break database [9] was obtained from a real-world outdoor scenario with a single camera at a resolution of \(1440 \times 1080\) px. It is composed of social events in which people interact while enjoying a cup of coffee. This database has 120 images annotated by psychologists using several questionnaires, where head orientations were estimated considering four directions: front, back, left and right.
The GDet database [9] was obtained from an indoor vending-machine area with several occlusions. It has 403 images, acquired by two low-resolution (\(352 \times 328\) px) cameras located at opposite corners of the room. Ground truth was generated by psychologists, with head orientations estimated over four directions (front, back, left and right), and people's positions were computed by a particle-filter tracking algorithm [20].
The Synth database [9] was generated by a trained expert. It contains 100 situations created from 10 different base situations by slightly varying the positions and head orientations of the people. It is important to highlight that there is no noise in this database.
4.2 Experiments
For evaluating our proposal, we use the validation protocol proposed in [12], where a group is correctly detected if at least \(\lceil T \cdot |G| \rceil \) of its members are found and no more than \(\lceil (1-T) \cdot |G| \rceil \) detected members are non-members. Here |G| is the cardinality of the labeled group and \(T = 2/3\). For each image, the precision p, sensitivity s and F1 measure are computed for each group formation.
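This matching criterion is fully specified and can be sketched directly. Exact rational arithmetic is used for the ceilings, since \(2/3\) in floating point would occasionally round \(\lceil (1-T) \cdot |G| \rceil \) to the wrong integer.

```python
import math
from fractions import Fraction

def group_match(detected, truth, T=Fraction(2, 3)):
    """A detected group matches a ground-truth group G when it
    contains at least ceil(T*|G|) of G's members and at most
    ceil((1-T)*|G|) outsiders (protocol of [12], T = 2/3)."""
    G, D = set(truth), set(detected)
    return (len(D & G) >= math.ceil(T * len(G))
            and len(D - G) <= math.ceil((1 - T) * len(G)))

def precision_sensitivity_f1(detections, ground_truth):
    """Per-image scores: a detection is a true positive when it
    matches some ground-truth group (a simplified sketch; the
    protocol in [12] pairs groups one-to-one)."""
    tp = sum(any(group_match(d, g) for g in ground_truth)
             for d in detections)
    p = tp / len(detections) if detections else 0.0
    s = tp / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * p * s / (p + s) if p + s else 0.0
    return p, s, f1
```

For example, with \(T = 2/3\), a detection {1, 2, 4} matches the ground-truth group {1, 2, 3} (two of three members found, one outsider allowed), while {1, 4, 5} does not.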
Our experiments were carried out with C++ on Eclipse, using the OpenCV and Armadillo libraries, on a personal computer with an Intel(R) Core(TM) 2 Duo CPU at 1.83 GHz and 2 GB RAM, running Ubuntu 18.04.2.
Table 1 shows the results obtained by the related works reported in the literature, as well as those achieved by our proposal. The first column of the table lists the method names. The other columns show the precision, sensitivity and F1 values achieved over Coffee Break, GDet and Synth, with the best result of each column highlighted.
We varied h between the intimate and social spaces of Hall's theory [19] (i.e. [0, 360]). For generating each person's frustum, the orientation \(\sigma _k\) is taken from the person's head and the vote points \(v_{k, g}\), \(g \in [1,3]\), are computed with angles \(\sigma _k \pm \gamma \), where \(\gamma \) takes values between 0 and 60 for an effective visual field. Our best results are achieved with \(h = 70\) and \(\gamma = 30\) on Coffee Break, \(h = 10\) and \(\gamma = 60\) on GDet, and \(h = 116\) and \(\gamma = 30\) on the Synth database.
We obtained different results because these databases represent different environments, where the crowd level of the scene changes (i.e. the number of people, the occlusions and the interaction distances between people). For this reason, in practice, the parameters h and \(\gamma \) must be carefully selected in a pre-processing step. We recommend decreasing h and increasing \(\gamma \) when the crowd level increases.
As an example, Fig. 3 shows a result of our proposal on the Coffee Break database, where circles of the same color over people's heads represent the same detected Standing Conversational Group. Notice that there are 4 small groups (green, red, yellow and blue), with cardinalities between 2 and 3.
5 Conclusions and Future Works
In this paper, we proposed a new method for detecting Standing Conversational Groups (F-Formations) in a still image. We based our proposal on fuzzy relation theory, building a new representation with three vote points per person. Moreover, we proposed a clustering on fuzzy relations for obtaining the best number of F-Formations. We evaluated our proposal over two real-world databases (Coffee Break and GDet) and a synthetic one (Synth).
The results achieved by our proposal outperform the best ones reported in the literature on the GDet database and are the best on Synth, while remaining similar on Coffee Break. Based on our experiments, we conclude that our proposal is an effective and simple solution. In the future, we will explore several internal clustering validation indexes for further improving the estimation of the number of F-Formations.
Notes
1. A frustum is a biological area where interactions between people often occur [12].
References
Vinciarelli, A., Pantic, M., Bourlard, H.: Social signal processing: Survey of an emerging domain. Image Vis. Comput. 27(12), 1743–1759 (2009)
Li, T., Chang, H., Wang, M., Ni, B., Hong, R., Yan, S.: Crowded scene analysis: a survey. IEEE Trans. Circ. Syst. Video Technol. 25(3), 367–386 (2015)
Moussaïd, M., Perozo, N., Garnier, S., Helbing, D., Theraulaz, G.: The walking behaviour of pedestrian social groups and its impact on crowd dynamics. PLoS ONE 5(4), e10047 (2010)
Musse, S.R., Thalmann, D.: A model of human crowd behavior : group inter-relationship and collision detection analysis. In: Thalmann, D., van de Panne, M. (eds.) Computer Animation and Simulation 1997. Eurographics. Springer, Vienna (1997). https://doi.org/10.1007/978-3-7091-6874-5_3
Shao, J., Loy, C., Wang, X.: Scene-independent group profiling in crowd. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2219–2226 (2014)
Cosar, S., Donatiello, G., Bogorny, V., Gárate, C., Alvares, L.O., Brémond, F.: Toward abnormal trajectory and event detection in video surveillance. IEEE Trans. Circ. Syst. Video Technol. 27(3), 683–695 (2017)
Liu, C., Wang, G., Ning, W., Lin, X., Li, L., Liu, Z.: Anomaly detection in surveillance video using motion direction statistics. In: Proceedings of the International Conference on Image Processing, ICIP 2010, 26–29 September 2010, Hong Kong, China, pp. 717–720 (2010)
Kendon, A.: Spacing and orientation in co-present interaction. In: Esposito, A., Campbell, N., Vogel, C., Hussain, A., Nijholt, A. (eds.) Development of Multimodal Interfaces: Active Listening and Synchrony. LNCS, vol. 5967, pp. 1–15. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12397-9_1
Cristani, M., et al.: Social interaction discovery by statistical analysis of F-formations. In: BMVC, vol. 2, p. 4 (2011)
Setti, F., Lanz, O., Ferrario, R., Murino, V., Cristani, M.: Multi-scale F-formation discovery for group detection. In: 2013 IEEE International Conference on Image Processing, pp. 3547–3551. IEEE (2013)
Zhang, L., Hung, H.: Beyond F-formations: Determining social involvement in free standing conversing groups from static images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1086–1095 (2016)
Vascon, S., Mequanint, E.Z., Cristani, M., Hung, H., Pelillo, M., Murino, V.: Detecting conversational groups in images and sequences: a robust game-theoretic approach. Comput. Vis. Image Underst. 143, 11–24 (2016)
West, D.B., et al.: Introduction to Graph Theory, vol. 2. Prentice Hall, Upper Saddle River (2001)
Hung, H., Kröse, B.: Detecting F-formations as dominant sets. In: Proceedings of the 13th International Conference on Multimodal Interfaces, pp. 231–238. ACM (2011)
Vascon, S., Mequanint, E.Z., Cristani, M., Hung, H., Pelillo, M., Murino, V.: A game-theoretic probabilistic approach for detecting conversational groups. In: Cremers, D., Reid, I., Saito, H., Yang, M.-H. (eds.) ACCV 2014. LNCS, vol. 9007, pp. 658–675. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16814-2_43
Sigmund, K.: Introduction to evolutionary game theory. In: Sigmund, K. (ed.) Evolutionary Game Dynamics, vol. 69, pp. 1–26 (2011)
Ferrera, E., Acosta, N., Alonso, A., García, E.: Detecting free standing conversational group in video using fuzzy relations. INFORMATICA 30(1), 21–32 (2019)
Zadeh, L.A.: Similarity relations and fuzzy orderings. Inf. Sci. 3(2), 177–200 (1971)
Hall, E.T.: The Hidden Dimension. Doubleday, New York (1966)
Lanz, O.: Approximate Bayesian multibody tracking. IEEE Trans. Pattern Anal. Mach. Intell. 28(9), 1436–1449 (2006)
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Ferrera-Cedeño, E., Acosta-Mendoza, N., Gago-Alonso, A. (2019). Detecting Steading Conversational Groups on an Still Image: A Single Relational Fuzzy Approach. In: Nyström, I., Hernández Heredia, Y., Milián Núñez, V. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2019. Lecture Notes in Computer Science(), vol 11896. Springer, Cham. https://doi.org/10.1007/978-3-030-33904-3_22
Print ISBN: 978-3-030-33903-6
Online ISBN: 978-3-030-33904-3