Composite distance based approach to von Mises mixture reduction
Introduction
Many statistical and engineering problems [1], [2], [3] require modeling of complex multi-modal data, for which mixture distributions have become an indispensable tool. In this paper we draw attention to finite mixtures of a specific distribution on the unit circle, the von Mises distribution. Starting from 1918 and the seminal work of von Mises [4], in which he investigated the hypothesis that the atomic weights of chemical elements are integers, this parametric density has played a pertinent role in directional statistics, with a wide range of applications in physics, biology, image analysis, neuroscience and medicine – cf. the monographs [5], [6], [7] and references therein.
Estimating complex data by mixture distributions may lead to models with a large or, in applications like target tracking, ever-increasing number of components. In the absence of efficient reduction procedures, such models become computationally intractable and lose their practicality. Component number reduction in mixture models is therefore an essential tool in many domains, such as image and multimedia indexing [8], [9] and speech segmentation [10], and it is an indispensable part of any tracking system based on mixtures of Gaussian [11], [12], [13] or von Mises distributions [3]. The subject matter is particularly relevant to the information fusion domain since it relates to the following challenging problems in multisensor data fusion [14]: data dimensionality, processing framework, and data association. These problems are connected to component reduction by the fact that measurement data can be preprocessed (compressed) prior to communicating it to other nodes (in a decentralized framework) or to the fusion center, thus saving on the communication bandwidth and power required for transmitting data. For example, consider the problem of people trajectory analysis with von Mises mixtures [2] in a distributed sensor network where the mixtures need to be communicated between the sensor nodes. Motivated by [2], [3], in this paper we study methods and respective algorithms for component number reduction in mixtures of von Mises distributions; owing to the general exposition of the subject in the framework of exponential family mixtures, the methods and findings extend readily to other cases, such as mixtures of Gaussian or von Mises–Fisher distributions [5].
Existing literature on mixture reduction schemes is mostly related to Gaussian mixture models. A reduction scheme for Gaussian mixtures in the context of Bayesian tracking systems in a cluttered environment, which successively merges the closest pair of components and is henceforth referred to as the joining algorithm, was proposed in [11]. The main drawback of the scheme is its local character, which gives no information about the global deviation of the reduced mixture from the original one. In [15] mixture reduction was formulated as an optimization problem for the integral square difference cost function. A better suited distance measure between probability distributions is the Kullback–Leibler (KL) distance [16], but it lacks a closed-form expression between mixtures, which makes it computationally inconvenient. Several concepts have been employed to circumvent this problem. A new distance measure between mixture distributions, based on the KL distance and expressible analytically, was derived in [17] and utilized to solve the mixture reduction problem. In [12] an upper bound for the KL distance was obtained and used as a dissimilarity measure in a successive pairwise reduction of Gaussian mixtures – henceforth we refer to it as the pairwise merging algorithm. Unlike the joining algorithm, this procedure gives control of the global deviation of the reduced mixture from the original one. Introducing the notion of Bregman information, the authors in [18] generalized the previously developed Gaussian mixture reduction concepts to arbitrary exponential family mixtures. Further development of these techniques for exponential family mixtures can be found in [19], [20], [21], [22], [23], [24]. Finally, we mention the variational Bayesian approach [25], [26] as well as [27] as alternative concepts of mixture reduction developed for Gaussian mixtures.
The contributions of the present paper are as follows. Firstly, we formulate the problem of component number reduction in exponential family mixtures as an optimization problem utilizing a new class of composite distance measures as cost functions. These distance measures are constructed employing Rényi α-divergences as ground distances, and it is shown that the composite distance bounds the corresponding Rényi α-divergence from above (see Lemma 1 below). This inequality is important since it provides information on the global deviation of the reduced mixture from the original one, measured by the Rényi α-divergence. Secondly, we synthesize previously developed reduction techniques [12], [18], [24] in the sense that they can all be interpreted as suboptimal solution strategies for the proposed optimization problem. For the purpose of computational complexity and accuracy comparisons, the joining algorithm is extended using the scaled symmetrized KL distance as a dissimilarity measure between mixture components. Thirdly, special attention is given to von Mises mixtures, for which we present analytical expressions for solving the component number reduction problem and analyze them on two examples: a synthetic 100-component mixture with several dominant modes and a real-world mixture stemming from work on people trajectory analysis in video data [2].
The outline of the paper is as follows. The general framework of exponential family mixtures is introduced in Section 2, together with a brief survey of distance measures between probability distributions and the definition of composite distance measures. Section 3 presents component number reduction in exponential family mixtures as a constrained optimization problem. In Section 4 we discuss two suboptimal solution strategies and additionally consider the joining algorithm. Numerical experiments on two examples of circular data are performed, and the obtained results are discussed in Section 5. Finally, Section 6 concludes the paper by outlining the main achievements and commenting on possible extensions.
General background
In this section we introduce exponential family distributions and the von Mises distribution as their subclass, recall the notion of finite mixture distributions, and discuss a variety of distance measures between probability distributions, with emphasis on composite distance measures between mixtures.
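As a concrete reference for the discussion above, the following is a minimal Python sketch of the von Mises density, f(x; μ, κ) = exp(κ cos(x − μ)) / (2π I0(κ)), and of a finite von Mises mixture. The function names are ours; only the standard modified Bessel function I0 is needed.

```python
import numpy as np
from scipy.special import i0  # modified Bessel function of the first kind, order 0

def von_mises_pdf(x, mu, kappa):
    """Von Mises density on the circle with mean direction mu and
    concentration kappa >= 0 (kappa = 0 gives the uniform distribution)."""
    return np.exp(kappa * np.cos(x - mu)) / (2.0 * np.pi * i0(kappa))

def mixture_pdf(x, weights, mus, kappas):
    """Finite von Mises mixture: sum_i w_i * f(x; mu_i, kappa_i)."""
    return sum(w * von_mises_pdf(x, m, k)
               for w, m, k in zip(weights, mus, kappas))
```

Both densities integrate to one over any interval of length 2π, which gives a quick sanity check for implementations.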
Problem formulation
Having defined suitable distance measures in the previous section, we formulate the problem of reducing the number of components, described in Section 1, as follows. Let p be the given starting exponential family mixture and let D denote the chosen ground distance, the KL distance or a Rényi α-divergence. The optimization problem (15) aims to find a mixture having at most L components which is the best approximation of p with respect to the composite D-distance.
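The composite distance itself is not reproduced in this snippet. As an illustration, a common construction in this line of work, and the one we assume in the sketch below, charges each component of the original mixture its weighted ground distance to the nearest component of the reduced mixture. For von Mises ground components the KL divergence has the closed form KL(f1‖f2) = log(I0(κ2)/I0(κ1)) + A(κ1)(κ1 − κ2 cos(μ1 − μ2)), with A(κ) = I1(κ)/I0(κ):

```python
import numpy as np
from scipy.special import i0, i1

def kl_von_mises(mu1, k1, mu2, k2):
    """Closed-form KL divergence KL(f1 || f2) between two von Mises densities."""
    A1 = i1(k1) / i0(k1)  # mean resultant length A(kappa) = I1(kappa)/I0(kappa)
    return np.log(i0(k2) / i0(k1)) + A1 * (k1 - k2 * np.cos(mu1 - mu2))

def composite_distance(weights_p, params_p, params_q):
    """Assumed composite D-distance: each original component (mu, kappa) is
    charged the ground distance to its closest reduced-mixture component."""
    total = 0.0
    for w, (mu1, k1) in zip(weights_p, params_p):
        total += w * min(kl_von_mises(mu1, k1, mu2, k2) for mu2, k2 in params_q)
    return total
```

By construction this quantity vanishes when the reduced mixture contains all original components, and it is cheap to evaluate since it never requires integrating the mixtures themselves.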
Component reduction schemes
In this section we present three different approaches to solving the component number reduction problem. The first two, (i) generalized k-means clustering and (ii) a gradual pairwise merging scheme, are solution strategies that aim to solve the optimization problem (15), i.e. to minimize the composite distance between the original and the reduced mixture. The third approach, the joining algorithm, is a heuristic reduction scheme which successively merges a pair of components
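All merging-based schemes rely on a primitive that replaces two weighted components by a single one. A hedged sketch of this step for von Mises components, assuming the standard circular moment-matching rule (match the combined weight and first trigonometric moment) and Fisher's approximation for inverting A(κ); the paper's exact merge formulas may differ:

```python
import numpy as np
from scipy.special import i0, i1

def A(kappa):
    """Mean resultant length of a von Mises distribution, A(k) = I1(k)/I0(k)."""
    return i1(kappa) / i0(kappa)

def A_inv(R):
    """Approximate inverse of A via Fisher's formula (an assumption here)."""
    return R * (2.0 - R**2) / (1.0 - R**2)

def merge_pair(w1, mu1, k1, w2, mu2, k2):
    """Moment-matched merge of two weighted von Mises components into one."""
    w = w1 + w2
    # Weighted average of the first trigonometric moments of the two components.
    C = (w1 * A(k1) * np.cos(mu1) + w2 * A(k2) * np.cos(mu2)) / w
    S = (w1 * A(k1) * np.sin(mu1) + w2 * A(k2) * np.sin(mu2)) / w
    mu = np.arctan2(S, C)          # merged mean direction
    R = np.hypot(C, S)             # merged mean resultant length
    return w, mu, A_inv(R)         # concentration recovered from R
```

Merging a component with a copy of itself recovers the original parameters up to the accuracy of the A-inversion, which is a useful unit test for any implementation of this step.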
Results and discussion for von Mises mixtures
To test and compare the reduction algorithms for von Mises mixtures we use two examples. The first is a synthetic mixture consisting of 100 components chosen at random, with two components given dominant weights in order to ensure a couple of dominant modes in the mixture. The second is a real-world mixture stemming from a people trajectory analysis dataset [2].
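A construction of this kind of synthetic test mixture might look as follows; the parameter ranges, random seed, and boosting of the first two weights are our assumptions for illustration, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 random components: mean directions on the circle, positive concentrations.
N = 100
mus = rng.uniform(-np.pi, np.pi, N)
kappas = rng.uniform(1.0, 50.0, N)

# Random weights, with two components boosted to create two dominant modes,
# then normalized so the weights form a valid mixture.
weights = rng.uniform(0.0, 1.0, N)
weights[:2] += 10.0
weights /= weights.sum()
```

Such a mixture is a natural stress test: most components contribute little individually, so a good reduction scheme should preserve the two dominant modes while aggressively merging the rest.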
Conclusion
In this paper we have presented a novel systematic approach to reducing the number of components in exponential family mixtures, with special emphasis on mixtures of von Mises distributions, for which explicit formulae have been presented in Section 3.2. The component number reduction problem has been formulated as an optimization problem utilizing newly proposed composite distance measures, namely the composite Rényi α-divergences, as cost functions. The benefits of using
Acknowledgements
This work has been supported by the European Community’s Seventh Framework Programme under Grant agreement No. 285939 (ACROSS). We are grateful to the anonymous reviewers for useful comments and suggestions, which resulted in a significant improvement of the paper.
References (42)
- et al., Multisensor data fusion: a review of the state-of-the-art, Inform. Fus. (2013)
- et al., Simplification and hierarchical representations of mixtures of exponential families, Signal Process. (2010)
- et al., A low-cost variational-Bayes technique for merging mixtures of probabilistic principal component analyzers, Inform. Fus. (2013)
- The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming, USSR Comput. Math. Math. Phys. (1967)
- et al., Finite Mixture Models (2004)
- et al., Mixtures of von Mises distributions for people trajectory shape analysis, IEEE Trans. Circ. Syst. Video Technol. (2011)
- I. Marković, I. Petrović, Bearing-only tracking with a mixture of von Mises distributions, in: IEEE/RSJ International...
- Über die ‘Ganzzahligkeit’ der Atomgewichte und verwandte Fragen, Physikalische Zeitschrift (1918)
- et al., Directional Statistics (1999)
- et al., Topics in Circular Statistics (2001)
- Statistical Analysis of Circular Data
- Gossip-based computation of a Gaussian mixture model for distributed multimedia indexing, IEEE Trans. Multi.
- Mixture reduction algorithms for point and extended object tracking in clutter, IEEE Trans. Aerosp. Electron. Syst.
- Kullback–Leibler approach to Gaussian mixture reduction, IEEE Trans. Aerosp. Electron. Syst.
- Information Theory and Statistics
- Hierarchical clustering of a mixture model
- Clustering with Bregman divergences, J. Mach. Learn. Res.