Information Fusion, Volume 20, November 2014, Pages 136-145

Composite distance based approach to von Mises mixture reduction

https://doi.org/10.1016/j.inffus.2014.01.003

Abstract

This paper presents a systematic approach to component number reduction in mixtures of exponential families, with special emphasis on von Mises mixtures. We propose to formulate the problem as an optimization problem utilizing a new class of computationally tractable composite distance measures as cost functions, namely the composite Rényi α-divergences, which include the composite Kullback–Leibler distance as a special case. Furthermore, we prove that the composite divergence bounds from above the corresponding intractable Rényi α-divergence between a pair of mixtures. As solutions to the optimization problem, we show that two existing suboptimal strategies, the generalized k-means and a pairwise merging approach, are in fact minimization methods for the composite distance measures, and thus we synthesize them within a single framework. The existing joining algorithm is also extended for comparison purposes. The algorithms are implemented, and their reduction results are compared and discussed on two examples of von Mises mixtures: a synthetic mixture and a real-world mixture used in people trajectory shape analysis.

Introduction

Many statistical and engineering problems [1], [2], [3] require modeling of complex multi-modal data, for which mixture distributions have become an indispensable tool. In this paper we draw attention to finite mixtures of a specific distribution on the unit circle, the von Mises distribution. Starting from 1918 and the seminal work of von Mises [4], in which he investigated the hypothesis that the atomic weights of chemical elements are integers, this parametric density has played a pertinent role in directional statistics, with a wide range of applications in physics, biology, image analysis, neural science and medicine – cf. the monographs [5], [6], [7] and references therein.

Estimation of complex data by mixture distributions may lead to models with a large or, in applications like target tracking, an ever-increasing number of components. In the absence of efficient reduction procedures, such models become computationally intractable and lose their feasibility. Therefore, component number reduction in mixture models is an essential tool in many domains, like image and multimedia indexing [8], [9] and speech segmentation [10], and it is an indispensable part of any tracking system with mixtures of Gaussian [11], [12], [13] or von Mises distributions [3]. The subject matter is particularly relevant to the information fusion domain since it relates to the following challenging problems in multisensor data fusion [14]: data dimensionality, processing framework, and data association. These problems relate to component reduction in that measurement data, as the quantity of interest, can be preprocessed (compressed) prior to communicating it to other nodes (in a decentralized framework) or to the fusion center, thus effectively saving on the communication bandwidth and power required for transmitting data. For example, consider the problem of people trajectory analysis with von Mises mixtures [2] in a distributed sensor network, where the mixtures might need to be communicated between the sensor nodes. Motivated by [2], [3], in this paper we study methods and respective algorithms for component number reduction in mixtures of von Mises distributions, but due to the general exposition of the subject within the framework of exponential family mixtures, the methods and findings extend readily to other examples, such as mixtures of Gaussian distributions and von Mises–Fisher distributions [5].

Existing literature on mixture reduction schemes is mostly related to Gaussian mixture models. A reduction scheme for Gaussian mixtures in the context of Bayesian tracking systems in a cluttered environment, which successively merges the closest pair of components and is henceforth referred to as the joining algorithm, was proposed in [11]. The main drawback of the scheme is its local character, which gives no information about the global deviation of the reduced mixture from the original one. In [15] the mixture reduction was formulated as an optimization problem with the integral square difference as cost function. A better suited distance measure between probability distributions is the Kullback–Leibler (KL) distance [16], but it lacks a closed-form formula between mixtures, which makes it computationally inconvenient. Several concepts have been employed to circumvent this problem. A new distance measure between mixture distributions, based on the KL distance, which can be expressed analytically, was derived in [17] and utilized to solve the mixture reduction problem. In [12] an upper bound for the KL distance was obtained and used as a dissimilarity measure in a successive pairwise reduction of Gaussian mixtures – henceforth we refer to it as the pairwise merging algorithm. Unlike the joining algorithm, this procedure provides control over the global deviation of the reduced mixture from the original one. Introducing the notion of Bregman information, the authors in [18] generalized the previously developed Gaussian mixture reduction concepts to arbitrary exponential family mixtures. Further development of these techniques for exponential family mixtures can be found in [19], [20], [21], [22], [23], [24]. Finally, we mention the variational Bayesian approach [25], [26] as well as [27] as alternative concepts of mixture reduction developed for Gaussian mixtures.

The contributions of the present paper are as follows. Firstly, we formulate the problem of component number reduction in exponential family mixtures as an optimization problem utilizing a new class of composite distance measures as cost functions. These distance measures are constructed employing Rényi α-divergences as ground distances, and it is shown that the composite distance bounds the corresponding Rényi α-divergence from above (see Lemma 1 below). This inequality is very important since it provides information on the global deviation of the reduced mixture from the original one, measured by the Rényi α-divergence. Secondly, we synthesize previously developed reduction techniques [12], [18], [24] in the sense that they can all be interpreted as suboptimal solution strategies for the proposed optimization problem. For the purpose of computational complexity and accuracy comparisons, the joining algorithm is extended using the scaled symmetrized KL distance as a dissimilarity measure between mixture components. Thirdly, special attention is given to von Mises mixtures, for which we present analytical expressions for solving the component number reduction problem and analyze them on two examples: a synthetic 100-component mixture with several dominant modes and a real-world mixture stemming from work on people trajectory analysis in video data [2].

The outline of the paper is as follows. The general framework of exponential family mixtures is introduced in Section 2, together with a brief survey of distance measures between probability distributions and the definition of composite distance measures. Section 3 presents the component number reduction in exponential family mixtures as a constrained optimization problem. In Section 4 we discuss two suboptimal solution strategies and additionally consider the joining algorithm. Numerical experiments on two examples of circular data are performed and the obtained results are discussed in Section 5. Finally, Section 6 concludes the paper by outlining the main achievements and commenting on possible extensions.


General background

In this section we introduce exponential family distributions and the von Mises distribution as their subclass, recall the notion of finite mixture distributions, and discuss a variety of distance measures between probability distributions, with emphasis on composite distance measures between mixtures.
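As a concrete point of reference for the circular setting, the von Mises density with mean direction μ and concentration κ is f(θ; μ, κ) = exp(κ cos(θ − μ)) / (2π I₀(κ)), where I₀ is the modified Bessel function of the first kind and order zero. The following minimal sketch (not taken from the paper; function names are ours) evaluates this density and a finite von Mises mixture:

    import numpy as np
    from scipy.special import i0  # modified Bessel function of the first kind, order 0

    def von_mises_pdf(theta, mu, kappa):
        """Von Mises density f(theta; mu, kappa) = exp(kappa*cos(theta - mu)) / (2*pi*I0(kappa))."""
        return np.exp(kappa * np.cos(theta - mu)) / (2.0 * np.pi * i0(kappa))

    def mixture_pdf(theta, weights, mus, kappas):
        """Finite mixture density p(theta) = sum_i w_i * f(theta; mu_i, kappa_i)."""
        return sum(w * von_mises_pdf(theta, m, k) for w, m, k in zip(weights, mus, kappas))

    # Example: a two-component mixture with modes near 0 and pi.
    theta = np.linspace(-np.pi, np.pi, 5)
    print(mixture_pdf(theta, [0.6, 0.4], [0.0, np.pi], [4.0, 2.0]))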

Problem formulation

Having defined suitable distance measures in the previous section, we formulate the problem of reducing the number of components, described in Section 1, as follows. Let $p = \sum_{i=1}^{K} w_i p_i$ be the given starting exponential family mixture and let $D$ denote the chosen ground distance, the KL or Rényi α-divergence with α ∈ (0,1). The optimization problem

\[
  \min_{q \in \mathcal{M}_L} d_D(p, q)
  \tag{15}
\]

aims to find a mixture $q$, having at most $L$ components, which is the best approximation of $p$ with respect to the composite $D$-distance; here $\mathcal{M}_L$ denotes the set of exponential family mixtures with at most $L$ components.
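The precise construction of the composite distance $d_D$ is given in Section 2 (truncated in this snippet). To illustrate the flavor of upper bounds of the Lemma 1 type in the KL case: for any assignment σ of original components to reduced components, convexity of the KL divergence in its first argument, together with the pointwise inequality $q \ge \omega_{\sigma(i)} q_{\sigma(i)}$, yields a tractable bound. The display below is a sketch of this type of bound, not the paper's exact (and sharper) Lemma 1:

    \[
      \mathrm{KL}(p \,\|\, q)
      \;\le\;
      \sum_{i=1}^{K} w_i \left( \mathrm{KL}\!\left(p_i \,\|\, q_{\sigma(i)}\right)
      + \log \frac{1}{\omega_{\sigma(i)}} \right),
      \qquad q = \sum_{j=1}^{L} \omega_j q_j .
    \]

Any such bound is computable in closed form whenever the component-wise divergences are, which is exactly what makes composite distances attractive as surrogate cost functions.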

Component reduction schemes

In this section we present three different approaches for solving the component number reduction problem. The first two approaches, (i) generalized k-means clustering and (ii) a gradual pairwise merging scheme, present solution strategies which aim to solve the optimization problem (15), i.e. to minimize the composite distance between the original and the reduced mixture. The third approach, the joining algorithm, is a heuristic reduction scheme which successively merges the closest pair of components under a local dissimilarity measure, without explicit control of the global deviation of the reduced mixture from the original one; a sketch of such a loop follows below.
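For illustration, here is a minimal joining-style loop for von Mises mixtures. It is not the exact algorithm of Section 4: the weight scaling of the symmetrized KL is our assumption (one plausible reading of the "scaled symmetrized KL distance"), and merges are done by moment matching, i.e. preserving the total weight and the first trigonometric moment. The closed-form KL divergence between von Mises densities used here, KL(p₁‖p₂) = log(I₀(κ₂)/I₀(κ₁)) + A(κ₁)(κ₁ − κ₂ cos(μ₂ − μ₁)) with A(κ) = I₁(κ)/I₀(κ), is standard.

    import numpy as np
    from scipy.special import i0e, i1e  # exponentially scaled Bessel functions (avoid overflow)
    from scipy.optimize import brentq

    def A(kappa):
        # Mean resultant length A(kappa) = I1(kappa)/I0(kappa); the exp scalings cancel.
        return i1e(kappa) / i0e(kappa)

    def kl_vm(mu1, k1, mu2, k2):
        # Closed-form KL between two von Mises densities.
        # log(I0(k2)/I0(k1)) via scaled Bessels: log I0(k) = log i0e(k) + k.
        return (np.log(i0e(k2) / i0e(k1)) + (k2 - k1)
                + A(k1) * (k1 - k2 * np.cos(mu2 - mu1)))

    def dissimilarity(c1, c2):
        # Hypothetical weight scaling of the symmetrized KL; the paper's scaling may differ.
        (w1, mu1, k1), (w2, mu2, k2) = c1, c2
        return (w1 * w2) / (w1 + w2) * (kl_vm(mu1, k1, mu2, k2) + kl_vm(mu2, k2, mu1, k1))

    def merge(c1, c2):
        # Moment-matching merge: preserve total weight and first trigonometric moment.
        (w1, mu1, k1), (w2, mu2, k2) = c1, c2
        w = w1 + w2
        z = (w1 * A(k1) * np.exp(1j * mu1) + w2 * A(k2) * np.exp(1j * mu2)) / w
        mu, r = np.angle(z), np.abs(z)
        # Invert A(kappa) = r numerically; bracket suffices for moderate concentrations.
        kappa = brentq(lambda k: A(k) - r, 1e-9, 1e4)
        return (w, mu, kappa)

    def join(components, L):
        # Successively merge the closest pair until at most L components remain.
        comps = list(components)  # items are (weight, mu, kappa) tuples
        while len(comps) > L:
            _, i, j = min((dissimilarity(comps[i], comps[j]), i, j)
                          for i in range(len(comps)) for j in range(i + 1, len(comps)))
            merged = merge(comps[i], comps[j])
            comps = [c for t, c in enumerate(comps) if t not in (i, j)] + [merged]
        return comps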

Results and discussion for von Mises mixtures

To test and compare the reduction algorithms for von Mises mixtures we utilized two examples. The first is a synthetic mixture consisting of 100 randomly chosen components, two of which carry dominant weights in order to ensure a couple of dominant modes in the mixture. The second is a real-world example stemming from a people trajectory analysis dataset [2].
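The paper does not list the exact parameters of the synthetic 100-component mixture; a hypothetical generator in the same spirit (random components, two dominant weights), with all parameter ranges our own assumptions, might look as follows:

    import numpy as np

    rng = np.random.default_rng(seed=0)
    K = 100

    # Hypothetical parameter ranges; the paper does not specify its sampling scheme.
    mus = rng.uniform(-np.pi, np.pi, K)   # mean directions on the circle
    kappas = rng.uniform(0.5, 50.0, K)    # concentrations

    # Two components with dominant weight ensure a couple of dominant modes.
    weights = rng.uniform(0.0, 1.0, K)
    weights[:2] = 10.0 * weights.max()
    weights /= weights.sum()

    components = list(zip(weights, mus, kappas))  # e.g. pass to join(components, L)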

Conclusion

In this paper we have presented a novel systematic approach to the reduction of the number of components in mixtures of exponential families, with special emphasis on mixtures of von Mises distributions, for which explicit formulae have been presented in Section 3.2. The component number reduction problem has been formulated as an optimization problem utilizing the newly proposed composite distance measures, namely the composite Rényi α-divergences, as cost functions. The benefits of using these composite distances are their computational tractability and, by Lemma 1, the fact that they bound the corresponding intractable Rényi α-divergence between the original and the reduced mixture from above.

Acknowledgements

This work has been supported by the European Community’s Seventh Framework Programme under Grant agreement No. 285939 (ACROSS). We are grateful to the anonymous reviewers for useful comments and suggestions, which resulted in a significant improvement of the paper.

References

  • N. Fisher, Statistical Analysis of Circular Data, 1995.
  • N. Vasconcelos, Image indexing with mixture hierarchies, in: Proceedings of the 2001 IEEE Computer Society Conference...
  • A. Nikseresht et al., Gossip-based computation of a Gaussian mixture model for distributed multimedia indexing, IEEE Trans. Multi., 2008.
  • J. Goldberger, H. Aronowitz, A distance measure between GMMs based on the unscented transform and its application to...
  • D.J. Salmond, Mixture reduction algorithms for point and extended object tracking in clutter, IEEE Trans. Aerosp. Electron. Syst., 2009.
  • R. Runnalls, Kullback–Leibler approach to Gaussian mixture reduction, IEEE Trans. Aerosp. Electron. Syst., 2007.
  • L.-L.S. Ong, Non-Gaussian Representations for Decentralised Bayesian Estimation, Ph.D. thesis, The University of...
  • J.L. Williams, P.S. Maybeck, Cost-function-based Gaussian mixture reduction for target tracking, in: Proceedings of the...
  • S. Kullback, Information Theory and Statistics, 1997.
  • J. Goldberger et al., Hierarchical clustering of a mixture model.
  • A. Banerjee et al., Clustering with Bregman divergences, J. Mach. Learn. Res., 2005.