Signal Processing

Volume 91, Issue 4, April 2011, Pages 890-905

A tree-weighting approach to sequential decision problems with multiplicative loss

https://doi.org/10.1016/j.sigpro.2010.09.007

Abstract

In this paper, we consider sequential decision problems in which the decision at each time is taken as a convex-combination of observations and whose performance metric is multiplicatively compounded over time. Such sequential decision problems arise in gambling, investing and in a host of signal processing applications from statistical language modeling to mixed-modality multimedia signal processing. Using a competitive algorithm framework, we construct sequential strategies that asymptotically achieve the performance of the best piecewise-convex strategy that could have been chosen by observing the entire sequence of outcomes in advance. Using the notion of context-trees, a mixture approach is able to asymptotically achieve the performance of the best choice of both the partitioning of the space of past observations and convex strategies within each region, for every sequence of outcomes. This performance is achieved with linear complexity in the depth of the context-tree, per decision. For the application of sequential investment, we also investigate transaction costs incurred for each decision. An explicit algorithmic description and examples demonstrating the performance of the algorithms are given. Our methods can be used to sequentially combine probability distributions produced by different statistical language models used in speech recognition or natural language processing and by different modalities in multimedia signal processing.

Introduction

In this paper, we consider sequential decision problems whose metric of performance is multiplicatively compounded over time and in which the decision made at each time amounts to picking a strategy taken as a convex-combination of the vector-valued outcomes. In this general framework, which encompasses a number of applications, the observations are revealed sequentially and are represented as vectors in the positive orthant, i.e., $\{x[t]\}_{t \geq 1}$, $x[t] \in \mathbb{R}_+^m$, where some entries of $x[t]$ can be zero, so that the observations are simply vectors of nonnegative numbers. We represent our decision at time $t$ as $b[t]$, where $b[t] \in \mathbb{R}_+^m$ and $\sum_{j=1}^{m} b_j[t] = 1$ for all $t$. Based on the decision $b[t]$ and then observing $x[t]$, we incur the benefit or reward $b[t]^T x[t]$ at time $t$, yielding the accumulated gain over all past observations as $\prod_{t=1}^{n} b[t]^T x[t]$ for all $n$. In this paper, the goal is to maximize the accumulated gain over any possible and unknown $\{x[t]\}_{t \geq 1}$ by sequentially choosing appropriate $\{b[t]\}_{t \geq 1}$ for any $n$. The decisions $\{b[t]\}_{t \geq 1}$ are “sequential” in that $b[t]$ depends only on the past sequence of observations, i.e., $x[1], \ldots, x[t-1]$, and not on information from the future. Extensions of this basic framework to more general decision vectors, such as $b[t] \in [1,\infty)^m$ [1], [2], and/or observation vectors, such as $x[t] \in [1,\infty)^m$ [3], are also possible. We study the basic structure of the problem here, and our results can be readily extended to these more general cases.
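As a concrete illustration of the accumulated gain defined above, the following minimal sketch multiplies the per-period rewards $b[t]^T x[t]$. The function name `accumulated_gain` and the two-asset toy sequence are assumptions made for illustration only and do not come from the paper.

```python
import math

def accumulated_gain(decisions, observations):
    """Return the product over t of b[t]^T x[t].

    `decisions` and `observations` are equal-length sequences of
    m-dimensional tuples; both names are hypothetical.
    """
    wealth = 1.0
    for b, x in zip(decisions, observations):
        # Each decision must be a convex-combination weight vector.
        if any(bj < 0 for bj in b) or not math.isclose(sum(b), 1.0):
            raise ValueError("each decision must lie on the simplex")
        wealth *= sum(bj * xj for bj, xj in zip(b, x))
    return wealth

# Toy sequence (an assumption): the second entry alternately halves and doubles.
observations = [(1.0, 0.5), (1.0, 2.0), (1.0, 0.5), (1.0, 2.0)]
decisions = [(0.5, 0.5)] * len(observations)
wealth = accumulated_gain(decisions, observations)
print(wealth)
```

In this toy sequence neither entry grows on its own over a full cycle, yet the fixed equal-weight decision compounds to a gain above one, which shows why the product form of the metric matters.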

Sequential decision problems whose metric of performance is multiplicatively compounded over time arise in a host of signal processing applications from statistical language modeling [4], [5], [6] and Gaussian mixture models [5] to mixed-modality multimedia signal processing [7], [8]. We refer the reader to [9], and the references therein, for a more general discussion of such sequential decision problems. For concreteness of terminology, in this paper we will pay particular attention to the application of sequential investing in a market of $m$ stocks, and note that the mathematical techniques used here apply more generally to the contexts discussed in [9]. In the investing context, the observation vectors $\{x[t]\}_{t \geq 1}$ represent the market gains such that the $j$th entry $x_j[t]$ of the vector $x[t]$ represents the gain achieved by the $j$th stock on the $t$th day of investment, as might be measured by the ratio of the opening price of the $j$th stock on the $t$th day to the opening price of the $j$th stock on the $(t-1)$th trading day. The sequence of decisions $\{b[t]\}_{t \geq 1}$ will correspond to a sequence of investments made by allocating a fraction of the current holdings, or wealth, of the player to each of $m$ stocks or other financial instruments. The selection of such a “portfolio” amounts to choosing a particular weighting among stocks of the wealth available on a given investment period. An investment strategy at day $t$ is represented by the “portfolio vector” $b[t]$, with $b[t] \in \mathbb{R}_+^m$ and $\sum_{j=1}^{m} b_j[t] = 1$ for all $t$. Each entry $b_j[t]$ corresponds to the fraction of the total wealth available to be invested in the $j$th stock on the $t$th day. The achieved wealth after $n$ investment periods is given by the product of the gains achieved in each successive day of trading, i.e., $\prod_{t=1}^{n} b[t]^T x[t]$.
In another context, x[t] could correspond to probabilities derived from various signal source models for a sequence of observations and their convex-combination would amount to a mixture distribution in a Bayesian mixture representation [5], [6].

The sequential portfolio assignment problem studied here has been considered broadly in the signal processing [10], machine learning [3], [2] and information theory [11], [1] research literature. The objective is to select a sequence of investment strategies, or portfolios, for a market with a finite number of stocks, to maximize wealth over time over any possible deterministic observation vectors $\{x[t]\}_{t \geq 1}$ without stochastic assumptions on $\{x[t]\}_{t \geq 1}$. To define a meaningful performance measure, we define a competitive algorithm framework in which the goal is to perform well with respect to a candidate class of investment strategies, i.e., a competition class, over any possible $\{x[t]\}_{t \geq 1}$. As an example, a problem studied extensively in this context is to find a sequential algorithm that asymptotically achieves the wealth of the best constant rebalanced portfolio tuned to the individual sequence of observation vectors. When the portfolio vector remains constant for all $t$, i.e., $b[t] = b$, corresponding to an apportionment of assets at each point in time that is a fixed constant convex-combination, $b$, the strategy is called a “constant rebalanced portfolio” (CRP). In this context, we try to achieve $\max_{b} \prod_{t=1}^{n} b^T x[t]$, where $b$ is fixed and $\{x[t]\}_{t \geq 1}$ is an arbitrary individual sequence, using a sequential algorithm. The goal is to find a sequential algorithm $b[t]$ such that, when it is applied to any deterministic and unknown $\{x[t]\}_{t \geq 1}$ (which is revealed sequentially), it achieves a gain $\prod_{t=1}^{n} b[t]^T x[t]$ that is close to $\max_{b} \prod_{t=1}^{n} b^T x[t]$ for all $n$ (or asymptotically equal to $\max_{b} \prod_{t=1}^{n} b^T x[t]$ as $n$ goes to infinity), without knowing $n$ or $\{x[t]\}_{t \geq 1}$ beforehand. If we can find such an algorithm, it is said to be competitive with respect to the class of all CRPs, since it asymptotically achieves the performance of even the best CRP in the class that is tuned to the underlying $\{x[t]\}_{t \geq 1}$, for all $n$.
In the context of probabilistic model combinations, such a CRP would amount to a constant weighting among constituent probabilistic models, i.e., an a priori weighting. Cover [11] presented an algorithm that asymptotically achieves the wealth of the best CRP for any sequence of observation vectors, that is, his algorithm can sequentially achieve nearly the same performance of an investment strategy that could only have been chosen in hindsight, after observing the entire sequence of stock market values in advance, but which was restricted to only select a CRP.
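The idea of sequentially tracking the best CRP can be sketched for two stocks as follows. This is not the exact algorithm of [11]: it replaces the integral over the simplex with a uniform average over a finite grid of CRPs, and at each period invests with the wealth-weighted average of the grid portfolios. The grid size, function names and toy data are assumptions made for illustration.

```python
def crp_wealth(b1, observations):
    """Wealth of the constant rebalanced portfolio (b1, 1 - b1) in a two-stock market."""
    wealth = 1.0
    for x1, x2 in observations:
        wealth *= b1 * x1 + (1.0 - b1) * x2
    return wealth

def universal_portfolio_wealth(observations, grid_size=101):
    """Sequentially invest with the wealth-weighted average of the grid CRPs."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    crp_so_far = [1.0] * grid_size  # wealth each grid CRP has achieved so far
    wealth = 1.0
    for x1, x2 in observations:
        # The portfolio for this period uses only past observations.
        total = sum(crp_so_far)
        b1 = sum(g * w for g, w in zip(grid, crp_so_far)) / total
        wealth *= b1 * x1 + (1.0 - b1) * x2
        # Update every grid CRP with the newly revealed observation.
        crp_so_far = [w * (g * x1 + (1.0 - g) * x2)
                      for g, w in zip(grid, crp_so_far)]
    return wealth

# Toy market (an assumption): stock 1 is flat, stock 2 alternately halves and doubles.
observations = [(1.0, 0.5), (1.0, 2.0)] * 10
grid = [i / 100 for i in range(101)]
crp_wealths = [crp_wealth(g, observations) for g in grid]
best_crp = max(crp_wealths)
universal = universal_portfolio_wealth(observations)
print(best_crp, universal)
```

With a finite grid, the wealth of this sequential strategy equals the average wealth of the grid CRPs, so it is at least the best grid CRP's wealth divided by the number of grid points. That factor does not grow with $n$ in this simplified setting, which is the sense in which a mixture can track a strategy chosen only in hindsight.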

In the first part of this paper, we investigate sequential portfolios that compete against the best piecewise-constant rebalanced portfolios (PCRPs), which are a direct extension of CRPs. Here, instead of trying to achieve the performance of the best CRP that is tuned to the underlying sequence of observations, we try to achieve the performance of the best PCRP. In our framework, the space in which the sequence of observations lies is partitioned into a union of disjoint regions, over each of which a CRP is fitted independently. This is a natural nonlinear extension to linear modeling where piecewise models are used to approximate more general nonlinear functions such as in [12]. As an example, suppose at trading period $t$, we divide the space $x[t-1] \in \mathbb{R}_+^m$ as in Fig. 1 into $K$ disjoint regions $V_k$, where $\bigcup_{k=1}^{K} V_k = \mathbb{R}_+^m$ (e.g., $K=4$ for Fig. 1). Here, if $x[t-1] \in V_1$, then stock 1 outperformed stock 2 at trading day $t-1$ (however, both stocks lost value, i.e., $x_1[t-1] < 1$, $x_2[t-1] < 1$). If $x[t-1] \in V_1 \cup V_2$, where the gain of stock 1 was greater than that of stock 2 at trading day $t-1$, then investing in stock 1 more than stock 2 in the next trading period $t$ may be a good idea. This strategy may work if there is useful information in the relative performance of various assets in the market. Hence, to follow this idea, we define the PCRP competition class by assigning a CRP to each region, and such a PCRP invests during each trading day using the portfolio that depends on the relative performance of each stock on the previous day. For PCRPs, the portfolio used in each region is a fixed CRP, $b_k$, such that if $x[t-1] \in V_k$ then we invest with $b_k$ at trading period $t$. The CRPs can be selected arbitrarily for each region. We point out that as the number of regions grows, the piecewise constant model can better approximate any fixed nonlinear portfolio $b = f(x[t-1])$ for some arbitrary locally convex smoothly varying nonlinear function $f(\cdot)$ [13].
However, we emphasize that neither the optimal partitioning of $\mathbb{R}_+^m$ nor the best CRPs for each region are known in advance, and both depend on $\{x[t]\}_{t \geq 1}$ and $n$.
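To make the PCRP idea concrete, the sketch below evaluates a two-region PCRP on a two-stock market and finds the best region portfolios in hindsight by grid search. The partition used here (which stock did better on the previous day) is a hypothetical stand-in rather than the four-region partition of Fig. 1, and the convention of starting in region 0 before any observation is available is likewise an assumption.

```python
def region_index(prev_x):
    """Hypothetical 2-region partition: 0 if stock 1 did at least as well as stock 2."""
    return 0 if prev_x[0] >= prev_x[1] else 1

def pcrp_wealth(region_portfolios, observations, initial_region=0):
    """Wealth of a PCRP; region_portfolios[k] is the stock-1 fraction used in region k."""
    wealth = 1.0
    region = initial_region  # assumed convention for the first period
    for x in observations:
        b1 = region_portfolios[region]
        wealth *= b1 * x[0] + (1.0 - b1) * x[1]
        region = region_index(x)  # the next decision depends on this observation
    return wealth

def best_pcrp_on_grid(observations, grid_size=21):
    """Best (wealth, portfolios) pair in hindsight over a grid of stock-1 fractions."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max((pcrp_wealth((g0, g1), observations), (g0, g1))
               for g0 in grid for g1 in grid)

# Toy market (an assumption): stock 1 is flat, stock 2 alternately halves and doubles.
observations = [(1.0, 0.5), (1.0, 2.0)] * 10
best_wealth, best_portfolios = best_pcrp_on_grid(observations)
print(best_wealth, best_portfolios)
```

On this toy sequence the previous day's relative performance fully predicts the next day, so the best PCRP in hindsight far exceeds what any single CRP can achieve. On real data the benefit depends on how informative the chosen partition actually is.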

Note that if the piecewise regions or the partition of $\mathbb{R}_+^m$ is fixed, the assignment of portfolio vectors to each region is known. As in the first part of this paper and in [11], independently applying the algorithm of [11] for each piecewise region will asymptotically achieve the performance of the optimal CRPs in each region. In the second part of this paper, we extend these results to the case when the partitioning of the space of past observations, i.e., the partition of $\mathbb{R}_+^m$, is also a design parameter that can be selected from a large class of possible partitions. In this sense, if we consider the partition information as the side-information, the side-information generating mechanism is also a design parameter [11]. The class of possible partitions will be compactly represented using a “context-tree” [14], which will be used to define a doubly exponential number of partitions. We have neither a priori knowledge of the selected partition nor of the best model parameters, i.e., the best PCRP given that partition. Here, we demonstrate an algorithm that asymptotically achieves the performance of the best sequential portfolio (corresponding to a particular partition) from the doubly exponentially large class of such partitioned portfolios. To accomplish this, we use the notion of context-trees as shown in Fig. 2, which is explained in Section 2.2. By using context-trees, we are able to compete against the best partition among a doubly exponential number of possible partitions that can be embedded in a context-tree, with computational complexity only linear in the depth of the context-tree.
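The gap between the size of the partition class and the per-decision cost can be illustrated with a short sketch. It assumes a complete binary context-tree of a given depth in which every pruning of the tree defines one partition; the tree in Fig. 2 may be organized differently, and both function names are hypothetical.

```python
def count_partitions(depth):
    """Number of prunings of a complete binary tree of the given depth.

    A tree of depth d is either pruned at its root (one partition) or
    split into two independently pruned subtrees of depth d - 1.
    """
    count = 1
    for _ in range(depth):
        count = count * count + 1
    return count

def nodes_on_path(leaf_index, depth):
    """(level, index) pairs of the nodes from the root down to one leaf."""
    return [(level, leaf_index >> (depth - level)) for level in range(depth + 1)]

for depth in range(6):
    # The class size grows doubly exponentially; the path length grows linearly.
    print(depth, count_partitions(depth), len(nodes_on_path(0, depth)))
```

Under these assumptions, each observation falls into one leaf, and only the nodes on the path from the root to that leaf are involved, which is consistent with the linear-in-depth complexity stated above.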

Competition against the known side-information dependent CRP is investigated in [11], and several different sequential algorithms have since been introduced that also attempt to achieve the performance of the best CRP, albeit either with different bounds or different performance on historical data [3], [15]. This basic problem has been extended to portfolios with side-information [11], [3], [16], transaction costs [17], margin and short sales [1], [2], smoothly varying target classes [16], competition against the best switching constant rebalanced portfolios [18], [10] and internal regret [19]. We emphasize that we only use Cover's algorithm as an example in our derivations and that the methods we use are generic, such that other algorithms, such as those in [3], [19], [15], [20], [21], can be used in our theorems. Note that the alternative algorithms are provided since, although the algorithm of [11] has “asymptotically” tight performance bounds, its exact implementation requires $O(n^{m-1})$ computational complexity per investment period. In this sense, the alternative algorithms such as the ones in [3], [22], [15] are introduced to provide both computational efficiency and logarithmic regret at the same time (if possible). These alternative algorithms are also experimentally shown to outperform the algorithm of [11] in certain scenarios, especially when the market has a large number of stocks [3], [21]. As an example, the exponentiated gradient based algorithm of [3] has linear complexity per investment period with $O(2\log m/n)$ normalized regret, and the follow-the-leader based algorithm of [22] has $O(m^3)$ complexity per investment period with $O(4m\log(n)/n)$ normalized regret (only when certain parameters are optimized in hindsight, unlike Cover's algorithm). To demonstrate the versatility and ease of our scheme for incorporating new algorithms into the studied framework, we use algorithms from [3], [22] in addition to the algorithm from [11] in the Simulations section.
Moreover, unlike [11], our model includes the presence of transaction costs and can be straightforwardly extended to investing on margin and short sales. While competition against CRPs was extended to more general target classes in [16], [3], we point out that in all these cases considering side-information, the side-information generating mechanism or the side-information itself is known or fixed. Hence, in these results, the competition with respect to the side-information sequence is achieved by merely repeating the basic algorithm for each side-information value. However, in this paper, the side-information generating mechanism can also be selected by the competition class. Only in hindsight can one determine which partition of $\mathbb{R}_+^m$, i.e., which side-information, will yield the optimal growth. Without such a priori knowledge, our algorithm asymptotically achieves the performance of any such partition, i.e., the best side-information generating mechanism from this class.
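For readers unfamiliar with the linear-complexity alternative mentioned above, the following is a minimal sketch of a multiplicative (exponentiated gradient) portfolio update in the spirit of [3]. The learning rate value, the uniform starting portfolio and the function names are illustrative assumptions, and the sketch presumes that every per-period gain $b[t]^T x[t]$ is strictly positive.

```python
import math

def eg_update(b, x, eta=0.05):
    """One exponentiated gradient step: reweight by exp(eta * x_j / (b^T x)), then renormalize."""
    gain = sum(bj * xj for bj, xj in zip(b, x))  # assumed strictly positive
    unnormalized = [bj * math.exp(eta * xj / gain) for bj, xj in zip(b, x)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def eg_wealth(observations, eta=0.05):
    """Run the update sequentially from the uniform portfolio; return (wealth, final portfolio)."""
    m = len(observations[0])
    b = [1.0 / m] * m
    wealth = 1.0
    for x in observations:
        wealth *= sum(bj * xj for bj, xj in zip(b, x))  # invest first, then observe
        b = eg_update(b, x, eta)
    return wealth, b

# Toy market (an assumption): stock 1 is flat, stock 2 alternately halves and doubles.
observations = [(1.0, 0.5), (1.0, 2.0)] * 10
wealth, final_portfolio = eg_wealth(observations)
print(wealth, final_portfolio)
```

Each step touches every stock once, so the cost per investment period is linear in $m$. Such an update could stand in for the per-region algorithm, in line with the remark above that the methods are generic.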

Context-trees and the context-tree weighting algorithm have been used extensively in lossless source coding and related fields essentially to assign Bayesian mixture probabilities to binary sequences [23], [14]. In these frameworks, context-trees are mainly used to efficiently calculate a weighted average of probabilities produced by an exponential number of models represented on the context-tree. However, in this paper, the purpose of using context-trees is not to directly calculate a weighted average of wealths produced by an exponential number of investment models, which was the main tool in [14], [24], [25] to achieve the performance of the best model. Here, we specifically design “an algorithm” that when applied to the sequence of price relatives, yields a performance that is as large as this weighted average. Hence, we use the context-tree concept to construct this algorithm, however, not to perform any weighted averaging. The key difference and the main problem that is solved, unlike [14], is to construct this algorithm using the tools of context-trees for convex-combinations under log loss.

Furthermore, although the application of such models and the context-tree weighting algorithm to universal prediction appeared in [13] for the piecewise linear prediction of bounded arbitrary sequences under the square error loss, there are important differences. While the problem of universal portfolio selection considered here can be viewed as a sequential decision problem with a restricted form of the log loss, the results in [13] for square error are incompatible with the portfolio context. We note that the log loss function considered here is not bounded, whereas the regret defined in [13] is with respect to a loss function that is exp-concave and bounded. These conditions must hold for the scheme in [13] to apply. Hence, the algorithmic steps as well as the proofs of the performance of the algorithms are different. Furthermore, intrinsic to portfolio selection, we also consider the case when transaction costs are present and provide an algorithm using context-tree weighting that performs as well as the best context-dependent algorithm under transaction costs.

We begin our discussion of piecewise constant rebalanced portfolios with the case when the partition is fixed and known in Section 2.1. We then extend these results using context-trees in Section 2.2 to include comparison classes with arbitrary partitions from a doubly exponential class of possible partitions. In each section, we provide theorems that upper-bound the regret with respect to the best competing algorithm in the class. The theorems are constructive, in that they yield algorithms satisfying the corresponding bounds. An explicit implementation of the context-tree PCRP algorithm is also given. Extension to investment under transaction costs is given in Section 2.3. The paper is then concluded with simulations of the algorithms on historical data.

Fixed partition

In this section, we investigate the framework when the partition of the space of past observation vectors is given, i.e., say $\bigcup_{k=1}^{K} V_k = \mathbb{R}_+^m$ is known. Since the partition is fixed, the side-information generating mechanism, i.e., assigning CRPs to each region, is known. In this case, we seek to find a sequential portfolio that, when applied to any $\{x[t]\}_{t \geq 1}$, asymptotically achieves, for all $n$, $\sup_{b_1 \in B, \ldots, b_K \in B} \prod_{t=1}^{n} b_{s[t-1]}^T x[t]$, where $s[t-1] = k$ when $x[t-1] \in V_k$ and $B$ is the simplex. That is, we wish

Simulations

In this section, we illustrate the performance of our algorithms on historical data sets collected from the New York Stock Exchange over a 22-year period until 1985 [29]. In the initial set of experiments, we demonstrate the performance of our algorithms and illustrate the effects of the internal parameters within the algorithms on the final performance using the stock pair Kinark–Iroquois, which are chosen because of their

Conclusions

In this paper, we consider the problem of investing using PCRPs from a competitive algorithm perspective. Using context-trees and methods based on sequential probability assignment, we have shown a portfolio selection algorithm the logarithm of whose achieved wealth is within O(ln(n)) of that of the best PCRP, which can only be selected using all of the data in hindsight. We use a method similar to context-tree weighting to compete well against a doubly exponential class of possible partitions

References (32)

  • T. Cover, E. Ordentlich, Universal portfolios with short sales and margin, in: Proceedings of ISIT, 1998, p....
  • V. Vovk, C. Watkins, Universal portfolio selection, in: COLT, 1998, pp....
  • D.P. Helmbold et al., Online portfolio selection using multiplicative updates, Mathematical Finance (1998)
  • E. Charniak, Statistical Language Learning (1993)
  • L. Rabiner et al., Fundamentals of Speech Recognition (1993)
  • L. Rabiner et al., Digital Processing of Speech Signals (1978)
  • H. Glotin, D. Vergyr, C. Neti, G. Potamianos, J. Luettin, Weighting schemes for audio–visual fusion in speech...
  • A. Morris, A. Hagen, H. Glotin, H. Bourlard, Multistream adaptive evidence combination to noise robust ASR, in: ICASSP,...
  • S.S. Kozat et al., Switching strategies for sequential decision problems with multiplicative loss with application to portfolios, IEEE Transactions on Signal Processing (2009)
  • S.S. Kozat, A.C. Singer, Universal constant rebalanced portfolios with switching, in: ICASSP, 2007, pp....
  • T. Cover et al., Universal portfolios with side-information, IEEE Transactions on Information Theory (1996)
  • S.S. Kozat, K. Visweswariah, R. Gopinath, Feature adaptation based on Gaussian posteriors, in: ICASSP,...
  • S.S. Kozat et al., Universal piecewise linear prediction via context trees, IEEE Transactions on Signal Processing (2007)
  • F.M.J. Willems et al., The context-tree weighting method: basic properties, IEEE Transactions on Information Theory (1995)
  • A. Agarwal, E. Hazan, Efficient algorithms for online game playing and universal portfolio management, in: Electronic...
  • J.E. Cross et al., Efficient universal portfolios for past dependent target classes, Mathematical Finance (2003)