Abstract

Since the birth of Information Theory, researchers have defined and exploited various information measures, as well as endowed them with operational meanings. Some were born as a "solution to a problem", like Shannon's Entropy and Mutual Information. Others were the fruit of generalisation and the mathematical genius of bright minds like Rényi, Csiszár and Sibson. These powerful objects allow us to manipulate probabilities intuitively and always seem to be connected, in some way, to concrete settings in communication, coding or estimation theory. A common theme is the following: take a problem in one of these areas, try to control (upper- or lower-bound) the expected value of some function of interest (often, a probability of error) and, with enough work, an information measure appears as a fundamental limit of the problem. The most striking example is Shannon's seminal paper of 1948: his purpose was to characterise the smallest possible expected length of a uniquely decodable encoding that compresses the realisations of a random variable. As he brilliantly proved, the smallest expected length one can hope for is the Entropy of the random variable. In establishing this connection, another quantity needed to be implicitly controlled: the Kraft sum of the code. Seemingly unrelated before, these three objects join forces in harmony to provide a beautiful and fundamental result. But why are they related? The answer seems to be: duality.

Duality is an abstract notion commonly used in linear algebra and functional analysis. It has been expanded and generalised over the years, and several incarnations of it have been discovered throughout mathematics. One particular instance involves vector spaces: given two vector spaces and a "duality pairing", one can jump from one space to the other (its dual) through Legendre-Fenchel-like transforms. In the most common settings in Information Theory, the two spaces and the pairing are, respectively: 1) the space of (probability) measures defined on a set X; 2) the space of bounded functions defined on X; 3) the Lebesgue integral of a function with respect to a measure (the expected value of the function if the measure is a probability measure). Once these are set, Legendre-Fenchel-like transforms allow us to connect a) a functional acting on the space described in 1), and b) a functional acting on the space described in 2), with the anchor point being c) the (expected) value described in 3). These three pieces, a), b) and c), represent the actors of many of the results provided in Information Theory. Once they are found, one usually bounds the functional described in b) and obtains a bound connecting the expected value and the functional of measures (e.g., an information measure).

Going back to Shannon's result, once a random variable (and thus a probability measure) is fixed and the function is chosen to be the length of a code: the functional a) is the Shannon Entropy of the source; the functional b) is the Kraft sum of the code; the pairing c) is the expected length of the code. We explore this connection and this pattern throughout the thesis. We will see how it can be found in notable results like Coding Theorems for one-to-one codes, Campbell's Coding Theorem, Arikan's Guessing Theorem, Fano-like and Transportation-Cost Inequalities, and so on. Moreover, unearthing the pattern allows us to generalise it to other information measures and apply the technique in a variety of fields, including Learning Theory, Estimation Theory and Hypothesis Testing.
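To make the pattern concrete, the following is a minimal LaTeX sketch added here as an illustration, not a claim about how the thesis itself formalises the pairing. The first display records the standard Donsker-Varadhan (Legendre-Fenchel) duality between the relative entropy functional on measures and the log-moment-generating functional on bounded functions, paired through the expected value; the second records the Kraft-sum argument behind Shannon's entropy lower bound. The symbols P, Q, f, ell and K are illustrative notation introduced for this sketch and do not appear in the abstract.

\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}

% Duality pairing (one standard instantiation of the pattern in the abstract):
% measures P, Q on X vs. bounded functions f on X, paired through E_P[f].
% The relative entropy D(P || Q) and the log-partition functional
% log E_Q[e^f] are Legendre-Fenchel conjugates of each other
% (Donsker-Varadhan variational formula):
\begin{align}
  D(P \,\|\, Q) &= \sup_{f \,\text{bounded}}
      \Big\{ \mathbb{E}_P[f] - \log \mathbb{E}_Q\big[e^{f}\big] \Big\}, \\
  \log \mathbb{E}_Q\big[e^{f}\big] &= \sup_{P}
      \Big\{ \mathbb{E}_P[f] - D(P \,\|\, Q) \Big\}.
\end{align}

% Shannon's source-coding instance of the pattern:
% for a uniquely decodable binary code with length function \ell,
% the Kraft sum K satisfies K <= 1 (McMillan's inequality), and bounding
% this functional of \ell yields the entropy lower bound on the pairing
% E_P[\ell(X)], with H(P) the Shannon Entropy in bits:
\begin{equation}
  \mathbb{E}_P[\ell(X)] \;\ge\; H(P) + \log_2 \frac{1}{K} \;\ge\; H(P),
  \qquad K := \sum_{x} 2^{-\ell(x)} \le 1 .
\end{equation}

\end{document}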
