The Giry Monad, an Interesting Perspective on Probability Theory

Perspectives and implementation in Python

2024/05/20

In this article, I will explore an intriguing way of thinking about probability from a categorical viewpoint, one that eventually lets us express probabilistic computation in a functional form. The idea is far from new: Lawvere already sketched it in the early 1960s, reportedly while building a framework intended to support disarmament negotiations with the Soviet Union, and there are several ways to bridge the gap between category theory and probability theory.

One of the most classical bridges is the Giry monad, which lives in the category of measurable spaces. It was formalized by Michèle Giry in the early 1980s, building on Lawvere's earlier proposal. In the rest of this article, we will work through this construction, which will let us propose an interesting way to program probability laws in Python.

Probability Theory and Measure Theory

Before delving into the Giry monad, let’s recall some fundamental concepts in probability theory as developed by Kolmogorov in the early 20th century. The first fundamental notion is that of a measure and a measurable space:

A measurable space $(X, \mathcal{A})$ is a set $X$ equipped with a sigma-algebra $\mathcal{A}$; the elements of $\mathcal{A}$ are called the measurable sets of $X$. A sigma-algebra is a non-empty collection of subsets of $X$ that is closed under complementation and countable unions (and thus also countable intersections). It specifies rigorously which subsets of $X$ we are allowed to “measure”.

A measure space $(X, \mathcal{A}, \mu)$ is then a measurable space equipped with a measure $\mu$: a countably additive function $\mu: \mathcal{A} \rightarrow [0, +\infty]$ with $\mu(\emptyset) = 0$. In this sense, a measure is just a generalization of the notion of volume to subsets that “behave well” (those in the sigma-algebra). Kolmogorov’s idea is to define probability as a “volume”, hence a measure, on the set of possible events.

We can also define the important notion of a measurable function: a function $f$ between two measurable spaces $(A, \mathcal{A})$ and $(B, \mathcal{B})$ such that for every $B' \in \mathcal{B}$, $f^{-1}(B') \in \mathcal{A}$. Measurability is what allows us to integrate $f$ against a measure, generalizing the Riemann integral to the Lebesgue integral. It also lets us transport a measure along $f$: the pushforward $\mu \circ f^{-1}$ is a measure on the codomain of $f$.

Now, a probability space is simply a measure space $(X, \mathcal{A}, \mu)$ whose measure satisfies $\mu(X) = 1$; such a $\mu$ is called a probability measure.
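As a concrete example, rolling a fair six-sided die is described by the probability space

$$ X = \{1, \dots, 6\}, \qquad \mathcal{A} = 2^X, \qquad \mu(E) = \frac{|E|}{6} $$

where every subset of outcomes is measurable and the probability of an event is simply the proportion of faces it contains.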

Category of Measurable Spaces and the Giry Monad

Consider now the category $\mathbf{Meas}$ as the category whose objects are measurable spaces and whose arrows are measurable functions.

For any measurable space $A$, we can associate another measurable space $G(A)$, the space of all probability measures defined on $A$. This is indeed a measurable space: we equip it with the sigma-algebra generated by all evaluation functions $eval_U: G(A) \rightarrow [0, 1]$, where $U$ is any measurable set of $A$, which associate to any probability measure $\mu$ the value $\mu(U)$:

$$ eval_U(\mu) = \mu(U) = \int_U d\mu $$

We note that the $eval_U$ are themselves measurable functions on the space $G(A)$. We can also “overload” the definition of $eval$ to measurable functions $f$:

$$ eval_f(\mu) = \int f d\mu $$

This extends evaluation from sets to functions: $eval_f(\mu)$ is the integral of $f$ (for instance a density on $A$) against $\mu$. In both definitions, the usual relationship between measure and function/set is reversed: the measure $\mu$ is the object being measured.

$G$ is an Endofunctor

First, let’s notice that $G$ is an endofunctor since it is a mapping defined on the objects and arrows of $\mathbf{Meas}$ (a category) with the same category as its codomain: $G: \mathbf{Meas} \rightarrow \mathbf{Meas}$. The “fmap” of the functor can be simply defined as, for any morphism $f$ of $\mathbf{Meas}$:

$$ G(f): \mu \mapsto \mu \circ f^{-1} $$

which means that we are taking the pushforward measure of $\mu$ by $f$.
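For instance, pushing forward a Dirac measure along $f$ gives the Dirac measure at the image point:

$$ G(f)(\delta_x) = \delta_x \circ f^{-1} = \delta_{f(x)} $$

since $\delta_x(f^{-1}(B)) = 1$ exactly when $f(x) \in B$.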

$G$ is a Monad

$G$ is indeed a monad. Its unit $\eta: M \rightarrow G(M)$ sends a point $x$ to the Dirac measure $\delta_x$, and its multiplication (join) $\mu: G(G(M)) \rightarrow G(M)$ flattens a measure $\rho$ on $G(M)$ into a measure on $M$:

$$ \mu(\rho)(A) = \int_{G(M)} eval_A \, d\rho $$

and for a measurable function $f: M \rightarrow \mathbb{R}$,

$$ \mu(\rho)(f) = \int_{G(M)} eval_f \, d\rho $$

This means that the monad multiplication is carried out by integrating over the space of probability measures on $M$, which still yields a probability measure on $M$.
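For a concrete discrete case, take $\rho = \frac{1}{2}\delta_{\mu_1} + \frac{1}{2}\delta_{\mu_2}$, a measure on $G(M)$ concentrated on two probability measures $\mu_1$ and $\mu_2$. The multiplication turns it into the corresponding mixture:

$$ \mu(\rho)(A) = \int_{G(M)} eval_A \, d\rho = \frac{1}{2}\mu_1(A) + \frac{1}{2}\mu_2(A) $$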

In programming, we tend to use the flatmap operator for a monad $M$, which has the type:

$$ \text{flatmap}: M(T) \rightarrow (T \rightarrow M(U)) \rightarrow M(U) $$

This operation can be written in the context of the Giry monad as:

$$ \text{flatmap}(\rho, g)(A) = \int_{T} \left(\lambda m.\ eval_{g(m)}(A)\right) d\rho $$

and, as before, we can overload the definition to functions:

$$ \text{flatmap}(\rho, g)(f) = \int_{T} \left(\lambda m.\ eval_{g(m)}(f)\right) d\rho $$

Implementing the Giry Monad in Python

Implementing the Giry monad directly can be quite complex due to the intrinsic difficulty of manipulating measures on sets in a computational setting. Instead of using measures directly on sets or algebras, we can use an alternate definition involving functions. This allows us to work with measures by defining probabilities and computations on these probabilities.

Why Use Measures on Functions?

In measure theory, a measure on a set assigns a non-negative value to subsets of the set, representing the “size” or “volume” of those subsets. However, directly implementing this concept on a computer is challenging: sigma-algebras are typically uncountable collections of sets, arbitrary measurable sets have no natural finite representation, and properties such as countable additivity cannot be checked mechanically.

Instead, we can leverage an alternative perspective where we define a measure via its action on functions. This approach uses the concept of integration: rather than focusing on measuring sets directly, we measure functions over these sets. Specifically, we use the idea that for a given measure $\mu$ and a function $f$, the measure of $f$ is given by the integral of $f$ with respect to $\mu$.

Defining Measures on Functions

A measure on a function $f$ with respect to a measure $\mu$ can be understood as evaluating the expected value or integral of $f$ over the space. Formally, for a measure $\mu$ on a space $X$ and a function $f: X \rightarrow \mathbb{R}$, we define:

$$ eval_f(\mu) = \int_X f \, d\mu $$

This notation emphasizes that we are focusing on the integral (or expected value) of functions rather than the direct measurement of sets. This perspective aligns well with computational approaches, where functions can be more easily manipulated and evaluated.
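To make this concrete before the full implementation, here is a minimal sketch (the names `dirac`, `coin` and `expect` are illustrative, not part of the final API) of a measure represented as an integrator, i.e. a higher-order function that takes $f$ and returns $\int f \, d\mu$:

```python
# A measure represented as an integrator: it takes a function f
# and returns the integral of f against the measure.

def dirac(x):
    # Dirac measure at x: integrating f against delta_x gives f(x)
    return lambda f: f(x)

def coin(p):
    # Bernoulli(p) measure on {0, 1}
    return lambda f: (1 - p) * f(0) + p * f(1)

def expect(m):
    # The expectation is the integral of the identity function
    return m(lambda x: x)

print(expect(dirac(3.0)))   # 3.0
print(expect(coin(0.25)))   # 0.25
```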

Giry Monad Implementation

First, let’s define the Measure class in Python. This class will represent a measure as a function (mu) that takes another function and returns a float, representing the integral of the function with respect to the measure.

from typing import Generic, TypeVar, Callable, List, Tuple
from functools import partial, reduce
from scipy.integrate import quad
import numpy as np
import scipy.special

T = TypeVar("T")
U = TypeVar("U")

class Measure(Generic[T]):
    def __init__(self, mu: Callable[[Callable[[T], float]], float]):
        self.mu = mu
    
    def fmap(self, f: Callable[[T], U]) -> "Measure[U]":
        def new_measure(g: Callable[[U], float]) -> float:
            return self.mu(lambda x: g(f(x)))
        return Measure(mu=new_measure)
    
    def bind(self, f: Callable[[T], "Measure[U]"]) -> "Measure[U]":
        def new_measure(g: Callable[[U], float]) -> float:
            return self.mu(lambda x: f(x).mu(g))
        return Measure(mu=new_measure) 
        
    @staticmethod
    def unit(x: T) -> "Measure[T]":
        def new_measure(f: Callable[[T], float]) -> float:
            return f(x)
        return Measure(mu=new_measure)
    
def integrate(f: Callable[[T], float], measure: "Measure[T]") -> float:
    return measure.mu(f)

def fromMassFunction(f: Callable[[T], float], xs: List[T]) -> "Measure[T]":
    def new_measure(g: Callable[[T], float]) -> float:
        return sum(f(x) * g(x) for x in xs)
    return Measure(mu=new_measure)

def fromDensityFunction(f: Callable[[T], float]) -> "Measure[T]":
    def new_measure(g: Callable[[T], float]) -> float:
        integral, _ = quad(lambda x: f(x) * g(x), -np.inf, np.inf, limit=100)
        return integral
    return Measure(mu=new_measure)
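A quick way to exercise these definitions (repeated here in condensed form so the snippet runs on its own): `unit x` behaves like a Dirac measure, `fmap` pushes a measure forward, and `bind` chains probabilistic computations.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
U = TypeVar("U")

class Measure(Generic[T]):
    # Condensed copy of the class above, so this snippet is self-contained.
    def __init__(self, mu: Callable[[Callable[[T], float]], float]):
        self.mu = mu

    def fmap(self, f: Callable[[T], U]) -> "Measure[U]":
        return Measure(lambda g: self.mu(lambda x: g(f(x))))

    def bind(self, f: Callable[[T], "Measure[U]"]) -> "Measure[U]":
        return Measure(lambda g: self.mu(lambda x: f(x).mu(g)))

    @staticmethod
    def unit(x: T) -> "Measure[T]":
        return Measure(lambda f: f(x))

# unit(3) is the Dirac measure at 3: integrating the identity gives 3
d3 = Measure.unit(3.0)
print(d3.mu(lambda x: x))                                       # 3.0

# fmap is the pushforward: delta_3 mapped through x -> x + 1 is delta_4
print(d3.fmap(lambda x: x + 1).mu(lambda x: x))                 # 4.0

# bind chains computations: delta_3 bound to x -> delta_{2x} is delta_6
print(d3.bind(lambda x: Measure.unit(2 * x)).mu(lambda x: x))   # 6.0
```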

Explanation of the Code

  1. Measure Class:

    • The Measure class encapsulates a measure, represented as a function (mu) that takes another function and returns a float.
    • fmap maps a function over a measure, transforming the measure according to the given function.
    • bind (also known as flatMap) is used to chain together computations that return measures.
    • unit creates a measure from a single value, corresponding to the Dirac delta function.
  2. Integrate Function:

    • integrate takes a function and a measure and returns the integral of the function with respect to the measure.
  3. fromMassFunction and fromDensityFunction Functions:

    • fromMassFunction creates a discrete measure from a given mass function and a list of possible outcomes.
    • fromDensityFunction creates a continuous measure from a density function, using scipy’s quad for the numerical integration.
  4. Binomial Distribution:

    • The binomial function (defined below) creates a binomial measure given parameters n (number of trials) and p (success probability).

Computing Metrics

Here are two utility functions to compute the expectation and variance of a measure.

def expectation(measure: Measure[T]) -> float:
    return integrate(lambda x: x, measure)

def variance(measure: Measure[T]) -> float:
    return integrate(lambda x: x**2, measure) - expectation(measure) ** 2

Now, we can define for instance the binomial probability law with the following code:

def binomial(params: Tuple[int, float]) -> Measure[int]:
    n, p = params
    def pmf_bin(x: int) -> float:
        if x < 0 or x > n:
            return 0
        else:
            return scipy.special.comb(n, x) * p ** x * (1 - p) ** (n - x)
    return fromMassFunction(pmf_bin, list(range(n + 1)))

and test it with simple values

expectation(binomial((5, 0.6))), variance(binomial((5, 0.6)))

which gives

(3.0, 1.1999999999999993)

which is indeed the expected result.
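As an extra sanity check (independent of the measure machinery), the closed-form pmf used above can be compared against scipy.stats.binom:

```python
import scipy.special
import scipy.stats as stats

n, p = 5, 0.6

def pmf_bin(x: int) -> float:
    # Same closed-form pmf as in the binomial measure above
    return scipy.special.comb(n, x) * p ** x * (1 - p) ** (n - x)

# The pmf agrees with scipy's reference implementation on every outcome
for k in range(n + 1):
    assert abs(pmf_bin(k) - stats.binom.pmf(k, n, p)) < 1e-9

print(stats.binom.mean(n, p))  # ≈ 3.0 (n * p)
print(stats.binom.var(n, p))   # ≈ 1.2 (n * p * (1 - p))
```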

Example Usage

Let’s consider a more refined and realistic probabilistic use case involving two random variables: the number of rainy days in a month and the amount of rain on a given rainy day. We’ll use these to compose measures and perform computations.

Use Case: Rainfall Analysis

  1. Number of Rainy Days: Assume the number of rainy days in a month follows a binomial distribution with parameters $n = 30$ (days in a month) and $p = 0.2$ (probability of a day being rainy).
  2. Amount of Rain on a Rainy Day: On a rainy day, the amount of rain (in mm) follows a normal distribution with mean 10 mm and standard deviation 5 mm.

We’ll implement this using the Measure class and demonstrate how to compose these measures and compute some statistics.

Step-by-Step Implementation

  1. Define the Number of Rainy Days Measure: Binomial distribution with $n = 30$ and $p = 0.2$.
  2. Define the Amount of Rain Measure: Normal distribution with mean $10k$ mm, where $k$ is the number of rainy days, and $\sigma = 5$ mm.
  3. Compose the Measures: Use bind to model the total amount of rain in a month.
import scipy.stats as stats

# Number of rainy days in a month
rainy_days_measure = binomial((30, 0.2))

# Amount of rain on a rainy day
def rain_amount_per_day(rainy_days: int) -> Measure[float]:
    return fromDensityFunction(lambda x: stats.norm.pdf(x, 10 * rainy_days, 5))

# Total amount of rain in the month (composing measures)
total_rain_measure = rainy_days_measure.bind(rain_amount_per_day)

# Computing expectation and variance of the total rain
print("Expected Total Rain (mm):", expectation(total_rain_measure))
print("Variance of Total Rain (mm^2):", variance(total_rain_measure))

which gives

Expected Total Rain (mm): 53.497099666414464
Variance of Total Rain (mm^2): 543.9145481257365
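As a cross-check, the model's moments can also be computed analytically. With $K \sim \mathrm{Bin}(30, 0.2)$ and total rain $X \mid K \sim \mathcal{N}(10K, 5^2)$:

$$ \mathbb{E}[X] = 10\,\mathbb{E}[K] = 10 \cdot 30 \cdot 0.2 = 60 \text{ mm} $$

$$ \mathrm{Var}(X) = \mathbb{E}[\mathrm{Var}(X \mid K)] + \mathrm{Var}(\mathbb{E}[X \mid K]) = 25 + 100 \cdot 4.8 = 505 \text{ mm}^2 $$

The numerical values above deviate from these analytic ones, most likely because quad integrates each conditional density over the whole real line and can miss part of the mass of densities centered far from the origin.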

Explanation of the Code

  1. Binomial Distribution for Rainy Days: We use the binomial function to define the measure for the number of rainy days in a month.
  2. Normal Distribution for Rainfall: We use fromDensityFunction with scipy’s normal pdf to define the measure for the amount of rain given the number of rainy days.
  3. Composing Measures:
    • The rain_amount_per_day function generates a normal distribution measure for the amount of rain given the number of rainy days.
    • The total_rain_measure is composed by binding the rainy_days_measure with rain_amount_per_day, modeling the total amount of rain in a month.
  4. Expectation and Variance: We compute the expected total amount of rain and its variance using the expectation and variance functions.

This refined example demonstrates how to model a real-world probabilistic scenario using measures and compose them to perform meaningful calculations.

Conclusion

While this formalism allows us to compute probabilities and related metrics in a structured way, it has some limitations. The primary challenge is the computational complexity and inefficiency that can arise when working with measures and performing integrations, especially for continuous or high-dimensional spaces. Additionally, the abstraction can be less intuitive for those not familiar with functional programming concepts and category theory. Despite these challenges, this approach provides a powerful framework for probabilistic computations, making it easier to reason about and compose probability measures in a mathematical and structured way.
