In this article, I explore an intriguing way of thinking about probabilities from a categorical viewpoint, one that ultimately lets us express probabilistic computation in a functional form. The idea is far from new (in the early 1960s, Lawvere had already sketched this notion while building a framework intended to support planning disarmament treaties with the Soviets), and there are several ways to bridge the gap between category theory and probability theory.
One of the most classical ways to make this connection is the Giry monad, formalized by Michèle Giry in the early 1980s building on Lawvere's ideas, and defined on the category of measurable spaces. In the following sections of this article, we will work through this construction, which will allow us to propose an interesting way to program probability laws in Python.
Probability Theory and Measure Theory
Before delving into the Giry monad, let’s recall some fundamental concepts in probability theory as developed by Kolmogorov in the early 20th century. The first fundamental notion is that of a measure and a measurable space:
A measurable space $(X, \mathcal{A})$ is a set $X$ equipped with a sigma-algebra $\mathcal{A}$. The elements of $\mathcal{A}$ are called the measurable sets of $X$. A sigma-algebra is a non-empty collection of subsets of $X$ that is closed under complementation and countable unions (and thus also countable intersections), which allows us to rigorously define what types of subsets of $X$ we can “measure”. Consequently, a measure space $(X, \mathcal{A}, \mu)$ is a measurable space equipped with a measure $\mu$: a function defined on $\mathcal{A}$, taking values in $[0, \infty]$, that assigns $0$ to the empty set and is countably additive. In this sense, a measure is just a generalization of the notion of volume over subsets that “behave well” (those in a sigma-algebra). Kolmogorov’s idea is to define the notion of probabilities as a “volume”, thus a measure, on the set of “possible events”.
We can also define the important notion of a measurable function as a function $f$ defined between two measurable spaces $(A, \mathcal{A})$ and $(B, \mathcal{B})$ such that for all $B \in \mathcal{B}$, $f^{-1}(B) \in \mathcal{A}$. This allows us to define the integral of a measurable function and to generalize the Riemann integral (the Lebesgue integral). Measurable functions also let us transport measures: the pushforward $\mu \circ f^{-1}$ of a measure $\mu$ along $f$ is a measure on the codomain of $f$.
Now, a probability space is simply a measure space whose measure is normalized, $\mu(X) = 1$ (and hence takes values in $[0, 1]$).
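As a toy illustration (not from the article's code), here is a fair six-sided die written directly as a probability measure on the finite measurable space $X = \{1, \dots, 6\}$ with the power-set sigma-algebra:

```python
# Illustrative toy example: a fair die as a probability measure on
# X = {1, ..., 6}, where every subset is measurable and mu(U) = |U| / 6.
X = {1, 2, 3, 4, 5, 6}

def mu(U: set) -> float:
    """Uniform probability measure on X."""
    assert U <= X, "U must be a subset of X"
    return len(U) / 6

print(mu(X))          # 1.0: total mass one, so mu is a probability measure
print(mu({2, 4, 6}))  # 0.5: probability of rolling an even number
```

Finite additivity is immediate here: for disjoint $U$ and $V$, $\mu(U \cup V) = \mu(U) + \mu(V)$.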
Category of Measurable Spaces and the Giry Monad
Consider now the category $\mathbf{Meas}$ as the category whose objects are measurable spaces and whose arrows are measurable functions.
For any measurable space $A$, we can associate another measurable space $G(A)$, which corresponds to the space of all probability measures defined on $A$. This is indeed a measurable space because we can define a sigma-algebra generated by all evaluation functions $eval_U: G(A) \rightarrow [0, 1]$, where $U$ is any measurable set of $A$, and which associate to any probability measure $\mu$ the value $\mu(U)$:
$$ eval_U(\mu) = \mu(U) = \int_U d\mu $$
We note that the $eval_U$ are themselves measurable functions on the space $G(A)$. We can also “overload” the definition of $eval$ to act on measurable functions:
$$ eval_f(\mu) = \int f d\mu $$
The quantity $eval_f(\mu)$ is the integral of $f$ against $\mu$ (its expectation, when $\mu$ is a probability measure). In both definitions, the usual relationship between measure and set/function is reversed: the measure $\mu$ is itself the object being measured.
$G$ is an Endofunctor
First, let’s notice that $G$ is an endofunctor since it is a mapping defined on the objects and arrows of $\mathbf{Meas}$ (a category) with the same category as its codomain: $G: \mathbf{Meas} \rightarrow \mathbf{Meas}$. The “fmap” of the functor can be simply defined as, for any morphism $f$ of $\mathbf{Meas}$:
$$ G(f): \mu \mapsto \mu \circ f^{-1} $$
which means that we are taking the pushforward measure of $\mu$ by $f$.
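For intuition, here is a hypothetical finite sketch (dict-based measures, not the article's class): the pushforward of a fair-die distribution along $f(x) = x \bmod 2$ yields the distribution of the die's parity.

```python
# Hypothetical finite sketch: the pushforward mu . f^{-1} of a discrete
# measure (a dict {outcome: probability}) along a function f.
def pushforward(mu: dict, f) -> dict:
    out = {}
    for x, p in mu.items():
        out[f(x)] = out.get(f(x), 0.0) + p  # mass of x lands on f(x)
    return out

die = {i: 1 / 6 for i in range(1, 7)}       # fair six-sided die
parity = pushforward(die, lambda x: x % 2)  # distribution of x mod 2
print(parity)  # roughly {1: 0.5, 0: 0.5}
```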
$G$ is a Monad
$G$ is indeed a monad because we can identify the existence of:
- A natural transformation $\eta: 1 \rightarrow G$, defined as $\eta_M(x)(A) = \delta_x(A)$ for $x$ an element of $M$, a measurable space in the category $\mathbf{Meas}$, and $A$ a measurable set of $M$. Here, $\delta_x$ is the Dirac measure centered at $x$.
- A natural transformation $\mu: G \circ G \rightarrow G$, defined as a conversion from the space of probability measures on the space of probability measures of a space, to the space of probability measures of the same space. This morphism can be constructed as follows: for all $\rho \in G(G(M))$, with $M$ a measurable space,
$$ \mu(\rho)(A) = \int_{G(M)} eval_A \, d\rho $$
and for a measurable function $f: M \rightarrow \mathbb{R}$,
$$ \mu(\rho)(f) = \int_{G(M)} eval_f \, d\rho $$
This means that the monad multiplication works by integrating over the space of probability measures on $M$, which still yields a probability measure on $M$.
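To build intuition, here is a hypothetical finite sketch (dict-based measures, values invented for illustration) of the multiplication $\mu: G(G(M)) \rightarrow G(M)$: flattening a measure over measures is just forming a weighted mixture.

```python
# Hypothetical finite sketch of the join mu: G(G(M)) -> G(M).
# Each inner measure is a dict {outcome: probability}; rho weighs them.
inner = {0: {0: 0.5, 1: 0.5},   # a fair coin
         1: {0: 0.2, 1: 0.8}}   # a biased coin
rho = {0: 0.3, 1: 0.7}          # a measure over the two coins

def join(rho: dict, inner: dict) -> dict:
    flat = {}
    for i, w in rho.items():            # integrate eval_A against rho
        for x, p in inner[i].items():
            flat[x] = flat.get(x, 0.0) + w * p
    return flat

print(join(rho, inner))  # the mixture: roughly {0: 0.29, 1: 0.71}
```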
In programming, we tend to use the flatmap operator for a monad $M$, which has the type:
$$ \text{flatmap}: M(T) \rightarrow (T \rightarrow M(U)) \rightarrow M(U) $$
This operation can be written in the context of the Giry monad as:
$$ \text{flatmap}(\rho, g)(A) = \int_{T} \big(\lambda m.\, eval_{g(m)}(A)\big) \, d\rho $$
and, as before, we can overload the definition on functions:
$$ \text{flatmap}(\rho, g)(f) = \int_{T} \big(\lambda m.\, eval_{g(m)}(f)\big) \, d\rho $$
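In the finite case, this integral collapses to a weighted sum over the support of $\rho$. A hypothetical dict-based sketch (the die and the kernel are invented for illustration): roll a fair die, then pick a number uniformly between 1 and the result.

```python
# Hypothetical finite sketch: flatmap for dict-based measures.
def flatmap(rho: dict, g) -> dict:
    out = {}
    for m, w in rho.items():          # integrate over T against rho
        for x, p in g(m).items():     # eval_{g(m)} expanded pointwise
            out[x] = out.get(x, 0.0) + w * p
    return out

die = {i: 1 / 6 for i in range(1, 7)}                           # roll a die
uniform_up_to = lambda n: {k: 1 / n for k in range(1, n + 1)}   # pick 1..n

dist = flatmap(die, uniform_up_to)
print(sum(dist.values()))  # ≈ 1.0: flatmap preserves total mass
```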
Implementing the Giry Monad in Python
Implementing the Giry monad directly can be quite complex due to the intrinsic difficulty of manipulating measures on sets in a computational setting. Instead of using measures directly on sets or algebras, we can use an alternate definition involving functions. This allows us to work with measures by defining probabilities and computations on these probabilities.
Why Use Measures on Functions?
In measure theory, a measure on a set assigns a non-negative value to subsets of the set, representing the “size” or “volume” of those subsets. However, directly implementing this concept on a computer is challenging for several reasons:
- Infinite Sets: Many interesting sets are infinite or uncountably infinite, making them impractical to represent explicitly in a computer program.
- Complexity: Even for finite sets, the number of subsets can grow exponentially, leading to combinatorial complexity.
Instead, we can leverage an alternative perspective where we define a measure via its action on functions. This approach uses the concept of integration: rather than focusing on measuring sets directly, we measure functions over these sets. Specifically, we use the idea that for a given measure $\mu$ and a function $f$, the measure of $f$ is given by the integral of $f$ with respect to $\mu$.
Defining Measures on Functions
A measure on a function $f$ with respect to a measure $\mu$ can be understood as evaluating the expected value or integral of $f$ over the space. Formally, for a measure $\mu$ on a space $X$ and a function $f: X \rightarrow \mathbb{R}$, we define:
$$ eval_f(\mu) = \int_X f \, d\mu $$
This notation emphasizes that we are focusing on the integral (or expected value) of functions rather than the direct measurement of sets. This perspective aligns well with computational approaches, where functions can be more easily manipulated and evaluated.
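For example (an illustrative sketch, assuming the uniform measure on $[0, 1]$), measuring a set reduces to integrating its indicator function, so a measure can be represented purely by its integration functional:

```python
from scipy.integrate import quad

# A measure represented by its integration functional:
# the uniform (Lebesgue) measure on [0, 1], mu(f) = integral of f over [0, 1].
def mu(f, **quad_kwargs) -> float:
    value, _ = quad(f, 0.0, 1.0, **quad_kwargs)
    return value

# Measuring the set [0, 0.5] = integrating its indicator function.
indicator = lambda x: 1.0 if x <= 0.5 else 0.0
p = mu(indicator, points=[0.5])  # tell quad about the jump at 0.5
print(p)                # ≈ 0.5, the "length" of [0, 0.5]
print(mu(lambda x: x))  # ≈ 0.5, the expected value of the identity
```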
Giry Monad Implementation
First, let’s define the `Measure` class in Python. This class represents a measure as a function (`mu`) that takes another function and returns a float: the integral of that function with respect to the measure.
```python
from typing import Generic, TypeVar, Callable, List, Tuple
from functools import partial, reduce
from scipy.integrate import quad
import numpy as np
import scipy

T = TypeVar("T")
U = TypeVar("U")


class Measure(Generic[T]):
    def __init__(self, mu: Callable[[Callable[[T], float]], float]):
        self.mu = mu

    def fmap(self, f: Callable[[T], U]) -> "Measure[U]":
        def new_measure(g: Callable[[U], float]) -> float:
            return self.mu(lambda x: g(f(x)))
        return Measure(mu=new_measure)

    def bind(self, f: Callable[[T], "Measure[U]"]) -> "Measure[U]":
        def new_measure(g: Callable[[U], float]) -> float:
            return self.mu(lambda x: f(x).mu(g))
        return Measure(mu=new_measure)

    @staticmethod
    def unit(x: T) -> "Measure[T]":
        def new_measure(f: Callable[[T], float]) -> float:
            return f(x)
        return Measure(mu=new_measure)


def integrate(f: Callable[[T], float], measure: "Measure[T]") -> float:
    return measure.mu(f)


def fromMassFunction(f: Callable[[T], float], xs: List[T]) -> "Measure[T]":
    def new_measure(g: Callable[[T], float]) -> float:
        return sum(f(x) * g(x) for x in xs)
    return Measure(mu=new_measure)


def fromDensityFunction(f: Callable[[T], float]) -> "Measure[T]":
    def new_measure(g: Callable[[T], float]) -> float:
        integral, _ = quad(lambda x: f(x) * g(x), -np.inf, np.inf, limit=100)
        return integral
    return Measure(mu=new_measure)
```
Explanation of the Code
- `Measure` class: encapsulates a measure, represented as a function (`mu`) that takes another function and returns a float. `fmap` maps a function over a measure, transforming the measure according to the given function; `bind` (also known as `flatMap`) chains together computations that return measures; `unit` creates a measure from a single value, corresponding to the Dirac measure $\delta_x$.
- `integrate` function: takes a function and a measure and returns the integral of the function with respect to the measure.
- `fromMassFunction` function: creates a measure from a given mass function and a list of possible outcomes.
- `fromDensityFunction` function: creates a measure from a probability density function by numerically integrating against it over $\mathbb{R}$.
- `binomial` function (defined below): creates a binomial measure given parameters `n` (number of trials) and `p` (success probability).
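To make the monadic structure concrete, here is a standalone sketch (illustrative, independent of the class above) where a measure is just a bare closure mapping a function to a float; the monad operations then become one-liners.

```python
# Standalone sketch: a measure as a closure from functions to floats.
def unit(x):
    return lambda f: f(x)                    # Dirac measure at x

def fmap(m, g):
    return lambda f: m(lambda x: f(g(x)))    # pushforward along g

def bind(m, k):
    return lambda f: m(lambda x: k(x)(f))    # Giry-style flatmap

die = lambda f: sum(f(i) for i in range((1), 7)) / 6  # uniform on {1..6}

print(fmap(unit(3), lambda x: x + 1)(lambda x: x))    # 4: fmap shifts the Dirac
print(bind(die, lambda x: unit(x % 2))(lambda x: x))  # 0.5: P(odd) for the die
```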
Computing Metrics
Here are two utility functions to compute the expectation and variance of a measure.
```python
def expectation(measure: Measure[T]) -> float:
    return integrate(lambda x: x, measure)


def variance(measure: Measure[T]) -> float:
    return integrate(lambda x: x**2, measure) - expectation(measure) ** 2
```
Now, we can define for instance the binomial probability law with the following code:
```python
from scipy.special import comb

def binomial(params: Tuple[int, float]) -> Measure[int]:
    n, p = params
    def pmf_bin(x: int) -> float:
        if x < 0 or x > n:
            return 0.0
        return comb(n, x) * p**x * (1 - p) ** (n - x)
    return fromMassFunction(pmf_bin, list(range(n + 1)))
```
and test it with simple values
```python
expectation(binomial((5, 0.6))), variance(binomial((5, 0.6)))
```
which gives
```
(3.0, 1.1999999999999993)
```
which is indeed the expected result.
Example Usage
Let’s consider a more refined and realistic probabilistic use case involving two random variables: the number of rainy days in a month and the amount of rain on a given rainy day. We’ll use these to compose measures and perform computations.
Use Case: Rainfall Analysis
- Number of Rainy Days: Assume the number of rainy days in a month follows a binomial distribution with parameters $n = 30$ (days in a month) and $p = 0.2$ (probability of a day being rainy).
- Amount of Rain on a Rainy Day: On a rainy day, the amount of rain (in mm) follows a normal distribution with mean 10 mm and standard deviation 5 mm.
We’ll implement this using the measure class and demonstrate how to compose these measures and compute some statistics.
Step-by-Step Implementation
- Define the Number of Rainy Days Measure: binomial distribution with $n = 30$ and $p = 0.2$.
- Define the Amount of Rain Measure: normal distribution for the total rainfall given $k$ rainy days, with mean $10k$ mm (roughly 10 mm per rainy day) and $\sigma = 5$ mm.
- Compose the Measures: use `bind` to model the total amount of rain in a month.
```python
import scipy.stats as stats

# Number of rainy days in a month
rainy_days_measure = binomial((30, 0.2))

# Total amount of rain given the number of rainy days
def rain_amount_per_day(rainy_days: int) -> Measure[float]:
    return fromDensityFunction(lambda x: stats.norm.pdf(x, 10 * rainy_days, 5))

# Total amount of rain in the month (composing measures)
total_rain_measure = rainy_days_measure.bind(rain_amount_per_day)

# Computing expectation and variance of the total rain
print("Expected Total Rain (mm):", expectation(total_rain_measure))
print("Variance of Total Rain (mm^2):", variance(total_rain_measure))
```
which gives
```
Expected Total Rain (mm): 53.497099666414464
Variance of Total Rain (mm^2): 543.9145481257365
```
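As a sanity check (not part of the original computation), the law of total expectation gives $E[\text{total}] = 10\,E[k] = 10 \cdot 30 \cdot 0.2 = 60$ mm for this model, and the law of total variance gives $5^2 + 100\,\mathrm{Var}(k) = 25 + 480 = 505$ mm². The numerically integrated values above deviate from these, most likely because `quad` over $(-\infty, \infty)$ struggles to locate normal densities centered far from the origin. A quick exact verification under the stated model ($k \sim \mathrm{Binomial}(30, 0.2)$, total given $k$ normal with mean $10k$ and $\sigma = 5$):

```python
from scipy.special import comb

# Exact moments of the total rainfall under the model above.
n, p = 30, 0.2
pmf = lambda k: comb(n, k) * p**k * (1 - p) ** (n - k)

e_total = sum(pmf(k) * 10 * k for k in range(n + 1))  # = 10 * n * p
var_total = (5**2
             + sum(pmf(k) * (10 * k) ** 2 for k in range(n + 1))
             - e_total**2)                            # = 25 + 100 * n * p * (1 - p)

print(e_total)    # 60.0 (up to floating point)
print(var_total)  # 505.0 (up to floating point)
```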
Explanation of the Code
- Binomial distribution for rainy days: the `binomial` function defines the measure for the number of rainy days in a month.
- Normal distribution for rainfall: `fromDensityFunction` applied to `scipy.stats.norm.pdf` defines the measure for the amount of rain given the number of rainy days.
- Composing measures: `rain_amount_per_day` generates a normal measure for the rainfall given the number of rainy days, and `total_rain_measure` is obtained by binding `rainy_days_measure` with `rain_amount_per_day`, modeling the total amount of rain in a month.
- Expectation and variance: we compute the expected total amount of rain and its variance using the `expectation` and `variance` functions.
This refined example demonstrates how to model a real-world probabilistic scenario using measures and compose them to perform meaningful calculations.
Conclusion
While this formalism allows us to compute probabilities and related metrics in a structured way, it has some limitations. The primary challenge is the computational complexity and inefficiency that can arise when working with measures and performing integrations, especially for continuous or high-dimensional spaces. Additionally, the abstraction can be less intuitive for those not familiar with functional programming concepts and category theory. Despite these challenges, this approach provides a powerful framework for probabilistic computations, making it easier to reason about and compose probability measures in a mathematical and structured way.