*guest post by Mark Meckes*

For the past several years I’ve been thinking on and off about whether there’s a fruitful category-theoretic perspective on probability theory, or at least a perspective with a category-theoretic flavor.

(You can see this MathOverflow question by Pete Clark for some background, though I started thinking about this question somewhat earlier. The fact that I’m writing this post should tell you something about my attitude toward my own answer there. On the other hand, that answer indicates something of the perspective I’m coming from.)

I’m a long way from finding such a perspective I’m happy with, but I have some observations I’d like to share with other n-Category Café patrons on the subject, in hopes of stirring up some interesting discussion. The main idea here was pointed out to me by Tom, who I pester about this subject on an approximately annual basis.

Let’s first dispense with one rather banal observation. Let $\mathbf{Prob}$ be the category whose objects are probability spaces (measure spaces with total measure $1$), and whose morphisms are almost-everywhere-equality equivalence classes of measure-preserving maps. Then:

> Probability theory is **not** about the category $\mathbf{Prob}$.

To put it a little less (ahem) categorically, probability theory is not about the category $\mathbf{Prob}$, in the sense that group theory or topology might be said (however incompletely) to be about the categories $\mathbf{Grp}$ or $\mathbf{Top}$. The most basic justification of this assertion is that isomorphic objects in $\mathbf{Prob}$ are not “the same” from the point of view of probability theory. Indeed, the distributions of

- a uniform random variable in an interval,
- an infinite sequence of independent coin flips, and
- Brownian motion $\{B_t : t \ge 0\}$

are radically different things in probability theory, but they’re all isomorphic to each other in $\mathbf{Prob}$!

Anyway, as any probabilist will tell you, probability theory isn’t
about probability spaces. The fundamental “objects” in probability
theory are actually the *morphisms* of $\mathbf{Prob}$: random
variables.

Typically, a random variable is defined to be a measurable map
$X:\Omega \to E$, where $(\Omega, \mathbb{P})$ is a probability space
and $E$ is, a priori, just a measurable space. (I’m suppressing
$\sigma$-algebras here, which indicates how modest the scope of this
post is: serious probability theory works with multiple
$\sigma$-algebras on a single space.) But every random variable
canonically induces a probability measure on its codomain, its
**distribution** $\mu = X_\# \mathbb{P}$ defined by

$$\mu(A) = \mathbb{P}(X^{-1}(A))$$

for every measurable $A \subseteq E$. This formula is precisely what it means to say that $X:(\Omega, \mathbb{P}) \to (E, \mu)$ is measure-preserving.
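In the finite setting this pushforward is a one-line computation. Here is a minimal Python sketch (the names `pushforward`, `P`, `X` are mine, not standard) of $\mu(A) = \mathbb{P}(X^{-1}(A))$ evaluated on singletons $A = \{e\}$:

```python
from collections import defaultdict

def pushforward(P, X):
    """Distribution of X: mu({e}) = P(X^{-1}({e})), summing P over each fiber."""
    mu = defaultdict(float)
    for omega, p in P.items():
        mu[X(omega)] += p
    return dict(mu)

# A fair die as a finite probability space (Omega, P) ...
P = {k: 1 / 6 for k in range(1, 7)}
# ... and the random variable X(omega) = omega mod 2 ("parity").
X = lambda omega: omega % 2
mu = pushforward(P, X)
# Each parity class {1, 3, 5} and {2, 4, 6} has P-measure 1/2, so mu is uniform on {0, 1}.
```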

In probability theory, the only questions we’re allowed to ask about $X$ are about its distribution. On the other hand, two random variables which have the same distribution are not thought of as “the same random variable” in the same way that isomorphic groups are “the same group”. In fact, a probabilist’s favorite trick is to replace a random variable $X:\Omega \to E$ with another random variable $X':\Omega' \to E$ which has the same distribution, but which is in some way easier to analyze. For example, $X'$ may factor in a useful way as the composition of two morphisms in $\mathbf{Prob}$ (although probabilists don’t normally write about things in those terms).

Now let’s fix a codomain $E$. Then there is a category $\mathbf{R}(E)$ whose objects are $E$-valued random variables; if $X$ and $X'$ are two random variables with domains $(\Omega, \mathbb{P})$ and $(\Omega', \mathbb{P}')$ respectively, then a morphism from $X$ to $X'$ is a measure-preserving map $f:\Omega \to \Omega'$ such that $X' \circ f = X$. (Figuring out how to typeset the commutative triangle here is more trouble than I feel like going to.)

In this case

$$X_\# \mathbb{P} = (X' \circ f)_\# \mathbb{P} = X'_\# f_\# \mathbb{P} = X'_\# \mathbb{P}',$$

so if a morphism $X \to X'$ exists, then $X$ and $X'$ have the same distribution. Moreover, if $\mu$ is a probability measure on $E$, there is a canonical random variable with distribution $\mu$, namely the identity map $\mathrm{Id}_E$ on $(E, \mu)$, and any random variable $X$ with distribution $\mu$ itself defines a morphism from the object $X$ to the object $\mathrm{Id}_E$.
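A toy check of the last point, in the same spirit (finite spaces, hypothetical names): for $X$ = number of heads in two fair coin flips, the map $X$ itself is measure-preserving from $(\Omega, \mathbb{P})$ to $(E, \mu)$, and trivially satisfies $\mathrm{Id}_E \circ X = X$, so it is a morphism from the object $X$ to the object $\mathrm{Id}_E$:

```python
from collections import defaultdict

def pushforward(P, f):
    """Image measure f_# P on the codomain of f."""
    mu = defaultdict(float)
    for omega, p in P.items():
        mu[f(omega)] += p
    return dict(mu)

# Omega = two fair coin flips; the random variable X = number of heads.
P = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
X = lambda omega: omega[0] + omega[1]
mu = pushforward(P, X)  # the distribution of X: {0: 0.25, 1: 0.5, 2: 0.25}

# X as a morphism X -> Id_E in R(E):
# (1) measure-preserving: X_# P = mu, which holds by construction of mu;
# (2) the triangle commutes: Id_E(X(omega)) == X(omega) for every omega.
identity = lambda e: e
assert all(identity(X(omega)) == X(omega) for omega in P)
```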

It follows that the family $\mathbf{R}(E, \mu)$ of random variables with distribution $\mu$ is a connected component of $\mathbf{R}(E)$. (I don’t know whether the construction of $\mathbf{R}(E)$ from $\mathbf{Prob}$ has a standard name, but I have learned that its connected components $\mathbf{R}(E, \mu)$ are slice categories of $\mathbf{Prob}$.)

Now a typical theorem in probability theory starts by taking a family of random variables $X_i : \Omega \to E_i$ all defined on the same domain $\Omega$. That’s no problem in this picture: this is the same as a single random variable $X : \Omega \to \prod_i E_i$. (There’s also always some kind of assumption about the relationships among the $X_i$ — independence, for example, though that’s only the simplest such relationship that people think about — and I don’t (yet!) have any thoughts to share about expressing those relationships in terms of the picture here.)

The next thing is to cook up a new random variable defined on $\Omega$ by applying some measurable function $F:\prod_i E_i \to E$. A prototype is the function (well, family of functions)

$$F: \mathbb{R}^n \to \mathbb{R}, \qquad (x_1, \ldots, x_n) \mapsto \sum_{i=1}^n x_i,$$

which has a starring role in all the classics: the Weak and Strong Laws of Large Numbers, the Central Limit Theorem, the Law of the Iterated Logarithm, Cramér’s Theorem, etc. This fits nicely into the picture, too: any measurable map $F:E \to E'$ induces a functor $F_!:\mathbf{R}(E) \to \mathbf{R}(E')$ in an obvious way (a morphism in $\mathbf{R}(E)$ given by a measure-preserving $f:\Omega \to \Omega'$ is mapped to a morphism in $\mathbf{R}(E')$ given by the same $f$ — that point is probably obvious to most of the people here, but I needed to think about it a bit to convince myself that $F_!$ really is a functor).
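To make the functor concrete on objects (again a finite sketch, with hypothetical names): take $X$ to be two independent fair dice, viewed as one random variable into $E_1 \times E_2$, and $F(x_1, x_2) = x_1 + x_2$. Then $F_!(X) = F \circ X$, and its distribution is the familiar triangular distribution of a dice sum:

```python
from collections import defaultdict
from itertools import product

def pushforward(P, f):
    """Image measure f_# P on the codomain of f."""
    mu = defaultdict(float)
    for omega, p in P.items():
        mu[f(omega)] += p
    return dict(mu)

# Two independent fair dice packaged as one random variable into E1 x E2.
# Here Omega = E1 x E2 with the product measure, and X = Id (the canonical object).
P = {(i, j): 1 / 36 for i, j in product(range(1, 7), repeat=2)}
X = lambda omega: omega

# F acts on objects of R(E1 x E2) by postcomposition: F_!(X) = F o X.
F = lambda x: x[0] + x[1]
mu = pushforward(P, lambda omega: F(X(omega)))
# mu[7] = 6/36, the largest value, as expected for the sum of two dice.
```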

Finally, as I said, a probabilist may go about understanding the distribution of the random variable $F(X)$ — that is, the object $F_!(X)$ of $\mathbf{R}(E)$ — by instead working with another object $Y$ in the same connected component of $\mathbf{R}(E)$. Both the assumptions on $X$ and the structure of $F$ may be used to help cook up $Y$.

This is quite different from any category-theoretic perspective I’ve ever encountered in, say, algebra or topology, but my ignorance of those fields is broad and deep. If anyone finds this kind of category-theoretic picture familiar, I’d love to hear about it!

One last observation here is that I believe (I haven’t tried writing out all the details) that the mappings

$$E \mapsto \mathbf{R}(E), \qquad F \mapsto F_!$$

define a functor $\mathbf{Meas} \to \mathbf{Cat}$. I have no idea what, if anything, this observation may do for probability theory.