*Guest post by Tom Avery*

Tom *(here Tom means me, not him — Tom)* has written several times about a piece of categorical machinery that, when given an appropriate input, churns out some well-known mathematical concepts. This machine is the process of constructing the codensity monad of a functor.

In this post, I’ll give another example of a well-known concept that arises as a codensity monad, namely probability measures. I’ve just written a paper about this.

### The Giry monads

Write $\mathbf{Meas}$ for the category of measurable spaces (sets equipped with a $\sigma$-algebra of subsets) and measurable maps. I’ll also write $I$ for the unit interval $[0,1]$, equipped with the Borel $\sigma$-algebra.

Let $\Omega \in \mathbf{Meas}$. There are lots of different probability measures we can put on $\Omega$; write $G\Omega$ for the set of all of them.

Is $G\Omega$ a measurable space? Yes: An element of $G\Omega$ is a function that sends measurable subsets of $\Omega$ to numbers in $I$. Turning this around, we have, for each measurable $A \subseteq \Omega$, an evaluation map $\mathrm{ev}_A \colon G\Omega \to I$. Let’s give $G\Omega$ the smallest $\sigma$-algebra such that all of these are measurable.

Is $G$ a functor? Yes: Given a measurable map $g \colon \Omega \to \Omega'$ and $\pi \in G\Omega$, we can define the pushforward $G g(\pi)$ of $\pi$ along $g$ by

$$G g(\pi)(A') = \pi(g^{-1} A')$$

for measurable $A' \subseteq \Omega'$.
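For a concrete feel for the pushforward, here is a minimal Python sketch restricted to finitely supported distributions, which is an assumption made purely for illustration (the general definition needs no such restriction). The names `pushforward` and `measure`, and the dictionary representation, are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def measure(pi, A):
    """Return pi(A) for a finitely supported distribution pi (dict: point -> mass)."""
    return sum((p for omega, p in pi.items() if omega in A), Fraction(0))


def pushforward(g, pi):
    """Push pi forward along g: the mass of each point omega moves to g(omega)."""
    out = defaultdict(Fraction)  # Fraction() is 0
    for omega, p in pi.items():
        out[g(omega)] += p
    return dict(out)


# A fair six-sided die, pushed forward along the parity map.
die = {k: Fraction(1, 6) for k in range(1, 7)}
parity = lambda k: k % 2
pushed = pushforward(parity, die)
```

Here `measure(pushed, {0})` agrees with `measure(die, {2, 4, 6})`, which is the defining equation with the preimage of $\{0\}$ under the parity map.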

Is $G$ a monad? Yes: Given $\omega \in \Omega$ we can define $\eta(\omega) \in G\Omega$ by

$$\eta(\omega)(A) = \chi_A(\omega)$$

where $A$ is a measurable subset of $\Omega$ and $\chi_A$ is its characteristic function. In other words $\eta(\omega)$ is the *Dirac measure* at $\omega$. Given $\rho \in G G\Omega$, let

$$\mu(\rho)(A) = \int_{G\Omega} \mathrm{ev}_A \,\mathrm{d}\rho$$

for measurable $A \subseteq \Omega$, where $\mathrm{ev}_A \colon G\Omega \to I$ is as above.

This is the **Giry monad** $\mathbb{G} = (G,\eta,\mu)$, first defined (unsurprisingly) by Giry in “A categorical approach to probability theory”.
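In the finitely supported case the unit and multiplication become very concrete: the unit is a point mass, and the multiplication averages a "distribution over distributions" into a mixture. The following Python sketch assumes that finite setting; the names `dirac`, `join` and `measure`, and the representation of $\rho$ as a list of weighted distributions, are hypothetical choices for illustration.

```python
from fractions import Fraction


def measure(pi, A):
    """Return pi(A) for a finitely supported distribution pi (dict: point -> mass)."""
    return sum((p for omega, p in pi.items() if omega in A), Fraction(0))


def dirac(omega):
    """Unit: the Dirac measure at omega."""
    return {omega: Fraction(1)}


def join(rho):
    """Multiplication: rho is a list of (distribution, weight) pairs.

    The result assigns to A the rho-average of pi(A), i.e. the integral of ev_A.
    """
    out = {}
    for pi, w in rho:
        for omega, p in pi.items():
            out[omega] = out.get(omega, Fraction(0)) + w * p
    return out


fair = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
biased = {"H": Fraction(9, 10), "T": Fraction(1, 10)}
rho = [(fair, Fraction(1, 2)), (biased, Fraction(1, 2))]
mixed = join(rho)
```

Choosing one of the two coins uniformly at random and then tossing it gives heads with probability `mixed["H"]`, which is the average of the two individual probabilities of heads.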

A finitely additive probability measure $\pi$ is just like a probability measure, except that it is only well-behaved with respect to *finite* disjoint unions, rather than arbitrary *countable* disjoint unions. More precisely, rather than having

$$\pi\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \pi(A_i)$$

for disjoint $A_i$, we just have

$$\pi\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \pi(A_i)$$

for disjoint $A_i$.

We could repeat the definition of the Giry monad with “probability measure” replaced by “finitely additive probability measure”; doing so would give the **finitely additive Giry monad** $\mathbb{F} = (F,\eta,\mu)$. Every probability measure is a finitely additive probability measure, but not all finitely additive probability measures are probability measures. So $\mathbb{G}$ is a proper submonad of $\mathbb{F}$.

The Kleisli category of $\mathbb{G}$ is quite interesting. Its objects are just the measurable spaces, and the morphisms are a kind of non-deterministic map called a **Markov kernel** or **conditional probability distribution**. As a special case, a discrete space equipped with an endomorphism in the Kleisli category is a discrete-time Markov chain.
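To illustrate Kleisli composition, here is a small Python sketch for finite state spaces, where a Markov kernel is just a function from states to finitely supported distributions and composition amounts to multiplying transition matrices. The finite setting, the name `kleisli` and the toy `weather` chain are all assumptions made for illustration.

```python
from fractions import Fraction


def kleisli(k1, k2):
    """Compose two kernels: run k1, then k2, summing over the intermediate state."""
    def composed(x):
        out = {}
        for y, p in k1(x).items():
            for z, q in k2(y).items():
                out[z] = out.get(z, Fraction(0)) + p * q
        return out
    return composed


def weather(state):
    """A toy two-state Markov chain: an endomorphism of {sun, rain} in the Kleisli category."""
    table = {
        "sun": {"sun": Fraction(3, 4), "rain": Fraction(1, 4)},
        "rain": {"sun": Fraction(1, 2), "rain": Fraction(1, 2)},
    }
    return table[state]


# Two steps of the chain are the Kleisli composite of the kernel with itself.
two_steps = kleisli(weather, weather)
```

Iterating `kleisli(weather, ...)` gives the $n$-step transition probabilities of the chain.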

I’ll explain how the Giry monads arise as codensity monads, but first I’d like to mention a connection with another example of a codensity monad, namely the ultrafilter monad.

An ultrafilter $\mathcal{U}$ on a set $X$ is a set of subsets of $X$ satisfying some properties. So $\mathcal{U}$ is a subset of the powerset $\mathcal{P}X$ of $X$, and is therefore determined by its characteristic function, which takes values in $\{0,1\} \subseteq I$. In other words, an ultrafilter on $X$ can be thought of as a special function

$$\mathcal{P}X \to I.$$

It turns out that “special function” here means “finitely additive probability measure defined on all of $\mathcal{P}X$ and taking values in $\{0,1\}$”.
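On a finite set this correspondence can be checked by brute force. The Python sketch below is only an illustration: on a finite set every ultrafilter is principal, so it cannot exhibit the interesting non-principal case. It verifies that the characteristic function of a principal ultrafilter is a finitely additive probability measure, while that of a filter that is not an ultrafilter is not. All function names are hypothetical.

```python
from itertools import combinations


def powerset(X):
    """All subsets of X, as frozensets."""
    xs = list(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def principal_ultrafilter(X, x):
    """The ultrafilter of all subsets of X containing the point x."""
    return {A for A in powerset(X) if x in A}


def indicator(U):
    """Characteristic function P(X) -> {0, 1} of a family U of subsets."""
    return lambda A: 1 if frozenset(A) in U else 0


def is_finitely_additive_probability(m, X):
    """Check m(X) = 1 and m(A u B) = m(A) + m(B) for all disjoint A, B."""
    subsets = powerset(X)
    if m(frozenset(X)) != 1:
        return False
    return all(
        m(A | B) == m(A) + m(B)
        for A in subsets
        for B in subsets
        if not (A & B)
    )


X = {"a", "b", "c"}
ultra = principal_ultrafilter(X, "a")
trivial_filter = {frozenset(X)}  # a filter, but not an ultrafilter
```

The trivial filter fails because $\{a\}$ and $\{b,c\}$ both get value $0$ while their disjoint union $X$ gets value $1$.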

So the ultrafilter monad on $\mathbf{Set}$ (which sends a set to the set of ultrafilters on it) is a primitive version of the finitely additive Giry monad. With this in mind, and given the fact that the ultrafilter monad is the codensity monad of the inclusion of the category of finite sets into the category of sets, it is not that surprising that the Giry monads are also codensity monads. In particular, we might expect $\mathbb{F}$ to be the codensity monad of some functor involving spaces that are “finite” in some sense, and for $\mathbb{G}$ we’ll need to include some information pertaining to countable additivity.

### Integration operators

If you have a measure on a space then you can integrate functions on that space. The converse is also true: if you have a way of integrating functions on a space then you can extract a measure.

There are various ways of making this precise, the most famous of which is the Riesz-Markov-Kakutani Representation Theorem:

**Theorem.** *Let $X$ be a compact Hausdorff space. Then the space of finite, signed Borel measures on $X$ is canonically isomorphic to*

$$\mathbf{NVS}(\mathbf{Top}(X,\mathbb{R}),\mathbb{R})$$

*as a normed vector space, where $\mathbf{Top}$ is the category of topological spaces, and $\mathbf{NVS}$ is the category of normed vector spaces.*

Given a finite, signed Borel measure $\pi$ on $X$, the corresponding map $\mathbf{Top}(X,\mathbb{R}) \to \mathbb{R}$ sends a function to its integral with respect to $\pi$. There are various different versions of this theorem that go by the same name.

My paper contains the following more modest version, which is a correction of a claim by Sturtz.

**Proposition.** *Finitely additive probability measures on a measurable space $\Omega$ are canonically in bijection with functions $\phi \colon \mathbf{Meas}(\Omega,I) \to I$ that are*

**affine:** *if $f,g \in \mathbf{Meas}(\Omega,I)$ and $r \in I$ then*

$$\phi(r f + (1-r)g) = r\phi(f) + (1-r)\phi(g),$$

*and*

**weakly averaging:** *if $\bar{r}$ denotes the constant function with value $r$ then $\phi(\bar{r}) = r$.*

*Call such a function a* **finitely additive integration operator**. *The bijection restricts to a correspondence between (countably additive) probability measures and functions $\phi$ that additionally*

**respect limits:** *if $f_n \in \mathbf{Meas}(\Omega,I)$ is a sequence of functions converging pointwise to $0$ then $\phi(f_n)$ converges to $0$.*

Call such a function an **integration operator**. The integration operator corresponding to a probability measure $\pi$ sends a function $f$ to

$$\int_{\Omega} f \,\mathrm{d}\pi,$$

which justifies the name. In the other direction, given an integration operator $\phi$, the value of the corresponding probability measure on a measurable set $A \subseteq \Omega$ is $\phi(\chi_A)$.
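Both directions of this correspondence can be tried out in the finitely supported case. The Python sketch below assumes a finite space (so integration is a weighted sum). It builds the operator from a measure, checks the affine and weakly averaging conditions on sample inputs, and recovers the measure from characteristic functions. The names `integration_operator` and `indicator` and the sample data are hypothetical.

```python
from fractions import Fraction


def integration_operator(pi):
    """Send f to its integral against the finitely supported measure pi."""
    def phi(f):
        return sum((p * f(omega) for omega, p in pi.items()), Fraction(0))
    return phi


def indicator(A):
    """Characteristic function of the set A, with values in {0, 1}."""
    return lambda omega: Fraction(1) if omega in A else Fraction(0)


pi = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
phi = integration_operator(pi)

# Two sample functions into [0, 1]: an arbitrary one and a constant one.
f = lambda omega: {"a": Fraction(1), "b": Fraction(1, 2), "c": Fraction(0)}[omega]
g = lambda omega: Fraction(1, 4)
r = Fraction(1, 3)
mix = lambda omega: r * f(omega) + (1 - r) * g(omega)

affine_ok = phi(mix) == r * phi(f) + (1 - r) * phi(g)
averaging_ok = phi(g) == Fraction(1, 4)
recovered = phi(indicator({"a", "c"}))  # should be pi({a, c})
```

Checking finitely many inputs is of course not a proof; it only shows what the two conditions say in practice.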

These bijections are measurable (with respect to a natural $\sigma$-algebra on the set of finitely additive integration operators) and natural in $\Omega$, so they define isomorphisms of endofunctors of $\mathbf{Meas}$. Hence we can transfer the monad structures across the isomorphisms, and obtain descriptions of the Giry monads in terms of integration operators.

### The Giry monads via codensity monads

So far so good. But what does this have to do with codensity monads? First let’s recall the definition of a codensity monad. I won’t go into a great deal of detail; for more information see Tom’s first post on the topic.

Let $U \colon \mathbb{C} \to \mathcal{M}$ be a functor. The codensity monad of $U$ is the right Kan extension of $U$ along itself. This consists of a functor $T^U \colon \mathcal{M} \to \mathcal{M}$ satisfying a universal property, which equips $T^U$ with a canonical monad structure. The codensity monad doesn’t always exist, but it will whenever $\mathbb{C}$ is small and $\mathcal{M}$ is complete. You can think of $T^U$ as a generalisation of the monad induced by the adjunction between $U$ and its left adjoint that makes sense when the left adjoint doesn’t exist. In particular, when the left adjoint *does* exist, the two monads coincide.

The end formula for right Kan extensions gives

$$T^U m = \int_{c \in \mathbb{C}} [\mathcal{M}(m,U c),U c],$$

where $[\mathcal{M}(m,U c),U c]$ denotes the $\mathcal{M}(m,U c)$ power of $U c$ in $\mathcal{M}$, i.e. the product of $\mathcal{M}(m,U c)$ (a set) copies of $U c$ (an object of $\mathcal{M}$) in $\mathcal{M}$.

It doesn’t matter too much if you’re not familiar with ends because we can give an explicit description of $T^U m$ in the case that $\mathcal{M} = \mathbf{Meas}$: The elements of $T^U\Omega$ are families $\alpha$ of functions

$$\alpha_c \colon \mathbf{Meas}(\Omega, U c) \to U c$$

that are natural in $c \in \mathbb{C}$. For each $c \in \mathbb{C}$ and measurable $f \colon \Omega \to U c$ we have $\mathrm{ev}_f \colon T^U \Omega \to U c$ mapping $\alpha$ to $\alpha_c(f)$. The $\sigma$-algebra on $T^U \Omega$ is the smallest such that each of these maps is measurable.

All that’s left is to say what we should choose $\mathbb{C}$ and $U$ to be in order to get the Giry monads.

A subset $c$ of a real vector space $V$ is convex if for any $x,y \in c$ and $r \in I$ the convex combination $r x + (1-r)y$ is also in $c$, and a map $h \colon c \to c'$ between convex sets is called **affine** if it preserves convex combinations. So there’s a category of convex sets and affine maps between them. We will be interested in certain full subcategories of this.

Let $d_0$ be the (convex) set of sequences in $I$ that converge to $0$ (it is a subset of the vector space $c_0$ of all real sequences converging to $0$). Now we can define the categories of interest:

Let $\mathbb{C}$ be the category whose objects are all finite powers $I^n$ of $I$, with all affine maps between them.

Let $\mathbb{D}$ be the category whose objects are all finite powers of $I$, together with $d_0$, and all affine maps between them.

All the objects of $\mathbb{C}$ and $\mathbb{D}$ can be considered as measurable spaces (as subspaces of powers of $I$), and all the affine maps between them are then measurable, so we have (faithful but not full) inclusions $U \colon \mathbb{C} \to \mathbf{Meas}$ and $V \colon \mathbb{D} \to \mathbf{Meas}$.

**Theorem.** *The codensity monad of $U$ is the finitely additive Giry monad, and the codensity monad of $V$ is the Giry monad.*

Why should this be true? Let’s start with $U$. An element of $T^U \Omega$ is a family of functions

$$\alpha_{I^n} \colon \mathbf{Meas}(\Omega,I^n) \to I^n.$$

But a map into $I^n$ is determined by its composites with the projections to $I$, and these projections are affine. This means that $\alpha$ is completely determined by $\alpha_I$, and the other components are obtained by applying $\alpha_I$ separately in each coordinate. In other words, an element of $T^U \Omega$ is a special sort of function

$$\mathbf{Meas}(\Omega, I) \to I.$$

Look familiar? As you might guess, the functions with the above domain and codomain that define elements of $T^U \Omega$ are precisely the finitely additive integration operators.

The affine and weakly averaging properties of $\alpha_I$ are enforced by naturality with respect to certain affine maps. For example, the naturality square involving the affine map

$$r\pi_1 + (1-r)\pi_2 \colon I^2 \to I$$

(where $\pi_i$ are the projections) forces $\alpha_I$ to preserve convex combinations of the form $r f + (1-r)g$. The weakly averaging condition comes from naturality with respect to constant maps.
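The naturality argument just described can be spelled out as a short worked derivation. This is a sketch in the notation of the post; the name $h$ for the affine map and the pairing notation $\langle f,g\rangle$ are introduced here only for illustration. Write $h = r\pi_1 + (1-r)\pi_2$ and take $f,g \in \mathbf{Meas}(\Omega,I)$ with pairing $\langle f,g\rangle \colon \Omega \to I^2$. Naturality of $\alpha$ at $h$ says

$$\alpha_I(h \circ \langle f,g\rangle) = h\bigl(\alpha_{I^2}\langle f,g\rangle\bigr).$$

Since $h \circ \langle f,g\rangle = r f + (1-r)g$ and, by naturality with respect to the projections, $\alpha_{I^2}\langle f,g\rangle = (\alpha_I(f),\alpha_I(g))$, this reads

$$\alpha_I(r f + (1-r)g) = r\,\alpha_I(f) + (1-r)\,\alpha_I(g).$$

Similarly, assuming the one-point space $I^0$ is counted among the finite powers of $I$, the constant function $\bar{r}$ factors as $\Omega \to I^0 \to I$, where the second map picks out $r$. Naturality at that second map gives

$$\alpha_I(\bar{r}) = r,$$

because $\alpha_{I^0}$ can only take the single value in $I^0$.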

How is the situation different for $T^V$? As before $\alpha \in T^V \Omega$ is determined by $\alpha_I$, and $\alpha_{d_0}$ is obtained by applying $\alpha_I$ in each coordinate, thanks to naturality with respect to the projections. A measurable map $f \colon \Omega \to d_0$ is a sequence of maps $f_n \colon \Omega \to I$ converging pointwise to $0$, and

$$\alpha_{d_0}(f) = (\alpha_I(f_i))_{i=1}^{\infty}.$$

But $\alpha_{d_0}(f) \in d_0$, so $\alpha_I(f_i)$ must converge to $0$. So $\alpha_I$ is an integration operator!

The rest of the proof consists of checking that these assignments $\alpha \mapsto \alpha_I$ really do define isomorphisms of monads.

It’s natural to wonder how much you can alter the categories $\mathbb{C}$ and $\mathbb{D}$ without changing the codensity monads. Here’s a result to that effect:

**Proposition**. *The categories $\mathbb{C}$ and $\mathbb{D}$ can be replaced by the monoids of affine endomorphisms of $I^2$ and $d_0$ respectively (regarded as 1-object categories, with the evident functors to $\mathbf{Meas}$) without changing the codensity monads.*

This gives categories of convex sets that are minimal among those whose inclusions into $\mathbf{Meas}$ give rise to the Giry monads. Here I mean minimal in the sense that they contain the fewest objects (one each), with all affine maps between them. They are not uniquely minimal; there are other convex sets whose monoids of affine endomorphisms also give rise to the Giry monads.

This result gives yet another characterisation of (finitely and countably) additive probability measures: a probability measure on $\Omega$ is an $\mathrm{End}(d_0)$-set morphism

$$\mathbf{Meas}(\Omega, d_0) \to d_0,$$

where $\mathrm{End}(d_0)$ is the monoid of affine endomorphisms of $d_0$. Similarly for finitely additive probability measures, with $d_0$ replaced by $I^2$.
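
Unpacking this a little (a sketch in my own notation: $\varphi$ names the morphism, and I am taking $\mathrm{End}(d_0)$ to act on $\mathbf{Meas}(\Omega, d_0)$ by post-composition and on $d_0$ by evaluation), being an $\mathrm{End}(d_0)$-set morphism means that

$$\varphi(e \circ f) = e(\varphi(f)) \quad \text{for all } f \in \mathbf{Meas}(\Omega, d_0) \text{ and } e \in \mathrm{End}(d_0).$$

For instance, taking $e$ to be the shift $e(x_1, x_2, x_3, \ldots) = (x_2, x_3, x_4, \ldots)$, which is an affine map $d_0 \to d_0$, the condition says that applying $\varphi$ to the shifted sequence $(f_2, f_3, \ldots)$ gives the shift of $\varphi(f)$. This is just naturality of $\alpha$ written out for a category with a single object.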

What about *maximal* categories of convex sets giving rise to the Giry monads? I don’t have a definitive answer to this question, but you can at least throw in all bounded, convex subsets of Euclidean space:

**Proposition**. *Let $\mathbb{C}'$ be the category of all bounded, convex subsets of $\mathbb{R}^n$ (where $n$ varies) and affine maps. Let $\mathbb{D}'$ be $\mathbb{C}'$ but with $d_0$ adjoined. Then replacing $\mathbb{C}$ by $\mathbb{C}'$ and $\mathbb{D}$ by $\mathbb{D}'$ does not change the codensity monads.*

The definition of $\mathbb{D}'$ is a bit unsatisfying; $d_0$ feels (and literally is) tacked on. It would be nice to have a characterisation of *all* the subsets of $\mathbb{R}^{\mathbb{N}}$ (or indeed all the convex sets) that can be included in $\mathbb{D}'$. But so far I haven’t found one.