It’s an underappreciated fact that the interior of every simplex $\Delta^n$
is a real vector space in a natural way. For instance, here’s the
2-simplex with twelve of its 1-dimensional linear subspaces drawn in:

(That’s just a sketch. See below for an accurate diagram by Greg Egan.)

In this post, I’ll explain what this vector space structure is and why
everyone who’s ever taken a course on thermodynamics knows about it, at least partially, even
if they don’t know they do.

Let’s begin with the most ordinary vector space of all, $\mathbb{R}^n$.
(By “vector space” I’ll always mean vector space over $\mathbb{R}$.)
There’s a bijection

$$\mathbb{R} \leftrightarrow (0, \infty)$$

between the real line and the positive half-line, given by exponential in
one direction and log in the other. Doing this bijection in each
coordinate gives a bijection

$$\mathbb{R}^n \leftrightarrow (0, \infty)^n.$$

So, if we transport the vector space structure of $\mathbb{R}^n$ along
this bijection, we’ll produce a vector space structure on $(0, \infty)^n$. This
new vector space $(0, \infty)^n$ is isomorphic to $\mathbb{R}^n$, by definition.

Explicitly, the “addition” of the vector space $(0, \infty)^n$ is
coordinatewise multiplication, the “zero” vector is $(1, \ldots, 1)$, and
“subtraction” is coordinatewise division. The scalar “multiplication” is
given by powers: multiplying a vector $\mathbf{y} = (y_1, \ldots, y_n) \in (0, \infty)^n$ by a scalar $\lambda \in \mathbb{R}$ gives $(y_1^\lambda, \ldots, y_n^\lambda)$.
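As a quick sanity check, these transported operations really do behave like vector space operations. Here’s a minimal Python sketch (the function names are my own):

```python
# Transported vector space operations on (0, ∞)^n, obtained by
# carrying the usual operations of R^n through the exp/log bijection.

def add(y, z):
    """'Addition': coordinatewise multiplication, i.e. exp(log y + log z)."""
    return [a * b for a, b in zip(y, z)]

def scale(lam, y):
    """Scalar 'multiplication': coordinatewise powers, i.e. exp(lam * log y)."""
    return [a ** lam for a in y]

zero = [1.0, 1.0, 1.0]  # the 'zero' vector (1, ..., 1)

y = [2.0, 0.5, 3.0]
assert add(y, zero) == y          # y + 0 = y
assert scale(2, y) == add(y, y)   # 2y = y + y
assert scale(0, y) == zero        # 0 · y = 0
```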

Now, the ordinary vector space $\mathbb{R}^n$ has a linear subspace $U$
spanned by $(1, \ldots, 1)$. That is,

$$U = \{(\lambda, \ldots, \lambda) : \lambda \in \mathbb{R}\}.$$

Since the vector spaces $\mathbb{R}^n$ and $(0, \infty)^n$ are isomorphic,
there’s a corresponding subspace $W$ of $(0, \infty)^n$, and it’s given by

$$W = \{(e^\lambda, \ldots, e^\lambda) : \lambda \in \mathbb{R}\} = \{(\gamma, \ldots, \gamma) : \gamma \in (0, \infty)\}.$$

But whenever we have a linear subspace of a vector space, we can form the
quotient. Let’s do this with the subspace $W$ of $(0, \infty)^n$. What
does the quotient $(0, \infty)^n/W$ look like?

Well, two vectors $\mathbf{y}, \mathbf{z} \in (0, \infty)^n$
represent the same element of $(0, \infty)^n/W$ if and only if their
“difference” — in the vector space sense — belongs to $W$.
Since “difference” or “subtraction” in the vector space $(0, \infty)^n$ is
coordinatewise division, this just means that

$$\frac{y_1}{z_1} = \frac{y_2}{z_2} = \cdots = \frac{y_n}{z_n}.$$

So, the elements of $(0, \infty)^n/W$ are the equivalence classes of
$n$-tuples of positive reals, with two tuples
considered equivalent if they’re the same up to rescaling.

Now here’s the crucial part: it’s natural to normalize
everything to sum to $1$. In other words, in each equivalence
class, we single out the unique tuple $(y_1, \ldots, y_n)$ such that $y_1 + \cdots + y_n = 1$.
This gives a bijection

$$(0, \infty)^n/W \leftrightarrow \Delta_n^\circ$$

where $\Delta_n^\circ$ is the interior of the $(n - 1)$-simplex:

$$\Delta_n^\circ = \{(p_1, \ldots, p_n) : p_i > 0, \textstyle\sum p_i = 1\}.$$

You can think of $\Delta_n^\circ$ as the set of probability distributions
on an $n$-element set that satisfy Cromwell’s
rule: zero probabilities
are forbidden. (Or as Cromwell put it, “I beseech you, in the bowels of Christ, think it possible that you may be mistaken.”)

Transporting the vector space structure of $(0, \infty)^n/W$ along this
bijection gives a vector space structure to $\Delta_n^\circ$. And that’s
the vector space structure on the simplex.

So what are these vector space operations on the simplex, in concrete terms?
They’re given by the same operations in $(0, \infty)^n$, followed by
normalization. So, the “sum” of two probability distributions $\mathbf{p}$
and $\mathbf{q}$ is

$$\frac{(p_1 q_1, p_2 q_2, \ldots, p_n q_n)}{p_1 q_1 + p_2 q_2 + \cdots + p_n q_n},$$

the “zero” vector is the uniform distribution

$$\frac{(1, 1, \ldots, 1)}{1 + 1 + \cdots + 1} = (1/n, 1/n, \ldots, 1/n),$$

and “multiplying” a probability distribution $\mathbf{p}$ by a scalar
$\lambda \in \mathbb{R}$ gives

$$\frac{(p_1^\lambda, p_2^\lambda, \ldots, p_n^\lambda)}{p_1^\lambda + p_2^\lambda + \cdots + p_n^\lambda}.$$
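In code, each operation is “do it in $(0, \infty)^n$, then normalize”. A minimal Python sketch (the function names are my own):

```python
def normalize(v):
    """Rescale a tuple of positive reals so its entries sum to 1."""
    s = sum(v)
    return [x / s for x in v]

def simplex_add(p, q):
    """'Sum' of two distributions: coordinatewise product, normalized."""
    return normalize([a * b for a, b in zip(p, q)])

def simplex_scale(lam, p):
    """Scalar 'multiple': coordinatewise powers, normalized."""
    return normalize([a ** lam for a in p])

p = [0.2, 0.3, 0.5]
uniform = [1/3, 1/3, 1/3]  # the 'zero' vector of the simplex

assert simplex_scale(0, p) == uniform   # 0 · p = zero
# Adding the zero vector returns p (up to floating-point noise):
assert all(abs(a - b) < 1e-12 for a, b in zip(simplex_add(p, uniform), p))
```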

For instance, let’s think about the scalar “multiples” of

$$\mathbf{p} = (0.2, 0.3, 0.5) \in \Delta_3.$$

“Multiplying” $\mathbf{p}$ by $\lambda \in \mathbb{R}$ gives

$$\frac{(0.2^\lambda, 0.3^\lambda, 0.5^\lambda)}{0.2^\lambda + 0.3^\lambda + 0.5^\lambda}$$

which I’ll call $\mathbf{p}^{(\lambda)}$, to avoid the confusion that would be created by calling
it $\lambda \mathbf{p}$.

When $\lambda = 0$, $\mathbf{p}^{(\lambda)}$ is just the uniform
distribution $(1/3, 1/3, 1/3)$ — which of course it has to be, since
multiplying any vector by the scalar $0$ has to give the zero vector.

For equally obvious reasons, $\mathbf{p}^{(1)}$ has to be just $\mathbf{p}$.

When $\lambda$ is large and positive, the powers of $0.5$ dominate over the
powers of the smaller numbers $0.2$ and $0.3$, so $\mathbf{p}^{(\lambda)} \to (0, 0, 1)$ as $\lambda \to \infty$.

For similar reasons, $\mathbf{p}^{(\lambda)} \to (1, 0, 0)$ as $\lambda \to -\infty$.
This behaviour as $\lambda \to \pm\infty$ is the reason why, in the picture above, you see the curves curling in at the ends
towards the triangle’s corners.
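This limiting behaviour is easy to see numerically. A short Python check (a sketch of mine; the name `escort` anticipates the terminology used later in the post):

```python
def escort(p, lam):
    """p^(lam): coordinatewise lam-th powers, normalized to sum to 1."""
    w = [x ** lam for x in p]
    s = sum(w)
    return [x / s for x in w]

p = [0.2, 0.3, 0.5]
assert escort(p, 10)[2] > 0.99    # mass piles onto the largest entry, 0.5
assert escort(p, -10)[0] > 0.98   # ...and onto the smallest entry, 0.2
```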

Some physicists refer to the distributions $\mathbf{p}^{(\lambda)}$ as the “escort
distributions” of $\mathbf{p}$. And in fact, the scalar multiplication of
the vector space structure on the simplex is a key part of the solution of
a very basic problem in thermodynamics — so basic that even *I* know
it.

The problem goes like this. First I’ll state it using the notation
above, then afterwards I’ll translate it back into terms that
physicists usually use.

Fix $\xi_1, \ldots, \xi_n, \xi > 0$. Among all probability distributions
$(p_1, \ldots, p_n)$ satisfying the constraint

$$\xi_1^{p_1} \xi_2^{p_2} \cdots \xi_n^{p_n} = \xi,$$

which one minimizes the quantity

$$p_1^{p_1} p_2^{p_2} \cdots p_n^{p_n}?$$

It makes no difference to this question if $\xi_1, \ldots, \xi_n, \xi$ are normalized so that $\xi_1 + \cdots + \xi_n = 1$ (since
multiplying each of $\xi_1, \ldots, \xi_n, \xi$ by a constant doesn’t
change the constraint). So, let’s assume this has been done.

Then the answer to the question turns out to be: the minimizing distribution $\mathbf{p}$ is a scalar multiple of $(\xi_1, \ldots, \xi_n)$ in the vector space structure on the simplex. In other
words, it’s an escort distribution of $(\xi_1, \ldots, \xi_n)$. Or in
other words still, it’s an element of the linear subspace of
$\Delta_n^\circ$ spanned by $(\xi_1, \ldots, \xi_n)$. Which one? The
unique one such that the constraint is satisfied.

Proving that this is the answer is a simple exercise in calculus, e.g.
using Lagrange multipliers.

For instance, take $(\xi_1, \xi_2, \xi_3) = (0.2, 0.3, 0.5)$ and $\xi = 0.4$. Among all distributions $(p_1, p_2, p_3)$ that satisfy the constraint

$$0.2^{p_1} \times 0.3^{p_2} \times 0.5^{p_3} = 0.4,$$

the one that minimizes $p_1^{p_1} p_2^{p_2} p_3^{p_3}$
is some escort distribution of $(0.2, 0.3, 0.5)$. Maybe one of the curves shown in the picture above is the 1-dimensional subspace spanned by
$(0.2, 0.3, 0.5)$; in that case, the minimizing $\mathbf{p}$ lies somewhere
on that curve.

The location of $\mathbf{p}$ on that curve depends on the value of $\xi$, which
here I chose to be $0.4$. If I changed it to $0.20001$ or $0.49999$ then
$\mathbf{p}$ would be nearly at one end or the other of the curve, since
the constraint value $0.2^{p_1} 0.3^{p_2} 0.5^{p_3}$ at $\mathbf{p} = (0.2, 0.3, 0.5)^{(\lambda)}$ converges to $0.2$ as $\lambda \to -\infty$
and to $0.5$ as $\lambda \to \infty$.
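We can locate $\mathbf{p}$ on the curve concretely: the constraint value increases with $\lambda$, so a simple bisection on $\lambda$ works. A sketch in Python (the search bounds and helper names are my own choices):

```python
import math

xi = [0.2, 0.3, 0.5]
target = 0.4

def escort(p, lam):
    """p^(lam): coordinatewise lam-th powers, normalized."""
    w = [x ** lam for x in p]
    s = sum(w)
    return [x / s for x in w]

def geo_mean(values, weights):
    """Weighted geometric mean: values[0]^weights[0] * values[1]^weights[1] * ..."""
    return math.exp(sum(w * math.log(v) for w, v in zip(weights, values)))

# geo_mean(xi, escort(xi, lam)) increases from min(xi) towards max(xi)
# as lam runs over the reals, so bisection finds the lam that
# meets the constraint.
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = (lo + hi) / 2
    if geo_mean(xi, escort(xi, mid)) < target:
        lo = mid
    else:
        hi = mid

p = escort(xi, (lo + hi) / 2)
```

The resulting `p` lies on the escort curve of $(0.2, 0.3, 0.5)$ and satisfies the constraint to machine precision.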

**Aside** I’m glossing over the question of existence and uniqueness of
solutions to
the optimization question. Since $\xi_1^{p_1} \xi_2^{p_2} \cdots \xi_n^{p_n}$ is a kind of average of $\xi_1, \xi_2, \ldots, \xi_n$ —
a weighted geometric mean — there’s no solution at all unless
$\min_i \xi_i \leq \xi \leq \max_i \xi_i$. As long as that inequality is
satisfied, there’s a minimizing $\mathbf{p}$, although it’s not always
unique: e.g. consider what happens when all the $\xi_i$s are equal.

Physicists prefer to do all this in logarithmic form. So, rather than
start with $\xi_1, \ldots, \xi_n, \xi > 0$, they start with $x_1, \ldots, x_n, x \in \mathbb{R}$; think of this as substituting $x_i = -\log \xi_i$ and $x = -\log \xi$. So, the constraint

$$\xi_1^{p_1} \xi_2^{p_2} \cdots \xi_n^{p_n} = \xi$$

becomes

$$e^{-p_1 x_1} e^{-p_2 x_2} \cdots e^{-p_n x_n} = e^{-x}$$

or equivalently

$$p_1 x_1 + p_2 x_2 + \cdots + p_n x_n = x.$$

We’re trying to minimize $p_1^{p_1} p_2^{p_2} \cdots p_n^{p_n}$ subject to
that constraint, and again the physicists prefer the logarithmic form (with
a change of sign): maximize

$$-(p_1 \log p_1 + p_2 \log p_2 + \cdots + p_n \log p_n).$$

That quantity is the **Shannon entropy** of the distribution $(p_1, \ldots, p_n)$: so we’re looking for the maximum entropy solution to the constraint.
This is called the **Gibbs state**, and as we saw, it’s a
scalar multiple of $(\xi_1, \ldots, \xi_n)$ in the vector space structure
on the simplex. Equivalently, it’s

$$\frac{(e^{-\lambda x_1}, e^{-\lambda x_2}, \ldots, e^{-\lambda x_n})}{e^{-\lambda x_1} + e^{-\lambda x_2} + \cdots + e^{-\lambda x_n}}$$

for whichever value of $\lambda$ satisfies the constraint. The denominator
here is the famous **partition function**.
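In code, the Gibbs state is exactly this normalized vector of Boltzmann weights, with the partition function as the denominator. A sketch (Python; the names are mine):

```python
import math

def gibbs(x, lam):
    """Gibbs state for 'energies' x at parameter lam."""
    weights = [math.exp(-lam * xi) for xi in x]
    Z = sum(weights)              # the partition function
    return [w / Z for w in weights]

x = [1.0, 2.0, 4.0]
assert gibbs(x, 0.0) == [1/3, 1/3, 1/3]   # lam = 0: the uniform distribution

# The mean Σ p_i x_i decreases as lam grows, so varying lam sweeps out
# the attainable constraint values x.
mean = lambda lam: sum(p * xi for p, xi in zip(gibbs(x, lam), x))
assert mean(1.0) < mean(0.0)
```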

So, that basic thermodynamic problem is (implicitly) solved by scalar
multiplication in the vector space structure on the simplex. A question:
does addition in the vector space structure on the simplex also have a role to play in physics?