Chapter 11: Lie Groups as Manifolds: The Conventional Lie Group Definition 2

Having proven that the concept of a connected Lie group as defined by axioms 1 through 5 leads to the modern concept of a Lie group as an analytic manifold wherein the group operations are analytic and thus that the two conceptions of the Lie group notion are the same, we shall now tell the story of the analytic manifold and the Lie group in the language of elementary differential geometry, in readiness for a presentation of the application of Lie theory to systems and control theory later on.

Vector Fields and One Forms

So we have now shown that a Lie group is an analytic manifold. I shall now talk about some of the basic differential geometric concepts of a general manifold, as opposed to a Lie group manifold, whose fundamental group is Abelian as we shall see in the next chapter. So we have an analytic manifold $\mathcal{M}$ as defined by Definition 10.1 and suppose too that we have a $C^\omega$ path $\sigma:\R\to\mathcal{M}$ through the manifold; the co-ordinates $(\lambda_j(\tau))_{j=1}^N$ of $\sigma(\tau)$ in any patch it passes through are analytic functions of the path parameter $\tau$. Our third ingredient for this discussion is an analytic scalar field $\Phi:\mathcal{M}\to\R$. When written as a function of the co-ordinates $\lambda(p)=(\lambda_j(p))_{j=1}^N$ for any chart, $\Phi$ is an analytic function of all the $\lambda_j$. Therefore, on the path $\sigma$, we define the function $\Phi_\sigma(\tau)=\Phi(\sigma(\tau))$, an analytic function of the path parameter $\tau$.

What now is the rate of change of $\Phi_\sigma$ at any “time” $\tau$? (I shall henceforth call the path parameter “time”). It is of course given by the chain rule:

\begin{equation}\label{ChainRule}\frac{\d\,\Phi_\sigma}{\d\,\tau} = \sum_{k=1}^N \frac{\partial\,\Phi}{\partial \lambda_k}\,\frac{\d\,\lambda_k(\tau)}{\d\,\tau}\end{equation}

which can be thought of as an inner product $\left<Y,\,X\right>$ between two entities $X$ and $Y$ with components:

\begin{equation}\label{InnerProductChainRule}X=\left(\begin{array}{c}\d_\tau \lambda_1(\sigma(\tau))\\\d_\tau \lambda_2(\sigma(\tau))\\\vdots\\\d_\tau \lambda_N(\sigma(\tau))\end{array}\right);\quad Y=\left(\begin{array}{c}\partial_{ \lambda_1}\,\Phi\\\partial_ {\lambda_2}\,\Phi\\\vdots\\\partial_ {\lambda_N}\,\Phi\end{array}\right)\end{equation}

The former, $X\in T_p(\mathcal{M})$, is very much wonted to us. It is a tangent vector, the tangent to the path $\sigma$ at the point $p=\sigma(\tau)$, and belongs to the tangent space $T_p(\mathcal{M})$, or linear space of all tangent vectors to the manifold $\mathcal{M}$ at the point $p$. The Lie algebra $\g$ of a Lie group $\G$ as we have defined it is nothing more than $\g = T_\id(\G)$. The latter entity, $Y$, is of course the gradient $\nabla\,\Phi$. Henceforth I shall use the notation $\d\,\Phi$ for the gradient $\nabla\Phi$ for reasons that should become clear. $\d\,\Phi$ is a dual vector or covector to the tangent $X$. We recall here the Riesz Representation Theorem ([Riesz], [Royden]; also see the proof at [PlanetMath.org]) asserting that in a complete inner product space $\mathbb{X}$, i.e. a Hilbert space, such as $\R^N$ or $\mathbb{C}^N$, every continuous linear functional $\mathscr{L}:\mathbb{X}\to\R$ (or $\mathscr{L}:\mathbb{X}\to\mathbb{C}$ in the case of $\mathbb{C}^N$) can be written as an inner product with a unique constant member $x_\mathscr{L}\in\mathbb{X}$, that is, $\mathscr{L}(x)=\left<x_\mathscr{L},\,x\right>$. Conversely, an inner product space for which all continuous linear functionals can be represented as inner products is complete, and since we can define a linear functional $y\mapsto \left<x,\,y\right>$ for each member $x\in\mathbb{X}$ of $\mathbb{X}$, the Riesz theorem leads to the alternative and logically equivalent definition of a Hilbert space as one isomorphic to its dual space $\mathbb{X}^*$ of continuous linear functionals. For, given the Riesz representation of a continuous linear functional, the continuous linear functionals on $\mathbb{X}$ themselves form an inner product space isomorphic to the original $\mathbb{X}$ with, naturally, the isomorphism (conjugate-linear in the complex case) being defined by the invertible map $x\in\mathbb{X} \mapsto \left<x,\,\_\right>\in\mathbb{X}^*$.
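To make the pairing $\left<\d\,\Phi,\,X\right>$ of $\eqref{ChainRule}$ concrete, here is a minimal numerical sketch. The path, the scalar field and all the function names are hypothetical choices of mine for illustration, not anything fixed by the text; the chart is simply taken to be $\R^2$.

```python
import math

# Hypothetical example data (not from the text): a path and a scalar field
# written in the co-ordinates of a single chart on R^2.
def sigma(tau):
    """Path in co-ordinates: (lambda_1(tau), lambda_2(tau))."""
    return (math.cos(tau), math.sin(2.0 * tau))

def sigma_dot(tau):
    """Tangent components X_k = d lambda_k / d tau."""
    return (-math.sin(tau), 2.0 * math.cos(2.0 * tau))

def phi(l1, l2):
    """Scalar field Phi as a function of the chart co-ordinates."""
    return l1 * l1 * l2 + math.exp(l2)

def grad_phi(l1, l2):
    """Covector components Y_k = d Phi / d lambda_k."""
    return (2.0 * l1 * l2, l1 * l1 + math.exp(l2))

def pairing(tau):
    """<Y, X>: the right-hand side of the chain rule."""
    x = sigma_dot(tau)
    y = grad_phi(*sigma(tau))
    return sum(yk * xk for yk, xk in zip(y, x))

def numeric_rate(tau, h=1e-6):
    """Central-difference estimate of d Phi_sigma / d tau."""
    return (phi(*sigma(tau + h)) - phi(*sigma(tau - h))) / (2.0 * h)

tau0 = 0.7
chain_rule_gap = abs(pairing(tau0) - numeric_rate(tau0))
print(chain_rule_gap)  # small: the two ways of computing the rate agree
```

The covector components never see the path and the tangent components never see the field; only the pairing of the two gives the rate of change.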

SkullWarning: A word of warning about the Riesz theorem, although this is an aside for our Lie theoretic needs. In finite dimensional inner product spaces such as we are interested in for the theory of Lie groups, the concepts of linear functional and continuous linear functional are precisely the same. That is, every linear functional is continuous for a finite dimensional Hilbert space. This statement does not hold for infinite dimensional Hilbert spaces such as the space $\mathbf{L}^2(\R)$ of functions $f:\R\to\mathbb{C}$ that are square Lebesgue integrable over the reals, i.e. $\int_\R\,|f(u)|^2\,\d u<\infty$. This is because the proof of the general version of the theorem deals with bounded linear functionals. In finite dimensional spaces, all linear operators are bounded. But in infinite dimensional spaces, this is not so and therefore not every linear functional is also continuous; take, for example, the obviously linear Dirac delta $\delta:\mathbf{L}^2(\R)\to\R;\;\delta(f) = f(0)$. There is no member $g\in\mathbf{L}^2(\R)\,\ni\,\delta(f) = \int_\R\,g^*(u)\,f(u)\,\d u$ for all $f$. This does not gainsay the assertion that $\mathbf{L}^2(\R)$ is a Hilbert space, for there are pairs of functions $f_1,\,f_2\in\mathbf{L}^2(\R)$ whose $\mathbf{L}^2$ difference $\int_\R\,|f_1(u)-f_2(u)|^2\,\d u$ can be arbitrarily small yet with $\delta(f_1) - \delta(f_2) = 1$, so $\delta$ is not continuous. This leads to the concept of a Rigged Hilbert Space as I talk about in this post here. In infinite dimensional Hilbert spaces one tells the difference between the space’s algebraic dual (i.e. the linear space of its linear functionals), which is not isomorphic to the Hilbert space, and its smaller, isomorphic (by definition of Hilbert space) topological dual (i.e. the linear space of its continuous linear functionals).
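The discontinuity of $\delta$ can be seen with a concrete family of functions. The sketch below uses narrowing Gaussians $f_n(u)=e^{-(n\,u)^2}$, which is my own hypothetical choice of test family; the $\mathbf{L}^2$ norm is taken from the closed form $\int_\R e^{-2 n^2 u^2}\,\d u = \sqrt{\pi/2}/n$ rather than computed numerically.

```python
import math

def f(n, u):
    """Hypothetical test family: Gaussian bumps that narrow as n grows."""
    return math.exp(-(n * u) ** 2)

def l2_norm(n):
    """Closed form of sqrt(int |f_n|^2 du) = (pi/2)^(1/4) / sqrt(n)."""
    return (math.pi / 2.0) ** 0.25 / math.sqrt(n)

def delta(g):
    """The evaluation functional delta(g) = g(0)."""
    return g(0.0)

rows = []
for n in (1, 10, 100, 1000):
    rows.append((n, l2_norm(n), delta(lambda u, n=n: f(n, u))))

for n, norm, value in rows:
    print(n, norm, value)  # the norm shrinks with n, the value at 0 does not
```

Taking $f_1 = f_n$ and $f_2 = 0$ gives a pair whose $\mathbf{L}^2$ distance is as small as we please while $\delta(f_1)-\delta(f_2)$ stays at one.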

Let’s come back to our finite dimensional case, as this is the one relevant to our Lie theory. We see that for every tangent space $T_p(\mathcal{M})$, there is an isomorphic co-tangent space $T_p^*(\mathcal{M})$ of linear functionals $T_p(\mathcal{M})\to\R$. We can also think of the tangent bundle $T(\mathcal{M}) = \bigsqcup\limits_{p\in\mathcal{M}}\,T_p(\mathcal{M}) = \bigcup\limits_{p\in\mathcal{M}}\{(p,\,X)|\;X\in T_p(\mathcal{M})\}$ and cotangent bundle $T^*(\mathcal{M})=\bigsqcup\limits_{p\in\mathcal{M}}\,T_p^*(\mathcal{M}) = \bigcup\limits_{p\in\mathcal{M}}\{(p,\,Y)|\;Y\in T_p^*(\mathcal{M})\}$ of the manifold, which are the disjoint unions of all the tangent spaces and of all the co-tangent spaces, respectively, of all the points of $\mathcal{M}$. Now we are ready to define:

Definition 11.1: (Vector Field and One Form)

Given an analytic manifold $\mathcal{M}$ with tangent and co-tangent bundles $T(\mathcal{M})$ and $T^*(\mathcal{M})$, respectively, a vector field $X:\mathcal{M}\to T(\mathcal{M})\,\ni X(p) \in T_p(\mathcal{M})$ is a section of $T(\mathcal{M})$, that is, a function on $\mathcal{M}$ that assigns to each member $p\in\mathcal{M}$ an element $X(p)$ of the tangent space at $p$. A Co-tangent Field or One Form is a section of the co-tangent bundle, i.e. $Y:\mathcal{M}\to T^*(\mathcal{M})\,\ni Y(p) \in T_p^*(\mathcal{M})$.

We make sense of the notion of $C^0,\,C^1,\,\cdots\,C^\infty, \,C^\omega$ vector fields and one forms in an analytic manifold by looking at their components’ behaviour in the small: at a point $p\in\mathcal{M}$, we consider a chart around $p$, i.e. a patch $\U_p\subset\mathcal{M}$, which is an open neighbourhood of $p$, together with a local labeller or co-ordinate function $\lambda:\U_p\to\R^N$. For a vector field, we kit the tangent spaces at any $x\in\U_p$ with their standard basis $\{\hat{X}_j\}_{j=1}^N$ where $\hat{X}_j=\left.\d_\tau\,\lambda(\sigma_j(\tau))\right|_{\tau=0}$ is the tangent vector to a $C^1$ path $\sigma_j:\R\to\U_p\,\ni\,\sigma_j(0) = x$ through $x$ at $x$ which is aligned with the $j^{th}$ co-ordinate axis in the local chart image $\R^N$ and has unit magnitude. $\hat{X}_j$ is a unit vector in the direction of increasing $\lambda_j$. All co-ordinates $\lambda_k(\sigma_j(\tau))$ have stationary points, i.e. $\left.\d_\tau \lambda_k(\sigma_j(\tau))\right|_{\tau=0}=0$, at $x$ aside from $\lambda_j$ which has $\left.\d_\tau \lambda_j(\sigma_j(\tau))\right|_{\tau=0}=1$.

Definition 11.2: (Continuous, Differentiable, Analytic Vector Fields and One Forms)

Given a chart, i.e. a patch $\U_p\subset\mathcal{M}$ together with co-ordinate function $\lambda:\U_p\to\R^N$, we define the $j^{th}$ basis vector $\hat{X}_j$ of the standard basis for the tangent space $T_y(\mathcal{M})$ at any $y\in\U_p$ as the tangent to any $C^1$ path $\sigma_j:\R\to\U_p\,\ni\,\sigma_j(0) = y$ with $\left.\d_\tau \lambda_k(\sigma_j(\tau))\right|_{\tau=0}=\delta_{j\,k}$ (naturally $\delta$ is the Kronecker delta symbol), where $\lambda_k:\U_p\to\R$ is the $k^{th}$ component of the co-ordinate function $\lambda$. Then a locally $C^0,\,C^1,\,\cdots\,C^\infty, \,C^\omega$ vector field $X:\U_p\to T(\mathcal{M})\,\ni\,X(x)\in T_x(\mathcal{M})$ is one of the form $X(p) = \sum\limits_{k=1}^N\,x_k(\lambda(p))\,\hat{X}_k$, where the $x_k$ are, respectively, $C^0,\,C^1,\,\cdots\,C^\infty, \,C^\omega$ functions of the local co-ordinates $\lambda(p)$ of $p$. A globally $C^0,\,C^1,\,\cdots\,C^\infty, \,C^\omega$ vector field $X$ is one for which this statement holds for all charts in the atlas defining $\mathcal{M}$. Analogous statements apply to co-vector fields (one forms): we define a $C^0,\,C^1,\,\cdots\,C^\infty, \,C^\omega$ one form as one which can be Riesz represented as an inner product with a $C^0,\,C^1,\,\cdots\,C^\infty, \,C^\omega$ vector field.

A slightly different way to think about the tangent space, and one that is important for the standard manifold description of Lie theory, is to begin with the class of all $C^k$, smooth ($C^\infty$) or analytic ($C^\omega$) functions $\Phi:\mathcal{M}\to\R$ defined on the manifold. We must have at least $k=2$ for this alternative description. Then the tangent bundle $T(\mathcal{M})$ is the class of all $C^k,\,k\geq2,\,C^\infty,\,C^\omega$ (as appropriate) differential operators: if $X(p) = \sum\limits_{k=1}^N\,x_k(p) \,\hat{X}_k$ is a $C^k;\,k=2,\,3,\,\cdots,\,\infty,\,\omega$ vector field, then in place of $\hat{X}_j$, we simply write $\partial_j \stackrel{def}{=} \partial/\partial \lambda_j$ where of course $\lambda_j$ is the $j^{th}$ component of the co-ordinate function. So we formally write:

\begin{equation}\label{DifferentialOperator}X(p) = \sum\limits_{k=1}^N\,x_k(p) \,\frac{\partial}{\partial \lambda_k}\end{equation}

Clearly we can indeed literally define a differential operator such as the one in $\eqref{DifferentialOperator}$ to be applied to any member of the class of $C^k$ functions $\Phi:\mathcal{M}\to\R$, but hold off on this literal interpretation for the moment. We simply think for the time being of $\partial_j$ as a different symbol for the standard unit vector $\hat{X}_j$. We also write a general one-form as

\begin{equation}\label{OneForm}\d Y(p) = \sum\limits_{k=1}^N\,y_k(p) \,\d\lambda_k\end{equation}

By definition of the dual space, we have $\left<X(p),\,\d Y(p)\right> = \sum\limits_{k=1}^N\,x_k(p)\, y_k(p)$. What is the entity $\d\lambda_j$? It is by definition the linear functional which picks out the superposition weight $x_j(p)$ from the vector field in $\eqref{DifferentialOperator}$. Hearkening back to $\eqref{ChainRule}$, a functional that does just this is the gradient $\nabla \lambda_j$, where $\lambda_j$ is the $j^{th}$ co-ordinate component function. So, if as suggested there, we write $\d\,\Phi$ instead of $\nabla\,\Phi$, then this gives the right meaning to the $\d\lambda_k$ in $\eqref{OneForm}$.

We note in passing that an alternative, but not mainstream, approach to one forms and other differential forms is to think of the $\d\lambda_k$, $\d\Phi$ as infinitesimals exactly in the spirit of Leibniz: one can indeed do this rigorously through the machinery of synthetic differential geometry ([nLab Article on Differentiation], [nLab Article on Synthetic Diff. Geom.]) or of nonstandard analysis in the hyperreal number system [Keisler]. However, brilliant insights though these are, one needs to learn quite a bit of machinery to appreciate them (particularly synthetic differential geometry, which I’m still grappling with and making very heavy going of understanding; the Keisler reference is an excellent, well readable text within fairly easy reach of a wide audience with time to read).

Hilbert spaces are isomorphic to their own duals. So in a finite dimensional Hilbert space we can think of the space of linear functionals defined on the co-tangent space as being the tangent space. With this picture in mind, what is the meaning of the unit tangent vector $\hat{X}_j = \partial_j$? In this double-dual picture where the tangent space is the space of linear functionals of the co-tangent space, $\hat{X}_j = \partial_j$ is precisely the linear functional on the co-tangent space that picks out the superposition weight $y_j(p)$ of $\d\lambda_j$ in $\eqref{OneForm}$. From $\eqref{InnerProductChainRule}$, we know that we can make a one-form $\d\,\Phi$ with any $C^k$ function $\Phi:\mathcal{M}\to\R$. If our one-form is defined in this way, i.e. as the gradient of a $C^k$ function, then the basis vector $\hat{X}_j = \partial_j$ is precisely the linear functional that picks the component $\partial_j\,\Phi = \partial\,\Phi/\partial \lambda_j$. Thus we see the sense wherein we can interpret $\hat{X}_j = \partial_j$ as the differential operator $\partial/\partial \lambda_j$. Another way to look at this is to think of a tangent vector as a directional derivative. We compute the gradient along or in the direction of a path with a tangent $X$ as the directional derivative formed by weighting the differential operators $\partial_{\lambda_k}$, i.e. the directional derivatives in the co-ordinate directions, with the components of $X$ to get the directional derivative $\sum\limits_{k=1}^N\,X_k\,\partial_{\lambda_k}$, i.e. the same operator as in $\eqref{DifferentialOperator}$.

The standpoint where a vector field is a differential operator as in $\eqref{DifferentialOperator}$ with $C^k$ superposition weights $x_k(p)$ allows us to define the composition of vector fields as differential operators $X_1(p)\, X_2(p)\stackrel{def}{=}X_1(p) \circ X_2(p)$ as well as the Lie bracket between vector fields. Of course, the differential operator $X_1(p)\, X_2(p)$ is not a vector field: if we compose entities defined as in $\eqref{DifferentialOperator}$, we get a second order differential operator, a $C^k$ sum of double partial derivative operators. However, the Lie bracket of two vector fields is itself a new vector field.

Definition 11.3: (Lie Bracket of Vector Fields)

Given an analytic manifold $\mathcal{M}$ whereon are defined $C^k;\,k= 2,\,3,\,\cdots,\,\infty,\,\omega$ vector fields $X_1(p), \,X_2(p)$, then the Lie Bracket $\left[X_1,\,X_2\right]$ of the two fields is defined as the differential operator:

\begin{equation}\label{VectorFieldLieBracketDefintion_1}\left[X_1,\,X_2\right] = X_1\,\circ\,X_2 - X_2\,\circ\,X_1 = X_1\,X_2 - X_2\,X_1\end{equation}

For $C^k$ vector fields with $k \geq 2$, the Lie bracket defines a vector field.

For a given $C^k;\,k= 2,\,3,\,\cdots,\,\infty,\,\omega$ vector field $X:\mathcal{M}\to T(\mathcal{M})\,\ni X(p) \in T_p(\mathcal{M})$ on a manifold $\mathcal{M}$, we define the Lie Derivative of a $C^k;\,k= 2,\,3,\,\cdots,\,\infty,\,\omega$ vector field $Y:\mathcal{M}\to T(\mathcal{M})\,\ni Y(p) \in T_p(\mathcal{M})$ by the linear operation $\mathscr{L}_X\,Y = \left[X,\,Y\right]$. For a given $C^1$ scalar field $\phi:\mathcal{M}\to\R$, the Lie derivative of $\phi$ is defined to be $\mathscr{L}_X\,\phi = X\,\phi$.

That the Lie bracket is again a vector field follows from the commutativity of the standard basis vectors, i.e. $\frac{\partial^2 \Phi}{\partial \lambda_j\,\partial \lambda_k} = \frac{\partial^2 \Phi}{\partial \lambda_k\,\partial \lambda_j}$ for any $C^k$ function $\Phi$ where $k\geq 2$. So the higher order derivatives cancel out in the Lie bracket, leaving us with only first order differential operators. If we write $\eqref{VectorFieldLieBracketDefintion_1}$ out in full for vector fields $X(p) = \sum\limits_{j=1}^N\,x_j(p)\,\partial_j,\,Y(p)=\sum\limits_{j=1}^N\,y_j(p)\,\partial_j$ with components $(x_j)_{j=1}^N,\,(y_j)_{j=1}^N$, i.e. when we calculate $Z=\left[X,\,Y\right]$ and cancel out the second mixed derivatives, we get for the components $(z_j)_{j=1}^N$ of $Z$:

\begin{equation}\label{VectorFieldLieBracketFull}z_k = \sum\limits_{j=1}^N\,\left(x_j\,\frac{\partial\,y_k}{\partial\,\lambda_j}-y_j\,\frac{\partial\,x_k}{\partial\,\lambda_j}\right)\end{equation}
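The component formula $\eqref{VectorFieldLieBracketFull}$ is easy to try out numerically. What follows is only a sketch: the helper names, the two example fields on $\R^2$ and the use of central differences in place of exact partial derivatives are all my own assumptions for illustration.

```python
def partial(f, point, j, h=1e-5):
    """Central-difference estimate of d f / d lambda_j at `point`."""
    up = list(point)
    up[j] += h
    dn = list(point)
    dn[j] -= h
    return (f(up) - f(dn)) / (2.0 * h)

def lie_bracket(X, Y, point):
    """Components z_k = sum_j (x_j d_j y_k - y_j d_j x_k) at `point`.

    X and Y are lists of component functions, each taking a co-ordinate list.
    """
    n = len(point)
    x = [xc(point) for xc in X]
    y = [yc(point) for yc in Y]
    return [
        sum(x[j] * partial(Y[k], point, j) - y[j] * partial(X[k], point, j)
            for j in range(n))
        for k in range(n)
    ]

# Hypothetical example fields on R^2 (not from the text):
# X = -lambda_2 d_1 + lambda_1 d_2 (rotation generator), Y = d_1 (translation).
X = [lambda p: -p[1], lambda p: p[0]]
Y = [lambda p: 1.0, lambda p: 0.0]

z = lie_bracket(X, Y, [0.3, -1.2])
print(z)  # approximately [0, -1], i.e. [X, Y] = -d_2
```

Working the same example by hand with the operators applied to a test function $\Phi$ gives $X\,Y\,\Phi - Y\,X\,\Phi = -\partial_2\Phi$, the second derivatives cancelling as described above. Swapping the arguments flips the sign, as skew-symmetry demands.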

The Lie bracket of vector fields is also readily shown to be linear, skew-symmetric and to fulfill the Jacobi identity, thus the space of vector fields on a general analytic manifold is an infinite dimensional Lie algebra.

There is another highly intuitive mind picture of the Lie bracket and that is through the Lie Derivative of vector fields, as defined in Definition 11.3. In differential geometry, the Lie derivative is a derivation (entity or operation following Leibniz’s product rule) that yields a co-ordinate-independent definition of a derivative that is different from the covariant derivative of tensor analysis. The catch with the Lie derivative is that it is calculated with respect to a vector field $X$, which works as a “yardstick”: variations are calculated relative to the “landmarks” on the manifold laid down by the “reference vector field” $X$. I sketch the idea in Figure 11.1.

Figure 11.1: The Lie Derivative of Vector Field $Y$ along the Flow of Vector Field $X$

The Lie derivative may be summarised as evaluating the rate of change of a field along the flow of a reference vector field $X$, which exponentiates to the blue flowlines in Figure 11.1. The definition of the Lie derivative for a scalar field in Definition 11.3 is self explanatory. In Figure 11.1, the principle is that we drag a short section of the flow $\exp(\varsigma\,Y)$ of $Y$, where $\varsigma$ is “small”, a short “distance” $\tau$ along the flow $\exp(\tau\,X)$ of $X$ and see how much it has changed by. We compare it with the image of the “head” of the $\exp(\varsigma\,Y)$ under the same flow $\exp(\tau\,X)$; the difference is of course the little red arrow in Figure 11.1. Otherwise said, we can approximate this amount by beginning at the tail $\mathbf{t}$ of the red arrow in Figure 11.1 and backtracking along the flow $\exp(\varsigma\,Y)$, then backtracking along $\exp(\tau\,X)$, then running forwards along $\exp(\varsigma\,Y)$ then forwards along $\exp(\tau\,X)$ to the head $\mathbf{h}$ of the red vector. The co-ordinates of $\mathbf{h}$ are related to those of $\mathbf{t}$ by $h=\exp(\tau\,X)\,\exp(\varsigma\,Y)\,\exp(-\tau\,X)\,\exp(-\varsigma\,Y)\,t$, so that the comparison is $\left(\exp(\tau\,X)\,\exp(\varsigma\,Y)\,\exp(-\tau\,X)\,\exp(-\varsigma\,Y)-\id\right)\,\to\,\varsigma\,\tau\, \left[X,\,Y\right]$ as $\varsigma,\,\tau\to0$. If the vector fields commute, the little parallelogram closes.
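The “parallelogram that fails to close” can be checked exactly in a small matrix setting. In this sketch I assume that $X$ and $Y$ are read as matrices, as in a matrix Lie group, and I choose two hypothetical strictly upper triangular $3\times3$ matrices so that every exponential series stops after two terms; none of these choices comes from the text.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    """The n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def exp_nilpotent(m, t):
    """exp(t m) for a matrix with m*m = 0, where the series stops at I + t m."""
    n = len(m)
    eye = identity(n)
    return [[eye[i][j] + t * m[i][j] for j in range(n)] for i in range(n)]

# Hypothetical example: strictly upper triangular 3x3 matrices, each squaring to zero.
X = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
Y = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]

tau, sig = 0.25, 0.5  # binary fractions, so the floating point arithmetic is exact

# loop = exp(tau X) exp(sig Y) exp(-tau X) exp(-sig Y)
loop = identity(3)
for factor in (exp_nilpotent(X, tau), exp_nilpotent(Y, sig),
               exp_nilpotent(X, -tau), exp_nilpotent(Y, -sig)):
    loop = matmul(loop, factor)

xy, yx = matmul(X, Y), matmul(Y, X)
bracket = [[xy[i][j] - yx[i][j] for j in range(3)] for i in range(3)]
gap = [[loop[i][j] - identity(3)[i][j] for j in range(3)] for i in range(3)]
scaled = [[tau * sig * bracket[i][j] for j in range(3)] for i in range(3)]
print(gap == scaled)  # True: here the relation holds exactly, not just to leading order
```

For general $X$ and $Y$ the equality holds only to leading order in $\varsigma\,\tau$; the nilpotent choice simply removes the higher order terms so that nothing is hidden by rounding.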

The Lie derivative generalises to other geometric objects (all tensor fields) beyond scalar and vector fields.

Lastly, we look at what happens when we change co-ordinates for the open intersection $\U\cap\V$ of two patches $\U,\,\V$ with co-ordinate functions $\lambda_U,\,\lambda_V$ and with analytic transition map $\phi_{U,\,V} = \lambda_V\circ \lambda_U^{-1}$. By the chain rule the transformation law for the components of the vector field $X$ in $\eqref{InnerProductChainRule}$ when we switch from $\lambda_U$ to $\lambda_V$ is:

\begin{equation}\label{TangentTransformation}X\mapsto\frac{\partial\,\lambda_V}{\partial\,\lambda_U}\,X;\;X_k\mapsto\sum\limits_{j=1}^N\,\frac{\partial\,\lambda_{V,\,k}}{\partial\,\lambda_{U,\,j}}\,X_j\end{equation}

that is, the components of $X$ as an $N\times1$ column vector get fore-multiplied by the matrix $\frac{\partial\,\lambda_V}{\partial\,\lambda_U}$ whose element at position $(k,\,j)$ is $\frac{\partial\,\lambda_{V,\,k}}{\partial\,\lambda_{U,\,j}}$. On the other hand, a co-vector’s components $\d Y$ written as an $N\times1$ column vector get fore-multiplied by the transpose of the inverse matrix $\left(\frac{\partial\,\lambda_V}{\partial\,\lambda_U}\right)^{-1} =\frac{\partial\,\lambda_U}{\partial\,\lambda_V}$; the element of this transposed inverse at position $(k,\,j)$ is $\frac{\partial\,\lambda_{U,\,j}}{\partial\,\lambda_{V,\,k}}$. We can show this in two ways: either (i) the inner product $\left<X,\,\d Y\right> = X^T\, \d Y$ (the RHS of this equation pertains to components written as column vectors) cannot depend on co-ordinates, so that if $X$ gets fore-multiplied by a matrix, $\d Y$ must get fore-multiplied by the transpose of that matrix’s inverse to keep the inner product the same, or (ii) we can simply use the chain rule from first principles, $\frac{\partial\,\Phi}{\partial\,\lambda_{V,\,k}} = \sum\limits_{j=1}^N\,\frac{\partial\,\lambda_{U,\,j}}{\partial\,\lambda_{V,\,k}}\,\frac{\partial\,\Phi}{\partial\,\lambda_{U,\,j}}$, so that:

\begin{equation}\label{CotangentTransformation}\d Y\mapsto\left(\frac{\partial\,\lambda_U}{\partial\,\lambda_V}\right)^T\,\d Y;\;\d Y_k\mapsto\sum\limits_{j=1}^N\,\frac{\partial\,\lambda_{U,\,j}}{\partial\,\lambda_{V,\,k}}\,\d Y_j\end{equation}

This leaves us with a third way to think about vector fields and one forms: we define any rank 1 field that transforms according to $\eqref{TangentTransformation}$ to be a vector field. A one form in this thought framework is defined to be any field that transforms following $\eqref{CotangentTransformation}$. This co-ordinate transformation conceptualisation of vector fields and one-forms was historically the first.
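A quick numerical sketch of the two transformation laws, under assumptions of my own: $\lambda_U$ is taken to be Cartesian co-ordinates $(x,\,y)$ on the plane, $\lambda_V$ polar co-ordinates $(r,\,\theta)$, and the sample point, components and function names are all hypothetical. Tangent components are multiplied by the Jacobian matrix and co-vector components by the transpose of its inverse, which is what keeps the pairing $\left<X,\,\d Y\right>$ unchanged.

```python
import math

def jacobian_polar(x, y):
    """Matrix d(r, theta)/d(x, y): entry (k, j) is d lambda_V,k / d lambda_U,j."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r], [-y / r2, x / r2]]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def transform_vector(jac, comps):
    """Tangent components: X_k -> sum_j J[k][j] X_j."""
    return [sum(jac[k][j] * comps[j] for j in range(2)) for k in range(2)]

def transform_covector(jac, comps):
    """Co-vector components: Y_k -> sum_j (J^-1)[j][k] Y_j (inverse transpose)."""
    inv = inverse_2x2(jac)
    return [sum(inv[j][k] * comps[j] for j in range(2)) for k in range(2)]

def pair(vec, cov):
    """The pairing <X, dY> as a sum of products of components."""
    return sum(v * c for v, c in zip(vec, cov))

# Hypothetical sample data at the point (x, y) = (1, 2).
J = jacobian_polar(1.0, 2.0)
X_u, Y_u = [0.5, -1.5], [2.0, 3.0]
X_v, Y_v = transform_vector(J, X_u), transform_covector(J, Y_u)
print(pair(X_u, Y_u), pair(X_v, Y_v))  # the two pairings agree up to rounding

# Gradient of Phi = x^2 + y^2 = r^2: (2x, 2y) should become (2r, 0).
grad_v = transform_covector(J, [2.0, 4.0])
print(grad_v)
```

The last two lines are a sanity check against a case one can do by hand: the gradient of $r^2$ has only a radial component in polar co-ordinates.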

Exponentiation Of A Vector Field

Let us now hearken back to the definition of the exponential function through a differential equation that we saw in Chapter 5: The Exponential Map. Now we are to consider the analogous differential equation defined on an analytic manifold putatively defining a $C^1$ path $p:\R\to\U$ through $\U$ where $\U$ is some open set whereon co-ordinate functions are defined. That is, we consider the differential equation within a lone chart:

\begin{equation}\label{GeneralManifoldDE}p:\R\to\mathcal{M};\quad\d_\tau p(\tau) = X(p(\tau));\quad p(0) = p_0\in\U\end{equation}

where $X(p)$ is a Lipschitz continuous vector field. Of course here we simply think of the vector field $X(p)$ as a “velocity” as in $\eqref{InnerProductChainRule}$ rather than a differential operator as in $\eqref{DifferentialOperator}$.

By the local version of the Picard–Lindelöf theorem as in Chapter 5, $\eqref{GeneralManifoldDE}$ is guaranteed of a unique solution for $\tau_-<\tau<\tau_+$ where $\tau_- < 0$ and $\tau_+ > 0$ and $p\left(\tau\right)\in\U$ for $\tau_-<\tau<\tau_+$. We define $\exp(X(p),\,\tau)\,p_0$ to be this unique solution. Take heed that, for now, $\exp(X(p),\,\tau)\,p_0$ is simply a notation: evidently $\exp(X(p),\,\tau)$ is some operator that “picks $p_0$ up and bears it away to some other point”, reaching $\exp(X(p),\,\tau)\,p_0$ at time $\tau$. But otherwise we as yet know nothing about this solution, other than that it exists and is uniquely defined by the Cauchy initial value problem $\eqref{GeneralManifoldDE}$, so let’s find a few things out about it.

Lemma 11.4: (Flow Equation in a Manifold)

The unique solution $\R\times \U \to\U;\,(\tau,\,p_0)\mapsto \exp(X(p),\,\tau)\,p_0$ of the Cauchy initial value problem $\eqref{GeneralManifoldDE}$ fulfills the flow equation $\exp(X(p),\,(\tau + \varsigma))\,p_0 = \exp(X(p),\,\varsigma)\,\exp(X(p),\,\tau)\,p_0 = \exp(X(p),\,\tau)\,\exp(X(p),\,\varsigma)\,p_0$

Proof: The proof is exactly the same as that of Lemma 5.7; one simply shows that all the entities fulfill the Cauchy initial value problem $\eqref{GeneralManifoldDE}$. By the uniqueness of solution (shown here by Picard–Lindelöf theorem as opposed to the uniqueness as shown by Theorem 5.5 in our Lie group case), all these entities must be equal. $\quad\square$

Witness an intuitive physical interpretation of this lemma, also an intuitive thought picture for the notion of a flow (being a group action of the reals with addition $(\R,\,+)$ on our manifold $\mathcal{M}$). We must think of a flowing fluid in steady state, i.e. one wherein the local particle velocity can be represented as a constant (time-invariant) vector function of position. Then the above lemma simply says that, if a particle’s position at time $\tau=0$ is $p_0$, then its position at time $\tau+\varsigma$ can be reckoned by (i) calculating its position $p(\tau)$ at time $\tau$ then (ii) looking at the particle whose position at time nought was $p(\tau)$ and then calculating this latter particle’s position at time $\varsigma$. Let the latter particle’s position as a function of time be $q(\varsigma)$, then we must solve the Cauchy initial value problem $\d_\varsigma\,q = X(q);\;q(0)=p(\tau)$. We then have $p(\tau+\varsigma) = q(\varsigma)$. I sketch this idea in Figure 11.2.

Figure 11.2: The Flow Property: in a fluid flow, we can find the position of a particle which begins at $p(0)$ at time nought at the time $\tau+\varsigma$ as follows: at time nought, we track a second particle which begins at $p(\tau)$ and then find the position of this latter at time $\varsigma$. Its position will then be the same as that of the first particle at time $\tau+\varsigma$. The flow concept, that of a group action of $(\R,\,+)$ on a manifold is thus the formalisation of the idea of a time-invariant (steady-state) fluid flow in a highly natural and intuitive way.
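The flow property of Lemma 11.4 can be watched at work numerically. This is a sketch only: I swap in a classical fourth-order Runge-Kutta integrator as a stand-in for the exact solution, and the pendulum vector field, the step count and the function names are hypothetical choices of mine.

```python
import math

def rk4_flow(field, p0, tau, steps=2000):
    """Approximate exp(X, tau) p0 with classical fourth-order Runge-Kutta."""
    h = tau / steps
    p = list(p0)
    for _ in range(steps):
        k1 = field(p)
        k2 = field([pi + 0.5 * h * ki for pi, ki in zip(p, k1)])
        k3 = field([pi + 0.5 * h * ki for pi, ki in zip(p, k2)])
        k4 = field([pi + h * ki for pi, ki in zip(p, k3)])
        p = [pi + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
             for pi, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p

def pendulum(p):
    """Hypothetical steady (time-invariant) vector field on R^2."""
    return [p[1], -math.sin(p[0])]

p0 = [1.0, 0.0]
tau, sig = 0.8, 1.3

# One particle followed for time tau + sig ...
direct = rk4_flow(pendulum, p0, tau + sig)
# ... versus a second particle started at p(tau) and followed for time sig.
staged = rk4_flow(pendulum, rk4_flow(pendulum, p0, tau), sig)
# The two stages may also be taken in the other order.
swapped = rk4_flow(pendulum, rk4_flow(pendulum, p0, sig), tau)

flow_gap = max(abs(a - b) for a, b in zip(direct, staged))
swap_gap = max(abs(a - b) for a, b in zip(direct, swapped))
print(flow_gap, swap_gap)  # both at the level of the integrator's error
```

The agreement is only as good as the integrator, of course; the lemma itself is an exact statement about the unique solution.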

Since our main gig is of course Lie groups, and we now know these are analytic manifolds, let us now restrict ourselves to an analytic $C^\omega$ vector field $X(p)$ defined in the neighbourhood $\U$ of $p_0\in\mathcal{M}$. When we do so, we have:

Lemma 11.5: ($\exp$ Is Analytic)

The unique solution $\R\times \U \to\U;\,(\tau,\,p_0)\mapsto \exp(X(p),\,\tau)\,p_0$ of the Cauchy initial value problem $\eqref{GeneralManifoldDE}$ is analytic ($C^\omega$) in both $\tau$ and $p_0$ over some nonzero interval $|\tau|<\tau_p$ for $\tau_p>0$ and within some open ball defined by $\left\|p\right\|< p_\epsilon$ with $p_\epsilon>0$.

Proof:


Without loss of generalhood, one can assume the analytic co-ordinate functions are such that $p_0=\Or$, i.e. that the co-ordinates of $p_0$ in the chart $(\U,\,\lambda)$ are the origin. By assumption of analyticity, write the vector field $X(p)$ as its uniformly convergent Taylor series $\sum\limits_{|k|=0}^\infty\,x_k\,p^k$ (the $|k|=0$ term being the constant $X(\Or)$, without which the solution through the origin would simply stay there) where now $k$ is a multi-index, i.e. a column vector with components $k_1,\,k_2,\,\cdots,\,k_N$ with $\sum\limits_{j=1}^N\,k_j = |k|$, $x_k$ a matrix and $p^k$ is the column vector with powers $p_1^{k_1},\,p_2^{k_2},\,\cdots,\,p_N^{k_N}$ of the co-ordinate components $p_j$. We assume for the moment a power series solution to $\eqref{GeneralManifoldDE}$ of the form $p(\tau) =\sum_{j=1}^\infty\,q_j\,\tau^j$, where $q_j$ is an $N\times 1$ column vector of co-efficients whose $\ell^{th}$ component defines the co-efficient of the $j^{th}$ power of $\tau$ in the $\ell^{th}$ co-ordinate component. By putting this assumed solution into $\eqref{GeneralManifoldDE}$ and equating co-efficients, one shows that there must be a unique formal power series solution to $\eqref{GeneralManifoldDE}$.

But then one can now go on to show convergence of this power series by the method of majorants over some nonzero interval $|\tau|<\tau_p$ for $\tau_p>0$ and within some open ball in $\mathcal{M}$ defined by $\left\|p\right\|< p_\epsilon$ for $p_\epsilon>0$. The details of one version of this kind of proof are well written up in [Rossmann], §4.3, Theorem 1. So the assumed power series is convergent and moreover it is then trivially shown to fulfill the Cauchy initial value problem $\eqref{GeneralManifoldDE}$. The Picard–Lindelöf theorem then shows that this solution is the unique solution $\exp(X(p),\,\tau)\,p_0$ we have been talking about. $\quad\square$

Geared up with Lemma 11.5, we can now prove the differential geometry result central to our next discussion of Lie groups:

Theorem 11.6: (Taylor’s Theorem on a $C^\omega$ Manifold)

Let $\mathcal{M}$ be an analytic manifold, $p_0\in\mathcal{M}$ a point thereof, $\U$ an open neighbourhood of $p_0$ kitted with analytic co-ordinates $\lambda:\U\to\R^N$. Furthermore let $X:\U\to\R^N$ be a vector field which is $C^\omega$ throughout $\U$ and let $p:(-\epsilon_0,\,\epsilon_0)\to\mathcal{M}$ be the unique analytic (by Lemma 11.5) solution to the Cauchy initial value problem $\eqref{GeneralManifoldDE}$. Lastly, let $\Phi:\U\to\R$ be an analytic function of the co-ordinates $\lambda$. Then:

\begin{equation}\label{ManifoldTaylorsTheorem_1}\Phi(\exp(X(p),\,\tau)\,p_0) = \sum\limits_{k=0}^\infty \frac{\tau^k}{k!}\,\left.\left(X(p)^k\,\Phi(p)\right)\right|_{p=p_0}\end{equation}

for $|\tau|<\epsilon_1$ for some $\epsilon_1>0$.

Proof: By Lemma 11.5, $\exp(X(p),\,\tau)\,p_0$ is analytic in $\tau$ over some nonzero interval, therefore so is $\Phi(\exp(X(p),\,\tau)\,p_0)$ ($\Phi$ is analytic by assumption). By the same lemma, this latter function is also analytic in the manifold co-ordinates for some open ball around $p_0$. Since $\d_\tau \Phi(p(\tau)) = \left(X\,\Phi\right)(p(\tau))$ by the chain rule $\eqref{ChainRule}$, the time derivatives $\left.\d_\tau \Phi(\exp(X(p),\,\tau)\,p_0)\right|_{\tau=0},\,\left.\d_\tau^2 \Phi(\exp(X(p),\,\tau)\,p_0)\right|_{\tau=0},\,\cdots$ are all none other than $\left.\left(X(p)\,\Phi(p)\right)\right|_{p=p_0},\,\left.\left(X(p)^2\,\Phi(p)\right)\right|_{p=p_0},\,\cdots$, thinking of the vector fields $X(p)^k$ as $k^{th}$ order differential operators found by repeated application of $\eqref{DifferentialOperator}$. So, Taylor’s theorem (by definition of an analytic function) applies and is none other than $\eqref{ManifoldTaylorsTheorem_1}$ in this case. $\quad\square$

The flow lines of the form $\exp(X(p),\,\tau)\,p_0$ found by exponentiating a vector field are also called integral curves. Note that the property in $\eqref{ManifoldTaylorsTheorem_1}$ firstly (i) wholly and uniquely defines vector field exponentiation, for if we put $\Phi=\lambda_j$ for each of the co-ordinate components, we simply reconstruct the co-ordinates for the patch $\U$ and thus the unique solution of the Cauchy initial value problem in $\eqref{GeneralManifoldDE}$. (ii) Secondly, $\eqref{ManifoldTaylorsTheorem_1}$ shows that $\exp(X(p),\,\tau)\,p_0$ is indeed a function of the scaled vector field $\tau\,X$, not of $X$ and of $\tau$ separately. Therefore, henceforth, we shall write the exponentiated vector field as $\exp(X(p)\,\tau)\,p_0$ (rather than $\exp(X(p),\,\tau)\,p_0$ with $X$ and $\tau$ as separate arguments of the $\exp$).
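Here is a small worked instance of $\eqref{ManifoldTaylorsTheorem_1}$, offered as a sketch under my own assumptions: the manifold is the real line with one co-ordinate $p$, the vector field is the hypothetical choice $X = p^2\,\partial_p$, and $\Phi(p)=p$ is the co-ordinate itself. The Cauchy problem $\d_\tau p = p^2,\;p(0)=p_0$ has the closed form solution $p_0/(1-p_0\,\tau)$, which the series of repeated applications of $X$ should reproduce. Polynomials are stored as plain co-efficient lists.

```python
from math import factorial

def apply_field(coeffs):
    """Apply X = p^2 d/dp to the polynomial sum_i coeffs[i] * p**i."""
    out = [0.0] * (len(coeffs) + 1)
    for i, c in enumerate(coeffs):
        if i > 0:
            out[i + 1] += i * c  # p^2 * d/dp (c p^i) = i c p^(i+1)
    return out

def evaluate(coeffs, p):
    """Value of the polynomial at the point p."""
    return sum(c * p ** i for i, c in enumerate(coeffs))

def lie_series(p0, tau, terms=60):
    """Partial sum of sum_k tau^k / k! * (X^k Phi)(p0) for Phi(p) = p."""
    poly = [0.0, 1.0]  # Phi(p) = p
    total = 0.0
    for k in range(terms):
        total += tau ** k / factorial(k) * evaluate(poly, p0)
        poly = apply_field(poly)
    return total

def exact_flow(p0, tau):
    """Closed-form solution of dp/dtau = p^2, p(0) = p0."""
    return p0 / (1.0 - p0 * tau)

p0, tau = 0.5, 0.6
series_gap = abs(lie_series(p0, tau) - exact_flow(p0, tau))
print(series_gap)  # tiny: the series and the closed form agree
```

In this example $X^k\,\Phi = k!\,p^{k+1}$, so the series is geometric in $p_0\,\tau$ and converges only for $|p_0\,\tau|<1$. The closed form blows up at $\tau = 1/p_0$, so this particular exponentiation is genuinely local in $\tau$.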

SkullWarning: Exponentiation of a vector field $\exp(X(p)\,\tau)\,p_0$ is not always defined for all $\tau\in\R$ in a general manifold as it is (as we shall see) for a Lie group. Many kinds of “singularity” and failure can befall a general vector field exponentiation.

Homogeneous Spaces and $\G$-Torsors

Now we apply this differential geometry to Lie groups. We shall do so through the concept of a homogeneous space. The definition below defines a great many concepts and packs in a great deal of information. I recommend reading it through many times, perhaps every coffee break over several days whilst doing something else so as to let the mind subconsciously mull over and sort out and neatly pack away the relevant concepts.

Definition 11.7: (Homogeneous Space)

A continuous action of a topological group $\G$ on a topological space $\mathbb{X}$ is called a $\G$-space.

A continuous and transitive action of a topological group $\G$ on a topological space $\mathbb{X}$ is called a homogeneous space. If, further, the action is free, then the action is called a $\G$-torsor.

In more detail: let $\G$ be a Lie group, $\mathbb{X}$ a topological space, $\mathscr{S}(\mathbb{X})$ some subgroup of the automorphisms of the space $\mathbb{X}\to\mathbb{X}$ and $\rho: \G\to\mathscr{S}(\mathbb{X})$ a homomorphism from $\G$ onto $\mathscr{S}(\mathbb{X})$. That is, the triple $(\G,\,\mathbb{X},\,\rho)$ is an action of the group $\G$ on $\mathbb{X}$. Let further this action be continuous, i.e. for every $(\gamma,\,x)\in\G\times\mathbb{X}$ the function $\G\times\mathbb{X}\to\mathbb{X};\;(\gamma,\,x)\mapsto\rho(\gamma)(x)$ is continuous at the point $(\gamma,\,x)$ when $\G\times\mathbb{X}$ is kitted with the product topology. Let further the action be transitive, i.e. $\forall\,x,\,y\in\mathbb{X}\,\exists\,\gamma\in\G\,\ni\,\rho(\gamma)(x) = y$. That is, there is always an element of $\G$ whose action on any element $x\in\mathbb{X}$ maps $x$ to any other element $y\in\mathbb{X}$. Otherwise put, the orbit $\G\cdot x=\{\rho(\gamma)(x)|\;\gamma\in\G\}$ of any element (the set of elements which can be reached from $x$ by a $\rho(\gamma)$ for $\gamma\in\G$) is the whole of $\mathbb{X}$.

Then the triple $(\G,\,\mathbb{X},\,\rho)$ is called a homogeneous space.

If, further, the action is free i.e. the stabiliser subgroup $G_x =\{\gamma\in\G|\;\rho(\gamma)(x) = x\}\subset\G$ is trivial ($G_x=\{\id\}$) for every $x\in\mathbb{X}$, i.e. the subgroup of all elements of $\G$ that “fix” $x$ (leave it unchanged) is the identity for every member of $\mathbb{X}$, then the homogeneous space is called a $\G$-torsor or a principal homogeneous space of $\G$. A transitive action which is also free is also called sharply transitive. An action for which the kernel of the homomorphism is trivial is called faithful. That is, different elements of $\G$ map to different automorphisms of $\mathbb{X}$, i.e. if $\gamma,\,\zeta\in\G$ and $\gamma\neq\zeta$ then there is an $x\in\mathbb{X}$ such that $\rho(\gamma)(x)\neq\rho(\zeta)(x)$.

In the case of a Lie Group $\G$, a homogeneous space is a triple of the form $(\G,\,\mathcal{M},\,\rho)$ where $\mathcal{M}$ is an analytic manifold, $\rho:\G\to\mathscr{S}(\mathcal{M})$ is a homomorphism from $\G$ to the group $\mathscr{S}(\mathcal{M})$ of analytic automorphisms of $\mathcal{M}$ and where the action of $\G$ on $\mathcal{M}$ through $\rho$ is transitive. If further the action is free, then the homogeneous space is a Lie group $\G$-torsor. The meanings of free and faithful are of course the same for a $\G$-space where $\G$ is a Lie group as they are when $\G$ is a general topological group.

The word homogeneous refers to the fact that the orbit of any element under $\G$ (the “reachable set” of all elements which can be reached by a transformation induced through $\rho$ by some element of $\G$) is the whole of $\mathcal{M}$ (or of $\mathbb{X}$, in the case of a topological group). Therefore, the action of $\G$ can be studied by studying its action on any one element of $\mathcal{M}$.
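
The notions of transitive, free and faithful can be checked by brute force on finite groups (which are topological groups under the discrete topology). The following is only a sketch under that assumption; the function names and the two example actions (the cyclic group of order three translating itself, and the symmetric group permuting three points) are illustrative choices, not part of Definition 11.7.

```python
from itertools import permutations

def is_transitive(act, group, space):
    """Every y can be reached from every x by some group element."""
    return all(any(act(g, x) == y for g in group) for x in space for y in space)

def is_free(act, group, space, identity):
    """No element other than the identity fixes any point."""
    return all(act(g, x) != x for g in group if g != identity for x in space)

def is_faithful(act, group, space):
    """Different group elements induce different maps of the space."""
    induced = {tuple(act(g, x) for x in space) for g in group}
    return len(induced) == len(group)

# Assumed example 1: Z_3 acting on itself by translation (a torsor).
z3 = [0, 1, 2]
def translate(g, x):
    return (x + g) % 3

# Assumed example 2: S_3 permuting three points (transitive, not free).
s3 = list(permutations(range(3)))
points = [0, 1, 2]
def permute(g, x):
    return g[x]

print(is_transitive(translate, z3, z3), is_free(translate, z3, z3, 0))      # True True
print(is_transitive(permute, s3, points), is_free(permute, s3, points, (0, 1, 2)))  # True False
```

The second action is a homogeneous space that is faithful but not a torsor: each point has a stabiliser of order two.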

Our first example of a homogeneous space is where the Lie group $\G$ acts on itself through right translation. That is, $\rho$ is an isomorphism and the analytic automorphism $\rho(\gamma)$ of $\G$ (considered as the acted-on analytic manifold $\mathcal{M}$ in Definition 11.7) corresponding to the group element $\gamma\in\G$ is $\G\to\G;\; \zeta\mapsto\zeta\,\gamma$.

Now we think of a $C^1$ path $\sigma_X:\R\to\G;\;\sigma_X(0)=\id;\;\d_\tau\sigma_X(\tau)|_{\tau=0}=X\in\g$, where as always $\g$ is the Lie algebra of $\G$. Now we think of $\sigma_X(\tau)$ acting on $\G$ as the analytic manifold $\mathcal{M}$. As $\tau$ varies, the image $\zeta\,\sigma_X(\tau)$ of $\zeta\in\G$ traces a $C^1$ path through $\zeta$ (which it runs through at $\tau=0$) with tangent $\d_\tau \zeta\,\sigma_X(\tau)|_{\tau=0} = \zeta\,X$. We have thus defined a vector field on $\G$, to wit, $\mathscr{X}:\G\to\R^N;\;\mathscr{X}(\gamma) = \gamma\,X$: at each $\zeta\in\G$ we assign the left-translated constant tangent vector $\mathscr{X}(\zeta) =\zeta\,X$. This vector field is left-invariant in the sense that left-translating it maps it to the same field, i.e. $\mathscr{X}(\zeta\,\gamma) = \zeta\,\gamma\,X = \zeta\,\mathscr{X}(\gamma)$. Clearly there is exactly one left-invariant vector field for each $X\in\g$. The exponentiation of this vector field in the sense of finding the unique solution to the Cauchy initial value problem of $\eqref{GeneralManifoldDE}$ clearly leads to the same concept of the exponential function $e^{X\,\tau}$ as we defined in Chapter 5. We therefore have a new definition of the Lie algebra of $\G$ as the set of all left-invariant vector fields on $\G$.

Naturally, wholly analogous ideas can be built up on an action of a Lie group on itself through left translation, leading to the notion of right invariant vector fields, and an equally valid definition of the Lie algebra is as the set of all right invariant vector fields. We now link the concept of Lie bracket of vector fields to the one we have given in Chapter 6.

Lemma 11.8: (Relationship between Lie Bracket in $\g=\operatorname{Lie}(\G)$ and Vector Field Lie Bracket)

Let $\G$ be a Lie group, and $\g$ its Lie algebra. Let $X,\,Y\in\g$ and let $Z$ be the Lie bracket between them: $Z=\left[X,\,Y\right]=\ad(X)\,Y$, as defined by Definition 7.2. Now consider the left invariant vector fields $\mathscr{X},\,\mathscr{Y}$ and $\mathscr{Z}$ corresponding to the Lie algebra members $X,\,Y,\,Z$, respectively. Then, with the vector field Lie bracket of Definition 11.3 we have $\mathscr{Z}=\left[\mathscr{X},\,\mathscr{Y}\right]$, that is: (i) the Lie bracket between left invariant vector fields is another left invariant vector field and, in the case of left invariant fields, (ii) this vector field Lie bracket yields the same concept of Lie bracket as defined by Definition 7.2.

Proof:


We have noted that the exponentiation of a left-invariant vector field in the sense of finding the unique solution to the Cauchy initial value problem of $\eqref{GeneralManifoldDE}$ clearly leads to the same concept of the exponential function $e^{X\,\tau}$ as we defined in Chapter 5. That is, near enough to the identity, the co-ordinates of the $e^{X\,\tau}$ defined in Chapter 5 are found, following Theorem 11.6, by applying the differential operator $\sum\limits_{k=0}^\infty\,\frac{\tau^k}{k!}\,X^k$ to the co-ordinate functions $\lambda$ and working out $X^k\,\lambda|_{\lambda=\Or}$.

So now consider the two Lie algebra members $X,\,Y$ and thence define a family of left invariant vector fields $\mathscr{W}_s$ corresponding to the exponential function $e^{W_s\,\tau} = e^{s\,X}\, e^{\tau\,Y}\,e^{-s\,X} = \exp\left(e^{s\,\ad(X)}\,Y\,\tau\right)$ (see Lemma 5.12). So the vector field $\mathscr{W}_s$ is a left-invariant vector field corresponding to the Lie algebra member $W_s = e^{s\,\ad(X)}\,Y$. Applying Theorem 11.6 to the entity $e^{s\,X}\, e^{\tau\,Y}\,e^{-s\,X}$ and thinking of it as a differential operator on analytic functions of the co-ordinate functions, we find that the vector field $\mathscr{W}_s$ must be the differential operator:

\begin{equation}\label{LieAlgebraAndVectorFieldLieBracketSameLemma_1}\exp(s\,\mathscr{X})\,\mathscr{Y}\,\exp(-s\,\mathscr{X}) = \mathscr{Y} + s \left[\mathscr{X},\,\mathscr{Y}\right] + \frac{s^2}{2!} \left[\mathscr{X},\,\left[\mathscr{X},\,\mathscr{Y}\right]\right]+\cdots\end{equation}

the Lie brackets being in the sense of Definition 11.3. Linear combinations and limits of left invariant vector fields being left invariant vector fields (because there is exactly one such field for each member of $\g$, $\g$ is closed under these operations and the correspondence between Lie algebra members and left invariant vector fields is linear and continuous), we see that $\lim\limits_{s\to0} s^{-1} \left(\mathscr{W}_s - \mathscr{Y}\right) = \left[\mathscr{X},\,\mathscr{Y}\right]=\mathscr{Z}$ is a left invariant field, and is indeed the same as the one corresponding to $Z=\ad(X)\,Y$.$\quad\square$
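
A small matrix check of the adjoint series $e^{s\,\ad(X)}\,Y = \sum_k \frac{s^k}{k!}\,\ad(X)^k\,Y$ may help fix ideas. It assumes the $2\times2$ real matrices $X$ and $Y$ below (members of $\mathfrak{sl}(2,\,\R)$, chosen here purely for illustration), with the matrix commutator as Lie bracket:

\begin{equation}X=\left(\begin{array}{cc}1&0\\0&-1\end{array}\right);\quad Y=\left(\begin{array}{cc}0&1\\0&0\end{array}\right);\quad \left[X,\,Y\right] = X\,Y-Y\,X = 2\,Y\end{equation}

Direct multiplication with $e^{s\,X} = \operatorname{diag}(e^s,\,e^{-s})$ gives $e^{s\,X}\,Y\,e^{-s\,X} = e^{2\,s}\,Y$, whereas the series gives

\begin{equation}Y + s\,\left[X,\,Y\right] + \frac{s^2}{2!}\,\left[X,\,\left[X,\,Y\right]\right]+\cdots = \left(1 + 2\,s + \frac{(2\,s)^2}{2!}+\cdots\right)\,Y = e^{2\,s}\,Y\end{equation}

in agreement, and the limit $\lim\limits_{s\to0} s^{-1}\left(e^{2\,s}\,Y - Y\right) = 2\,Y$ recovers the bracket.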

This gives us one of the common definitions of a Lie group’s Lie algebra when the former is defined conventionally, i.e. through Definition 10.3:

Definition 11.9: (Lie Algebra as Left (Right) Invariant Vector Field)

Let $\G$ be a finite dimensional ($N$) Lie group as defined by Definition 10.3. Then we define the Lie Algebra $\g$ as the set of all left-invariant vector fields on the group $\G$. The same algebra arises from deeming it to be the set of all right-invariant vector fields on $\G$. The Lie algebra’s Lie bracket $\g\times\g\to\g$ is defined through Definition 11.3.

Now let us think about a group $\mathscr{S}(\mathcal{M})$ of analytic automorphisms of $\mathcal{M}$ from first principles. The group of all analytic automorphisms of a general analytic manifold $\mathcal{M}$ could be a very large thing. There is no reason to foresee that it will be a finite dimensional Lie group and indeed it is sometimes not a finite dimensional Lie group even for some very simple manifolds.

Example 11.10: (Analytic Automorphism Group Bigger than a Lie Group)

We take $\mathcal{M}=\R$ with the wonted topology. Then the group of analytic automorphisms contains every map of the form $\Phi:\R\to\R;\;x\mapsto x + \epsilon\,\sin(\phi(x))$ where (i) $\phi:\R\to\R$ is any analytic function (whether or not a bijection) with a bounded derivative $|\d_x \phi(x)| < \mathscr{D}$ and (ii) $|\epsilon\,\mathscr{D}| < 1$, so that $\d_x \Phi(x) = 1 + \epsilon\,\cos(\phi(x))\,\d_x\phi(x) > 0$; $\Phi$ is therefore monotonic, analytic and onto, thus is an analytic bijection. $\phi$ has countably infinitely many degrees of freedom (defined by the power co-efficients of its Taylor series) and so there is no way to put even functions of the form of $\Phi$ with small enough $\epsilon$ into one to one correspondence with $\R^N$ such that $C^1$ paths “multiply” (i.e. through the group product of function composition) to give a $C^1$ path.

However, sometimes groups of analytic automorphisms can be stunningly simple. If instead of $\R$ we think of $\mathbb{C}$ and require our automorphisms to be complex analytic, i.e. holomorphic, we get:

Theorem 11.11: (Holomorphic Automorphisms of $\mathbb{C}$)

The group of all everywhere holomorphic bijections of $\mathbb{C}$ is the group of all functions of the form $\mathbb{C}\to\mathbb{C};\;z\mapsto a\,z+b;\;a,\,b\in\mathbb{C};\;a\neq0$. Alternatively: the group of all everywhere conformal bijections of the Euclidean plane $\R^2$ is the same group.

Proof:


Let $\phi:\mathbb{C}\to\mathbb{C}$ be such a function. Since conformal, thus differentiable, everywhere, it is entire, i.e. has a convergent Taylor series everywhere, thus $\phi(z) = \sum\limits_{k=0}^\infty \phi_k\,z^k;\;\forall\,z\in\mathbb{C}$.

Suppose first that the Taylor series has infinitely many terms, therefore $\phi$ has an essential singularity at infinity (alternatively, of course, we could consider $\phi(z^{-1})$ and study the singularity at $z=0$). By the Casorati-Weierstrass Theorem, the image $\mathcal{S}_1=\phi(\mathcal{N}_R)$ of every neighbourhood $\mathcal{N}_R=\{z\in\mathbb{C}|\;|z|>R\}$ of infinity ($R>0$ is any positive real) is dense in $\mathbb{C}$. On the other hand, by the open mapping theorem, the image $\mathcal{S}_2=\phi(\tilde{\mathcal{N}}_R)$ of the open set $\tilde{\mathcal{N}}_R = \{z\in\mathbb{C}|\;|z|<R\}$ is also open. Therefore, since $\mathcal{S}_1$ is dense in $\mathbb{C}$ and $\mathcal{S}_2$ is open, $\mathcal{S}_2$ contains points of $\mathcal{S}_1$, i.e. $\exists\, \zeta\in \mathcal{S}_1\cap\mathcal{S}_2$. But $\zeta\in \mathcal{S}_1 \Rightarrow\,\exists z_1\in\mathcal{N}_R\,\ni\, \phi(z_1)=\zeta$, likewise $\zeta\in \mathcal{S}_2 \Rightarrow\,\exists z_2\in\tilde{\mathcal{N}}_R\,\ni\, \phi(z_2)=\zeta$. Since $\tilde{\mathcal{N}}_R\cap \mathcal{N}_R=\emptyset$ this means that $z_1 \neq z_2$, so that $\phi$ is at least two to one.

Therefore, $\phi$ cannot have an essential singularity at infinity, and must therefore be a polynomial. But if a polynomial $p(z)$ of degree $n>1$ assumes the value $p_0$ for only one value $z_0$ of $z$, then $p(z)$ is of the form $a_0\,(z-z_0)^n+p_0$ for some $a_0\neq0$. Suppose it also assumes some other value $p_1\neq p_0$ at only one value $z_1$ of $z$; then $a_1\,(z-z_1)^n+p_1 = a_0\,(z-z_0)^n + p_0\,\forall\,z\in\mathbb{C}$. Comparing co-efficients of $z^n$ and $z^{n-1}$, this implies $a_1=a_0$ and $z_1=z_0$, whence $p_1=p_0$, a contradiction. Therefore a polynomial of degree $n>1$ cannot be bijective and, a constant plainly not being bijective either, we have ruled out all possibilities aside from a polynomial of the form $a\,z+b$ for $a\neq0$.$\quad\square$
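
That the maps $z\mapsto a\,z+b$ with $a\neq0$ really do form a group under composition is easy to check directly. The following is a minimal sketch in which a map is stored as the pair `(a, b)`; the helper names and the sample values are hypothetical choices made only for illustration.

```python
def affine_apply(f, z):
    """Evaluate z -> a*z + b for f = (a, b)."""
    a, b = f
    return a * z + b

def affine_compose(f, g):
    """Coefficients of z -> f(g(z)): again of the form a*z + b."""
    a1, b1 = f
    a2, b2 = g
    return (a1 * a2, a1 * b2 + b1)

def affine_inverse(f):
    """Coefficients of the inverse map z -> (z - b)/a; needs a != 0."""
    a, b = f
    return (1 / a, -b / a)

# Arbitrary sample maps and sample point (assumed values).
f = (2 + 1j, 1 - 3j)
g = (0.5j, 4 + 0j)
z = 0.3 - 0.2j

# Composition of coefficients agrees with composition of functions.
print(abs(affine_apply(affine_compose(f, g), z) - affine_apply(f, affine_apply(g, z))))
# The inverse undoes the map.
print(abs(affine_apply(affine_inverse(f), affine_apply(f, z)) - z))
# By contrast z -> z**2 is two to one: z and -z share an image.
print(abs(z ** 2 - (-z) ** 2))
```

All three printed differences should vanish to within rounding error.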

If we consider the compactified complex plane $\hat{\mathbb{C}}$, i.e. the Riemann sphere, we have the following.

Theorem 11.12: (Möbius Group: Holomorphic Automorphisms of the Riemann Sphere)

The group of everywhere holomorphic bijections of the Riemann sphere $\hat{\mathbb{C}}=\mathbb{C}\cup\{\hat{\infty}\}$ (equivalently: conformal bijections of the Riemann sphere) is the Möbius group $\mathbb{M}(2)$ of Möbius (fractional linear) transformations.

Note: Stereographic projection of $\mathbb{C}$ into the unit sphere is a conformal mapping, so that conformal transformations of the complex plane indeed correspond to conformal transformations of the sphere.

Proof:


Let $\phi:\hat{\mathbb{C}}\to\hat{\mathbb{C}}$ be such a bijection. Since it is one to one, it can assume the value $\hat{\infty}$ exactly once, meaning that, as a function restricted to the noncompact complex plane alone, it has precisely one singularity, say at $z=z_\infty$. Now witness that the transformations $z\mapsto z^{-1}$ and $z\mapsto z+c$ for $c\in\mathbb{C}$ are clearly both holomorphic bijections of $\hat{\mathbb{C}}$. So now make a change of variable $\zeta = (z-z_\infty)^{-1}$; then $\tilde{\phi}(\zeta) = \phi(\zeta^{-1} + z_\infty)$ is entire, i.e. has its only singularity at $\zeta=\hat{\infty}$. So now we have $\tilde{\phi}(\hat{\infty})=\hat{\infty}$ and, for all other values $\zeta\neq\hat{\infty}$, thinking of $\tilde{\phi}$ as a function restricted to the noncompact complex plane alone, we have precisely the same situation as that in Theorem 11.11. That is, $\tilde{\phi}$ must be of the form $\tilde{\phi}(\zeta) = a\,\zeta+b;\;a,\,b\in\mathbb{C};\;a\neq0$. Transforming back, we have $\phi(z) = k_1\,(z-z_\infty)^{-1}+k_2 = \frac{k_2\,z + k_1 - k_2\,z_\infty}{z-z_\infty}$ where $k_1\neq0$, $k_2$ and $z_\infty$ are arbitrary complex constants. (If instead $z_\infty=\hat{\infty}$, then Theorem 11.11 applies to $\phi$ directly and $\phi(z) = a\,z+b$.) It is readily shown that one can choose $k_1,\,k_2,\,z_\infty\in\mathbb{C}$ so that $\phi(z) = \frac{a\,z+b}{c\,z+d}$ with $a\,d - b\,c=1$, i.e. $\phi$ belongs to the Möbius group and the whole Möbius group can be realised with such a $\phi$. $\square$
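
The passage from the “pole form” $k_1\,(z-z_\infty)^{-1}+k_2$ to a normalised fractional linear transformation can be sketched numerically. This is an illustration only: a transformation is stored as its coefficient tuple `(a, b, c, d)`, the helper names and sample constants are hypothetical, and only finite points away from the pole are evaluated.

```python
import cmath

def mobius_apply(m, z):
    """Evaluate z -> (a*z + b)/(c*z + d) at a finite z with c*z + d != 0."""
    a, b, c, d = m
    return (a * z + b) / (c * z + d)

def mobius_compose(m, n):
    """Coefficients of z -> m(n(z)), i.e. the product of the 2x2 coefficient matrices."""
    a, b, c, d = m
    e, f, g, h = n
    return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)

def normalise(m):
    """Rescale the coefficients so that a*d - b*c = 1; the map itself is unchanged."""
    a, b, c, d = m
    s = cmath.sqrt(a * d - b * c)
    return (a / s, b / s, c / s, d / s)

def from_pole_form(k1, k2, z_inf):
    """Coefficients of z -> k1/(z - z_inf) + k2 (determinant is -k1, so k1 != 0)."""
    return (k2, k1 - k2 * z_inf, 1, -z_inf)

# Arbitrary sample constants (assumed values).
k1, k2, z_inf = 2 + 1j, 0.5 - 1j, 1 + 1j
m = normalise(from_pole_form(k1, k2, z_inf))
n = (1, 2, 0.5j, 1)
z = 0.2 + 0.3j

print(abs(mobius_apply(m, z) - (k1 / (z - z_inf) + k2)))   # same map after normalising
print(abs(m[0] * m[3] - m[1] * m[2] - 1))                  # determinant is one
print(abs(mobius_apply(mobius_compose(m, n), z) - mobius_apply(m, mobius_apply(n, z))))
```

Representing the transformation by its coefficient matrix makes the group product visible as matrix multiplication, which is why the determinant condition $a\,d-b\,c=1$ is the natural normalisation.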

So we witness very simple examples that illustrate the two extremes of behaviour: a group of analytic automorphisms of a manifold can be very complicated, leaving the tools of Lie theory useless for talking about it, or it can be an elegantly described, finite dimensional Lie group. Therefore, if we are to study a subgroup of the analytic automorphisms $\mathscr{S}(\mathcal{M})$ of an analytic manifold $\mathcal{M}$ with finite dimensional Lie group theory, we shall have to assume at the outset that there is some Lie group $\G$ which, together with the manifold $\mathcal{M}$ and a suitable homomorphism $\rho:\G\to\mathscr{S}(\mathcal{M})$, makes the system into a homogeneous space. We can cut to the chase and replace $\G$ by $\G/\ker(\rho)$; we still have a finite dimensional Lie group and now $\rho$ is an isomorphism. That is, we can simply assume that there is some interesting subgroup of $\mathscr{S}(\mathcal{M})$ which is the Lie group $\G$ and see what information that yields.

How shall we characterise $\G$? We can imagine acting on $\mathcal{M}$ with members of one of the one parameter groups of $\G$, that is, imparting $e^{\tau\,X}$ to $\mathcal{M}$ and “watching a movie of $\mathcal{M}$ as $\tau$ varies”. As it does so, any given point of $\mathcal{M}$ is dragged through $\mathcal{M}$ to form a flowline. That is, $e^{\tau\,X}\in\G$ must correspond to a flow transformation of the form $\exp(\tau\,\mathscr{X})$ for some analytic vector field $\mathscr{X}$ in $\mathcal{M}$ (note that we must be very careful to keep in mind the separateness of the two analytic manifolds $\G$ and $\mathcal{M}$ and be ever clear which we speak of when we describe vector fields and the like). Take a deep breath for a great deal actually happens in the next sentence! We construct $\mathscr{X}(a)$ – a vector field in the manifold $\mathcal{M}$ – at any $a\in\mathcal{M}$ as $\mathscr{X}(a) = \left.\d_\tau \lambda_a\left(e^{X\,\tau}\,a\right)\right|_{\tau=0}$, where (i) of course $e^{X\,\tau}\in\G$ is a Lie group member whereas (ii) $a\in\mathcal{M}$ belongs to the acted-on manifold, (iii) the elegant (but rather glib) notation $e^{X\,\tau}\,a$ means “the image of $a$ under the action of (i.e. after mapping by) the Lie group member $e^{\tau\,X}$” and, lastly, (iv) $\lambda_a:\U_a\to\R^N$ is a co-ordinate function mapping the open neighbourhood $\U_a\subset\mathcal{M}$ to the local copy of $\R^N$ associated with the manifold $\mathcal{M}$. Take particular heed that the derivative $\left.\d_\tau \lambda_a\left(e^{X\,\tau}\,a\right)\right|_{\tau=0}$ is calculated in $\mathcal{M}$, not in the Lie group, although of course the latter sets the derivative’s value through its action on $\mathcal{M}$.

Naturally in the above, we use the shorthand $e^{X\,\tau}\,a$ or, more generally, $\gamma\,a$ with $e^{X\,\tau},\,\gamma\in\G;\,X\in\g$ and $a\in\mathcal{M}$ to stand for $\gamma\,a = \rho(\gamma)(a)$. When we want to emphasise the role of $\rho$, particularly if it is a homomorphism rather than an isomorphism, we use the full notation, which we can precisely word as “the image of the manifold member $a$ under the analytic automorphism $\rho(\gamma)$ of $\mathcal{M}$ corresponding to the Lie group member $\gamma\in\G$ under the homomorphism $\rho$”.

We can further say the following about vector fields of the form $\mathscr{X}(a) = \left.\d_\tau \lambda_a\left(e^{X\,\tau}\,a\right)\right|_{\tau=0}$ for some $X\in\g$, where $\g$ is the Lie algebra of $\G$; they are rather special:

  1. The exponentiation of these vector fields to get flow lines of the form $e^{X\,\tau}\,a = \exp(\tau\,\mathscr{X})\,a$ is defined for all $\tau\in\R$, unlike the exponentiation of a general vector field, whose definition one cannot in general broaden beyond the local chart it is defined in, as we said above;
  2. If the homomorphism is an isomorphism, there is precisely one such vector field $\mathscr{X}(a)$ on $\mathcal{M}$ for each $X\in\g$; as with the left-invariant vector fields in the Lie group, the correspondence between $\g$ and these special vector fields is a bijection;
  3. If we (i) begin with the Lie group members $e^{\tau\,X},\,e^{\varsigma\,Y}\in\G$ for $X,\,Y\in\g$, then (ii) calculate the corresponding vector fields in $\mathcal{M}$ as $\mathscr{X}(a) = \left.\d_\tau \lambda_a\left(e^{\tau\,X}\,a\right)\right|_{\tau=0}$, $\mathscr{Y}(a) = \left.\d_\tau \lambda_a\left(e^{\tau\,Y}\,a\right)\right|_{\tau=0}$ for every $a\in\mathcal{M}$ and lastly (iii) exponentiate these vector fields in the sense of $\eqref{GeneralManifoldDE}$ to get the differential operators $\exp(\tau\,\mathscr{X}),\,\exp(\varsigma\,\mathscr{Y})$, then, by dint of the homomorphism / isomorphism $\rho:\G\to\mathscr{S}(\mathcal{M})$ between $\G$ and $\mathscr{S}(\mathcal{M})$, the action of the differential operator $\exp(\tau\,\mathscr{X})\, \exp(\varsigma\,\mathscr{Y})$ defined by the operator composition product on any analytic function $\varphi:\mathcal{M}\to\R$ (where $|\tau|, \,|\varsigma| < \epsilon$ for some $\epsilon>0$ so that the operator’s action on $\varphi$ converges) must be the same as the action $\varphi(e^{\tau\,X}\,e^{\varsigma\,Y}\,a)$ of the group member $e^{\tau\,X}\,e^{\varsigma\,Y}\in\G$ defined by the Lie group product on that same function $\varphi:\mathcal{M}\to\R$, i.e. \begin{equation}\label{LieGroupAndManifoldAutomorphismEquivalence}\varphi(e^{\tau\,X}\,e^{\varsigma\,Y}\,a) = \left.\exp(\tau\,\mathscr{X}(p))\, \exp(\varsigma\,\mathscr{Y}(p))\,\varphi(p)\right|_{p=a};\;|\tau|, \,|\varsigma| < \epsilon\text{ for some }\epsilon>0\end{equation} with the right hand side being interpreted as a convergent series of terms comprising iterated differential operators $\mathscr{X}^k \mathscr{Y}^j$ acting on the analytic function $\varphi$ at $a$.
     The set of differential operators $\{\exp(\tau\,\mathscr{X})|\;\tau\in\R\}$ with the operation of function composition is a genuine one-parameter group of transformations of the manifold $\mathcal{M}$ and, since there must be a definition of $\exp(\tau\,\mathscr{X}),\, \exp(\varsigma\,\mathscr{Y})$ for all $\tau,\,\varsigma\in\R$ as in point 1 above, when we broaden the definitions in this way, $\eqref{LieGroupAndManifoldAutomorphismEquivalence}$ must hold for all $\tau,\,\varsigma\in\R$ by dint of the homomorphism / isomorphism $\rho:\G\to\mathscr{S}(\mathcal{M})$ between $\G$ and $\mathscr{S}(\mathcal{M})$.
  4. In particular, as in Lemma 11.8, the vector field $\mathscr{W}_\varsigma$ on the manifold $\mathcal{M}$ defined by $\exp(\tau\,\mathscr{W}_\varsigma)=\exp(\varsigma\,\mathscr{X})\, \exp(\tau\,\mathscr{Y})\, \exp(-\varsigma\,\mathscr{X})$ is the one defined by $\mathscr{W}_\varsigma(a)=\left.\d_\tau \lambda_a\left(\exp\left(\tau\,e^{\varsigma\,\ad(X)}\,Y\right)\,a\right)\right|_{\tau=0}$. Furthermore, by the linear operation and limit argument in Lemma 11.8, the vector field $\mathscr{Z}=\left[\mathscr{X},\,\mathscr{Y}\right]$ fulfills the following: \begin{equation}\label{LieGroupSpoorInActedOnManifold}\mathscr{Z}(a)=\left[\mathscr{X}(a),\,\mathscr{Y}(a)\right] = \left.\d_\tau \lambda_a\left(e^{\tau\,\left[X,\,Y\right]}\,a\right)\right|_{\tau=0}\end{equation} that is, the special vector fields of the form $\mathscr{X}(a) = \left.\d_\tau \lambda_a\left(e^{X\,\tau}\,a\right)\right|_{\tau=0}$ for some $X\in\g$ form a Lie algebra isomorphic (both as a Lie algebra and as a linear space) to the Lie algebra $\g$ of the group. If the homogeneous space is defined by a homomorphism $\rho$ with nontrivial kernel, the Lie algebra of the vector fields is defined by the corresponding Lie algebra homomorphism $\d\rho$, the Lie Map of $\rho$.

This discussion leads to a result worth writing down formally.

Theorem 11.13: (Relationship Between Lie Algebras of Lie Groups and of Vector Fields in Target Manifolds in a Homogeneous Space)

Let $(\G,\,\mathcal{M},\,\rho)$ be a Lie group homogeneous space, $\mathscr{S}(\mathcal{M}) = \rho(\G)$ the subgroup of the group of analytic automorphisms of $\mathcal{M}$ in homomorphic correspondence with $\G$, $\g$ the Lie algebra of $\G$ and $\{\hat{X}_k\}_{k=1}^N$ a $\g$-basis. Then the set of vector fields $\{\hat{\mathscr{X}}_k\}_{k=1}^M$ (here $M\leq N$) on the target manifold $\mathcal{M}$ induced by $\rho$ through $\rho:\G\to\mathscr{S}(\mathcal{M})$ as:

\begin{equation}\label{LieAlgebraSpoorOnManifoldTheorem_1}\hat{\mathscr{X}}_k(a) = \left.\d_\tau \lambda_a\left(\rho\left(e^{\hat{X}_k\,\tau}\right)(a)\right)\right|_{\tau=0}\end{equation}

is a basis for a Lie algebra $\mathscr{g}$ homomorphic to $\g$, with the Lie map of $\rho$ being the continuous (thus analytic, by the argument given in Theorem 7.12) $M\times N$ linear operator $\d\rho:\g\to\mathscr{g}$ that respects Lie brackets.$\quad\square$

Informally: the Lie group $\G$ leaves a “spoor” of its Lie algebra $\g$ on the manifold $\mathcal{M}$ through the vector fields it induces through $\rho$. If $\ker(\rho)=\{\id\}$, i.e. $\rho$ is an isomorphism, then the Lie algebras are isomorphic through the Lie map $\d\rho$ (nonsingular when thought of as an $N\times N$ matrix). The image $\rho(\G)$ of $\G$ under $\rho$, a subgroup of the group of analytic automorphisms of $\mathcal{M}$, is made into a Lie group in the natural way described by Theorem 7.21. We can cut to the chase and replace $\G$ by the Lie group $\G/\ker(\rho)$ in the above and say that this subgroup of analytic automorphisms is $\G$ whence $M=N$ and the Lie map $\d\rho = \id$. In this case, the Lie group $\G$ leaves an unambiguous (wholly, uniquely defining) “spoor” of its Lie algebra $\g$ on the manifold $\mathcal{M}$ through the vector fields it induces through $\rho$.

I emphasise yet again that all of the above inferences are made assuming that there is a finite dimensional Lie group action on a particular manifold. They do not prove that there is one to begin with. Most of the inferences flow straight from the postulate of the homomorphism $\rho$: all the gory details about how this brings about an action of the group on the manifold are abstracted away by its presence and so what’s going on in the mechanics of a homogeneous space can indeed seem somewhat hidden and mysterious – or at least it did so to me at first reading. If software development is at all wonted to you, the homomorphism $\rho$ rather reminds me of the software architecture notion of a marshal and of marshalling communication of data structures across an interface between two fundamentally different software environments. Typically these “fundamentally different environments” are dynamic libraries compiled from different high level languages. The marshal takes care of all the details of re-arranging the internal representation of a given data structure sent through the interface, which structure may be compiled into quite different allocation topologies in physical memory. The software architect simply takes the marshal’s being for granted, or at the very most defines conversions – marshalling – that need to be done if they are not standard. Here the Lie group, especially as I have developed the notion from my axioms, can at first glance seem very “incompatible” with the manifold it acts on and its group action can be rather complicated and far from obvious. We simply defer the details of that function to our “marshal” $\rho$ and leave it at that. Later, when one works through specific examples, one has to think about the workings of our “marshal”.

How shall we ken the above situation, i.e. that a Lie group acts on a manifold to form a homogeneous space? The following gives at least one way to do this.

Theorem 11.14: (Finite Dimensional Lie Group of Manifold Symmetries)

Let $\mathcal{M}$ be an analytic manifold and $\{\hat{\mathscr{X}}_k\}_{k=1}^N$ a finite set of vector fields with the following properties:

  1. $\{\hat{\mathscr{X}}_k\}_{k=1}^N$ is a basis for a finite dimensional Lie algebra $\g$, i.e. the linear space $\g$ generated as the set of all linear combinations of the basis vectors $\{\hat{\mathscr{X}}_k\}_{k=1}^N$ is closed under the Lie bracket of vector fields of Definition 11.3;
  2. The exponentiation $\exp(\tau\,\hat{\mathscr{X}}_k)\,a$ is defined for all $\tau\in\R$ and all $a\in\mathcal{M}$ and thus defines an analytic automorphism of $\mathcal{M}$ for all $\tau\in\R$.

Then the group $\G=\left\{\left.\prod\limits_{k=1}^M\,\exp\left(\tau_k\,\hat{\mathscr{X}}_{j(k)}\right)\right|\;\tau_k\in\R;\;M\in\mathbb{N};\,j(k)\in 1,\,\cdots,\,N\right\}$ together with the operation of automorphism function composition is a finite dimensional, connected Lie group.

Proof:


We consider the set $\G=\left\{\left.\prod\limits_{k=1}^M\,\exp\left(\tau_k\,\hat{\mathscr{X}}_{j(k)}\right)\right|\;\tau_k\in\R;\;M\in\mathbb{N};\,j(k)\in 1,\,\cdots,\,N\right\}$, thought of as differential operators on $\mathcal{M}$ which are convergent for some open ball $\mathcal{B}_\phi$ in $\R\times\mathcal{M}$ when imparted to any given analytic function $\phi$. By assumption, they can be meaningfully extended to be defined for all real $\tau_k$. $\G$ is clearly a group, for all products of members are of the form just stated and inverses are of the form $\prod\limits_{k=1}^M\,\exp\left(-\tau_{M+1-k}\,\hat{\mathscr{X}}_{j(M+1-k)}\right)$. It is clearly a connected group; every element is connected to the identity by the path $[0,\,1]\to\G;\;\tau\mapsto \prod\limits_{k=1}^M\,\exp\left(\tau\,\tau_k\,\hat{\mathscr{X}}_{j(k)}\right)$.

We can now apply the Campbell Baker Hausdorff Theorem because, as differential operators, $\exp(\mathscr{X}),\,\exp(\mathscr{Y})$ are given by the series $\sum\limits_{k=0}^\infty\,\mathscr{X}^k/k!,\,\sum\limits_{k=0}^\infty\,\mathscr{Y}^k/k!$ so that the formal series for $\log(\exp(\mathscr{X})\,\exp(\mathscr{Y}))$ has the same terms as though $\mathscr{X},\,\mathscr{Y}$ were finite dimensional square matrices and we know that $\exp(\mathscr{X})\,\exp(\mathscr{Y})$ converges for some open ball $\mathcal{B}_\phi$ in $\R\times\mathcal{M}$ when imparted to any given analytic function $\phi$. Moreover, we know this formal series for $\log(\exp(\mathscr{X})\,\exp(\mathscr{Y}))$ is convergent as follows: the co-ordinates $\log(\exp(\mathscr{X})\,\exp(\mathscr{Y}))$ for the product $\exp(\mathscr{X})\,\exp(\mathscr{Y})$ of automorphisms are uniquely defined by the convergent Campbell Baker Hausdorff series as long as $\mathscr{X},\,\mathscr{Y} \in \mathcal{N} =\{X\in\g|\;\left\|\ad(X)\right\|<\epsilon\}\subset\g$ and we choose $\epsilon$ small enough for convergence for all $\mathscr{X},\,\mathscr{Y}\in\mathcal{N}$. The unique co-efficients in the series and convergence radius depend wholly and only on the Lie algebra’s structure co-efficients, i.e. on the set $\left\{ \ad_{X_j } \right\}_{j=1}^N $ of square $N\times N$ matrices that abstractly define the Lie algebra $\g$. The fact that $\mathscr{X},\, \mathscr{Y}$ are differential operators defined on a manifold $\mathcal{M}$ seemingly unrelated to any Lie group is altogether irrelevant. Therefore, the path $\sigma:\left[-\epsilon_1,\,\epsilon_1\right]\to\G;\; \sigma(\tau) = \prod\limits_{k=1}^M\,\exp\left(\tau_k(\tau)\,\hat{\mathscr{X}}_k\right)$ for some $\epsilon_1>0$ traces a $C^1$ path through $\exp(\mathcal{N})\subset\exp(\g)$ and the product of any two such paths is again of the same form (only for a bigger number $M$ of product members) and is $C^1$ for $|\tau|<\epsilon_2$, for some $\epsilon_2>0$.

So if we now put $\Nid = \{\gamma\in\G|\;\gamma=\exp(\mathscr{X});\;\mathscr{X}\in\g;\;\left\|\mathscr{X}\right\|<\epsilon\}$ with $\epsilon$ small enough that the Campbell Baker Hausdorff Series $\log(\exp(\mathscr{X})\,\exp(\mathscr{Y}))$ converges for all $\mathscr{X},\,\mathscr{Y}\in \log(\Nid)$, then all of our Lie group axioms of Chapter 1 are fulfilled. By the argument of Theorem 3.9, every member of $\exp(\g)$ is a finite product of terms of the form $\exp\left(\tau_k\,\hat{\mathscr{X}}_k\right)$, i.e. $\exp(\g)\subset\G$, and so $\G$ is a connected Lie group with Lie algebra $\g$. $\quad\square$
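
A minimal worked instance may help, assuming $\mathcal{M}=\R$ with global co-ordinate $x$ and taking the bracket of Definition 11.3 to be the operator commutator (both choices are made here only for illustration). Take $\hat{\mathscr{X}}_1 = \partial_x$ and $\hat{\mathscr{X}}_2 = x\,\partial_x$. Then

\begin{equation}\left[\partial_x,\,x\,\partial_x\right] = \partial_x;\quad\exp(\tau\,\partial_x)\,x = x+\tau;\quad\exp(\tau\,x\,\partial_x)\,x = e^\tau\,x\end{equation}

so both conditions of the theorem hold and the group generated is that of the maps $x\mapsto e^{\tau_2}\,x+\tau_1$, the two dimensional group of orientation-preserving affine maps of the line. By contrast, if we add $\hat{\mathscr{X}}_3 = x^2\,\partial_x$ the algebra is still closed, since $\left[\partial_x,\,x^2\,\partial_x\right] = 2\,x\,\partial_x$ and $\left[x\,\partial_x,\,x^2\,\partial_x\right] = x^2\,\partial_x$, but condition 2 fails: $\exp(\tau\,x^2\,\partial_x)\,x = x/(1-\tau\,x)$ is singular at $x=1/\tau$, so the theorem does not apply to these three fields on $\R$.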

Let’s end with a concrete example.

Example 11.15: (Vector Fields on a 2-Sphere)

Let $\mathbb{S}^2$ be the unit sphere, labelled by spherical co-ordinates $\theta,\,\phi$, i.e. the Cartesian co-ordinates of the point $(\theta,\,\phi)$ are $x = \sin\theta\,\cos\phi;\,y=\sin\theta\,\sin\phi;\,z=\cos\theta$. Let us define two vector fields:

\begin{equation}\label{SphereFieldExample_1}\Theta = \frac{\partial}{\partial\,\theta};\quad\Phi = \frac{\partial}{\partial\,\phi}\end{equation}

These fields, naturally, commute: $\left[\Theta,\,\Phi\right] = 0$ since $\frac{\partial^2\,\psi}{\partial\,\theta\,\partial\,\phi} = \frac{\partial^2\,\psi}{\partial\,\phi\,\partial\,\theta}$ for any $C^2$ function $\psi:\mathbb{S}^2\to\R$.

We can exponentiate $\Theta$ to the mapping $\exp(\tau\,\Theta)$, which is defined by $\exp(\tau\,\Theta)\,(\theta,\,\phi) =(\theta+\tau,\,\phi)$, i.e. we simply add $\tau$ to the colatitude co-ordinate $\theta$. Likewise, $\exp(\tau\,\Phi)\,(\theta,\,\phi) =(\theta,\,\phi+\tau)$. In this case, $\exp(\tau\,\Theta)$ is not a bijection: every point on the equator defined by $\theta=\pi/2$, for example, is mapped to the same, critical point defined by $\theta=\pi$ by $\exp(\pi\,\Theta/2)$. On the other hand, $\exp(\tau\,\Phi)$ is a bijection of the sphere: it is a rigid rotation about the $z$-axis through an angle $\tau$ radians. Therefore, although point 1. in Theorem 11.14 is fulfilled, point 2. is not. So, even though there is a two dimensional commutative Lie algebra of vector fields, it does not correspond to a commutative two dimensional Lie group of symmetries of the sphere. The one parameter group defined by all transformations of the form $\exp(\tau\,\Phi)$ for $\tau\in\R$ on its own, however, does fulfil points 1. and 2. of Theorem 11.14, and the symmetries concerned are rotations of the sphere about a fixed axis.

However the two vector fields $\Phi = \partial_\phi$ and $\tilde{\Theta} = \sin\theta\,\partial_\theta$ do commute and, moreover, $\tilde{\Theta}$ exponentiates to an analytic bijection of the sphere. The exponentiation of course acts on the $\theta$ co-ordinate only, as it is the solution to $\d_\tau \theta(\tau) = \sin\theta(\tau)$, to wit, $\theta(\tau) = 2\,\arctan\left(e^\tau\,\tan\left(\frac{\theta(0)}{2}\right)\right)$, as one checks by noting that $\d_\tau \tan(\theta/2) = \tan(\theta/2)$ along the flow. This is a locally analytic bijection (i.e. there is a convergent Taylor series for some neighbourhood around the current $\theta$ co-ordinate and time $\tau$) between beginning values $\theta(0)$ and image values $\theta(\tau)$ at any time $\tau$. Indeed the analytic automorphism defined by $(\phi(0),\, \theta(0)) \mapsto \left(\phi(0),\,2\,\arctan\left(e^\tau\,\tan\left(\frac{\theta(0)}{2}\right)\right)\right)$ is precisely the transformation wrought on the Riemann sphere by the scaling of the complex planes $z\,\mapsto\, e^{-\tau}\,z$, $\zeta\,\mapsto\,e^{\tau}\,\zeta$ where $z$ and $\zeta$ are the complex co-ordinates in the Euclidean charts gotten through stereographic projection of the unit sphere to $\mathbb{C}$ from the North and South poles, respectively (so that $|z| = \cot(\theta/2)$ and $|\zeta| = \tan(\theta/2)$). This analytic automorphism is not a rigid isometry of the sphere, but it is an invertible analytic transformation of it nonetheless and so is a “symmetry” in the more general sense of being an analytic automorphism; the everyday, wonted English usage of the word “symmetry” tending to bear the meaning “invariant with respect to an isometric, analytic automorphism”.
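As a sanity check on the flow of $\tilde{\Theta}$, here is a hedged numerical sketch. It integrates $\d_\tau \theta = \sin\theta$ with a classical Runge-Kutta scheme and compares the outcome with the closed form $\tan(\theta(\tau)/2) = e^\tau\,\tan(\theta(0)/2)$. The step count and the sample values are arbitrary choices of mine, not part of the argument.

```python
import math

def closed_form(theta0, tau):
    """Solution of d(theta)/d(tau) = sin(theta): tan(theta/2) is scaled by e^tau."""
    return 2.0 * math.atan(math.exp(tau) * math.tan(theta0 / 2.0))

def rk4_flow(theta0, tau, steps=2000):
    """Integrate d(theta)/d(tau) = sin(theta) from 0 to tau by classical Runge-Kutta."""
    h = tau / steps
    theta = theta0
    for _ in range(steps):
        k1 = math.sin(theta)
        k2 = math.sin(theta + 0.5 * h * k1)
        k3 = math.sin(theta + 0.5 * h * k2)
        k4 = math.sin(theta + h * k3)
        theta += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return theta

# Illustrative (theta(0), tau) pairs with 0 < theta(0) < pi.
samples = [(0.3, 1.0), (1.2, -0.7), (2.5, 2.0)]

# Largest disagreement between the numerical flow and the closed form.
max_error = max(abs(closed_form(t0, tau) - rk4_flow(t0, tau)) for t0, tau in samples)

# Flowing forwards by tau and then backwards by tau should undo the map.
round_trip = max(abs(closed_form(closed_form(t0, tau), -tau) - t0) for t0, tau in samples)

print(max_error, round_trip)
```

Both numbers should be tiny; the second one illustrates that $\exp(-\tau\,\tilde{\Theta})$ inverts $\exp(\tau\,\tilde{\Theta})$ on the open interval $0<\theta<\pi$, with the poles as fixed points.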

Now we consider the vector fields on the sphere induced by the rotations about three orthogonal axes. The flowlines of these fields are shown in Figure 11.3.

Figure 11.3: Vector Fields on the 2-Sphere $\mathbb{S}^2$ Induced by Rotations About Three Orthogonal Axes

The blue flowlines are induced by rotations about the Cartesian $z$ axis. In our spherical co-ordinates, this is the vector field $\Phi$; let us henceforth change its name to $\mathscr{S}_z = \partial_\phi$. The vector fields $\mathscr{S}_x,\,\mathscr{S}_y$ induced, respectively, by rotations about the $x$ and $y$ axes in our spherical polar co-ordinates follow the red and green flowlines in Figure 11.3. The obvious way to find the descriptions of $\mathscr{S}_x,\,\mathscr{S}_y$ is to find the transformations between the spherical co-ordinates with different polar axes, but I shall first describe what seems to be the simplest way as follows. Rotations about the $x$ axis induce flowlines that are circles of constant $x = \sin\theta\,\cos\phi$, so that a vector field $A\,\partial_\theta + B\,\partial_\phi$ tangent to these flowlines must annihilate $x$:

\begin{equation}\label{SphereFieldExample_2}\begin{array}{cl}&\left(A(\theta,\,\phi)\,\frac{\partial}{\partial\,\theta} + B(\theta,\,\phi)\,\frac{\partial}{\partial\,\phi}\right)\,x(\theta,\,\phi) \\\\= &\left(A(\theta,\,\phi)\,\frac{\partial}{\partial\,\theta} + B(\theta,\,\phi)\,\frac{\partial}{\partial\,\phi}\right)\, \sin\theta\,\cos\phi \\\\=&A(\theta,\,\phi)\,\cos\theta\,\cos\phi - B(\theta,\,\phi)\,\sin\theta\,\sin\phi = 0\end{array}\end{equation}

and when we do likewise for the rotation about the $y$ axis we conclude that the two vector fields are of the form:

\begin{equation}\label{SphereFieldExample_3}\begin{array}{lclclcl}\mathscr{S}_x &=& f_x(\theta,\,\phi)\, \tilde{\mathscr{S}}_x;&\tilde{\mathscr{S}}_x &=& \sin\theta\,\sin\phi\,\frac{\partial}{\partial\,\theta}+\cos\theta\,\cos\phi\,\frac{\partial}{\partial\,\phi}\\\mathscr{S}_y &=& f_y(\theta,\,\phi)\, \tilde{\mathscr{S}}_y;&\tilde{\mathscr{S}}_y &=& \sin\theta\,\cos\phi\,\frac{\partial}{\partial\,\theta}-\cos\theta\,\sin\phi\,\frac{\partial}{\partial\,\phi}\end{array}\end{equation}

Here the functions $f_x(\theta,\,\phi)$ and $f_y(\theta,\,\phi)$ are arbitrary local scaling factors for the vector fields yet to be found. If we set both these functions to unity and compute the Lie bracket between $\tilde{\mathscr{S}}_x$ and $\tilde{\mathscr{S}}_y$ we get:

\begin{equation}\label{SphereFieldExample_4}\left[\tilde{\mathscr{S}}_x,\,\tilde{\mathscr{S}}_y\right] = -\cos(2\,\theta)\,\mathscr{S}_z\end{equation}

so the vector fields don’t form a Lie algebra yet; we need to find the factors $f_x(\theta,\,\phi),\,f_y(\theta,\,\phi)$ to achieve this. The following general identity is readily shown by the Leibniz rule:

\begin{equation}\label{SphereFieldExample_5}\left[\mathscr{S}_x,\,\mathscr{S}_y\right] = f_x\,f_y\,\left[\tilde{\mathscr{S}}_x,\,\tilde{\mathscr{S}}_y\right] + (f_x\,\tilde{\mathscr{S}}_x(f_y))\, \tilde{\mathscr{S}}_y - (f_y\,\tilde{\mathscr{S}}_y(f_x))\, \tilde{\mathscr{S}}_x\end{equation}

which can be used to show that we can make $\left[\mathscr{S}_x,\,\mathscr{S}_y\right]=\mathscr{S}_z$ if $f_x(\theta,\,\phi) = f_y(\theta,\,\phi) = f(\theta)$ where $-f(\theta)^2\,\cos(2\,\theta) - \frac{1}{2}\,\sin(2\,\theta)\,f(\theta)\,\d_\theta\,f(\theta) = 1$ (the second term arises because $\sin\phi\,\tilde{\mathscr{S}}_y - \cos\phi\,\tilde{\mathscr{S}}_x = -\cos\theta\,\partial_\phi$), a differential equation with general solution $f(\theta) = \sqrt{\csc(2 \theta)\,(2 \cot(2 \theta) + C\, \csc(2 \theta)) }$ (here $C\in\R$ is an arbitrary integration constant) and which yields the simplest vector fields if $C=2$, i.e. $f(\theta) = \csc\theta$. When we put $f_x=f_y=f$ we at last find our vector fields:

\begin{equation}\label{SphereFieldExample_6}\begin{array}{lcl}\mathscr{S}_x &=& \sin\phi\,\partial_\theta + \cot\theta\,\cos\phi\,\partial_\phi\\\mathscr{S}_y &=& \cos\phi\,\partial_\theta - \cot\theta\,\sin\phi\,\partial_\phi\\\mathscr{S}_z &=& \partial_\phi\end{array}\end{equation}

and these, unsurprisingly, have the commutation relationships of $\mathfrak{so}(3)$:

\begin{equation}\label{SphereFieldExample_7}\left[\mathscr{S}_x,\,\mathscr{S}_y\right]=\mathscr{S}_z;\quad\left[\mathscr{S}_z,\,\mathscr{S}_x\right]=\mathscr{S}_y;\quad\left[\mathscr{S}_y,\,\mathscr{S}_z\right]=\mathscr{S}_x\end{equation}

as we must get since we know that $SO(3)$ acts on the sphere by rigid rotations.
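For readers without a symbolic manipulation system to hand, the brackets above can at least be spot-checked numerically. The sketch below is my own illustration, not the method used in the text: it approximates $[X,\,Y]^i = X^j\,\partial_j Y^i - Y^j\,\partial_j X^i$ by central differences at one arbitrarily chosen point away from the poles. The step size, tolerance and sample point are assumptions.

```python
import math

# Each field returns its (theta, phi) components at a point.
def St_x(theta, phi):
    """Unscaled field tangent to the circles of constant x."""
    return (math.sin(theta) * math.sin(phi), math.cos(theta) * math.cos(phi))

def St_y(theta, phi):
    """Unscaled field tangent to the circles of constant y."""
    return (math.sin(theta) * math.cos(phi), -math.cos(theta) * math.sin(phi))

def S_x(theta, phi):
    return (math.sin(phi), math.cos(phi) / math.tan(theta))

def S_y(theta, phi):
    return (math.cos(phi), -math.sin(phi) / math.tan(theta))

def S_z(theta, phi):
    return (0.0, 1.0)

def bracket(X, Y, theta, phi, h=1e-5):
    """Components of the Lie bracket [X, Y] at (theta, phi) by central differences."""
    def jacobian(F):
        d_theta = [(a - b) / (2 * h) for a, b in zip(F(theta + h, phi), F(theta - h, phi))]
        d_phi = [(a - b) / (2 * h) for a, b in zip(F(theta, phi + h), F(theta, phi - h))]
        # Row i holds the derivatives of component i with respect to (theta, phi).
        return [[d_theta[0], d_phi[0]], [d_theta[1], d_phi[1]]]
    x, y = X(theta, phi), Y(theta, phi)
    JX, JY = jacobian(X), jacobian(Y)
    return tuple(sum(x[j] * JY[i][j] - y[j] * JX[i][j] for j in range(2))
                 for i in range(2))

def close(u, v, tol=1e-6):
    """True if the two component tuples agree to within tol."""
    return all(abs(a - b) < tol for a, b in zip(u, v))

theta, phi = 0.9, 0.4  # arbitrary sample point away from the poles

checks = {
    "[St_x, St_y] = -cos(2 theta) S_z":
        close(bracket(St_x, St_y, theta, phi), (0.0, -math.cos(2 * theta))),
    "[S_x, S_y] = S_z": close(bracket(S_x, S_y, theta, phi), S_z(theta, phi)),
    "[S_z, S_x] = S_y": close(bracket(S_z, S_x, theta, phi), S_y(theta, phi)),
    "[S_y, S_z] = S_x": close(bracket(S_y, S_z, theta, phi), S_x(theta, phi)),
}
for name, ok in checks.items():
    print(name, ok)
```

A check at one point proves nothing, of course, but it catches sign slips and dropped factors quickly, which is most of what goes wrong in calculations like these.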

Now for completeness, let’s do the same thing again from first principles. Be warned: this is rather messy; it is good to see exactly how fiddly these calculations wontedly are. For these kinds of calculations you would do well to have a good symbolic manipulation system such as Mathematica on hand; differential geometry and other geometry add-ons for it can be found. Maxima, a child of the famous MIT computer algebra project Macsyma, can now be downloaded as freeware; home versions of Maple and Mathematica are reasonably priced as well. In the left hand column below are the transformation equations between standard spherical polar co-ordinates and the Cartesian co-ordinates of points on the unit sphere. By cyclically permuting the roles of the $x,\,y,\,z$ axes we get the column on the right, which defines spherical polar co-ordinates $(\theta^\prime,\,\phi^\prime)$ with the $x$ axis as the polar axis.

\begin{equation}\label{SphereFieldExample_8}\begin{array}{lclclcl}x&=&\sin\theta\,\cos\phi&&x&=&\cos\theta^\prime\\y&=&\sin\theta\,\sin\phi&&y&=&\sin\theta^\prime\,\cos\phi^\prime\\z&=&\cos\theta&&z&=&\sin\theta^\prime\,\sin\phi^\prime\end{array}\end{equation}

From these we infer $\cot\phi^\prime = \tan\theta\,\sin\phi$ and $\cos\theta^\prime = \sin\theta\,\cos\phi$, whence:

\begin{equation}\label{SphereFieldExample_9}\begin{array}{lclcclcl}\frac{\partial\,\theta^\prime}{\partial\,\theta}&=&-\frac{\cos\theta\;\cos\phi}{\sqrt{\sin^2\theta\,\sin^2\phi + \cos^2\theta}}&&\frac{\partial\,\theta^\prime}{\partial\,\phi}&=&\frac{\sin\theta\;\sin\phi}{\sqrt{\sin^2\theta\,\sin^2\phi + \cos^2\theta}}\\\frac{\partial\,\phi^\prime}{\partial\,\theta}&=&-\frac{\sin\phi}{\sin^2\theta\,\sin^2\phi + \cos^2\theta}&&\frac{\partial\,\phi^\prime}{\partial\,\phi}&=&-\frac{\sin\theta\;\cos\theta\;\cos\phi}{\sin^2\theta\,\sin^2\phi + \cos^2\theta}\end{array}\end{equation}

so we can form the Jacobi matrix for transforming $(\theta,\,\phi)\mapsto(\theta^\prime,\,\phi^\prime)$ and invert it to find:

\begin{equation}\label{SphereFieldExample_10}\left(\begin{array}{cc}\frac{\partial\,\theta}{\partial\,\theta^\prime}&\frac{\partial\,\theta}{\partial\,\phi^\prime}\\\frac{\partial\,\phi}{\partial\,\theta^\prime}&\frac{\partial\,\phi}{\partial\,\phi^\prime}\end{array}\right)=\left(\begin{array}{cc}\frac{\partial\,\theta^\prime}{\partial\,\theta}&\frac{\partial\,\theta^\prime}{\partial\,\phi}\\\frac{\partial\,\phi^\prime}{\partial\,\theta}&\frac{\partial\,\phi^\prime}{\partial\,\phi}\end{array}\right)^{-1}\end{equation}

and (on shoving this into Mathematica) we find the vector field we want: the directional derivative operator in the direction of maximally increasing $\phi^\prime$ is, by the chain rule:

\begin{equation}\label{SphereFieldExample_11}\frac{\partial}{\partial\,\phi^\prime} = \frac{\partial\,\theta}{\partial\,\phi^\prime}\,\frac{\partial}{\partial\,\theta} + \frac{\partial\,\phi}{\partial\,\phi^\prime}\,\frac{\partial}{\partial\,\phi} = -\sin\phi\,\partial_\theta-\cos\phi\;\cot\theta\,\partial_\phi\end{equation}
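If you have no Mathematica to shove this into, the same inversion can be checked numerically. This is a sketch under my own assumptions (sample point, finite difference step, and the use of `atan2` to pick the branch of $\phi^\prime$): it builds the Jacobi matrix of $(\theta,\,\phi)\mapsto(\theta^\prime,\,\phi^\prime)$ by central differences, inverts it, and compares the $\partial/\partial\phi^\prime$ column with the closed form just found.

```python
import math

def primed(theta, phi):
    """Spherical co-ordinates (theta', phi') with the x axis as polar axis."""
    x = math.sin(theta) * math.cos(phi)
    y = math.sin(theta) * math.sin(phi)
    z = math.cos(theta)
    # x = cos(theta'), y = sin(theta') cos(phi'), z = sin(theta') sin(phi')
    return math.acos(x), math.atan2(z, y)

def jacobian(theta, phi, h=1e-6):
    """Rows: theta', phi'. Columns: derivatives with respect to theta, phi."""
    tp, tm = primed(theta + h, phi), primed(theta - h, phi)
    pp, pm = primed(theta, phi + h), primed(theta, phi - h)
    return [[(tp[0] - tm[0]) / (2 * h), (pp[0] - pm[0]) / (2 * h)],
            [(tp[1] - tm[1]) / (2 * h), (pp[1] - pm[1]) / (2 * h)]]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

theta, phi = 0.9, 0.4  # arbitrary sample point away from the co-ordinate singularities

inv = inverse_2x2(jacobian(theta, phi))

# Second column of the inverse: (d theta / d phi', d phi / d phi').
numeric = (inv[0][1], inv[1][1])
expected = (-math.sin(phi), -math.cos(phi) / math.tan(theta))

print(numeric, expected)
```

The two pairs should agree to many decimal places, which confirms the components $-\sin\phi$ and $-\cos\phi\,\cot\theta$ at that point.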

Now we do a second cyclic permutation of our $x,\,y,\,z$ as in $\eqref{SphereFieldExample_8}$ to define a spherical co-ordinate system with the $y$ axis as the polar axis:

\begin{equation}\label{SphereFieldExample_12}\begin{array}{lclclcl}x&=&\sin\theta\,\cos\phi&&x&=&\sin\theta^\prime\,\sin\phi^\prime\\y&=&\sin\theta\,\sin\phi&&y&=&\cos\theta^\prime\\z&=&\cos\theta&&z&=&\sin\theta^\prime\,\cos\phi^\prime\end{array}\end{equation}

whence $\tan\phi^\prime = \tan\theta\,\cos\phi$ and $\cos\theta^\prime = \sin\theta\,\sin\phi$, and:

\begin{equation}\label{SphereFieldExample_13}\begin{array}{lclcclcl}\frac{\partial\,\theta^\prime}{\partial\,\theta}&=&-\frac{\cos\theta\;\sin\phi}{\sqrt{\sin^2\theta\,\cos^2\phi + \cos^2\theta}}&&\frac{\partial\,\theta^\prime}{\partial\,\phi}&=&-\frac{\sin\theta\;\cos\phi}{\sqrt{\sin^2\theta\,\cos^2\phi + \cos^2\theta}}\\\frac{\partial\,\phi^\prime}{\partial\,\theta}&=&\frac{\cos\phi}{\sin^2\theta\,\cos^2\phi + \cos^2\theta}&&\frac{\partial\,\phi^\prime}{\partial\,\phi}&=&-\frac{\sin\theta\;\cos\theta\;\sin\phi}{\sin^2\theta\,\cos^2\phi + \cos^2\theta}\end{array}\end{equation}

and so, on inverting the Jacobi matrix as above we find our second vector field:

\begin{equation}\label{SphereFieldExample_14}\frac{\partial}{\partial\,\phi^\prime} = \frac{\partial\,\theta}{\partial\,\phi^\prime}\,\frac{\partial}{\partial\,\theta} + \frac{\partial\,\phi}{\partial\,\phi^\prime}\,\frac{\partial}{\partial\,\phi} = \cos\phi\,\partial_\theta-\sin\phi\;\cot\theta\,\partial_\phi\end{equation}

and, on making the needful sign change of the vector field in $\eqref{SphereFieldExample_11}$, we find that the vector fields in $\eqref{SphereFieldExample_11}$ and $\eqref{SphereFieldExample_14}$, together with $\partial_\phi$, are the vector fields of $\eqref{SphereFieldExample_6}$, with the $\mathfrak{so}(3)$ commutation relationships of $\eqref{SphereFieldExample_7}$.
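One further check, offered as a sketch rather than as part of the derivation: pushing the three fields forward to Cartesian co-ordinates with the exact partial derivatives of the embedding $(\theta,\,\phi)\mapsto(x,\,y,\,z)$ should give velocity fields that are linear in $(x,\,y,\,z)$ and perpendicular to the rotation axis concerned. The function names and the sample point are mine.

```python
import math

def embed(theta, phi):
    """Cartesian co-ordinates of the point (theta, phi) on the unit sphere."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def S_x(theta, phi):
    return (math.sin(phi), math.cos(phi) / math.tan(theta))

def S_y(theta, phi):
    return (math.cos(phi), -math.sin(phi) / math.tan(theta))

def S_z(theta, phi):
    return (0.0, 1.0)

def pushforward(field, theta, phi):
    """Cartesian velocity of the point (theta, phi) under the given field."""
    a, b = field(theta, phi)
    # Exact partial derivatives of the embedding with respect to theta and phi.
    d_theta = (math.cos(theta) * math.cos(phi), math.cos(theta) * math.sin(phi), -math.sin(theta))
    d_phi = (-math.sin(theta) * math.sin(phi), math.sin(theta) * math.cos(phi), 0.0)
    return tuple(a * dt + b * dp for dt, dp in zip(d_theta, d_phi))

def close(u, v, tol=1e-12):
    """True if the two component tuples agree to within tol."""
    return all(abs(p - q) < tol for p, q in zip(u, v))

theta, phi = 0.9, 0.4  # arbitrary sample point away from the poles
x, y, z = embed(theta, phi)

results = {
    "S_x -> (0, z, -y)": close(pushforward(S_x, theta, phi), (0.0, z, -y)),
    "S_y -> (z, 0, -x)": close(pushforward(S_y, theta, phi), (z, 0.0, -x)),
    "S_z -> (-y, x, 0)": close(pushforward(S_z, theta, phi), (-y, x, 0.0)),
}
for name, ok in results.items():
    print(name, ok)
```

Each Cartesian velocity has a vanishing component along its own axis and is orthogonal to the position vector, as befits an infinitesimal rigid rotation of the sphere.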

References:

  1. Wulf Rossmann: “Lie Groups: An Introduction through Linear Groups (Oxford Graduate Texts in Mathematics)”
  2. H. L. Royden, “Real Analysis”, Macmillan Publishing, New York, 1968; §14.3, p310
  3. F. Riesz, “Sur une espèce de géométrie analytique des systèmes de fonctions sommables”, C. R. Acad. Sci. Paris 144 (1907) pp1409–1411
  4. Planet Math, Proof of the Riesz Representation Theorem, http://planetmath.org/proofofrieszrepresentationtheoremforseparablehilbertspaces
  5. nLab.org, Article on “Differentiation”, section “Exposition of Differentiation Via Infinitesimals”, http://ncatlab.org/nlab/show/differentiation#ExpositionDifferentiationViaInfinitesimal
  6. nLab.org, Article on “Synthetic Differential Geometry”, http://ncatlab.org/nlab/show/synthetic%20differential%20geometry
  7. H. Jerome Keisler, “Foundations of Infinitesimal Calculus”, 2007. Licensed under Creative Commons Licence, Download from www.math.wisc.edu/~keisler/foundations.pdf
  8. “Differential Geometry Addons For Mathematica”, http://mathematica.stackexchange.com/questions/2620/differential-geometry-add-ons-for-mathematica