Circular Polarisation Eigenstates, The Riemann-Silberstein Vector and Angular Momentum of Light

The phasor notation, i.e. the use of $e^{-i\,\omega\,t}$ instead of sinusoidal time variations, is simply a tool of convenience for analysing the time-harmonic behaviour of systems described by linear equations, such as Maxwell’s equations, electric circuits and some kinds of acoustic waves. But for Maxwell’s equations, there is a much deeper meaning of the complex notation quite aside from simple phasors. Let’s look at the simple interpretation first.

Why do Phasors Work for Maxwell’s Equations?

There are two reasons for this:

  1. Maxwell’s equations are linear. Also the operations ${\rm Re}$, ${\rm Im}$ and $z\mapsto z^*$ are linear in the sense that a sum mapped by the respective operator is the sum of the mapping of the addends by that operator.
  2. With a time-varying sinusoidal quantity, the mapping between the quantities $|a| \cos(\omega\,t-\arg(a));\,a\in\mathbb{C}, \omega\,t\in\mathbb{R}$ (or $-i\,|a| \,\sin(\omega\,t-\arg(a))$) on the one hand and $a\,e^{-i\,\omega\,t};\,a\in\mathbb{C}$ on the other is one-to-one and onto. So for every entity of the form $a\,e^{-i\,\omega\,t};\,a\in\mathbb{C}$ there is a unique $|a| \cos(\omega\,t-\arg(a))$ and contrariwise. Explicitly:$$|a| \cos(\omega\,t-\arg(a)) = {\rm Re}(a\,e^{-i\,\omega\,t})$$and, because $\omega\,t$ takes on every value in $[0,2\pi]$, one can uniquely infer $\arg(a)$, $|a|$ and $\omega$ (taken as positive) from the values of $f(t) = |a| \cos(\omega\,t-\arg(a))$ as $t$ varies. Likewise for the inversion of ${\rm Im}$.

It’s important to keep in mind that this seeming “trickery” works because $\omega\,t$ varies, so we see the variation of the real and imaginary parts with time. In contrast, taking the real or imaginary part of a lone complex number is of course irreversible: the imaginary (or real) part is lost and one cannot get the original complex number back from only its real (or imaginary) part!
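A minimal numerical sketch of this invertibility, under assumed values: the amplitude `a`, the angular frequency `omega` and the sample count are hypothetical, chosen only for illustration. Averaging $f(t)\,e^{+i\,\omega\,t}$ over one period removes the conjugate term and leaves $a/2$, whereas a single sample of $f$ cannot fix $a$.

```python
import cmath
import math

def real_signal(a, omega, t):
    """Real-valued signal carried by the phasor a: Re(a e^{-i omega t})."""
    return (a * cmath.exp(-1j * omega * t)).real

def recover_phasor(samples, omega, times):
    """Recover a from samples spread uniformly over one period.

    Averaging f(t) e^{+i omega t} over a period kills the conjugate
    (negative-frequency) term and leaves a/2.
    """
    n = len(samples)
    return 2.0 / n * sum(f * cmath.exp(1j * omega * t)
                         for f, t in zip(samples, times))

a = 1.5 * cmath.exp(0.7j)        # hypothetical complex amplitude
omega = 2.0 * math.pi * 50.0     # hypothetical angular frequency (rad/s)
period = 2.0 * math.pi / omega
n = 16                           # any n >= 3 uniform samples will do
times = [k * period / n for k in range(n)]
samples = [real_signal(a, omega, t) for t in times]

a_recovered = recover_phasor(samples, omega, times)

# A single real sample, by contrast, leaves the imaginary part of a unknown.
single_sample_guess = complex(samples[0], 0.0)

print(abs(a_recovered - a))          # tiny: the phasor is recovered
print(abs(single_sample_guess - a))  # not small: information was lost
```

The recovery step is the "single sideband" idea in miniature: the real samples contain both $a\,e^{-i\,\omega\,t}$ and its conjugate, and the projection keeps only the former.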

You can think of the above pithily as: the phasor signal is the “single sideband” (if you recall this archaic modulation scheme) version of the real-valued signal. That is, the negative frequency component of a real-valued signal is simply the complex conjugate of the positive frequency component. So, for a linear calculation, there is no need to “process” both positive and negative components: one calculates with the positive frequency component and then recovers the real signal by taking the complex conjugate of the calculation outcome, thus finding the negative frequency component, and adding it back to the “single-sideband” positive frequency component.

Phasors are simply a tool of convenience: calculations with $e^{-i\,\omega\,t}$ are easier than those with $\cos$ and $\sin$. However, in the special case of Maxwell’s equations, one can interpret the complex quantities as more than simply phasors (although the technique turns out to be the same).
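As a small illustration of why the convenience is legitimate, here is a sketch with hypothetical values of `a` and `omega`. It checks that a linear operation (here $\mathrm{d}/\mathrm{d}t$, which acts on the phasor as multiplication by $-i\,\omega$) commutes with taking the real part.

```python
import cmath

a = 2.0 * cmath.exp(-0.3j)   # hypothetical phasor amplitude
omega = 3.0                  # hypothetical angular frequency (rad/s)

def f(t):
    """The real signal Re(a e^{-i omega t})."""
    return (a * cmath.exp(-1j * omega * t)).real

def dfdt_via_phasor(t):
    """Differentiate in the phasor domain, then take the real part."""
    return (-1j * omega * a * cmath.exp(-1j * omega * t)).real

def dfdt_numeric(t, h=1e-6):
    """Central finite difference applied to the real signal itself."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

sample_times = [0.1 * k for k in range(50)]
max_error = max(abs(dfdt_via_phasor(t) - dfdt_numeric(t)) for t in sample_times)
print(max_error)  # limited only by the finite-difference step
```

The same check would fail for a nonlinear operation such as squaring, which is why the phasor shortcut is restricted to linear equations.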

The Complex Electromagnetic Field

For Maxwell’s equations in particular, there is a radically different and much more physically and geometrically intuitive way to bring in complex equivalents of the electromagnetic field, one that has a neat interpretation in terms of polarization. In practice, it ends up being used in a way very like the phasor method, even though its grounding is altogether different.

This is the idea of diagonalising the Maxwell curl equations (Faraday and Ampère laws) with the Riemann-Silberstein fields which are:

$$\vec{F}_\pm = \sqrt{\epsilon_0} \,\vec{E} \pm i\,\sqrt{\mu_0} \,\vec{H}\tag{1}$$

and which decouple the Maxwell curl equations into the following form:

$$i\, \partial_t \vec{F}_\pm = \pm c\,\nabla \wedge \vec{F}_\pm\tag{2}$$

Note that by taking the divergence of both sides of (2) we get $i\, \partial_t\, \nabla\cdot\vec{F}_\pm =0$ (the divergence of a curl vanishes), so that if the fields are time varying and have no DC (zero frequency) component (i.e. $\partial_t$ is invertible), (2) also implies the Gauss laws $\nabla\cdot\vec{F}_\pm=0$.
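For completeness, a short check of how (2) arises from definition (1). It assumes the source-free vacuum curl equations in SI form, $\nabla\wedge\vec{E} = -\mu_0\,\partial_t\vec{H}$ and $\nabla\wedge\vec{H} = \epsilon_0\,\partial_t\vec{E}$, with $c = 1/\sqrt{\epsilon_0\,\mu_0}$ (standard forms, not written out above):

$$\begin{array}{lcl}
\nabla\wedge\vec{F}_\pm &=& \sqrt{\epsilon_0}\,\nabla\wedge\vec{E} \pm i\,\sqrt{\mu_0}\,\nabla\wedge\vec{H}\\
&=& -\mu_0\,\sqrt{\epsilon_0}\,\partial_t\vec{H} \pm i\,\epsilon_0\,\sqrt{\mu_0}\,\partial_t\vec{E}\\
&=& \pm\frac{i}{c}\,\partial_t\left(\sqrt{\epsilon_0}\,\vec{E} \pm i\,\sqrt{\mu_0}\,\vec{H}\right) = \pm\frac{i}{c}\,\partial_t\vec{F}_\pm
\end{array}$$

Multiplying through by $\pm c$ gives $i\,\partial_t\vec{F}_\pm = \pm c\,\nabla\wedge\vec{F}_\pm$, so each of $\vec{F}_+$ and $\vec{F}_-$ evolves without reference to the other.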

Now one could simply sit with real electric and magnetic fields and one would need only one complex Riemann-Silberstein vector (either of $\vec{F}_\pm$ will do just as well as the other) to stand in the stead of two real fields and then the real valued curl equations are replaced by one complex-valued one. So one would interpret the real part as the field $\sqrt{\epsilon_0} \,\vec{E}$ and the imaginary part as $\pm\sqrt{\mu_0} \,\vec{H}$ (depending on whether $\vec{F}_\pm$ were used) at the end of the calculation.

However, it turns out to be more physically meaningful to keep both vectors, but to throw away their negative frequency parts and keep the positive frequency parts alone of both vectors. What’s really neat about this second approach is that if the light is right circularly polarized, only $\vec{F}_+$ is nonzero; if left, only $\vec{F}_-$ is nonzero. So the positive frequency parts of the electromagnetic fields are decoupled precisely by splitting them into left and right circularly polarized components.
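A quick numerical sketch of this decoupling for a plane wave travelling along $+z$. The assumptions here are mine: units with $\epsilon_0=\mu_0=1$, an $e^{-i\,\omega\,t}$ time convention, and a label `helicity` $=\pm1$ for the two circular states. Which of the two is called "right" depends on the handedness convention in use.

```python
import math

def cross(u, v):
    """Cross product of two 3-vectors (entries may be complex)."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    return math.sqrt(sum(abs(c) ** 2 for c in u))

def riemann_silberstein(E, H, eps0=1.0, mu0=1.0):
    """F_plus and F_minus as in definition (1)."""
    se, sm = math.sqrt(eps0), math.sqrt(mu0)
    f_plus = tuple(se * e + 1j * sm * h for e, h in zip(E, H))
    f_minus = tuple(se * e - 1j * sm * h for e, h in zip(E, H))
    return f_plus, f_minus

def plane_wave_amplitudes(helicity):
    """Positive-frequency amplitudes of a circular plane wave along +z."""
    E = (1.0, 1j * helicity, 0.0)
    z_hat = (0.0, 0.0, 1.0)
    H = cross(z_hat, E)  # H = z_hat x E / Z0, with Z0 = 1 in these units
    return E, H

fp_pos, fm_pos = riemann_silberstein(*plane_wave_amplitudes(+1))
fp_neg, fm_neg = riemann_silberstein(*plane_wave_amplitudes(-1))

print(norm(fp_pos), norm(fm_pos))  # only F_plus survives for helicity +1
print(norm(fp_neg), norm(fm_neg))  # only F_minus survives for helicity -1
```

A linearly polarized amplitude such as `E = (1.0, 0.0, 0.0)` would instead give equal norms for both vectors.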

In momentum (Fourier) space, the decoupled Maxwell equations become (taking the spatial, not the temporal, Fourier transform of both sides):

$$\mathrm{d}_t \tilde{\mathbf{F}}_{\pm,\mathbf{k}} = \pm c\,\mathbf{k} \wedge \tilde{\mathbf{F}}_{\pm,\mathbf{k}}\tag{3}$$

or, in matrix notation:

$$\mathrm{d}_t \tilde{\mathbf{F}}_{\pm,\mathbf{k}} = \pm c\,\mathbf{K}(\mathbf{k}) \tilde{\mathbf{F}}_{\pm,\mathbf{k}}\tag{4}$$

where $\mathbf{K}(\mathbf{k})$ is the $3\times3$ real skew-symmetric (hence skew-Hermitian) matrix corresponding to $\mathbf{k} \wedge$, i.e. the “infinitesimal” rotation in the Lie algebra $\mathfrak{so}(3)$. The basic solutions are $\tilde{\mathbf{F}}_{\pm,\mathbf{k}}(t) = \exp(\pm c\, \mathbf{K}(\mathbf{k})\, t)\,\tilde{\mathbf{F}}_{\pm,\mathbf{k}}(0)$, i.e. vectors spinning at uniform angular speed $\omega = c\, k$, in opposite senses for the two signs, about the nullspace (the one eigenvector corresponding to the eigenvalue nought) of $\mathbf{K}(\mathbf{k})$.
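The "spinning vector" picture can be checked numerically. The sketch below takes the $+$ branch of (4) with hypothetical values for $c$, $\mathbf{k}$, $t$ and the initial vector (a real, transverse one, for simplicity). It compares a Taylor-series evaluation of the matrix exponential with a plain rotation by angle $c\,k\,t$ about $\mathbf{k}$, computed from Rodrigues' formula.

```python
import math

def k_matrix(k):
    """Matrix K(k) such that K(k) v = k x v (cross product)."""
    kx, ky, kz = k
    return [[0.0, -kz, ky],
            [kz, 0.0, -kx],
            [-ky, kx, 0.0]]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def expm_times_vector(m, v, terms=40):
    """exp(m) v from the Taylor series; adequate for small matrix norms."""
    result, term = list(v), list(v)
    for n in range(1, terms):
        term = [x / n for x in matvec(m, term)]
        result = [r + x for r, x in zip(result, term)]
    return result

def rodrigues(v, axis, angle):
    """Rotate v by angle about the unit vector axis."""
    dot = sum(a * b for a, b in zip(axis, v))
    crs = matvec(k_matrix(axis), v)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [v[i] * cos_a + crs[i] * sin_a + axis[i] * dot * (1.0 - cos_a)
            for i in range(3)]

c = 1.0                  # hypothetical units with c = 1
k = (1.0, 2.0, 2.0)      # hypothetical wavevector, |k| = 3
t = 0.25                 # hypothetical elapsed time
f0 = [2.0, -1.0, 0.0]    # transverse initial vector: k . f0 = 0

k_norm = math.sqrt(sum(x * x for x in k))
generator = [[c * t * entry for entry in row] for row in k_matrix(k)]
f_series = expm_times_vector(generator, f0)
f_rotated = rodrigues(f0, [x / k_norm for x in k], c * k_norm * t)

mismatch = max(abs(a - b) for a, b in zip(f_series, f_rotated))
print(mismatch)  # the two evaluations agree to rounding error
```

For a complex $\tilde{\mathbf{F}}$ the real and imaginary parts are each rotated in this way; the $-$ branch rotates in the opposite sense.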

Now to restore a field’s negative frequency part from the positive frequency part alone, one adds the complex conjugate, i.e. we’re still effectively taking the real part of the $\vec{F}_\pm$ fields at the end of the calculation, so the practicalities are rather like the phasor method. But now we take:

$$\begin{array}{lcl}
\vec{E} &=&\operatorname{Re}\left(\frac{\vec{F}_+ + \vec{F}_-}{2\,\sqrt{\epsilon_0}}\right)\\
\vec{H} &=&\operatorname{Re}\left(\frac{\vec{F}_+ - \vec{F}_-}{2\,i\,\sqrt{\mu_0}}\right)=\operatorname{Im}\left(\frac{\vec{F}_+ - \vec{F}_-}{2\,\sqrt{\mu_0}}\right)
\end{array}
\tag{5}$$

to get our “physical” fields at the end of the calculation. But, given the physical, manifestly Lorentz covariant interpretation of the Riemann-Silberstein vectors (see “More Advanced Material” below), one might just as well say that $\vec{F}_\pm$ are the physical fields (even though they’re not what you would measure with a vector voltmeter or magnetometer). In this framework of thought, a quantity’s being real or imaginary has a geometric meaning: whether it is a bivector or the Hodge dual thereof in the Clifford algebra $C\ell_3(\mathbb{R})$, wherein the now “spinor” $\mathbf{F}_\pm$ live and the entity $i$ is the unit pseudoscalar (more on all this below). Bivectors and their Hodge duals mix and transform differently under the Lorentz transformation (10), so, if you like, you can very soundly take this difference as the meaning of real and imaginary parts.

Lastly, since (2) is now confined to two equations in positive frequency (therefore positive energy), we can interpret (2) as the time evolution equation, i.e. the Schrödinger equation, for the quantum state of a first quantized photon. See:

I. Bialynicki-Birula, “Photon wave function” in Progress in Optics 36 V (1996), pp. 245-294

for more details. This paper is also downloadable from arXiv:quant-ph/0508202. Please also see Arnold Neumaier’s pithy summary of a key result in section 7 of the paper above. Ivo Bialynicki-Birula has a great deal to say about the Riemann-Silberstein field thought of as the first-quantised photon wavefunction, and several papers on the subject can be downloaded from his personal website.

The particular scaling of the Riemann-Silberstein vectors above is Bialynicki-Birula’s, and it means that $|\mathbf{F}_+|^2 + |\mathbf{F}_-|^2$ is (up to a constant numerical factor) the electromagnetic energy density. He defines the pair $(\mathbf{F}_+, \mathbf{F}_-)$, normalised so that $|\mathbf{F}_+|^2 + |\mathbf{F}_-|^2$ becomes a probability density to absorb the photon at a particular point, to be a first quantized photon wave function (without a position observable). There is a special, nonlocal inner product to define the Hilbert space, and in such a formalism the general Hamiltonian observable is $\hbar\, c\, \mathrm{diag}\left(\nabla\wedge, -\nabla\wedge\right)$.

The Hilbert space of Riemann-Silberstein vector pairs that Bialynicki-Birula defines is acted on by an irreducible unitary representation of the full Poincaré group, defined by the observables $\hat{H}$, $\hat{\mathbf{P}}$, $\hat{\mathbf{K}}$ and $\hat{\mathbf{J}}$ in section 7 of the paper cited above. So the two subspaces containing wholly right ($\mathbf{F}_-=\mathbf{0}$) and wholly left-polarized ($\mathbf{F}_+=\mathbf{0}$) states are the “particles” of the theory: you don’t get the same thing with other nontrivial linear combinations of light base states (which are not eigenfunctions of the angular momentum observable).


More Advanced Material

The Riemann-Silberstein vectors are actually the electromagnetic (Maxwell) tensor $F^{\mu\nu}$ in disguise. We can write Maxwell’s equations in a quaternion form:

$$\begin{array}{lcl}
\left(c^{-1}\partial_t + \sigma_1 \partial_x + \sigma_2 \partial_y + \sigma_3 \partial_z\right) \,\mathbf{F}_+ &=& {\bf 0}\\
\left(c^{-1}\partial_t - \sigma_1 \partial_x - \sigma_2 \partial_y - \sigma_3 \partial_z\right) \,\mathbf{F}_- &=& {\bf 0}\end{array}\tag{6}$$

where $\sigma_j$ are the Pauli spin matrices and the electromagnetic field components are:

$$\begin{array}{lcl}\frac{1}{\sqrt{\epsilon_0}}\mathbf{F}_\pm &=& \left(\begin{array}{cc}E_z & E_x - i E_y\\E_x + i E_y & -E_z\end{array}\right) \pm i \,c\,\left(\begin{array}{cc}B_z & B_x - i B_y\\B_x + i B_y & -B_z\end{array}\right)\\
& =& E_x \sigma_1 + E_y \sigma_2+E_z\sigma_3 \pm i\,c\,\left(B_x \sigma_1 + B_y \sigma_2+B_z\sigma_3\right)\end{array}\tag{7}$$

The Pauli spin matrices are, up to a factor of $i$ and a choice of sign and ordering, Hamilton’s imaginary quaternion units, and $i=\sigma_1\,\sigma_2\,\sigma_3$ (times the identity matrix), so that $i^2 = -1$. When inertial reference frames are shifted by a proper Lorentz transformation:
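The algebraic statements about the Pauli matrices are easy to confirm by direct multiplication. A minimal sketch follows, using the standard matrix representation; the helper names `matmul` and `field_matrix` and the sample field components are hypothetical, introduced only for this check.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

sigma1 = [[0, 1], [1, 0]]
sigma2 = [[0, -1j], [1j, 0]]
sigma3 = [[1, 0], [0, -1]]

# sigma1 sigma2 sigma3 should be i times the 2x2 identity.
pseudoscalar = matmul(matmul(sigma1, sigma2), sigma3)
pseudoscalar_squared = matmul(pseudoscalar, pseudoscalar)

def field_matrix(E, cB, sign):
    """E . sigma + sign * i * (cB . sigma), the right-hand side of (7)."""
    v = [e + sign * 1j * b for e, b in zip(E, cB)]
    return [[v[0] * sigma1[i][j] + v[1] * sigma2[i][j] + v[2] * sigma3[i][j]
             for j in range(2)] for i in range(2)]

E = (1.0, 2.0, 3.0)      # hypothetical field components
cB = (0.5, -1.0, 4.0)    # hypothetical values of c times B
f_plus = field_matrix(E, cB, +1)
trace = f_plus[0][0] + f_plus[1][1]

print(pseudoscalar)          # i on the diagonal, zero elsewhere
print(pseudoscalar_squared)  # -1 on the diagonal
print(trace)                 # the field matrix is traceless
```

The vanishing trace reflects the fact that the field matrix has no scalar part: it is built from the three $\sigma_j$ alone.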

$$L = \exp\left(\frac{1}{2}W\right)\tag{8}$$

where:

$$W = \left(\eta^1 + i\theta \chi^1\right) \sigma_1 + \left(\eta^2 + i\theta \chi^2\right) \sigma_2 + \left(\eta^3 + i\theta \chi^3\right) \sigma_3\tag{9}$$

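encodes the transformation’s rotation angle $\theta$, the direction cosines $\chi^j$ of its rotation axis and its rapidities $\eta^j$, the entity $\mathbf{F}_+$ undergoes the spinor map:

$${\bf F}_+ \mapsto L\, {\bf F}_+\, L^{-1}\tag{10}$$

(The map $X \mapsto L\,X\,L^\dagger$ is the one for four-vectors; it coincides with (10) for rotations, where $L$ is unitary, but not for boosts, where it would not keep the field matrix traceless.) For real $\vec{E}$ and $\vec{B}$ we have $\mathbf{F}_- = \mathbf{F}_+^\dagger$, so that $\mathbf{F}_- \mapsto (L^\dagger)^{-1}\, \mathbf{F}_-\, L^\dagger$.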

Here, we’re actually dealing with the double cover $SL(2,\mathbb{C})$ of the identity-connected component of the Lorentz group $SO(3,1)$, so we have spinor maps representing Lorentz transformations, just as we must use spinor maps to make a quaternion impart its represented rotation on a vector.


The Classical Angular Momentum of Light

Now we look at the classical angular momentum. The Wikipedia page on angular momentum of light gives the classical angular momentum as:

$$\frac{\epsilon_0}{2i\omega}\int \left(\mathbf{E}^\ast\wedge\mathbf{E}\right)d^{3}\mathbf{r} +\frac{\epsilon_0}{2i\omega}\sum_{i=x,y,z}\int \left({E^i}^{\ast}\left(\mathbf{r}\wedge\mathbf{\nabla}\right)E^{i}\right)d^{3}\mathbf{r}\tag{11}$$

when the positive frequency part alone of the fields is kept (hence the complex conjugates). The first term is the spin angular momentum, and, rewritten in positive frequency Riemann-Silberstein vectors when everything is roughly paraxial (i.e. near to a plane wave) it reads:

$$\hat{\mathbf{z}}\frac{1}{\omega}\int \left(|\mathbf{F}_+|^2-|\mathbf{F}_-|^2\right)d^{3}\mathbf{r}\tag{12}$$

i.e. $\frac{1}{\omega}$ times the right polarized energy density less the left polarized energy density in the direction of propagation of the light. Orbital angular momentum vanishes in this near-plane-wave limit and so the last equation is the total angular momentum in this case. It is important to recall how this equation is derived: one imagines an electromagnetic field crossing the boundary into a conductive medium and being absorbed there, then one calculates the angular impulse exerted on the medium, exactly analogously with method 3 of the momentum calculation in my answer here. The point is that the angular momentum density $\left(|\mathbf{F}_+|^2-|\mathbf{F}_-|^2\right)/\omega$ calculated from this most basic (in the sense of fundamental) Newton-Maxwell physics is the difference between the intensities of the circularly polarized base states, not the linear ones. So again Nature shows her preference.

This calculation says that the right and left circularly polarized components transfer angular momentum $\pm E/\omega$ in the direction of light propagation, respectively, whenever energy $E$ is absorbed. So now we see that, if the photon has energy $h\nu$, then if a high number of them are to transfer the same angular momentum as classical physics reckons, the photon’s angular momentum has to be $\pm h\nu/\omega$ or $\pm \hbar$ in the direction of its propagation for right and left circularly polarized photons, respectively.
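To see where the sign in the spin term of (11) comes from, consider a worked instance under my own illustrative assumptions: a positive-frequency plane wave along $\hat{\mathbf{z}}$ with amplitude $\mathbf{E} = E_0\left(\hat{\mathbf{x}} + i\,s\,\hat{\mathbf{y}}\right)$, where the label $s=\pm1$ is introduced here to distinguish the two circular states. Then:

$$\mathbf{E}^\ast\wedge\mathbf{E} = |E_0|^2\left(\hat{\mathbf{x}} - i\,s\,\hat{\mathbf{y}}\right)\wedge\left(\hat{\mathbf{x}} + i\,s\,\hat{\mathbf{y}}\right) = 2\,i\,s\,|E_0|^2\,\hat{\mathbf{z}}
\quad\Rightarrow\quad
\frac{\epsilon_0}{2i\omega}\,\mathbf{E}^\ast\wedge\mathbf{E} = \frac{s\,\epsilon_0\,|E_0|^2}{\omega}\,\hat{\mathbf{z}}$$

So the spin density is real, points along the propagation direction, and changes sign with $s$. For a linearly polarized wave the amplitude is a real vector times a phase, so $\mathbf{E}^\ast\wedge\mathbf{E}=\mathbf{0}$ and the spin term vanishes, in keeping with equal weights of the two circular components.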


Experimental Confirmation in Birefringent Crystals

Now we get to birefringent crystals. It is obvious, but I believe important, to take heed that birefringent crystals are a way wherein Nature very explicitly tells the difference between linear and circular polarization. It is the linearly, not the circularly, polarized states that are the “eigenmodes” of a birefringent crystal, i.e. linearly polarized fields aligned with the fast and slow axes of the crystal are simply phase delayed and undergo no mixing. Circularly polarized states are not eigenmodes: they mix in such crystals. The mixing induced by a quarter wave plate, in particular that which happens when the input field is linearly polarized and aligned at 45 degrees to the fast and slow axes, therefore imparts a torque on the crystal. It most certainly should be possible to measure that torque and check it against classical calculations: from our calculations above, in this situation there will be a torque $P/\omega$ when the light’s power is $P$.

A thought experiment: a 100W, 1mm diameter linearly polarized collimated beam passes through a quarter wave plate suspended in a fluid. With infrared light at $193\mathrm{THz}$ (you can get fibre lasers at $193\mathrm{THz}$ outputting hundreds of watts) the torque will be of the order of $10^{-13}\mathrm{Nm}$. If the crystal is a millimeter in diameter and 3 millimeters or so long with a density of $3000\mathrm{kg\,m^{-3}}$, its mass moment of inertia is of the order of $4\times10^{-13}\mathrm{kg \, m^2}$, so this will be an accurately measurable effect. Indeed, as the crystal spins and misaligns itself from the 45 degree position, its angular position will fulfill $\mathrm{d}_t^2 \theta = - \frac{1}{2}\Omega^2 \sin(2\theta)$ and we will have a torsional pendulum oscillating at roughly $\Omega /(2\pi) = 0.1\mathrm{Hz}$.
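The arithmetic of the thought experiment can be reproduced in a few lines. This sketch takes the power, optical frequency and moment of inertia exactly as quoted above. It also assumes, as my reading of the pendulum equation, that the peak torque $P/\omega$ at 45 degrees equals $I\,\Omega^2/2$.

```python
import math

power = 100.0        # W, beam power quoted in the text
frequency = 193e12   # Hz, optical frequency quoted in the text
inertia = 4e-13      # kg m^2, moment of inertia quoted in the text

omega_light = 2.0 * math.pi * frequency   # optical angular frequency (rad/s)
torque = power / omega_light              # N m, torque P / omega

# Assumed reading of d^2(theta)/dt^2 = -(1/2) Omega^2 sin(2 theta):
# the peak torque at 45 degrees is I * Omega^2 / 2.
big_omega = math.sqrt(2.0 * torque / inertia)   # rad/s
pendulum_hz = big_omega / (2.0 * math.pi)       # small-oscillation frequency

print(f"torque = {torque:.2e} N m")
print(f"pendulum frequency = {pendulum_hz:.2f} Hz")
```

Both results land on the orders of magnitude given in the text: a torque just under $10^{-13}\mathrm{Nm}$ and an oscillation frequency of about a tenth of a hertz.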

The above was confirmed experimentally by directly measuring the torque exerted on a birefringent crystal in 1936, as described in the following paper:

Richard A. Beth, “Mechanical Detection and Measurement of the Angular Momentum of Light” Phys. Rev. 50 July 15 1936