Mixed Quantum States, Wigner’s Friend and the Mueller and Density Matrices

A recent question on Physics Stack Exchange was a good chance to describe the density matrix concept in quantum mechanics in detail and, at the same time, relate it to the Mueller matrix concept from optics.

I have read that polarized light is treated by Jones vectors and that to treat partially polarized light you have to use Stokes vectors and mueller matrices.

Nonetheless, the optics notes that my professor have given us have no mention of mueller calculus, and we have assigned exercises involving partially polarized light passing through polarizers, retarders… so I figured that perhaps the following is legitimate:

The Stokes parameters characterizing partially polarized light are the following:

$s_1=Vs_0\cos{2\alpha} $

$s_2=Vs_0\sin{2\alpha}\cos{\delta} $

$s_3=Vs_0\sin{2\alpha}\sin{\delta} $

from the Stokes vector $(s_0,s_1,s_2,s_3)$ we get $\alpha$ and $\delta$ and build a Jones vector using:

$\left(\begin{array}{c}\cos\alpha\\ \sin\alpha\, e^{i\delta}\end{array}\right) $

and from here we go on using jones matrices.

Is this doable? And if it is, why do people use mueller matrices if this can be done?

and so my answer, somewhat edited and with other background material added, is as follows.

The proposed method would work as long as we only pass light through linear optical components that do not change the light’s degree of polarisation or overall power, in which case we would be using the Jones calculus in disguise: we can then keep the polarised and depolarised components separate.

But the method will not work in general. However, we can still use the Jones matrices to represent optical components, but apply them in a new way.

Partial polarisation is a very hard thing to describe classically – it’s almost the same (and as hard) as the classical discussion of partial coherence and one needs to have a thorough grasp of random processes to discuss it fully. Born and Wolf give a whole chapter to these concepts. But it is elegantly described in the quantum picture:

Partially polarised light is a classical statistical mixture of pure quantum states.

I discuss both approaches below.

So now, if you’ve not done so, you should first read up on the density matrix (see Wikipedia article of this name). The name “matrix” is a little misleading, because it is really a “state” (albeit a mixed one) written down as an $N\times N$ matrix (where $N$ is the dimensionality of the quantum states we are dealing with). Even though it is called a matrix, it does not stand for a “transformation” or “operator” on states, as the name “matrix” would imply (which is why I’m not really keen on the name “matrix”). It’s written as a matrix because this is the most convenient way to get statistics out of it: the $n^{th}$ moment of a measurement by an observable $\hat{A}$ is computed as ${\rm Tr}(\rho \hat{A}^n)$ where $\rho$ is the density matrix representing the mixed state. So if the light state is a classical statistical mixture of polarisation states with $2\times 1$ Jones vectors $\vec{x}_1, \vec{x}_2, \cdots$ with the classical probabilities of each state being $p_1,p_2,\cdots$, then the density matrix is:

$$\rho = \sum\limits_j p_j \vec{x}_j \vec{x}_j^\dagger$$

(note the order: $\vec{x}_j \vec{x}_j^\dagger$ is a $2\times2$ projection matrix). Such matrices are readily seen to be Hermitian (i.e. $\rho = \rho^\dagger$).
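As a small numerical illustration of the formula above, here is a sketch that builds $\rho$ from a mixture of Jones vectors and reads off moments with ${\rm Tr}(\rho \hat{A}^n)$. The particular mixture (70% horizontal, 30% circular), the choice of $\sigma_z$ as observable and the helper name `density_matrix` are my own assumptions for the example, not part of the original question.

```python
import numpy as np

def density_matrix(states, probs):
    """Return rho = sum_j p_j x_j x_j^dagger for Jones vectors x_j."""
    rho = np.zeros((2, 2), dtype=complex)
    for x, p in zip(states, probs):
        x = np.asarray(x, dtype=complex)
        x = x / np.linalg.norm(x)          # normalise each pure state
        rho += p * np.outer(x, x.conj())   # column times row: a 2x2 projector
    return rho

# Illustrative mixture (an assumption): 70% horizontal, 30% circular.
horizontal = [1, 0]
circular = [1, 1j]
rho = density_matrix([horizontal, circular], [0.7, 0.3])

# Illustrative observable (an assumption): sigma_z, "horizontal minus vertical".
A = np.array([[1, 0], [0, -1]], dtype=complex)
first_moment = np.trace(rho @ A).real       # Tr(rho A)
second_moment = np.trace(rho @ A @ A).real  # Tr(rho A^2)

print(np.allclose(rho, rho.conj().T))  # Hermitian check
print(first_moment, second_moment)
```

The order `np.outer(x, x.conj())` mirrors $\vec{x}_j \vec{x}_j^\dagger$: a column times a row, giving a $2\times2$ matrix rather than a scalar.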

Wigner’s Friend

One illustration of the concept of a classical statistical mixture of pure quantum states is Eugene Wigner’s classic “Wigner’s Friend” thought experiment.

Wigner and his friend are doing the Schrödinger Cat experiment and, before the box is opened and the cat’s state of health known, Wigner leaves the room to have a cup of tea. When Wigner comes back, he sees that his friend has opened the box and has checked the cat, and thus his friend now knows whether the cat is dead or alive. Now, from Wigner’s standpoint, before Wigner is told the experiment’s outcome:

How does Wigner model the cat’s quantum state?

He knows that the cat’s state is known by his friend. Roger Penrose, who, like me, seems to have a soft spot for cats and other animals, would have us think of a “Schrödinger’s Lump” instead of a “Schrödinger’s Cat”, where a radioactive decay triggers the release of a ball-bearing loaded into a spring-loaded tube. So the question becomes whether the ball-bearing has been sprung or not, rather than whether the cat is alive or dead. For the Wigner’s friend experiment, the “lump” form is likely better not only from the standpoint that we don’t have to think of killing cats, but also from the standpoint that Wigner doesn’t have to find a psychopath friend to do the experiment with. Otherwise, Wigner could glean from his friend’s facial expression what the outcome was!

Generally such thought experiments involving “conscious observers” are not tackled much anymore, since a conscious observer is a hugely complicated, uncharacterised and uncharacterisable system. Instead we replace Wigner and his friend by quantum observables: simple operators that take the quantum state as an input and return a real valued measurement, such that somehow (the answer to this somehow is the quantum measurement problem) straight after the observable’s application the quantum system is in the eigenstate of the observable’s operator that corresponds to the value measured. The quantum measurement problem need not worry us here. We simply replace the two people by observables and assume that an eigenstate of an observable prevails straight after its application, whether the quantum state “collapsed” there or got there by some other way that a future solution to the quantum measurement problem will describe. (As an aside, I sense a hope in some writings that the quantum measurement problem is indeed resolvable and may be so even in my lifetime – look up Einselection, for example.)

So now, observable $\hat{W}_1$ acts on a pure quantum state $\psi$ (a pure superposition) and forces it into one of its eigenstates. Now, from the standpoint of the second person (who knows that the first observable $\hat{W}_1$ has been applied, but does not know which eigenstate prevails after its application), the system is in a mixed quantum state. The question of consciousness does not enter: the quantum state has been acted on by $\hat{W}_1$; it’s simply that the second person does not know the outcome.

The accepted way to think of the state from the second person’s standpoint is as a classical statistical mixture of pure quantum states. Suppose the second observer wants to impart observable $\hat{W}_2$ some time after $\hat{W}_1$ is applied. In principle, to foretell the probability distributions of outcomes, the second observer must calculate the pure quantum state’s evolution (by, say, the Schrödinger equation) for each possible outcome of the observable $\hat{W}_1$ (i.e. for each possible eigenstate $\left|\left.\psi_{1,j}\right>\right.$ output by the observable $\hat{W}_1$), work out the probability distributions arising from the application of $\hat{W}_2$ on each of these evolved eigenstates and then combine these distributions following the rules of classical probability. It is easy to show that this is exactly what we do when we calculate with the density matrix, which is:

$$\rho = \sum_j p_{1,j} \,\left|\left.\psi_{1,j}\right>\right. \left<\left.\psi_{1,j}\right|\right.$$

where $\left|\left.\psi_{1,j}\right>\right.$ are the eigenstates of the first observable $\hat{W}_1$ and $p_{1,j}$ their probabilities given the pure quantum state that was input to the first observable (so if the pure quantum state before the experiment was $\psi$, then $p_{1,j} = |\left<\psi|\psi_{1,j}\right>|^2$). Instead of evolving the pure state by the Schrödinger equation, the density matrix evolves following the Liouville-von Neumann equation:

$$i \hbar \frac{\partial \rho}{\partial t} = [\hat{H},\rho]$$

where $\hat{H}$ is the quantum system’s Hamiltonian, and when we get to the time where the second person does their experiment, i.e. imparts observable $\hat{W}_2$, we calculate the $n^{th}$ moment of the statistical distribution of the measurement outcomes as:

$$m_n={\rm tr}\left(\rho\,\hat{W}_2^n\right)$$

whence we can derive the full probability density function for the measurement outcomes from $\hat{W}_2$.
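The claim that the density matrix reproduces the "evolve each outcome, then combine classically" recipe can be checked numerically. The following is a minimal sketch for a two-level system, assuming (for simplicity, and purely as my own illustrative choices) that $\hat{W}_1$ measures in the computational basis, that $\hat{W}_2 = \sigma_x$, and that nothing evolves between the two measurements.

```python
import numpy as np

# Pure input state psi (a superposition), chosen arbitrarily for illustration.
psi = np.array([np.sqrt(0.8), np.sqrt(0.2)], dtype=complex)

# Eigenstates of the first observable W1: here simply the computational basis.
w1_states = [np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex)]
p1 = [abs(np.vdot(psi, s)) ** 2 for s in w1_states]  # p_{1,j} = |<psi|psi_{1,j}>|^2

# Mixed state as seen by the second observer, who does not know the W1 outcome.
rho = sum(p * np.outer(s, s.conj()) for p, s in zip(p1, w1_states))

# Second observable W2 (an assumption): sigma_x, with eigenvalues +1 and -1.
W2 = np.array([[0, 1], [1, 0]], dtype=complex)

def moment_from_rho(n):
    """n-th moment of W2 outcomes computed as tr(rho W2^n)."""
    return np.trace(rho @ np.linalg.matrix_power(W2, n)).real

def moment_classical(n):
    """Same moment: quantum expectation per W1 outcome, weighted classically."""
    W2n = np.linalg.matrix_power(W2, n)
    return sum(p * np.vdot(s, W2n @ s).real for p, s in zip(p1, w1_states))

# For contrast: the first moment had W1 never been applied (pure state).
pure_expectation = np.vdot(psi, W2 @ psi).real

print(moment_from_rho(1), moment_classical(1), pure_expectation)
```

The two mixed-state calculations agree, while the pure-state expectation differs: applying $\hat{W}_1$ has destroyed the coherence between the two basis states, whether or not anyone reads the result.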

Back to the Stokes Parameters and Depolarised Light

So our mixed quantum light state is represented as a general $2\times2$ Hermitian (i.e. $\rho = \rho^\dagger$) matrix:

$$\rho = \sum\limits_{j=0}^3 s_j \sigma_j$$

where $\sigma_0 = {\rm id}$ is the $2\times 2$ identity matrix and $\sigma_j$ are the Pauli spin matrices. The co-efficients $s_j$ are nothing but the Stokes vector. Any $2\times2$ Hermitian matrix can be written like this.
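Because ${\rm Tr}(\sigma_j \sigma_k) = 2\,\delta_{jk}$, the co-efficients in this expansion can be read back off as $s_j = \frac{1}{2}{\rm Tr}(\rho\,\sigma_j)$. Here is a short sketch of the round trip; the helper names and the sample Stokes vector are hypothetical, and the Pauli matrices are in their standard numbering (not the OP’s ordering of the Stokes parameters).

```python
import numpy as np

# sigma_0 .. sigma_3 in the standard numbering (identity, x, y, z).
PAULI = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def rho_from_stokes(s):
    """rho = sum_j s_j sigma_j, following the convention in the text."""
    return sum(c * m for c, m in zip(s, PAULI))

def stokes_from_rho(rho):
    """s_j = Tr(rho sigma_j) / 2, since Tr(sigma_j sigma_k) = 2 delta_jk."""
    return np.array([0.5 * np.trace(rho @ m).real for m in PAULI])

s = np.array([0.5, 0.1, -0.2, 0.3])   # arbitrary example values
rho = rho_from_stokes(s)
s_back = stokes_from_rho(rho)

print(np.allclose(rho, rho.conj().T))  # real co-efficients give a Hermitian matrix
print(s_back)
```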

Now, if the light passes through a lossless component, so that its Jones matrix $U$ is unitary $U U^\dagger = U^\dagger U = {\rm id}$, then the density matrix becomes:

$$\rho^\prime = U \rho U^\dagger = s_0 {\rm id} + \sum\limits_{j=1}^3 s_j U \sigma_j U^\dagger$$

and the length of the “polarised” part of the light $(s_1, s_2, s_3)$ does not change. $s_0$ on the one hand and $(s_1, s_2, s_3)$ on the other stay separate and do not mix. The unitary matrix scrambles the $(s_1, s_2, s_3)$ but leaves their sum of squares constant and indeed, if we look only at the $(s_1, s_2, s_3)$ we are witnessing the group $SU(2)$ of unitary Jones matrices acting on the three dimensional Lie algebra of $SU(2)$, spanned by $(i\sigma_1, i\sigma_2, i\sigma_3)$, through the Adjoint representation $SO(3)$ of $SU(2)$ – in everyday language we are seeing rotations of the Poincaré sphere.
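To see the Poincaré sphere rotation concretely, here is a sketch that passes a partially polarised state through an ideal retarder and checks that $s_0$ and $s_1^2+s_2^2+s_3^2$ are unchanged. The `retarder` parametrisation (retardance `delta`, axis angle `theta`) and the sample numbers are my own assumptions for the example.

```python
import numpy as np

PAULI = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def stokes_from_rho(rho):
    """s_j = Tr(rho sigma_j) / 2."""
    return np.array([0.5 * np.trace(rho @ m).real for m in PAULI])

def retarder(delta, theta):
    """Unitary Jones matrix of an ideal lossless retarder (illustrative form)."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    phase = np.diag([np.exp(-0.5j * delta), np.exp(0.5j * delta)])
    return rot @ phase @ rot.T

s_in = np.array([0.5, 0.1, -0.2, 0.3])           # arbitrary partially polarised state
rho = sum(c * m for c, m in zip(s_in, PAULI))

U = retarder(delta=np.pi / 2, theta=np.pi / 8)   # e.g. a quarter-wave plate
rho_out = U @ rho @ U.conj().T                   # rho' = U rho U^dagger
s_out = stokes_from_rho(rho_out)

print(s_in[0], s_out[0])                                      # s_0 unchanged
print(np.linalg.norm(s_in[1:]), np.linalg.norm(s_out[1:]))    # same length
```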

However, if our optical component is not lossless, then the transformation $U$ is simply a general $2\times2$ complex matrix (no longer unitary) and the $s_0$ and $(s_1, s_2, s_3)$ are mixed in a more general linear transformation. You can, if you like, still use your Jones matrices, but you must use them not acting on a state, but acting on the density matrix: i.e. instead of your pure state $x$ transforming like $x\mapsto U x$, your density matrix transforms by a so called spinor map $\rho\mapsto U \rho U^\dagger$.

Another way of doing this is simply to note that in the map $\rho\mapsto U \rho U^\dagger$, the four parameters $(s_0,s_1, s_2, s_3)$ defining the density matrix undergo linear transformations. So instead of spinor maps, we can use a $4\times4$ matrix to represent a general optical component. This of course is the Mueller matrix. For an optical component with general, nonunitary Jones matrix $U$, the corresponding elements of the Mueller matrix $M$ are:

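$$M_{j\,k} = \frac{1}{2}{\rm Tr}\left(\sigma_j^\dagger U \sigma_k U^\dagger\right)$$

(the factor $\frac{1}{2}$ is there because ${\rm Tr}(\sigma_j \sigma_k) = 2\,\delta_{j k}$, so that the identity Jones matrix yields the identity Mueller matrix).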

The Mueller matrix acts on vectors in the linear space of $2\times2$ Hermitian matrices thought of as a vector space over $\mathbb{R}$. This space comes kitted with an inner product for finding components of “vectors”, to wit, the trace (Hilbert–Schmidt) inner product $\left<A,B\right> = {\rm Tr}(A^\dagger B) = {\rm Tr}(A B)$, which restricts to a multiple of the Killing form on the traceless part, and which is how I wrote the expression above down. The Stokes vector is simply the density matrix living in this space but written as a $4\times 1$ real valued column and the Mueller matrix implements the linear spinor map on the rewritten density matrix.
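Here is a sketch of the Jones-to-Mueller conversion, checked against the spinor map. I have normalised with a factor $\frac{1}{2}$ (since ${\rm Tr}(\sigma_j\sigma_k) = 2\delta_{jk}$) so that the identity Jones matrix gives the identity Mueller matrix. The function name and the horizontal polariser used as a test case are assumptions for illustration, and the Stokes ordering follows the standard Pauli numbering rather than the usual optics convention.

```python
import numpy as np

PAULI = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def mueller_from_jones(U):
    """M_jk = Tr(sigma_j U sigma_k U^dagger) / 2 (real for any Jones matrix U)."""
    M = np.empty((4, 4))
    for j in range(4):
        for k in range(4):
            M[j, k] = 0.5 * np.trace(PAULI[j] @ U @ PAULI[k] @ U.conj().T).real
    return M

# A lossy, nonunitary component: an ideal horizontal polariser.
U = np.array([[1, 0], [0, 0]], dtype=complex)
M = mueller_from_jones(U)

# Compare M acting on the Stokes column with the spinor map on rho.
s_in = np.array([0.5, 0.1, -0.2, 0.3])
rho_out = U @ sum(c * m for c, m in zip(s_in, PAULI)) @ U.conj().T
s_spinor = np.array([0.5 * np.trace(rho_out @ m).real for m in PAULI])

print(M)
print(M @ s_in, s_spinor)
```

Note how the polariser's Mueller matrix couples $s_0$ to the polarised part, which is exactly the mixing that a unitary Jones matrix cannot produce.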

More generally, the Mueller calculus is simply another way of calculating the transformations wrought on a density matrix for any finite dimensional quantum system by various operations, which can include unitary operators or Wigner-Friend kind conversion of pure states to mixed ones. Every $N$ dimensional quantum system implies an $N^2 \times N^2$ dimensional Mueller calculus when the density matrices are written as columns. Here the “basis vectors” are the matrices $\left|\left.x_j\right.\right>\left<\left.x_k\right.\right|$ where $x_j$ are the base quantum pure states. The $N^2 \times N^2$ Mueller matrix operates on the vector of co-efficients $\rho_{j,k}$ in the density matrix $\sum\limits_j\sum\limits_k \rho_{j,k}\left|\left.x_j\right.\right>\left<\left.x_k\right.\right|$.
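The general $N^2\times N^2$ picture can be sketched in a few lines. Assuming the density matrix is flattened row by row, the map $\rho\mapsto U\rho U^\dagger$ becomes the matrix $U\otimes U^*$, and a Wigner-Friend style "measured but not read" operation in the basis $x_j$ becomes $\sum_j P_j\otimes P_j^*$ with $P_j = \left|x_j\right>\left<x_j\right|$. The dimension $N=3$, the random unitary and the variable names below are illustrative assumptions.

```python
import numpy as np

N = 3
rng = np.random.default_rng(0)

# A random unitary from the QR decomposition of a random complex matrix.
U, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))

# A random pure state and its density matrix.
psi = rng.normal(size=N) + 1j * rng.normal(size=N)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())

# Unitary evolution as an N^2 x N^2 matrix on the row-major flattening of rho.
M_unitary = np.kron(U, U.conj())
rho_evolved = (M_unitary @ rho.reshape(-1)).reshape(N, N)

# Unread measurement in the basis x_j: rho -> sum_j P_j rho P_j.
projectors = [np.diag(np.eye(N)[j]) for j in range(N)]
M_measure = sum(np.kron(P, P.conj()) for P in projectors)
rho_mixed = (M_measure @ rho.reshape(-1)).reshape(N, N)

print(np.allclose(rho_evolved, U @ rho @ U.conj().T))
print(np.trace(rho @ rho).real, np.trace(rho_mixed @ rho_mixed).real)  # purity
```

The purity ${\rm Tr}(\rho^2)$ drops below one after the unread measurement, which is the pure-to-mixed conversion described above.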

An Aside: As Physics Stack Exchange user Trimok has pointed out (thanks Trimok) the standard numbering of the Pauli matrices gives a reordering of the OP’s Stokes parameters:

… with the OP conventions, you have the correspondence $s_1 \to s_z, s_2 \to s_x, s_3 \to s_y$ with $\rho = s_0\sigma_0 + s_x \sigma_x +s_y \sigma_y +s_z\sigma_z$

Classical against Quantum Description of Partially Polarised Light

I should like to end by comparing the classical and quantum descriptions of partially polarised light. The latter is undoubtedly far simpler and more elegant.

I shall illustrate the two approaches by calculating the scattered light intensity when partially polarised but plane wave light is incident on an interface between two dielectric materials. The interaction between plane waves and such an interface is of course described by the Fresnel equations; the question is how to apply these polarisation-dependent equations when the polarisation state is not certain.

Suppose the Fresnel equations give us complex reflexion co-efficients $R_p$ and $R_s$ for $p$- and $s$-polarized light, respectively. Then the intensity reflexion co-efficient (power reflexion co-efficient) for depolarized light is (in most cases):

$\frac{1}{2} (|R_s|^2 + |R_p|^2)$

You do likewise for the transmission co-efficients, so that the transmitted power ratio is:

$\frac{1}{2} (|T_s|^2 + |T_p|^2) = 1- \frac{1}{2} (|R_s|^2 + |R_p|^2)$

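where $T_p$ and $T_s$ are the Fresnel equation-derived complex transmission co-efficients for $p$- and $s$-polarized light, normalised so that $|T|^2$ is the transmitted fraction of the power (for the raw amplitude co-efficients, $|T|^2$ must first be multiplied by $n_2\cos\theta_t/(n_1\cos\theta_i)$, with $n_{1,2}$ the refractive indices and $\theta_{i,t}$ the angles of incidence and refraction, for the equality above to hold). Forming average square magnitudes like this is often called “incoherent summing”.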
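As a worked example of incoherent summing, the sketch below evaluates the standard Fresnel amplitude co-efficients for a lossless interface and forms the depolarised reflected and transmitted power fractions. The air-to-glass indices, the 30 degree angle of incidence and the function names are illustrative assumptions. The transmitted amplitudes are scaled by the flux factor $n_2\cos\theta_t/(n_1\cos\theta_i)$ before the two fractions are compared.

```python
import math

def fresnel_amplitudes(n1, n2, theta_i):
    """Fresnel amplitude co-efficients (lossless dielectrics, below any critical angle)."""
    cos_i = math.cos(theta_i)
    sin_t = n1 * math.sin(theta_i) / n2          # Snell's law
    cos_t = math.sqrt(1.0 - sin_t ** 2)
    r_s = (n1 * cos_i - n2 * cos_t) / (n1 * cos_i + n2 * cos_t)
    r_p = (n2 * cos_i - n1 * cos_t) / (n2 * cos_i + n1 * cos_t)
    t_s = 2 * n1 * cos_i / (n1 * cos_i + n2 * cos_t)
    t_p = 2 * n1 * cos_i / (n2 * cos_i + n1 * cos_t)
    return r_s, r_p, t_s, t_p, cos_t

def unpolarised_power_ratios(n1, n2, theta_i):
    """Incoherent sum: average the s and p power fractions with equal weights."""
    r_s, r_p, t_s, t_p, cos_t = fresnel_amplitudes(n1, n2, theta_i)
    flux = n2 * cos_t / (n1 * math.cos(theta_i))   # amplitude^2 -> power fraction
    reflected = 0.5 * (abs(r_s) ** 2 + abs(r_p) ** 2)
    transmitted = 0.5 * flux * (abs(t_s) ** 2 + abs(t_p) ** 2)
    return reflected, transmitted

# Illustrative values: air (n = 1.0) to glass (n = 1.5) at 30 degrees.
R, T = unpolarised_power_ratios(1.0, 1.5, math.radians(30.0))
print(R, T, R + T)
```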

To understand fully how to do your calculation, you need to understand exactly what depolarized light is, and it has quite a complicated description: it is bound up with decoherence and partially coherent light, a topic which Born and Wolf in “Principles of Optics” give a whole chapter to. A classical description, roughly analogous to Born and Wolf’s is as follows: if the transverse (normal to propagation) plane is the $x,y$ plane, then we represent the electric field at a point as:

$\mathbf{E} = \left(\begin{array}{c}E_x(t) \cos(\omega t + \phi_x(t))\\E_y(t) \cos(\omega t + \phi_y(t))\end{array}\right)$

where $\omega$ is the centre frequency and the phases $\phi_x(t)$, $\phi_y(t)$ and envelopes $E_x(t)$, $E_y(t)$ are stochastic processes, which can be as complicated as you like. The formulas I cite above just assume that:

  1. $E_x$, $E_y$, $\phi_x$ and $\phi_y$ behave like independent random variables, and
  2. They vary with time swiftly compared to your observation interval (the time interval whereover you gather light in a sensor to come up with an “intensity” measurement) but not so swiftly that the light’s spectrum is broadened so much that we cannot still think of the light as roughly monochromatic.

A simple quantum description is actually conceptually much clearer than Born and Wolf’s classical one, as long as light states do not become entangled. Each photon can be thought of as a perfectly coherent wave propagating following Maxwell’s equations. The Fresnel equations thus apply to each photon as they would to a perfectly coherent wave. For each photon, therefore, you calculate the intensity of reflexion and transmission, and then average this intensity over all photon polarization states – we assume the source is creating “random” pure states. Thus, suppose the Fresnel equations give us complex reflexion co-efficients $R_p$ and $R_s$ for $p$- and $s$-polarized light: the complex amplitude reflexion co-efficient for a general polarization state is then:

$R(\alpha, \phi) = \alpha R_p e^{i \frac{\phi}{2}} + \sqrt{1-\alpha^2} R_s e^{-i \frac{\phi}{2}}$

where $\alpha \in [0, 1]$ and $\phi \in [0, 2\,\pi)$. Summing intensities over all values of $\phi$ (assuming all phases equally likely) yields:

$\frac{1}{2\pi}\int\limits_0^{2\pi} \left(\alpha^2 |R_p|^2 + (1-\alpha^2) |R_s|^2 + 2 \alpha\sqrt{1-\alpha^2} |R_p| |R_s| \cos(\phi + \arg R_p - \arg R_s)\right)\mathrm{d}\phi = \alpha^2 |R_p|^2 + (1-\alpha^2) |R_s|^2$

and then summing intensities over all values of $\alpha^2$ (assuming the $\alpha^2$ is uniformly distributed in $[0,1]$) leaves the formulas above.
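The two averaging steps can be checked numerically with a simple midpoint rule. This is only a sketch: the complex values chosen for $R_p$ and $R_s$ are made up for the test, and the function names are hypothetical.

```python
import cmath
import math

def reflected_intensity(alpha, phi, R_p, R_s):
    """|R(alpha, phi)|^2 for the general polarisation state in the text."""
    R = (alpha * R_p * cmath.exp(0.5j * phi)
         + math.sqrt(1.0 - alpha ** 2) * R_s * cmath.exp(-0.5j * phi))
    return abs(R) ** 2

def average_intensity(R_p, R_s, n=200):
    """Midpoint-rule average over phi in [0, 2 pi) and u = alpha^2 in [0, 1]."""
    total = 0.0
    for a in range(n):
        u = (a + 0.5) / n                 # u = alpha^2, uniformly distributed
        alpha = math.sqrt(u)
        for b in range(n):
            phi = 2 * math.pi * (b + 0.5) / n
            total += reflected_intensity(alpha, phi, R_p, R_s)
    return total / n ** 2

# Made-up complex reflexion co-efficients, for illustration only.
R_p = 0.1 + 0.05j
R_s = -0.3 + 0.1j

numeric = average_intensity(R_p, R_s)
incoherent = 0.5 * (abs(R_p) ** 2 + abs(R_s) ** 2)
print(numeric, incoherent)
```

The cross term averages away over $\phi$ whatever the relative phase of $R_p$ and $R_s$, and the uniform average over $\alpha^2$ then weights the two remaining terms equally.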

This will not give a full picture for general entangled polarization states, when you have to resort to more general coherence and cross-correlation functions to describe what is going on. Likewise for Born and Wolf’s classical description. But it is an excellent first approximation and it is probably true to say that it is hard to arrange for it not to hold in the laboratory. Deviations from it are likely to be seen if you sample the light intensities over very short sampling intervals, when you will see complicated, extremely swift fluctuations in scattered and transmitted intensities, often following white noise processes.