What is Diffraction?

There were several questions about this recently on Physics Stack Exchange.

I would say not to get too worried about fine meanings of the word: it is ultimately a little imprecise, and when one is thinking about real, physical problems, one is going to be working with equations.

The more fundamental concept is interference, which is simply a manifestation of the linear superposition principle. Amplitudes add, so both magnitude and phase are important when summing contributions to a field from different sources.
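As a minimal sketch of what "amplitudes add" means in practice, the following Python snippet superposes two complex amplitudes and shows how the resulting intensity depends on their relative phase. The helper name `combined_intensity` and the unit magnitudes are illustrative assumptions, not part of the discussion above.

```python
import cmath
import math

def combined_intensity(a1, a2, phase_difference):
    """Return |E1 + E2|^2 for two fields with real magnitudes a1 and a2
    and the given relative phase in radians (hypothetical helper)."""
    e1 = complex(a1, 0.0)
    e2 = a2 * cmath.exp(1j * phase_difference)
    return abs(e1 + e2) ** 2

in_phase = combined_intensity(1.0, 1.0, 0.0)             # constructive interference
quadrature = combined_intensity(1.0, 1.0, math.pi / 2)   # cross term vanishes
out_of_phase = combined_intensity(1.0, 1.0, math.pi)     # destructive interference

print(in_phase, quadrature, out_of_phase)
```

With equal magnitudes the intensity swings between roughly four times the single-source value (in phase) and roughly zero (out of phase), which is the whole content of interference.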

Diffraction works like this. Suppose you know a monochromatic field’s values on one transverse plane. Now Fourier transform the values, to express the field on a transverse plane as a sum of plane waves. Plane waves running nearly orthogonal to the transverse plane have almost the same phase over wide transverse regions. So they show themselves as low spatial frequencies in the transverse plane field pattern. Plane waves running at steep angles to the transverse plane beget high spatial frequency components in that plane.

So we’ve resolved our field into a linear superposition of plane waves. Because these waves are propagating in different directions, they undergo different delays in reaching another transverse plane. The Fourier co-efficients take on different phases, so the same constituent plane waves interfere together to make a different field configuration on other transverse planes.

Diffraction is thus the interference of the plane wave constituents of a field (e.g. an electromagnetic field following the linear Maxwell equations). These constituent plane waves beat differently on different transverse planes because they undergo different phase delays by dint of their different directions. An equivalent description (in the limit of large propagation distance) is Huygens’s principle. Think of a single slit field. Diffraction is the interference on a farfield plane between the different fields arising from the different Huygens point sources at different positions in the slit.
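The Huygens picture for a single slit can be sketched numerically: fill the slit with equally spaced point sources and sum their contributions at a far-field angle, where a source at position $x$ is phase shifted by $-k\,x\,\sin\theta$ relative to the slit centre. The function name, the number of sources and the example dimensions below are assumptions chosen for illustration.

```python
import cmath
import math

def farfield_intensity(theta, slit_width, wavelength, n_sources=2000):
    """Normalized far-field intensity at angle theta (radians) from
    n_sources Huygens point sources spread uniformly across the slit."""
    k = 2.0 * math.pi / wavelength
    total = 0j
    for j in range(n_sources):
        # Source position, measured from the centre of the slit.
        x = ((j + 0.5) / n_sources - 0.5) * slit_width
        # Far-field phase of this source relative to the slit centre.
        total += cmath.exp(-1j * k * x * math.sin(theta))
    return abs(total / n_sources) ** 2

# Assumed example values, in micrometres.
SLIT_WIDTH, WAVELENGTH = 20.0, 0.5
on_axis = farfield_intensity(0.0, SLIT_WIDTH, WAVELENGTH)
first_null = farfield_intensity(math.asin(WAVELENGTH / SLIT_WIDTH),
                                SLIT_WIDTH, WAVELENGTH)
print(on_axis, first_null)
```

On axis all sources add in phase; at the angle where $\sin\theta = \lambda/w$ the contributions span exactly one full cycle of phase and cancel, giving the first dark fringe.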

Diffraction is the basic reason why laser beams always diverge. Now whether you speak of photons or classical fields, the explanation is precisely the same. Maxwell’s equations are the exact, first quantized description of photon propagation; I bang on about this topic ad nauseam in my post on this site “Is Light a Wave or a Particle?” and also here (Electromagnetic radiation and quanta). Sometimes people describe diffraction of photons in terms of the Heisenberg Uncertainty Principle. A laser beam’s minimum divergence is governed by exactly the same mathematics (more on this further on) as the Heisenberg Uncertainty Principle, but I believe it is misleading to think of these two phenomena as the same thing, even though their mathematics is the same.

So let’s give a quick mathematical summary of what I mean by diffraction. Consider a field on a plane, say $z = 0$, and split it up, using Fourier decomposition of the field variation over the plane $z=0$, into constituent plane waves. These are “modes” of Maxwell’s equations insofar as their propagation description is simply that the fields become phase delayed by a simple scale factor $\exp(i\,\mathbf{k}\cdot\mathbf{\Delta r})$ under the action of a translation $\mathbf{\Delta r}$. Each constituent plane wave has a different direction defined by the wavevector $\left(k_x, k_y, k_z\right)$ with $k^2 = k_x^2 + k_y^2 + k_z^2$ (i.e. the Fourier space equivalent of Helmholtz’s equation), that is, all the wavevectors have the same magnitude but different directions. So, when we ask what the field looks like at a different value of $z$, we build the field up from our plane wave constituents at this point (using an inverse Fourier transform). However, now, because the wavevectors are all in different directions, the plane waves have all undergone different phase delays in reaching the new value of $z$ (even though the phase of each advances by $k$ radians per unit length in the direction of its own wavevector). Therefore, the field’s configuration gets scrambled by all these different phase delays. I sketch this idea in a drawing below:


Figure 1: Plane waves with the same phase speed but in different directions undergo different phase delays in running from $z=0$ to $z=L$
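To put rough numbers on the figure's idea, the sketch below computes the phase delay $k_z\,L = k\,L\,\cos\theta$ picked up between $z=0$ and $z=L$ by plane waves tilted at angle $\theta$ to the $z$-axis. The wavelength, distance and angles are assumed example values, and `phase_delay` is a hypothetical helper name.

```python
import math

def phase_delay(theta, wavelength, distance):
    """Phase (radians) accumulated along z over `distance` by a plane wave
    whose wavevector makes angle theta (radians) with the z-axis."""
    k = 2.0 * math.pi / wavelength
    return k * math.cos(theta) * distance

# Assumed example values, in micrometres.
WAVELENGTH, L = 0.5, 10.0
on_axis = phase_delay(0.0, WAVELENGTH, L)
for theta in (0.0, 0.05, 0.1, 0.2):
    # Phase lag of the tilted wave relative to the on-axis wave.
    lag = on_axis - phase_delay(theta, WAVELENGTH, L)
    print(f"theta = {theta:4.2f} rad, relative phase lag = {lag:.4f} rad")
```

For small angles the lag grows like $k\,L\,\theta^2/2$, so even modest tilts produce relative phase shifts of order a radian after a few tens of wavelengths.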

Now to study diffraction in some detail. Think of a one-dimensional problem, so we have a uniformly lit slit of some finite width $w$ modeling the laser output; in this simplified system there are only 2D wave vectors. The screen with the slit is in the $z = 0$ plane and the one orthogonal direction is the $x$ axis. All the Cartesian components of the fields fulfil the same (Helmholtz) equation, so we can discuss the principles by just looking at one scalar field $\psi$ (say, the electric field’s $x$-component). Each plane wave has the form $\psi(k_x) = \exp\left(i \,(k_x\, x + k_z\, z)\right)$. The Fourier transform of the field output from the slit is then (I’ll leave out factors of $2\pi$ in the unitary FT because scale factors don’t affect the following):

$$\frac{\sin\left(\frac{w\, k_x}{2}\right)}{k_x} \tag{1}$$

where $w$ is the slit width, and unless the slit is very wide, the Fourier transform has a wide spread of frequencies. This means that for $z = 0^+$ (“immediately downstream” of the slit’s output) the field is the superposition

$$\int\limits_{-\infty}^\infty \frac{\sin\left(\frac{w\, k_x}{2}\right)}{k_x} \exp\left(i\, (k_x\, x + k_z\, z)\right) \mathrm{d} k_x\tag{2}$$
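The slit spectrum (1) that appears in this integrand can be checked numerically. The sketch below integrates $e^{-i\,k_x\,x}$ across a uniformly lit slit with the midpoint rule and compares the result with $2\sin(w\,k_x/2)/k_x$, which is expression (1) with the constant factor of 2 restored (the text drops such scale factors). The function names, the sample count and the slit width are assumptions made for illustration.

```python
import cmath
import math

def slit_spectrum_numeric(kx, slit_width, n=4000):
    """Midpoint-rule estimate of the integral of exp(-i kx x) over the slit."""
    dx = slit_width / n
    total = 0j
    for j in range(n):
        x = -slit_width / 2 + (j + 0.5) * dx
        total += cmath.exp(-1j * kx * x) * dx
    return total

def slit_spectrum_analytic(kx, slit_width):
    """Closed form 2 sin(w kx / 2) / kx, valid for kx != 0."""
    return 2.0 * math.sin(slit_width * kx / 2.0) / kx

W = 20.0  # assumed slit width, arbitrary length units
for kx in (0.1, 1.0, 2.0 * math.pi / W):
    numeric = slit_spectrum_numeric(kx, W)
    print(kx, numeric.real, slit_spectrum_analytic(kx, W))
```

The first zero of the spectrum falls at $k_x = 2\pi/w$, so a narrower slit pushes the zero outwards and spreads the field over a wider range of transverse wavenumbers.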

When we plug $z = 0$ in, the integral is simply the inverse FT of (1) and we get our original slit field. But now put some nonzero value of $z$ in: because $k_x^2 + k_z^2 = k^2$, we have $k_z = \sqrt{k^2 - k_x^2}$ (assuming the field is running in the $+z$ direction), and so we get

$$\int\limits_{-\infty}^\infty \frac{\sin\left(\frac{w\, k_x}{2}\right)}{k_x} \exp\left(i\, (k_x\, x + \sqrt{k^2 - k_x^2}\, z)\right)\, \mathrm{d} k_x\tag{3}$$
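A discretized version of (3) can be sketched as follows: sample the slit field on a grid, take a discrete Fourier transform, multiply each component by $\exp(i\,\sqrt{k^2 - k_x^2}\,z)$ and transform back. This is a rough illustration under stated assumptions, not a production propagator: it uses a naive $O(N^2)$ DFT so that it needs only the standard library, the grid is periodic (energy leaving one side re-enters on the other), and the grid size, sampling, wavelength and slit width are all assumed example values.

```python
import cmath
import math

def dft(values, sign):
    """Naive O(N^2) DFT: sign=-1 forward, sign=+1 inverse (unnormalized)."""
    n = len(values)
    twiddle = [cmath.exp(sign * 2j * math.pi * m / n) for m in range(n)]
    return [sum(v * twiddle[(m * j) % n] for j, v in enumerate(values))
            for m in range(n)]

def propagate(field, dx, wavelength, z):
    """Propagate samples of psi(x, 0) to psi(x, z) by phase delaying
    every plane wave component by exp(i kz z)."""
    n = len(field)
    k = 2.0 * math.pi / wavelength
    spectrum = dft(field, -1)
    for m in range(n):
        # Transverse wavenumber of bin m (upper half holds negative kx).
        kx = 2.0 * math.pi * (m if m < n // 2 else m - n) / (n * dx)
        # cmath.sqrt returns +i|kz| when |kx| > k, so such waves decay with z.
        kz = cmath.sqrt(k * k - kx * kx)
        spectrum[m] *= cmath.exp(1j * kz * z)
    return [v / n for v in dft(spectrum, +1)]

# Assumed example values, in micrometres.
N, DX, WAVELENGTH, SLIT_WIDTH = 256, 1.0, 0.5, 20.0
xs = [(j - N // 2) * DX for j in range(N)]
slit = [1.0 + 0j if abs(x) < SLIT_WIDTH / 2 else 0j for x in xs]

def power(field):
    return sum(abs(v) ** 2 for v in field)

def fraction_outside_slit(field):
    """Fraction of the power lying outside the geometric shadow of the slit."""
    outside = sum(abs(v) ** 2 for x, v in zip(xs, field)
                  if abs(x) >= SLIT_WIDTH / 2)
    return outside / power(field)

far = propagate(slit, DX, WAVELENGTH, 2000.0)
print(fraction_outside_slit(slit), fraction_outside_slit(far))
```

At $z=0$ the phase factors are all unity and the slit field is recovered; at large $z$ a substantial fraction of the power has spread outside the slit's geometric shadow, even though each plane wave component has only changed its phase.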

You can see that the “scrambling”, $k_x$-dependent phase factor $\exp(i\, \sqrt{k^2 - k_x^2}\, z) = \exp\left(i\, k\, z\,\cos\theta_x\right)$ (where $\theta_x$ is the angle that the plane wave with wavevector $(k_x, k_z)$ makes with the $z$-axis) yields the complicated scrambling you see as “diffraction”. Various approximations, notably Fraunhofer and Fresnel, are applied to this integral.

The angle that a Fourier component with transverse wavenumber $k_x$ makes with the $z$-axis is $\theta = \arcsin (k_x/k)\approx k_x/k$ for small angles. So we see that the Fourier transform of the transverse field dependence defines the divergence. In the above, we see a reciprocal relationship between a rough measure $2\pi/w$ of the spread of transverse wavenumbers (which sets the maximum skew angle, roughly $\lambda/w$, of the constituent plane waves) and the “confinement” $w$ of the light field to the slit. The beam divergence and the beamwidth are indeed related by a Heisenberg-like inequality, and if we measure beam divergence and confinement by RMS values, we can show the following from the basic properties of Fourier transforms.

If $f(x)\in \mathbf{L}^2(\mathbb{R})$ and $F(k_x)$ is its Fourier transform, then the product of the root mean square spreads of both functions is bounded as follows. Without loss of generality (shift the origins of $x$ and $k_x$ if need be), assume that the mean position and mean wavenumber vanish, i.e. $\int_{-\infty}^\infty x\,|f(x)|^2\,\mathrm{d}\,x = \int_{-\infty}^\infty k_x\,|F(k_x)|^2\,\mathrm{d}\,k_x= 0$, then:

$$\sqrt{\frac{\int_{-\infty}^\infty x^2\,|f(x)|^2\,\mathrm{d}\,x}{\int_{-\infty}^\infty |f(x)|^2\,\mathrm{d}\,x}}\;\sqrt{\frac{\int_{-\infty}^\infty k_x^2\,|F(k_x)|^2\,\mathrm{d}\,k_x}{\int_{-\infty}^\infty |F(k_x)|^2\,\mathrm{d}\,k_x}} \geq \frac{1}{2}\tag{4}$$

and moreover the inequality is saturated by Gaussian $f(x) \propto \exp\left(-\frac{x^2}{2\,\sigma^2}\right)\,e^{-i\,k_0\,x}$ for some real constants $\sigma$ and $k_0>0$, *i.e.* such functions (their Fourier transforms are also Gaussian) achieve equality in the above bound.
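The saturation by Gaussians can be checked numerically. The sketch below assumes a real profile $f$ with zero mean position, and it avoids computing a Fourier transform by using Parseval's identity $\int k_x^2\,|F|^2\,\mathrm{d}k_x = \int |f'(x)|^2\,\mathrm{d}x$ (with matching normalizations) to get the wavenumber spread from the derivative. The function names, the value of $\sigma$ and the triangular comparison profile are illustrative assumptions.

```python
import math

def spread_product(f, df, half_range, n=20000):
    """RMS width of f times RMS width of its Fourier transform, for a real
    zero-mean profile f with derivative df, by midpoint integration over
    [-half_range, half_range]. Uses Parseval for the wavenumber spread."""
    dx = 2.0 * half_range / n
    norm = x2 = d2 = 0.0
    for j in range(n):
        x = -half_range + (j + 0.5) * dx
        p = f(x) ** 2
        norm += p
        x2 += x * x * p
        d2 += df(x) ** 2
    return math.sqrt(x2 / norm) * math.sqrt(d2 / norm)

SIGMA = 1.3  # assumed Gaussian width, arbitrary units

def gauss(x):
    return math.exp(-x * x / (2.0 * SIGMA ** 2))

def dgauss(x):
    return -x / SIGMA ** 2 * gauss(x)

def triangle(x):
    return max(0.0, 1.0 - abs(x))

def dtriangle(x):
    if abs(x) >= 1.0:
        return 0.0
    return -1.0 if x > 0 else 1.0

gaussian_product = spread_product(gauss, dgauss, 12.0 * SIGMA)
triangle_product = spread_product(triangle, dtriangle, 1.0)
print(gaussian_product, triangle_product)
```

The Gaussian gives a product of one half to within the integration error, whereas the triangular profile gives a strictly larger value, as the bound requires.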

So we have, since $\theta \approx k_x / k$:

$$\Delta k_x\, \Delta x \geq \frac{1}{2} \;\;\Rightarrow \;\;\frac{2\pi}{\lambda}\, \Delta\theta \,w \approx \frac{1}{2}\tag{5}$$

Plugging in a $w = 1\,\mathrm{mm}$ beamwidth for $\lambda =500\,\mathrm{nm}$ wavelength light, we get a beam divergence of $\Delta \theta = \lambda/(4\pi\,w) \approx 4\times 10^{-5}\,\mathrm{radian}$. This is the typical order of magnitude of the beam divergence for a high quality 1mm laser chip. There is some arbitrariness in what measures we use for beam divergence (since Gaussian beams have theoretically infinite breadth): often it is the vertex angle of the cone containing $1 - e^{-2}$ of the beam’s power. But I have equally seen the Gaussian RMS $\sigma$ or twice this value (one can speak of cone vertex angles or half-angles) used as the beamwidth; these are the $1 - e^{-2}$ beamwidth divided by $2\sqrt{2}$ and $\sqrt{2}$, respectively. You have to be a little bit careful how the beam divergence is defined.
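The arithmetic behind this estimate can be reproduced in a couple of lines, assuming equality in (5) so that $\Delta\theta = \lambda/(4\pi\,w)$. The helper name `min_divergence` is hypothetical, and which width measure is used for $w$ remains a matter of convention as discussed above.

```python
import math

def min_divergence(beamwidth, wavelength):
    """Minimum RMS divergence (radians) from (2 pi / lambda) * dtheta * w = 1/2."""
    return wavelength / (4.0 * math.pi * beamwidth)

# Values from the text, in metres: 1 mm beamwidth, 500 nm wavelength.
delta_theta = min_divergence(1e-3, 500e-9)
print(f"{delta_theta:.2e} rad")
```

Halving the beamwidth doubles the minimum divergence, which is the reciprocal relationship between confinement and angular spread in its simplest form.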

Applying the Heisenberg Uncertainty Principle to Light

Let’s finish with the Heisenberg uncertainty principle. My post “The Heisenberg Uncertainty Principle and Canonical Commutation Relationships in Quantum Mechanics” on this site shows how we can derive the following from the canonical commutation relationship $\hat{X}\,\hat{P} - \hat{P}\,\hat{X} = i\,\hbar\,\mathrm{id}$ between conjugate quantum observables alone:

We can always find co-ordinates for our quantum state Hilbert space such that $X$ is a simple multiplication operator and $P$ is the simple derivative operator $-i\hbar \mathrm{d}_x$

and as such position and momentum co-ordinates are mapped into one another by the Fourier transform (because the eigenfunctions of $\mathrm{d}_x$ are of the form $e^{i\,k_x\,x}$). Therefore, exactly the same techniques and ideas apply as above, which is why the Heisenberg uncertainty principle seems so like the diffraction ideas discussed earlier. But it is most assuredly not the same thing. The HUP can’t be applied to light for position-momentum because there are problems defining a position observable for the photon. This has to do with the fact that if $(\vec{E}, \vec{B})$ is a solution of Maxwell’s equations, then things like $(x_j \vec{E}, x_j \vec{B})$ (where $x_j$ are the Cartesian co-ordinates) generally aren’t (the Gauss laws showing divergencelessness in free space are violated). Of course the HUP always applies to noncommuting (conjugate) observables and there are many pairs of those in quantum electrodynamics, such as corresponding pairs of Cartesian components of the electric and magnetic field observables.

Contrast this with the scalar quantum electron state in the scalar massive particle nonrelativistic Schrödinger equation, where the scalar eigenstates are $\mathbf{L}^2$ complete, so that if $\psi(x)$ is a quantum state in position co-ordinates, then $x \psi(x)$ is also in the Hilbert space of states. One can of course define an intensity field which yields a probability distribution to (destructively) photodetect a photon, but this is different from asking where (position observable) an electron is in an orbital. Electrons can be detected nondestructively; it is very hard to do this for photons. Also, position observables are readily defined only for scalar quantum states in nonrelativistic first quantized descriptions: there is of course no nonrelativistic first quantized description of the photon. The bispinor valued electron state is also weird and the question of where the electron is cannot be addressed by a simple position observable either.

Now you can still define the momentum with the usual observable, because the eigenfunctions of $-i\,\hbar\,\partial_j$ are plane waves, i.e. well defined momentum states. But when you talk of localization of photons (probability distributions of where to detect them) you are talking about diffraction. This has exactly the same mathematics as the HUP, as I have shown above.

Having said this, Margaret Hawton is one of a few researchers who have taken a step back and looked at ways wherein we can meaningfully talk about photon positions, i.e. what we can salvage from the wreckage of the above problems: she derives a “position” observable with commuting components essentially by concocting something which has canonical commutation relationships with the momentum observable by definition, and goes on to build a second quantized theory with these ideas. One finds that one gets what would wontedly be defined as a position observable together with some interesting and weird terms related to the photon’s topological (Berry) phase. In other words, she explicitly shows how the wonted “no-go” theorems that forbid a photon position observable manifest themselves as extremely interesting terms that have to be added to the “wonted” and defective position observable. See her personal website for her papers.