Two Unwonted Proofs of the Thin Lens Formula

A mathematician teaching physics asked for a proof of the thin lens formula in this question on Physics Stack Exchange.

Now the “traditional proof” is of course the one with similar triangles. A good summary of this one is to be found here.

However, I have come up with and refined two proofs over the years that delve much more into the nature of light.

Before I begin, take heed that the thin lens formula is a paraxial formula: it only applies to light fields made up of plane waves skewed at small angles both to the nominal propagation direction, taken to be the $z$ direction in the following, and to any interface that the light field may meet. Under these conditions, $\sin\theta \to \theta$ for any angle between a ray and a surface unit normal, and Snell’s law becomes $n_1\,\theta_1 = n_2\, \theta_2$.

Method 1: Spherical Waves and their Interaction with Refracting Surfaces

Label $\mathbb{R}^3$ by Cartesian co-ordinates $x,y,z$ and consider a perfect point source at $(0,0,-R)$, a distance $R$ from the $x-y$ plane of constant axial co-ordinate $z=0$. The scalar field (i.e. any Cartesian component of the $\vec{E}$ or $\vec{H}$ fields or, in Lorenz gauge, potentials $\vec{A}$ or $\phi$) is a spherical wave by the time it reaches $z=0$. Strictly, one needs to assume $R\gg k\,s^2$, where $s$ is the diffraction limited spotsize of the point source and $k$ the wavenumber, but intuitively you can see that, far enough from the source, the source will be radiating spherical waves.

Therefore, the field’s phase $\phi$ on the plane $z = 0$ varies like:

$$\tag{1}\phi(x,y) = k \,\sqrt{R^2 + x^2 + y^2}-k\,R\approx k\,\frac{x^2+y^2}{2\,R}$$

the approximation being good if the field is paraxial, i.e. the radius of the support of $\phi$ in the plane is much less than $R$.
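To get a feel for how good the approximation in (1) is, here is a minimal numerical sketch. The wavelength (500 nm) and source distance (100 mm) are assumed values chosen only for illustration, and the function names are mine.

```python
import math

def exact_phase(x, y, R, k):
    """Phase of a spherical wave from (0, 0, -R) on z = 0, relative to the on-axis phase."""
    return k * (math.sqrt(R**2 + x**2 + y**2) - R)

def paraxial_phase(x, y, R, k):
    """Quadratic (paraxial) approximation on the right of equation (1)."""
    return k * (x**2 + y**2) / (2 * R)

k = 2 * math.pi / 0.5e-6   # wavenumber for an assumed 500 nm wavelength
R = 0.1                    # assumed source distance: 100 mm

for r in (1e-4, 1e-3, 1e-2):
    err = exact_phase(r, 0.0, R, k) - paraxial_phase(r, 0.0, R, k)
    print(f"r = {r:g} m: phase error = {err:.3e} rad")
```

The leading error term is $-k\,r^4/(8\,R^3)$, so the error grows as the fourth power of the radius of the field's support: the approximation is excellent near the axis and degrades quickly away from it.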

Now define a thin lens as an element that adds a curvature to its input field without significant diffractive effects. Its action is thus like a phase mask. A converging thin lens’s transformation on the input field $\psi(x,y)$ is:

$$\tag{2}\psi(x,y)\to \exp\left(-i\,k\,\frac{x^2+y^2}{2\,f}\right)\psi(x,y)$$

To understand the parameter $f$, we think of a plane wave (curvature of nought) with wavefront parallel to the $x-y$ plane (propagation exactly along the $z$ axis) as the input. The output field’s phase on the $x-y$ plane is thus:

$$\tag{3}\tilde{\phi}(x,y) = -k\,\frac{x^2+y^2}{2\,f}$$

By comparing (1) and (3) and noting the parameters $R$ and $f$ and their signs in the phase denominators, we see that (3) is (1) with $R = -f$, so the output field is that arising from a diffraction limited point field at $z = -R = +f$. Therefore $f$ is the thin lens’s focal length, its sign setting whether we have a converging or diverging lens. So, when we impart transformation (2) on a field with phase (1), we get a field with phase:

$$\tag{4}\phi_{out}(x,y) = k\,\left(x^2+y^2\right)\left(\frac{1}{2\,R}-\frac{1}{2\,f}\right)$$

Again, comparing (4) with (1), we see that this is a field that arises from a point source at position $z=R^\prime$ where:

$$\tag{5}-\frac{1}{R^\prime} = \frac{1}{R}-\frac{1}{f}$$

whence straight away follows the thin lens formula. $\qquad\square$
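The curvature bookkeeping of (1) to (5) can be sketched in a few lines. This is only an illustration under assumed numbers (a source 300 mm before a lens of 100 mm focal length); the function name `image_distance` is hypothetical.

```python
import math

def image_distance(R, f):
    """Position R' of the image point beyond the lens, from equation (5)."""
    curvature_out = 1.0 / R - 1.0 / f   # twice the bracket in equation (4)
    if curvature_out == 0.0:
        return math.inf                 # source at the focal point: plane-wave output
    return -1.0 / curvature_out         # equation (5): -1/R' = 1/R - 1/f

R, f = 0.30, 0.10                       # assumed values in metres
R_prime = image_distance(R, f)
# 1/R + 1/R' should reproduce 1/f, which is the thin lens formula.
print(R_prime, 1.0 / R + 1.0 / R_prime, 1.0 / f)
```

A negative return value means that the output wave still diverges, i.e. the image is virtual and lies on the source side of the lens.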

Method 2: The Transformer Matrix Method

For an axisymmetric (invariant with respect to rotation about the optic axis, here assumed to be the $z$ axis) system we can represent ray behavior through any plane through the optic axis. So any ray through a homogeneous medium has the linear equation $y = \theta z + \delta$ where $\theta$ (the angle with respect to the $z$ axis) and $\delta$ are real constants characterizing the ray. Suppose a ray meets a refracting interface: it leaves the interface as another ray with a different direction, characterised by new $\theta^\prime$, $\delta^\prime$. Moreover, if the ray passes through any sequence of refracting surfaces or reflecting surfaces spaced by homogeneous mediums, the transformation wrought on the ray parameters $\theta$, $\delta$ is a homogeneous linear transformation because Snell’s law is paraxially linear in $\theta$, as noted in the introduction, and the law of reflection is also linear in $\theta$. Note that this is true for a given ray, but in general the linear transformation depends on where (the value of $y$) the ray enters the transforming system. However, we either argue that, (1) to first order in $y$, the transformation does not depend on the entry point or (2) we design our refracting surfaces so that the transformation does not depend on the entry point.

So this means that, for these idealized, paraxial systems, a refracting/reflecting system is characterized by a $2\times2$ real matrix that transforms the column vector $\left(\begin{array}{c}\theta\\\delta\end{array}\right)$ defining the input ray to $\left(\begin{array}{c}\theta^\prime\\\delta^\prime\end{array}\right)$ defining the output. It is customary to shift the $z$ axis origin to the input / output of each successive ray transformer so that the column vectors for the input assume that the input plane is $z=0$ and those for the output assume that $z=0$ is the output plane; we can define the transformation matrix accordingly.

So now we work out the matrices of the basic subsystems. A homogeneous medium does not transform the angle $\theta$ but it shifts a ray’s height by $\theta\,\Delta z$; therefore the matrix for a medium of width $\Delta z$ is:

$$\tag{6}H(\Delta z) = \left(\begin{array}{cc}1&0\\\Delta z&1\end{array}\right)$$

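Now for the thin lens: $y$ does not shift, but an axially propagating ray ($\theta = 0$) is bent so that it passes through the optical axis a distance $f$ beyond the transformer’s output plane; likewise a ray with $\theta = \delta/f$ (passing through the focal point a distance $f$ before the input) is mapped to an axially propagating ray, therefore:

$$\tag{7}T(f)\left(\begin{array}{c}0\\\delta\end{array}\right) = \left(\begin{array}{c}-\delta/f\\\delta\end{array}\right);\quad T(f)\left(\begin{array}{c}\delta/f\\\delta\end{array}\right) = \left(\begin{array}{c}0\\\delta\end{array}\right)$$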

and so the transformer matrix for the converging thin lens (“positive optical power” $1/f$) with focal length $f$ is:

$$\tag{8}T(f) = \left(\begin{array}{cc}1&-f^{-1}\\0&1\end{array}\right)$$

So now we work out the transformer matrix for the system described by the thin lens formula: a homogeneous medium of width $s_0$ followed by a thin lens of focal length $f$ followed by a homogeneous medium of width $s_1$, where $1/f = 1/s_0+1/s_1$; we compose the ray transformations by multiplying the relevant matrices:

$$\tag{9}H(s_1)\, T\left(\left(s_0^{-1}+s_1^{-1}\right)^{-1}\right)\, H(s_0) = \left(
\begin{array}{cc}-\frac{s_0}{s_1} & -\frac{s_0+s_1}{s_0\,s_1} \\0 & -\frac{s_1}{s_0} \\\end{array}\right)$$

whereunder the family of rays through the optical axis characterized by $\theta = \theta_0,\,\delta = 0$ is mapped to another family of rays through the optical axis, $\theta = -s_0\,\theta_0/s_1,\,\delta = 0$ (i.e. a focal point is mapped to a focal point), and moreover the system magnification $-s_1/s_0$ at the matrix element $(2,2)$ is the reciprocal of the angular scaling at the matrix element $(1,1)$. $\qquad\square$
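The product in (9) is easy to check numerically. The sketch below uses plain nested lists for the $2\times2$ matrices acting on $(\theta,\delta)$; the helper names `matmul` and `apply` and the distances chosen are assumptions made for illustration.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

def H(dz):
    """Equation (6): homogeneous medium of width dz, acting on (theta, delta)."""
    return [[1.0, 0.0], [dz, 1.0]]

def T(f):
    """Equation (8): thin lens of focal length f."""
    return [[1.0, -1.0 / f], [0.0, 1.0]]

def apply(m, ray):
    """Transform a ray given as the pair (theta, delta)."""
    theta, delta = ray
    return (m[0][0] * theta + m[0][1] * delta,
            m[1][0] * theta + m[1][1] * delta)

s0, s1 = 0.30, 0.15                    # assumed object and image distances
f = 1.0 / (1.0 / s0 + 1.0 / s1)        # focal length fixed by the thin lens formula
M = matmul(H(s1), matmul(T(f), H(s0)))
print(M)

# Every ray leaving the axial object point (delta = 0) arrives with delta' = 0.
for theta0 in (-0.02, 0.01, 0.03):
    print(theta0, apply(M, (theta0, 0.0)))
```

The vanishing lower-left element is what makes the output height independent of the input angle, which is exactly the imaging condition.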

What’s interesting about this method is that (6) and (8) are unimodular, and therefore so are all their products, so any system of thin lenses spaced by homogeneous mediums also has a unimodular transformation matrix. If some of the spacing mediums are not vacuums, we represent the transition into them by Snell’s law, which in this notation means a flat interface orthogonal to the optical axis transforms by:

$$\tag{10}R(n_1,n_2) = \left(\begin{array}{cc}n_1/n_2&0\\0&1\end{array}\right)$$

which is not unimodular, but as long as we end up with the inputs and outputs in the same medium, the matrix product includes $R(n_1,n_2), R(n_2,n_3), R(n_3,n_4)\cdots R(n_N,n_1)$, whose determinants all multiply to yield 1. So our system still has a unimodular matrix. With a little more work, you can show that there is a paraxial system to realize any member of the unimodular group $SL(2,\mathbb{R})$. This is most readily done by understanding that the “infinitesimal” versions of (Lie algebra members in $\mathfrak{sl}(2,\mathbb{R})$ corresponding to) (6) and (8) are:

$$\tag{11}h = \left(\begin{array}{cc}0&0\\1&0\end{array}\right);\quad t=\left(\begin{array}{cc}0&1\\0&0\end{array}\right);\quad i.e.\;H(\Delta z) = \exp(\Delta z \, h);\quad T(f) = \exp\left(-\frac{1}{f}\,t\right);$$

and that:

$$\tag{12}[t,h] = s = \left(\begin{array}{cc}1&0\\0&-1\end{array}\right);\quad [s,t] = 2\,t;\quad [s,h] = -2\, h$$

so that $s, t$ and $h$ span the whole Lie algebra $\mathfrak{sl}(2,\mathbb{R})$; therefore there is a finite product of matrices of the form (6) and (8) to realize any member in the identity-connected component of $SL(2,\mathbb{R})$, which is simply the whole unimodular group. If we add a reflecting system $\left(\begin{array}{cc}1&0\\0&-1\end{array}\right)$ (flips an image and has determinant $-1$) to the mix, as well as scale factors to the determinant through an “unbalanced” $R(n_{in}, n_{out})$ arising from a different input and output medium, we can indeed realize any member of the whole group $GL(2,\mathbb{R})$ of invertible $2\times2$ matrices as a paraxial optical ray transformer.
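The commutation relations (12) and the exponentials in (11) can be verified mechanically. This is a small sketch with integer matrices as nested lists; the helper names are mine, and `exp_nilpotent` relies on the assumption, checked in the code, that the generator squares to nought so that the exponential series stops after the linear term.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(a, b):
    """Lie bracket [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

def scale(c, a):
    """Scalar multiple c * a."""
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def exp_nilpotent(c, g):
    """exp(c g) for a generator with g*g = 0: the series is exactly I + c g."""
    assert matmul(g, g) == [[0, 0], [0, 0]]
    return [[(1 if i == j else 0) + c * g[i][j] for j in range(2)]
            for i in range(2)]

h = [[0, 0], [1, 0]]    # generator of propagation, equation (11)
t = [[0, 1], [0, 0]]    # generator of lens action, equation (11)
s = [[1, 0], [0, -1]]

print(commutator(t, h))          # compare with s
print(commutator(s, t))          # compare with 2 t
print(commutator(s, h))          # compare with -2 h
print(exp_nilpotent(3, h))       # H(3) of equation (6)
print(exp_nilpotent(-2, t))      # T(1/2) of equation (8)
```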

Afterwords to Method 1

Note that you can also get the system magnification and the lateral positions of images relative to sources if you do method 1 again assuming an offset phase mask, i.e. by replacing (2) by:

$$\tag{13}\psi(x,y)\to \exp\left(-i\,k\,\frac{(x-x_0)^2+(y-y_0)^2}{2\,f}\right)\psi(x,y)$$

and interpreting the linear terms of the output phase appropriately as lateral co-ordinates of the image point.
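As a hedged sketch of that interpretation (the completed square below is my own working, not part of the derivation above), take the on-axis source of (1) and the offset mask (13). The output phase is then:

$$\phi_{out}(x,y) = k\,\frac{x^2+y^2}{2\,R}-k\,\frac{(x-x_0)^2+(y-y_0)^2}{2\,f} = -\frac{k}{2\,R^\prime}\left[\left(x-\frac{R^\prime}{f}\,x_0\right)^2+\left(y-\frac{R^\prime}{f}\,y_0\right)^2\right]+\text{const}$$

where $R^\prime$ is given by (5). Comparing with a laterally shifted version of (1), this is the field converging to an image point at $\left(R^\prime\,x_0/f,\;R^\prime\,y_0/f,\;R^\prime\right)$. Since $R^\prime/f = 1+R^\prime/R$ by (5), that point lies on the straight line from the source through the lens centre $(x_0,y_0,0)$, as one would expect of the undeviated ray through the centre of the lens.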

This method assumes perfectly unaberrated waves. However, witness that any lens whose surface sag (height) near its vertex can be described by an analytic function of distance $r$ from the optical axis (axis of symmetry) will impart a phase mask well described by the above in the paraxial limit, i.e. by limiting the numerical aperture (maximum skew angle) in any field such that the support of any field in the transverse plane $z=0$ is small enough. As long as the width of the support needed to validate the analysis above stays big compared with a wavelength, the analysis above will be valid in the paraxial limit. This means it works in the paraxial limit for all lenses of practical curvatures in the neighbourhood of the optical axis.

You can also dig deeper into the scalar wave theory behind the thin lens equation with the theory of Gaussian beams. You begin with the Helmholtz equation in a homogeneous medium $(\nabla^2 + k^2)\psi = 0$. If the field comprises only plane waves in the positive $z$ direction then we can represent the diffraction of any scalar field on any transverse (of the form $z=c$) plane by:

$$\tag{14}\begin{array}{lcl}\psi(x,y,z) &=& \frac{1}{2\pi}\int_{\mathbb{R}^2} \left[\exp\left(i \left(k_x x + k_y y\right)\right) \exp\left(i \left(k-\sqrt{k^2 - k_x^2-k_y^2}\right) z\right)\,\Psi(k_x,k_y)\right]{\rm d} k_x {\rm d} k_y\\
\Psi(k_x,k_y)&=&\frac{1}{2\pi}\int_{\mathbb{R}^2} \exp\left(-i \left(k_x u + k_y v\right)\right)\,\psi(u,v,0)\,{\rm d} u\, {\rm d} v\end{array}$$

In words:

  1. Take the Fourier transform of the scalar field over a transverse plane to express it as a superposition of scalar plane waves $\psi_{k_x,k_y}(x,y,0) = \exp\left(i \left(k_x x + k_y y\right)\right)$ with superposition weights $\Psi(k_x,k_y)$;
  2. Note that plane waves propagating in the $+z$ direction fulfilling the Helmholtz equation vary as $\psi_{k_x,k_y}(x,y,z) = \exp\left(i \left(k_x x + k_y y\right)\right) \exp\left(i \left(k-\sqrt{k^2 - k_x^2-k_y^2}\right) z\right)$;
  3. Propagate each such plane wave from the $z=0$ plane to the general $z$ plane using the plane wave solution noted in step 2;
  4. Inverse Fourier transform the propagated waves to reassemble the field at the general $z$ plane.

Now we make the paraxial approximation to the propagation relationship in step 2 above, *i.e.* we assume that the plane waves aren’t skewed at too steep angles relative to the $z$ axis so that $k_x^2+k_y^2 \ll k^2$. Then our two propagation equations above become the Fresnel propagation integral:

$$\tag{15}\begin{array}{lcl}\psi(x,y,z) &=& \frac{1}{2\pi}\int_{\mathbb{R}^2} \left[\exp\left(i \left(k_x x + k_y y\right)\right) \exp\left(i \frac{k_x^2+k_y^2}{2\,k} z\right)\,\Psi(k_x,k_y)\right]{\rm d} k_x {\rm d} k_y\\
\Psi(k_x,k_y)&=&\frac{1}{2\pi}\int_{\mathbb{R}^2} \exp\left(-i \left(k_x u + k_y v\right)\right)\,\psi(u,v,0)\,{\rm d} u\, {\rm d} v\end{array}$$
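The four steps, with the paraxial kernel of (15), can be sketched numerically in one transverse dimension. This is a minimal illustration and not an efficient implementation: it swaps the continuous Fourier integrals for a naive discrete Fourier sum on a periodic grid, and the wavelength units, waist radius and grid are all assumed values. As a test input it uses a Gaussian profile, whose radius after propagation is compared with the standard Gaussian beam result $w_0\sqrt{1+(z/z_R)^2}$, where $z_R = k\,w_0^2/2$.

```python
import cmath
import math

def dft(values, sign):
    """Naive O(N^2) Fourier sum; sign=-1 analyses, sign=+1 synthesises (unnormalised)."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * math.pi * m * j / n)
                for j, v in enumerate(values))
            for m in range(n)]

def fresnel_propagate(field, dx, k, z):
    """Propagate samples of psi(x, 0) to psi(x, z) with the paraxial kernel of (15)."""
    n = len(field)
    spectrum = dft(field, -1)                       # step 1: plane-wave weights
    for m in range(n):
        m_signed = m if m < n // 2 else m - n       # DFT bin -> signed index
        kx = 2 * math.pi * m_signed / (n * dx)      # transverse wavenumber of bin m
        spectrum[m] *= cmath.exp(1j * kx * kx * z / (2 * k))  # steps 2-3
    return [v / n for v in dft(spectrum, +1)]       # step 4: reassemble the field

def beam_radius(field, xs):
    """1/e^2 intensity radius of a Gaussian profile, from its second moment."""
    power = sum(abs(v) ** 2 for v in field)
    mean_x2 = sum(x * x * abs(v) ** 2 for x, v in zip(xs, field)) / power
    return 2 * math.sqrt(mean_x2)

k = 2 * math.pi               # wavelength taken as the unit of length (assumption)
w0 = 5.0                      # assumed waist radius, in wavelengths
n, dx = 128, 0.625            # assumed sampling grid
xs = [(j - n // 2) * dx for j in range(n)]
psi0 = [complex(math.exp(-(x / w0) ** 2)) for x in xs]
z_R = k * w0 ** 2 / 2         # Rayleigh length of this Gaussian
psi1 = fresnel_propagate(psi0, dx, k, z_R)
print(beam_radius(psi0, xs), beam_radius(psi1, xs), w0 * math.sqrt(2))
```

The sign of the exponent in the kernel follows (15); it does not affect the intensity profile checked here.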

Now witness that a beam $\psi(x,y,0)$ with Gaussian dependence on $x$ and $y$ becomes, under the Fresnel diffraction integral, $\psi(x,y,z)$ with Gaussian dependence on $x$ and $y$. The Fourier transform of a Gaussian is a Gaussian, a fact which does not change when we multiply by the $\exp\left(i \frac{k_x^2+k_y^2}{2\,k} z\right)$ kernel in the Fresnel diffraction integral, and of course a Gaussian is recovered by the inverse Fourier transform. Moreover, the transformation of the phase mask in (2) and (13) maps a Gaussian beam into a Gaussian beam. Best of all, for Gaussian beams the integrals split up into a product of separate $x$ and $y$ dependences and these separate dependences are also operated on independently by the phase masks (2) and (13), so propagation analysis can be done as a product of two decoupled one-transverse dimensional diffraction problems. Therefore the propagation through a system comprising homogeneous, diffractive mediums and thin lenses characterized by (2) and (13) can be done wholly in closed form expressions (use Mathematica though!) and you end up with something slightly more general than the thin lens formula that becomes the thin lens formula when the axial distances involved become longer than the Rayleigh diffraction length (of the order of wavelengths).
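One compact way to see this limit, offered as a sketch and not as the closed-form calculation alluded to above, is the standard complex beam parameter $q = z + i\,z_R$ of Gaussian beam theory, which I am assuming here: propagation over a distance $d$ adds $d$ to $q$, and a thin lens sends $1/q$ to $1/q - 1/f$, mirroring the curvature change in (4). The function name and the numbers are hypothetical.

```python
def image_waist_distance(s, f, z_R):
    """Distance from the lens to the output waist, for an input waist a distance s
    before the lens and an input Rayleigh length z_R."""
    q = complex(s, z_R)              # q = i z_R at the waist, then propagate over s
    q = 1.0 / (1.0 / q - 1.0 / f)    # thin lens: 1/q -> 1/q - 1/f
    return -q.real                   # propagating by d adds d to q; waist where Re(q) = 0

s, f = 0.30, 0.10                    # assumed distances in metres
geometric = 1.0 / (1.0 / f - 1.0 / s)    # thin lens formula prediction
for z_R in (1e-1, 1e-2, 1e-4):
    print(z_R, image_waist_distance(s, f, z_R), geometric)
```

Working the algebra through gives a waist distance of $f + f^2\,(s-f)/\left((s-f)^2+z_R^2\right)$, which tends to the thin lens value $s\,f/(s-f)$ as $z_R$ becomes small compared with $|s-f|$.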