The Heisenberg Uncertainty Principle and Canonical Commutation Relationships in Quantum Mechanics

Heisenberg came up with the idea that certain measurements could not be made at once: that if, for example, you measured an object’s momentum and then its position you would get a different result from the situation where you measured the position first and then the momentum.

With the advantage of hindsight, we now understand that everything about the Heisenberg Uncertainty Principle begins and ends with the canonical commutation relationship (CCR):

$$\tag{1}[\hat{X},\,\hat{P}]=\hat{X}\,\hat{P} - \hat{P}\,\hat{X} = i\,\hbar\,\mathrm{id}$$

where $\hat{X}$ and $\hat{P}$ are respectively the position and momentum observables. Let’s go back to what we can actually derive from the CCR. Let’s simplify this to a one-dimensional particle: it should be easy to see that this generalizes with 1D multiplication operators replaced by $3\times 3$ diagonal matrix multiplication operators. It can be shown that, given the CCR, $\hat{X}$ and $\hat{P}$ must have continuous spectra (see my answer here and especially the link therein for more information).

The eigenvalues of $\hat{X}$ and $\hat{P}$ can be any real values, so let’s assume firstly that our co-ordinates in quantum state space have been chosen so that $\hat{X}$ is a simple multiplication operator $\hat{X} f(x) = x\,f(x)$ and we write the quantum state as a wavefunction $\psi(x)\in\mathbf{L}^2(\mathbb{R})$ of the position observable’s eigenvalue $x$, so that now $|\psi(x)|^2$ is the probability density of the position eigenvalue given the quantum state $\psi(x)$.

So now let’s think of $\hat{X}$ and $\hat{P}$ as operators belonging to some linear space $\mathcal{L}$ of suitably well behaved operators on $\mathbf{L}^2(\mathbb{R})$. We need to assume that $\mathcal{L}$ is a space of diagonalizable (i.e. spectrally factorisable) operators with continuous spectra. Then we can think of the Lie bracket $\hat{P}\mapsto[\hat{X},\,\hat{P}]$ as a linear operator $\operatorname{ad}_{\hat{X}}:\mathcal{L}\to\mathcal{L}$ on the space $\mathcal{L}$ of our operators. This mapping’s kernel is the linear space of operators mapped to nought by $\operatorname{ad}_{\hat{X}}$; these are precisely the generalised multiplication operators $f(x)\mapsto g(x) f(x)$, with some fixed function $g(x)$ defining each such operator. The reason we know this is the whole kernel is that operators commute if and only if those operators have the same eigenvectors, and the operators commuting with $\hat{X}$ are the kernel members we seek. So a kernel member must be a multiplication operator.

Now a particular solution to the CCR can readily be verified to be $f(x)\mapsto -i\,\hbar\,\mathrm{d}_x\,f(x)$, and the coset of all operators $\hat{P}$ fulfilling the CCR is the kernel displaced by any particular solution, so the most general $\hat{P}$ we can have in this particular co-ordinate system is:

$$\tag{2}f(x) \mapsto (-i\,\hbar\,\mathrm{d}_x + g(x))\,f(x) = -i\,\hbar\,\mathbf{Q}^{-1} \mathbf{D} \mathbf{Q} f(x)$$

where $\mathbf{D} f(x) = \mathrm{d}_x f(x)$ and $\mathbf{Q} f(x) = \exp\left(\frac{i}{\hbar} h(x) \right) f(x)$ with $h(x) = \int g(x)\mathrm{d}x$; indeed, $\mathbf{Q}^{-1} \mathbf{D} \mathbf{Q} f(x) = \exp\left(-\frac{i}{\hbar} h(x) \right) \mathrm{d}_x\left(\exp\left(\frac{i}{\hbar} h(x) \right) f(x)\right) = \mathrm{d}_x f(x) + \frac{i}{\hbar}\,g(x)\,f(x)$. So now we can rotate our state space co-ordinates so that $f(x) \mapsto \mathbf{Q} f(x)$ and the $\hat{X}$ and $\hat{P}$ observables transform in this space as $\hat{X}\mapsto \mathbf{Q}\hat{X}\mathbf{Q}^{-1} = \hat{X}$ and $\hat{P}\mapsto \mathbf{Q}\hat{P}\mathbf{Q}^{-1} = -i\,\hbar\,\mathbf{D}$. Therefore:

The CCR alone implies there is an orthogonal co-ordinate system for the quantum state space wherein:

$$\tag{3}\begin{array}{lcl}\hat{X} f(x) &=& x\,f(x)\\ \hat{P} f(x) &=& -i\,\hbar\,\mathrm{d}_x\,f(x)\end{array}$$

or, with a swap of roles of $\hat{X}$ and $\hat{P}$ together with a sign change of $\hbar$:

$$\tag{4}\begin{array}{lcl}\hat{P} f(p) &=& p\,f(p)\\ \hat{X} f(p) &=& i\,\hbar\,\mathrm{d}_p\,f(p)\end{array}$$
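The pair of operators in (3) can be checked mechanically against the CCR (1). What follows is only a minimal sketch under stated assumptions: units with $\hbar = 1$, and operators acting on polynomials stored as coefficient lists (polynomials are not members of $\mathbf{L}^2(\mathbb{R})$, but the operator algebra is the same). The helper names `x_op`, `p_op` and `commutator_xp` are illustrative and not part of the derivation.

```python
HBAR = 1.0  # assumption: natural units, hbar = 1


def x_op(coeffs):
    """Position operator of (3): multiply the polynomial by x."""
    return [0j] + [complex(c) for c in coeffs]


def p_op(coeffs):
    """Momentum operator of (3): apply -i*hbar*d/dx to the polynomial."""
    derived = [-1j * HBAR * n * c for n, c in enumerate(coeffs)][1:]
    return derived or [0j]


def subtract(a, b):
    """Coefficient-wise difference of two polynomials of any lengths."""
    n = max(len(a), len(b))
    a = list(a) + [0j] * (n - len(a))
    b = list(b) + [0j] * (n - len(b))
    return [u - v for u, v in zip(a, b)]


def commutator_xp(coeffs):
    """Return [X, P] f = X(P f) - P(X f) as a coefficient list."""
    return subtract(x_op(p_op(coeffs)), p_op(x_op(coeffs)))


# Test polynomial f(x) = 2 - x + 0.5 x^2 + 3 x^3 (arbitrary choice).
f = [2.0, -1.0, 0.5, 3.0]
result = commutator_xp(f)

# The CCR predicts [X, P] f = i*hbar*f, coefficient by coefficient.
expected = [1j * HBAR * c for c in f]
print(result)
print(expected)
```

The same check goes through for the pair in (4) with the roles of the two operators swapped and the sign of the derivative term reversed.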

Once you have the expressions $\hat{X} f(x) = x\,f(x);\;\hat{P} f(x) = -i\,\hbar\,\mathrm{d}_x f(x)$ it follows that the co-ordinates wherein $\hat{P}$ is the simple multiplication operator $\hat{P} f(p) = p\,f(p)$ and the co-ordinates wherein $\hat{X}$ is the simple multiplication operator $\hat{X} f(x) = x\,f(x)$ must be related by a Fourier transform. This is because the eigenfunctions of $-i\,\hbar\,\mathrm{d}_x$ with real eigenvalues $\hbar\,k$ (as befits a Hermitian operator) are the functions $\exp(i\,k\,x)$ for real $k$. This observation brings us to a second method of deriving the result.

Second Method for Deriving Result

We might begin with de Broglie’s hypothesis that momentum eigenstates are plane waves (i.e. with functional form $\exp(i\,k\,x)$ in position co-ordinates, with momentum $\hbar k$) as our “fundamental” axiom. To factorise a state $\psi(x)$ into a superposition of such waves, we of course use the Fourier transform. De Broglie’s hypothesis is then equivalent to the statement that momentum co-ordinates are the position co-ordinates Fourier transformed. The momentum operator, by our de Broglie momentum formula, multiplies the Fourier transformed state by $\hbar\,k$, and then we transform back to position co-ordinates. So our momentum operator in position co-ordinates must be:

$$\tag{5}\psi(x) \to  \frac{\hbar}{2\,\pi}\int_{-\infty}^\infty\exp(i\,k\,x)\,k\,\int_{-\infty}^\infty\exp(-i\,k\,u)\psi(u)\,\mathrm{d}u\,\mathrm{d}k=\\-i\,\hbar\,\mathrm{d}_x\,\left(\frac{1}{2\,\pi}\int_{-\infty}^\infty\exp(i\,k\,x)\,\int_{-\infty}^\infty\exp(-i\,k\,u)\psi(u)\,\mathrm{d}u\,\mathrm{d}k\right) = -i\,\hbar\,\mathrm{d}_x\,\psi(x)$$

So, if we now Fourier transform our co-ordinates so that:

$$\tag{6}\Psi(p) = \frac{1}{\sqrt{2\,\pi}}\int_{-\infty}^\infty\exp\left(-i\,\frac{p}{\hbar}\,x\right)\psi(x)\mathrm{d} x$$

then we see that (e.g. by integration by parts)

$$\tag{7}p\,\Psi(p) = \frac{1}{\sqrt{2\,\pi}}\int_{-\infty}^\infty\exp\left(-i\,\frac{p}{\hbar}\,x\right)\left(-i\,\hbar\,\mathrm{d}_x \psi(x)\right)\,\mathrm{d} x$$

and also (by simply differentiating under the integral)

$$\tag{8}i\,\hbar\,\mathrm{d}_p \Psi(p) = \frac{1}{\sqrt{2\,\pi}}\int_{-\infty}^\infty\exp\left(-i\,\frac{p}{\hbar}\,x\right)\, x \,\psi(x)\,\mathrm{d} x$$

and so (7) and (8) show that, in momentum co-ordinates, the momentum operator is multiplication by $p$ and the position operator is $i\,\hbar\,\mathrm{d}_p$, in agreement with (4). Hence we have a second way to understand the result.
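The claim in (5), that multiplying by $\hbar\,k$ between a Fourier transform and its inverse is the same as applying $-i\,\hbar\,\mathrm{d}_x$, can be tested numerically. This is a rough sketch rather than a proof. It assumes $\hbar = 1$, uses NumPy's discrete FFT on a finite periodic grid as a stand-in for the integrals, and picks a Gaussian with a carrier wavenumber `k0` as an arbitrary test state.

```python
import numpy as np

HBAR = 1.0            # assumption: natural units
N, L = 1024, 40.0     # assumption: grid size and box length
dx = L / N
x = (np.arange(N) - N // 2) * dx
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)   # wavenumbers matching np.fft.fft

# Test state: Gaussian envelope times a plane wave exp(i*k0*x).
k0 = 1.5
psi = np.exp(-x**2 / 2) * np.exp(1j * k0 * x)

# Momentum operator built as in (5): transform, multiply by hbar*k, invert.
p_psi_fourier = np.fft.ifft(HBAR * k * np.fft.fft(psi))

# Momentum operator as -i*hbar*d/dx, using the analytic derivative of psi.
dpsi_dx = (-x + 1j * k0) * psi
p_psi_exact = -1j * HBAR * dpsi_dx

max_error = np.max(np.abs(p_psi_fourier - p_psi_exact))
print("largest pointwise difference:", max_error)
```

The two results should agree to within rounding error as long as the state is negligible at the edges of the box and well resolved by the grid.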

The Heisenberg Inequality and its Saturation

So now we can derive the Heisenberg inequality. We first look at the following fundamental property of the Fourier transform:

If $f:\mathbb{R}\to\mathbb{C}$ is a function and $F:\mathbb{R}\to\mathbb{C}$ its Fourier transform, then the product of the root mean square spreads of both functions is bounded as follows. Without loss of generality, assume that $\int_{-\infty}^\infty x\,|f(x)|^2\,\mathrm{d}\,x = \int_{-\infty}^\infty k\,|F(k)|^2\,\mathrm{d}\,k = 0$ (i.e. the function and its Fourier transform have means of nought; translating $f$ or multiplying it by a phase factor $\exp(i\,k_0\,x)$ shifts these means without changing either spread), then:

$$\tag{9}\sqrt{\frac{\int_{-\infty}^\infty x^2\,|f(x)|^2\,\mathrm{d}\,x}{\int_{-\infty}^\infty |f(x)|^2\,\mathrm{d}\,x}}\;\sqrt{\frac{\int_{-\infty}^\infty k^2\,|F(k)|^2\,\mathrm{d}\,k}{\int_{-\infty}^\infty |F(k)|^2\,\mathrm{d}\,k}} \geq \frac{1}{2}$$

I’m using the unitary definition of the Fourier transform here:

$$\tag{10}F(k)=\frac{1}{\sqrt{2\pi}}\int\limits_{-\infty}^\infty e^{-i\,k\,x}\,f(x)\,{\rm d}x$$

This is, of course, important for the Heisenberg Uncertainty Principle because, as we have seen above,  momentum co-ordinates and position co-ordinates (more generally, eigen-co-ordinates corresponding to any observables $\hat{X}$, $\hat{P}$ which fulfill the canonical commutation relationship $[\hat{X}, \hat{P}]=i\,\hbar\,{\rm id}$) are needfully related by a Fourier transform (with a scaling of the momentum co-ordinates by $\hbar$ thrown in after the Fourier transform).

As an aside, take heed that, if you define a family of pulses $f_\alpha(x) = f(\alpha x)$ from a “prototype” $f(x)$, then the Fourier transforms of the family are $F(k/\alpha)/\alpha$, so the product of uncertainties in (9) is the same for all family members: a broad pulse has a narrow FT.
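The scaling remark can be illustrated numerically. The sketch below is an illustration under assumptions, not part of the argument: it takes $f(x) = \operatorname{sech}(x)$ as an arbitrary non-Gaussian prototype, approximates the Fourier transform by NumPy's FFT on a finite grid, and uses a hypothetical helper `spread_product` to evaluate the left hand side of (9). The overall normalisation and phase convention of the FFT do not matter here because only ratios of $|F|^2$ integrals enter.

```python
import numpy as np


def spread_product(f, x):
    """Approximate the left hand side of (9) for samples f on a uniform grid x."""
    dx = x[1] - x[0]
    k = 2 * np.pi * np.fft.fftfreq(x.size, d=dx)
    weight_x = np.abs(f) ** 2
    weight_k = np.abs(np.fft.fft(f)) ** 2
    # Subtract the means so the helper also works for off-centre pulses.
    mean_x = np.sum(x * weight_x) / np.sum(weight_x)
    mean_k = np.sum(k * weight_k) / np.sum(weight_k)
    sigma_x = np.sqrt(np.sum((x - mean_x) ** 2 * weight_x) / np.sum(weight_x))
    sigma_k = np.sqrt(np.sum((k - mean_k) ** 2 * weight_k) / np.sum(weight_k))
    return sigma_x * sigma_k


# Assumed grid: wide enough and fine enough for all three pulse widths.
x = np.linspace(-100.0, 100.0, 4096, endpoint=False)

# Family f_alpha(x) = f(alpha * x) built from the prototype f(x) = sech(x).
products = {alpha: spread_product(1.0 / np.cosh(alpha * x), x)
            for alpha in (0.5, 1.0, 2.0)}

for alpha, value in products.items():
    print(f"alpha = {alpha}: spread product = {value:.6f}")
```

All three family members should give the same product, and that product should sit above the bound of $1/2$ since the prototype is not a Gaussian.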

To prove (9) (for the class of tempered distributions) we note that proving (9) is equivalent to the problem:

Minimise $\int_\mathbb{R} k^2 |F(k)|^2 \,{\rm d} k$ subject to:

  1. $\int_\mathbb{R} x^2 |f(x)|^2\, {\rm d} x = const$ (find smallest wavenumber domain spread for a constant position domain spread);
  2. $\int_\mathbb{R} x |f(x)|^2\, {\rm d} x = 0$ (mean of nought in “position co-ordinates”);
  3. $\int_\mathbb{R} k |F(k)|^2\, {\rm d} k = 0$ (mean of nought in “wavenumber co-ordinates”);
  4. $\int_\mathbb{R}  |f(x)|^2\, {\rm d} x = 1$ (constant norm functions). Note that we do not need to assume $\int_\mathbb{R}  |F(k)|^2\, {\rm d} k = 1$ as this follows, by the Plancherel / Parseval theorems, from $\int_\mathbb{R}  |f(x)|^2\, {\rm d} x = 1$.

Since the product of Fourier transforms is, up to a constant factor, the Fourier transform of the convolution of the relevant tempered distributions, we can rewrite $|F(k)|^2 = F(k)\,F^*(k)$ as:

$$\frac{1}{2\pi} \int_\mathbb{R} e^{-i\,k\,x} (f * \tilde{f})(x)\, {\rm d} x$$

where $f * \tilde{f}$ is the convolution of $f$ with $\tilde{f}(x) = f^*(-x)$, its reflected complex conjugate (whose Fourier transform is $F^*(k)$). Then we integrate $k^2 |F(k)|^2$ over the whole real line, switch order of integration, taking heed that multiplication of the FT by $i\,k$ is equivalent to taking the FT of the derivative $f^\prime(x)$ and so, in the distributional sense:

$$\int_\mathbb{R} k^2 |F(k)|^2 \,{\rm d} k = -\int_\mathbb{R} \delta(x)\,{\rm d}_x^2\left( (f * \tilde{f})(x)\right)\,{\rm d} x = -\int_\mathbb{R} f^*(x) f^{\prime\prime}(x) {\rm d} x $$

Likewise:

$$\int_\mathbb{R} k |F(k)|^2 \,{\rm d} k = -i\int_\mathbb{R} f^*(x) f^{\prime}(x) {\rm d} x $$

Thus if we bring to bear standard calculus of variations techniques, calculating the variation of our integral to be minimised with the constraints accounted for by Lagrange multipliers (into which we absorb any constant factors), we find:

$$2{\rm Re}\left(\int_\mathbb{R}\delta f^*(x)\left( -f^{\prime\prime}(x) +\lambda_1 f^\prime(x)+(\lambda_2 x^2 + \lambda_3 x + \lambda_4)\,f(x)\right){\rm d} x\right) = 0$$

where $\delta f^*(x)$ is an arbitrary variation function. Hence, functions which our integral is extremal for fulfill:

$$-f^{\prime\prime}(x) +\lambda_1 f^\prime(x)+(\lambda_2 x^2 + \lambda_3 x + \lambda_4)\,f(x)=0$$

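a differential equation which defines a general Gaussian pulse $\exp(-a\,x^2 + b\,x + c)$ where $a,\,b,\,c\in\mathbb{C}$ and ${\rm Re}(a) > 0$. When we put such a general Gaussian pulse into the left hand side of (9), we find that its extremum is $1/2$; this value is attained when $a$ is real, whereas a complex $a$ (a chirped pulse) gives the larger product $|a|/(2\,{\rm Re}(a))$.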
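As a numerical sanity check on the saturation of (9), the sketch below evaluates the spread product for two Gaussian pulses: one with real $a$ and one with complex $a$ (a chirped pulse). It is an illustration only. The grid, the particular values of $a$ and $b$, and the helper names `rms_spread` and `spread_product` are assumptions made for this example, and NumPy's FFT stands in for the Fourier transform (10).

```python
import numpy as np


def rms_spread(values, weights):
    """Root mean square spread of `values` about their weighted mean."""
    total = np.sum(weights)
    mean = np.sum(values * weights) / total
    return np.sqrt(np.sum((values - mean) ** 2 * weights) / total)


def spread_product(f, x):
    """Left hand side of (9), approximated on a uniform grid via the FFT."""
    k = 2 * np.pi * np.fft.fftfreq(x.size, d=x[1] - x[0])
    return rms_spread(x, np.abs(f) ** 2) * rms_spread(k, np.abs(np.fft.fft(f)) ** 2)


x = np.linspace(-40.0, 40.0, 8192, endpoint=False)  # assumed grid

# Real a (with a real linear term b, which only shifts the pulse centre).
real_gauss = np.exp(-0.7 * x**2 + 0.3 * x)

# Complex a: the imaginary part adds a quadratic phase (chirp).
a = 0.7 + 1.2j
chirped = np.exp(-a * x**2)

prod_real = spread_product(real_gauss, x)
prod_chirp = spread_product(chirped, x)
chirp_formula = abs(a) / (2 * a.real)  # closed form for exp(-a x^2)

print("real a:   ", prod_real)
print("complex a:", prod_chirp, "closed form:", chirp_formula)
```

The real-$a$ pulse should land on the bound $1/2$ to within numerical error, while the chirped pulse should sit strictly above it.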

So now on to the Heisenberg uncertainty principle. Given the relationships (3) and (4) and that, needfully from these relationships,  a change of state from $\hat{X}$ to $\hat{P}$ co-ordinates is made by a scaled Fourier transform, it is a simple matter to apply (9) to prove:

$$\tag{11}\sigma_{\hat{X}}\, \sigma_{\hat{P}} \geq \frac{\hbar}{2}$$

where $\sigma_{\hat{X}}$ is the rms spread of a wavefunction pulse $\psi(x)$ in the position eigenspace and $\sigma_{\hat{P}}$ is the rms spread of the same state $\Psi(p)$ written in the momentum eigenspace. This inequality holds for all pairs of observables fulfilling the canonical commutation relationship.
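The scaling step can be spelled out as follows. Here $\sigma_x$ and $\sigma_k$ are shorthand, introduced only for this remark, for the two square-root factors on the left hand side of (9). Since $p = \hbar\,k$, every momentum value is $\hbar$ times the corresponding wavenumber, so the rms spread scales by the same factor:

$$\sigma_{\hat{X}} = \sigma_x,\qquad \sigma_{\hat{P}} = \hbar\,\sigma_k \quad\Rightarrow\quad \sigma_{\hat{X}}\,\sigma_{\hat{P}} = \hbar\,\sigma_x\,\sigma_k \geq \frac{\hbar}{2}$$

As a worked instance, the Gaussian $\psi(x) \propto \exp\left(-x^2/(4\,\sigma^2)\right)$ has $|\psi(x)|^2 \propto \exp\left(-x^2/(2\,\sigma^2)\right)$ and $|F(k)|^2 \propto \exp\left(-2\,\sigma^2\,k^2\right)$, so that $\sigma_{\hat{X}} = \sigma$ and $\sigma_{\hat{P}} = \hbar/(2\,\sigma)$, which saturates (11).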