# Non-Normalisable States and Rigged Hilbert Space in Quantum Mechanics

Often in Quantum Mechanics it makes life easier to think of non-normalisable states which have the form of the Dirac delta function $\delta(z)$ or the momentum eigenstate in position co-ordinates $\exp\left(i\,\frac{p}{\hbar}\,z\right)$. Normalisable states can be built up as superpositions of the Dirac delta. Now clearly these functions (or distributions) lie outside $\mathbf{L}^2(\mathbb{R})$. How do we reconcile them and make our ideas rigorous?

Practically speaking, the reason why there are always such states is that observables fulfilling the canonical commutation relationship $[X,P]=i\,\hbar\,I$ have eigenvectors which are non-normalisable.

Actually, there is no reason why we must have eigenvectors. So at a deeper level, the basic reasons why non-normalisable states always turn up are (1) convenience, i.e. the need for a wieldy mathematical description, and (2) the mathematical ingenuity of the people who gave us this wieldy and handy description, most notably the genius of (in rough historical order) Paul Dirac, Laurent Schwartz, Alexander Grothendieck and Israel Gel’fand. Their machinery keeps the intuitive ideas of eigenvectors and other convenient tools on a unified and rigorous footing.

## Grounding Ideas

The basic setting for quantum mechanics is Hilbert space, namely a complete (in the sense that every Cauchy sequence converges to a member of the space) vector space kitted with an inner product (Banach spaces are a weaker and more general concept – being complete vector spaces kitted simply with a norm. The norm in a Hilbert space comes from the inner product of a vector with itself).

So, intuitively, it is a complex vector space like $\mathbb{C}^N$ with “no holes in it”: we can talk about limits and do calculus-like things without worrying whether the limits exist, we can talk about linear superposition, and we can “resolve” vectors uniquely into components through the inner product. So it is pretty much what we want for the state space of any physical system, aside from its being complex, which is slightly unusual.

Now we look at the idea of a linear functional on a Hilbert space $\mathcal{H}$. This is simply a linear function $L:\mathcal{H}\to\mathbb{C}$ mapping the Hilbert space $\mathcal{H}$ to the underlying field (in this case $\mathbb{C}$). The inner product with some fixed “bra” $\ell\in\mathcal{H}$, namely the function $x\mapsto\left<\ell,x\right>$, is clearly a special case of this linear functional notion. However, in a Hilbert space every continuous linear functional can indeed be represented by such a fixed-bra inner product and, since every fixed-bra inner product clearly induces a continuous linear functional, the ideas of continuous linear functional and inner product with a fixed bra are precisely the same notion. This equivalence does NOT hold in any old inner product space: it is special to Hilbert spaces and is the subject matter of the Riesz representation theorem (see the Wikipedia page of this name). So the continuous (topological) dual $\mathcal{H}^*$ of $\mathcal{H}$, a poncy name for the vector space of continuous linear functionals on $\mathcal{H}$, is *isomorphic to the original Hilbert space* (isometrically, through the conjugate-linear map $\ell\mapsto\left<\ell,\cdot\right>$).
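In finite dimensions the Riesz correspondence amounts to “conjugate the coefficient vector”. The following is a minimal Python sketch, assuming the physicist's convention that the inner product is conjugate-linear in its first slot; the helper names `inner` and `riesz_vector` and the particular functional are hypothetical, chosen only for illustration. It checks the correspondence on $\mathbb{C}^2$ and of course proves nothing about the infinite-dimensional theorem.

```python
def inner(u, v):
    """Physicist's inner product <u, v>: conjugate-linear in u, linear in v."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))


def riesz_vector(coeffs):
    """Return the bra ell with <ell, x> == sum(c_i * x_i) for every x."""
    return [c.conjugate() for c in coeffs]


# A hypothetical linear functional on C^2, given by its coefficients.
c = [2 - 1j, 0.5j]


def L(x):
    return sum(ci * xi for ci, xi in zip(c, x))


ell = riesz_vector(c)
x = [1 + 1j, 3 - 2j]
print(L(x), inner(ell, x))  # the two numbers coincide
```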

It can be shown that an alternative and altogether equivalent definition of “Hilbert space” to the one above (i.e. a complete inner product space) is:

An inner product space which is isomorphic to its dual space of continuous linear functionals through the map $\ell\mapsto\left<\ell,\cdot\right>$.

All this is very slick and attractive for describing things like quantum mechanics. It is also very easy in finite-dimensional quantum systems, e.g. an electron whose state is constrained to be a superposition of spin up and spin down states. In finite dimensions there is no difference at all between the notion of a continuous linear functional and the more general one of simply a linear functional (i.e. without heed to continuity).

## Rigging the Hilbert Space: Nonnormalisable States

In infinite dimensions, as with the quantum state space of the harmonic oscillator or of an electron bound to a potential, we meet a glitch:

Not all linear functionals are continuous.

OOOPS: so just as we covet our neighbour’s iPhone 5 when we have “only” model 4, so too we covet a stronger concept than Hilbert space wherein a software upgrade would make all “useful” linear functionals continuous!

Less flippantly, here is where we get practical. In quantum mechanics, we need to implement the Heisenberg uncertainty principle (HUP), so we need Hermitian observables $\hat{X}$ and $\hat{P}$ fulfilling the canonical commutation relationship (CCR) $[\hat{X},\,\hat{P}]=i\,\hbar\,I$ (see my answer here and here). It’s not too hard to show that a quantum space truly implementing the HUP cannot be finite dimensional: if it were, then $\hat{X}$ and $\hat{P}$ would have square matrix representations, and the Lie bracket $[\hat{X}, \hat{P}]$ of any pair of finite square matrices has a trace of nought, whereas the right hand side of the CCR has trace $i\,\hbar\,d\neq0$ for $d\times d$ matrices. So we consider them to be operators on the Hilbert space $\mathbf{L}^2(\mathbb{R}^N)$, which is a Hilbert space of dimensionality $\aleph_0$, i.e. it has a countably infinite orthonormal basis, for example the eigenfunctions of the $N$-dimensional harmonic oscillator. Vectors in this Hilbert space are the “everyday wavefunctions” $\psi:\mathbb{R}^N\to\mathbb{C}$ conceived by Schrödinger, with the crucial normalisability property:

$$\int\limits_{\mathbb{R}^N} |\psi(\vec{x})|^2\,{\rm d}^N x < \infty$$
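The trace argument can be seen concretely by truncating a familiar infinite-dimensional representation. The sketch below is plain Python under stated assumptions (units $\hbar = m = \omega = 1$, helper names hypothetical): it builds $X$ and $P$ from the harmonic-oscillator annihilation operator restricted to the lowest $n$ number states and computes $[X, P]$. The CCR holds on every diagonal entry except the last, where a compensating $-i(n-1)$ forces the trace to nought, exactly as the trace argument demands.

```python
import math


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def truncated_x_p(n):
    # X and P in units hbar = m = omega = 1, built from the harmonic-oscillator
    # annihilation operator a truncated to the lowest n number states
    a = [[0j] * n for _ in range(n)]
    for k in range(1, n):
        a[k - 1][k] = complex(math.sqrt(k))   # a|k> = sqrt(k) |k-1>
    adag = [[a[j][i].conjugate() for j in range(n)] for i in range(n)]
    s = 1 / math.sqrt(2)
    X = [[s * (a[i][j] + adag[i][j]) for j in range(n)] for i in range(n)]
    P = [[1j * s * (adag[i][j] - a[i][j]) for j in range(n)] for i in range(n)]
    return X, P


def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    n = len(A)
    return [[AB[i][j] - BA[i][j] for j in range(n)] for i in range(n)]


n = 6
X, P = truncated_x_p(n)
C = commutator(X, P)
diagonal = [C[k][k] for k in range(n)]
trace = sum(diagonal)

# [X, P] equals i on every diagonal entry except the last, which is -i(n - 1),
# so the trace is 0, whereas trace(i * hbar * I) would be i * n.
print([round(d.imag, 10) for d in diagonal])
print(abs(trace))
```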

Now, for convenience, we want to work in co-ordinates wherein one of $\hat{X}$ and $\hat{P}$ is the simple multiplication operator $X \psi(x) = x\,\psi(x)$. In my answer here I show that there are indeed co-ordinates where $X \psi(x) = x\,\psi(x)$ and that then, needfully, $\hat{P} \psi(x) = -i\,\hbar \,{\rm d}_x \psi(x)$.
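A quick numerical check that these two representations really satisfy the CCR: the sketch below (plain Python, $\hbar = 1$ as an assumption, a central finite difference standing in for ${\rm d}_x$, and an arbitrarily chosen test function) applies $XP - PX$ to a smooth, rapidly decaying $\psi$ and subtracts $i\hbar\psi$. The residual is only as small as the finite-difference error, which is of order $h^2$.

```python
import math

HBAR = 1.0  # natural units, an assumption made only for this sketch


def psi(x):
    # an arbitrary smooth, rapidly decaying test wavefunction
    return (1 + 0.3 * x) * math.exp(-x * x / 2)


def deriv(f, x, h=1e-4):
    # central finite-difference approximation to f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)


def X(f):
    # position operator: multiplication by x
    return lambda x: x * f(x)


def P(f):
    # momentum operator: -i hbar d/dx
    return lambda x: -1j * HBAR * deriv(f, x)


def ccr_residual(f, x):
    # ([X, P] f)(x) - i hbar f(x); should vanish up to discretisation error
    return X(P(f))(x) - P(X(f))(x) - 1j * HBAR * f(x)


for x in (-1.5, 0.0, 0.7, 2.0):
    print(x, abs(ccr_residual(psi, x)))
```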

However, neither of these operators is defined on our whole Hilbert space $\mathcal{H} = \mathbf{L}^2(\mathbb{R}^N)$: there are vectors (functions) $f$ in $\mathbf{L}^2(\mathbb{R}^N)$ (e.g. functions with jump discontinuities) which have no defined $P\,f\in\mathcal{H}$, owing to the derivative’s being undefined at the discontinuity. Likewise, some normalisable functions $g$ have no defined $X\,g\in\mathcal{H}$, because multiplication by $x$ makes them unnormalisable (witness, for example, $g(x) = (1+x^2)^{-1/2}$, whose square $(1+x^2)^{-1}$ is integrable, whereas $|x\,g(x)|^2 = x^2/(1+x^2)\to1$ as $|x|\to\infty$).
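To see multiplication by $x$ spoil normalisability, take $g(x) = (1+x^2)^{-1/2}$: $\int_{-L}^{L}|g|^2\,{\rm d}x = 2\arctan L \to \pi$, but $\int_{-L}^{L}|x\,g|^2\,{\rm d}x = 2L - 2\arctan L$ grows without bound. (Note that $(1+x^2)^{-1}$ itself would not serve as an example, since $x/(1+x^2)$ is still square integrable.) The plain-Python sketch below, using a midpoint-rule integrator of my own choosing, checks these truncated norms numerically.

```python
import math


def integrate(f, a, b, n=100_000):
    # composite midpoint rule on [a, b]
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def g(x):
    return (1 + x * x) ** -0.5     # |g|^2 = 1 / (1 + x^2) is integrable


def xg(x):
    return x * g(x)                # tends to +1 or -1 as |x| grows


for L in (10, 100, 1000):
    norm2_g = integrate(lambda x: g(x) ** 2, -L, L)
    norm2_xg = integrate(lambda x: xg(x) ** 2, -L, L)
    print(L, norm2_g, norm2_xg)
# norm2_g -> pi (exactly 2 atan L), norm2_xg = 2L - 2 atan L grows without bound
```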

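Furthermore, neither of these operators has eigenvectors in $\mathcal{H}$: if $X\,f(x) = x\, f(x) = \lambda f(x)$ for all $x\in\mathbb{R}$, then $(x-\lambda)\,f(x)=0$, so $f(x) = 0$ for $x\neq\lambda$ and $f$ is the zero vector of $\mathcal{H}$; and the eigenfunction $e^{-i\,k\,x}$ of $P$ (eigenvalue $-\hbar\,k$) has $|e^{-i\,k\,x}|^2=1$ everywhere, so it is not normalisable.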
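Although $X$ has no eigenvector in $\mathcal{H}$, it does have approximate ones: normalised Gaussians $\psi_\sigma$ centred on $\lambda$ with shrinking width $\sigma$. The sketch below (plain Python, midpoint-rule integration, $\lambda = 0.5$ chosen arbitrarily) checks that $\|\psi_\sigma\| = 1$ and $\|(X-\lambda)\psi_\sigma\| = \sigma/\sqrt{2}\to0$, while the peak height $(\pi\sigma^2)^{-1/4}$ diverges. So $|\psi_\sigma|^2$ piles up on $\lambda$ like a Dirac delta, and the sequence cannot converge in $\mathbf{L}^2(\mathbb{R})$ to an eigenvector.

```python
import math


def integrate(f, a, b, n=20_000):
    # composite midpoint rule on [a, b]
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


lam = 0.5  # the would-be eigenvalue of X (arbitrary choice)


def psi(sigma):
    # L^2-normalised Gaussian of width sigma centred on lam
    c = (math.pi * sigma ** 2) ** -0.25
    return lambda x: c * math.exp(-(x - lam) ** 2 / (2 * sigma ** 2))


def diagnostics(sigma):
    f = psi(sigma)
    a, b = lam - 10 * sigma, lam + 10 * sigma  # f is negligible outside
    norm2 = integrate(lambda x: f(x) ** 2, a, b)
    residual = math.sqrt(integrate(lambda x: ((x - lam) * f(x)) ** 2, a, b))
    return norm2, residual, f(lam)


for sigma in (1.0, 0.1, 0.01):
    print(sigma, diagnostics(sigma))
# norm2 stays at 1, residual = ||(X - lam) psi|| = sigma / sqrt(2) -> 0,
# while the peak value psi(lam) = (pi sigma^2)^(-1/4) grows without bound.
```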

But we want to salvage the idea of eigenstates and still be able to write our states in position or momentum co-ordinates.

Here is where the notion of Rigged Hilbert Space comes in: the ingenious process whereby we kit a dense subset $S\subset\mathcal{H}$ of the original Hilbert space $\mathcal{H}$ with a stronger topology (we “rig it”), so that things like the Dirac delta are included in the topological dual space $S^*$, where $S\subset\mathcal{H}\subset S^*$.

For QM we take the dense subset $S$ to be the “smooth” functions that still belong to $\mathcal{H}$ when mapped by any member of the algebra of operators generated by $X$ and $P$. That is, $S$ is invariant under this algebra and comprises precisely the Schwartz space of functions that can be multiplied by any polynomial and differentiated any number of times and still belong to $\mathcal{H}$. Any function in $\mathcal{H}$ can be approximated arbitrarily well (with respect to the Hilbert space norm) by some function in $S$.
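Density of $S$ in $\mathcal{H}$ can be illustrated on a function with jump discontinuities, such as the indicator function of $[0,1]$, which lies in $\mathcal{H}$ but not in $S$. The plain-Python sketch below smooths it by convolution with a Gaussian of standard deviation $\varepsilon$ (one standard construction, not the only one), which yields $\Phi(x/\varepsilon) - \Phi((x-1)/\varepsilon)$ with $\Phi$ the normal cumulative distribution function, a Schwartz function, and measures the $\mathbf{L}^2$ distance to the original.

```python
import math


def Phi(t):
    # standard normal cumulative distribution function
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))


def box(x):
    # indicator of [0, 1]: in L^2 but discontinuous, hence not a Schwartz function
    return 1.0 if 0 <= x <= 1 else 0.0


def smoothed_box(eps):
    # box convolved with a Gaussian of standard deviation eps; the result is
    # smooth and, with all its derivatives, decays like a Gaussian: a Schwartz function
    return lambda x: Phi(x / eps) - Phi((x - 1) / eps)


def l2_distance(f, g, a=-5.0, b=6.0, n=200_000):
    # midpoint-rule estimate of ||f - g||_2 on [a, b]
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += (f(x) - g(x)) ** 2
    return math.sqrt(h * total)


for eps in (0.1, 0.01, 0.001):
    print(eps, l2_distance(box, smoothed_box(eps)))
# the L^2 error shrinks roughly like sqrt(eps)
```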

At the same time, we kit the dense subset $S$ out with a stronger topology than the original Hilbert space one. Why do we do this? One of the basic problems with $\mathcal{H}$ is that the Dirac delta $\delta: f\mapsto f(0)$, which can be construed as an eigenvector of $X$, is not a continuous linear functional on $\mathcal{H}$: it is linear wherever $f(0)$ makes sense, but not continuous. To see this, consider the images of $f(x)$ and $f(x) + \exp(-x^2/(2 \sigma^2))$ under the delta functional. By choosing $\sigma$ small enough we can make the second function arbitrarily near to the first as measured by the $\mathbf{L}^2$ norm (the added Gaussian has $\mathbf{L}^2$ norm $(\sigma\sqrt{\pi})^{1/2}$), yet their images under the Dirac $\delta$ are $f(0)$ and $f(0)+1$, respectively. So we kit the dense subset $S$ out with a topology that is strong enough to “ferret out” all useful linear functionals and *make* them continuous. We now have a topological dual $S^*$ of $S$ (the space of all linear functionals continuous with respect to the stronger topology) such that $S\subset\mathcal{H} = \mathcal{H}^*\subset S^*$.
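The discontinuity argument is easy to check numerically. In the plain-Python sketch below (midpoint-rule integration; the perturbation is $g_\sigma(x) = \exp(-x^2/(2\sigma^2))$), the $\mathbf{L}^2$ norm of $g_\sigma$ shrinks like $(\sigma\sqrt{\pi})^{1/2}$, so $f + g_\sigma \to f$ in $\mathbf{L}^2$, while $\delta(f + g_\sigma) - \delta(f) = g_\sigma(0) = 1$ for every $\sigma$.

```python
import math


def bump(sigma):
    # the perturbation g_sigma(x) = exp(-x^2 / (2 sigma^2))
    return lambda x: math.exp(-x * x / (2 * sigma ** 2))


def l2_norm(f, a, b, n=100_000):
    # midpoint-rule estimate of the L^2 norm of f on [a, b]
    h = (b - a) / n
    return math.sqrt(h * sum(f(a + (i + 0.5) * h) ** 2 for i in range(n)))


def delta(f):
    # the Dirac delta acting on a continuous function
    return f(0.0)


for sigma in (1.0, 1e-2, 1e-4):
    g = bump(sigma)
    print(sigma, l2_norm(g, -10 * sigma, 10 * sigma), delta(g))
# ||g_sigma||_2 = (sigma * sqrt(pi)) ** 0.5 -> 0, yet delta(g_sigma) = 1 always
```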

$S^*$ is the space of tempered distributions, as discussed in my answer here. $S^*$ includes the Dirac delta and $e^{i\,k\,x}$, and the Fourier transform maps it bijectively and continuously (with continuous inverse) onto itself. Intuitively, functions and their Fourier transforms are precisely the same information for the tempered distributions. This ties in with the fact that position and momentum co-ordinates are mapped into each other by the Fourier transform and its inverse.
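As a concrete instance of the Fourier transform carrying a position-space description into a momentum-space one, the sketch below evaluates the transform of the Gaussian $e^{-x^2/2}$ by quadrature and compares it with the closed form $\sqrt{2\pi}\,e^{-k^2/2}$. It is plain Python and assumes the convention $\mathfrak{F}\psi(k) = \int\psi(x)\,e^{-i\,k\,x}\,{\rm d}x$, since the text fixes none; other conventions move factors of $2\pi$ around.

```python
import cmath
import math


def fourier(f, k, L=12.0, n=4000):
    # (F f)(k) = integral of f(x) exp(-i k x) dx, midpoint rule on [-L, L]
    h = 2 * L / n
    return h * sum(f(x) * cmath.exp(-1j * k * x)
                   for x in (-L + (i + 0.5) * h for i in range(n)))


def gauss(x):
    return math.exp(-x * x / 2)


for k in (0.0, 1.0, 2.5):
    exact = math.sqrt(2 * math.pi) * math.exp(-k * k / 2)
    print(k, fourier(gauss, k), exact)
```

A Gaussian in position space is again a Gaussian in wavenumber (momentum) space.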

So there we have it. We now have a space of bras $S^*$ that is strictly bigger than the space of kets $\mathcal{H}$, and by the construction of the rigged Hilbert space it needfully includes nonnormalisable bras in $S^*\setminus\mathcal{H}$, simply so that we can discuss eigenstates of all the observables we need in a rigorous way.

Good references for these notions are:

In the latter, Todd Trimble’s suspicions are correct that the usual Gel’fand triple is $S\subset \mathcal{H} = \mathbf{L}^2(\mathbb{R}^N)\subset S^*$, with $S$ and $S^*$ being the Schwartz space and the tempered distributions, as discussed in my answer here. The Wikipedia article on rigged Hilbert space is a little light here: there’s a great deal of detail about nuclear spaces that’s glossed over, so on a first reading I’d suggest you take the specific example $S$ = Schwartz space and $S^*$ = tempered distributions and keep this relatively simple (and, for QM, most relevant) example exclusively in mind; for QM you won’t need anything else. The Schwartz space and the space of tempered distributions are automatically nuclear, so you don’t need to worry too much about this idea on a first reading.

## More on Tempered Distributions and Fourier Transforms

We define tempered distributions as follows. Firstly, we consider the Schwartz space $\mathscr{S}\left(\mathbb{R}^N, \mathbb{C}\right)$ of complex-valued functions defined on $\mathbb{R}^N$ (for example, time variations when $N=1$) which are (i) smooth (i.e. have derivatives in all directions of all orders) and (ii) such that they and all their derivatives dwindle to nought at infinity “more swiftly than polynomial speed”. These two conditions can be summarised by $\sup_{\mathbf{x}\in\mathbb{R}^N}\left|\, \left| \mathbf{x} \right|^\alpha D^\beta \psi\left(\mathbf{x}\right) \right| < \infty$ for every integer $\alpha\geq0$ and every multi-index $\beta$, where $D^\beta = \partial_{x_1}^{\beta_1}\cdots\partial_{x_N}^{\beta_N}$ is a partial derivative of order $\beta_1+\cdots+\beta_N$. When we’re dealing with time variations ($N=1$) there is only one first-order operator, to wit $\mathrm{d}_t$, but the ideas hold equally well in any number of dimensions. The Fourier transform $\mathfrak{F}\,\psi$ of any $\psi \in \mathscr{S}\left(\mathbb{R}^N, \mathbb{C}\right)$ is then defined and also belongs to the Schwartz space, i.e. $\psi \in \mathscr{S}\left(\mathbb{R}^N, \mathbb{C}\right) \Leftrightarrow \mathfrak{F}\, \psi \in \mathscr{S}\left(\mathbb{R}^N, \mathbb{C}\right)$.

Furthermore, the kernel of $\mathfrak{F}:\mathscr{S}\left(\mathbb{R}^N, \mathbb{C}\right)\to \mathscr{S}\left(\mathbb{R}^N, \mathbb{C}\right)$ is trivial, to wit $\mathfrak{F}\psi = 0,\ \psi \in \mathscr{S}\left(\mathbb{R}^N, \mathbb{C}\right) \Rightarrow \psi = 0$. Lastly, every member of the Schwartz space is the Fourier transform of some member of the Schwartz space. Thus, in the Schwartz space:

Schwartz functions on $\mathbb{R}^N$ and their Fourier transforms constitute exactly the same information
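The “same information” claim can be illustrated by a round trip. The plain-Python sketch below works under stated assumptions: the convention $\mathfrak{F}\psi(k) = \int\psi(x)\,e^{-i\,k\,x}\,{\rm d}x$ with inverse $\frac{1}{2\pi}\int\mathfrak{F}\psi(k)\,e^{i\,k\,x}\,{\rm d}k$, truncated quadrature grids, and the arbitrarily chosen Schwartz function $\psi(x) = (x+x^2)\,e^{-x^2}$. It transforms $\psi$, inverts, and recovers $\psi$ to quadrature accuracy.

```python
import cmath
import math

# Quadrature grids (arbitrary choices, wide and fine enough for this example)
XL, NX = 10.0, 1000          # position grid on [-XL, XL]
KL, NK = 15.0, 1000          # wavenumber grid on [-KL, KL]
dx, dk = 2 * XL / NX, 2 * KL / NK
xs = [-XL + (i + 0.5) * dx for i in range(NX)]
ks = [-KL + (j + 0.5) * dk for j in range(NK)]


def psi(x):
    # a Schwartz function with no particular symmetry
    return (x + x * x) * math.exp(-x * x)


def forward(f, k):
    # (F f)(k) = integral of f(x) exp(-i k x) dx
    return dx * sum(f(x) * cmath.exp(-1j * k * x) for x in xs)


def inverse(F_values, x):
    # f(x) = (1 / 2 pi) * integral of (F f)(k) exp(+i k x) dk
    total = sum(Fk * cmath.exp(1j * k * x) for Fk, k in zip(F_values, ks))
    return dk * total / (2 * math.pi)


F_psi = [forward(psi, k) for k in ks]
for x in (-1.0, 0.3, 1.2):
    print(x, inverse(F_psi, x).real, psi(x))
```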

However, the Schwartz space does not include all fields of interest to us: we may wish to define piecewise continuous boundary conditions, nondiminishing time variations (e.g. a pure sinusoid), or fields which, although smooth, do not fulfil the swift decay criterion but only a much weaker one such as $\left|\mathbf{r}\right| \psi\left(\mathbf{r}\right) \rightarrow 0$ as $\left|\mathbf{r}\right| \rightarrow \infty$, with $\left|\mathbf{r}\right|^2 \psi\left(\mathbf{r}\right)$ possibly diverging. Therefore, we consider the topological dual space $\mathscr{S}^\prime\left(\mathbb{R}^N, \mathbb{C}\right)$ of all continuous complex-valued linear functionals on the Schwartz space, where continuity is with respect to a stronger topology than that of the $\mathbf{L}^2$ norm of the original Hilbert space $\mathbf{L}^2(\mathbb{R}^N)$. This stronger topology is the one induced by the family of norms:

$$\rho_{\alpha,\,\beta}(f) \stackrel{\text{def}}{=} \sup\limits_{\mathbf{x}\in\mathbb{R}^N} \left|\,|\mathbf{x}|^\alpha\,D^\beta f(\mathbf{x})\right|$$
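For $N = 1$ and $\beta = 0$ the norm $\rho_{\alpha,0}$ is just the largest value of $|x|^\alpha|f(x)|$. The plain-Python sketch below estimates it on a grid over $[-R, R]$ (a grid can suggest, but not prove, finiteness of the true supremum). It contrasts the Gaussian, a Schwartz function, with the Lorentzian $1/(1+x^2)$, which is smooth and square integrable but decays only like $x^{-2}$, so that $\rho_{3,0}$ is infinite for it.

```python
import math


def sup_on_grid(f, alpha, R, n=200_001):
    # grid estimate of sup over |x| <= R of |x|^alpha |f(x)|  (beta = 0, N = 1)
    h = 2 * R / (n - 1)
    return max(abs(x) ** alpha * abs(f(x)) for x in (-R + i * h for i in range(n)))


def gaussian(x):
    return math.exp(-x * x / 2)        # a Schwartz function


def lorentzian(x):
    return 1 / (1 + x * x)             # smooth and in L^2, but decays only like x^-2


alpha = 3
for R in (10, 100, 1000):
    print(R, sup_on_grid(gaussian, alpha, R), sup_on_grid(lorentzian, alpha, R))
# the Gaussian column settles at (alpha / e) ** (alpha / 2), about 1.159, while the
# Lorentzian column grows like R: rho_{3,0} of the Lorentzian is infinite.
```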

Thus, for example, the Dirac delta is a continuous linear functional on $\mathscr{S}\left(\mathbb{R}^N, \mathbb{C}\right)$ kitted with this topology, since $|\delta(f)| = |f(\mathbf{0})| \leq \rho_{0,0}(f)$, but it is not continuous on the Hilbert space $\mathbf{L}^2(\mathbb{R}^N)$ kitted with the original $\mathbf{L}^2$ norm. So we’ve brought the stronger topology to bear to ferret out all the linear functionals which are useful to us (the Dirac delta, the plane-wave functional $f\mapsto\int e^{i\,k\,x}\,f(x)\,{\rm d}x$ and so forth) which the original topology couldn’t “sniff out”. Likewise, operators such as the multiplication operator $f(x)\mapsto x\,f(x)$, which is unbounded on $\mathbf{L}^2$, map $\mathscr{S}$ continuously into itself.
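The bumps $g_\sigma(x) = \exp(-x^2/(2\sigma^2))$ that defeat continuity of $\delta$ in the $\mathbf{L}^2$ norm do not tend to nought in the Schwartz topology, which is why $\delta$ can be continuous on $\mathscr{S}$ while failing to be continuous on $\mathbf{L}^2$. The plain-Python sketch below ($N = 1$; closed forms for $\|g_\sigma\|_2$ and $\rho_{0,0}$, a grid estimate for $\rho_{0,1}$) shows $\|g_\sigma\|_2\to0$ while $\rho_{0,0}(g_\sigma) = 1$ and $\rho_{0,1}(g_\sigma) = 1/(\sigma\sqrt{e})$ diverges.

```python
import math


def rho_01(sigma, n=100_001):
    # grid estimate of sup_x |g_sigma'(x)| for g_sigma(x) = exp(-x^2 / (2 sigma^2))
    R = 10 * sigma
    h = 2 * R / (n - 1)

    def dg(x):
        return -x / sigma ** 2 * math.exp(-x * x / (2 * sigma ** 2))

    return max(abs(dg(-R + i * h)) for i in range(n))


for sigma in (1.0, 1e-2, 1e-4):
    l2 = math.sqrt(sigma * math.sqrt(math.pi))   # ||g_sigma||_2, closed form
    rho_00 = 1.0                                  # sup |g_sigma|, attained at x = 0
    print(sigma, l2, rho_00, rho_01(sigma), 1 / (sigma * math.sqrt(math.e)))
# l2 -> 0 but rho_00 stays 1 and rho_01 = 1 / (sigma sqrt(e)) blows up,
# so g_sigma does not tend to 0 in the Schwartz topology.
```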

The members of $\mathscr{S}^\prime\left(\mathbb{R}^N, \mathbb{C}\right)$ are known as tempered distributions or sometimes generalised functions. We can think of an ordinary scalar field $\psi\left(\mathbf{r}\right)$ (of at most polynomial growth) as the linear functional $\Psi : \mathscr{S}\left(\mathbb{R}^N, \mathbb{C}\right) \rightarrow \mathbb{C};\ \varphi \mapsto \int_{\mathbb{R}^N} \psi\left(\mathbf{u}\right) \varphi\left(\mathbf{u}\right) \mathrm{d}^N u$. Conversely, given a linear functional $\Psi \in \mathscr{S}^\prime\left(\mathbb{R}^N, \mathbb{C}\right)$, we can recover the ordinary function by evaluating, for example, $\Psi\left(\kappa^N \exp\left(-\kappa^2 \left|\mathbf{r} - \mathbf{u}\right|^2\right) / \pi^{N/2}\right)$ (a test function of $\mathbf{u}$ with unit integral, concentrated at $\mathbf{r}$) and taking the limit as $\kappa\rightarrow\infty$. If the linear functional in question is defined as just described from an ordinary function, the ordinary function’s value at $\mathbf{r}\in\mathbb{R}^N$ is recovered by the limit wherever $\psi$ is continuous. If not (e.g. if the functional is the Dirac delta), the limit will not exist at all points. The tempered distributions also have the useful properties that:

1. A tempered distribution’s Fourier transform is also a tempered distribution;
2. Any tempered distribution is the Fourier transform of a tempered distribution; and
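3. The Fourier transform’s kernel is trivial. More pithily, $\mathfrak{F}$ is a continuous bijection (one-to-one, onto map), with continuous inverse, from $\mathscr{S}^\prime\left(\mathbb{R}^N, \mathbb{C}\right)$ onto itself; restricted to $\mathbf{L}^2(\mathbb{R}^N)$ it is, suitably normalised, unitary (Plancherel’s theorem).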
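The recovery recipe described before this list can be tried in $N = 1$, where the test function is $\kappa\,e^{-\kappa^2(r-u)^2}/\sqrt{\pi}$. The plain-Python sketch below (midpoint-rule pairing; helper names hypothetical) uses the non-decaying function $\cos u$, which is not in $\mathbf{L}^2(\mathbb{R})$ but does define a tempered distribution. Its pairing equals $\cos(r)\,e^{-1/(4\kappa^2)}$ exactly and tends to $\cos r$, whereas the delta functional's pairing is the test function's value at $0$, which tends to $0$ for $r\neq0$ and diverges at $r = 0$.

```python
import math


def phi(kappa, r):
    # N = 1 case of kappa^N exp(-kappa^2 |r - u|^2) / pi^(N/2):
    # a unit-mass Gaussian test function of u, concentrated at r
    return lambda u: kappa * math.exp(-kappa ** 2 * (r - u) ** 2) / math.sqrt(math.pi)


def pair_with_function(psi, test, center, halfwidth, n=20_000):
    # Psi(test) = integral of psi(u) test(u) du, midpoint rule near the centre
    h = 2 * halfwidth / n
    a = center - halfwidth
    return h * sum(psi(a + (i + 0.5) * h) * test(a + (i + 0.5) * h) for i in range(n))


def pair_with_delta(test):
    # the Dirac delta functional: evaluate the test function at the origin
    return test(0.0)


r = 0.8
for kappa in (1.0, 10.0, 100.0):
    t = phi(kappa, r)
    via_cos = pair_with_function(math.cos, t, r, 12 / kappa)
    exact = math.cos(r) * math.exp(-1 / (4 * kappa ** 2))
    print(kappa, via_cos, exact, pair_with_delta(t))
# via_cos -> cos(0.8); the delta column -> 0 for r != 0, but at r = 0 it grows
# like kappa / sqrt(pi), so the limit fails to exist there.
```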

So now we have again:

Tempered distributions on $\mathbb{R}^N$ and their Fourier transforms constitute exactly the same information

and almost anything dreamt up in practical problems can be represented by a tempered distribution.
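On tempered distributions the Fourier transform is defined by duality, $(\mathfrak{F}T)(\varphi) = T(\mathfrak{F}\varphi)$ for every Schwartz function $\varphi$. Taking $T$ to be the constant function $1$ (not in $\mathbf{L}^2$, but tempered) gives $(\mathfrak{F}1)(\varphi) = \int\mathfrak{F}\varphi(k)\,{\rm d}k = 2\pi\varphi(0)$, i.e. $\mathfrak{F}1 = 2\pi\delta$, assuming the convention $\mathfrak{F}\varphi(k) = \int\varphi(x)\,e^{-i\,k\,x}\,{\rm d}x$ (other normalisations move the $2\pi$). The plain-Python quadrature sketch below checks this identity for one arbitrarily chosen test function.

```python
import cmath
import math


def phi(x):
    # an arbitrary Schwartz test function
    return (1 + x) * math.exp(-(x - 0.3) ** 2)


def fourier(f, k, L=10.0, n=800):
    # (F f)(k) = integral of f(x) exp(-i k x) dx, midpoint rule on [-L, L]
    h = 2 * L / n
    return h * sum(f(x) * cmath.exp(-1j * k * x)
                   for x in (-L + (i + 0.5) * h for i in range(n)))


def pair_constant_one(F, K=12.0, m=600):
    # pairing of the constant function 1 with F: integral of F(k) dk on [-K, K]
    dk = 2 * K / m
    return dk * sum(F(-K + (j + 0.5) * dk) for j in range(m))


lhs = pair_constant_one(lambda k: fourier(phi, k))   # (F 1)(phi) := 1(F phi)
rhs = 2 * math.pi * phi(0.0)                        # (2 pi delta)(phi)
print(lhs, rhs)
```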

Some good references for these concepts:

1. E. M. Stein and G. L. Weiss, “Introduction to Fourier Analysis on Euclidean Spaces”, Princeton University Press, 1990, Chapter 1, especially Sections 2 and 3 in that chapter.

2. M. J. Lighthill, “Introduction to Fourier Analysis and Generalised Functions”, Cambridge University Press, Cambridge, U.K., 1996, Chapter 4, especially Section 4.2 and Theorem 17. Lighthill uses the slightly offbeat name “good” function for any Schwartz space member; aside from the unwonted nomenclature, this is an excellent readable reference.

3. J. K. Hunter and B. Nachtergaele, “Applied Analysis”, World Scientific Publishing Company Incorporated, Singapore, 2005, Chapter 11, Section 11.2.