How to Derive the Schrödinger Equation

I wrote several posts on Physics Stack Exchange about the “derivation” of the Schrödinger equation; here is a partial list:

  1. Is it possible to derive Schrodinger equation in this way?, where I talk about the equation’s “derivation” i.e. the meaning of the ideas the Schrödinger equation encodes and why and how it does encode them.
  2. Why complex functions for explaining wave particle duality?, where I talk about why complex numbers enter quantum mechanics, or indeed any branch of physics, and I try to imagine the early lines of thought that led to our current description of the deterministic, unitary evolution part of quantum mechanics (i.e. everything aside from quantum measurement) as an evolving state vector in “the” separable (i.e. with a countable orthonormal basis) complex Hilbert space.
  3. A Simple Explanation for the Schrödinger Equation and Model of Atom
  4. Why can the Schroedinger equation be used with a time-dependent Hamiltonian?
  5. Reference frame involved in the Schrödinger’s equation

I’ve gathered all this material into the following summary. Probably the best beginning point for all of this material and the reference that really got me started is the discussion of the Hamiltonian in the “Feynman Lectures on Physics” in chapter 8 “The Hamiltonian Matrix” of the third volume.

The Dawn: How Should Quantum Mechanics Look and Why Complex Rather than Real State Spaces Are Foremost

I believe the key words here are time shift invariance, flow, state, linearity (or homogeneity), continuity and unitarity and the answer to the question above is quite independent of whether or not we are doing quantum mechanics.

Suppose we are groping around in the dark trying to describe some new phenomenon as the early (actually 1920s) physicists were. In the tradition of Laplace, we hope that a deterministic system description will work (and the part of QM that uses complex numbers, namely unitary state evolution, is altogether and utterly deterministic). So we are going to begin with a state: some information – an array of numbers $\psi$ – real ones for now if you like – that will wholly determine the system’s future (and past) if the system is sundered from the rest of the Universe.

So we have some $\psi\in\mathbb{R}^N = X$ that is our basic description.

Now an isolated system evolves on its own: with the lapsing of time the system changes in some way. Let $\varphi: X\times\mathbb{R}\to X$ be our time evolution operation: for $x\in X$ $\varphi(x,t)\in X$ is what the state evolves to after a time $t$. Moreover, the description cannot be dependent on when we let it happen: it must have a basic time shift invariance. The experiment’s results cannot depend on whether I do it now or whether I wait till I’ve had my cup of tea. So the description of the change in some time interval $\Delta t$ must be the same as that for the change in any other $\Delta t$. If we’re not interacting with the system, then there are no privileged time intervals. Therefore, we must have:

$$\varphi(x,t+s) = \varphi(x,s+t) = \varphi(\varphi(x,t),s) = \varphi(\varphi(x,s),t),\;\forall s,t\in\mathbb{R}$$


$$\varphi(x,N \Delta t) = \underbrace{\varphi(\varphi(\cdots \varphi(\varphi(}_{N\ \text{iterations}}x,\overbrace{\Delta t),\Delta t)\cdots,\Delta t)}^{N\ \text{iterations}}$$

So, from our Copernican notion of time shift invariance, we have our next big idea: that of a flow defined by the state transition operator $\varphi$. Our state transition operators form a continuous, one-parameter group. Here we have brought our fundamental idea of continuity to bear.

This may seem to be saying (since we now have a one-parameter Lie group) that the only systems that fulfill these ideas are linear ones, but this is not quite so: the Lie group only acts on $X$ so there is always the possibility of some local, $X$-dependent nonlinear “stretching” or “shrinking” of the path. So now we make an assumption of system linearity or of some other notion of a homogeneous action (so that, intuitively, $\varphi(-,t):X\to X$ conserves some local “structure” of our state space $X$). Then the only continuous, linear, homogeneous state transition flow operator on $X$ is:

$$\varphi(x, t) = \exp(H\,t) x \stackrel{def}{=} \left({\rm id} + H\,t + H^2\,\frac{t^2}{2!} + H^3\,\frac{t^3}{3!} + \cdots \right) x$$

for some linear operator $H:X\to X$. There are some other, wildly discontinuous operators fulfilling $\varphi(x,t+s) = \varphi(x,s+t) = \varphi(\varphi(x,t),s) = \varphi(\varphi(x,s),t),\;\forall s,t\in\mathbb{R}$; see the footnote at the end.
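As a quick numerical sanity check of the flow property, here is a minimal sketch assuming NumPy. The generator `H`, the state `x0` and the helper names `exp_series` and `phi` are made up for illustration; the truncated power series is just the definition above summed to a fixed number of terms, not a robust matrix exponential.

```python
import numpy as np

def exp_series(H, t, terms=30):
    """Truncated power series id + Ht + (Ht)^2/2! + ... (illustrative only)."""
    A = H * t
    result = np.eye(H.shape[0])
    term = np.eye(H.shape[0])
    for k in range(1, terms):
        term = term @ A / k          # term now holds (Ht)^k / k!
        result = result + term
    return result

def phi(x, t, H):
    """Flow map: the state x evolved for a time t."""
    return exp_series(H, t) @ x

# Hypothetical 2x2 generator and initial state, chosen only for illustration.
H = np.array([[0.0, -1.0],
              [1.0, -0.1]])
x0 = np.array([1.0, 0.5])
t, s = 0.7, 1.3

direct = phi(x0, t + s, H)               # evolve for t + s in one go
two_step = phi(phi(x0, t, H), s, H)      # evolve for t, then for s
flow_error = np.max(np.abs(direct - two_step))
print(flow_error)
```

The two routes agree to rounding error for any `t` and `s`, which is all the time shift invariance argument asks of a linear, continuous flow.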

So now, to investigate the behaviour of this basic model, we need to think about the eigenvalues of $H$, so that, for example, we can “spectrally factorise” the operator (“diagonalise” it – or at least come as near as we can to diagonalising it – but this always can be done in quantum mechanics). The natural home for the eigenvalues of an operator is $\mathbb{C}$, not $\mathbb{R}$, because the former is an algebraically closed field and the latter is not. There exist real, finite dimensional linear operators whose characteristic equations have solutions in $\mathbb{C}-\mathbb{R}$. Moreover, we must consider the behaviours of $e^{\lambda\,t}$ where $\lambda$ are the eigenvalues of $H$ (this is easy in a finite dimensional space – for an infinite dimensional one, see my answer here to the Physics SE question “Are all scattering states un-normalizable?”): how do these square with our physics?

If any eigenvalue $\lambda$ is real and positive, or complex with positive real part, our state vector diverges to infinite length as $t\to\infty$. If all eigenvalues have negative real part, then our state vector swiftly dwindles to the zero vector. Even before we have crystallised the idea of probability amplitude properly, we may have the idea that we “want as much state to stick around for as long as possible”. The system must end up in some nonzero state: our particles don’t all collapse to the same state $0\in X$. So all eigenvalues being wholly imaginary, so that $e^{\lambda\,t}$ has bounded magnitude that doesn’t rush off to infinity or to nought, might be a fair bet. Bringing this into sharper focus: having $\exp(H\,t)$ conserve length might be another reasonable assumption. The easiest “length” in state space is of course the $\mathbf{L}^2$ one. So now we might postulate, even before our ideas about probability amplitude in QM are fully crystallised, that:

The operator $\exp(H\,t)$ is unitary

This equivalently means that:

$H$ is skew-Hermitian

or that our operator is $\exp(i\,\hat{H}\,t)$ where:

$\hat{H}$ is Hermitian

Hermitian operators, given very mild assumptions, are always diagonalisable, and they always have real eigenvalues. So entities like $\exp(i\,\omega\,t)$ where $\omega \in \mathbb{R}$ are inevitable in our state transition operator expansions.
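The three eigenvalue behaviours, and the claim that a Hermitian $\hat{H}$ gives a unitary $\exp(i\,\hat{H}\,t)$, can be checked numerically. This is a sketch assuming NumPy; the matrix `H_hat`, the state `psi` and the sample eigenvalues are invented for illustration.

```python
import numpy as np

# Magnitude of exp(lambda * t) for three kinds of eigenvalue at t = 10.
t_long = 10.0
growth = abs(np.exp((0.3 + 2.0j) * t_long))    # positive real part: blows up
decay = abs(np.exp((-0.3 + 2.0j) * t_long))    # negative real part: dies away
bounded = abs(np.exp(2.0j * t_long))           # purely imaginary: stays at 1

def evolve(H_hat, t):
    """exp(i * H_hat * t) for Hermitian H_hat, via its spectral decomposition."""
    omega, V = np.linalg.eigh(H_hat)           # real eigenvalues, orthonormal eigenvectors
    return V @ np.diag(np.exp(1j * omega * t)) @ V.conj().T

# Hypothetical Hermitian matrix and unit-norm state, for illustration only.
H_hat = np.array([[1.0, 2.0 - 1.0j],
                  [2.0 + 1.0j, -0.5]])
psi = np.array([0.6, 0.8j])

omega = np.linalg.eigvalsh(H_hat)              # the real frequencies
U = evolve(H_hat, t=0.9)
unitarity_error = np.max(np.abs(U.conj().T @ U - np.eye(2)))
norm_after = np.linalg.norm(U @ psi)
print(growth, decay, bounded, unitarity_error, norm_after)
```

Only the purely imaginary exponent keeps the magnitude pinned at one, and the evolved state keeps its unit length, which is the postulate made above.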

Now we could keep everything real, use $\cos$ and $\sin$ instead and keep our state space as $\mathbb{R}^{2 N}$ where we would have $\mathbb{C}^N$ in the conventional description. Whether or not we choose to single out an object like:

$$\left(\begin{array}{cc}0&-1\\1&0\end{array}\right)\in U(1), SU(2), SO(3), U(N) \cdots$$

and give it a special symbol $i$ where $i^2=-1$ is a “matter of taste”, so in this sense the use of complex numbers is not essential. Nonetheless, we would needfully still meet this object in decomposing our state transition operator and $H$, $\hat{H}$ operators and ones like it and would have to handle statements involving such objects when describing physics – there’s no way around this. So, in particular, if we have bounded, but everlasting wave behaviour, we must be encountering pairs of states in our $\mathbb{R}^{2 N}$ representation of the conventional $\mathbb{C}^N$ that evolve with time through linear differential operators with submatrices like the $i$ object above.
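To make the “matter of taste” concrete, here is a small sketch, assuming NumPy, that carries one complex amplitude as a pair of reals and lets the matrix above stand in for $i$. The names `J`, `R`, `z` and `v` and the numbers are purely illustrative.

```python
import numpy as np

J = np.array([[0.0, -1.0],
              [1.0,  0.0]])                    # the real 2x2 stand-in for i

theta = 0.4
# Because J @ J = -I, the series for exp(J * theta) collapses to cos + J sin.
R = np.cos(theta) * np.eye(2) + np.sin(theta) * J

z = 1.5 - 0.5j                                 # a complex amplitude
v = np.array([z.real, z.imag])                 # the same amplitude as a real pair

rotated_complex = np.exp(1j * theta) * z       # conventional complex description
rotated_real = R @ v                           # all-real description
print(rotated_complex, rotated_real)
```

Both descriptions give the same numbers; the real one simply hides $i$ inside a 2×2 block.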

So you actually see that complex numbers arise very naturally out of the very classical and indeed Renaissance ideas of people like Laplace, Copernicus, Leibniz, Galileo and Newton. We simply have a better and more refined mathematical language to talk smoothly about these ideas where these guys were groping in the dark.

Bringing It All Into Sharper Focus

Now we fast forward to the present, when we understand much more deeply the probability amplitude interpretation of a system’s quantum state. Often in quantum mechanics, we replace the state space $\mathbb{R}^N$ talked about above by $\mathcal{H}$, “the” separable Hilbert space with a countably infinite orthonormal basis. Let’s for example consider a quantum harmonic oscillator, so we shall encode the state as a discrete sequence of complex numbers $\Psi = \{\psi_0, \psi_1, \cdots\}$, such that $\sum_j |\psi_j|^2 = 1$. $\psi_0$ is the probability amplitude that the system will be detected in its quantum ground state, i.e. as close as one can get to “unenergised” without violating the Heisenberg inequality, $\psi_1$ the probability amplitude that the oscillator is in a one-photon state, i.e. its energy is $\hbar \omega$ above that of the ground state, $\psi_2$ the amplitude that it is in a two-photon state, and in general $\psi_N$ the amplitude that it is in an $N$-photon state; or, if you like, the amplitude that it has had $N$ photons added to its ground state from somewhere outside the oscillator system. More generally, the $\psi_j$ are the probability amplitudes that the system will be detected as being in the $j^{th}$ basis state, i.e. one of the basis vectors for the state Hilbert space, and these don’t have to be the equispaced states of the harmonic oscillator – it could be another system altogether. Obviously, the system must always be in some state, so the relationship $\sum_j |\psi_j|^2 = 1$ always holds.

The Schrödinger equation is very general: it simply says that a quantum system’s makeup and working is in some sense “constant” when the system is sundered from the rest of the World. This vague statement makes more sense in symbols: the mathematical description has to be invariant with respect to time shifts: if I begin with a quantum state at 12 o’clock and evolve it until 1 o’clock, then my state evolution is going to be the same as if I began with the same state at 4 o’clock and waited until five. Now, we assume linearity, so that our state vector (now written as a column vector) is going to evolve following some matrix equation: $\psi(t) = U(t) \psi(0)$, where the state transition matrix $U(t)$ must:

  1. Fulfil $U(t+s) = U(t) U(s) = U(s) U(t)$ for any time intervals $t$ and $s$. This is simply our discussion about time shift invariance above. Straight away we know $U(t) = \exp(A t)$, for some constant matrix $A$ as the exponential is the only continuous function with this time shift invariance property;
  2. It must be unitary: this means it must conserve norms, so that $\sum_j |\psi_j|^2 = 1$ holds at all times: this simply says that the system has to be in some state, owing to the probability interpretation of the squared magnitudes.

So the most general state evolution possible is $\psi(t) = \exp(-\hbar^{-1} i\, \hat{H}\, t)\,\psi(0)$, where $\hat{H}$ is a constant, Hermitian matrix (this is equivalent to the unitarity statement). This in turn is equivalent to:

$$i\,\hbar\,\mathrm{d}_t \psi = \hat{H}\,\psi$$

which is the Schrödinger equation (see footnote about the mysterious constants $\hbar$ and $i$). Hopefully the Schrödinger equation’s essential nature should now be clear:

The Schrödinger equation for a quantum system asserts the system’s time shift invariance and the linearity of its state evolution when that system is sundered from the rest of the World.
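The equivalence between the exponential form and the differential equation can be checked by brute force. The sketch below, assuming NumPy, integrates $i\,\hbar\,\mathrm{d}_t \psi = \hat{H}\,\psi$ with a classical fourth-order Runge-Kutta stepper (my choice of integrator, nothing special about it) and compares against $\exp(-\hbar^{-1} i\,\hat{H}\,t)\,\psi(0)$. The two-level `H_hat`, the units with `hbar = 1` and the step count are assumptions made for illustration.

```python
import numpy as np

hbar = 1.0  # natural units, an assumption of this sketch

# Hypothetical two-level Hermitian Hamiltonian and initial state.
H_hat = np.array([[1.0, 0.5],
                  [0.5, -1.0]], dtype=complex)
psi0 = np.array([1.0, 0.0], dtype=complex)

def rhs(psi):
    """Right-hand side of d(psi)/dt = -(i/hbar) H psi."""
    return -1j / hbar * (H_hat @ psi)

def rk4(psi, T, steps):
    """Integrate the Schrodinger equation from 0 to T with fixed RK4 steps."""
    dt = T / steps
    for _ in range(steps):
        k1 = rhs(psi)
        k2 = rhs(psi + 0.5 * dt * k1)
        k3 = rhs(psi + 0.5 * dt * k2)
        k4 = rhs(psi + dt * k3)
        psi = psi + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return psi

def exact(psi, T):
    """exp(-i H T / hbar) psi via the eigendecomposition of H."""
    E, V = np.linalg.eigh(H_hat)
    return V @ (np.exp(-1j * E * T / hbar) * (V.conj().T @ psi))

T = 2.0
psi_numeric = rk4(psi0, T, steps=2000)
psi_exact = exact(psi0, T)
difference = np.linalg.norm(psi_numeric - psi_exact)
norm_numeric = np.linalg.norm(psi_numeric)
print(difference, norm_numeric)
```

The integrated state matches the exponential and keeps unit norm to within the integrator’s accuracy.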

Sometimes one comes across time dependent Schrödinger equations. This arises in two cases:

  1. For some reason, we may wish to invoke a unitary co-ordinate transformation that “rotates” the Hilbert space basis so as to keep basis vectors “aligned” with some part of our problem. The interaction picture is a good example where we want to split our isolated system into a “main” subsystem with known behavior and contribution to the Hamiltonian and a “perturbation” system with a perturbing contribution to the Hamiltonian.
  2. There may be a parameter “controlling” the quantum system that varies with time. For example, we might have an electron in a classical magnetic field which we can vary to “steer” the electron’s state. The full analysis of this situation would model the quantum electromagnetic field as well and then the whole kit – electron and quantised field – would go into the quantum state, so that the whole quantum system would be time-shift-invariant exactly as above. This is only possible in principle; a realistically doable problem comes from a semi-classical model: the magnetic field is simply some precisely known time varying parameter of $\hat{H}$, so that the electron is studied as a quantum one-body problem with a time-varying $\hat{H} = \hat{H}(t)$.

So, we now “roll our co-ordinate reference axes around” by some unitary co-ordinate-mapping matrix $\mathbf{V}(t)$ where:

$$\mathrm{d}_t \mathbf{V}(t) = -\frac{i}{\hbar}\,\tilde{K}(t)\,\mathbf{V}(t) = -\frac{i}{\hbar}\,\mathbf{V}(t)\,K(t)$$

where $i\, K(t),\,i\, \tilde{K}(t)\in \mathfrak{u}(N)$, the (possibly infinite dimensional) Lie algebra of skew-Hermitian matrices, and there are two alternatives because left translation or right translation of neighbourhoods of the identity work equally well to define the Lie group topology. Therefore, we now have our new state $\psi^\prime(t) = \mathbf{V}(t) \psi(t)$ so that:

$$\begin{array}{lcl}\mathrm{d}_t \psi^\prime(t) &=& \mathrm{d}_t(\mathbf{V}(t))\, \psi(t) + \mathbf{V}(t) \,\mathrm{d}_t \psi(t) \\
&=& -\frac{i}{\hbar}\left(\mathbf{V}(t)\,K(t)\,\mathbf{V}^{-1}(t) + \mathbf{V}(t)\,\hat{H}(t)\,\mathbf{V}^{-1}(t)\right)\,\psi^\prime(t) \\
&=& -\frac{i}{\hbar} \hat{H}^\prime(t)\,\psi^\prime(t)\end{array}$$

where we can always find an $i\,\hat{H}^\prime(t) \in \mathfrak{u}(N)$ such that $\hat{H}^\prime(t) = \mathbf{V}(t)\,K(t)\,\mathbf{V}^{-1}(t) + \mathbf{V}(t)\,\hat{H}(t)\,\mathbf{V}^{-1}(t)$, as we can understand either because:

  1. We can left translate a $C^1$ path through the identity with tangent $\frac{i}{\hbar}(K + \hat{H})$ there to a $C^1$ path through $\mathbf{V}$ with tangent $\frac{i}{\hbar}\,\mathbf{V}\,(K + \hat{H})$ there and then right translate this path back to a (in general different) $C^1$ path through the identity with the in general different tangent $\frac{i}{\hbar} \hat{H}^\prime = \frac{i}{\hbar}\,\mathbf{V}\,(K + \hat{H}) \mathbf{V}^{-1}$ there; or
  2. More jargonistically, a Lie group acts on its own Lie algebra through conjugation in the group’s adjoint representation.

So now we have another perfectly valid Schrödinger equation, only this time it is time-varying:

$$i\,\hbar\,\mathrm{d}_t \psi^\prime = \hat{H}^\prime(t)\,\psi^\prime$$

and the only requirement of $\hat{H}^\prime$ is that it must be Hermitian. Of course when we roll around in the Lie group like this, physical observables become themselves time varying, as they do in both the interaction and Heisenberg pictures.
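Here is a numerical check of the transformed equation in the simplest special case, a sketch assuming NumPy. I take a constant Hermitian $K$, so that $\mathbf{V}(t) = \exp(-\hbar^{-1} i\,K\,t)$ and $\tilde{K} = K$, together with a constant $\hat{H}$; the particular matrices, the time `t` and the finite-difference step `h` are all invented for illustration.

```python
import numpy as np

hbar = 1.0  # natural units, an assumption of this sketch

def expi(A, t):
    """exp(-i A t / hbar) for Hermitian A, via eigendecomposition."""
    w, P = np.linalg.eigh(A)
    return P @ np.diag(np.exp(-1j * w * t / hbar)) @ P.conj().T

# Hypothetical Hamiltonian, frame generator and initial state.
H_hat = np.array([[1.0, 0.3], [0.3, -1.0]], dtype=complex)
K = np.array([[0.0, -0.7j], [0.7j, 0.0]], dtype=complex)
psi0 = np.array([0.6, 0.8], dtype=complex)

def psi_prime(t):
    """State in the rolled frame: V(t) psi(t)."""
    return expi(K, t) @ expi(H_hat, t) @ psi0

def H_prime(t):
    """Transformed Hamiltonian V K V^-1 + V H V^-1."""
    V = expi(K, t)
    V_inv = V.conj().T                     # V is unitary
    return V @ K @ V_inv + V @ H_hat @ V_inv

t, h = 0.8, 1e-5
lhs = (psi_prime(t + h) - psi_prime(t - h)) / (2 * h)   # central difference
rhs = (-1j / hbar) * (H_prime(t) @ psi_prime(t))
residual = np.linalg.norm(lhs - rhs)
hermiticity_error = np.max(np.abs(H_prime(t) - H_prime(t).conj().T))
print(residual, hermiticity_error)
```

The residual is at the level of the finite-difference error, and the transformed Hamiltonian is Hermitian but now depends on $t$.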

Now if we relax the time-shift-invariance constraint we still must have unitary evolution of the state transition matrices to make sure that norms and thus probabilities are conserved. So now our state transition operator still stays in the (possibly infinite dimensional) Lie group $\mathfrak{U}(N)$ but it no longer traces a one-parameter group (i.e. left / right translationally-invariant flow line) but rather traces some general $C^1$ path through the Lie group. Its time derivative must be in the left or right translated Lie algebra so now we have:

$$\mathrm{d}_t \mathbf{U}(t) = -\frac{i}{\hbar}\,\hat{H}(t)\,\mathbf{U}(t) = -\frac{i}{\hbar}\,\mathbf{U}(t)\,\tilde{\hat{H}}(t)$$

for some $\frac{i}{\hbar}\,\hat{H}(t), \frac{i}{\hbar}\,\tilde{\hat{H}}(t)\in \mathfrak{u}(N)$ and we still have a Schrödinger equation:

$$i\,\hbar\,\mathrm{d}_t \psi = \hat{H}(t)\,\psi$$
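To see a general path through the unitary group, as opposed to a one-parameter group, one can build the state transition matrix as an ordered product of short unitary steps. This is a sketch assuming NumPy; the “steered” two-level Hamiltonian `H_of_t`, the midpoint rule and the step counts are my own illustrative choices.

```python
import numpy as np

hbar = 1.0  # natural units, an assumption of this sketch
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def H_of_t(t):
    """Hypothetical steered Hamiltonian: fixed splitting plus an oscillating drive."""
    return sz + 0.8 * np.cos(1.5 * t) * sx

def step(H, dt):
    """Short-time unitary exp(-i H dt / hbar) for a Hermitian H."""
    E, P = np.linalg.eigh(H)
    return P @ np.diag(np.exp(-1j * E * dt / hbar)) @ P.conj().T

def propagator(t0, t1, steps=2000):
    """Time-ordered product of short steps, sampling H at each step's midpoint."""
    dt = (t1 - t0) / steps
    U = np.eye(2, dtype=complex)
    for n in range(steps):
        U = step(H_of_t(t0 + (n + 0.5) * dt), dt) @ U   # later steps act on the left
    return U

U_01 = propagator(0.0, 1.0)
U_12 = propagator(1.0, 2.0)
U_02 = propagator(0.0, 2.0, steps=4000)

unitarity_error = np.max(np.abs(U_02.conj().T @ U_02 - np.eye(2)))
composition_error = np.max(np.abs(U_02 - U_12 @ U_01))   # paths still compose
not_a_group = np.max(np.abs(U_02 - U_01 @ U_01))         # but U(2) != U(1) U(1)
print(unitarity_error, composition_error, not_a_group)
```

Norms are still conserved and consecutive stretches of the path still compose, but evolving for two time units is no longer the same as evolving for one unit twice.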

Other Pictures

Schrödinger’s equation is not the only way to make the above assertion of time shift invariance, which leads me to the discussion of “pictures”, sometimes, highly unhelpfully, called “frames” or “frameworks”. Sometimes it is easier to analyse a system’s evolution in what is called the Heisenberg picture. In quantum mechanics the only “real” things are measurements, represented by observables, which are Hermitian matrices (operators). So the only “real” quantities are the moments of the probability distribution for the measured quantity: if the quantity is measured by an observable $\hat{M}$ then the $n^{th}$ moment of the probability distribution for the value of that measurement when the system state is $\psi$ is $\psi^\dagger \hat{M}^n\psi$ in matrix notation, or in bra-ket notation $\left<\psi|\hat{M}^n|\psi\right>$. One can think of a system’s state being constant when the system is isolated and of the observables themselves evolving in time. Since only the measurements matter this is altogether acceptable as long as the values of measurements don’t change: the measurement evolves with first-time derivative $\mathrm{d}_t\left<\psi|\hat{M}^n|\psi\right>$ and, if we use Leibniz’s rule and plug in the time evolution of $\psi$ described by the Schrödinger equation we get:

$$\mathrm{d}_t \psi^\dagger \hat{M} \psi = \frac{i}{\hbar} \psi^\dagger [\hat{H}, \hat{M}]\psi $$

We can now see that the measurements will evolve exactly the same way as they would if the system evolved as described by Schrödinger’s equation if we think of the state $\psi$ as constant and if the observables instead evolve following:

$$\mathrm{d}_t \hat{M} = \frac{i}{\hbar} [\hat{H}, \hat{M}]$$

This is the equation of motion for an observable in the Heisenberg picture (“frame”). I’m pretty sure that somewhere in his lecture series Feynman says the Heisenberg picture is like doing quantum mechanics in a rotating frame. He is of course being metaphorical. Also note that, because we want the Heisenberg equation to hold for any observable, its form is very constrained. In particular, the operation on the right has to be a derivation (something which fulfils the Leibniz product rule, which the Lie bracket does) so that if observables $\hat{A}$ and $\hat{B}$ fulfill the Heisenberg equation, so too do $\hat{A}^n$, $\hat{B}^n$ and $i [\hat{A}, \hat{B}]$, which can also be (Hermitian) observables.

From especially the Heisenberg equation, one can readily see that any observable that commutes with the constant matrix $\hat{H}$ defines an observable whose measurements are constant with time. So $\hat{H}$ is hypothesised to be the *energy* observable, since in classical physics energy is the conserved “current” corresponding, by Noether’s theorem, to invariance with respect to time shifts.
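The agreement between the two pictures, the Heisenberg equation of motion and the constancy of $\hat{H}$ itself can all be verified on a small example. This is a sketch assuming NumPy, with a constant two-level `H_hat`, an observable `M_hat` and an initial state that are made up for illustration; the Heisenberg-picture observable is taken as $U^\dagger \hat{M} U$ with $U = \exp(-\hbar^{-1} i\,\hat{H}\,t)$.

```python
import numpy as np

hbar = 1.0  # natural units, an assumption of this sketch

# Hypothetical Hamiltonian, observable and initial state.
H_hat = np.array([[1.0, 0.4], [0.4, -1.0]], dtype=complex)
M_hat = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)
psi0 = np.array([1.0, 0.0], dtype=complex)

def U(t):
    """State transition matrix exp(-i H t / hbar)."""
    E, P = np.linalg.eigh(H_hat)
    return P @ np.diag(np.exp(-1j * E * t / hbar)) @ P.conj().T

def M_heis(t):
    """Heisenberg-picture observable U(t)^dagger M U(t)."""
    return U(t).conj().T @ M_hat @ U(t)

t, h = 1.1, 1e-5

# Same measurement statistics in both pictures.
psi_t = U(t) @ psi0
schrodinger_value = np.vdot(psi_t, M_hat @ psi_t).real
heisenberg_value = np.vdot(psi0, M_heis(t) @ psi0).real

# Equation of motion d_t M = (i/hbar) [H, M], checked by central difference.
dM = (M_heis(t + h) - M_heis(t - h)) / (2 * h)
commutator = H_hat @ M_heis(t) - M_heis(t) @ H_hat
eom_residual = np.max(np.abs(dM - (1j / hbar) * commutator))

# H commutes with itself, so as an observable it does not move at all.
energy_drift = np.max(np.abs(U(t).conj().T @ H_hat @ U(t) - H_hat))
print(schrodinger_value, heisenberg_value, eom_residual, energy_drift)
```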

The Stone-von Neumann theorem has more to say about the unitary equivalence between the Heisenberg and Schrödinger pictures.


There are some weird and quite wonderful everywhere discontinuous solutions to the functional equation $\mathbf{U}(t+s) = \mathbf{U}(t) \mathbf{U}(s) = \mathbf{U}(s) \mathbf{U}(t)$; if you’re interested, see Hewitt and Stromberg, “Real and Abstract Analysis”, Springer-Verlag, Berlin, 1965 Chapter 1, §5. We throw these out as “unphysical”. However, in the case of a semisimple, compact Lie group the weird and wonderful Hewitt and Stromberg behaviour is eliminated because there is only one possible topology for such a group that makes it a Lie group. Or, otherwise put, there are no automorphisms of the group (thought of only as an abstract, non-Lie group) that are not also Lie group automorphisms, so all automorphisms are also group topology homeomorphisms for compact semisimple Lie groups. This amazing fact was proven by van der Waerden in 1932 (van der Waerden, B. L., Mathematische Zeitschrift **36** pp780 – 786). So we don’t even have to worry about assuming continuity: semisimplicity and compactness do the job for us. We nearly have this in the above discussion: $\mathfrak{SU}(N)$ is compact and simple, but $\mathfrak{U}(N)$ has a continuous, Lie group centre $\mathfrak{U}(1)$ (and $\mathfrak{U}(N) = \mathfrak{SU}(N) \otimes \mathfrak{U}(1) = \mathfrak{U}(1) \otimes \mathfrak{SU}(N)$). I think this is quite profound from a physics standpoint, since most often we have to make assumptions about the mathematical objects that are very strong (i.e. highly specializing) mathematical assumptions: smoothness, for example. Here is an example where the coarsest of mathematical assumptions (one parameter groups that are not even assumed continuous) yields smoothness, indeed $C^\omega$ (analytic) behavior. Sadly, my gut feeling is that this point is likely the only thing in this post which will not survive the passage to countably infinite dimensions: van der Waerden’s stunning little trick relies wholly on finiteness of dimension.