Why are the laws of thermodynamics “supreme among the laws of Nature”?

This question was put to Physics Stack Exchange here.

Eddington wrote:

The law that entropy always increases holds, I think, the supreme position among the laws of Nature. If someone points out to you that your pet theory of the universe is in disagreement with Maxwell’s equations — then so much the worse for Maxwell’s equations. If it is found to be contradicted by observation — well, these experimentalists do bungle things sometimes. But if your theory is found to be against the second law of thermodynamics I can give you no hope; there is nothing for it but to collapse in deepest humiliation.

and Einstein wrote:

….[classical thermodynamics] is the only physical theory of universal content which I am convinced will never be overthrown, within the framework of applicability of its basic concepts.

Why did they say that? Is it a very deep insight they had, or is it something one can be convinced of quite easily? Or even, is it trivial?

which I answered as follows:

Your question is somewhat subjective, so it can really only be answered by trying to piece together what such great physicists were thinking about physics when they made their statements.

Firstly, the laws of thermodynamics have very different origins and putative theoretical justifications and indeed the Eddington quote only talks about the second.

I have heard the Eddington quote, but I don’t know a lot about the man because I’m afraid “I’ve never really forgiven him” for the following exchange:

Asked in 1919 whether it was true that only three people in the world understood the theory of general relativity, [Eddington] allegedly replied: ‘Who’s the third?’

and so am wont to take him with a grain of salt (and, probably unjustifiably, have neglected to find much out about him). However, James Clerk Maxwell thought something very like this about the Second Law. What he was getting at was that it is an emergent phenomenon arising from the laws of large numbers in probability theory, and that a weak form of it can be derived from very basic assumptions quite independent of the details of the physical laws steering a system’s micro-constituents.

First of all, consider the simple binomial probability distribution for, say, sampling red balls from a population which is 43% made up of red balls. If you take a sample of ten, then you’ll most likely get four or five red ones, but getting two or three, or even eight or nine, is far from out of the question. The simple number 0.43 does not tell you very much about the character of the kinds of samples you’ll get. However, if we take one million balls, the number of red ones will be $430\,000$ to within a very small proportional error, roughly of the order of $1/\sqrt{N}$, which is about 0.001 here. So even though the absolute number of red balls will still vary from sample to sample (typically by a few hundred), the simple statement “43% are red” characterises the sample extremely well. The binomial distribution gets “pointier and pointier” such that, even though the probability of getting exactly 43% red balls is tiny, almost all the possible arrangements, i.e. samples, look almost exactly like a sample with 43% red balls. The probability of getting, say, $420\,000$ or fewer, or $440\,000$ or more red balls out of a sample of one million is so small (roughly $10^{-90}$!) that it can be neglected for all practical purposes:

A large sample looks almost exactly like the statistically expected sample, and this statement gets more and more accurate as the sample gets bigger and bigger.
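As a rough numerical check of the red-ball figures, here is a minimal Python sketch using only the standard library. The helper names `log_pmf` and `log10_prob` are my own, introduced purely for illustration, and the probabilities are summed in log space because many of the individual terms underflow ordinary floating point.

```python
import math

def log_pmf(k, n, p):
    """Natural log of the binomial probability of exactly k red balls in n draws."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def log10_prob(ks, n, p):
    """log10 of the total probability of the outcomes in ks (log-sum-exp)."""
    logs = [log_pmf(k, n, p) for k in ks]
    top = max(logs)
    total = top + math.log(sum(math.exp(x - top) for x in logs))
    return total / math.log(10)

p = 0.43

# Sample of ten: outcomes well away from the expected 4.3 are still common.
n_small = 10
prob_4_or_5 = 10 ** log10_prob(range(4, 6), n_small, p)
prob_2_or_3 = 10 ** log10_prob(range(2, 4), n_small, p)

# Sample of one million: relative spread and the two far tails.
n_big = 1_000_000
rel_spread = math.sqrt(p * (1 - p) / n_big) / p        # about 0.00115
log10_low = log10_prob(range(0, 420_001), n_big, p)    # 420 000 or fewer
log10_high = log10_prob(range(440_000, n_big + 1), n_big, p)  # 440 000 or more

print("P(4 or 5 of 10) =", round(prob_4_or_5, 3))
print("P(2 or 3 of 10) =", round(prob_2_or_3, 3))
print("relative spread for N = 10^6:", rel_spread)
print("log10 of the low and high tails:", log10_low, log10_high)
```

Both tail exponents should come out close to $-90$, which is where the “roughly $10^{-90}$” figure comes from. The sum over roughly a million terms takes a second or two in plain Python.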

So too it is for, say, the derivation of the Boltzmann distribution from the microcanonical ensemble on the Wikipedia page “Maxwell-Boltzmann Statistics”. You have two Lagrange multipliers in this latter derivation, but the essential idea is almost exactly the same as for the binomial distribution I’ve just talked about. You find the most likely arrangement, given the basic assumption that all possible arrangements are equally likely. Stirling’s formula works exactly the same way as it does when you approximate the binomial distribution for large samples. What the Wikipedia derivation glosses over (as do, I think, all of the ones I’ve seen in physics texts) is the following powerful idea:

The distribution gets “pointier and pointier” such that almost all of the arrangements look very like the maximum likelihood one. The probability of finding an arrangement significantly different in macroscopic character from the maximum likelihood one becomes vanishingly small in the thermodynamic limit of a large number of particles.
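To make the “pointier and pointier” claim a little more concrete, here is a sketch of the step that usually gets skipped. The notation is mine rather than Wikipedia’s, and I assume distinguishable particles, non-degenerate levels of energy $\varepsilon_i$ with occupation numbers $n_i$, and that every $n_i$ is large enough for Stirling’s formula. The number of arrangements with a given set of occupation numbers is

$$W = \frac{N!}{\prod_i n_i!}, \qquad \ln W \approx N\ln N - \sum_i n_i \ln n_i .$$

Maximising $\ln W$ subject to $\sum_i n_i = N$ and $\sum_i n_i \varepsilon_i = E$ gives the most likely occupation numbers $n_i^* \propto e^{-\beta \varepsilon_i}$. Now expand to second order about that maximum, for displacements $\delta n_i$ that respect both constraints. The linear term vanishes, and since $\partial^2 \ln W/\partial n_i^2 = -1/n_i$ we are left with

$$\ln\frac{W(n^* + \delta n)}{W(n^*)} \approx -\sum_i \frac{(\delta n_i)^2}{2\,n_i^*} .$$

So the number of arrangements falls off like a Gaussian whose relative width in each occupation number is of order $1/\sqrt{n_i^*}$: the same narrowing as for the red balls.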

So then, in any system of a large number of particles there are states that look almost exactly like the maximum likelihood macrostate and there is almost nothing else.

Therefore, if for some reason a system finds itself in a state that is significantly different from the maximum likelihood one, then it will almost certainly, through any random walk in its phase space, reach a state that is almost the same in macroscopic character as the maximum likelihood one. (The reason for the unlikely beginning state might be, for example, that one of us monkeys in white coats has created a system comprising a pellet of native sodium in a beaker of water. Kaboom!) This, of course, is a “laboratory form” of the second law of thermodynamics. At the level of universes, though, the grounding of the second law becomes much more experimental in nature (see the Physics SE question “How do you prove the second law of thermodynamics from statistical mechanics?” and also my answer here), but the stunningly fundamental and simple reasons for its holding in its weak form, as I talked about above, give physicists deep reasons for believing that the second law is generally true.
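A toy illustration of this random-walk argument is the Ehrenfest urn model, which I am swapping in here as a stand-in for real dynamics: $N$ particles sit in a box, all of them initially in the left half, and at each step one particle chosen at random hops to the other half. The function name `relax` and the parameter values are hypothetical choices for this sketch.

```python
import random

def relax(n_particles, n_steps, seed=0):
    """Ehrenfest urn model, started with every particle in the left half.
    Returns the fraction of particles in the left half after each step."""
    rng = random.Random(seed)
    n_left = n_particles
    history = []
    for _ in range(n_steps):
        # The randomly chosen particle is on the left with probability n_left / N.
        if rng.random() < n_left / n_particles:
            n_left -= 1   # it hops left -> right
        else:
            n_left += 1   # it hops right -> left
        history.append(n_left / n_particles)
    return history

history = relax(n_particles=1000, n_steps=20000)
late = history[10000:]
print("fraction on the left after 10 steps:", history[9])
print("range of the fraction over the last 10000 steps:", min(late), max(late))
```

The unlikely starting state drifts to an even split and then stays close to it, simply because nearly all of the arrangements look like that.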

But note in passing that my arguments do not work in the small. Entropy can and does fluctuate wildly in both directions for systems comprising small numbers of particles, see the review paper:

Sevick, E. M.; Prabhakar, R.; Williams, Stephen R.; Bernhardt, Debra Joy, “Fluctuation Theorems”, Annual Rev. of Phys. Chem., 59, pp. 603-633

This one is paywalled, so see also the Wikipedia page on the fluctuation theorem.
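To get a feel for how differently small systems behave, here is another toy sketch. It is not the fluctuation theorem itself, only the Ehrenfest urn model (particles hopping at random between the two halves of a box) started at equilibrium, and the name `count_extreme_visits` is a hypothetical helper. It counts how often the system wanders, by chance alone, into the lowest-entropy state with every particle in the left half.

```python
import random

def count_extreme_visits(n_particles, n_steps, seed=1):
    """Ehrenfest urn model started with half the particles on each side.
    Counts the steps that end with every particle in the left half."""
    rng = random.Random(seed)
    n_left = n_particles // 2
    visits = 0
    for _ in range(n_steps):
        # Pick a particle at random and move it to the other half.
        if rng.random() < n_left / n_particles:
            n_left -= 1
        else:
            n_left += 1
        if n_left == n_particles:
            visits += 1
    return visits

small = count_extreme_visits(n_particles=10, n_steps=200_000)
large = count_extreme_visits(n_particles=100, n_steps=200_000)
print("visits to the all-left state, N = 10: ", small)
print("visits to the all-left state, N = 100:", large)
```

The long-run probability of the all-left state in this model is $2^{-N}$, so with ten particles it turns up about once every thousand steps, whereas with a hundred particles it is, for all practical purposes, never seen.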

The first law, to wit, conservation of energy, is very different in character and grounding. Again, it is experimentally proven: it has been found in countless experiments over roughly two hundred years that systems behave as though they have a certain “budget” of work that they can do; it doesn’t matter how you spend that budget, but if you tally up the work that can be done by the system in the right way (i.e. as $\int_\Gamma \vec{F}\cdot{\rm d}\vec{s}$, or $\int_0^T V(t)I(t){\rm d}t$ in an electrical circuit and so on), then the amount of work that can be done will always be the same. There is also theoretical motivation for energy conservation: the idea of time shift invariance of physical laws. That is, physical laws must foretell the same outcomes after we arbitrarily shift the time coordinate. Physics cannot be dependent on what we humans choose to be the $t=0$ time. Through Noether’s theorem, we find that, for physical systems with a Lagrangian description with no explicit time dependence, this implies that the total energy must be conserved.
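For a system with generalised coordinates $q_i$ and Lagrangian $L(q,\dot{q},t)$ (my notation, chosen for illustration), that last step can be sketched in a couple of lines. Using the Euler-Lagrange equations $\frac{{\rm d}}{{\rm d}t}\frac{\partial L}{\partial \dot{q}_i} = \frac{\partial L}{\partial q_i}$ along a solution,

$$\frac{{\rm d}L}{{\rm d}t} = \sum_i\left(\frac{\partial L}{\partial q_i}\,\dot{q}_i + \frac{\partial L}{\partial \dot{q}_i}\,\ddot{q}_i\right) + \frac{\partial L}{\partial t} = \frac{{\rm d}}{{\rm d}t}\sum_i \frac{\partial L}{\partial \dot{q}_i}\,\dot{q}_i + \frac{\partial L}{\partial t},$$

so that the quantity

$$E = \sum_i \frac{\partial L}{\partial \dot{q}_i}\,\dot{q}_i - L \qquad\text{obeys}\qquad \frac{{\rm d}E}{{\rm d}t} = -\frac{\partial L}{\partial t}.$$

If $L$ has no explicit time dependence, the right hand side is zero and $E$ is conserved.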

It is ironic, therefore, that Einstein made the comment, given that his general relativity is one theory wherein this time shift invariance breaks down. A global time cannot be defined on cosmological scales for a spacetime manifold fulfilling general relativity, so our time shift invariance argument cannot be applied. Physicists therefore do not believe that conservation of energy holds for the whole universe (although there is still local conservation of energy in general relativity). I’m sure Einstein was aware of this flaw in his general statement, so, although the first law has very solid grounds in almost any practical case we wish to consider, it seems probable that Einstein was talking about the second law in particular.