[this is an unfinished post! I may never finish it, or may substantially rewrite, add-to, or split up parts of it; some sentences trail off, and it has no ending]
It is well recognized that code can age poorly, but like many things, code can also age well. Like a fine cheese or wine, code can get better when left unattended.
In particular, I want to talk about how code "ages efficient" - as optimizations improve, and your language or system gets those optimizations implemented, your code will sometimes get more efficient for free.
But! Just like aging grapes need the right preparation to turn into wine instead of rotting slush, your code needs to be written in the right ways to be amenable to optimizations, instead of resisting and defeating optimizations.
Remember the XOR swap? For a brief moment in history, it was a more efficient way to swap variable values than using a temporary variable, because a compiler with no or minimal optimizations would just save that temporary variable on the stack. Then one day, compilers learned to keep a map in memory between registers used by the machine code and variables used by the source code - this was very simple, and had huge sweeping efficiency benefits across almost all code, because the compiler no longer had to keep moving stuff between the stack and registers for every single operation. And just like that, with no deliberate optimization of that specific special case, variable swaps through a temporary variable became free no-ops. Meanwhile the XOR swap remained three XOR operations, because it is much harder, and far less universally valuable, to write analysis that can symbolically simulate several operations on all possible inputs and notice when they cancel each other out, what effects they may have, and what other operations have the same effects but are more efficient.
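To make the two swaps concrete, here is a minimal sketch in C. The function names `swap_tmp` and `swap_xor` are mine, just for illustration, and what your particular compiler emits for each is something to check on your own toolchain rather than take from me.

```c
#include <stdint.h>

/* Swap through a temporary: easy to analyze, so an optimizing compiler
   can usually keep everything in registers, or make the swap disappear
   entirely once the call is inlined. */
static void swap_tmp(uint32_t *a, uint32_t *b) {
    uint32_t tmp = *a;
    *a = *b;
    *b = tmp;
}

/* XOR swap: same result for two distinct objects, but it is three
   dependent operations, and it needs a guard for the aliasing case. */
static void swap_xor(uint32_t *a, uint32_t *b) {
    if (a == b) return; /* without this, x ^= x would zero the value */
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}
```

Note that the "clever" version is also the one that needs an extra special case just to be correct.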
We see this pattern repeat across software history.
Manual loop unrolling was more efficient once, but a good compiler or JIT optimizing VM can unroll your loops for you, and unlike you manually tweaking the source, the automatic optimization is in a better position to choose the unrolling that is best for the current target hardware.
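As a rough illustration, here is the same sum written plainly and hand-unrolled by four, in C. The names `sum_simple` and `sum_unrolled4` and the factor of four are arbitrary choices of mine, and I'm not claiming either is faster on any specific machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain loop: the compiler is free to pick an unroll factor, or to
   vectorize, based on the hardware it is targeting. */
static uint64_t sum_simple(const uint32_t *v, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

/* Hand-unrolled by 4: one unroll factor is baked into the source,
   plus a cleanup loop for the leftover elements. */
static uint64_t sum_unrolled4(const uint32_t *v, size_t n) {
    uint64_t total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    for (; i < n; i++) {
        total += v[i];
    }
    return total;
}
```

Both compute the same thing, but the second has already made a hardware decision that the source code is in no position to revisit.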
Duff's Device was more efficient once, but it's easier and more broadly useful to have algorithms that can turn a clear simple loop into one of several options including a Duff's Device, than to have algorithms that can reverse-engineer the clear simpler loop from a Duff's Device.
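For anyone who hasn't seen one, here is a sketch of a Duff's Device style byte copy next to the plain loop it stands in for. This is the memory-to-memory variant, with function names I made up (`copy_simple`, `copy_duff`).

```c
#include <stddef.h>

/* The clear version: one obvious loop. */
static void copy_simple(char *to, const char *from, size_t count) {
    for (size_t i = 0; i < count; i++) {
        to[i] = from[i];
    }
}

/* Duff's Device style: the switch jumps into the middle of an
   8-way unrolled do-while to handle the remainder first. */
static void copy_duff(char *to, const char *from, size_t count) {
    if (count == 0) return;      /* the do-while would copy at least once */
    size_t n = (count + 7) / 8;  /* number of passes through the loop */
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

Going from `copy_simple` to something shaped like `copy_duff` is a mechanical transformation. Going the other way means first recognizing that the interleaved switch and loop are "just a copy".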
The thing is, it comes down to just one simple principle: write easy-to-analyze code:
Write code that sieves out and collapses possibilities as soon as possible. The sooner you error out on bad data or programmer error, the more of the code paths after can be transformed in ways that are more efficient but only produce the same result for the correct inputs.
Write code that maximizes locality of behavior and correctness.
Write code that most clearly and directly expresses your big-picture intended result. The more the intended result has to be inferred from the code, the harder it is
Write code that says what you mean in the semantics of that programming language,
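As a small sketch of the first guideline in that list (error out early), here is a hypothetical C helper, `scale_down`, that rejects bad input once at the top. After the check, every path through the loop knows the pointer is valid and the divisor is positive. Whether a given compiler actually exploits that is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Divides every element of v by divisor, in place.
   Returns 0 on success, -1 on bad input (hypothetical error convention). */
static int scale_down(int32_t *v, size_t n, int32_t divisor) {
    /* Sieve out the bad cases once, before any work is done. */
    if (v == NULL || divisor <= 0) {
        return -1;
    }
    /* From here on, divisor > 0 holds on every iteration, so neither
       division by zero nor INT32_MIN / -1 can happen in the loop. */
    for (size_t i = 0; i < n; i++) {
        v[i] /= divisor;
    }
    return 0;
}
```

The same check also shrinks what a human reader has to keep in their head while reading the loop.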
Conveniently, you have a human brain, which is very limited in terms of working memory and speed, so when rigorously stepping through code, you can literally feel when code is harder or easier to analyze based on how much effort it takes to keep track of the full possibility space of what could happen at each operation.
This, by the way, is why coding for the human is incidentally often coding for the optimizer. An optimizer needs to prove that certain things can't happen to apply optimizations. The further it has to look to