Discover Top Posts Tagged with #aging efficient

A little example of clear abstract code "aging efficient":

if you use Arm's solution in pure software to implement pow(x,y) on an x86 machine, then it'll go 5x faster than Intel's native x87 instructions for doing the same thing. [alleged here]

If true, and I suspect it is (I would just want to verify how and why, and whether any edge cases were left on the cutting room floor), then code which went out of its way to use inline assembly instead of the C "pow" function is now slower - assuming your C library or compiler takes advantage of these techniques. Notably, even if there are edge cases where the software version behaves differently than the instruction, your code could still benefit from this at any time if the C compiler can prove that you will never hit them, or if hitting them would be undefined behavior per the C standard.

Code for the optimizer.

#software #code for the optimizer #aging efficient

Aging Efficient

[this is an unfinished post! I may never finish it, or may substantially rewrite, add-to, or split up parts of it; some sentences trail off, and it has no ending]

It is well recognized that code can age poorly, but like many things, code can also age well. Like a fine cheese or wine, code can get better when left unattended.

In particular, I want to talk about how code "ages efficient" - as optimizations improve, and your language or system gets those optimizations implemented, your code will sometimes get more efficient for free.

But! Just like aging grapes need the right preparation to turn into wine instead of rotting slush, your code needs to be written in the right ways to be amenable to optimizations, instead of resisting and defeating optimizations.

Remember the XOR swap? For a brief moment in history, it was a more efficient way to swap variable values than using a temporary variable, because a compiler with no or minimal optimizations would just save that temporary variable on the stack. Then one day, compilers learned to keep a map in memory between registers used by the machine code and variables used by the source code - this was very simple, and had huge sweeping efficiency benefits across almost all code, because the compiler no longer had to keep moving stuff between the stack and registers for every single operation. And just like that, with no deliberate optimization of that specific special case, variable swaps through a temporary variable became free no-ops. Meanwhile the XOR swap remained three XOR operations, because it is much harder, and far less universally valuable, to write analysis that can symbolically simulate several operations on all possible inputs and notice when they cancel each other out, what effects they may have, and what other operations have the same effects but are more efficient.
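To make the two swaps concrete, here is a minimal sketch in C. The function names swap_temp and swap_xor are mine, made up for illustration. Note that the XOR version needs a guard the temporary version doesn't: if both pointers refer to the same object, the three XORs zero the value out.

```c
#include <stdint.h>

/* Swap through a temporary: says exactly what is meant. */
void swap_temp(uint32_t *a, uint32_t *b) {
    uint32_t tmp = *a;
    *a = *b;
    *b = tmp;
}

/* XOR swap: same result for two distinct objects, but without the
   guard it would zero the value when a and b alias each other. */
void swap_xor(uint32_t *a, uint32_t *b) {
    if (a == b) {
        return;
    }
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}
```

Whether a particular compiler turns either of these into nothing at all depends on the compiler, the flags, and the surrounding code, so treat this as an illustration of the shapes involved rather than a benchmark.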

We see this pattern repeat across software history.

Manual loop unrolling was more efficient once, but a good compiler or JIT optimizing VM can unroll your loops for you, and unlike you manually tweaking the source, the automatic optimization is in a better position to choose the unrolling that is best for the current target hardware.
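As a rough sketch of what that looks like, here is the same sum written plainly and unrolled by hand. The names sum_simple and sum_unrolled4 and the unroll factor of 4 are arbitrary choices of mine for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* The plain loop: the optimizer is free to pick an unroll factor
   (or vectorize) for whatever hardware it is targeting. */
uint32_t sum_simple(const uint32_t *v, size_t n) {
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

/* Hand-unrolled by 4: the factor is frozen into the source, and a
   second loop is needed to handle the leftover elements. */
uint32_t sum_unrolled4(const uint32_t *v, size_t n) {
    uint32_t total = 0;
    size_t i = 0;
    for (; n - i >= 4; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    for (; i < n; i++) {
        total += v[i];
    }
    return total;
}
```

Both return the same result. The second one just has more places to get an index wrong, and it bakes in a guess about the hardware that the source can't revisit on its own.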

Duff's Device was more efficient once, but it's easier and more broadly useful to have algorithms that can turn a clear, simple loop into one of several options including a Duff's Device, than to have algorithms that can reverse-engineer the clear, simple loop from a Duff's Device.
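For anyone who hasn't seen one, here is a sketch of a Duff's Device next to the loop it stands in for. This is a memory-to-memory byte copy variant (the original famously wrote to a single output register), and the names copy_simple and copy_duff are mine.

```c
#include <stddef.h>

/* The clear version: one obvious loop. */
void copy_simple(unsigned char *to, const unsigned char *from, size_t count) {
    for (size_t i = 0; i < count; i++) {
        to[i] = from[i];
    }
}

/* Duff's Device: a switch jumps into the middle of an 8x unrolled
   do-while to handle the remainder, then the loop runs full rounds.
   The case labels fall through on purpose. */
void copy_duff(unsigned char *to, const unsigned char *from, size_t count) {
    if (count == 0) {
        return;
    }
    size_t rounds = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--rounds > 0);
    }
}
```

Try stepping through copy_duff in your head for a count of 13 and compare the effort with stepping through copy_simple. That difference in effort is the point.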

And so on.

The thing is, it comes down to just one simple principle: write easy-to-analyze code:

Write code that sieves out and collapses possibilities as soon as possible. The sooner you error out on bad data or programmer error, the more of the code paths after can be transformed in ways that are more efficient but only produce the same result for the correct inputs.
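A small sketch of what erroring out early can look like in C. The function average_u8 and its return-code convention are hypothetical, invented just for this example. All the bad cases are rejected in the first statement, so everything after it can be read (by you or by an optimizer) knowing that both pointers are non-null and n is at least 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the average of n samples to *out.
   Returns 0 on success, -1 on bad input. */
int average_u8(const uint8_t *samples, size_t n, uint8_t *out) {
    /* Collapse the possibility space up front. */
    if (samples == NULL || out == NULL || n == 0) {
        return -1;
    }

    /* From here on: valid pointers, n >= 1, so the division below
       can never be a division by zero. */
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += samples[i];
    }
    *out = (uint8_t)(total / n);
    return 0;
}
```

I'm not claiming any specific compiler does something clever with this particular function. The point is only that the facts it would need are established in one place, right at the top, instead of being scattered through the body.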

Write code that maximizes locality of behavior and correctness.

Write code that most clearly and directly expresses your big-picture intended result. The more the intended result has to be inferred from the code, the harder it is

Write code that says what you mean in the semantics of that programming language,

Conveniently, you have a human brain, which is very limited in terms of working memory and speed, so when rigorously stepping through code, you can literally feel when code is harder or easier to analyze based on how much effort it takes to keep track of the full possibility space of what could happen at each operation.

This, by the way, is why coding for the human is incidentally often coding for the optimizer. An optimizer needs to prove that certain things can't happen to apply optimizations. The further it has to look to

#aging efficient #software #software design #software optimization #code for the optimizer #code for the human

Code for the Optimizer

After coding for the human, our next priority should be coding for the optimizer. The desire to write the most optimized code is good, but the best way to do that is by writing the most optimizable code.

The easier and simpler it is to analyze code - to prove what can or can't happen, what inputs can have what values, which values depend on each other, when resources are last used, and so on - the lower the bar for turning it into the most efficient form for any given targets and use patterns.

Hand-optimizing code usually destroys or severely obfuscates exactly that information, and is often only more optimal given additional assumptions that are invisible in the code and become false as the underlying hardware and software improve.

This is also a very constructive perspective for preventing premature optimization. Sure, don't optimize until your code has proven itself too slow or too big and you have measured where the worst execution time or space costs are. But nowadays, even after measuring, you might be able to get better performance by making code simpler and clearer, so that automated optimizer passes have an easier time proving what's happening.

#software #code for the optimizer #code for the human #software optimization #aging efficient