desmond wrote: CS70 wrote: So long as Apple's OS stays similar, it's mostly a matter of recompiling. Since compilers have been far better than human programmers at optimizing for decades now, it's also unlikely that there's a lot of assembly in the codebases. Sure, there can be a few traps if the code's not first quality, but for big, popular apps it's unlikely.
Yes, but not necessarily for very tight performance loops, where developers use all kinds of clever tricks and hand optimising to eke out as much performance as possible.
Now, it is difficult to discuss this without referencing facts and literature, and without knowing how much you know about the specifics as opposed to generalities, so it easily becomes a "who's got it longer" discussion, which is not the intention at all.
But briefly: while in theory what you say is true, and it certainly was once upon a time, in practice there's really no such thing anymore. Heck, I remember well the time when accessing a matrix in RAM by columns instead of rows made a difference (I had access to some 70s equipment back in the day), but those times are long gone. Optimizations are made by compilers, not people, in practically every relevant case, and there's almost no practical exception.
To put it as a metaphor, it's like a car being faster than even the fastest human runner. The car can do things that a human can't. There's always room for improvement, but you will improve over the last car, not the last human.
The rest is mythology, founded on something that was true long ago, when real programmers didn't use Pascal.
Of course, all within reason: algorithms have to be efficient in the first place (and you don't need a CPU to assess that), so if your way of doing something is fundamentally inefficient, no optimizer can do much about it.
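To make that last point concrete, here's a toy sketch in Python. It's my own made-up example (the function names and the duplicate-finding task are purely illustrative, nothing to do with any real audio code): two ways of checking a list for duplicates, each counting how much work it does. No compiler flag turns the first into the second; that choice is on the programmer.

```python
def has_duplicate_quadratic(items):
    """Compare every pair: about n*n/2 comparisons in the worst case."""
    comparisons = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            comparisons += 1
            if items[i] == items[j]:
                return True, comparisons
    return False, comparisons


def has_duplicate_linear(items):
    """Remember what has been seen: one set lookup per item."""
    seen = set()
    lookups = 0
    for item in items:
        lookups += 1
        if item in seen:
            return True, lookups
        seen.add(item)
    return False, lookups


data = list(range(1000))  # no duplicates, so both do their maximum work
print(has_duplicate_quadratic(data))  # (False, 499500)
print(has_duplicate_linear(data))     # (False, 1000)
```

Same answer, roughly 500 times less work on this input, and the gap grows with the size of the list.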
if your audio loop utilises particular techniques that Intel chips have to eke out more performance,
There's no such thing. There are no "particular techniques that Intel chips have" (even assuming that the Intel chip microcode stays the same all the time, which it doesn't, since it occasionally gets updated). At least, not techniques that a programmer can understand and an optimizer can't. Rather the opposite, on average.
Of course that may not stop a naive programmer from trying, but usually people who bother to learn that kind of stuff know better than that. Hopefully. :)
For example, the Sculpture synth in Logic is benchmarking with great performance on Apple Silicon versus Intel - we're seeing the expected performance boost that the chip suggests. Alchemy, on the other hand, isn't seeing as good a performance boost, not to the level we'd expect based on the chip performance alone. It may be that Alchemy had very optimised loops under Intel, and it hasn't quite been optimised as well yet under ARM. Or it might have used tricks that aren't available on ARM. Or it might be they haven't quite got around to tuning it yet, and it'll get better.
If it's not been recompiled, it's not so much that Alchemy has "optimized loops", but that it's using algorithms that - when originally compiled - resulted in optimizations that don't suit the changed underlying hardware.
For example, say that you have an algorithm which is the most efficient for a task and requires moving 4 values, and that you have 4 registers: the optimizer will examine the combination space and come out with a sequence of register allocations that suits your 4-register hardware pretty well. Now you move it to a hardware platform that has only 2 registers and a virtualization layer. The VL will have to do many more swaps - taking more time for what originally was an atomic action. So it's slower, and the overall result does not improve much even if the underlying hardware is a bit faster, because the machine is doing more work.
Another algorithm, one that requires moving only 2 values, will be much faster on the faster hardware because even through the virtualization it's still doing the same work.
Before recompilation, it's only natural that certain algorithms will adapt better and others worse.
But if you recompile the 4-value algorithm, the optimizer will explore a very different combination space and produce a different list of actions, optimized for the new hardware.
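If you want to play with that 4-versus-2 example, here's a rough sketch in Python. Big caveat: the model is my own simplification, not how any real CPU, compiler or translation layer works. It assumes registers behave like a tiny least-recently-used cache and simply counts how often a value has to be fetched from memory. The names (count_memory_swaps, the schedules) are made up for the illustration.

```python
from collections import OrderedDict


def count_memory_swaps(accesses, num_registers):
    """Count how often a value must be brought into a register.

    Toy assumption: registers act as a small least-recently-used cache.
    """
    registers = OrderedDict()
    swaps = 0
    for value in accesses:
        if value in registers:
            registers.move_to_end(value)  # already in a register: free
            continue
        swaps += 1  # value has to come in from memory
        if len(registers) >= num_registers:
            registers.popitem(last=False)  # evict the least recently used
        registers[value] = True
    return swaps


# Access order "compiled" for 4 registers: cycle through all 4 values.
four_value_schedule = ["a", "b", "c", "d"] * 3
# An algorithm that only ever touches 2 values.
two_value_schedule = ["a", "b"] * 3
# Same 4 values, "recompiled" so only 2 are live at any one time.
rescheduled = ["a", "b"] * 3 + ["c", "d"] * 3

print(count_memory_swaps(four_value_schedule, 4))  # 4
print(count_memory_swaps(four_value_schedule, 2))  # 12
print(count_memory_swaps(two_value_schedule, 4))   # 2
print(count_memory_swaps(two_value_schedule, 2))   # 2
print(count_memory_swaps(rescheduled, 2))          # 4
```

In this toy model the old 4-value schedule does three times the work on 2 registers, the 2-value one doesn't care, and the reordered schedule gets the 4-value case back down to 4 fetches. Whether a real algorithm can legally be reordered like that depends on its data dependencies, which is exactly the sort of thing the optimizer works out.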
Again, what really makes a huge difference is the OS and how it packages and presents its services. The rest is just a temporary inconvenience (so long as software houses can be arsed to maintain a product... now that can be an issue).