Single core versus Multicore CPU

Re: Single core versus Multicore CPU

Postby Eddy Deegan » Sun Dec 01, 2019 2:30 pm

Eddy Deegan wrote:Higher clock speed gets more done in a single core than would otherwise be possible. Under the hood, the CPU will transfer single-core workloads between physical cores on the chip because when working the CPU heats up, which leads to performance degradation.

Even if you are running a task that requires only a single core, the CPU will pick the coolest/least busy core to do that work, and then when it inevitably heats up it will transfer that work, while the job is running, to a cooler core to maintain best performance.

wireman wrote:Could you provide a reference for this please? Older CPUs had thermal control that affected all cores at once, and I would have thought core binding was under control of the OS and thermal considerations would not be part of the scheduling decision.

Folderol wrote:I can't give you a reference, but I have seen this core swapping in action. I have a bit of software that monitors individual cores, and you can watch the work load being swapped around between them. It's definitely under firmware control. There is nothing in the OS that does this.

I don't know when multi-core thermal management was introduced, but there are sensors per core in contemporary chips (before this, it was common for the CPU operating temperature to be reported incorrectly by the BIOS). I read about the dynamic load management some years ago in some technical reference for the Xeon family, but as I'm no longer involved in the kind of work that I was at the time I've not kept up to date to be honest.

I had a quick search through some of the programmer's reference material available online today but I couldn't find anything either. That said, there is a huge amount of stuff out there to look through. It's possible this is not commonly done these days due to better thermal management technology in general, or perhaps it's not something that's detailed in public docs much.

I don't know, so I'm happy to retract my suggestion that all modern chips do it as they may achieve the same result (optimum performance) in easier ways.

You can associate threads with specific cores using core affinity in software, and the OS scheduler has a lot of freedom to influence the load distribution so the behaviour of the system as the user sees it is influenced by quite a lot of variable factors.
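As a rough illustration of core affinity, here is a minimal Python sketch that pins the current process to one core and then restores the original mask. It assumes a Linux-style system: `os.sched_setaffinity` and `os.sched_getaffinity` are not available on Windows or macOS, so the function simply returns `None` there. The function name is made up for this example.

```python
import os

def pin_to_one_core():
    """Pin this process to a single allowed core, then restore the old mask.

    Returns (target_core, mask_while_pinned), or None if the platform
    does not offer the affinity calls or refuses the request.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. Windows or macOS
    try:
        original = os.sched_getaffinity(0)   # 0 means "the calling process"
        target = min(original)               # pick any core we are allowed on
        os.sched_setaffinity(0, {target})    # scheduler may now only use this core
        pinned = os.sched_getaffinity(0)
        os.sched_setaffinity(0, original)    # put things back as they were
    except OSError:
        return None
    return target, pinned

if __name__ == "__main__":
    print(pin_to_one_core())
```

While the mask is restricted the OS scheduler cannot move the process to another core, which is why anything in hardware that silently overrode it would surprise people who rely on pinning.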
Eddy Deegan
Frequent Poster (Level2)
Posts: 3993
Joined: Wed Sep 01, 2004 12:00 am
Location: Brighton & Hove, UK
Some of my works.
Please consider supporting the SOS Forum Album project.
 

Re: Single core versus Multicore CPU

Postby wireman » Sun Dec 01, 2019 10:30 pm

Eddy Deegan wrote:You can associate threads with specific cores using core affinity in software, and the OS scheduler has a lot of freedom to influence the load distribution so the behaviour of the system as the user sees it is influenced by quite a lot of variable factors.

That's right, and a hardware feature getting in the way would be a real problem for some people, which is why I'm so doubtful, but I'm interested if someone does find information on such a feature.

I'm not surprised that if you observe the location of processes/threads you see them 'wander' around the cores, as those software threads are getting interrupted and rescheduled all the time for various reasons and on short timescales.
wireman
Regular
Posts: 395
Joined: Fri Dec 17, 2004 1:00 am

Re: Single core versus Multicore CPU

Postby Eddy Deegan » Mon Dec 02, 2019 3:41 am

wireman wrote:
Eddy Deegan wrote:You can associate threads with specific cores using core affinity in software, and the OS scheduler has a lot of freedom to influence the load distribution so the behaviour of the system as the user sees it is influenced by quite a lot of variable factors.

That's right, and a hardware feature getting in the way would be a real problem for some people, which is why I'm so doubtful, but I'm interested if someone does find information on such a feature.

I'm not surprised that if you observe the location of processes/threads you see them 'wander' around the cores, as those software threads are getting interrupted and rescheduled all the time for various reasons and on short timescales.

I've looked around a bit more and I can now shed some light. Not related to the original subject of the thread and possibly not quite what you were hoping for but here's the rundown.

When I was doing this kind of work (2005-2011) we were researching the potential for general purpose CPUs to take on some (and ideally all) of the work done by the Intel IXP2800 network processor. The IXP2800 was a piece of silicon specifically engineered to process network traffic, but it was not a general purpose chip and thus came with unusual challenges.

It had 16 cores and the hardware I was involved with was a line-speed, negligible-latency (usually microseconds) Deep Packet Inspection/Modification platform called the CS2000, a little-known (outside of governments, telcos and ISPs) platform which had six IXP2800s on the data processing plane, providing the programmer with a 96-core system. Optionally, you could install two DPPM (Deep Packet Processing Module) blades in a single CS2000 to give 192 cores in 3U of rackspace.

As the hardware was running flat out most of the time (it was usually installed in data centers, peering nodes and the like, often in racks containing sets of multiple CS2000s connected together via dedicated FPGA-based switching fabric) it ran hot by design. Although it was amazing technology it was also controversial and in the end I left that sector of the industry for ethical reasons.

I wrote a lot of code for the CS2000 over a period of about 5 years and thermal management was a big challenge. The fans in those chassis were insane. As the company I was working for was heavily backed financially by US federal sources, we got to experiment with all sorts of things under NDA (I'm choosing my words carefully so as not to breach it) and this was part of that. IBM even offered an optional part with that technology in it for their blade-centres to a restricted market.

Experiments were being done with 'smart' thermal management in schedulers, but also in hardware. I have found some docs relating to the latter but I can't share them I'm afraid.

I still can't find the Xeon-related connection but there was a version of the Xeon that got creative with thermals. However, given the lack of public information about it, I believe with hindsight that it didn't make it into the mainstream.

Had the CS2000 been created today, the chances are that the work could be done by something along the lines of the Xeon W-3175X processors, but at the time the research concluded that there was no suitable general purpose CPU that could match the speed of multiple IXP2800s with supporting silicon (custom chips for regular expression pattern matching, and a high-speed Ternary CAM-based database-on-a-chip for handling things like session management).

Anyway, the bottom line is that I did make a mistake in saying that the advanced thermal management so described was present in consumer chips (even though I still believe some form of it was/is present in some Xeons) so thanks for asking for clarification on that else I'd have continued to conflate some of the things I saw at that time with 'normal' computing today!

Re: Single core versus Multicore CPU

Postby Eddy Deegan » Mon Dec 02, 2019 3:59 am

PS: Nerdy side-note. The CS2000 (and I suspect the IXP processors) did not support threads. Furthermore, the application you deployed to it would be spawned on a per-packet basis, run to process a single packet and then die.

There was shared memory between the cores in the form of a chunk of 320MB or so of RAM and the silicon database, which was Ternary-CAM with custom silicon to do extremely fast (about 133 million queries/sec) simple database-like operations on 128-bit or 256-bit records. We used it for rulesets and flow management (in IPv4 that meant source IP, dest IP, source port, dest port, protocol and a few spare bits for custom state).
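To make the record arithmetic concrete: an IPv4 flow key of source IP (32 bits), dest IP (32), source port (16), dest port (16) and protocol (8) comes to 104 bits, which leaves 24 bits spare in a 128-bit record. The Python sketch below packs such a key and does a ternary (value plus mask) compare in the style of a TCAM lookup. The field order and the function names are assumptions made for illustration only, not the CS2000's actual record layout or API.

```python
import ipaddress

FIELD_BITS_USED = 32 + 32 + 16 + 16 + 8   # the IPv4 5-tuple
SPARE_BITS = 128 - FIELD_BITS_USED        # left over for custom state

def pack_flow_key(src_ip, dst_ip, src_port, dst_port, protocol, state=0):
    """Pack a 5-tuple plus spare state bits into one 128-bit integer.

    Layout (most significant first, hypothetical):
    src IP | dst IP | src port | dst port | protocol | state
    """
    key = int(ipaddress.IPv4Address(src_ip))
    key = (key << 32) | int(ipaddress.IPv4Address(dst_ip))
    key = (key << 16) | (src_port & 0xFFFF)
    key = (key << 16) | (dst_port & 0xFFFF)
    key = (key << 8) | (protocol & 0xFF)
    key = (key << SPARE_BITS) | (state & ((1 << SPARE_BITS) - 1))
    return key

def ternary_match(key, value, mask):
    """Ternary compare: only the bits set in mask have to match."""
    return (key & mask) == (value & mask)

if __name__ == "__main__":
    key = pack_flow_key("10.0.0.1", "10.0.0.2", 1234, 80, 6)
    # A rule that only cares about destination port 80 (bits 32..47).
    print(ternary_match(key, 80 << 32, 0xFFFF << 32))
```

The "don't care" bits in the mask are what make ternary storage useful for rulesets: one stored entry can match a whole family of flows.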

Therefore an N-core system (where N is the number of cores in total across all IXP2800s involved) required your software to be something akin to a finite state machine. It would spawn, query the shared resources to figure out what state the system was in and then branch to the appropriate handler for that state, knowing that N-1 siblings were doing the same thing.
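The spawn, query, branch pattern described above can be sketched in a few lines of Python. Everything here is hypothetical: the state names, the dict standing in for the shared RAM and silicon database, and the blocking rule are invented for illustration. The sketch also ignores the locking that real siblings running in parallel would need.

```python
from enum import Enum

class FlowState(Enum):
    NEW = 0
    ESTABLISHED = 1
    BLOCKED = 2

def handle_packet(packet, flow_table, blocked_ports):
    """Run once for a single packet and return a verdict, keeping no local state.

    flow_table is a plain dict standing in for the shared store that all
    handler instances would query.
    """
    flow = (packet["src"], packet["dst"],
            packet["sport"], packet["dport"], packet["proto"])
    # Step 1: ask the shared store what state this flow is in.
    state = flow_table.get(flow, FlowState.NEW)
    # Step 2: branch to the handler for that state.
    if state is FlowState.NEW:
        if packet["dport"] in blocked_ports:
            state = FlowState.BLOCKED
        else:
            state = FlowState.ESTABLISHED
        flow_table[flow] = state  # record the decision for the siblings
    return "drop" if state is FlowState.BLOCKED else "forward"

if __name__ == "__main__":
    table = {}
    web = {"src": "10.0.0.1", "dst": "10.0.0.2",
           "sport": 1234, "dport": 80, "proto": 6}
    print(handle_packet(web, table, blocked_ports={23}))
```

Because each invocation starts from nothing and ends after one packet, all memory of a flow has to live in the shared store, which is what pushes the design towards a state machine.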

It was proper fun. I joined the company when it was a start-up and we were doing some really cool things with it, but sadly it got darker and was deployed for dubious purposes at which point I left.

I have a couple of C-based hobby projects on the go where I'm distilling the nice parts of that to emulate the system in a virtual CPU of my own design, but in my case the ultimate objective is to create a fun system that uses colourful vector-based graphics to play things like Pong and Space Invaders, with an emulation of a heavily restricted subset of the cloak-and-dagger technology driving it underneath.

I have been accused in the past of being an 'Intel fanboy'. This is not the case. I think AMD are doing some incredible things and I would love to see them take a chunk out of Intel. It's just that my professional life took me down some Intel-based paths and I've been able to see some of the things that Intel do that the public don't so I'm able to speak on that more than AMD stuff. The same may well be true of AMD but as I've not seen that personally, I can only speculate.

These days I'm doing something completely unrelated, non-controversial and mostly loving it :-)

Re: Single core versus Multicore CPU

Postby Eddy Deegan » Tue Dec 03, 2019 1:51 am

Eddy Deegan wrote:The CS2000 (and I suspect the IXP processors) did not support threads.

The IXPs do support threads but this was not exposed to the programmer in the CS2000, although under the hood the operating system may have used them.

Re: Single core versus Multicore CPU

Postby resistorman » Tue Dec 03, 2019 7:16 am

Single core FUD is being promoted by Intel since AMD is cleaning their clocks with multi core. Quite the reversal from the Athlon vs Core days.
resistorman
Frequent Poster
Posts: 813
Joined: Sun Nov 22, 2015 1:00 am
Location: Asheville NC

Re: Single core versus Multicore CPU

Postby Pete Kaine » Tue Dec 03, 2019 12:10 pm

JacoVanDuijn wrote:I have always thought that multi-core CPUs were better (for video editing, etc., everything non-gaming related) for audio production. But I read that single core performance is better.

Which one is true?

Yes.

JacoVanDuijn wrote:And are there DAWs out there that make use of multi-core? (I read that in the new Studio One 4.5 they support multi-core, but not how many. I wanted to ask this on the PreSonus forums, but I don't have Studio One to register).

OK, so all DAWs have multi-core support and they are all core limited to some degree. I'd say thread limited, but as Eddy's already pointed out, the true meaning of a thread is rather different from Intel or AMD's public-facing definition, which in reality is what the rest of the world calls "Logical Cores" (as opposed to "Physical Cores").

I don't think there are any DAWs left out there that handle fewer than 48 LCs at this point, and I suspect most will do a fairly standard 64, with a few more being able to be pushed higher. That said, I've seen performance start to deteriorate once I've gone beyond the 64 LC figure, so if anything I'd imagine the artificial constraints are to do with load balance management, although I suspect there are better people in this thread to comment upon that.

JacoVanDuijn wrote:I am about to buy a new PC and really don't know what to get. I was first going for the Ryzen 3950x, until I read about the importance of single core performance.

Well, it's IPC (instructions per clock) that is key, or more to the point how accessible they are. You can have two chips with 4GHz core clocks, but the IPC on one might be twice as high as the other, so those two chips wouldn't be all that comparable. This is the difference between a first and tenth gen i7, or a first and third generation AMD setup (figures pulled from thin air for the point of the example).
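The arithmetic behind that comparison is simply clock rate multiplied by IPC. A minimal Python sketch, using the same kind of made-up IPC figures as the example above (they are not measurements of any real chip):

```python
def instructions_per_second(clock_ghz, ipc):
    """Rough per-core throughput: cycles per second times instructions per cycle."""
    return clock_ghz * 1e9 * ipc

if __name__ == "__main__":
    old_chip = instructions_per_second(4.0, 1.0)   # hypothetical older design
    new_chip = instructions_per_second(4.0, 2.0)   # same clock, double the IPC
    print(new_chip / old_chip)                     # ratio between the two

    # A lower clock can still win if the IPC gap is large enough.
    slower_clock = instructions_per_second(3.5, 2.0)
    print(slower_clock > old_chip)
```

This is why a GHz figure on its own says little when comparing chips from different generations.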

Turbos can be misleading, and the CPU to some degree is only as good as the weakest core.

They tend to stagger turbos, with one or two cores at the full figure and then each additional core being staggered by 100MHz or 200MHz all the way down to the base clock. What this means is that if all cores are equally loaded, then it will start to lose data (discard and glitch the audio) when the slowest core runs out of overhead.
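A small Python sketch of a staggered turbo table shows why the slowest core sets the ceiling under an even load. The clock figures, step size and function name are hypothetical and chosen only to illustrate the pattern; real chips publish their own turbo bins.

```python
def staggered_turbo(peak_mhz, base_mhz, cores, full_speed_cores=2, step_mhz=100):
    """Return a per-core clock list: a few cores at peak, the rest stepping down.

    No core is ever listed below the base clock.
    """
    clocks = []
    for index in range(cores):
        steps_down = max(0, index - full_speed_cores + 1)
        clocks.append(max(base_mhz, peak_mhz - steps_down * step_mhz))
    return clocks

if __name__ == "__main__":
    clocks = staggered_turbo(peak_mhz=5000, base_mhz=3600, cores=8)
    print(clocks)
    # With every core carrying the same load, the slowest one runs out first.
    print(min(clocks) / max(clocks))
```

In this made-up case the headline figure is 5000MHz, but an evenly spread load can only count on the 4400MHz of the slowest core.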

Of course, the system will try and balance it so that the weaker loads are on the slower cores in an attempt to get the most from the setup. Where this becomes tricky is that you tend to see the entire channel and even dependent tracks get processed on the same core in large chains. Maybe not such a great concern if you're mixing in a more traditional sense of taking audio tracks and processing away, but the generation of audio in the box and the processing chains that get applied mean that you sometimes need to keep a careful eye on what a single core might be doing, lest it get a bit overly demanding.

I've long been an advocate of trying to level the cores off where possible. My preference is that this is done around the single core turbo speed, and we'd raise the rest to match it; for instance, with a 9900K we push all cores to 4.9GHz rather than just one of them.

Of course you can't always do this with every chip range as tolerances change between designs and the interesting one (well, headache inducing one in my case) has been trying to figure out where AMD stands on this.

They've been advising for a while now that RAM optimization is better for the end user than overclocking in any way, and I agree. In recent testing I got similar performance results out of overclocking and optimizing the RAM as I did out of overclocking the CPU to within an inch of its life, which means I didn't have to overclock the CPU, the voltages I cranked for that overclock could be removed, and the chip now runs a lot cooler, which of course is great. The upshot of this is that you can either run slow RAM and overclock, or you can run fast RAM and not overclock.

So, this means whilst we now have a cooler chip, we also open up the chances of it running out of performance earlier on a single core. In real terms, though, there is enough raw performance through the extra cores that it more than makes up for this as a balance.

To note, AMD has come under fire for rarely being able to hit their advised turbo figures, with constant updates appearing to tweak it. The 3950X, for example (because it's fresh in my mind), has a 3.5GHz base clock and 4.7GHz advised turbo, and in my own testing we're looking at it only managing to hit around 4.1GHz - 4.25GHz across the various cores. The interesting thing for me was that none of the cores were spiking under DAW usage (whereas they might with a program that favours single cores) and this is how I would prefer a CPU to behave when dealing with a hugely multi-core capable client and audio.

As a general example I would take a 12 core 4.2GHz over an 8 core 4.6GHz.
But I would most likely take an 8 core 4.6GHz over a 12 core 3.8GHz.

All depending on the generational IPC scores of course.
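To put rough numbers on those two choices, the Python sketch below compares per-core clock and aggregate clock (cores times GHz) for each pairing. It assumes equal IPC across the chips, which is exactly the caveat above, and the aggregate figure is only a crude proxy, not a benchmark.

```python
def compare(cores_a, ghz_a, cores_b, ghz_b):
    """Return (per-core clock ratio, aggregate clock ratio) of chip A to chip B."""
    per_core = ghz_a / ghz_b
    aggregate = (cores_a * ghz_a) / (cores_b * ghz_b)
    return per_core, aggregate

if __name__ == "__main__":
    # 12 cores at 4.2GHz against 8 cores at 4.6GHz
    print(compare(12, 4.2, 8, 4.6))
    # 12 cores at 3.8GHz against 8 cores at 4.6GHz
    print(compare(12, 3.8, 8, 4.6))
```

In the first pairing the 12 core part gives up about 9% per core for about 37% more aggregate clock; in the second it gives up about 17% per core for about 24% more. On these figures the preference flips somewhere between those two trade-offs.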

Eddy Deegan wrote:
JacoVanDuijn wrote:I have always thought that multi-core CPUs were better (for video editing, etc., everything non-gaming related) for audio production. But I read that single core performance is better.

Are you sure you read that right? There may be advice out there to disable hyper-threading on certain Intel CPUs due to the issues associated with the various CPU hardware bugs that surfaced a while back but generally speaking more cores is better when it comes to video editing.

Video editing is largely off-line rendering and the CPU is able to just deal with it as and when it can free up the processing capability. The more cores the better, and we're not overly bothered by how fast they are (more is more, though).

Audio (and to be fair, video streaming) is real-time, and so you either complete the work within a given cycle (i.e. your ASIO buffer) or you can expect dropouts and glitching to ruin your session.
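The size of that cycle follows directly from the buffer length and sample rate: the time budget is buffer samples divided by samples per second. A minimal Python sketch, with function names invented for the example:

```python
def buffer_deadline_ms(buffer_samples, sample_rate_hz):
    """Time available to fill one audio buffer, in milliseconds."""
    return 1000.0 * buffer_samples / sample_rate_hz

def misses_deadline(processing_ms, buffer_samples, sample_rate_hz):
    """True if the processing time overruns the buffer's time budget."""
    return processing_ms > buffer_deadline_ms(buffer_samples, sample_rate_hz)

if __name__ == "__main__":
    # 256 samples at 44.1kHz leaves a little under 6ms per cycle.
    print(buffer_deadline_ms(256, 44100))
    # A chain that needs 6ms glitches at 256 samples but fits at 512.
    print(misses_deadline(6.0, 256, 44100))
    print(misses_deadline(6.0, 512, 44100))
```

An off-line render has no such budget, which is why it tolerates slower cores as long as there are plenty of them.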

miN2 wrote:Re Studio One: it supports multi-core, but they took serious time implementing it to a half-way decent standard in my view, and it still doesn't really hold a candle to the likes of Cakewalk and Cubase.

Studio One 3 by all accounts was pretty well put together, but from feedback the Studio One 4 engine managed to go backwards to some degree when they made changes under the hood. Apparently they've improved it again in recent build releases, so I'm a little bit lost as to how exactly it holds up these days, although it's good to know they've been working on it.

merlyn wrote:The R9 3900x looks good with a slightly faster base clock of 3.8 GHz versus the 3.5 GHz of the 3950x.

The 3900X is marginally faster per core on average, but again we're talking about the system running at roughly a 4.2 - 4.3GHz average across those cores under heavy loads. The small 100MHz difference for me puts the 3950X ahead on its higher core count.

To anyone wondering why I've not published the 3950X yet, stock has been really lacking and I only got to play with one last week. I don't normally cover limited editions (hence no 9900KS coverage) and AMD isn't letting on if they are shipping any more stock anytime soon or at all. I'll more than likely cover it when I do the Intel 10 series, although they also don't have stock, but at least they can outline a figure of at least double digits when asked how many we might see over.
Pete Kaine
Frequent Poster (Level2)
Posts: 3171
Joined: Thu Jul 10, 2003 12:00 am
Location: Manchester
Kit to fuel your G.A.S - https://www.scan.co.uk/shop/pro-audio

Re: Single core versus Multicore CPU

Postby Folderol » Tue Dec 03, 2019 12:47 pm

As a point of interest, a certain soft-synth of my acquaintance has 1 very high priority audio thread, 1 slightly lower priority MIDI thread, and 3 very low priority UI and housekeeping threads.

In Folderol towers, on a single core machine it runs like a pig. On a dual core one it runs well. On a 4 core one it runs extremely well. The 4 core machine runs at the same clock speed as the 2 core one (neither overclocked).
Folderol
Jedi Poster
Posts: 10521
Joined: Sat Nov 15, 2008 1:00 am
Location: The Mudway Towns, UK
Yes. I am that Linux nut.
Onwards and... err... sideways!

Re: Single core versus Multicore CPU

Postby JacoVanDuijn » Sat Dec 07, 2019 5:12 pm

Pete Kaine wrote:
To anyone wondering why I've not published the 3950X yet, stock has been really lacking and I only got to play with one last week. I don't normally cover limited editions (hence no 9900KS coverage) and AMD isn't letting on if they are shipping any more stock anytime soon or at all. I'll more than likely cover it when I do the Intel 10 series, although they also don't have stock, but at least they can outline a figure of at least double digits when asked how many we might see over.

Dear Pete,

thank you for taking the time and writing all of this. I have learned so much from you and others here.

1. Let's say I would get a good SSD, good and fast RAM etc., so no bottlenecks, and I have the money: should I go for the 3950x? My aim is to decrease stutter and crackling. Will the 3950x be the best option of all the CPUs out there or would you advise a different CPU?

2. You also talked about real-time. Is it true that real-time processing is done better with more cores (again, assuming no bottlenecks from other PC parts)?
JacoVanDuijn
New here
Posts: 4
Joined: Tue Nov 26, 2019 8:16 pm

Re: Single core versus Multicore CPU

Postby Pete Kaine » Mon Dec 09, 2019 11:46 am

JacoVanDuijn wrote:1. Let's say I would get a good SSD, good and fast RAM etc., so no bottlenecks, and I have the money: should I go for the 3950x? My aim is to decrease stutter and crackling. Will the 3950x be the best option of all the CPUs out there or would you advise a different CPU?

Ideally, you need to pick your price point before asking that question. It's not the best out there, but it's probably the best bang per buck currently.


JacoVanDuijn wrote:2. You also talked about real-time. Is it true that real-time processing is done better with more cores (again, assuming no bottlenecks from other PC parts)?

No, single core score tends to be more important, and then number of cores, for real-time handling. You would more than likely be better served taking 8 cores at 4.2GHz over 16 cores at 2.4GHz, for instance.

Re: Single core versus Multicore CPU

Postby CS70 » Mon Dec 09, 2019 12:11 pm

JacoVanDuijn wrote:2. You also talked about real-time. Is it true that real-time processing is done better with more cores (again, assuming no bottlenecks from other PC parts)?

A little point of pedantry - you don't have real "real time processing" in a system unless the operating system (and to a degree the hardware) is "real time" (regular Windows and OSX aren't). True "real time" processing means guaranteed processing time - and since the time-quality-price triangle is valid for CPUs as well :) , the implication is that in a real time system the same computation is not guaranteed to provide the same result every time. In practice you will get whatever state of the computation exists when the deadline (the time allocated to the operation) expires - and this specific kind of interruption must generally be implemented at a very low level, in hardware.

General purpose operating systems prioritize correctness of result over time guarantees, and at most provide an estimate of execution time for a given computational load. Of course we use "real time" loosely to indicate that the system is fast enough in most cases. In practice, under a load which is easy to handle with the resources available (cores, bandwidth, local memory etc.), a non-RT OS will on average work fast enough to make little difference, but all the personal computers we use tend to be of that sort.
CS70
Jedi Poster
Posts: 5274
Joined: Mon Nov 26, 2012 1:00 am
Location: Oslo, Norway
Silver Spoon - Check out our latest video and the FB page

Re: Single core versus Multicore CPU

Postby wireman » Mon Dec 09, 2019 9:08 pm

CS70 wrote:General purpose operating systems prioritize correctness of result over time guarantees, and at most provide an estimate of execution time for a given computational load.

Really? I think they prioritise fair access to resources (for some measure of fair) for a running workload.

Re: Single core versus Multicore CPU

Postby CS70 » Mon Dec 09, 2019 9:50 pm

wireman wrote:
CS70 wrote:General purpose operating systems prioritize correctness of result over time guarantees, and at most provide an estimate of execution time for a given computational load.

Really? I think they prioritise fair access to resources (for some measure of fair) for a running workload.

It's a different level of abstraction. Fair access to resources, for whatever definition of "fair", can be provided by both real time and non real time systems - it's only a matter of load distribution. Both models allow for preemptive scheduling of computational flows.

But the defining characteristic of a "correctness"-oriented system is that the flow of control does not return until a computation is completed, regardless of how much time it takes. 2+2 is always 4, but it may take a potentially unbounded while for the result to be computed and the flow of control returned to the requester.

A "true" realtime system, on the other hand, works with strict deadlines: if the allocated time for the computation has expired, the flow of control and the result are returned (possibly with a flag set). So 2+2 may occasionally be 3.5 or whatever.

This may seem odd, but typically in physical control systems it is much better to have an occasionally slightly wrong result than a possibly long waiting time before the computation is finished. This is also occasionally useful for algorithms which return quickly most of the time but may behave differently for a few sets of input parameters, etc.

Obviously a real time system must be sized so that the majority of the expected computations do not generate errors, and that's why it's not so suited for general purpose computers. Both models have equivalent computational power and you can emulate one with the other, but when implemented in a physical machine, you've gotta choose.
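The "return whatever you have when the deadline expires, with a flag" behaviour described above can be imitated in ordinary Python with an iterative calculation that checks a clock between steps. This is only a cooperative sketch under stated assumptions: a real real-time system enforces the deadline at a much lower level, and the function names, the choice of Newton's method and the fake clock are all invented for the illustration.

```python
import time

def sqrt_with_deadline(x, budget_s, clock=time.monotonic):
    """Approximate sqrt(x) by Newton's method, stopping when time runs out.

    Returns (estimate, completed). If completed is False the estimate is
    simply the state of the computation at the deadline.
    """
    deadline = clock() + budget_s
    estimate = x if x > 1 else 1.0
    while True:
        if clock() >= deadline:
            return estimate, False           # deadline hit: partial result, flag set
        improved = 0.5 * (estimate + x / estimate)
        if abs(improved - estimate) < 1e-12:
            return improved, True            # converged within the budget
        estimate = improved

def make_ticking_clock(step_s):
    """Deterministic stand-in clock that advances by step_s on every call."""
    now = [0.0]
    def clock():
        now[0] += step_s
        return now[0]
    return clock

if __name__ == "__main__":
    # Generous budget: the answer is right and the flag says so.
    print(sqrt_with_deadline(16.0, 100.0, clock=make_ticking_clock(0.001)))
    # Tight budget: control comes back on time with a rough answer.
    print(sqrt_with_deadline(16.0, 2.5, clock=make_ticking_clock(1.0)))
```

A "correctness"-oriented version would just drop the deadline check and loop until convergence, however long that takes.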

Re: Single core versus Multicore CPU

Postby wireman » Mon Dec 09, 2019 10:21 pm

CS70 wrote:
wireman wrote:
CS70 wrote:General purpose operating systems prioritize correctness of result over time guarantees, and at most provide an estimate of execution time for a given computational load.

Really? I think they prioritise fair access to resources (for some measure of fair) for a running workload.

It's a different level of abstraction. Fair access to resources, for whatever definition of "fair", can be provided by both real time and non real time systems - it's only a matter of load distribution. Both models allow for preemptive scheduling of computational flows.

But the defining characteristic of a "correctness"-oriented system is that the flow of control does not return until a computation is completed, regardless of how much time it takes. 2+2 is always 4, but it may take a potentially unbounded while for the result to be computed and the flow of control returned to the requester.

A "true" realtime system, on the other hand, works with strict deadlines: if the allocated time for the computation has expired, the flow of control and the result are returned (possibly with a flag set). So 2+2 may occasionally be 3.5 or whatever.

OK, yes. For some reason when I read the post I thought you meant that a general purpose OS could choose how much priority to place on correctness.

Re: Single core versus Multicore CPU

Postby CS70 » Tue Dec 10, 2019 12:54 am

wireman wrote:
OK, yes. For some reason when I read the post I thought you meant that a general purpose OS could choose how much priority to place on correctness.

All good, no stress.
