Thread count

Before we get to the death scene, let’s step back in time…

History tends to focus on the addition of new sources of power, such as water wheels and steam engines, as the transformative aspect of the Industrial Revolution. Arguably, separating the production of goods into distinct tasks, and then having specialised systems for performing those tasks at scale, was the real revolution. In the textile industry, the earlier generalists operating as a cottage industry – those skilled individuals who could spin, weave and sew – were comfortably outperformed when tasks were separated out and undertaken by collections of specialists in the new factories.

The generalists would undertake tasks as a series, one after the other: carding the wool or cotton, then spinning it into a single thread, then weaving cloth and then making garments. The factories had many workers performing tasks in parallel, with floors of spinning machines and looms respectively working on many threads at once.

It is perhaps not surprising that this analogy was adopted by computing pioneers – from the late ‘60s onward, sequences of discrete instructions that could be scheduled for execution by a computer started to be referred to as ‘threads’. A computer that could work through only one such set of tasks at a time was ‘single-threaded’, and those that could handle several in parallel were ‘multi-threaded’.

 Home computers – a new cottage industry

The advent of home computers in the late ‘70s was reliant upon getting the cost of a useful computing device down to the point where it could fit within the discretionary spending of a large enough section of society. Starting with 8-bit computers like the Apple II or Commodore PET, and progressing through the 16-bit era and into the age of IBM PC compatible dominance in the ‘90s and early 2000s (286, 386, 486 and Pentium processors), personal computing hardware was almost universally single-threaded. Clever programming meant that multi-tasking – or the ability for two or more applications to appear to be running at the same time – existed at the operating system layer. Amiga OS was a particularly early example, and the feature came to the PC with much fanfare in Windows 95. Even when OS-level multi-tasking was in use, under the hood the CPUs were dutifully executing instructions in series on a single thread at any one time. Serial, not parallel.

Whilst there had been some rare personal computers with two or more CPUs available earlier, multi-threading became widely available with the arrival of Hyper-Threading on Intel’s Pentium 4 processor in 2002. Before long, CPUs with multiple cores, each able to handle up to two threads, were commonplace. Today, 4-, 6- or 8-core CPUs offering 8, 12 or 16 threads are commodity offerings, and ‘workstation’ class CPUs might boast 28 cores or more. The single-threaded cottage industry of the early computer age is giving way to multi-threaded factories inside the CPU.

 Entering the third dimension

The single-threaded CPUs of the early ‘90s were still powerful enough to ignite a 3D revolution. Raycasting techniques – pseudo-3D engines running entirely on the CPU – allowed players to shoot everything from Nazis to demons invading Mars… I did promise up front that there would be deaths.

True 3D engines, with texture-mapping, lighting effects, transparency, greater colour depths and higher resolutions, required more simultaneous calculations than the CPUs of the day could support. A new breed of special-purpose co-processor was born – the 3D graphics card.

Instead of a second general-purpose CPU that could carry out a range of different types of calculation with high levels of precision, these new processors were tuned to perform the specific types of linear algebra and matrix manipulation required for 3D gaming to a ‘good enough’ level of precision. Importantly, these Graphics Processing Units, or GPUs, were made up of multiple individually simple computing cores on a single chip, allowing many lower-precision calculations to be performed in parallel.
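To make that parallelism concrete: the vertex transforms at the heart of 3D rendering are the same small matrix multiplication repeated independently for every vertex. The Python sketch below is purely illustrative – numpy stands in for the GPU, and all the numbers are made up – but it shows the shape of the work a GPU spreads across its many simple cores:

```python
import numpy as np

# A single 4x4 transform matrix (e.g. a combined model-view-projection matrix).
transform = np.eye(4, dtype=np.float32)
transform[:3, 3] = [1.0, 2.0, 3.0]  # a simple translation, for illustration only

# Thousands of vertices in homogeneous coordinates (x, y, z, w).
vertices = np.random.rand(100_000, 4).astype(np.float32)
vertices[:, 3] = 1.0

# Every vertex undergoes the same, independent matrix-vector multiply --
# 'embarrassingly parallel', lower-precision work of exactly the kind a
# GPU's many simple cores are built for. numpy does it serially here;
# a GPU would spread the rows across thousands of threads.
transformed = vertices @ transform.T
```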

 More than just a pretty picture

In a few short years, GPUs revolutionised PC gaming. In 1996, it was rare for a PC to be sold with a GPU. By 1999, a dedicated gamer wouldn’t consider a PC without one. Today, even the most business-focussed PC will be running a CPU with built-in 3D graphics acceleration, and gamers will spend thousands on the latest graphics cards from the likes of AMD and NVIDIA. Even if they’re often embedded within the CPU, GPUs are ubiquitous.

Even with today’s multi-core, multi-threaded CPUs, the number of simultaneous threads that a GPU can run dwarfs the number the CPU can handle. With GPU hardware part of the standard PC set-up, it was inevitable that projects would emerge to unlock that parallel computing power for other purposes. Collected under the banner of ‘General Purpose computing on Graphics Processing Units’ (GPGPU), projects such as OpenCL allow programmers to access the massively parallel architecture of today’s GPUs.
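As a flavour of what GPGPU programming looks like – a minimal sketch assuming the pyopencl bindings and a working OpenCL driver, rather than code from any particular project – the kernel below describes the work of a single thread, and the runtime launches one instance per element across the GPU:

```python
import numpy as np
import pyopencl as cl  # assumes the pyopencl package and an OpenCL driver are installed

a = np.random.rand(1_000_000).astype(np.float32)
b = np.random.rand(1_000_000).astype(np.float32)

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

# The kernel describes the work of ONE thread; OpenCL runs one instance per
# element, spread across however many cores the device can bring to bear.
program = cl.Program(ctx, """
__kernel void add(__global const float *a,
                  __global const float *b,
                  __global float *out)
{
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
""").build()

program.add(queue, a.shape, None, a_buf, b_buf, out_buf)

result = np.empty_like(a)
cl.enqueue_copy(queue, result, out_buf)
```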

One particular use case that created massive demand and led to GPU shortages is blockchain technology – and proof-of-work crypto mining in particular. Proof-of-work mining boils down to repeating the same cryptographic hash calculation over and over with different inputs in the search for a winning value. Each attempt is independent of every other – much like the per-vertex and per-pixel calculations in 3D graphics – so the workload parallelises naturally, and mining software offloads the vast majority of the work to the GPU.
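To make that concrete – an illustrative toy, not taken from any real miner – the search amounts to trying nonce after nonce until a hash falls below a target. Every attempt stands alone, which is exactly what lets real miners fan the search out across thousands of GPU threads:

```python
import hashlib

def meets_target(block_header: bytes, nonce: int, difficulty_bits: int = 16) -> bool:
    """Hash the header plus a candidate nonce and test it against a toy target."""
    digest = hashlib.sha256(block_header + nonce.to_bytes(8, "little")).digest()
    # 'Winning' here means the top difficulty_bits bits of the hash are zero.
    return int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0

# Serial search for demonstration; each attempt is independent of the others,
# so the nonce range can just as easily be split across thousands of threads.
header = b"example block header"
nonce = 0
while not meets_target(header, nonce):
    nonce += 1
print("found nonce:", nonce)
```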

 Artificial Intelligence – super-massive parallelisation

Any machine learning system based on neural networks requires significant computing resources to run, and still greater resources to train. Even a relatively simple neural network will probably have hundreds or thousands of neurons per layer, and several layers. If every neuron in a layer has to be connected to every neuron in the previous layer, with a weight for each connection and a bias for each neuron, the number of calculations required skyrockets, as does the memory needed to hold all of those parameters. Just trying to run the trained model can bring a powerful machine to its knees – even the thousands of threads that a GPU can run simultaneously pale into insignificance next to the number of calculations involved. Factor in the additional work required to train the network and optimise those weights and biases using techniques such as backpropagation, and the computational task is often an order of magnitude or more greater.
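A rough worked example shows how quickly the totals climb. The layer sizes below are purely illustrative, but the arithmetic is general: a fully connected layer with n inputs and m outputs needs n × m weights plus m biases, and each forward pass performs roughly one multiply-add per weight.

```python
# Parameter count for a small, fully connected network.
# Layer sizes are purely illustrative, not taken from any real model.
layer_sizes = [784, 1024, 1024, 1024, 10]

total_params = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    weights = n_in * n_out  # every neuron connects to every neuron in the previous layer
    biases = n_out          # one bias per neuron
    total_params += weights + biases

print(f"{total_params:,} parameters")  # roughly 2.9 million for these sizes
```

Training multiplies that again: every example in every pass through the dataset needs both a forward and a backward calculation over all of those parameters.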

This reality is why specialist AI hardware is increasingly important. New classes of AI-focussed processors provide this super-massive parallelisation with memory built into the processor, allowing models to be trained and run far more efficiently with larger datasets. In our last article we drew attention to examples including GraphCore’s ‘Intelligence Processing Units’ (IPUs). Taking that example again (although other specialist AI hardware is available), compared with the few tens of threads that a workstation CPU might run, GraphCore’s latest-generation Colossus MK2 IPU can process almost nine thousand threads in parallel – and with multiple IPUs in each machine, general-purpose hardware simply cannot compete.

Whilst high-end GPUs might have very large numbers of cores, specialist AI hardware wins out again – this time because of memory bandwidth. A graphics card might boast dedicated memory for the GPU, but that memory still sits in separate chips away from the processor die, connected to the GPU over a memory bus. This limits the speed at which information can be fed into and retrieved from the large number of compute cores on the GPU. For 3D graphics or crypto mining this tends not to be a significant constraint, but for running or training AI models it often is. Building stores of on-silicon memory into the processor architecture, linked directly to each core, avoids this bottleneck, increasing performance and allowing more effective scaling when multiple specialist processors are linked in a single machine.
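A back-of-the-envelope ‘roofline’ calculation – with purely illustrative figures, not specifications for any real device – shows why bandwidth rather than raw compute is often the ceiling:

```python
# Roofline-style estimate: attainable throughput is capped by whichever is
# lower -- peak compute, or memory bandwidth times arithmetic intensity.
# All figures below are illustrative assumptions, not vendor specifications.
peak_compute_tflops = 30.0   # hypothetical processor peak, in TFLOP/s
memory_bandwidth_tbps = 1.0  # hypothetical off-chip bandwidth, in TB/s

# Arithmetic intensity: FLOPs performed per byte moved from memory.
# Big dense matrix multiplies sit high; many AI layers sit low.
for flops_per_byte in (2, 10, 100):
    attainable = min(peak_compute_tflops, memory_bandwidth_tbps * flops_per_byte)
    print(f"intensity {flops_per_byte:>3} FLOP/byte -> ~{attainable:.0f} TFLOP/s attainable")
```

Keeping the working set in on-chip memory next to each core effectively raises the bandwidth term by an order of magnitude or more, which is precisely the bottleneck the specialist architectures set out to remove.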

Even with all these advantages in specialist AI hardware, avoiding wasted compute cycles through sparsity techniques (i.e. skipping the redundant calculations where values are zero) makes a huge difference. As is so often the case, highly capable hardware twinned with well-tuned software is the best approach.
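As a simple illustration of the idea – scipy’s sparse matrices are used here purely as an example of the technique, not as a statement about any particular AI stack – storing and multiplying only the non-zero values avoids spending cycles multiplying by zero:

```python
import numpy as np
from scipy import sparse

# A weight matrix in which roughly 90% of values are zero, as is common after pruning.
rng = np.random.default_rng(0)
dense_weights = rng.random((1000, 1000)) * (rng.random((1000, 1000)) > 0.9)

sparse_weights = sparse.csr_matrix(dense_weights)  # store only the non-zero entries
x = rng.random(1000)

# The dense product performs a million multiply-adds, most of them by zero;
# the sparse product touches only the ~100,000 stored non-zeros.
dense_result = dense_weights @ x
sparse_result = sparse_weights @ x
print(np.allclose(dense_result, sparse_result))      # True: same answer
print(f"stored non-zeros: {sparse_weights.nnz:,}")
```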

 Integration Integrity

With Artificial Intelligence well over the peak of the technology hype curve, and in active deployment in an ever-greater range of circumstances, running and training the best possible machine learning models becomes a significant differentiator for many businesses. Competitive pressure to have the best and ‘smartest’ machines will only increase.

The enormous potential of these technology platforms can be utterly eroded by poor deployments, poor integration and the age-old challenge of poor-quality data (garbage in, garbage out still applies…). Just as the wave of Enterprise Resource Planning (ERP) deployments in the early 2000s created significant opportunities for the Systems Integrators, the same will be true of AI. Most organisations are unlikely to have significant in-house expertise in designing, deploying and integrating these new AI platforms – buying in that expertise is the way to go.

Many of the contractual challenges with Systems Integration deals will be familiar – requirements design, project timelines and the consequences of delay, payment triggers by milestone, acceptance testing and deemed acceptance. The key to success will be clarity about the objectives and outcomes to be delivered, and the plan to deliver them. Complicating matters is the extent to which AI systems might “work”, in the sense of being capable of producing a result, yet be sub-optimal in terms of accuracy or performance if not structured properly, trained properly and tuned to avoid redundant effort. These matters take on a new significance against the background of capital expenditure on hardware and related software from third parties, and the enhanced legal responsibilities likely to attach to operators of AI systems as regulatory requirements increase. We have already seen the EU’s proposed AI Regulation, and know that the compliance burden will be material – with fines for non-compliance potentially greater even than GDPR fine thresholds.

 Next steps

We’ll be discussing the implications of this exciting time in hardware at the European Technology Summit in our ‘Hardware Renaissance’ panel. To find out more and register to attend the summit, visit the event website.

You can find more views from the DLA Piper team on the topics of AI, hardware and the related legal issues on our blog, Technology’s Legal Edge.

If you’d like to discuss any of the issues discussed in this article, get in touch with Gareth Stokes or your usual DLA Piper contact.

 

The author’s views do not necessarily represent the view of DLA Piper or its clients.