A properly trained special-purpose AI can be a wonderful thing. AI systems have already displayed super-human performance in many fields, from diagnosing specific medical conditions to spotting fraudulent transactions and identifying infringing content amongst the fire-hose of social media.

These systems tend to rely upon some form of neural-network based machine learning system, which has been ‘trained’ using vast quantities of example data, and has incrementally optimised its responses to be ever more accurate. The source of the example data, and the effort that goes into creating it, can be a significant hurdle to implementing and deploying AI systems. If the data has to be gathered, classified and ‘tagged’ with relevant metadata before being used in the training of the system, then this will often require a massive amount of human labour at the start of the process, with an associated cost for the project. If this process has to be done by people with a particular level of expertise it becomes even more of a challenge – consider the example of doctors reviewing and tagging diagnostic imaging data to classify it before the dataset is used to train a diagnostic machine learning system.

If this process is characterised in its broadest terms as a knowledge transfer exercise from human to machine, some of the humans involved may perceive the machine as a threat, absorbing their expertise only for it to be used to replace them. This leads to low participation levels, poorer quality output and a less capable machine.

Additionally, there may be general and industry-specific regulations to consider. Any training project where example datasets involve the (re-)use of personal data presents challenges from a GDPR perspective. Cases like that of DeepMind and the Royal Free indicate that regulators, even if sympathetic to the laudable aims of a project, will not excuse the use of personal data for purposes manifestly beyond those for which a data subject provided that data. Heavily regulated sectors such as life sciences, financial services and insurance have additional layers of regulatory responsibility. It is also likely that specific AI-related regulations will soon follow. There are many reports and recommendations from national, supra-national and international bodies, each strongly emphasising a desire for transparency, explainability and unbiased processing. These reports will influence today’s regulators in their implementation of current regulatory frameworks, and drive the development of additions to those frameworks over the next few years.

So creating training data is difficult, expensive, requires knowledge transfer from often-reluctant personnel and is fraught with regulatory challenges. Wouldn’t it be wonderful if we could train an AI without any of that?

All work and no play…

Games have long been at the forefront of AI challenges. The idea of a computer playing chess to a super-human level was a big part of early AI research, and IBM’s Deep Blue finally demonstrated that ability by beating Garry Kasparov in 1997. Chess is a relatively limited and rule-based game though – brute-force evaluation of millions of possible moves and countermoves simply overwhelmed human ability.

Go is a more subtle game than chess, and far less susceptible to brute-force methods. Those who predicted that it would be a long time before computers could mount a serious challenge to human superiority on the 19 by 19 grid of a Go board had reckoned without machine learning – in 2016, DeepMind’s ‘AlphaGo’ system beat Go champion Lee Sedol. Unlike Deep Blue’s brute-force gameplay, AlphaGo was trained using examples taken from huge numbers of games played by the best human Go players, and its decision-making systems were optimised accordingly. This approach still required the human curation and tagging of that training dataset – with the attendant difficulties outlined above.

However, DeepMind didn’t stop there. Its next system, AlphaGo Zero, was trained by coding the rules of Go, setting up goals (win the game), and then allowing the machine to play itself millions upon millions of times. At first, moves were essentially random – some good, but the vast majority bad. But as each game was played, the system evaluated victories and defeats, and learned to play. After training simply by playing against itself, AlphaGo Zero beat the original (already superhuman) AlphaGo system that had been trained using human games.
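The recipe – code the rules, set the goal, let the system play itself – can be illustrated in miniature. The sketch below is a deliberately toy version (nothing like DeepMind’s actual architecture): the game is single-pile Nim (take one to three sticks from a pile of ten; whoever takes the last stick wins), and the learner is simple tabular self-play, with all parameters chosen purely for illustration. No example games are ever supplied.

```python
import random

# Toy self-play learning: only the rules of Nim and the goal (take the
# last stick) are coded. The value table Q starts empty and is filled
# purely from the system's own games against itself.

PILE, ACTIONS = 10, (1, 2, 3)
Q = {}  # (pile_size, action) -> estimated value for the player to move

def best_action(pile, eps=0.1):
    legal = [a for a in ACTIONS if a <= pile]
    if random.random() < eps:              # explore occasionally
        return random.choice(legal)
    return max(legal, key=lambda a: Q.get((pile, a), 0.0))

def self_play(games=20000, alpha=0.5):
    for _ in range(games):
        pile, history = PILE, []
        while pile > 0:
            action = best_action(pile)
            history.append((pile, action))
            pile -= action
        # The player who took the last stick won. Walk back through the
        # game, crediting the winner's moves and penalising the loser's.
        reward = 1.0
        for pile_then, action in reversed(history):
            old = Q.get((pile_then, action), 0.0)
            Q[(pile_then, action)] = old + alpha * (reward - old)
            reward = -reward               # perspective flips each ply

random.seed(0)
self_play()
```

After training, the greedy policy (exploration switched off) reliably takes the whole pile whenever three or fewer sticks remain – the winning move it discovered entirely from its own wins and losses. The same loop, scaled up enormously and paired with deep neural networks and tree search, is the shape of what AlphaGo Zero did.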

So much for turn-based board games. Surely achieving super-human performance in anything more real-time and fluid would be orders of magnitude more difficult? Well, perhaps not… in recent months, OpenAI has developed a system that can play multi-player online battle arena and esports mainstay Dota 2 to a world-beating standard, and DeepMind has a system that can best human players at capture-the-flag games in Quake 3.

Example based vs. rule based

The consistent theme here is that where something can be expressed in rules and goals, a system can be trained using reinforcement learning. Instead of needing a vast categorised dataset to be collected first, the system is allowed to “play” within the rules and trains itself by evaluating success or failure against the predetermined goals – effectively generating its own training data. That early successes for this approach stem from games and esports is to be expected: these are perfectly rule-constrained, self-contained systems with clear goals against which outcomes can be measured.

Some areas of business will be more susceptible to these kinds of reinforcement-based learning approaches than others. Any business process that can be reduced to specific inputs, a set of rules regarding actions, and a clear goal or goals against which success can be measured is ripe for these types of training methods. The advantages are clear – super-human performance achieved without the expense, risk or regulatory challenges of creating training datasets first. Just one example we have seen deployed in practice is a financial reconciliation system that learned to take given input sources and accurately reconcile incoming and outgoing payments. In a closed system of income, balances and outgoings, the system could train itself to outperform the finance clerks previously undertaking the task.
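The system itself is proprietary, but the shape of the task is easy to convey. The sketch below (all names and figures invented for illustration) shows why reconciliation is such a natural fit: the rule – matched invoices must sum exactly to the payment – and the goal – nothing left unreconciled – are fully specified, so any attempted matching can be scored automatically, without human-tagged examples.

```python
from itertools import combinations

# Hypothetical sketch of a closed reconciliation task: match each
# incoming payment to a set of open invoices with an equal total.
# Because success is mechanically checkable, a learning system could
# score its own attempts -- the property that enables self-training.

def reconcile(invoices, payments):
    open_invoices = list(invoices)       # [(invoice_id, amount), ...]
    matches = {}
    for pay_id, amount in payments.items():
        found = None
        for r in range(1, len(open_invoices) + 1):
            for combo in combinations(open_invoices, r):
                if sum(amt for _, amt in combo) == amount:
                    found = combo
                    break
            if found:
                break
        if found:
            matches[pay_id] = sorted(inv_id for inv_id, _ in found)
            for item in found:
                open_invoices.remove(item)
    return matches, open_invoices

demo = reconcile(
    invoices=[("INV-1", 100), ("INV-2", 250), ("INV-3", 40)],
    payments={"P-1": 140, "P-2": 250},
)
# demo[0] pairs P-1 with INV-1 + INV-3 and P-2 with INV-2; demo[1] is
# the list of invoices left unreconciled (here, none).
```

The brute-force search here scales exponentially with the number of open invoices; the point of a trained system is to learn good matching behaviour rather than enumerate combinations. But the crisp, machine-checkable goal is what both approaches share.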

Other areas remain more amorphous and less obviously rule-constrained, and will long require large sets of example training data to be compiled and classified. The trick is recognising which areas necessarily fall into this category. Anyone who starts down the path of compiling training data for something that turns out to be susceptible to a rule-based approach will quickly find themselves overtaken by competitors who can more clearly see a way to properly frame the problem.

In some cases, an activity might first appear to be a clear rule-based activity, but is actually more open-ended than it first appears. One example from my own professional sphere is that many commentators on the use of AI in law (mistakenly) assume that litigation – an activity governed by clear rules of procedure – must eventually be susceptible to such an approach. The open-ended possibilities and strategies to be applied outside the courts to gain an advantage (both legitimate and nefarious) mean that there is no constrained arena of play. However, the fact that cases are publicly reported in many jurisdictions does provide a broad public dataset for example-led training of ‘litigation prediction’ systems.

For more on AI, reinforcement learning and the legal and regulatory challenges associated with deploying these technologies, come along to DLA Piper’s European Tech Summit in London on 15th October.
More details on www.dlapipertechsummit.com.