When was the last time you saw a panda? In fact, have you ever seen a real-life panda…? They’re not that common, after all. I know that I’ve only ever seen live pandas once in just over four decades, and it required a very special effort to go and see them.

It’s odd, therefore, that we all know what pandas look like. The panda’s status as an endangered species is so well known that it is the symbol of the World Wide Fund for Nature, and its place as the conservationists’ favourite “charismatic megafauna” is secure.

I was nevertheless surprised to see that AIs love pandas too. And giraffes, and any number of other creatures that are far from common in the urban environments we encounter every day. Watching half-trained image recognition AIs identify road signs as pandas, street-lights as giraffes and traffic lights as tropical birds is amusing, but does speak to a deeper truth about these systems.

We’ve known that computers operate on a ‘garbage in, garbage out’ principle since engineers were changing blown vacuum tubes on the first programmable machines in the 1940s. It remains true for AI, but what constitutes garbage can be harder to spot.

The reason that our juvenile AI described above thinks it is in a jungle rather than on the Swindon bypass is the training data it has been fed. There are several large general image libraries that researchers tend to use when training image recognition AIs, and you can now even download fully trained AIs to incorporate wholesale into your projects. The trouble is that these libraries, and the AIs trained on them, have been constructed by humans, and they reflect our own biases: we want coverage of as wide an array of subjects as possible (as many ‘nouns’ to be returned as possible), and we favour aesthetically pleasing images. Consequently the image sets disproportionately contain objects that we rarely encounter (bright copper kettles, warm woollen mittens, and schnitzel with noodles), and tend to be woefully under-representative both of the more mundane items that make up the majority of our experience (offices, coffee cups and dry-cleaners) and of the experience relevant to the AI’s particular situation (roundabouts, pedestrian crossings and unspeakable so-and-sos who change lanes without indicating, in the case of a self-driving vehicle).
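The mismatch described above can be made concrete with a few lines of code. This is a toy sketch, with all label names and counts invented for illustration: it compares the frequency of each label in a (hypothetical) training set against the frequency at which the deployed system will actually encounter it, and flags labels that the training set badly under-represents.

```python
from collections import Counter

# Hypothetical label counts -- illustrative only, not drawn from any real dataset.
training_labels = (["panda"] * 500 + ["giraffe"] * 400 +
                   ["road sign"] * 20 + ["traffic light"] * 10)
# What the deployed system will actually see (e.g. a road-facing camera):
deployment_labels = ["road sign"] * 600 + ["traffic light"] * 300

def label_frequencies(labels):
    """Return each label's share of the dataset as a fraction of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

train_freq = label_frequencies(training_labels)
deploy_freq = label_frequencies(deployment_labels)

# Flag labels whose training share is far below their deployment share.
for label, share in deploy_freq.items():
    trained_share = train_freq.get(label, 0.0)
    if trained_share < share / 2:
        print(f"under-represented in training: {label} "
              f"({trained_share:.1%} trained vs {share:.1%} in deployment)")
```

On these invented numbers both road signs and traffic lights are flagged: the training set is dominated by pandas and giraffes that the deployed system will never see.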

Whilst we can immediately see that a sign pointing towards the industrial estate is clearly not a large bamboo-obsessed bear from China, training-data-induced errors are far more insidious in other contexts.

An AI deployed in a business processing context, replicating workflows previously undertaken manually, will often be trained using data and decisions from the tasks carried out by human operators. That data will, by design, reflect the way that the process has been carried out. But if it is not sifted carefully, it will also include all the lazy shortcuts, incorrect assumptions and ‘Friday afternoon’ behaviours that ideally would not form part of the future best-of-breed automated process. On the other hand, sifting too carefully will also harm the AI’s training. The temptation to include every edge case and weird issue ever encountered in the workflow pushes the training data away from being representative of the input the system will receive in live use. As a result the system learns to identify everything as an edge case of some form, with a corresponding impact on efficiency and accuracy. In effect you’ve trained the system with too many pandas and not enough road signs.
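A crude sketch of that failure mode, again with every number invented for illustration: here the ‘model’ is nothing more than the base rate it learned from an edge-case-heavy curated training set, which is enough to show how a skewed prior swamps live traffic with false edge-case flags.

```python
import random

random.seed(0)

# Hypothetical workflow items: each is either 'routine' or an 'edge_case'.
# Curated training set: edge cases deliberately over-sampled to 70%.
training = ["edge_case"] * 700 + ["routine"] * 300
# Live traffic: edge cases are actually rare (~2%).
live = ["edge_case"] * 20 + ["routine"] * 980

# The training prior the 'model' has absorbed.
edge_prior = training.count("edge_case") / len(training)

def predict(item):
    # No real features here -- this stands in for a model whose decisions
    # are dominated by the skewed base rate it was trained on.
    return "edge_case" if random.random() < edge_prior else "routine"

predictions = [predict(item) for item in live]
flagged = predictions.count("edge_case")
print(f"{flagged} of {len(live)} live items routed to edge-case handling, "
      f"versus {live.count('edge_case')} genuine edge cases")
```

Roughly seven hundred of the thousand live items get routed to special handling when only twenty warrant it: the over-curated training set has taught the system to see pandas everywhere.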

Exactly the same is true of other kinds of bias. We’ve seen examples where a disproportionate number of the edge cases in the training data happened to involve, as the subject of the workflow, an individual from a particular ethnic group. The two factors (the ethnicity of the subject and the nature of the edge-case problem) were unconnected, but the machine wrongly learned to treat them as correlated. Feed in any training data where problems disproportionately coincide with protected characteristics and your AI will see a false correlation and train itself to be discriminatory in a morally and legally problematic way.
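The mechanics of that false correlation fit in a few lines. In this hedged sketch (the groups, counts and the frequency-based ‘risk score’ are all invented for illustration), the problem cases in the historic sample happen to cluster in group ‘B’ even though group membership played no causal role, and a naive model built on those frequencies duly scores group ‘B’ as riskier:

```python
from collections import defaultdict

# Hypothetical historic records: (group, had_problem). The problems had
# nothing to do with group membership, but in this sample they happen to
# cluster in group 'B'.
records = ([("A", False)] * 480 + [("A", True)] * 20 +
           [("B", False)] * 60 + [("B", True)] * 40)

# A naive model that scores risk purely from historic group frequencies --
# exactly the false correlation described above.
problems = defaultdict(int)
totals = defaultdict(int)
for group, had_problem in records:
    totals[group] += 1
    problems[group] += had_problem

risk = {group: problems[group] / totals[group] for group in totals}
# Group 'B' comes out roughly ten times riskier, despite no causal link.
print(risk)
```

Nothing in the code intends to discriminate; the disparity is entirely an artefact of unrepresentative historic data, which is what makes this class of error so hard to spot after deployment.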

The EU’s recent report ‘Ethics guidelines for trustworthy AI’ strongly emphasises the need for AI systems to be built to avoid discrimination, promote diversity and further societal well-being. The correct choice of training data, truly representative of the AI’s operating environment without reflecting incorrect biases or providing false correlations, will be a big part of achieving these aims.

Incredible care therefore needs to be taken in selecting training data. For self-driving vehicles, efforts are well underway (why do you think every ‘prove you’re not a robot’ test you see on the web these days asks you to identify cars or road signs?). For AI deployed in business processing contexts, relatively few third-party data sources are available. Training systems with historic data remains a common approach, but there are still big bear traps to avoid… and they’re chock full of pandas.

For more on AI, machine learning, deep neural networks and the legal and regulatory issues that touch upon them, come along to DLA Piper’s European Tech Summit in London on 15th October. More details here.