Sitting at 36,000 feet over the Atlantic writing this article, I’m unusually aware of the traditional use of the phrase “black box”: an aircraft’s flight data recorder, there to let investigators understand what went awry should the worst happen. It seems odd that a device whose purpose is to provide an explanation should so often be called upon as a metaphor for the unexplainable.

We are told that AI’s decision making must be fundamentally unknowable; that the link between inputs and output can only ever be given as an empirically measured probability based on its relative successes in making previous decisions that are judged to be correct. This characterisation of AI as a black box we can never peer into has gained currency, and dire consequences are then enumerated.

How can we use AI in highly regulated sectors (life sciences, financial services, insurance) if we can’t understand the decisions it makes? How can we be sure that the system isn’t reflecting biases we’d prefer to shed, and making discriminatory decisions? How can we catch ‘unfair’ decisions if we don’t understand the basis of the decision itself?

The most sophisticated AIs – and therefore the most susceptible to this explainability challenge – rely on some form of deep neural network. Often these systems rely on multiple separate neural networks chained together, the outputs of one set being fed into the next as inputs.

A relatively easy-to-describe model might be an image recognition system. If all you want to do is recognise pictures of handwritten numbers in black ink on a white page and characterise them as ‘0’, ‘1’, ‘2’ and so on to ‘9’, then a single neural network will be more than up to the task. This is a classic computer science problem, and the canonical solution has pixel brightness values forming the input layer, a couple of hidden layers with tuned weighting for the links between layers, and an output layer with one neuron for each of the 10 digit values.

Assuming a reasonable number of neurons in the hidden layers, let’s say 128 in each, after the system has been trained it will achieve high accuracy in correctly categorising the images against the digits, even for my reasonably indecipherable handwriting. Can we explain how it does this though, or is it an unknowable black box?
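To make the shape of that network concrete, here is a minimal sketch in plain Python. It assumes 28 by 28 pixel images (784 inputs), a detail the description above does not specify, and the weights are random rather than trained, so it illustrates the structure only and not a working recogniser. All function names are my own, chosen for illustration.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """One fully connected layer: n_out rows of n_in weights, plus zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-1.0, 1.0) * scale for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, layer):
    """Weighted sum of the inputs for every neuron in the layer."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Common hidden-layer activation: negative sums become zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn the ten output sums into probabilities that add up to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def build_network(rng, n_pixels=784, n_hidden=128, n_digits=10):
    """Input layer -> two hidden layers of 128 neurons -> ten digit outputs."""
    return [make_layer(n_pixels, n_hidden, rng),
            make_layer(n_hidden, n_hidden, rng),
            make_layer(n_hidden, n_digits, rng)]


def classify(pixels, network):
    """Return one probability per digit 0-9 for a flattened image."""
    activations = pixels
    for layer in network[:-1]:
        activations = relu(dense(activations, layer))
    return softmax(dense(activations, network[-1]))


rng = random.Random(0)          # fixed seed so the sketch is repeatable
network = build_network(rng)    # untrained: weights are random
blank_page = [0.0] * 784        # assumed 28 x 28 image, all pixels at zero
probabilities = classify(blank_page, network)
```

Training would adjust the numbers inside `network`. Everything the system ‘knows’ lives in those weights, which is why they are the natural place to look for an explanation.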

One way of ‘seeing’ what is going on would be to graph how the training of the network has been reflected in the weighting of the links between neurons. Trained weights are not bounded in general, but for display purposes they can be scaled so that every value lies between a minimum of -1.0 and a maximum of +1.0. If the values of all the weights are then turned into an image, with any positive weightings represented in the red channel of a pixel’s colour value and any negative weightings reversed and represented in the blue channel, you get an image with “hot and cold” spots representing the weights of connections that are relatively more significant to the decision. Combined with the values in the original image, these ‘hot spots’ and ‘cold spots’ can be mapped back to areas of the original input image to highlight those areas that the system has identified as containing important features.
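The colour mapping itself is simple enough to sketch. The snippet below assumes the weights feeding a single neuron have already been scaled into the range -1.0 to +1.0 and that there is one weight per input pixel. The names `weight_to_rgb` and `weights_to_image` are hypothetical, not taken from any particular library.

```python
def weight_to_rgb(weight):
    """Map a weight in [-1.0, 1.0] to an (R, G, B) pixel.

    Positive weights light up the red channel; negative weights are
    reversed and light up the blue channel. Values outside the range
    are clipped.
    """
    clipped = max(-1.0, min(1.0, weight))
    red = round(255 * clipped) if clipped > 0 else 0
    blue = round(255 * -clipped) if clipped < 0 else 0
    return (red, 0, blue)


def weights_to_image(weights, width):
    """Arrange one neuron's weights as rows of pixels, `width` pixels per row."""
    pixels = [weight_to_rgb(w) for w in weights]
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# A toy 2 x 2 'image' of four weights: strong positive, strong negative,
# neutral, and a value that needs clipping.
heat_map = weights_to_image([1.0, -1.0, 0.0, 2.5], width=2)
```

Bright red and bright blue pixels mark the connections that push the neuron hardest in either direction, while near-black pixels mark inputs it largely ignores.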

This isn’t simply theoretical. This precise method of interrogating a network to discover its ‘hot’ and ‘cold’ features is used to diagnose ‘faulty’ decisions. One interesting example from an image recognition system to sort breeds of dog concerned an image that was consistently but incorrectly being tagged as a wolf, even though the dog in the picture was manifestly another breed. Upon checking the heat maps, researchers could see that the system was triggered not by features of the dog itself, but by the background, which ‘lit up’ in the heat map of significant features. The problematic image was taken against a snowy backdrop. It happened that all the training images of wolves were also in the snow – and so the system had correlated snow with ‘wolf’ rather than any features of the animal in the images.
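A rough sketch of that kind of check follows. It is a deliberate simplification, not the method the researchers used: each pixel’s contribution is taken to be its brightness multiplied by a single weight, and a hand-drawn mask marks which pixels are background. The four-pixel ‘image’, the weights and the function names are all invented for illustration.

```python
def contribution_map(pixels, weights):
    """Per-pixel contribution: input value multiplied by its weight."""
    return [p * w for p, w in zip(pixels, weights)]


def share_in_region(contributions, mask):
    """Fraction of the total absolute contribution that falls inside the mask."""
    total = sum(abs(c) for c in contributions)
    if total == 0:
        return 0.0
    inside = sum(abs(c) for c, in_region in zip(contributions, mask) if in_region)
    return inside / total


# Toy example: the first two pixels are bright snow, the last two are the animal.
pixels = [0.9, 0.8, 0.1, 0.2]
weights = [0.7, 0.6, 0.1, -0.1]
is_background = [True, True, False, False]

contributions = contribution_map(pixels, weights)
background_share = share_in_region(contributions, is_background)
```

If nearly all of the contribution sits in the background, as it does in this toy case, that is a strong hint the system has learned ‘snow’ rather than ‘wolf’.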

Can the same approach work where a deep neural network is involved though? Here, instead of feeding the image into a single neural network which must simultaneously extract features and weight those features in order to make the decision, chains of networks are involved. Outputs from one neural network feed into the next.

Sticking with our image recognition example, if we want to identify more than a few hand-written digits, a single neural network will soon become inaccurate if trained with a broad array of wildly differing images. If we want to identify a very broad array of features, a different approach is needed. The types of deep neural network that can beat humans at image recognition tasks tend to separate the image into smaller sections, which are fed into neural networks trained to detect edges. The results of the edge detection across overlapping areas are fed into a second set of networks which identify shapes and features (everything from “square” to “an eye”). Finally a third network takes the output from the features and uses them for final identification.
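The first stage, edge detection over small overlapping sections, can be illustrated with a sliding window. In a real deep network the numbers in the window are learned during training. Here a fixed, hand-written vertical-edge filter stands in for them, so this is a sketch of the mechanism and not of any specific system. As in most deep learning libraries, the filter is applied without flipping it.

```python
def convolve(image, kernel):
    """Slide a square kernel over every overlapping section of the image.

    Each output value is the weighted sum of one k x k section, so a
    large value means that section matches the pattern in the kernel.
    """
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(k) for j in range(k))
             for c in range(cols)]
            for r in range(rows)]


# Hand-written stand-in for a learned filter: responds to dark-to-light
# transitions running from left to right.
VERTICAL_EDGE = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

# A 4 x 4 image: dark on the left half, light on the right half.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

edges = convolve(image, VERTICAL_EDGE)

# A completely flat image contains no edges at all.
flat = [[1] * 4 for _ in range(4)]
no_edges = convolve(flat, VERTICAL_EDGE)
```

The output of a stage like this is itself a small image of ‘where the edges are’, which is what the next set of networks takes as its input.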

Counterintuitively, whilst the deep neural network is orders of magnitude more complex and more capable than the single neural network from our handwriting example, it is arguably easier to understand the decision process. The same technique of converting weightings between neurons into a heat map applies to each individual network within the deep neural net, but because later networks are working with concepts we can more readily understand (edges -> lines -> ellipse and circle -> eye -> two eyes near each other -> face -> human) the decision tends to be more explicable at later steps in the process.

Representing other types of data that might be used for AI decision making in the same way requires a bit more imagination. Applying heat maps to images is easy to visualise, but what about credit card application data, or patterns of results in blood tests, or actuarial data? The same approach works here too. Using the heat map approach to check that the system is performing as expected, and to unlock the ‘black box’, significantly helps to overcome these legal and regulatory challenges.
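For tabular data, the ‘pixels’ become named fields, and the heat map becomes a ranked list of which fields pushed the decision hardest. The sketch below assumes the simplest possible case, one weight per field and inputs already scaled to between 0 and 1. The field names and numbers are invented for illustration and do not come from any real scoring model.

```python
def rank_contributions(features, weights):
    """Rank fields by the size of their contribution (value x weight)."""
    contributions = {name: features[name] * weights[name] for name in features}
    return sorted(contributions.items(),
                  key=lambda item: abs(item[1]),
                  reverse=True)


# Hypothetical credit card applicant, with each field scaled to 0..1.
applicant = {"income": 0.4, "existing_debt": 0.9, "years_at_address": 0.2}

# Hypothetical weights: positive values favour approval, negative count against.
weights = {"income": 0.5, "existing_debt": -0.8, "years_at_address": 0.1}

ranking = rank_contributions(applicant, weights)
most_influential_field = ranking[0][0]
```

A ranking like this is what lets a reviewer ask the regulatory question directly: is the field at the top of the list one that the decision is permitted to rest on?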

This is never going to match the full explainability of a true ‘if this then that’ traditional computer program, but it is far from accurate to characterise the functioning of AI as entirely unknowable.

Explainability is at the top of the agenda for regulators and legislators alike when it comes to AI. In the UK, the Government’s very positive response to the House of Lords report ‘AI in the UK: Ready, willing and able?’ agreed with the vast majority of the recommendations. This included a commitment to transparency of algorithms, balanced with an acknowledgement of the difficulties of providing the same degree of explainability as traditional computer programs. The recent EU report ‘Ethics guidelines for trustworthy AI’ makes similar points. Amongst its seven key requirements for ethical AI, two (transparency and accountability) are directly concerned with the explainability of AI systems.

There will always be a place for a certain level of empirical testing of results to check that the basis of decisions aligns with expectations, but it is far from the only approach. In fact, with the ability to use techniques like those outlined above to map the decision making criteria in this way, deep neural networks are ultimately far more explicable than human decision making – where biases, intuitions, and random ‘gut feel’ are often the subject of retrospective justification on other grounds if the decision is ever challenged.

For now though, I’m comforted that whether in AI or in aviation, black boxes can aid in explainability. I just hope that the black box on the plane that’s carrying me right now isn’t needed…

For more on AI, machine learning, deep neural networks and the legal and regulatory issues that touch upon them, come along to DLA Piper’s European Tech Summit in London on 15th October.