BAILII grants Oxford University unprecedented access to case data for AI analysis in historic agreement

By Mark O'Conor & Imogen Palmer on February 10, 2021

On 14 December of last year, as something of an early Christmas present for those in the technology, artificial intelligence or dispute resolution space (and everything in between), BAILII (the British and Irish Legal Information Institute) agreed to give Oxford University access to its central online case law database – the largest dataset of its kind in England and Wales – in order to conduct AI analysis.

In real terms, this agreement will provide Oxford academics running AI for English Law – a project funded by the Industrial Strategy Challenge Fund and UK Research and Innovation government department that seeks to explore the potential and limitations of using AI to support legal services – with access to 400,000 judgments for the purpose of ‘unlocking new research insights into English case law’ as well as developing ‘novel research techniques that will improve access to legal information’.[1]

In the past, BAILII has strongly resisted allowing bulk downloads or database scraping, and both were prohibited in its standard user agreement amidst concerns that they could lead to the development of AI software that could predict the outcomes of cases – perhaps even based on which judge was hearing a case on a particular day. While the agreement with Oxford University does not permit the use of BAILII data for this particular purpose, this is the first time that BAILII has allowed its database to be downloaded for natural language processing analysis (the sub-type of AI being used).

This partnership represents a key milestone for England and Wales in its wider move to open up and digitise the legal world with the use of AI, ‘justice data’ and LegalTech more generally. This is already happening with varying degrees of success in other jurisdictions. ROSS – an AI legal research company in the USA/Canada – was an encouraging example of this, but recently announced its closure owing to a copyright infringement lawsuit brought against it by Thomson Reuters and Westlaw in respect of access to and use of their content in breach of their licence agreements.[2] In contrast, this formal arrangement between BAILII and Oxford University was heavily negotiated over the course of a year with the involvement of both the judiciary and the Ministry of Justice, which gives it credibility that may lead to its framework being replicated in future arrangements of a similar nature.

Looking ahead, with the increased use of AI in the particularly sensitive area of justice, it is crucial that AI is used in a considered and carefully monitored way. To the extent that AI is one day used to assist a judge in the courtroom, clear controls must be hard-wired into any AI product to ensure transparency in any decision-making and the ability to easily audit the AI, so as to avoid a ‘black box’ situation where it is unclear how the AI drew a particular conclusion about a judgment. Suitable redress that is easily understood by adversely affected parties (such as laypersons) will also be paramount.

AI for English Law

The Oxford University/BAILII partnership is just one facet of the AI for English Law project, specifically ‘Work Package 2: Mapping Digital Justice’, which sits alongside five other Work Packages covering: (i) new business models in legal services, (ii) frontiers of AI in legal reasoning, (iii) coordinating skill investment and tech transfer, (iv) law and technology education and (v) mapping the LawTech and innovation ecosystem. The most recent report from the Work Package 2 research stream – ‘Building a Justice Data Infrastructure’[3] – was published in October last year.

The report sets out an ‘overarching governance blueprint’ of the legal opportunities and constraints (including the protection of fundamental rights of data subjects) involved in creating a robust data infrastructure in the context of the HM Courts and Tribunals Service (HMCTS) digital reform programme. Its analysis is structured around the stages of research-related data processing (collection, preparation and linkage, access, and retention/re-use), with a series of recommendations for each stage. Recommendations of particular interest include:

Principles: research use of justice data must be responsible and, akin to existing principles under data protection and human rights law, adhere to principles of proportionality, security, and transparency
User classification: access to justice data for research and evaluation purposes should be granted based on whether a project serves the ‘public interest’, acknowledging that even commercial/non-independent research may serve the public interest (as well as private interests of a proprietary nature) and so should be permitted access in that case (something that HMCTS does not currently allow)
Data collection:
- an outward looking data catalogue that identifies available datasets for research and evaluation should be produced and used to encourage third parties to explore different avenues of research
- there should be engagement with stakeholders in respect of data that will be collected and implementation of any recommendations
Data preparation and linkage:
- datasets should be prepared, curated and maintained for research and evaluation with a long-term vision, striving for high data quality to maximise datasets’ analytical potential in accordance with the FAIR principles[4]
- collaboration with a trusted intermediary (or trusted-third-party) with experience in securely de-identifying justice system users and linking datasets should be considered
Data access:
- a governance body should be established – whether a new one or through expanding the remit of the HMCTS Data Access Panel – to ensure adherence to policy/legal compliance as well as the operational aspects of data sharing
- public engagement should be a core facet of data access policy with transparency around the purposes of the data sharing and processes employed by the data infrastructure to indicate the appropriateness in sharing such data
Data retention and re-use:
- a ‘retain and re-use’ policy should be used so as to ensure the proportionate use of justice data and to minimise the risk of interference with the data protection and privacy rights of the data subjects
- datasets should be pseudonymised with the separation of content data (e.g. the outcome of a case) from demographic data (i.e. identifiable data such as names, addresses and dates of birth).[5]
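
To make the last of these recommendations more concrete, the following is a minimal sketch of how content data might be separated from demographic data and linked only by a pseudonym. It is an illustration only and is not taken from the report: the field names (case_id, name, address, date_of_birth, outcome), the pseudonymise function and the use of a keyed hash are all assumptions made for the purpose of the example.

```python
import hashlib
import hmac

# Hypothetical field names, chosen for illustration only.
IDENTIFYING_FIELDS = {"case_id", "name", "address", "date_of_birth"}


def pseudonymise(records, secret_key):
    """Split each record into a content row and a demographic row.

    The two rows share only a pseudonym, derived here from the case
    identifier with a keyed hash (HMAC-SHA256). Without the secret key
    and the demographic table, a content row cannot readily be traced
    back to an individual.
    """
    content_rows = []
    demographic_rows = []
    for record in records:
        digest = hmac.new(
            secret_key, record["case_id"].encode("utf-8"), hashlib.sha256
        ).hexdigest()
        pseudonym = digest[:16]  # shortened for readability in this sketch

        # Identifiable data (names, addresses, dates of birth and so on).
        demographic = {k: v for k, v in record.items() if k in IDENTIFYING_FIELDS}
        # Everything else, such as the outcome of a case.
        content = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}

        content_rows.append({"pseudonym": pseudonym, **content})
        demographic_rows.append({"pseudonym": pseudonym, **demographic})
    return content_rows, demographic_rows


if __name__ == "__main__":
    example = [
        {
            "case_id": "EX-0001",  # invented example data
            "name": "Jane Doe",
            "address": "1 High Street",
            "date_of_birth": "1980-01-01",
            "outcome": "claim allowed",
        }
    ]
    content, demographic = pseudonymise(example, secret_key=b"replace-with-a-real-secret")
    print(content)
    print(demographic)
```

In an arrangement of this kind, researchers would typically work with the content table alone, while the demographic table and the key would be held separately (for example, by a trusted intermediary of the sort the report describes). Pseudonymised data of this type generally remains personal data under data protection law, so this step reduces risk rather than removing it.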

It is notable that these recommendations draw (whether directly or indirectly) on key guidelines and macro-level principles that we have seen published to date on the use of AI, such as the European Commission’s White Paper on AI (which also advocates for the use of the FAIR Principles[6]) and the Information Commissioner’s AI Auditing Framework[7], as well as, at a more international level, the OECD AI principles[8]. All of these sources have common ground in that they advocate for careful consideration of the following areas when deploying AI:

training data
data and record keeping
user transparency (including engagement and information provided to these users)
robustness, accuracy and security
human oversight
human values and fairness
responsible governance
accountability

The AI for English Law recommendations therefore represent a tangible and fascinating example of these high-level principles now being converted into concrete recommendations and, ultimately, government policy and legislation. DLA Piper is monitoring these developments, as well as the introduction of new AI principles and publications at an international level, and can assist any organisations looking to ensure they deploy and contract for AI in a mature way that is cognizant of this existing body of guidance.

Please contact the authors or your regular DLA Piper contact for any questions you may have about deploying and contracting for artificial intelligence or harnessing data (including justice data).

[1] https://www.law.ox.ac.uk/news/2020-12-08-ai-english-law-project-granted-unprecedented-access-british-and-irish-legal

[2] https://www.lawgazette.co.uk/news/legal-research-pioneer-to-close-ai-site-after-copyright-challenge/5106792.article

[3] https://www.law.ox.ac.uk/sites/files/oxlaw/ukri_justice_data_report_fv_0.pdf

[4] These are: Findability, Accessibility, Interoperability and Re-usability. https://www.nature.com/articles/sdata201618

[5] https://www.law.ox.ac.uk/sites/files/oxlaw/ukri_justice_data_report_fv_0.pdf

[6] https://ec.europa.eu/info/files/white-paper-artificial-intelligence-european-approach-excellence-and-trust_en

[7] https://ico.org.uk/about-the-ico/news-and-events/ai-auditing-framework/

[8] https://www.oecd.org/going-digital/ai/principles/