National Data Opt-Outs

Background

The national data opt-out applies to the disclosure of confidential patient information for purposes beyond individual care across the health and adult social care system in England.

The national data opt-out does not apply to the OpenSAFELY COVID-19 Service or the OpenSAFELY Data Analytics Service.

In certain limited circumstances, an OpenSAFELY Data Analytics Service project may wish to apply the national data opt-out, even though the service operates under an exemption to the national data opt-out policy. This page describes the technical implementation for projects that require it.

Technical details

The list of patients with an active national data opt-out

The system suppliers provide a list of pseudonymous IDs for patients with an active national data opt-out. Each system supplier populates it according to the policy agreed with NHS England. The list is stored in the secure database alongside the rest of the patient data, as a single bespoke table containing a single list of pseudonymous IDs and no other information.

How is permission to access national data opt-out data determined?

The list of projects with access to national data opt-out data is embedded in the platform's public codebase, rather than being stored in a database. This is an unusual step from an engineering standpoint, but it means that any change to the list is automatically included in the public audit log of code changes. The list is also automatically covered by our code protection rules, which require independent sign-off by another developer for all code changes.

The project permissions file and its history of changes are publicly available on GitHub.
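As a rough illustration of what a code-embedded permissions list can look like, here is a minimal Python sketch. It is not the platform's actual permissions file: the names `PROJECTS_WITH_OPT_OUT_ACCESS` and `project_has_opt_out_access`, and the placeholder project identifier, are all hypothetical.

```python
# Hypothetical sketch only: the names and the identifier below are
# illustrative assumptions, not taken from the real permissions file.

# Projects permitted to access national data opt-out data.
# Because this is a constant in source code, every change to it goes
# through code review and appears in the public commit history.
PROJECTS_WITH_OPT_OUT_ACCESS = frozenset(
    {
        "example-project-1",  # placeholder identifier
    }
)


def project_has_opt_out_access(project_id: str) -> bool:
    """Return True only if the project is explicitly listed."""
    return project_id in PROJECTS_WITH_OPT_OUT_ACCESS
```

Keeping the list as an immutable constant means there is no runtime path for altering permissions; the only way to change them is a reviewed code change.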

How is permission to access national data opt-out data enforced?

In OpenSAFELY, researchers do not have direct access to the data. Instead, they describe the data they require using ehrQL, our Electronic Health Record Query Language, and ehrQL is responsible for fetching it.

At the point where ehrQL needs to fetch the data, it is told (by the system described above) whether it should include data from opted-out patients or not.

Every ehrQL query contains a "population definition" which specifies exactly which criteria a patient must meet to be included in the result, e.g. "patients between the ages of 18 and 65 who have not recently changed GP practice". Unless a project is named in the project permissions file, ehrQL will automatically add an extra condition to this population definition: the patient's pseudonymous ID number must not appear in the list of opted-out patients' ID numbers provided by the system supplier.

Again, the code which enforces this is publicly available on GitHub.
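The effect of the extra population condition can be sketched in plain Python. This is a simplified illustration, not ehrQL's actual implementation. It assumes, as the section heading on the supplier list states, that the list holds the IDs of patients with an active opt-out. The function `restrict_population`, the dictionary-based patient records, and the `include_opted_out` flag are all hypothetical names introduced for this example.

```python
from typing import Callable, Dict, Set

# Hypothetical types for illustration: a patient is a plain dict and a
# population definition is a predicate over a patient record.
Patient = Dict[str, int]
Population = Callable[[Patient], bool]


def restrict_population(
    population: Population,
    opted_out_ids: Set[int],
    include_opted_out: bool,
) -> Population:
    """Return the population definition, adding the opt-out condition
    unless the project is permitted to include opted-out patients."""
    if include_opted_out:
        return population

    def restricted(patient: Patient) -> bool:
        # Original criteria AND the patient is not on the opt-out list.
        return population(patient) and patient["patient_id"] not in opted_out_ids

    return restricted


# Made-up example data.
patients = [
    {"patient_id": 1, "age": 30},
    {"patient_id": 2, "age": 40},
    {"patient_id": 3, "age": 70},
]
opted_out = {2}


def working_age(patient: Patient) -> bool:
    return 18 <= patient["age"] <= 65


default_filter = restrict_population(working_age, opted_out, include_opted_out=False)
permitted_filter = restrict_population(working_age, opted_out, include_opted_out=True)

default_ids = [p["patient_id"] for p in patients if default_filter(p)]
permitted_ids = [p["patient_id"] for p in patients if permitted_filter(p)]
```

In this sketch the condition is wrapped around the researcher's own population definition, so a researcher cannot omit it; whether it applies depends only on the flag supplied by the platform.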