SHIP Safe Havens

The Data Sharing Review report conducted for the UK Ministry of Justice in 2008 called for the greater use of ‘safe havens’ to permit “important statistical and research analysis to proceed, while minimising the risk of identifying individuals from within datasets”.  This report defined a safe haven as “[an] environment for population based research and statistical analysis in which the risk of identifying individuals is minimised”.  In other words, a safe haven is a place where research can be done on sensitive data, with the risk of disclosure reduced by controlling who can have access, what data they can analyse and what outputs can be taken away.

A safe haven may be accessed physically or virtually, but a key characteristic is that researchers can take away only their results, and even those results are screened to ensure that no disclosive data are released.  At present the SHIP Safe Haven must be accessed physically.

The SHIP Safe Haven

SHIP has established a Safe Haven for use by researchers accessing data using SHIP.  There are three key features of the Safe Haven:

  1. The Safe Haven will provide a secure environment for the linkage, storage and analysis of personal data.
  2. Only ‘approved researchers’ will be permitted to access the data from defined physical locations, initially via dumb terminals (i.e. within safe havens).
  3. There will be penalties for anyone who abuses personal data. Researchers will be bound by a strict code, which prohibits disclosure of any personal identifying information. Safe havens will carry out statistical disclosure control on outputs to prevent accidental disclosure.

The Safe Haven will hold the linked datasets and ensure that only approved researchers can gain access.  Researchers will access the data held within the Safe Haven via a dumb terminal in a secure access facility.  Analytical software will be available within the Safe Haven for use by researchers.  The dumb terminals will be configured so that the researcher cannot download or remove any of the data or outputs held at the Safe Haven.  A dedicated file space will be provided for the researcher to store their outputs pending release by the Safe Haven.

Using a Safe Haven to reduce privacy risks

  • A researcher may choose to specify that they wish to use a SHIP Safe Haven to access patient data in order to reduce the privacy risk category that is assigned to their application and thus make it more likely to gain approval.

  • Alternatively, data custodians themselves will have an opportunity to set this as a condition of permitting access to data for which they are data controllers, if they assess that the risk to confidentiality of transferring the data directly to the researcher is too great.

  • Finally, if the data access request is allocated a higher risk category by the SHIP Research Coordinator and sent to the Privacy Advisory Committee (PAC) for consideration, PAC may set use of the SHIP Safe Haven as a condition of authorising the data access application.

Safe Haven security principles

The following security principles must be adhered to when accessing data through a SHIP Safe Haven:

  • The data controller(s) must be defined at each stage of the process and that controller must be aware of their responsibility.

  • An indexing service should never receive any information about the patient/client/research subject other than the required identifiers.

  • All information must be encrypted before transmission between data controllers, safe havens, indexing and linkage services.

  • All data in the safe haven are held on secure servers located on the NHS network.

  • Data should be de-identified where possible.

  • Where the safe haven must hold data with identifiers it must be held on separate servers from de-identified data.

  • All data processes are carried out within secure offices, on the NHS network, prior to secure releases to external data users.

  • The safe haven must check that all approvals are in place before receiving data for processing.

  • A record of all projects, approvals and data releases is to be kept on a Project Management System.

  • All processes are audited annually by external auditors, and action is taken in response to any issues raised.

Conditions upon a user’s visit

The user will perform the analysis without:

  • Attempting to remove data;
  • Attempting to identify individuals, households, or firms;
  • Using data for which they are not licensed;
  • Using data for anything other than the proposed project;
  • Linking/matching data without permission;
  • Handing out usernames and passwords to others;
  • Writing anything down from the screen;
  • Attempting to photograph the screen, etc.

The session will be recorded.

Data indexing service

SHIP has established a separate National Indexing Service to facilitate deterministic linkage of datasets. For each of the datasets that are to be linked, an indexing service receives only a project code, local identifiers and subject identifiers from the data sources, and no other data. The indexing service creates a study-specific anonymised identifier for each subject (called a study number) and returns this with the associated project code and local identifier.
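The exchange described above can be illustrated with a minimal Python sketch. It is not SHIP's implementation: the class name `IndexingService`, the in-memory dictionary, the random-token study numbers and all identifiers shown are assumptions made purely for illustration.

```python
import secrets


class IndexingService:
    """Toy indexing service: it sees identifiers only, never clinical data."""

    def __init__(self):
        # Hypothetical in-memory store: (project_code, subject_id) -> study number
        self._study_numbers = {}

    def index(self, project_code, records):
        """Index one data source.

        `records` is an iterable of (local_id, subject_id) pairs.
        Returns (project_code, local_id, study_number) rows; the subject
        identifier is deliberately left out of the returned rows.
        """
        rows = []
        for local_id, subject_id in records:
            key = (project_code, subject_id)
            if key not in self._study_numbers:
                # Random token (an assumption), so the study number is not
                # derived from the subject identifier
                self._study_numbers[key] = secrets.token_hex(8)
            rows.append((project_code, local_id, self._study_numbers[key]))
        return rows


# Made-up identifiers: the same subject appears in two data sources
service = IndexingService()
hospital = service.index("P001", [("H-1", "SUBJ-A"), ("H-2", "SUBJ-B")])
registry = service.index("P001", [("R-9", "SUBJ-A")])
```

In this sketch the subject who appears in both sources receives the same study number in `hospital` and `registry`, which is what later allows the two datasets to be matched without either row carrying the subject identifier.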

The SHIP indexing service will maintain a population index based on a unique patient identifier (UPI; e.g. the Community Health Index (CHI) in Scotland). The indexing service will add anonymised identifiers (referenced to UPI) to individual records for the purposes of linking these records across two or more datasets. The indexing service will be separate from the linkage agent.

If you are having your research dataset directly transferred to you, you can still make use of the indexing service. Once the data has been indexed you will be sent the anonymised identifier.

Data linkage service

The SHIP linkage agent will use anonymised identifiers to perform the matching of records belonging to individuals from two or more datasets to form a single linked dataset. The identifiers for the linkage will be provided by the indexing service.

The linkage agent will carry out linkages involving either data from multiple Health Boards or data held at national level.  The Indexing Service supplies the linkage key so that study numbers can be matched across datasets, as described above. The Linkage Agent does not receive identifiable information (e.g. names, addresses or CHI numbers). The Linkage Agent uses the linkage key received from the Indexing Service to join datasets for the study and deposits the linked dataset in a separate area of the Safe Haven. The Safe Haven is the Data Controller for both the received datasets and the newly created linked dataset.
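As a rough illustration of the join performed by the Linkage Agent, the following Python sketch links two de-identified datasets on their study numbers. The function name `link_datasets`, the data layout, the example values and the choice to keep only subjects present in both datasets are assumptions for illustration, not a description of the SHIP linkage software.

```python
def link_datasets(dataset_a, dataset_b):
    """Join two de-identified datasets on their shared study number.

    Each dataset maps study_number -> dict of non-identifying fields.
    Only subjects present in both datasets are kept (an inner join,
    chosen here as an assumption).
    """
    linked = {}
    for study_number in dataset_a.keys() & dataset_b.keys():
        # Merge the fields from both sources under the common study number
        linked[study_number] = {**dataset_a[study_number], **dataset_b[study_number]}
    return linked


# Made-up study numbers and fields; no names, addresses or CHI numbers
admissions = {"a1f3": {"admissions": 2}, "9bc0": {"admissions": 1}}
prescriptions = {"a1f3": {"prescriptions": 7}, "77de": {"prescriptions": 3}}
linked = link_datasets(admissions, prescriptions)
```

Here only study number `a1f3` appears in both inputs, so `linked` holds a single combined record for that subject.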

Separation of responsibilities

To ensure high levels of information security and the protection of subject confidentiality, the storage of contributory datasets, indexing, linkage of data, and storage of the final dataset must be carried out separately.

  • In practice this means that no individual should be directly involved in any more than one of these processes, but a single organisation could host more than one activity with appropriate segregation of roles and IT facilities.

  • The Indexing Service will be ‘stand alone’, because this is the only function for which patient identifiers are required.

The Safe Haven will be responsible for the remainder of the processes, which use anonymised data: linkage of data, provision of analytical software, the separate storage of the source and linked datasets and the analytical outputs. The Safe Haven will also be responsible for the implementation of other safeguards, including statistical disclosure control, accreditation of researchers (as part of a central register of approved researchers) and adherence to the good governance framework.

Statistical Disclosure Control and Archiving

The aim of statistical disclosure control (SDC) is to prevent someone who is reading research outputs from finding out confidential information.  The SHIP Safe Haven is responsible for undertaking SDC prior to release of analytical outputs to researchers.  This will be done by appropriately trained employees of the safe haven.  Once the output is deemed safe it will be sent to the researcher electronically.

The level of disclosure control required will vary between studies. It is the responsibility of the data controllers for the contributory datasets and the Caldicott Guardians to decide upon the appropriate level of disclosure control at the beginning of the project before the datasets are linked and access is provided to the researcher.
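One common SDC technique is small cell suppression, in which counts below an agreed threshold are withheld from released tables. The Python sketch below illustrates the idea only; the function name `suppress_small_cells`, the threshold of 5 and the example counts are assumptions, and the source does not state which SDC methods or thresholds the SHIP Safe Haven applies.

```python
def suppress_small_cells(table, threshold):
    """Replace counts below `threshold` with None (suppressed).

    `table` maps a cell label to a count. The threshold is agreed per
    study; the value used below is only an illustrative assumption.
    """
    return {
        cell: (count if count >= threshold else None)
        for cell, count in table.items()
    }


# Made-up counts for a one-way frequency table
counts = {"under 40": 132, "40 to 64": 57, "65 and over": 3}
safe = suppress_small_cells(counts, threshold=5)
```

In this example the "65 and over" cell is suppressed and the other two counts pass through unchanged. A real check would also need to consider whether suppressed values can be recovered from row or column totals, which this sketch does not attempt.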

The Safe Haven will also provide an archiving service for all linked datasets so that researchers can return to the dataset for an agreed, specified period of time following the initial analysis. While an extension to this period may be easily arranged, the analysis must still be covered by the original research application. If it is not, another application must be submitted.
