Anonymisation

Why do I need to know about anonymisation?

  • In the absence of consent to use patient identifiable data in health research, it will often be possible to anonymise the data and thus enable its use without consent and without having to comply with the requirements of the Data Protection Act 1998.  Anonymisation of data is also one means by which ethical concerns about failing to respect the confidentiality of patients and their autonomous wishes to decide what is done with their personal data may be addressed. 

  • However, you should bear in mind that it is often technically challenging to achieve absolute anonymisation of data. As explained throughout this toolkit, consent and anonymisation are not the only possible routes to lawful, ethical research uses of patient data; approval by the relevant authorisation body may be a viable alternative.

What is anonymisation?

The process of anonymisation involves the removal of personal identifiers from a dataset to minimise the risk of disclosure. Data which is ‘truly anonymised’ contains no information that could reasonably be used, by anyone, to identify the individual whose data it is. Data may be anonymised by, for example:

  • Removing direct identifiers, e.g. name or address;
  • Aggregating or reducing the precision of information or a variable, e.g. replacing date of birth by age groups;
  • Generalising the meaning of detailed text, e.g. replacing a doctor’s detailed area of medical expertise with an area of medical speciality;
  • Using pseudonyms;
  • Restricting the upper or lower ranges of a variable to hide outliers, e.g. top-coding salaries.
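
As a rough illustration of three of the techniques listed above (removing direct identifiers, reducing the precision of date of birth, and top-coding), the following Python sketch may help. The field names, the ten-year age bands and the salary cap are assumptions made for the example, not values recommended by this toolkit, and applying these steps does not by itself guarantee that the data is truly anonymised.

```python
from datetime import date

# Hypothetical record layout; the field names are illustrative only.
DIRECT_IDENTIFIERS = {"name", "address"}
SALARY_CAP = 100_000  # assumed top-coding threshold


def age_band(date_of_birth: date, on: date, width: int = 10) -> str:
    """Replace an exact date of birth with a coarse age group."""
    # Subtract one year if the birthday has not yet occurred in the year of `on`.
    age = on.year - date_of_birth.year - (
        (on.month, on.day) < (date_of_birth.month, date_of_birth.day)
    )
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def anonymise(record: dict, on: date) -> dict:
    """Return a copy with direct identifiers removed, age banded and salary top-coded."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["age_band"] = age_band(out.pop("date_of_birth"), on)
    out["salary"] = min(out["salary"], SALARY_CAP)
    return out


record = {
    "name": "Jane Doe",
    "address": "1 Example Street",
    "date_of_birth": date(1980, 6, 15),
    "salary": 250_000,
    "diagnosis": "asthma",
}
anonymised = anonymise(record, on=date(2024, 1, 1))
```

Here `anonymised` keeps the diagnosis but carries only an age band and a capped salary; whether that is sufficient depends on what other information remains in the dataset.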

What is pseudonymisation?

Data which is pseudonymised is anonymous to the people who receive and hold it (i.e. the research team), but it contains information or codes which would allow others (e.g. the data controller) to identify an individual from it. Essentially, the normal personal identifiers (such as name, address or NHS number) are replaced by an artificially created identifier so as to conceal the identity of the individual. The links between the artificial and the normal identifiers should be stored securely and separately from the pseudonymised data.

Therefore, although the research team to whom the data is anonymous may not have to act in accordance with the Data Protection Act 1998, those who know the code to identify the data will still need to comply with the legislation.

The use of pseudonymised data is common in research.
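
The replacement of normal identifiers by an artificial one can be sketched as follows. This is a minimal illustration only: the function name, the field names and the use of random hexadecimal codes are assumptions, and the sample NHS numbers are made up. The point it shows is that the link table is produced as a separate object, to be held by the data controller and not passed to the research team.

```python
import secrets


def pseudonymise(records, id_field="nhs_number", drop_fields=("name", "address")):
    """Split records into research data and a separately held link table."""
    link_table = {}     # real identifier -> pseudonym; stays with the data controller
    research_data = []  # what the research team receives
    for record in records:
        real_id = record[id_field]
        # Reuse the same pseudonym for every record of the same individual.
        if real_id not in link_table:
            link_table[real_id] = secrets.token_hex(8)
        cleaned = {
            k: v for k, v in record.items()
            if k != id_field and k not in drop_fields
        }
        cleaned["pseudonym"] = link_table[real_id]
        research_data.append(cleaned)
    return research_data, link_table


records = [
    {"nhs_number": "0000000001", "name": "Jane Doe",
     "address": "1 Example Street", "diagnosis": "asthma"},
    {"nhs_number": "0000000002", "name": "John Roe",
     "address": "2 Example Street", "diagnosis": "diabetes"},
    {"nhs_number": "0000000001", "name": "Jane Doe",
     "address": "1 Example Street", "diagnosis": "eczema"},
]
research_data, link_table = pseudonymise(records)
```

Random codes are used here so that the pseudonym cannot be recomputed from the real identifier; anyone holding `link_table` can still re-identify the records.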

What are the benefits of anonymisation?

There are many examples of the benefits derived from analysis of anonymous health care data for research purposes in Scotland. This research has increased the knowledge base, helped to improve health outcomes, informed the effectiveness, efficiency and safety of the health services we offer and influenced international medical best practice.

Generally, information can be used more freely if the subject of the information is not identifiable in any way. Although good governance practice dictates that safeguards should still be in place to prevent inappropriate use of even anonymous information, consent is not legally required for the use of anonymous data. In addition, if you are able to use anonymised data in your research project, your use of the data will not be governed by the Data Protection Act 1998: s1 of the Act provides that the Act applies only to personal data, and data can only be personal data if it is ‘identifiable’.

It should be remembered, however, that the anonymisation process is itself ‘processing’ for the purposes of the Data Protection Act 1998, and therefore the Act must be complied with while this process is being carried out.

However, it is recognised that it will not always be appropriate to use anonymised data, as some research projects will require richer datasets, the richness of which would be lost by the anonymisation process. If this is the case, you will need either to obtain consent from individuals to use their personal data or to obtain authorisation from the relevant authorising body for the use of the data.

Identifying individuals from anonymous information?

With both anonymised and pseudonymised data, it is sometimes possible to identify an individual through combinations of information.

The most common potential identifiers are:

  • Rare disease or treatment
  • Partial address
  • Place of treatment
  • Rare occupation or place of work
  • Combinations of birth date, ethnicity, place of birth, date of death, etc.

When seeking to access data and when considering the legal obligations you must comply with, you must consider whether the data itself contains combinations of information which could lead to individuals being identified, or whether you possess other data sets which, when combined with the new dataset, could produce identifiable information. If the data does become identifiable then you will be bound by the obligations contained in the Data Protection Act 1998.
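
One simple way to look for risky combinations is to count how many records share each combination of potentially identifying values and to flag combinations held by very few records. The sketch below is an illustration only: the technique (a k-anonymity-style count), the function name, the field names and the threshold `k` are assumptions introduced here and are not prescribed by this toolkit. It also cannot account for other datasets you may hold.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=5):
    """Return combinations of values shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


records = [
    {"age_band": "40-49", "place_of_treatment": "Hospital A", "occupation": "teacher"},
    {"age_band": "40-49", "place_of_treatment": "Hospital A", "occupation": "teacher"},
    {"age_band": "40-49", "place_of_treatment": "Hospital A", "occupation": "lighthouse keeper"},
]
# With k=2, any combination that occurs only once is flagged.
risky = small_groups(records, ["age_band", "place_of_treatment", "occupation"], k=2)
```

In this made-up example the single record with a rare occupation is flagged, even though no direct identifier is present. A clean result from a check like this is not proof that the data is anonymous.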

Anonymised data and ethical review?

Authorisation is needed for the use of even anonymised data; this will be facilitated by the SHIP Research Co-ordinator. However, the use of anonymised data will be much easier to justify, both legally and ethically, than the use of identifiable data.

In rare circumstances, and from some ethical perspectives, patients may be seen to retain an interest in determining the kind of uses to which particularly sensitive and significant personal information is put, even when it has been effectively anonymised and cannot be traced to them.

SHIP and anonymisation

Often the data custodian will transfer data which has already been anonymised to the Safe Haven or directly to the researcher. In addition, where a research project requires linked datasets, SHIP has established a SHIP indexing service to facilitate deterministic linkage of datasets. This service will maintain a population index based on a unique patient identifier (UPI), e.g. the Community Health Index (CHI) in Scotland. The indexing service will add anonymised identifiers (referenced to UPI) to individual records for the purposes of linking these records across two or more datasets. The indexing service will be separate from the linkage agent.

A SHIP linkage agent will then use the anonymised identifiers to perform the matching of records belonging to individuals from two or more datasets to form a single linked dataset. The identifiers for the linkage will be provided by the indexing service.
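
The division of labour between the indexing service and the linkage agent can be sketched roughly as follows. This is not SHIP's implementation: the class and function names, the field names, the random identifiers and the made-up CHI values are all assumptions, and the sketch assumes at most one record per individual in each dataset. It is meant only to show that the linkage agent matches on the anonymised identifier and never sees the UPI.

```python
import secrets


class IndexingService:
    """Hypothetical indexer: holds the population index (UPI -> anonymised identifier)."""

    def __init__(self):
        self._index = {}

    def add_link_ids(self, records, upi_field="chi"):
        """Return copies of the records with the UPI replaced by an anonymised identifier."""
        out = []
        for record in records:
            upi = record[upi_field]
            if upi not in self._index:
                self._index[upi] = secrets.token_hex(8)
            stripped = {k: v for k, v in record.items() if k != upi_field}
            stripped["link_id"] = self._index[upi]
            out.append(stripped)
        return out


def link(dataset_a, dataset_b):
    """Hypothetical linkage agent: exact (deterministic) match on link_id only."""
    by_id = {r["link_id"]: r for r in dataset_b}
    linked = []
    for a in dataset_a:
        b = by_id.get(a["link_id"])
        if b is not None:
            linked.append({**a, **b})
    return linked


hospital = [
    {"chi": "0101010001", "admission": "2020-03-01"},
    {"chi": "0101010002", "admission": "2021-07-15"},
]
prescribing = [
    {"chi": "0101010002", "drug": "salbutamol"},
    {"chi": "0101010003", "drug": "metformin"},
]

index = IndexingService()
linked = link(index.add_link_ids(hospital), index.add_link_ids(prescribing))
```

Only the individual present in both datasets appears in `linked`, and the resulting record contains the anonymised identifier but no CHI.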

SHIP Guiding Principles and Best Practice

SHIP Guiding Principles and Best Practices reflect the values which underpin the SHIP project. They are designed to act as a guide for all those involved in SHIP and data sharing. You as a researcher or data custodian should be aware of these guiding principles and best practices as they provide useful guidance as to the standards of information governance promoted and expected by SHIP.

Principles
  • Researchers should normally only have access to anonymised data and be subject to an obligation not to attempt to re-identify individual data subjects.
  • Where possible and practicable, data should be anonymised before linkage and use so as to minimise risk of re-identification of individuals.
  • Where researchers cannot or do not intend to anonymise data and where consent for use of personal data has not been obtained, approval from an oversight body, e.g. Privacy Advisory Committee, must be obtained.
  • Where data have been anonymised, authorisation should be obtained where there is a risk of re-identification; anonymisation does not remove the need for authorisation.
  • Risk of re-identification must be assessed by a body/individual with the relevant expertise to make such judgments.
  • Data controllers should determine and agree upon the appropriate level of anonymisation to be applied to any given dataset or linkage exercise.

Best Practice
  • The appropriate level of anonymisation for each linkage should be agreed upon by all data sources and maintained by the linker, i.e. the individual/programme responsible for combining data.
  • Where possible and practicable, data subjects should be provided with accurate information about the levels of protection afforded to their data by anonymisation as well as an account of the real risks involved.
  • There should be a separation of functions between data controllers, safe havens, linkers, indexers and recipients of linked datasets.
  • All users of data should have signed a Memorandum of Understanding with respect to data storage, use and protections of data subjects.
