Identifiability of data – guidance

Guidance on the use of data, including potentially identifiable and de-identified data, and on options for linking data.

Data types

Identified data

Identified data are data that allow a specific individual to be identified. Identifiers may include:

  • the individual’s name
  • the individual’s date of birth
  • the individual’s address
  • unencrypted NHI numbers
  • or, in particularly small sets of data, information such as a postcode.

Potentially identifiable (key-coded, re-identifiable) data

  • Data are potentially identifiable if it is possible to infer an individual’s identity from them.
  • Key coding is the technique of separating identifying data from substantive data. It maintains a potential link by assigning an arbitrary code number to each individual’s identifying data, then removing the identifiers to leave only the substantive data and the code number.
  • Key codes are held securely and separately. They allow substantive data to be re-associated with the identifying data under specified conditions.
  • Data may be single-coded or double-coded for extra security.
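
The key-coding steps above can be sketched in code. This is a minimal illustration only, not a prescribed implementation: the field names, the function names `key_code` and `re_identify`, and the use of a random hexadecimal string as the arbitrary code are all assumptions made for the example. In practice the key table would be stored securely and separately from the substantive data.

```python
import secrets

# Assumed field names for identifiers; a real dataset would define its own.
IDENTIFIER_FIELDS = ("name", "date_of_birth", "address", "nhi")


def key_code(records, identifier_fields=IDENTIFIER_FIELDS):
    """Split records into substantive rows and a separate key table."""
    substantive = []
    key_table = {}
    for record in records:
        # The code is random, so it carries no information about the person.
        code = secrets.token_hex(8)
        while code in key_table:
            code = secrets.token_hex(8)
        # The key table holds the identifiers, indexed by the code.
        key_table[code] = {f: record[f] for f in identifier_fields if f in record}
        # The substantive row keeps everything else, plus the code.
        row = {f: v for f, v in record.items() if f not in identifier_fields}
        row["code"] = code
        substantive.append(row)
    return substantive, key_table


def re_identify(row, key_table):
    """Re-associate a substantive row with its identifiers via the key table."""
    rest = {k: v for k, v in row.items() if k != "code"}
    return {**key_table[row["code"]], **rest}
```

Anyone holding only the substantive rows sees the code and the substantive data. Re-identification requires access to the key table, which is why the keys are held under separate control.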

Partially de-identified (AIDS-type code) data

  • Data coded with abbreviated identifiers (eg, initials, date of birth, sex) are used for reporting AIDS, HIV and some other health conditions.
  • The reporting clinician can re-identify the data, but the data are anonymous to recipients. Recipients can still link duplicate reports that carry the same code. See ‘Options for linking data’ for more information.
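
As a rough sketch of how an abbreviated code lets recipients link duplicates without seeing names, the example below builds a code from initials, date of birth and sex and groups reports that share it. The code layout and the function names `abbreviated_code` and `group_duplicates` are hypothetical; actual reporting schemes define their own formats.

```python
from collections import defaultdict


def abbreviated_code(given_name, family_name, date_of_birth, sex):
    """Build a hypothetical abbreviated code: initials, date of birth, sex."""
    initials = (given_name[:1] + family_name[:1]).upper()
    return f"{initials}-{date_of_birth}-{sex.upper()}"


def group_duplicates(reports):
    """Return the reports that share a code, keyed by that code."""
    groups = defaultdict(list)
    for report in reports:
        groups[report["code"]].append(report)
    # Keep only codes that appear on more than one report.
    return {code: rows for code, rows in groups.items() if len(rows) > 1}
```

Two different people can share the same initials, date of birth and sex, so a matching code suggests a duplicate but does not prove one.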

De-identified (not re-identifiable, unlinked) data

  • When all identifiers have been removed permanently, data are considered to be ‘de-identified’. Note: Some documents use the term ‘de-identified’ to refer to data from which only names have been removed. However, such data remain ‘potentially identifiable’.
  • De-identified data include data containing encrypted NHI numbers.

Anonymous data

  • Anonymous data have been collected without personal identifiers, and no personal identifier can be inferred from them.

Options for linking data

De-linked

  • De-linked data are irreversibly stripped of their identifiers.
  • The data have maximum confidentiality.
  • However, this means there is no way of identifying participants if an adverse condition is found or if participants want to withdraw their consent.

Linked but de-identified

  • Linked but de-identified data protect confidentiality while allowing participants to be re-contacted or to withdraw their consent as necessary.
  • The local site retains the ability to link the data.
  • There must be good encryption, security and access restrictions.
  • There is some risk of inadvertent disclosure.

Identifiable data

  • Storing data in identifiable form is risky because the data may be disclosed inadvertently.
  • Data should be kept in an identifiable form for the shortest period possible.