Identifiability of data – guidance
Guidance on the use of identified, potentially identifiable, de-identified and anonymous data, and on options for linking data.
On this page:
- Identified data
- Potentially identifiable (key-coded, re-identifiable) data
- Partially de-identified (AIDS-type code) data
- De-identified (not re-identifiable, unlinked) data
- Anonymous data
- Options for linking data
Identified data
Identified data are data that allow a specific individual to be identified. Identifiers may include:
- the individual’s name
- the individual’s date of birth
- the individual’s address
- unencrypted NHI numbers
- information such as a postcode, in particularly small data sets.
Potentially identifiable (key-coded, re-identifiable) data
- Data are potentially identifiable if it is possible to infer an individual’s identity from them.
- Key coding is the technique of separating identifying data from substantive data. It maintains a potential link by assigning an arbitrary code number to each record, then removing the identifying details so that only the substantive data and the code number remain.
- Key codes are held securely and separately. They allow substantive data to be re-associated with the identifying data under specified conditions.
- Data may be single-coded or double-coded for extra security.
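The key-coding steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed method: the function names (`key_code`, `add_second_code`, `re_identify`), the field names and the use of random hex codes from Python's `secrets` module are all hypothetical choices. The substantive rows carry only a code. The key table, which maps codes back to identifiers, is the part that would be held securely and separately. Double coding is modelled here as a second, independent code whose mapping would be held by a different custodian.

```python
import secrets


def _new_code(in_use):
    """Return an arbitrary hex code that is not already in use.

    The code is random, so it carries no information about the person.
    """
    code = secrets.token_hex(8)
    while code in in_use:
        code = secrets.token_hex(8)
    return code


def key_code(records, identifier_fields):
    """Split records into substantive rows and a separate key table.

    Returns (substantive, key_table): each substantive row keeps the
    non-identifying fields plus a "code"; key_table maps code -> identifiers
    and is the part to store securely and separately.
    """
    substantive, key_table = [], {}
    for record in records:
        code = _new_code(key_table)
        key_table[code] = {f: record[f] for f in identifier_fields}
        row = {k: v for k, v in record.items() if k not in identifier_fields}
        row["code"] = code
        substantive.append(row)
    return substantive, key_table


def add_second_code(substantive):
    """Double coding: replace each first code with a second arbitrary code.

    Returns (recoded_rows, second_key); second_key maps second code -> first
    code and would be held by a different custodian from key_table.
    """
    recoded, second_key = [], {}
    for row in substantive:
        code2 = _new_code(second_key)
        second_key[code2] = row["code"]
        recoded.append({**row, "code": code2})
    return recoded, second_key


def re_identify(row, key_table, second_key=None):
    """Re-associate a row with its identifiers (only under the approved conditions)."""
    code = row["code"] if second_key is None else second_key[row["code"]]
    return key_table[code]


if __name__ == "__main__":
    # Made-up example records; field names are hypothetical.
    records = [
        {"name": "Jane Doe", "dob": "1980-02-01", "hba1c": 48},
        {"name": "John Roe", "dob": "1975-07-15", "hba1c": 52},
    ]
    rows, key_table = key_code(records, ["name", "dob"])
    rows2, second_key = add_second_code(rows)
    print(rows2[0]["hba1c"], re_identify(rows2[0], key_table, second_key))
```

With double coding, re-identification needs both key tables. Neither custodian can re-link the data alone.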
Partially de-identified (AIDS-type code) data
- Data coded with abbreviated identifiers (eg, initials, date of birth, sex) are used for reporting AIDS, HIV and some other health conditions.
- The reporting clinician can re-identify the data, but the data are anonymous to recipients, who can nevertheless link duplicate reports. See Options for linking data for more information.
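The sketch below shows how an abbreviated-identifier code might be built and used to link duplicate reports. The code format is a hypothetical example: two initials, the date of birth as YYYYMMDD, and the first letter of the recorded sex. It is not the format used by any particular surveillance system, and the function names are illustrative. Different people can share initials, date of birth and sex, so such codes can occasionally link reports that belong to different individuals.

```python
from collections import defaultdict
from datetime import date


def abbreviated_code(given_name, family_name, dob, sex):
    """Build a hypothetical AIDS-type code: initials + date of birth + sex.

    Example format: 'JD19800201F'. Only initials, not full names, appear in the code.
    """
    initials = (given_name[:1] + family_name[:1]).upper()
    return f"{initials}{dob:%Y%m%d}{sex[:1].upper()}"


def link_duplicates(reports):
    """Group reports by code so a recipient can spot duplicates without names."""
    groups = defaultdict(list)
    for report in reports:
        groups[report["code"]].append(report)
    return dict(groups)


if __name__ == "__main__":
    # Two made-up reports about the same person from different clinics.
    reports = [
        {"code": abbreviated_code("Jane", "Doe", date(1980, 2, 1), "F"), "clinic": "A"},
        {"code": abbreviated_code("Jane", "Doe", date(1980, 2, 1), "F"), "clinic": "B"},
        {"code": abbreviated_code("John", "Roe", date(1975, 7, 15), "M"), "clinic": "A"},
    ]
    for code, group in link_duplicates(reports).items():
        print(code, len(group))
```

The reporting clinician can re-identify a code by matching it against their own patient records. The recipient sees only the code.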
De-identified (not re-identifiable, unlinked) data
- When all identifiers have been removed permanently, data are considered to be ‘de-identified’. Note: Some documents use the term ‘de-identified’ for data that have had only their names removed. However, such data remain ‘potentially identifiable’.
- De-identified data include data containing encrypted NHI numbers.
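A keyed hash is one way to replace NHI numbers with a consistent token that recipients cannot reverse. The sketch below uses HMAC-SHA256 from Python's standard `hmac` and `hashlib` modules. This is a swapped-in technique, a one-way keyed hash rather than reversible encryption, and whether it matches what the guidance means by "encrypted" is an assumption. The function names, the `nhi` field name and the sample key are hypothetical. The same NHI always yields the same token under the same key, so records can still be linked across data sets. Anyone holding the key could recompute tokens for known NHI numbers, so the key should be kept away from the recipients of the data.

```python
import hashlib
import hmac


def nhi_token(nhi, secret_key):
    """Return a hex HMAC-SHA256 token for an NHI number (keyed, one-way).

    The NHI is normalised (whitespace stripped, upper-cased) so that the same
    number always yields the same token under the same key.
    """
    normalised = nhi.strip().upper()
    return hmac.new(secret_key, normalised.encode("ascii"), hashlib.sha256).hexdigest()


def replace_nhi(records, secret_key, field="nhi"):
    """Return copies of records with the NHI field replaced by a token."""
    out = []
    for record in records:
        row = {k: v for k, v in record.items() if k != field}
        row["nhi_token"] = nhi_token(record[field], secret_key)
        out.append(row)
    return out


if __name__ == "__main__":
    key = b"hypothetical-key-held-by-the-data-custodian"
    # Made-up NHI-style values; not real identifiers.
    records = [{"nhi": "ABC1234", "visits": 3}, {"nhi": "abc1234 ", "visits": 1}]
    rows = replace_nhi(records, key)
    print(rows[0]["nhi_token"] == rows[1]["nhi_token"])
```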
Anonymous data
- Anonymous data have been collected without personal identifiers, and no personal identifier can be inferred from them.
Options for linking data
De-linked
- De-linked data are irreversibly stripped of their identifiers.
- The data have maximum confidentiality.
- However, this means there is no way of identifying participants if an adverse condition is found or participants want to withdraw their consent.
Linked but de-identified
- Linked but de-identified data protect confidentiality while allowing participants to be re-contacted or to withdraw their consent as necessary.
- The local site retains the ability to link the data.
- There must be strong encryption, security measures and access restrictions.
- There is some risk of inadvertent disclosure.
Identified
- Storing data in identified form is risky, as they may be disclosed inadvertently.
- Data should be kept in an identifiable form for the shortest period possible.