Project discussion: Would a systematic approach to data risk classification be helpful?

Chair: Will Crocombe (RISG Consulting)

Prompts

Could there be a common language around risk?
Classify data based on ease of identifiability, plus the ‘payload’: what would we learn about the individuals?
Proportionate controls: tiered, with gatekeepers and access points (where?). For example:
- 0 - public
- 1 - anonymised
- 2 - strong pseudo
- 3 - weak pseudo
- 4 - identifiable
Dropping down the tiers, controls lighten and access becomes easier. The Turing paper describes this approach; Sheffield used it as the basis of their system for assessing risk.
Alan Turing Institute paper
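A minimal sketch of how the tiers listed above might be encoded, assuming tier 4 is the highest-risk, directly identifiable tier. The rule that a sensitive payload bumps data up one tier, and every control name, are hypothetical illustrations rather than anything agreed in the discussion or taken from the Turing paper.

```python
from enum import IntEnum


class Identifiability(IntEnum):
    """Identifiability levels from the tier list; the value is the base tier."""
    PUBLIC = 0
    ANONYMISED = 1
    STRONG_PSEUDO = 2
    WEAK_PSEUDO = 3
    IDENTIFIABLE = 4  # assumption: tier 4 is the directly identifiable tier


MAX_TIER = 4

# Hypothetical controls added at each tier. Controls accumulate, so a higher
# tier carries everything from the tiers below it; dropping down the tiers
# removes controls and makes access easier.
ADDED_CONTROLS = [
    [],                                                # tier 0
    ["data use agreement"],                            # tier 1
    ["approved project"],                              # tier 2
    ["access only via a safe setting"],                # tier 3
    ["output checking", "named gatekeeper sign-off"],  # tier 4
]


def assign_tier(level: Identifiability, sensitive_payload: bool) -> int:
    """Start from the identifiability level; a sensitive payload bumps
    non-public data up one tier (hypothetical rule), capped at MAX_TIER."""
    tier = int(level)
    if sensitive_payload and level is not Identifiability.PUBLIC:
        tier = min(tier + 1, MAX_TIER)
    return tier


def controls_for(tier: int) -> list[str]:
    """Collect every control required at this tier and all tiers below it."""
    controls: list[str] = []
    for added in ADDED_CONTROLS[: tier + 1]:
        controls.extend(added)
    return controls


tier = assign_tier(Identifiability.STRONG_PSEUDO, sensitive_payload=True)
print(tier, controls_for(tier))
# prints: 3 ['data use agreement', 'approved project', 'access only via a safe setting']
```

Keeping the controls cumulative mirrors the point that things become easier lower down. Whether payload should shift the tier, or instead change the controls within a tier, is a design choice a federation would need to settle as part of agreeing its risk appetite.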
Importance of an agreed risk classification across the federation, and agreement on risk appetite
NIST RMF
NCSC
Harvard DataTags
UK Data Service data types
King’s is doing this work, with a classification similar to the Turing paper’s
Dundee operate on a blanket tier
My question was going to be about risk classification. Based on my understanding of Goldacre, pseudonymisation should not be relied on. I agree researchers should only be presented with the data required for their project, but the risk of de-anonymisation, particularly when combining datasets, means pseudonymised data should be treated cautiously at best.
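To illustrate the concern about combining datasets, here is a minimal sketch using k-anonymity, a standard re-identification measure swapped in for illustration rather than one raised in the discussion. Here k is the size of the smallest group of records sharing the same quasi-identifier values. The records, field names and linked dataset are all made up.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that share the
    same values for every listed quasi-identifier."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


# Illustrative, made-up pseudonymised extract (no direct identifiers).
extract = [
    {"pid": "a1", "age_band": "40-49", "region": "North"},
    {"pid": "b2", "age_band": "40-49", "region": "North"},
    {"pid": "c3", "age_band": "50-59", "region": "South"},
    {"pid": "d4", "age_band": "50-59", "region": "South"},
]
# A second made-up dataset sharing the same pseudonym, adding one attribute.
linked = {"a1": "F", "b2": "M", "c3": "F", "d4": "F"}

# Join the two datasets on the shared pseudonym.
combined = [dict(r, sex=linked[r["pid"]]) for r in extract]

print(k_anonymity(extract, ["age_band", "region"]))          # 2
print(k_anonymity(combined, ["age_band", "region", "sex"]))  # 1
```

Linking on the shared pseudonym adds a single attribute yet drops k from 2 to 1, leaving some individuals unique in the combined data even though neither dataset contains direct identifiers.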
Automation - reduces risk of error
Scottish Open Data
HIC RDMP