Project discussion: Would a systematic approach to data risk classification be helpful?

Chair: Will Crocombe (RISG Consulting)


  • Risk - how much and what sort? Personal/sensitive, commercial, political, IP…

  • Why classify risk and how might it help?

  • What type and level of controls might be practicable and proportionate?


  • Could there be a common language around risk?

  • Classify based on ease of identifiability, plus ‘payload’ - what would we learn about the individuals if they were identified?

  • Proportionate controls - Tiered. Gatekeepers and access points (where). E.g.

    • 0 - public

    • 1 - anonymised

    • 2 - strongly pseudonymised

    • 3 - weakly pseudonymised

    • 4 - public

  • Dropping down the tiers, things become easier. There is a Turing paper on this; Sheffield used it as the basis of their system for assessing risk.

  • Alan Turing Institute paper

  • Importance of an agreed risk classification within a federation, and of agreement on risk appetite
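A minimal sketch of how identifiability plus ‘payload’ might be combined into a tier. The identifiability labels follow the example tiers above; the `Payload` scale, the `classify` function and the rule of adding the two scores are hypothetical illustrations, not taken from the Turing paper or from any system mentioned in the discussion.

```python
from enum import IntEnum


class Identifiability(IntEnum):
    """How easily individuals could be identified (labels from the notes)."""
    PUBLIC = 0
    ANONYMISED = 1
    STRONG_PSEUDO = 2
    WEAK_PSEUDO = 3


class Payload(IntEnum):
    """Hypothetical scale: how much would be learned if someone were identified."""
    LOW = 0
    HIGH = 1


MAX_TIER = 4  # highest tier in the example list


def classify(identifiability: Identifiability, payload: Payload) -> int:
    """Return a tier number; higher tiers imply stricter controls.

    Assumed rule: public data stays in tier 0; otherwise a high payload
    raises the tier by one, capped at MAX_TIER.
    """
    if identifiability is Identifiability.PUBLIC:
        return 0
    return min(MAX_TIER, int(identifiability) + int(payload))


if __name__ == "__main__":
    for ident in Identifiability:
        for payload in Payload:
            print(ident.name, payload.name, "-> tier", classify(ident, payload))
```

A shared function like this is one way a federation could make its agreed classification explicit. The actual rule and the controls attached to each tier would need to reflect the agreed risk appetite.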


  • NCSC

  • Harvard DataTags

  • UK Data Service data types

  • King’s is doing this work, using a classification similar to the Turing paper’s

  • Dundee operate on a blanket tier

  • My question was going to be around risk classification. Based on my understanding of Goldacre, pseudonymisation should not be relied on. I agree researchers should only be presented with the data required for their project, but the risk of de-anonymisation, particularly when combining datasets, means this should be treated cautiously at best.

  • Automation - reduces risk of error

  • Scottish Open Data
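The concern raised above about de-anonymisation when datasets are combined can be illustrated with a smallest-group-size check, in the style of k-anonymity. This is a sketch only: the records, the column names and the `smallest_group` helper are invented for illustration and do not come from any dataset or tool discussed.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same values
    for the given columns. A result of 1 means at least one record is unique."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return min(Counter(keys).values())


# Invented, pseudonymised toy records (pid stands in for a pseudonym).
records = [
    {"pid": "a1", "age_band": "30-39", "postcode_area": "SE1", "sex": "F"},
    {"pid": "b2", "age_band": "30-39", "postcode_area": "SE1", "sex": "M"},
    {"pid": "c3", "age_band": "30-39", "postcode_area": "SE5", "sex": "F"},
    {"pid": "d4", "age_band": "30-39", "postcode_area": "SE5", "sex": "F"},
]

if __name__ == "__main__":
    # Each added attribute, e.g. from a linked dataset, can shrink the groups.
    print(smallest_group(records, ["age_band"]))
    print(smallest_group(records, ["age_band", "postcode_area"]))
    print(smallest_group(records, ["age_band", "postcode_area", "sex"]))
```

In this toy data the smallest group shrinks as attributes are added, until some records are unique despite the pseudonym. This is the reason for treating pseudonymised data cautiously when datasets can be linked.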