Current state of the art in data linkage, federation and AI/ML/LLMs across infrastructures: federation, governance and safe output methods



The discussion covered federation of datasets: identifying records across different datasets held in multiple systems, collecting identifiable information robustly, and linking up the different approaches taken across the four nations.

There was further discussion of how to check ML models effectively within TREs.

On governance, it was suggested that a project working across multiple TREs should go through a single governance process.

Next steps

  • Create a ‘panel’ for each type of data or research (e.g. health, crime, financial) to oversee research projects in that field

Raw notes

Data Linkage

How do you handle the NHS Number?

  • Uses NHS Standard NF5; after 3, they moved to tracing manually through the system.

  • Issues with linking health and non-health data
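
These notes do not describe the NF5 process itself. As a generic illustration of collecting identifiers robustly, a cheap first step before any linkage is to reject malformed NHS Numbers: the tenth digit is a Modulus 11 check digit computed over the first nine. A minimal sketch (the function name is illustrative):

```python
def is_valid_nhs_number(nhs_number: str) -> bool:
    """Return True if a 10-digit NHS Number passes the Modulus 11 check."""
    digits = nhs_number.replace(" ", "")
    if len(digits) != 10 or not digits.isdigit():
        return False
    # Weight the first nine digits 10, 9, ..., 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    if check == 10:
        # A computed check digit of 10 means no valid number has these nine digits.
        return False
    return check == int(digits[9])


print(is_valid_nhs_number("943 476 5919"))  # True
print(is_valid_nhs_number("943 476 5918"))  # False
```

A check digit only catches typing and transcription errors; it cannot tell you that a well-formed number belongs to the right person.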

Names such as Dave / David can cause problems.

  • Linksmart is a solution for this.

  • Collecting Crime Data
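
These notes do not say how Linksmart handles name variants. As a generic illustration only, one common approach is to map known nickname variants to a canonical forename before comparing records. The lookup table below is a hypothetical, deliberately tiny example; a real one would be far larger and still incomplete.

```python
# Hypothetical nickname table: variant -> canonical forename.
NICKNAMES = {
    "dave": "david",
    "davy": "david",
    "bob": "robert",
    "bill": "william",
}


def canonical_forename(name: str) -> str:
    """Lower-case, trim, and map known nickname variants to one canonical form."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def forenames_match(a: str, b: str) -> bool:
    """Treat two forenames as matching if they share a canonical form."""
    return canonical_forename(a) == canonical_forename(b)


print(forenames_match("Dave", "David"))  # True
print(forenames_match("Dave", "Robert"))  # False
```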

Scotland’s Approach

  • A national ID number

Federation between datasets

  • Identifying individuals with confidence across TREs is important

  • Problem: linking health data with other data is hard, because records have to be matched on addresses and names

  • Separation of functions

  • The person who holds all the identifying information does not have the data

  • Communication between TREs needs specific criteria; Scotland has 5 TREs

  • Where more than two TREs are involved, introducing a central one is a possibility

  • Issues with matching dataset A to dataset B across multiple systems

  • Seeding Death Data: David and Debra Smith both reduce to ‘D. Smith’, so matching on initials can link records with incompatible genders

  • National Drug Treatment Data: at source, only initials (‘D.S.’), gender and MM/YYYY of DOB were collected, and de-identifying data like this can cause linking problems. The same applies when linking education with non-education data, where the datasets do not share a common ‘number’. If names and addresses are not shared, how confident can we be that Participant A is the same participant in another TRE?

  • Bringing in NHS data and pseudonymising it: how can you work with it without a key?

  • Once a linkage exists, the different data types can be brought into one dataset in a TRE. For example, when linking mental health data with shopping data, if each is anonymised and has its own key, the linkage can be done anonymously for external sources

  • Education data in England, Scotland and Wales might use different notations

  • Residential data can be used as a key

  • ‘E-child’: an effort to link NHS data with Department for Education data
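
The separation-of-functions idea, where whoever holds the identifying information does not hold the data, can be sketched as follows. This is a minimal illustration under stated assumptions: a single hypothetical linkage agent holds a secret key and derives pseudonymous link IDs with HMAC-SHA256; the function names, key and record fields are invented for the example and do not describe any particular TRE's process.

```python
import hashlib
import hmac

# Assumption: only the (hypothetical) linkage agent ever holds this key.
AGENT_KEY = b"hypothetical-secret-held-only-by-linkage-agent"


def provider_split(rows):
    """Data provider: split rows into identifiers (for the agent) and content (for the TRE)."""
    identifiers, content = {}, {}
    for row_id, row in enumerate(rows):
        identifiers[row_id] = row["nhs_number"]
        content[row_id] = {k: v for k, v in row.items() if k != "nhs_number"}
    return identifiers, content


def agent_link_ids(identifiers, key=AGENT_KEY):
    """Linkage agent: sees identifiers only; returns row_id -> keyed link ID."""
    return {
        row_id: hmac.new(key, nhs.replace(" ", "").encode("utf-8"), hashlib.sha256).hexdigest()
        for row_id, nhs in identifiers.items()
    }


def tre_attach(content, link_ids):
    """TRE: sees content and link IDs only, never NHS Numbers."""
    return [{"link_id": link_ids[row_id], **data} for row_id, data in content.items()]


# Hypothetical records about the same person from two different providers.
health_ids, health_content = provider_split([{"nhs_number": "943 476 5919", "diagnosis": "E11"}])
edu_ids, edu_content = provider_split([{"nhs_number": "9434765919", "school_year": 11}])

health = tre_attach(health_content, agent_link_ids(health_ids))
education = tre_attach(edu_content, agent_link_ids(edu_ids))
print(health[0]["link_id"] == education[0]["link_id"])  # True
```

Because both providers' identifiers pass through the same keyed function, the TRE can join on `link_id` without seeing names, addresses or NHS Numbers. Using a keyed HMAC rather than a plain hash matters here: a plain hash of a 10-digit number can be reversed by hashing every possible number, whereas without the key a TRE cannot recompute the link ID.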

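The National Drug Treatment Data point, where only initials, gender and month/year of birth are collected, can be made concrete with a small collision check. A minimal sketch, assuming records are (forename, surname, sex, ISO date of birth) tuples; the people and dates are hypothetical:

```python
from collections import defaultdict


def partial_key(forename, surname, sex, dob, include_sex=True):
    """Coarse key available at source: initials, (optionally) sex, and MM/YYYY of birth."""
    initials = forename[0].upper() + surname[0].upper()
    year, month, _ = dob.split("-")  # dob assumed to be "YYYY-MM-DD"
    if include_sex:
        return (initials, sex, f"{month}/{year}")
    return (initials, f"{month}/{year}")


def ambiguous_keys(people, include_sex=True):
    """Return the partial keys shared by more than one record."""
    groups = defaultdict(list)
    for person in people:
        groups[partial_key(*person, include_sex=include_sex)].append(person)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Hypothetical records: three different people, all 'D. Smith', born March 1980.
people = [
    ("David", "Smith", "M", "1980-03-14"),
    ("Debra", "Smith", "F", "1980-03-02"),
    ("Dave", "Smith", "M", "1980-03-27"),
]
print(list(ambiguous_keys(people)))  # [('DS', 'M', '03/1980')]
print(len(ambiguous_keys(people, include_sex=False)[("DS", "03/1980")]))  # 3
```

Any key that maps to more than one record cannot be linked with confidence on these fields alone. If sex is missing or recorded inconsistently, David and Debra Smith also collapse onto the same key, which is the kind of gender mismatch noted for the death data.
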
AI & ML

  • People confuse the terms AI and ML with ‘statistical modelling’

  • Based on risk factors, you can predict pre-diabetes with 70% precision

  • Accessing ‘clinical-like data’ that uses similar terminology, to mimic clinical systems

  • Offline AI: can you have an offline machine learning model? Yes

  • Would multiple AIs learn the same thing from the same datasets? No

  • You can make it work with a shared API, though (e.g. Stroke Prediction)

  • APRs – 8-9 expensive centre

  • ML is interpreted differently depending on context: ML on health data is seen as something that ‘takes your job’, while ML in other scenarios might be socially acceptable

  • Pattern-finding models are popular and precise; statistical modelling lacks this

  • Ultimately, with ML on medical data, it is not understood why a model gives a particular result

  • Checking models is problematic and difficult: uncertain results and uncertain model contents raise the question of the model’s authenticity
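
The ‘70% precision’ figure is easier to judge with the definition in hand: precision is the fraction of people the model flags who really are positive, TP / (TP + FP). It says nothing about how many pre-diabetic people the model misses (that is recall), which is why the two are usually reported together. A minimal sketch with hypothetical counts:

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of flagged cases that are truly positive: TP / (TP + FP)."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("precision is undefined when nothing is flagged")
    return true_positives / flagged


# Hypothetical counts: 100 people flagged as pre-diabetic, 70 of whom actually are.
print(precision(70, 30))  # 0.7
```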

Governance

  • The approval process is repeated a lot; committees do not talk to each other and each is a separate entity

  • Cannot start work unless approved

  • In a project spanning several TREs, each TRE has its own approval process. Ideally, a multi-TRE project would need only a single approval process, with that decision accepted by the other TREs

What would a solution to this problem look like?

  • The current state of the art is the overarching question; a TRE panel is needed to decide what counts as state of the art

  • A single ‘panel’ per specialty (e.g. health, crime) that deals with specific projects, with additional members from national TRE supervision