Maintenance burden and its impact on TRE development
Session 1, Room 1
Chair: Jim Madge (Alan Turing Institute)
Outline
A major challenge in TRE projects is maintaining a TRE codebase. The code can have a large number of lines, particularly as extra configuration is added to
fix bugs,
and improve security.
The code may also be, necessarily, complex and involve many languages and tools, such as
Programming languages (e.g. Python)
Infrastructure-as-code languages/frameworks (e.g. Terraform, Pulumi)
Bootstrapping tools (e.g. CloudInit, ARM templates, …)
Container orchestration (e.g. Kubernetes, Compose)
Configuration management (e.g. Ansible, Chef, Puppet, Salt)
Furthermore, writing the codebase requires good knowledge of the technologies a TRE is built on, such as
Networking (e.g. subnets, masking)
Internet protocol suite (e.g. DNS, SSL, TCP)
Linux (e.g. Permissions and ACL, PAM, systemd, distributions, packages)
Containers
Virtual machines
Cloud providers (e.g. AWS, Azure, DigitalOcean)
These factors contribute to making codebases that are difficult to maintain. Finding people with the skill set to work on these projects can be challenging, perhaps especially in academic and public sector spaces, where wages are not competitive with roles using similar skills elsewhere. Making progress can be difficult because the effort required to keep a TRE working correctly uses a lot of developer time.
Prompts
Is your codebase difficult to maintain?
How many people work on your codebase, and how many do you think it needs?
Do you struggle to find or recruit people with the skills you need?
Does the complexity of your codebase hinder or prevent you making improvements?
Are you looking to reduce complexity (for example by changing the tools used)?
If you were to start again, what would you do differently?
Notes
Setting the scene:
TRE codebases are large and complex and grow with time
Could be/likely to be multi-lingual (software-wise)
Lots of config alongside code
First challenge is finding enough people with the right skills across a broad landscape
Second challenge is that the maintenance overhead is so great that it drowns out time for development
Funding and recruitment
Big enough to break out into a different discussion.
Fought hard to pay RSEs more than default institution salaries
Make the work environment attractive e.g. nice place to work
Bennett Inst have found a way to pay engineers “sensible” salaries to attract the right skillsets; it’s also a nice place to work (and that doesn’t come for free - creating that environment takes a lot of effort)
Things like
Autonomy
Interesting work
Coming together regularly
Career development pathways focusing on tech rather than management
Turing RSEs also have permanent contracts - unusual in academia
Code complexity / maintenance:
Onboarding new engineers into a big codebase project is difficult. Conversely, engineers leaving causes a loss of knowledge which is difficult to recover
Supporting multiple projects when each project is so different
Use code reviews
At Dundee the TRE is based on a bigger codebase from AWS, so it’s harder to wrap your head around the project as you have to understand the AWS codebase first
How can authors of code be sure that other developers understand their code / codebase?
Very good point in TRE land: if you don’t understand the codebase, how can you trust it, and how can you convince the information governance authorities that it’s trustworthy?
Writing documentation takes effort on par with writing the codebase (although is it the most useful way to impart knowledge? Does anyone read it…?)
Investing in comprehensive build systems (Bennett Inst use “just”) is one way to manage technical debt
Difficult to experiment with codebases
Is there an argument for having owners of subsections of a TRE codebase rather than expecting a small number of people to understand all sections - a product management approach?
We are trying to do this in an ongoing rewrite
Hindering development:
Feels like skills and funding are the large problems
Prioritisation is also important, particularly if we have a small team: balancing maintenance and new features
Consider your stakeholders.
Your users rely on you
Academic funding does not support maintenance
In terms of team structuring, we have been inspired by https://teamtopologies.com/
Actions/next steps
On funding and sustainability, the Funding & Sustainability WG is planning to write a position paper on the short-term funding/long-term operation headache. Do join in!