September 10, 2023 by Caroline Morton, Harry Tsiligiannis and Maxwell Flitton

Designing a secure backend infrastructure for patient data


Introduction

Medical data is uniquely sensitive and valuable. It represents our most personal information at our most vulnerable times. In the wrong hands, it could be used to identify us in a way that other data cannot. Patients need to be able to trust that their most personal information is being handled with the utmost care. Because a breach of this trust can have serious consequences for the patient, the handling of medical data is highly regulated and the penalties for getting it wrong are severe.

In this blog post we will discuss the design of a secure backend infrastructure for patient data: the requirements such a system must meet, the tradeoffs involved in meeting them, the design of the system and the tools we used to build it, and finally how the system could evolve in the future.

Requirements

A backend infrastructure for patient data must meet a number of requirements. These are driven by the need to protect patient data, and the need to ensure that the system is usable by the medical professionals who rely on it. We will discuss each in turn.

Security

At the top of the list is the security of the system. As mentioned above, it is vital that patient data is kept secure. A term that is used a lot in health tech companies is anonymisation, which refers to the process of removing identifying information, such as name or date of birth, from the data. This is a good start but it is not enough. So-called anonymised data can be re-identified by combining it with other data sources, perhaps even those you have shared yourself on social media. Imagine a scenario where your medical data is leaked and combined with your tweets about your new baby, location data from your phone, and good wishes from friends on Facebook about an upcoming knee operation. This constellation of data alone could be enough to identify you in the leaked data and therefore reveal the rest of your medical history. The best way to guard against this is to ensure that the data is encrypted at rest and in transit, so that even if the data is leaked, it is useless to anyone who does not hold the decryption key. This is the first requirement of our system.
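To make the anonymisation point concrete, here is a minimal Rust sketch of pseudonymising a record before it leaves the secure environment. The struct and field names are illustrative, not taken from the production system, and a real implementation would derive the patient reference from a keyed hash rather than accept it as a parameter.

```rust
// Illustrative pseudonymisation: direct identifiers are stripped or
// coarsened before a record leaves the secure zone.

#[derive(Debug)]
struct PatientRecord {
    name: String,
    date_of_birth: String, // ISO 8601, e.g. "1985-03-14"
    postcode: String,
    diagnosis_code: String,
}

#[derive(Debug, PartialEq)]
struct PseudonymisedRecord {
    // Opaque identifier replacing the name; in practice derived from
    // a keyed hash, not passed in directly.
    patient_ref: u64,
    year_of_birth: String,   // truncated to reduce re-identification risk
    partial_postcode: String, // outward code only
    diagnosis_code: String,
}

fn pseudonymise(record: &PatientRecord, patient_ref: u64) -> PseudonymisedRecord {
    PseudonymisedRecord {
        patient_ref,
        // Keep only the year, dropping month and day.
        year_of_birth: record.date_of_birth.chars().take(4).collect(),
        // Keep only the outward part of the postcode.
        partial_postcode: record
            .postcode
            .split_whitespace()
            .next()
            .unwrap_or("")
            .to_string(),
        diagnosis_code: record.diagnosis_code.clone(),
    }
}

fn main() {
    let record = PatientRecord {
        name: "Jane Doe".to_string(),
        date_of_birth: "1985-03-14".to_string(),
        postcode: "SW1A 1AA".to_string(),
        diagnosis_code: "M17.1".to_string(),
    };
    let safe = pseudonymise(&record, 42);
    println!("{:?}", safe); // the name and full date of birth never appear
}
```

Note that this is pseudonymisation, not true anonymisation; as the paragraph above explains, encryption at rest and in transit is still required on top of it.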

Usability

Medical professionals are typically not software engineers. They are experts in their field but they are not experts in software. This means that the system must be easy to use. It must be intuitive and it must be reliable. Whilst SSH-ing into a secure environment and running a series of commands might be second nature to a software engineer, it is not something that a doctor would be comfortable with.

Auditability


Many of the compliance requirements for medical data concern auditability. We must be able to track who accessed the data, when they accessed it, and what they did with it, and we must be able to produce that record on demand.
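The shape of such an audit trail can be sketched in a few lines of Rust. The types and field names here are illustrative, not the production schema; in a real deployment the entries would be written to durable, tamper-evident storage rather than held in memory.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Who did what to which record, and when.
#[derive(Debug, Clone, PartialEq)]
enum Action {
    Read,
    Update,
    Export,
}

#[derive(Debug, Clone)]
struct AuditEntry {
    user_id: String,
    patient_ref: u64,
    action: Action,
    timestamp_secs: u64, // seconds since the Unix epoch
}

#[derive(Default)]
struct AuditLog {
    entries: Vec<AuditEntry>,
}

impl AuditLog {
    // Append-only: audit entries are never edited or deleted.
    fn record(&mut self, user_id: &str, patient_ref: u64, action: Action) {
        let timestamp_secs = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock before 1970")
            .as_secs();
        self.entries.push(AuditEntry {
            user_id: user_id.to_string(),
            patient_ref,
            action,
            timestamp_secs,
        });
    }

    // Compliance queries: everything that happened to one patient's data.
    fn entries_for_patient(&self, patient_ref: u64) -> Vec<&AuditEntry> {
        self.entries
            .iter()
            .filter(|e| e.patient_ref == patient_ref)
            .collect()
    }
}

fn main() {
    let mut log = AuditLog::default();
    log.record("dr_smith", 42, Action::Read);
    log.record("dr_jones", 42, Action::Update);
    for entry in log.entries_for_patient(42) {
        println!("{:?}", entry);
    }
}
```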

Scalability

The system must be able to scale to meet the needs of the business. This means that it must be able to handle a large number of users, and a large amount of data. Data typically needs to be kept for at least 6 years and often longer. This means the data stored in the system will grow over time. The system must be able to handle this growth.

Cost

The system must be cost effective. This means that we must be able to build it using the open source tools available to us, and run it on a modest budget.

The Developer Experience


The ideal system must be easy to update and maintain. This means that we must be able to make changes to the system without having to take it offline. The code should be well documented with both READMEs and code comments, and have a consistent and sensible structure. Tests, both unit tests and integration tests, should ensure that the system works as expected before an update is rolled out. Luckily for us, modern CI tooling such as GitHub Actions makes this easy to achieve right from the start.
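As a sketch of what this looks like in practice, a minimal GitHub Actions workflow for a Rust service might run formatting checks, lints, and tests on every push. The job names and steps below are illustrative, not the client's actual pipeline:

```yaml
name: CI
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Rust is preinstalled on GitHub's Ubuntu runners.
      - run: cargo fmt --check
      - run: cargo clippy --all-targets -- -D warnings
      - run: cargo test --all
```

Gating merges on this workflow means a broken build never reaches the deployment pipeline in the first place.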

Design

We will now discuss the design of the system. We were contracted by a client to build a secure medical system that could be used by patients and health care professionals to store and access medical data.

The Stack

We decided to use the following stack to build the system:

  • PostgreSQL for the databases
  • Rust for the backend microservices
  • Distroless builds for most of the microservice containers
  • Docker for containerisation
  • Kubernetes for orchestration

DevOps Perspective - Structure of the System

From the DevOps perspective, creating a robust and secure infrastructure demands an intricate arrangement of various components that work in harmony to safeguard sensitive patient data while ensuring the system remains fluid and agile.

To facilitate this, we’ve structured our system to embody the following characteristics:

  • Microservices Architecture: Leveraging a microservices architecture, orchestrated through Kubernetes, our backend services built with Rust operate autonomously yet interdependently, making the system resilient and facilitating continuous deployment and integration processes.

  • End-to-End Encryption: Ensuring data is encrypted not just at rest but also during transit is non-negotiable. Utilizing high-grade encryption technologies, we maintain a fortress-like security mechanism, resilient against potential data breaches.

  • Containerization with Docker: Implementing Docker allows us to package our microservices and their dependencies into containers. This ensures consistency across multiple environments, be it development, staging, or production, effectively eliminating the “it works on my machine” issue.

  • Database Segregation: Our PostgreSQL databases are structured to facilitate data segregation, essentially compartmentalizing different data types and access levels, providing an extra layer of security while promoting efficiency in data retrieval processes.

  • API Security: Implementing strict authentication and authorization policies through API securities like OAuth 2.0 ensures secure communication between the frontend and backend systems.

  • Automated Backups: The system is configured to perform regular automated backups, ensuring data durability and aiding in quick recovery in the unfortunate event of data loss.

  • Monitoring and Alerts: Implementing comprehensive monitoring solutions that give real-time insights and trigger alerts in case of any suspicious activities or system health degradation, thereby facilitating a proactive response to issues.
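The authorization side of the API security point can be sketched as a simple role check. The roles, error type, and rules below are illustrative; in the real system, authentication is delegated to OAuth 2.0 and policy is enforced at the API layer rather than in a free function.

```rust
// Illustrative role-based access check for reading a patient record.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Patient,
    Clinician,
    Auditor,
}

#[derive(Debug, PartialEq)]
enum AccessError {
    Forbidden,
}

// A clinician may read any record; a patient may read only their own;
// an auditor sees audit trails, not clinical data.
fn can_read_record(
    role: Role,
    requester_id: u64,
    record_owner_id: u64,
) -> Result<(), AccessError> {
    match role {
        Role::Clinician => Ok(()),
        Role::Patient if requester_id == record_owner_id => Ok(()),
        _ => Err(AccessError::Forbidden),
    }
}

fn main() {
    assert!(can_read_record(Role::Clinician, 1, 99).is_ok());
    assert!(can_read_record(Role::Patient, 7, 7).is_ok());
    assert_eq!(
        can_read_record(Role::Patient, 7, 8),
        Err(AccessError::Forbidden)
    );
    println!("access checks behave as expected");
}
```

Every call to a check like this is also a natural place to write an audit entry, tying the API security and auditability requirements together.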

Why Kubernetes?

When it comes to orchestrating containerized applications at scale, Kubernetes stands as a behemoth offering a range of benefits that made it the obvious choice for our infrastructure:

  • Automated Scheduling: Kubernetes intelligently schedules the containers based on resource utilization, helping in maintaining high availability and ensuring optimal performance.

  • Self-healing Capabilities: Kubernetes constantly monitors the health of nodes and containers, automatically replacing containers that fail, and rescheduling them to other nodes if necessary.

  • Horizontal Scaling: As demand increases, Kubernetes allows us to easily scale our applications horizontally, meaning we can add more instances to our deployment with simple commands or even automatically based on CPU usage.

  • Rollout and Rollback features: Kubernetes provides us with the ability to roll out updates progressively while monitoring the application’s health, ensuring zero downtime during deployments. Similarly, if something goes wrong, we can roll back to a previous stable version with ease.

  • Config and Secret Management: It facilitates secure management and injection of configuration details and secrets into the application, promoting a secure ecosystem for our sensitive medical data infrastructure.

  • Community and Ecosystem: Being a CNCF project with a vast community, Kubernetes offers an expansive ecosystem with a wealth of readily available tools and resources, and a rich community-driven environment ensuring it remains at the forefront of container orchestration.

By leveraging Kubernetes, we’ve built an infrastructure that is not only robust and scalable but also leverages modern best practices to maintain a high level of security, which is paramount in handling sensitive patient data.
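Several of these features come together in the Deployment manifest for a single service. The following is a hedged sketch; the service name, image tag, probe path, and secret names are hypothetical, not the client's actual configuration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: records-api          # hypothetical service name
spec:
  replicas: 3                # horizontal scaling: raise this to add instances
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0      # zero-downtime rollouts
      maxSurge: 1
  selector:
    matchLabels:
      app: records-api
  template:
    metadata:
      labels:
        app: records-api
    spec:
      containers:
        - name: records-api
          image: registry.example.com/records-api:1.2.3
          livenessProbe:     # self-healing: failed containers are restarted
            httpGet:
              path: /healthz
              port: 8080
          env:
            - name: DATABASE_URL   # secret injected at runtime, never baked
              valueFrom:           # into the image
                secretKeyRef:
                  name: records-db
                  key: url
```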

Why Rust?

Rust is a modern systems programming language that compiles into a binary that can be easily deployed. It is memory safe, which makes it far easier to write secure code: buffer overflows and use-after-free bugs, a classic source of security vulnerabilities, are ruled out by the compiler.

In addition to the speed and security of Rust, it offers a great developer experience. The compiler is very helpful and will catch many errors at compile time, so you can be confident that your code will work as expected. When a Rust program is compiled into a single binary and packaged up in a Docker image, the result is very small - typically less than 20MB.

Why Distroless?

Distroless builds are Docker images that contain only the application and its runtime dependencies. They do not contain any of the usual tools you would find in a Linux environment, including bash. This has two main advantages. Firstly, the image is very small. Secondly, it is more secure: if an attacker were to gain access to the container, they would not have any of the tools they would normally use to escalate their privileges or copy data out of the container.
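A typical way to achieve this is a multi-stage build: compile the Rust binary in a full toolchain image, then copy only the binary into a distroless base. The image tags and binary name below are illustrative:

```dockerfile
# Stage 1: build the Rust binary in a full toolchain image
FROM rust:1.72 AS builder
WORKDIR /app
COPY . .
RUN cargo build --release

# Stage 2: distroless runtime - no shell, no package manager
FROM gcr.io/distroless/cc-debian12
COPY --from=builder /app/target/release/records-api /records-api
USER nonroot
ENTRYPOINT ["/records-api"]
```

The `cc` variant of distroless ships only the C runtime libraries a typical Rust binary links against, and running as the built-in `nonroot` user further limits what a compromised process can do.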

Why PostgreSQL?

We opted to use a standard SQL database. This is because the data we were dealing with in this situation was highly structured and repeatable, and the queries that we needed to run were well defined ahead of time. This meant that a relational database was a good fit for our needs.

Linking back to the developer experience, we opted to use PostgreSQL because it is a well known and well documented database, and most new developers to our team would have some experience with it. This meant that they would be able to get up to speed quickly and start contributing to the project.
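As an illustration of how naturally this kind of structured, repeatable data fits a relational model, a schema fragment might look like the following. The table and column names are hypothetical, not the client's actual schema:

```sql
-- Illustrative schema fragment for pseudonymised clinical data.
CREATE TABLE patients (
    patient_ref   BIGINT PRIMARY KEY,
    year_of_birth SMALLINT NOT NULL
);

CREATE TABLE observations (
    id          BIGSERIAL PRIMARY KEY,
    patient_ref BIGINT NOT NULL REFERENCES patients (patient_ref),
    code        TEXT NOT NULL,        -- e.g. a clinical coding system concept
    recorded_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Because the queries are well defined ahead of time, they map
-- cleanly onto indexes like this one.
CREATE INDEX observations_patient_idx
    ON observations (patient_ref, recorded_at);
```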

The Future

We don’t know what the future holds for our client and their product. As many start-ups must, they may pivot and change their product offering. We have designed the system to be flexible and scalable so that it can grow and change with the business: it is well documented and split into small components that can be mixed and matched as needed.
