Architectural Changes for Preservica’s Future
As we try to expand and scale our business, it’s time for us to bring in some standard industry practices regarding modern architecture, scalability and resilience.
But there are extra challenges when you’re trying to introduce them into a large existing system which is continually in use, and which has a build and deployment pipeline that was constructed for what we were already doing.
None of what we’re starting to do here is novel or invented by us. This technology has been around for a while, and has become mainstream over the last 5-10 years. How to apply it to our situation is more interesting.
In case you’re coming from an environment that hasn’t had to work with scalability solutions, here’s a quick summary of what I’m talking about:
- Containerisation: the idea of running a service or application inside a container (like a lightweight virtual machine), so it’s independent and isolated from other services. Docker is the most common container technology.
- Microservices: breaking your application up into small, independent services which communicate with each other by a well specified boundary contract. Microservices are typically deployed in containers.
- Elastic scaling: containers are intrinsically scalable, in that you can choose to start more than one instance of the same container image. Elastic scaling means that you have some software which watches metrics about your services and scales them up and down (by adding or removing instances) so you automatically have the right number of each service for system conditions. This is often done with Kubernetes, which groups related containers into a ‘pod’ which can be scaled together.
- Message based architecture and message brokers: a way for services to communicate with each other asynchronously, so one service doesn’t have to wait when it makes a call to another. Service instances can be decoupled from each other by using a central message broker. A common choice is a broker that supports AMQP, for example RabbitMQ.
Why are we doing this?
Preservica’s always had scalability as a consideration. Until recently, we’ve addressed it by running expensive operations on the job queue (either as steps within a workflow, or as jobs started on a schedule, like storage integrity checks) and adding extra job queues if the system is overloaded. Some of you with Enterprise (EPC or on-premise) systems will know about this because we’ve worked with you to add extra job queue servers to your system, and our operations team can tune the number of job queues our shared cloud instances have.
This is a manual process, and it’s also a blunt instrument - we end up scaling entire job queues (which are large and expensive to set up) when perhaps only one part is really under load, for example the FFmpeg tool if you have a migration workflow that’s running on a lot of videos. It’s also difficult to scale entire job queues elastically, as there isn’t any tooling to help with that and the metrics available are very high level.
As we expand our SaaS offering, this method of scaling becomes insufficient: Preservica Starter greatly increases the number of people using our systems, and initiatives like Preserve365 will result in much larger loads on some aspects of that system. The operational expense of managing more systems under load becomes unsustainable, and the infrastructure cost of scaling the entire job queue, rather than just the parts actually under load, gets expensive. Being able to elastically scale those smaller parts will further reduce infrastructure cost - firstly for our cloud systems, but also for you if you have an on-premise scalability need.
It also has internal benefits for the engineering teams, because by separating and isolating services from each other we reduce the chances of code changes causing unexpected consequences a long way from where the change is made.
What do I get out of this?
Users of our cloud hosted systems won’t notice anything changing. But that lack of a change is the benefit for you: this will make our system more reliable, more responsive to usage patterns, more resilient and able to handle larger numbers of users, so as we grow you won’t see the problems that would otherwise be caused by an overloaded system.
Eventually, it should also enable us to make high load activities (ingest and migration) scale more responsively and complete more quickly. On-premise customers with high load will also be able to take advantage of the updates, and get the same benefits as our cloud operations, if they need to.
Microservice size and the Preservica ‘miniservice’
There is a trade-off when splitting a system up: you are reducing complexity within each part, but you are increasing the complexity of interactions between parts. Communication between different microservices is more expensive than communication between classes within a single code base, because you have to consider the asynchronous nature of those calls and the possibility that the other service is unavailable or busy. Smaller isn’t always better.
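To make that extra cost concrete, here is a minimal sketch of what a caller has to do once a call can fail because the other service is down or busy. Everything in it is hypothetical (the `ServiceUnavailable` exception, the retry counts, the stand-in "thumbnail" call); it is not our in-house library, just an illustration of the kind of handling an in-process method call never needs.

```python
import time


class ServiceUnavailable(Exception):
    """Hypothetical error: the other service is down or too busy to answer."""


def call_with_retry(remote_call, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Try remote_call up to `attempts` times, doubling the wait between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return remote_call()
        except ServiceUnavailable as err:
            last_error = err
            # Back off before the next try, but not after the final failure.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error


def make_flaky_call(failures_before_success):
    """Build a stand-in remote call that fails a fixed number of times first."""
    state = {"calls": 0}

    def flaky():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise ServiceUnavailable("thumbnail service busy")
        return "thumbnail.png"

    return flaky, state
```

A call that fails twice and then succeeds still returns a result, at the price of three round trips and two waits. That is the overhead you take on for every boundary you introduce.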
If you look up microservices online, most people will tell you they should be small and self-contained, with their own data sources and local responsibility. Some of our services can achieve that (“run ImageMagick”), but most of them will have to interact with the main entity data store, read your settings, interact with storage or the search index, and hence can’t really be called microservices. All of our services require a JVM, so their infrastructure cost isn’t that small either. We chose to coin the term ‘miniservice’ so that no-one expects all the things that come with the word ‘microservice’, and because our services are a bit bigger than ‘micro’, so ‘mini’ is the next step up.
When you’re starting with an existing system it can be hard to decide what should be a separate service. We chose to go by functional area - for example we have a miniservice to manage thumbnails, one for characterisation and so on.
Example: Automatic recharacterisation
To give you an idea of how it works in a simple case, here is a simplified summary of how different miniservices interact to accomplish automatic recharacterisation (part of our auto-preservation initiative):
The dashed lines are indirect calls, made through the message broker (RabbitMQ). The caller posts messages onto a queue which instances of a miniservice are watching; the recipient posts messages back onto a control queue to report on the status of the call, so if the caller needs to know the result it can see when it’s done.
You can see how, although each service has a clear responsibility, they access shared resources and require access to a shared content area to pass files between services.
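The indirect call pattern described above (post a request onto a queue, then watch a control queue for status) can be sketched with in-memory queues. This is only an illustration: the queue names, message fields and status values are assumptions, and Python’s `queue.Queue` stands in for RabbitMQ here.

```python
import queue
import uuid

# In-memory stand-ins for broker queues; a real deployment would use RabbitMQ.
work_queue = queue.Queue()     # watched by instances of a miniservice
control_queue = queue.Queue()  # status reports are posted back here


def post_request(entity_ref):
    """Caller side: post a request and return its id without waiting."""
    request_id = str(uuid.uuid4())
    work_queue.put({"request_id": request_id, "entity_ref": entity_ref})
    return request_id


def worker_step():
    """Miniservice side: take one request, do the work, report status."""
    message = work_queue.get_nowait()
    request_id = message["request_id"]
    control_queue.put({"request_id": request_id, "status": "STARTED"})
    # ... the real work (for example characterising a file) would go here ...
    control_queue.put({"request_id": request_id, "status": "COMPLETED"})


def drain_statuses():
    """Caller side: collect whatever status messages have arrived so far."""
    statuses = []
    while True:
        try:
            statuses.append(control_queue.get_nowait())
        except queue.Empty:
            return statuses
```

The caller gets its request id back immediately and is free to carry on. It only finds out the result if it later reads the control queue and matches on that id.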
Monitoring, metrics and elastic scaling
Once your system is no longer running on a few well known servers, it becomes harder to keep track of its performance, and we need some extra tooling to help. We’ve selected some common tools to help us with this.
At the core of this is Prometheus (a monitoring system and time series database), which collects metrics and records them as time series. We have metrics looking at the performance and resource utilisation of individual containers, metrics about the queues on the message broker, and miniservices can also submit their own metrics.
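For a feel of what a metric looks like on the wire, here is a small sketch that renders a gauge in the Prometheus text exposition format. The metric name and label are made up for illustration, and in practice a client library would normally do this for you (including escaping label values, which this sketch skips).

```python
def render_gauge(name, help_text, samples):
    """Render one gauge in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to a numeric reading.
    Label values are not escaped here, so keep them to plain text.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples.items():
        label_text = ",".join(f'{key}="{val}"' for key, val in labels)
        lines.append(f"{name}{{{label_text}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric: messages waiting on one miniservice's queue.
example = render_gauge(
    "miniservice_queue_depth",
    "Messages waiting on a miniservice queue.",
    {(("service", "thumbnail"),): 42},
)
print(example, end="")
# Prints:
# # HELP miniservice_queue_depth Messages waiting on a miniservice queue.
# # TYPE miniservice_queue_depth gauge
# miniservice_queue_depth{service="thumbnail"} 42
```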
Those metrics then get used in three main ways:
- A dashboard in Grafana (an open observability platform) is set up to show an operational overview of service status. The dashboard takes no action itself, but it lets us see if there is a problem more quickly.
- Certain metrics, centred around container resource limits, are monitored and will trigger an alert if they show a problem, such as a service being down or overloaded.
- Each service or set of services will be associated with certain metrics to determine when they should be scaled up or down. We’ll be looking at container resource usage metrics and the size of the service queues on the message broker, as that indicates a service is being given more requests than it can process.
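As a rough illustration of the third use, a scaling decision from those two signals might look like the sketch below. The targets and limits are invented numbers, not our configuration. The CPU part follows the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × current metric / target metric)), and taking the larger of the two answers mirrors how it combines multiple metrics.

```python
import math


def desired_replicas(current, queue_depth, cpu_percent,
                     target_queue_per_replica=20, target_cpu_percent=60,
                     min_replicas=1, max_replicas=10):
    """Suggest an instance count from queue depth and CPU usage.

    All thresholds are hypothetical defaults chosen for the example.
    """
    # Enough instances that each has at most the target backlog to work through.
    by_queue = math.ceil(queue_depth / target_queue_per_replica)
    # Proportional rule: scale the current count by usage relative to target.
    by_cpu = math.ceil(current * cpu_percent / target_cpu_percent)
    # Use the more demanding signal, then keep the result within limits.
    return max(min_replicas, min(max_replicas, max(by_queue, by_cpu)))


print(desired_replicas(current=2, queue_depth=100, cpu_percent=30))  # 5
print(desired_replicas(current=2, queue_depth=0, cpu_percent=90))    # 3
print(desired_replicas(current=4, queue_depth=0, cpu_percent=0))     # 1
```

In the first call a long queue drives the answer even though CPU is quiet, which is the point of watching queue size as well as container metrics.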
Changing the architecture of a running system
If we were building a new digital preservation system from scratch, it would be hard enough to design it with this architecture. But of course we aren’t: we have existing systems, on a variety of platforms and with different data and usage profiles, and they need to keep working. It isn’t possible to rebuild the whole thing and wait until it’s all done. So how do you make a big change like this in an existing system?
There are two important questions to consider when deciding what to turn into a miniservice: first, what could be a miniservice, and second, which of those should become one, given that we don’t have infinite capacity to work only on this.
For the first, look at which parts of the system are easy to separate from the main deployment. In a well built system, even one deployed as a few large applications like Preservica, there will be components and services within the code base, and they should have clear responsibilities and dependencies. It took us a significant amount of time to look through our internal services and work out which ones were independent enough, with a clear responsibility, to become a miniservice.
Then look at operational load to see which areas of the system are being stressed by real world usage. There’s no point splitting something out as a separate deployment, at least early on, if that isn’t what’s causing scalability or reliability concerns. For us there were two clear winners - running third party tools and generating thumbnails - so that’s where we started, and as we continue the work we continue to select components that have a clear operational benefit.
Separation and boundaries
Once you select an area to work on, define the boundaries of that component. Hopefully you have interfaces within the codebase that already do most of that, but it’s likely that they won’t be completely externalised. For us the two major things we needed to address were how to share content files between services, and the fact that we were passing non-serialisable objects (for example entity objects) across those interfaces.
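One common way to deal with both problems is to pass a small, serialisable message that carries a reference to the entity and a path within a shared content area, rather than the entity object or the file itself. The sketch below shows the idea. The `ThumbnailRequest` type and its fields are hypothetical, and JSON is just one possible wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ThumbnailRequest:
    """Hypothetical request: only plain values that can cross a process boundary."""
    entity_ref: str    # a reference to the entity, not the entity object
    content_path: str  # location of the file, relative to a shared content area
    width: int


def to_message(request):
    """Serialise a request to bytes suitable for posting to a broker."""
    return json.dumps(asdict(request), sort_keys=True).encode("utf-8")


def from_message(body):
    """Rebuild the request on the receiving side."""
    return ThumbnailRequest(**json.loads(body.decode("utf-8")))


request = ThumbnailRequest("entity-123", "working/abc/page1.tif", 256)
assert from_message(to_message(request)) == request
```

The receiving service looks up whatever it needs from the reference, and reads the file from the shared area using the path.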
Once you have a well defined boundary, and a mechanism for calling between services (we have an in house library to help us make these calls via the broker), you can then deploy a service in a container and call out to that container from the places you needed to use it. For example, our migration code in the job queue can call out to LibreOffice inside a container rather than having to run it directly on the server.
We can do this piece by piece, so we can deploy more services inside containers as we create them.
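One way to picture the piece-by-piece approach is to have the calling code depend on an interface, with an in-process implementation and a containerised one behind it. The sketch below assumes exactly that. The class names, the `send` function and the configuration flag are all invented for illustration and are not how our library is actually shaped.

```python
from typing import Callable, Protocol


class DocumentConverter(Protocol):
    """What the calling code depends on, wherever the tool actually runs."""
    def convert(self, source_path: str, target_format: str) -> str: ...


class LocalConverter:
    """Stand-in for running the tool directly on the server."""
    def convert(self, source_path: str, target_format: str) -> str:
        return f"local:{source_path}.{target_format}"


class ContainerConverter:
    """Stand-in for calling out to the tool running inside a container."""
    def __init__(self, send: Callable[[dict], str]):
        self._send = send  # hypothetical function that makes the remote call

    def convert(self, source_path: str, target_format: str) -> str:
        return self._send({"source": source_path, "format": target_format})


def choose_converter(use_container: bool,
                     send: Callable[[dict], str]) -> DocumentConverter:
    """Pick an implementation per deployment; callers do not change."""
    return ContainerConverter(send) if use_container else LocalConverter()


def fake_send(message: dict) -> str:
    """Pretend remote end, so the example runs without a broker."""
    return f"container:{message['source']}.{message['format']}"


print(choose_converter(False, fake_send).convert("report.docx", "pdf"))
print(choose_converter(True, fake_send).convert("report.docx", "pdf"))
```

Because the caller only sees the interface, each service can be switched over separately, and a deployment without containers keeps using the in-process path.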
What about on-premise deployments?
Our cloud hosted systems are likely to be deployed with Kubernetes using the platform native implementation (e.g. EKS on Amazon), and that’s what our operations staff will become familiar with. So how does that work when your system isn’t hosted and operated by us?
Our plan is to offer you two options:
- If you have your own Kubernetes deployment, you could deploy Preservica pods into that. You’d still be responsible for managing it, defining your own scaling metrics and operating it, though we could offer advice on setting it up.
- If you don’t have a need for elastic scaling, the Preservica containers can be deployed without Kubernetes, just using Docker. If you have existing Docker infrastructure, you’ll be able to run the containers there (as long as you can share content directories with it). Otherwise, Docker can be run anywhere, including on one of the existing servers of your Preservica deployment (we’ve put it on the job queue in some of our environments) or on a new dedicated server in the same group.
For now, if you don’t want to worry about any of this, you can continue to run without containerised services, as you have been up to now. Some newer features won’t work: auto-preservation, the first part of which is in 6.5, won’t run without miniservices, and that’s likely to be the case for future features too (for example remote contributors, which we’re working on at the moment, is also built with the new service architecture). You also won’t get any of the benefits of scalability or isolation, but perhaps you are not near the limits of your existing Preservica deployment.