The Need for High Availability

Jun 29, 2022

So, in typical fashion when it comes to technology, something broke.

Now normally, this wouldn’t be such a huge problem, except this time it broke while I was away on holiday, with no access to my physical machines.

So, what exactly happened?

Well, great.

So, as mentioned, I was away on holiday when my Docker server (the entire machine) just powered off. I still have no idea what caused it, as I haven’t returned home yet. That server handles almost every externally-facing service I run, including the media server, my proxy manager and my password manager. Suffice to say, this was not an ideal scenario to be in.

Thankfully, getting some of my services back online was as simple as spinning up a backup NGINX container on my Raspberry Pi to handle the links to some of the media server services. This is only a temporary fix, though.


Hindsight is 20-20

Looking at the situation now, it’s crystal clear to me that I need to build some level of ‘high availability’ into my lab. My first step will be to use Kubernetes in some form to provide a backup for the most crucial services, such as the proxy manager and password manager. As long as I have access to those, I should have a reasonable amount of control and access, even while away.
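As a rough sketch of what ‘a backup for crucial services’ could mean in Kubernetes terms: run more than one replica of a service and tell the scheduler to keep those replicas on different machines, so a single box powering off doesn’t take every copy down with it. The Python below builds such a Deployment manifest as JSON. The function name `ha_deployment`, the service name and the image are hypothetical placeholders, not my actual setup; the manifest fields themselves (`replicas`, `podAntiAffinity`, `topologyKey: kubernetes.io/hostname`) are standard Kubernetes.

```python
import json


def ha_deployment(name: str, image: str, port: int, replicas: int = 2) -> dict:
    """Build a Deployment manifest whose replicas must run on different nodes.

    `name`, `image` and `port` are placeholders for whatever service you run.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Forbid two pods with this label from sharing a node
                    # (topologyKey = hostname), so one machine powering off
                    # cannot take down every replica at once.
                    "affinity": {
                        "podAntiAffinity": {
                            "requiredDuringSchedulingIgnoredDuringExecution": [
                                {
                                    "labelSelector": {"matchLabels": labels},
                                    "topologyKey": "kubernetes.io/hostname",
                                }
                            ]
                        }
                    },
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical service name and image; substitute your own.
    manifest = ha_deployment("password-manager", "example/password-manager:latest", 8080)
    print(json.dumps(manifest, indent=2))
```

The output can be saved to a file and applied with `kubectl apply -f`, which accepts JSON as well as YAML. Two caveats: with a *required* anti-affinity rule you need at least as many schedulable nodes as replicas, or the extra pods stay Pending; and a stateful service like a password manager also needs shared or replicated storage behind it, which this sketch doesn’t cover.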

The second thing I’m looking at is setting up a proper ‘bastion server’. I’ve been working with Teleport, which has been pretty useful, but I haven’t set it up on everything yet.

I have a VPN set up on my home router, so I could at least get in and check on those services, but being able to jump directly into the servers from outside would make a big difference.

Summary

In short, my plans are now to implement the following solutions to help avoid this in the future:

  • A Kubernetes cluster for crucial services such as proxies, password managers etc.
  • A bastion server for SSH access into all machines from one single source (using Teleport).
  • A proper monitoring system for my physical machines.
  • Moving some low-level services off the physical machines into the cloud (where appropriate).
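For the monitoring point, the core of any setup is a check that runs somewhere *other* than the machine being watched, since a box that has powered off can’t report on itself. Below is a minimal Python sketch of that idea, assuming a plain TCP connection attempt is a good enough "is it up?" signal. The function name `unreachable` and the IP addresses and ports are hypothetical examples, not my real network.

```python
import socket
from typing import Iterable


def unreachable(targets: Iterable[tuple[str, int]], timeout: float = 2.0) -> list[tuple[str, int]]:
    """Return the (host, port) pairs that refuse or time out on a TCP connect."""
    down = []
    for host, port in targets:
        try:
            # A completed TCP handshake means both the machine and the
            # service listening on that port are up.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            # Refused connections, timeouts and DNS failures are all
            # OSError subclasses, so one handler covers them.
            down.append((host, port))
    return down


if __name__ == "__main__":
    # Hypothetical LAN addresses: SSH on the Docker server and the Raspberry Pi.
    hosts = [("192.168.1.10", 22), ("192.168.1.20", 22)]
    for host, port in unreachable(hosts):
        print(f"ALERT: {host}:{port} is unreachable")
```

Run from a cron job or timer on a separate machine (or a small cloud VM, which ties in with the last point above), this would at least flag a powered-off server quickly. A dedicated monitoring tool would add history and notifications on top; this only shows the basic check.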