While testing server.rebuild_now(), I ran into some high-level problems that undermine the technical foundation of the entire project. I think I can work around them, but this is definitely the biggest roadblock I’ve hit so far with GhostiFi. Up until now, development has actually been pretty easy, mostly because I’ve been able to reuse a lot of code I had already written for HostiFi.
1. Cloudflare won’t proxy OpenVPN
While designing this solution, one of the most important parts to me was figuring out how to solve the problem of DNS propagation.
When a VPN server is rebuilt, the IP changes, and DNS changes can take up to 2 days to propagate worldwide. That is an unacceptable amount of time to wait for your VPN to come back.
My solution was to use Cloudflare, which I thought would be able to proxy the OpenVPN traffic to the servers. That seemed like a great idea because the Cloudflare proxy IP address would never change, only the server behind it. Clients would keep resolving the same public IP, so swapping the server behind the proxy would take effect instantly worldwide.
The problem I discovered is that Cloudflare can only proxy HTTP/HTTPS traffic on the free plan. I tried changing the OpenVPN port to 443, but that didn’t work either, because the proxy expects actual HTTPS on that port, not OpenVPN traffic. I have to find a new solution to the DNS propagation wait time.
Potential DNS Solutions
Use a Floating IP
One of the better ways to deal with this problem is to have a “floating IP” from Vultr for each server. As the servers get rebuilt, the IP is reassigned, and there is no DNS propagation downtime.
However, I haven’t yet tested whether this would work across different datacenters. My guess is that it won’t.
Another problem with a floating IP is the cost. It would add $3/month of expense per server to an already more-expensive-than-average VPN solution.
Lower TTL to lowest possible setting
Unfortunately, I think this is the route I will have to take. I will set the TTL to 120 seconds (the lowest possible setting) on Cloudflare DNS so that, hopefully, the world picks up the new IP within 2 minutes.
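As a rough sketch of what that DNS update could look like, the snippet below builds (but does not send) a request to Cloudflare’s v4 API that points an A record at the new server IP with a 120 second TTL. The function name, the example hostname, and the zone/record IDs are placeholders I made up for illustration; they are not part of the GhostiFi code.

```python
import json
import urllib.request

CLOUDFLARE_API = "https://api.cloudflare.com/client/v4"
MIN_TTL_SECONDS = 120  # the lowest TTL setting mentioned above


def build_dns_update(zone_id, record_id, hostname, new_ip, api_token):
    """Build (without sending) a request that repoints hostname at new_ip.

    zone_id, record_id and api_token are placeholders supplied by the caller.
    """
    payload = {
        "type": "A",
        "name": hostname,
        "content": new_ip,
        "ttl": MIN_TTL_SECONDS,
        "proxied": False,  # DNS only, since the proxy cannot carry OpenVPN
    }
    return urllib.request.Request(
        url=f"{CLOUDFLARE_API}/zones/{zone_id}/dns_records/{record_id}",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned request to urllib.request.urlopen() would actually send it. Keeping the build step separate makes it easy to inspect the payload before touching live DNS.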
This sucks because the best-case scenario is now a 2 minute interruption to service during a rebuild instead of no interruption. There is also the possibility that the DNS change could still take much longer, depending on the DNS servers the customer uses.
I might keep the old server running for 5 minutes before destroying it, to see if that covers the 2 minute TTL and reduces downtime to zero.
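Here is a minimal sketch of that grace period idea, assuming the DNS record has already been switched to the new IP. It waits 5 minutes, checks that the hostname now resolves to the new server, and only then calls a destroy function. The names retire_old_server and destroy_old are hypothetical, and the lookup only reflects the resolver this script uses, so it is a sanity check rather than proof that every customer has the new IP.

```python
import socket
import time


def retire_old_server(hostname, new_ip, destroy_old, grace_seconds=300,
                      resolve=socket.gethostbyname, sleep=time.sleep):
    """Keep the old server alive for grace_seconds, then destroy it.

    destroy_old is a hypothetical callable that deletes the old server.
    Returns True if the old server was destroyed, False if it was kept.
    """
    # Old and new servers both stay up while cached DNS records expire.
    sleep(grace_seconds)
    try:
        resolved_ip = resolve(hostname)
    except OSError:
        # Lookup failed, so keep the old server rather than risk an outage.
        return False
    if resolved_ip != new_ip:
        # Our own resolver still sees the old IP; do not destroy yet.
        return False
    destroy_old()
    return True
```

The resolve and sleep parameters exist only so the function can be tested without real DNS lookups or a real 5 minute wait.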
2. Vultr snapshots are too slow
After heavily testing rebuilds using Vultr snapshots, I found they’re too slow and unreliable for this solution. Vultr says snapshots can take up to 60 minutes to complete. In practice they were much quicker, but I wanted rebuilds to be as fast as possible, and snapshots just aren’t that fast. Restoring from a snapshot would also occasionally hang for some reason and have to be restarted.
They’re still great for doing an occasional backup or whatever, but not great for the speed and frequency needed for GhostiFi.
Potential Snapshot Solution
I’m going to remove all the snapshot functions and replace them with a new solution. For each rebuild, I’m going to run the OpenVPN install script on a fresh server, and then copy the server-specific settings from a private DigitalOcean Spaces bucket and restore them over SSH.
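To make the new rebuild flow a bit more concrete, here is a rough sketch that only builds the list of scp/ssh commands for one rebuild. It assumes the server’s settings archive has already been downloaded from the Spaces bucket to a local path, and the script name, remote paths, and service name are all placeholders of mine, not the real GhostiFi values.

```python
import shlex


def rebuild_commands(server_ip, archive_path,
                     install_script="openvpn-install.sh",
                     service="openvpn", user="root"):
    """Return the commands for one rebuild, in order, without running them.

    install_script, service and the /root paths are hypothetical defaults.
    archive_path is a local copy of the server-specific settings archive.
    """
    target = f"{user}@{server_ip}"
    remote_script = f"/root/{install_script}"
    remote_archive = "/root/settings.tar.gz"
    return [
        # 1. Copy the OpenVPN install script to the fresh server.
        ["scp", install_script, f"{target}:{remote_script}"],
        # 2. Run the install script over SSH.
        ["ssh", target, f"bash {shlex.quote(remote_script)}"],
        # 3. Copy the server-specific settings archive.
        ["scp", archive_path, f"{target}:{remote_archive}"],
        # 4. Restore the settings over the fresh install and restart.
        ["ssh", target,
         f"tar -xzf {remote_archive} -C /etc/openvpn && "
         f"systemctl restart {shlex.quote(service)}"],
    ]
```

Each command could then be run in order with subprocess.run(cmd, check=True), so that a failed step stops the rebuild instead of leaving a half-configured server.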
I’m building GhostiFi completely transparently, sharing the process from concept, to sketch, pseudocode, and finally actual code.
If you have any feedback on GhostiFi or alternative solutions to these problems please let me know in the comments section!
I am also looking for feedback on the concept itself, as well as beta testers. Please sign up for the newsletter at https://ghostifi.net if you are interested.