If we don’t solve the problems of our past, the futures that we want become ever more difficult to obtain. That’s true in life, and it’s true in software. In life, the past comes in the form of baggage. In software, we call that “technical debt”. For both life and software, how do you deal with the past? Do you ignore it and hope for the best? Do you continuously make your future better, via therapy and code refactoring? Do you cut ties with the past completely, via moving to another city and starting a new git repository?
There’s no one way to move forward. Context matters, though it’s fair to say that we can always benefit from both therapy and refactoring our code.
We recently decided on our own way forward for our crumbling video game portfolio. Our games were stuck with an outdated, well, everything. Outdated login system, outdated cloud saves, outdated engine version. You name it, it was out of date and in trouble and probably didn’t work on a lot of modern devices.
After weighing the options, we decided to migrate the whole portfolio to a new web service while also clearing outstanding problems with the game engine and distribution platforms. Over half a million users used our legacy system, often with dozens of hours of their time invested into each cloud save. We migrated all that save data into our new system, which also required making major updates to local data storage and syncing for five games (each on 2-5 platforms).
We did all of this with zero downtime for our players.
Every step of that migration could have been a disaster, and pulling it off required a huge investment of development resources. So why did we do it at all? And, having decided to do it, how did we prevent disaster?
Takeaways
You can’t solve tech debt, but you can mitigate it. Mitigation requires constant, low overhead. In the long run that overhead is worth it.
Sometimes the cost of clearing tech debt is so high, and the value of doing it so low, that starting over is the best path forward.
Always try to reduce your feature set. You can cut more than you think.
Don’t try to solve every tech debt problem at once. Sometimes a partial solution now makes a future solution more difficult, but the alternative is always a difficult solution now.
Make the migration deliverable in small batches to avoid the need for heroics (a core tenet of DevOps).
Zero-downtime migration for a centralized system (like a web service) with decentralized clients (like games) can take this approach:
Prepare the new system for (probably transformed) versions of the data.
Have the old system start tracking whether or not each user’s data has been copied to the new system.
Change all parts of the old system that interact with the data from data processors into data tubes: first migrate the data to the new system, then act as a proxy to transform legacy requests into new ones (and new responses into legacy ones). This ensures that active users will have their data migrated on demand.
Batch-migrate all data to the new system. This ensures that all inactive users will eventually have their data migrated, and can be done with lower-priority background processes.
Turn off the old system.
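To make those five steps concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions (in-memory dictionaries instead of databases, a made-up save format), not the actual BscotchID code. Every class, method, and field name below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NewSystem:
    """Step 1: a stand-in for the new service, ready to hold transformed saves."""
    saves: dict = field(default_factory=dict)

    def put_save(self, user_id: str, save: dict) -> None:
        self.saves[user_id] = save

    def get_save(self, user_id: str) -> Optional[dict]:
        return self.saves.get(user_id)


def to_new_format(legacy_save: dict) -> dict:
    """Hypothetical transform from the legacy layout to the new one."""
    return {"version": 2, "payload": legacy_save["data"]}


def to_legacy_format(new_save: dict) -> dict:
    """Inverse transform, so legacy clients keep getting legacy responses."""
    return {"data": new_save["payload"]}


@dataclass
class LegacySystem:
    new_system: NewSystem
    saves: dict = field(default_factory=dict)
    # Step 2: track which users have already been copied over.
    migrated: set = field(default_factory=set)

    def _ensure_migrated(self, user_id: str) -> None:
        if user_id in self.migrated:
            return
        if user_id in self.saves:
            self.new_system.put_save(user_id, to_new_format(self.saves[user_id]))
        self.migrated.add(user_id)

    # Step 3: request handlers become "tubes": migrate first, then proxy.
    def handle_get_save(self, user_id: str) -> Optional[dict]:
        self._ensure_migrated(user_id)
        new_save = self.new_system.get_save(user_id)
        return None if new_save is None else to_legacy_format(new_save)

    def handle_put_save(self, user_id: str, legacy_save: dict) -> None:
        self._ensure_migrated(user_id)
        self.new_system.put_save(user_id, to_new_format(legacy_save))

    # Step 4: a background job sweeps up users who never show up on their own.
    def batch_migrate(self, batch_size: int) -> int:
        pending = [u for u in self.saves if u not in self.migrated][:batch_size]
        for user_id in pending:
            self._ensure_migrated(user_id)
        return len(pending)
```

In this sketch, step 5 corresponds to the point where `batch_migrate` keeps returning 0 and updated clients talk to the new system directly, so the legacy handlers can be retired. A real version would also need the "migrated" flag and the copy to be durable and safe under concurrent requests, which this sketch ignores.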
What’s Tech Debt?
“Tech debt” is the set of constraints you face because of past technical decisions. Maybe you have old, messy code that is hard to update, or an application written in a no-longer-supported framework. You may even have perfectly fine code that no one on your current team wants to touch, because that code forms the foundation of a fragile house of cards.
Tech debt is unavoidable. Even if your programmers follow all the best practices and use the most reliable and effective technologies, tech debt is being created externally all the time! New tools are constantly coming out, operating systems get updated, new phones (with more holes punched into their screens) come out every year, and people are always refining best practices and inventing entirely new ways of doing the work.
Every technical decision you make and every line of code you write is creating future technical debt. That means you can’t “fix” tech debt. Resolving tech debt requires making changes, and every change is just future tech debt.
There is no escape.
But that doesn’t mean we should throw up our hands and give up. Sure, every tech debt “fix” is a future tech debt problem, but the scale of the problem matters. Incremental improvements keep each future tech debt problem as small as possible.
This is why the “Leave it better than you found it” rule (often called the Boy Scout Rule) is such a strong, practical approach to tech debt. Instead of letting problems fester (or trying to fix all the problems), you aggressively improve things the moment they impact your ability to move into the future, and otherwise leave things alone.
Having said that, sometimes you find yourself in tech debt so deep that everything is far harder than it should be. Every feature or bugfix is extremely costly, because your past decisions make all new changes difficult or risky. You can’t even use the Boy Scout Rule, because the code is so convoluted that you can’t just fix one thing. Sometimes it’s all or nothing.
Sometimes the only solutions to tech debt are to either Just Walk Away (and hope for the best), or to Burn It All Down.
BscotchID: Our Tech Debt
When I joined the Butterscotch Shenanigans (“Bscotch”) team in 2014 the studio had already launched a few games. One of them, Quadropus Rampage, had millions of downloads on mobile. New to the industry, I asked my teammates, “How do we tell all those people already playing our games about our next game?” The answer was… we couldn’t. In general, digital stores don’t let us (as developers/publishers) reach out to the players who bought our games.
Surviving in the games industry was already hard enough, but not being able to roll one success into another by marketing to our existing players made it far harder. Even today most platforms don’t allow developers/publishers direct marketing access to their games’ players.
To solve this problem we needed to give users a reason to let us contact them. But users are extremely (and rightly) suspicious when companies ask for their email address. So while a simple newsletter signup would have been the easy technical solution to this problem, the user incentives weren’t high enough. We needed something fancier.
We settled on cross-platform save syncing (now common enough in the industry to have the shorthand term “cross-save”) as that core feature. By providing a useful feature to our players, one that happened to require a user email address to work (and that wasn’t being served by their existing accounts), we got the ability to email our users as a super useful side effect.
Thus the idea for BscotchID, our first cloud service, was born. We just had to, you know, actually build it. More specifically, I had to build it; we were a three-person team, and the other two were busy making our games.
I’m a self-taught programmer. Prior to joining Butterscotch Shenanigans I had never done any significant web development, built software with a team, or built any production-level software. I was learning on the job, but without mentorship because I was the only “web developer” on the team.
Over about three months I built our BscotchID account system and all the in-game code needed to talk to it. I was unfamiliar with pretty much every part of the tech stack, so I learned just enough of GameMaker Studio (our game engine), PHP, MySQL, HTML, CSS, and JavaScript to get the job done.
If you’re thinking that 3 months isn’t much time for one person to learn an entire 6-technology stack and use that stack to build a production-ready and secure user account system, with all the features a user would expect (cloud saves, achievements, friends, messaging, leaderboards, etc.), with clean enough code to last long into the future, then, well, you’re absolutely correct about that.
Shortcuts were required. Though, to be fair, I knew so little about development at the time that I didn’t even know I was taking shortcuts. I’d never heard of test driven development, development/test environments, or even clean code principles.
After initially launching BscotchID and updating all our titles to use it, a significant part of my next ~1.5 years went into features and maintenance. After all of that no-idea-what-I-was-doing development time, BscotchID was a walking pile of tech debt. Here’s a brief summary of the most glaring issues.
BscotchID had no development version. (A “development” version is a separate copy of the software that doesn’t talk to the same data as the “production” version, allowing for safe development and testing without risking negative impacts on real users.) There was only production.
I had no automated tests, nor even a checklist of manual tests. I ran custom tests in production while working on specific parts of BscotchID. Then I just hoped nothing bad happened later.