This has been the roughest week of our collective studio life – and we have had a number of rough patches, as some of you know. We are a small team, and creating an online game – regardless of whether it is a massive open-world MMO or a more modest game like ours – is a big undertaking. When we launched, we were reasonably certain that everything would work. So what has gone so horribly wrong, and have we all been blind or stupid? If you want to understand what happened, below is the rundown. If you are not interested in explanations or excuses, just scroll to the announcement at the end to see how we are trying to at least make amends for this clusterdrek.
We know the risks associated with online games, especially ones that are structurally and technically MMOs (in the sense that most of the AI, logic, persistence and all the other stuff that makes the game operate runs on the server). This is why we were extra careful before we released Shadowrun Chronicles: the game had been running for months before launch without any issues (as our Early Access and Kickstarter players can attest), we ran a series of mass tests (simulating thousands of concurrent user sessions), and a day prior to launch we invited our Early Access community to come in droves by giving them a one-day head start for their characters (which they previously had to see wiped time and time again as we balanced and reworked the game). We upgraded our hardware to be extra safe, even though our servers rarely went above 5% load.
Come launch day and things work without a glitch…until they suddenly don’t.
Shortly after our studio falls victim to a district-wide blackout at the most inopportune of times, our external server inexplicably slows down. The source is quickly identified (once we are back online): a specific task called the “garbage collector”, which cleans up leftover allocated objects in memory. With this many players it suddenly has too much to do, so work queues up and eats away the hardware resources that every other process needs (like walking from A to B). So we reduce the amount of garbage produced, patch, and off we go…
Only to see lag creeping into the game again. Looking at the server activity and our logs, it turns out our servers clog up with threads they cannot process fast enough, leaving ever more threads hanging, which makes the game ever more unresponsive. So what do players do if the game does not react when they click on something? Right, they click on it again and again and again…creating more commands for the server, which is struggling as it is. Vicious circle, anyone?
Now, these issues accumulate over time…which makes them almost impossible to test for without actual people doing a thousand actual things that differ from our mass tests. The jump in scale also exposes problems: things that worked badly but did not register when we had a little over a hundred concurrent players suddenly explode with 500 players.
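To make the vicious circle concrete, here is a minimal sketch of one common way to blunt it: if the same command from the same player is still waiting, a repeated click is simply dropped instead of queued again. This is an illustration only, not a description of the actual server code; `CommandQueue`, `submit`, `pop_next` and the example command names are all hypothetical.

```python
from collections import OrderedDict


class CommandQueue:
    """Hypothetical pending-command queue that ignores duplicate clicks."""

    def __init__(self):
        # Keys are (player_id, action); insertion order is processing order.
        self._pending = OrderedDict()

    def submit(self, player_id, action):
        """Queue a command; return False if the same one is already waiting."""
        key = (player_id, action)
        if key in self._pending:
            return False  # repeated click: no extra work for the server
        self._pending[key] = True
        return True

    def pop_next(self):
        """Return the oldest pending (player_id, action), or None if empty."""
        if not self._pending:
            return None
        key, _ = self._pending.popitem(last=False)
        return key

    def __len__(self):
        return len(self._pending)


if __name__ == "__main__":
    queue = CommandQueue()
    for _ in range(5):  # an impatient player clicking five times
        queue.submit("player_1", "open_door")
    print(len(queue))  # only one command is actually waiting
```

With something like this in place, an unresponsive game still frustrates the player, but their extra clicks no longer add to the backlog that caused the lag in the first place.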
Take our hub: to let you see other players running around, the server needs to collect the “commands” from every player and update every other client about the movements of each character in the hub. Previously that was never an issue. But with dozens of hubs open, the small things stack up. And when they do, the server gets clogged up…and if that goes on long enough, it simply cannot work through commands fast enough…resulting in lag. One of the patches reduced the size of each command (and so the amount of data the server has to process) to 20% of what it was and optimized the associated server logic, so it is now 40 times as fast as before.
So why didn’t we do this before? Simple: it wasn’t a problem, so we weren’t fixing it – every change to a running system carries the risk of creating new bugs, and if it ain’t broken…
The example above is but one part of a complex and interwoven web of logic and relations. We wish we could enter that code Shadowrun style and hack our way through things with ease, but unfortunately that is not the case. Instead, our 6 coders, IT and QA have worked day and night for three days to correct problem after problem, but as with any intricate web, when you start pulling one string, the whole thing changes and transforms. So every patch improved part of the problem…only for another issue to creep up. And we may not have seen the end of this, as we continue to fix things. And it does not help when people have not slept for over 40 hours…
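For those who like numbers, here is a rough back-of-the-envelope sketch of why the hub traffic stacks up. It assumes every player's movement is relayed to every other client in the same hub once per update, as described above. The hub sizes and the 100-byte command are made-up illustration values; only the reduction to 20% comes from the patch, and the function names are hypothetical.

```python
def updates_per_tick(players_in_hub: int) -> int:
    """Each player's movement goes to every *other* client in the hub."""
    return players_in_hub * (players_in_hub - 1)


def bytes_per_tick(hub_sizes: list[int], bytes_per_command: int) -> int:
    """Total data relayed across all open hubs for one round of updates."""
    return sum(updates_per_tick(n) for n in hub_sizes) * bytes_per_command


if __name__ == "__main__":
    # Assumption for illustration: 25 hubs with 20 players each (500 players).
    hubs = [20] * 25
    old_size = 100             # hypothetical command size in bytes
    new_size = old_size // 5   # 20% of the old size, as in the patch

    print(bytes_per_tick(hubs, old_size))  # traffic before the patch
    print(bytes_per_tick(hubs, new_size))  # traffic after the patch
```

The point is the shape of the growth: doubling the players in a hub roughly quadruples the updates, so something invisible at a hundred players can hurt badly at five hundred.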
TL;DR: This is fragged and we are desperately trying to fix it, but we are stretched to the limits.
As a way of saying sorry, we want to give a little gift to everyone who endured this week of constant patching, and to those who just gave up, and ask them to give the game one more shot.
Mid next week, every user will receive three things:
- A free additional character slot (so if all that patching messed up your builds, you can start a clean one)
- Two pieces of cultured bioware – these cost zero Essence, so you can use them with any character, even mages, or make sure your cybered-up street sams get that extra boost they have been looking for
- A flamethrower, so whenever you get really angry at us for that drek, you can just incinerate a few enemies and think of us.
We are still working on creating a PvP and Endgame mode – most of this is design and thus not overly affected by the massive amount of work our coders had to do, but then our coders will need some sleep after this marathon.
Again, we are sorry for this. After putting years of work into this game, it is unbelievably disappointing for us to see this happen and leave a bad impression.