About two years ago, I promised that, once distribution was finished, I would make a lengthy post explaining how it all worked. Up until now, it has been one of those well-kept secrets that somehow never leaked, most likely because few people actually understand the system, even though the data is plainly visible to anyone with access to the source code.
Distribution isn’t finished yet, but it should be done soon. And thus, the time has come to lift the veil on a system that has remained, to my knowledge, uncracked so far. There’s a lot of information to share, so I’m making this a three-part series.
Part 1: The history
I don’t remember where the idea came from. Logs are completely lost, and thus I’m simply going from memory here. But a plan came up to distribute Pokémon after the game was released, as a DLC lookalike. Of course, the initial challenge was determining the list of Pokémon that we would distribute.
And here came the first problem. A quick calculation showed that actually distributing Pokémon was impossible. If we wanted to distribute a Pokémon, we would have to pack the data into a code — and even the shortest encoding we could come up with was 40 characters long, including checksums and validations.
So, the only way we could distribute Pokémon would be by encoding them all into the ROM at the time of release. The idea we came up with was that we’d store a code with each Pokémon, and thus we’d distribute the codes when we were ready to do so. That would put a limit on the number of Pokémon we could distribute, but we didn’t plan on doing it forever anyway.
We went back and forth on the number of Pokémon that we wanted to distribute as we filled up the list, but we eventually settled on 20. Most of the debate at that point was over which Pokémon we’d distribute: we gathered representative Pokémon from generation 1 and 2 runs, a few Pokémon we created honoring some staff members, and a few others (M4 comes to mind). Gathering the data was quite a process in and of itself; in some cases we had to go through old saves and the like to retrieve the exact data for each Pokémon. For instance, the released Pokémon from our Red run come from a collection of dumped saves, which we had to analyze one by one until we found one that contained the relevant information.
At some point while we were filling up the list, someone suggested bringing back Phancero. Phancero had been added to the codebase sometime during development, but it had been scrapped at the last minute before releasing the game to the stream; the only thing that remained at that point was its cry. It seemed like a shame to let it die, considering it had been fully designed, with a sprite, base data, a moveset and so on, so we came up with the idea of bringing it back in through the distribution system. Of course, this created a problem: we had to hide it. And we did, by not listing it, not talking about it, and only adding it to the latest version of the ROM, the one that actually contained the distribution system itself. I even removed it from my data dumps. Here’s a version with the data unmodified:
Phancero became pretty much AC’s Mew in this regard, and just like Mew, it managed to remain hidden, in this case until we released the code through Prism.
Of course, we had one last hurdle to cross, one that most people already know about. If we had just added the data to the ROM unchanged, anyone would have found the full list within days; we have plenty of tinkerers in the community, and we had even more back then. Many people tried to do just that as soon as the distribution system was announced, proving our point. And so, Pigu came up with an extremely clever system that allowed us to hide the list in a way that would make it extremely hard to find and crack without disassembling the ROM. He wrote a Python script that took the raw data bytes and encoded them into this form, which was how we first generated the distribution data file that would be included in the ROM.
This script had one major flaw, though: it required all of the data to be encoded manually into raw hexadecimal bytes before passing it through the program, which invited accidental encoding errors. We found more than a handful of those errors when testing the system, caused by our own mistakes when transforming the data into hexadecimal. To put a final stop to that, I wrote a C program that would parse a plain text file containing the data in human-readable form and generate the distribution data file. As part of this process, I had two options: either use pigu’s Python script to encrypt the data, or rewrite it in C. As I don’t know Python, I went for the latter; this took a bit of trial and error, but eventually (and with his help) I managed to produce output that matched pigu’s encryption.
This program, as well as the data itself, still lives in the actual source code repository (i.e., not the public copy). That is why the actual repository is still private: the distribution data in it is extremely easy to find and read.
And that is the history of the AC distribution system. In the next post, I’ll talk about the data involved: what data is stored, what isn’t, what we gathered, and so on. Coming across a particular data file was what inspired me to make this series in the first place: that file is M4’s raw data from the Emerald save, which I’ll leave you with as a sneak peek of what’s to come:
(shoutout to M4 for obvious reasons, and to LightningXCE for general help related to this post series)
Original Reddit thread (some old file download links may no longer function)