by asdf14396
As the time to release the repository comes close, it’s time to finally put this series to an end. It’s been over a week since I posted part 2, and it seems like a good way to let people have a try at cracking the data before we just release it all for the world to see.
In the previous post, I talked about the data stored in the distribution system, but I intentionally avoided discussing some part of it, which I only said was “related to passwords”. And so, it’s finally time to reveal…
Part 3: The algorithm
This post will be highly technical, so I’m going to try to explain some concepts initially to help people understand it better. In order to explain how the encryption worked, it may be worthwhile to quickly go over some cryptography basics.
An encryption algorithm consists on applying a reversible transformation to some data using a certain encryption key. By “reversible transformation” we mean any kind of process that can be undone to recover the original data, although hopefully it will be one that will be hard to undo without the key. Undoing this process is of course called decryption, which applies the inverse transformation using a decryption key. Note that, while the encryption key and the decryption key will necessarily be linked to each other, they don’t need to be the same. Algorithms where the link between both keys is intentionally hard to reconstruct, making it next to impossible to calculate one key from the other, are called asymmetrical, and they are the basis of what is nowadays known as public key cryptography. On the other hand, algorithms where the keys are either equal or trivial to calculate from one another are called symmetrical. The algorithm used here is a symmetrical one.
While encryption uses the key to transform the data, it doesn’t actually store the key with the data. Keeping the encryption key away from the data is a fundamental step in protecting the data (unless you’re dealing with public key encryption). Also, note that it’s (usually) possible to decrypt data with the wrong key — of course, you’ll get the wrong data if you do this, but there is no indication of this happening. Therefore, validating the data must be a separate step.
Considering all of these matters, the straightforward implementation would be to store the passwords in the ROM along with the data for each distributed Pokémon (encrypted with some key), and also store some checksum to verify that the data has been successfully decrypted. The structure I mentioned in the previous post would allow this, since there are 8 bytes reserved for the password system before each Pokémon, and passwords are at most 8 characters long.
Of course, this is not how it actually works. If we had done this, we would have needed to store the key in the ROM in order to be able to decrypt the distribution data. Someone could have harvested this key by debugging the ROM and decrypted all of the data, and all 20 distribution Pokémon would have been revealed early. Instead, the passwords are used as encryption keys, and not stored at all in the ROM. (I wasn’t lying when I said this!) The 8 bytes we reserved for “something related to passwords” are actually used for validation.
The algorithm basically works like this: when the distribution data is generated, those 8 bytes are filled with an identical byte. It doesn’t matter which byte (it is chosen at random), but all 8 bytes are filled with the same value. (Unused areas in the data, which are caused by names being shorter than the maximum length allowed, are also filled with random data (to add more noise to the encrypted output), but that data is actually random and not used for any purpose.) When the user enters a distribution code, the game attempts to decrypt each of the Pokémon using the corresponding password. If decryption results in the first 8 bytes being identical, then the code is considered valid and the remaining 34 bytes are used as the distributed Pokémon’s data; if those 8 bytes aren’t identical, decryption fails and the game tries the next entry in the dataset, until one entry succeeds (in which case a Pokémon is awarded) or all of them fail (in which case the code is considered invalid and the player is informed of this).
The code is therefore used to generate the key: since only the correct key will produce the original data (with 8 identical bytes in the beginning), and we can reasonably assume that incorrect codes will generate random-looking data that will not match this pattern, it becomes possible to store only the encrypted data and a validation header without storing the decryption key at all. Note that I said “used to generate the key”, not “used as key”: the characters that the user can enter come with the rather undesirable property of always having the upper bit set, other than the space ($7f) and terminator ($50) characters, so the upper bit is stripped from all characters, converting them to 7-bit values (the space and terminator characters are respectively converted to $bf and $cf before stripping the upper bit, since they would otherwise be indistinguishable from $ff and $c0 — and the former also represents the character 9
); the entire code is therefore turned into a 7-byte value. (Terminator characters only appear if the actual code is shorter than 8 characters long, filling up the unused space.) Finally, a fixed 5-byte string is appended to this value to generate the 12-byte key; for rather obvious reasons, OLDEN
was chosen as this fixed string.
The actual algorithm used to encrypt the data isn’t terribly interesting; it is mostly a series of XORs, additions, subtractions and permutations, good enough to mix and shuffle the data, ensuring that invalid codes wouldn’t result in valid outputs. (I’ll leave it as an exercise for the reader to find out if there are any additional codes that happen to arise from coincidence, i.e., from some key accidentally generating a correct validation header for one of the Pokémon in the dataset.) The current version of the actual function that does the encryption (which takes as arguments the 42-byte data structure, the 7-byte key and a 42-byte buffer for the result) is this one:
void encrypt (const unsigned char * data, const unsigned char * key, unsigned char * encrypted) {
unsigned char i, j, k, tp;
unsigned char width, shift;
unsigned char temp_data[42];
const unsigned char fixed_string[] = {0x8e, 0x8b, 0x83, 0x84, 0x8d}; // "OLDEN"
memcpy(encrypted, data, 42);
for (i = 41; i < 42; i --) encrypted[i] ^= encrypted[(i < 21) ? (41 - i) : (i - 21)] ^ key[i % 7] ^ fixed_string[i % 5];
for (i = 6; i < 7; i --) {
width = (key[i] & 15) + 2;
shift = (key[i] >> 4) + 1;
k = 0;
for (j = 0; j < width; j ++) for (tp = 0; (j + tp) < 42; tp += width) temp_data[j + tp] = encrypted[k ++];
memcpy(encrypted, temp_data + (42 - shift), shift);
memcpy(encrypted + shift, temp_data, 42 - shift);
}
memcpy(temp_data, key, 7);
memcpy(temp_data + 7, fixed_string, 5);
for (i = 0; i < 42; i ++) encrypted[i] += temp_data[(i + 6) % 12] - temp_data[i % 12];
}
There isn’t much left to say at this point; the secret is now revealed. I can only wonder what our local hackers and data miners could have done with this information back in the day; I know that some people tried to find the codes, but as this post should show, those efforts were misguided, as the codes themselves aren’t stored in the ROM at all. Feel free to ask any questions you might have.
I mentioned that the code above is the “current version” of the function; I’ve been recently making changes to the repository prior to its public release so it would be easier to use and understand. I’ll leave you for now with the original version of the distribution builder, which parses the text file shown in the previous post and generates the distribution.bin
file with all of the data already encrypted and ready for inclusion in the ROM:
Original autogen.c program
And a big shoutout to Pigu for coming up with this clever system.
Original Reddit thread