Month: September 2018

AC distribution system revealed – Part 3

by asdf14396

As the release of the repository draws near, it’s time to finally bring this series to an end. It’s been over a week since I posted part 2, and it seems like a good way to let people have a try at cracking the data before we just release it all for the world to see.

In the previous post, I talked about the data stored in the distribution system, but I intentionally avoided discussing some part of it, which I only said was “related to passwords”. And so, it’s finally time to reveal…

Part 3: The algorithm

This post will be highly technical, so I’m going to try to explain some concepts initially to help people understand it better. In order to explain how the encryption worked, it may be worthwhile to quickly go over some cryptography basics.

An encryption algorithm consists of applying a reversible transformation to some data using a certain encryption key. By “reversible transformation” we mean any kind of process that can be undone to recover the original data, although hopefully it will be one that will be hard to undo without the key. Undoing this process is of course called decryption, which applies the inverse transformation using a decryption key. Note that, while the encryption key and the decryption key will necessarily be linked to each other, they don’t need to be the same. Algorithms where the link between both keys is intentionally hard to reconstruct, making it next to impossible to calculate one key from the other, are called asymmetrical, and they are the basis of what is nowadays known as public key cryptography. On the other hand, algorithms where the keys are either equal or trivial to calculate from one another are called symmetrical. The algorithm used here is a symmetrical one.

While encryption uses the key to transform the data, it doesn’t actually store the key with the data. Keeping the encryption key away from the data is a fundamental step in protecting the data (unless you’re dealing with public key encryption). Also, note that it’s (usually) possible to decrypt data with the wrong key — of course, you’ll get the wrong data if you do this, but there is no indication of this happening. Therefore, validating the data must be a separate step.

Considering all of these matters, the straightforward implementation would be to store the passwords in the ROM along with the data for each distributed Pokémon (encrypted with some key), and also store some checksum to verify that the data has been successfully decrypted. The structure I mentioned in the previous post would allow this, since there are 8 bytes reserved for the password system before each Pokémon, and passwords are at most 8 characters long.

Of course, this is not how it actually works. If we had done this, we would have needed to store the key in the ROM in order to be able to decrypt the distribution data. Someone could have harvested this key by debugging the ROM and decrypted all of the data, and all 20 distribution Pokémon would have been revealed early. Instead, the passwords are used as encryption keys, and not stored at all in the ROM. (I wasn’t lying when I said this!) The 8 bytes we reserved for “something related to passwords” are actually used for validation.

The algorithm basically works like this: when the distribution data is generated, those 8 bytes are filled with an identical byte. It doesn’t matter which byte (it is chosen at random), but all 8 bytes are filled with the same value. (Unused areas in the data, which are caused by names being shorter than the maximum length allowed, are also filled with random bytes to add more noise to the encrypted output; unlike the header, that filler isn’t used for any purpose.) When the user enters a distribution code, the game derives a key from that code and attempts to decrypt each of the Pokémon in turn with it. If decryption results in the first 8 bytes being identical, then the code is considered valid and the remaining 34 bytes are used as the distributed Pokémon’s data; if those 8 bytes aren’t identical, decryption fails and the game tries the next entry in the dataset, until one entry succeeds (in which case a Pokémon is awarded) or all of them fail (in which case the code is considered invalid and the player is informed of this).
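The lookup described above can be sketched in C roughly as follows. This is only an illustration, not code from the repository: the names header_is_valid, find_distribution and decrypt_fn are made up for this example, and toy_decrypt is a placeholder cipher (a plain XOR) that exists only so the loop can be exercised.

```c
#include <stddef.h>

#define RECORD_SIZE 42 /* 8 validation bytes + 34 bytes of Pokémon data */
#define HEADER_SIZE 8

/* Returns 1 if the 8 validation bytes are all identical, 0 otherwise. */
int header_is_valid(const unsigned char *record) {
  size_t i;
  for (i = 1; i < HEADER_SIZE; i++)
    if (record[i] != record[0]) return 0;
  return 1;
}

/* Hypothetical signature for whatever routine decrypts one 42-byte entry. */
typedef void (*decrypt_fn)(const unsigned char *encrypted,
                           const unsigned char *key,
                           unsigned char *decrypted);

/* Tries the key against every entry in order. Returns the index of the first
   entry whose header validates (leaving its decrypted bytes in out), or -1
   if no entry matches, i.e. the code is invalid. */
int find_distribution(const unsigned char *entries, int count,
                      const unsigned char *key, decrypt_fn decrypt,
                      unsigned char *out) {
  int n;
  for (n = 0; n < count; n++) {
    decrypt(entries + (size_t)n * RECORD_SIZE, key, out);
    if (header_is_valid(out)) return n;
  }
  return -1;
}

/* Placeholder only: NOT the real cipher, just XORs every byte with key[0]. */
void toy_decrypt(const unsigned char *encrypted, const unsigned char *key,
                 unsigned char *decrypted) {
  size_t i;
  for (i = 0; i < RECORD_SIZE; i++) decrypted[i] = encrypted[i] ^ key[0];
}
```

Note that nothing in this loop compares the code against a stored password; the only test is whether the decrypted header happens to have the expected shape.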

The code is therefore used to generate the key: since only the correct key will produce the original data (with 8 identical bytes in the beginning), and we can reasonably assume that incorrect codes will generate random-looking data that will not match this pattern, it becomes possible to store only the encrypted data and a validation header without storing the decryption key at all. Note that I said “used to generate the key”, not “used as key”: the characters that the user can enter come with the rather undesirable property of always having the upper bit set, other than the space ($7f) and terminator ($50) characters, so the upper bit is stripped from all characters, converting them to 7-bit values (the space and terminator characters are respectively converted to $bf and $cf before stripping the upper bit, since they would otherwise be indistinguishable from $ff and $d0; the former also represents the character 9). Eight 7-bit values take up 56 bits, so the entire code is turned into a 7-byte value. (Terminator characters only appear if the actual code is shorter than 8 characters long, filling up the unused space.) Finally, a fixed 5-byte string is appended to this value to generate the 12-byte key; for rather obvious reasons, OLDEN was chosen as this fixed string.
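A possible C rendering of this key derivation is shown below. The character remapping follows the description above; the way the eight 7-bit values are packed into 7 bytes is not spelled out in this post, so the most-significant-bit-first order used here is purely an assumption, and the names code_char_to_7bit and pack_code are invented for the example.

```c
/* Maps one code character to its 7-bit value. Space ($7f) and the terminator
   ($50) are first moved to $bf and $cf so that they don't collide with other
   characters once the upper bit is stripped. */
unsigned char code_char_to_7bit(unsigned char c) {
  if (c == 0x7f) c = 0xbf;      /* space */
  else if (c == 0x50) c = 0xcf; /* terminator (padding for short codes) */
  return c & 0x7f;
}

/* Packs the 8 characters of a code (padded with $50) into a 7-byte key.
   ASSUMPTION: values are concatenated most significant bit first; the real
   builder may use a different bit order. */
void pack_code(const unsigned char code[8], unsigned char key[7]) {
  unsigned long long bits = 0; /* holds the 56-bit result */
  int i;
  for (i = 0; i < 8; i++) bits = (bits << 7) | code_char_to_7bit(code[i]);
  for (i = 6; i >= 0; i--) {
    key[i] = (unsigned char)(bits & 0xff);
    bits >>= 8;
  }
}
```

The resulting 7 bytes are what the encryption function takes as its key argument; the fixed OLDEN suffix is appended inside that function.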

The actual algorithm used to encrypt the data isn’t terribly interesting; it is mostly a series of XORs, additions, subtractions and permutations, good enough to mix and shuffle the data, ensuring that invalid codes wouldn’t result in valid outputs. (I’ll leave it as an exercise for the reader to find out if there are any additional codes that happen to arise from coincidence, i.e., from some key accidentally generating a correct validation header for one of the Pokémon in the dataset.) The current version of the actual function that does the encryption (which takes as arguments the 42-byte data structure, the 7-byte key and a 42-byte buffer for the result) is this one:

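#include <string.h> // memcpy

// data: 42-byte plaintext structure; key: 7-byte key derived from the code;
// encrypted: 42-byte output buffer
void encrypt (const unsigned char * data, const unsigned char * key, unsigned char * encrypted) {
  unsigned char i, j, k, tp;
  unsigned char width, shift;
  unsigned char temp_data[42];
  const unsigned char fixed_string[] = {0x8e, 0x8b, 0x83, 0x84, 0x8d}; // "OLDEN"
  memcpy(encrypted, data, 42);
  // Stage 1: XOR every byte with a partner byte, the key and the fixed string.
  // i is unsigned, so decrementing past 0 wraps around to 255 and ends the loop.
  for (i = 41; i < 42; i --) encrypted[i] ^= encrypted[(i < 21) ? (41 - i) : (i - 21)] ^ key[i % 7] ^ fixed_string[i % 5];
  // Stage 2: one round per key byte, each one a transposition followed by a rotation.
  for (i = 6; i < 7; i --) {
    width = (key[i] & 15) + 2; // low nibble: stride of the transposition (2 to 17)
    shift = (key[i] >> 4) + 1; // high nibble: rotation amount (1 to 16)
    k = 0;
    // scatter consecutive bytes into positions j, j + width, j + 2 * width, ...
    for (j = 0; j < width; j ++) for (tp = 0; (j + tp) < 42; tp += width) temp_data[j + tp] = encrypted[k ++];
    // rotate the whole buffer right by shift bytes
    memcpy(encrypted, temp_data + (42 - shift), shift);
    memcpy(encrypted + shift, temp_data, 42 - shift);
  }
  // Stage 3: build the 12-byte key (7-byte key + "OLDEN") and add the
  // difference between two of its bytes to every byte of the output.
  memcpy(temp_data, key, 7);
  memcpy(temp_data + 7, fixed_string, 5);
  for (i = 0; i < 42; i ++) encrypted[i] += temp_data[(i + 6) % 12] - temp_data[i % 12];
}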
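
Since the algorithm is symmetrical, decryption amounts to undoing the same three stages in the opposite order with the same key. The decrypt function below is not taken from the repository; it is a sketch derived by inverting the encrypt function above, and its name and argument order are only assumptions. The encrypt function is repeated unchanged so that the example compiles on its own.

```c
#include <string.h>

/* encrypt, repeated as posted above */
void encrypt (const unsigned char * data, const unsigned char * key, unsigned char * encrypted) {
  unsigned char i, j, k, tp;
  unsigned char width, shift;
  unsigned char temp_data[42];
  const unsigned char fixed_string[] = {0x8e, 0x8b, 0x83, 0x84, 0x8d}; // "OLDEN"
  memcpy(encrypted, data, 42);
  for (i = 41; i < 42; i --) encrypted[i] ^= encrypted[(i < 21) ? (41 - i) : (i - 21)] ^ key[i % 7] ^ fixed_string[i % 5];
  for (i = 6; i < 7; i --) {
    width = (key[i] & 15) + 2;
    shift = (key[i] >> 4) + 1;
    k = 0;
    for (j = 0; j < width; j ++) for (tp = 0; (j + tp) < 42; tp += width) temp_data[j + tp] = encrypted[k ++];
    memcpy(encrypted, temp_data + (42 - shift), shift);
    memcpy(encrypted + shift, temp_data, 42 - shift);
  }
  memcpy(temp_data, key, 7);
  memcpy(temp_data + 7, fixed_string, 5);
  for (i = 0; i < 42; i ++) encrypted[i] += temp_data[(i + 6) % 12] - temp_data[i % 12];
}

/* Hypothetical inverse of encrypt: same 7-byte key, 42 bytes in, 42 bytes out. */
void decrypt (const unsigned char * encrypted, const unsigned char * key, unsigned char * data) {
  const unsigned char fixed_string[] = {0x8e, 0x8b, 0x83, 0x84, 0x8d}; // "OLDEN"
  unsigned char full_key[12], temp_data[42];
  int i, j, k, tp, width, shift;
  memcpy(data, encrypted, 42);
  memcpy(full_key, key, 7);
  memcpy(full_key + 7, fixed_string, 5);
  // Undo stage 3: subtract what encrypt added (arithmetic is modulo 256).
  for (i = 0; i < 42; i ++) data[i] = (unsigned char) (data[i] - full_key[(i + 6) % 12] + full_key[i % 12]);
  // Undo stage 2: visit the key bytes in the opposite order (0 to 6).
  for (i = 0; i < 7; i ++) {
    width = (key[i] & 15) + 2;
    shift = (key[i] >> 4) + 1;
    // undo the rotation: rotate left by shift bytes
    for (j = 0; j < 42; j ++) temp_data[j] = data[(j + shift) % 42];
    // undo the transposition: gather the bytes in the order they were scattered
    k = 0;
    for (j = 0; j < width; j ++) for (tp = 0; (j + tp) < 42; tp += width) data[k ++] = temp_data[j + tp];
  }
  // Undo stage 1: the same XOR, but walking the buffer upwards instead of downwards.
  for (i = 0; i < 42; i ++) data[i] ^= data[(i < 21) ? (41 - i) : (i - 21)] ^ key[i % 7] ^ fixed_string[i % 5];
}
```

Feeding the output of encrypt into decrypt with the same key should give back the original 42 bytes; with any other key, the result is just different-looking noise, which is why the 8-byte validation header is needed.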

There isn’t much left to say at this point; the secret is now revealed. I can only wonder what our local hackers and data miners could have done with this information back in the day; I know that some people tried to find the codes, but as this post should show, those efforts were misguided, as the codes themselves aren’t stored in the ROM at all. Feel free to ask any questions you might have.

I mentioned that the code above is the “current version” of the function; I’ve been recently making changes to the repository prior to its public release so it would be easier to use and understand. I’ll leave you for now with the original version of the distribution builder, which parses the text file shown in the previous post and generates the distribution.bin file with all of the data already encrypted and ready for inclusion in the ROM:

Original autogen.c program

And a big shoutout to Pigu for coming up with this clever system.

Original Reddit thread

AC distribution system revealed – Part 2

by asdf14396

In the previous post, I talked about how the distribution system in Anniversary Crystal came to be, and how it was developed. In the meantime, yet another Pokémon has been distributed. And so, the time has come to talk about…

Part 2: The data

For every Pokémon we intended to distribute, we had to determine the exact data it would contain. The distribution sets consist of sixteen Pokémon from our previous runs and four made-up ones; these were handled differently.

When it came to Pokémon from previous runs, our goal was clear: reconstruct the originals as closely as possible. This required some data diving, going through old saves, and reading the data off hexadecimal dumps. The distributed Pokémon aren’t exact copies of the originals, but they get as close as it was reasonable to achieve. Two instances where this imperfection is visible are experience points (any distributed Pokémon only has enough experience to reach the level they are at, while the originals could have made some progress towards the next level) and damage (Pokémon are distributed fully healed, with full HP, no status conditions, and full PP for all moves).

It is instructive at this point to describe the data that is stored in the distribution system. Every Pokémon is stored as a custom encrypted 42-byte data structure, from which the game rebuilds the full data for the Pokémon when the player enters a distribution code. The first 8 bytes are related to the passwords; the remaining 34 (before encryption) are as follows:

  • Species: 1 byte
  • Held item: 1 byte
  • Moves: 1 byte each (4 total)
  • OT ID: 2 bytes (big-endian)
  • DVs: 2 bytes (in the usual format)
  • Level: 1 byte
  • OT name: 11 bytes (terminated with $50)
  • Nickname: 11 bytes (terminated with $50)
  • Flags: 1 byte (usually zero, more on this later)
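
For readers who prefer code, the 42-byte structure could be written as a C struct along these lines. This is only a sketch: the struct and field names are invented for illustration, the fields are assumed to be stored in the order listed above, and the DV layout in the comment is the standard generation 2 one rather than something confirmed for this file.

```c
#include <stddef.h>

/* One distribution entry before encryption. Every member is a byte or a byte
   array, so compilers normally add no padding; the typedef below checks that. */
struct distribution_record {
  unsigned char validation[8]; /* the 8 bytes "related to passwords" */
  unsigned char species;
  unsigned char held_item;
  unsigned char moves[4];
  unsigned char ot_id[2];      /* big-endian */
  unsigned char dvs[2];        /* usual format: attack/defense, speed/special nibbles */
  unsigned char level;
  unsigned char ot_name[11];   /* terminated with $50 */
  unsigned char nickname[11];  /* terminated with $50 */
  unsigned char flags;         /* usually zero */
};

/* Fails to compile (negative array size) if the struct isn't exactly 42 bytes. */
typedef char distribution_record_size_check[(sizeof(struct distribution_record) == 42) ? 1 : -1];

/* Reads the big-endian OT ID as a number. */
unsigned int record_ot_id(const struct distribution_record *record) {
  return ((unsigned int) record->ot_id[0] << 8) | record->ot_id[1];
}
```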

This structure explains the differences mentioned a few paragraphs above. In order to keep the system simpler, unnecessary data such as stats or experience points isn’t stored; this data is recalculated when the Pokémon is obtained. Stat experience is also missing, as it would amount to a lot of data that wasn’t practical to obtain; every Pokémon is distributed with zero stat experience, and thus somewhat lower stats than their original counterparts. The missing values are recalculated as follows: experience points are set to the minimum needed for the indicated level, stat experience is set to zero, stats are recalculated on the spot (as happens when you withdraw a Pokémon from a box), move PPs are calculated as well with PP Ups set to zero, encounter data is set to indicate that the Pokémon was received in a trade and caught at an unknown time and location (as a matter of fact, it is wholly zeroed out), Pokérus data is cleared (i.e., set to the default value of “has never been infected”), and happiness is set to zero. Most of the values set to zero aren’t explicitly set: the distribution code simply zeroes out the whole party slot before regenerating the distributed Pokémon.

We intentionally kept the number of made-up sets low. While making sets may be fun, and the code can support up to 255 distributed Pokémon, the main focus of the system was distributing Pokémon from old TPP runs. That being said, we made three sets in recognition of staff members (namely Koolboyman, PikalaxALT, and our original Streamer). We only chose the OT ID and OT name for those sets (and the DVs for the shiny Slowpoke); the sets themselves were made by the respective staff members.

There is an additional made-up set, and that’s the one for Phancero, which was distributed both as a reference within Prism and through a regular code post. Since this would be the only way to obtain a Phancero in the game (other than trading from Prism, but we already knew that would be a pipe dream; it will happen some day, but not soon), and all distribution Pokémon have fixed DVs, we chose perfect DVs for it; shiny DVs were another option, but obtaining a shiny Phancero is already easy in Prism (all you need is a Shiny Ball). We also wanted the player to actually be able to own the Pokémon, as if they had caught it — and thus the “flags” field was born. The flags field in the distribution data is set to 0 for all other Pokémon, but to 1 for Phancero; setting it to 1 causes the game to set the OT name and ID to the current player’s, and to ask the player to enter a nickname. The three corresponding fields in the data are thus empty, since their values aren’t meaningful.

When it came to sets from our previous runs, the hardest part was gathering the data. In most cases, the data was available directly from savefiles; however, this was not always the case. The parties for previous runs’ trainers contained within AC itself already had the correct DVs, so that made the search effort a lot simpler, since for those Pokémon we could take the DVs from the source code we already had and the rest of the data from twitchplayspokemon.org (which fortunately shows the ID for the player character, the only “obscure” part of the data). Not everything was so easy, though; for instance, reconstructing DUX’s OT data proved problematic. We had successfully dumped the rest of the data, but the OT name seemed to be invalid! It turns out that, in generation 1, all in-game trades have their OT stored as a single byte, $5D, which is shown as TRAINER in game; Bulbapedia thankfully explained this.

A particularly complicated case was M4. In my last post, I posted the original data for M4. I manually recovered that data from the final Emerald save by looking at a hexadecimal dump and a description of the data structures, and writing down the values; that’s why the text file looks like messy scribbled-down notes (which is what they are). But there’s a fundamental problem with that data: the IVs are in newer-gen format, in which all IVs are independent and range from 0 to 31. Those IVs had to be converted to generation 2 format, in which there are only four DVs ranging from 0 to 15: the special attack and special defense DVs are shared, and the HP DV is calculated from the other four. Converting DV values to IV values is as simple as doubling them; the reverse conversion is lossy. M4’s original IVs were HP:11, atk:23, def:19, spe:24, spA:19, spD:26. First step, halve the values: HP:5½, atk:11½, def:9½, spe:12, spA:9½, spD:13. Now, since there’s a single special DV, average the two special values: 11¼. Finally, rounding. Ideally, HP would have to be 5 or 6; for this, we’d need the attack DV to be even, the defense DV to be odd, and the remaining two to have different parities (odd speed gives 6, odd special gives 5). Therefore, the attack and defense DVs were rounded to 12 and 9 respectively; since the speed DV was already even and the special DV was closer to an odd value, the special DV was rounded to 11. This way we get the DVs that were finally recorded for M4 in the distribution data: 12 attack, 9 defense, 12 speed, 11 special, giving 5 HP. This would represent generation 3 IVs of 10/24/18/24/22/22, which is as close as we could get to the original 11/23/19/24/19/26. (Note that speed is given after defense because this is the order in which stats appear internally in every single game.)
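
The parity rule for the HP DV and the DV-to-IV doubling can be checked with a few lines of C. The names gen2_dvs, hp_dv and dv_to_iv are made up for this sketch; the HP formula is the standard generation 2 one (the lowest bit of the attack, defense, speed and special DVs, from most to least significant), which matches the parities described above.

```c
/* The four DVs that are actually stored in generation 2, each from 0 to 15. */
struct gen2_dvs {
  unsigned char attack, defense, speed, special;
};

/* The HP DV is built from the lowest bit of each of the other four DVs. */
unsigned char hp_dv(struct gen2_dvs dvs) {
  return (unsigned char) (((dvs.attack & 1) << 3) | ((dvs.defense & 1) << 2) |
                          ((dvs.speed & 1) << 1) | (dvs.special & 1));
}

/* Converting a DV (0 to 15) to a newer-generation IV (0 to 31) is a doubling. */
unsigned char dv_to_iv(unsigned char dv) {
  return (unsigned char) (dv * 2);
}
```

With M4’s final DVs (12 attack, 9 defense, 12 speed, 11 special), hp_dv gives 5, and doubling each value gives the 10/24/18/24/22/22 spread quoted above.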

And that’s all for now. Of course, feel free to ask any questions. In the next post, I’ll finally reveal how the data is encrypted and encoded, and how everything works. I might even post some code, for those who can read it. For now, I’ll leave you with a screenshot of the data file that is parsed and built into the distribution data that goes into the ROM:

Screenshot (with the undistributed Pokémon censored)

(and yes, when the source code is released, you should be able to edit this file to generate your own codes)

Original Reddit thread (some links may no longer work)

Anniversary Crystal Distro Card 17: DUX

fu×&’du×

DUX! Our lovely bird is the final Pokemon to make the CUT to be distributed via digital codes for TPP AC. Once lost years ago to the PC, DUX is back and ready to join you along your own journey.

Note that the × symbol in the code above is the multiplication symbol, not the letter ‘x’, and that ’d is a single character.

Also, be on the lookout for asdf14396’s series of posts explaining the ins and outs of how we got this whole setup working – it’s quite interesting!

As with the other distribution codes, all of our distributed Pokemon from previous runs have been brought into Anniversary Crystal using their original data as they were last seen in their respective games. To acquire the Pokemon in your own game, please see the PCC located in Goldenrod City after you have acquired 16 GYM Badges!
