asdf14396

AC distribution system revealed – Part 3

by asdf14396

As the release of the repository draws near, it’s time to finally bring this series to an end. It’s been over a week since I posted part 2, and that seems like a good way to let people have a try at cracking the data before we just release it all for the world to see.

In the previous post, I talked about the data stored in the distribution system, but I intentionally avoided discussing some part of it, which I only said was “related to passwords”. And so, it’s finally time to reveal…

Part 3: The algorithm

This post will be highly technical, so I’m going to try to explain some concepts initially to help people understand it better. In order to explain how the encryption worked, it may be worthwhile to quickly go over some cryptography basics.

An encryption algorithm consists of applying a reversible transformation to some data using a certain encryption key. By “reversible transformation” we mean any kind of process that can be undone to recover the original data, although hopefully it will be one that is hard to undo without the key. Undoing this process is of course called decryption, which applies the inverse transformation using a decryption key. Note that, while the encryption key and the decryption key will necessarily be linked to each other, they don’t need to be the same. Algorithms where the link between both keys is intentionally hard to reconstruct, making it next to impossible to calculate one key from the other, are called asymmetrical, and they are the basis of what is nowadays known as public key cryptography. On the other hand, algorithms where the keys are either equal or trivial to calculate from one another are called symmetrical. The algorithm used here is a symmetrical one.

While encryption uses the key to transform the data, it doesn’t actually store the key with the data. Keeping the encryption key away from the data is a fundamental step in protecting the data (unless you’re dealing with public key encryption). Also, note that it’s (usually) possible to decrypt data with the wrong key — of course, you’ll get the wrong data if you do this, but there is no indication of this happening. Therefore, validating the data must be a separate step.
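To make these two points concrete, here is a toy symmetric cipher in C. It is not the algorithm used by the distribution system; xor_cipher is a made-up name for illustration only. Applying it twice with the same key restores the data, while applying it with a different key silently produces garbage, which is why validation has to be a separate step.

```c
#include <stddef.h>

/* Toy symmetric cipher (illustration only, NOT the distribution algorithm):
   XOR every byte with a repeating key. Running it again with the same key
   undoes it; running it with another key yields wrong data with no error. */
void xor_cipher(unsigned char *buf, size_t len,
                const unsigned char *key, size_t key_len) {
  size_t i;
  for (i = 0; i < len; i++) buf[i] ^= key[i % key_len];
}
```

Nothing in the function can tell a right key from a wrong one; the caller has to check the result some other way.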

Considering all of these matters, the straightforward implementation would be to store the passwords in the ROM along with the data for each distributed Pokémon (encrypted with some key), and also store some checksum to verify that the data has been successfully decrypted. The structure I mentioned in the previous post would allow this, since there are 8 bytes reserved for the password system before each Pokémon, and passwords are at most 8 characters long.

Of course, this is not how it actually works. If we had done this, we would have needed to store the key in the ROM in order to be able to decrypt the distribution data. Someone could have harvested this key by debugging the ROM and decrypted all of the data, and all 20 distribution Pokémon would have been revealed early. Instead, the passwords are used as encryption keys, and not stored at all in the ROM. (I wasn’t lying when I said this!) The 8 bytes we reserved for “something related to passwords” are actually used for validation.

The algorithm basically works like this: when the distribution data is generated, those 8 bytes are filled with an identical byte. It doesn’t matter which byte (it is chosen at random), but all 8 bytes are filled with the same value. (Unused areas in the data, which are caused by names being shorter than the maximum length allowed, are also filled with random data (to add more noise to the encrypted output), but that data is actually random and not used for any purpose.) When the user enters a distribution code, the game attempts to decrypt each of the Pokémon using the corresponding password. If decryption results in the first 8 bytes being identical, then the code is considered valid and the remaining 34 bytes are used as the distributed Pokémon’s data; if those 8 bytes aren’t identical, decryption fails and the game tries the next entry in the dataset, until one entry succeeds (in which case a Pokémon is awarded) or all of them fail (in which case the code is considered invalid and the player is informed of this).
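The lookup described above could be sketched roughly like this. The names header_is_valid, find_entry and decrypt_fn are hypothetical, and the decryption routine is passed in as a callback because this sketch only covers the validation logic; the game itself does this in assembly, so treat it as an outline of the idea rather than the real implementation.

```c
#include <stddef.h>
#include <string.h>

#define ENTRY_SIZE 42   /* size of one encrypted entry */
#define HEADER_SIZE 8   /* validation bytes at the start of each entry */

/* Hypothetical callback: decrypts one 42-byte entry with the given key. */
typedef void (*decrypt_fn)(const unsigned char *in, const unsigned char *key,
                           unsigned char *out);

/* Returns 1 if the first 8 bytes are all the same value, 0 otherwise. */
int header_is_valid(const unsigned char *entry) {
  int i;
  for (i = 1; i < HEADER_SIZE; i++)
    if (entry[i] != entry[0]) return 0;
  return 1;
}

/* Tries every entry in turn. Returns the index of the first one whose
   decrypted header is valid and copies its 34 data bytes into payload,
   or returns -1 if the code does not unlock anything. */
int find_entry(const unsigned char *entries, size_t count,
               const unsigned char *key, decrypt_fn decrypt,
               unsigned char *payload) {
  unsigned char plain[ENTRY_SIZE];
  size_t n;
  for (n = 0; n < count; n++) {
    decrypt(entries + n * ENTRY_SIZE, key, plain);
    if (header_is_valid(plain)) {
      memcpy(payload, plain + HEADER_SIZE, ENTRY_SIZE - HEADER_SIZE);
      return (int) n;
    }
  }
  return -1;
}
```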

The code is therefore used to generate the key: since only the correct key will produce the original data (with 8 identical bytes in the beginning), and we can reasonably assume that incorrect codes will generate random-looking data that will not match this pattern, it becomes possible to store only the encrypted data and a validation header without storing the decryption key at all. Note that I said “used to generate the key”, not “used as key”. The characters that the user can enter come with the rather undesirable property of always having the upper bit set, other than the space ($7f) and terminator ($50) characters, so the upper bit is stripped from all characters, converting them to 7-bit values. (The space and terminator characters are respectively converted to $bf and $cf before stripping the upper bit, since they would otherwise be indistinguishable from $ff and $d0, and the former also represents the character 9.) The entire 8-character code is therefore turned into a 7-byte value (8 × 7 = 56 bits). Terminator characters only appear if the actual code is shorter than 8 characters long, filling up the unused space. Finally, a fixed 5-byte string is appended to this value to generate the 12-byte key; for rather obvious reasons, OLDEN was chosen as this fixed string.
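As a rough illustration of the key derivation, the sketch below strips each character to 7 bits and packs the eight results into 7 bytes. The function names are made up, and the packing order (first character in the most significant bits) is purely an assumption for the sake of the example; the text above doesn’t specify it, so check the released source for the real layout.

```c
#include <string.h>

/* Converts one entered character to its 7-bit value. Space ($7f) and
   terminator ($50) are remapped to $bf and $cf first, as described above. */
unsigned char to_7bit(unsigned char c) {
  if (c == 0x7f) c = 0xbf;       /* space */
  else if (c == 0x50) c = 0xcf;  /* terminator */
  return c & 0x7f;
}

/* Packs eight 7-bit values (56 bits) into 7 bytes.
   ASSUMPTION: the first character ends up in the highest bits. */
void pack_code(const unsigned char code[8], unsigned char key[7]) {
  unsigned long long bits = 0;
  int i;
  for (i = 0; i < 8; i++) bits = (bits << 7) | to_7bit(code[i]);
  for (i = 0; i < 7; i++) key[i] = (unsigned char) (bits >> (8 * (6 - i)));
}

/* Builds the 12-byte key: the 7 packed bytes followed by "OLDEN". */
void build_full_key(const unsigned char code[8], unsigned char full_key[12]) {
  static const unsigned char fixed_string[5] = {0x8e, 0x8b, 0x83, 0x84, 0x8d};
  pack_code(code, full_key);
  memcpy(full_key + 7, fixed_string, 5);
}
```

A code shorter than 8 characters would be padded with $50 terminators before being passed to pack_code.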

The actual algorithm used to encrypt the data isn’t terribly interesting; it is mostly a series of XORs, additions, subtractions and permutations, good enough to mix and shuffle the data, ensuring that invalid codes wouldn’t result in valid outputs. (I’ll leave it as an exercise for the reader to find out if there are any additional codes that happen to arise from coincidence, i.e., from some key accidentally generating a correct validation header for one of the Pokémon in the dataset.) The current version of the actual function that does the encryption (which takes as arguments the 42-byte data structure, the 7-byte key and a 42-byte buffer for the result) is this one:

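#include <string.h>

// encrypts one 42-byte entry using the 7-byte key derived from the code
void encrypt (const unsigned char * data, const unsigned char * key, unsigned char * encrypted) {
  unsigned char i, j, k, tp;
  unsigned char width, shift;
  unsigned char temp_data[42];
  const unsigned char fixed_string[] = {0x8e, 0x8b, 0x83, 0x84, 0x8d}; // "OLDEN"
  memcpy(encrypted, data, 42);
  // stage 1: XOR every byte with a partner byte, a key byte and a byte of the fixed string
  // (the descending loops end through unsigned wraparound: decrementing 0 gives 255)
  for (i = 41; i < 42; i --) encrypted[i] ^= encrypted[(i < 21) ? (41 - i) : (i - 21)] ^ key[i % 7] ^ fixed_string[i % 5];
  // stage 2: one round per key byte, from the last to the first
  for (i = 6; i < 7; i --) {
    width = (key[i] & 15) + 2; // 2 to 17
    shift = (key[i] >> 4) + 1; // 1 to 16
    k = 0;
    // scatter the bytes into columns that are width bytes apart
    for (j = 0; j < width; j ++) for (tp = 0; (j + tp) < 42; tp += width) temp_data[j + tp] = encrypted[k ++];
    // rotate the result right by shift bytes
    memcpy(encrypted, temp_data + (42 - shift), shift);
    memcpy(encrypted + shift, temp_data, 42 - shift);
  }
  // stage 3: build the 12-byte key and add differences between its bytes to the data
  memcpy(temp_data, key, 7);
  memcpy(temp_data + 7, fixed_string, 5);
  for (i = 0; i < 42; i ++) encrypted[i] += temp_data[(i + 6) % 12] - temp_data[i % 12];
}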
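Since the algorithm is symmetrical, decryption simply undoes the three stages in reverse order. The sketch below is my own reconstruction of such an inverse for illustration, not the routine the game uses (that one lives in the ROM, in assembly). The encryption function is repeated so the block compiles on its own, and both functions carry a hypothetical dist_ prefix to avoid clashing with the POSIX function named encrypt.

```c
#include <string.h>

static const unsigned char fixed_string[5] = {0x8e, 0x8b, 0x83, 0x84, 0x8d}; /* "OLDEN" */

/* Same transformation as the encrypt function above, repeated for completeness. */
void dist_encrypt(const unsigned char *data, const unsigned char *key, unsigned char *encrypted) {
  unsigned char i, j, k, tp, width, shift;
  unsigned char temp_data[42];
  memcpy(encrypted, data, 42);
  for (i = 41; i < 42; i--)
    encrypted[i] ^= encrypted[(i < 21) ? (41 - i) : (i - 21)] ^ key[i % 7] ^ fixed_string[i % 5];
  for (i = 6; i < 7; i--) {
    width = (key[i] & 15) + 2;
    shift = (key[i] >> 4) + 1;
    k = 0;
    for (j = 0; j < width; j++)
      for (tp = 0; (j + tp) < 42; tp += width) temp_data[j + tp] = encrypted[k++];
    memcpy(encrypted, temp_data + (42 - shift), shift);
    memcpy(encrypted + shift, temp_data, 42 - shift);
  }
  memcpy(temp_data, key, 7);
  memcpy(temp_data + 7, fixed_string, 5);
  for (i = 0; i < 42; i++) encrypted[i] += temp_data[(i + 6) % 12] - temp_data[i % 12];
}

/* Hypothetical inverse: undoes the three stages, last stage first. */
void dist_decrypt(const unsigned char *encrypted, const unsigned char *key, unsigned char *data) {
  int i, j, k, tp, width, shift;
  unsigned char full_key[12], temp_data[42];
  memcpy(data, encrypted, 42);
  memcpy(full_key, key, 7);
  memcpy(full_key + 7, fixed_string, 5);
  /* Undo stage 3: subtract what encryption added. */
  for (i = 0; i < 42; i++) data[i] -= full_key[(i + 6) % 12] - full_key[i % 12];
  /* Undo stage 2: rounds run in the opposite key order (first key byte first);
     each round rotates left by shift, then gathers the columns back. */
  for (i = 0; i < 7; i++) {
    width = (key[i] & 15) + 2;
    shift = (key[i] >> 4) + 1;
    memcpy(temp_data, data + shift, 42 - shift);
    memcpy(temp_data + (42 - shift), data, shift);
    k = 0;
    for (j = 0; j < width; j++)
      for (tp = 0; (j + tp) < 42; tp += width) data[k++] = temp_data[j + tp];
  }
  /* Undo stage 1: the XOR chain ran from index 41 down to 0, so it is
     unwound from 0 up to 41; each partner byte then holds the same value
     it had when the byte was encrypted. */
  for (i = 0; i < 42; i++)
    data[i] ^= data[(i < 21) ? (41 - i) : (i - 21)] ^ key[i % 7] ^ fixed_string[i % 5];
}
```

Encrypting a buffer and then decrypting it with the same 7-byte key should give back the original 42 bytes; with any other key, the output will almost certainly fail the 8-identical-bytes check.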

There isn’t much left to say at this point; the secret is now revealed. I can only wonder what our local hackers and data miners could have done with this information back in the day; I know that some people tried to find the codes, but as this post should show, those efforts were misguided, as the codes themselves aren’t stored in the ROM at all. Feel free to ask any questions you might have.

I mentioned that the code above is the “current version” of the function; I’ve recently been making changes to the repository prior to its public release so that it will be easier to use and understand. I’ll leave you for now with the original version of the distribution builder, which parses the text file shown in the previous post and generates the distribution.bin file with all of the data already encrypted and ready for inclusion in the ROM:

Original autogen.c program

And a big shoutout to Pigu for coming up with this clever system.

Original Reddit thread

AC distribution system revealed – Part 2

by asdf14396

In the previous post, I talked about how the distribution system in Anniversary Crystal came to be, and how it was developed. In the meantime, yet another Pokémon has been distributed. And so, the time has come to talk about…

Part 2: The data

For every Pokémon we intended to distribute, we had to determine the exact data it would contain. The distribution sets consist of sixteen Pokémon from our previous runs and four made-up ones; these were handled differently.

When it came to Pokémon from previous runs, our goal was clear: reconstruct the originals as closely as possible. This required some data diving, going through old saves, and reading the data off hexadecimal dumps. The distributed Pokémon aren’t exact copies of the originals, but they get as close as was reasonably achievable. Two instances where this imperfection is visible are experience points (a distributed Pokémon only has enough experience to reach the level it is at, while the originals could have made some progress towards the next level) and damage (Pokémon are distributed fully healed, with full HP, no status conditions, and full PP for all moves).

It is instructive at this point to describe the data that is stored in the distribution system. Every Pokémon is stored as a custom encrypted 42-byte data structure, from which the game rebuilds the full data for the Pokémon when the player enters a distribution code. The first 8 bytes are related to the passwords; the remaining 34 (before encryption) are as follows:

  • Species: 1 byte
  • Held item: 1 byte
  • Moves: 1 byte each (4 total)
  • OT ID: 2 bytes (big-endian)
  • DVs: 2 bytes (in the usual format)
  • Level: 1 byte
  • OT name: 11 bytes (terminated with $50)
  • Nickname: 11 bytes (terminated with $50)
  • Flags: 1 byte (usually zero, more on this later)
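For readers who think in code, the decrypted structure could be written down as a C struct along these lines. The struct and field names are made up for this sketch, and it assumes the fields are laid out in exactly the order listed above, after the 8 password-related bytes.

```c
#include <stddef.h>

/* Sketch of one decrypted 42-byte entry (hypothetical names; field order
   assumed to match the list above). Every member is a byte or byte array,
   so compilers are not expected to insert padding. */
struct dist_entry {
  unsigned char header[8];    /* the bytes "related to passwords" */
  unsigned char species;
  unsigned char held_item;
  unsigned char moves[4];
  unsigned char ot_id[2];     /* big-endian */
  unsigned char dvs[2];       /* usual generation 2 format */
  unsigned char level;
  unsigned char ot_name[11];  /* terminated with $50 */
  unsigned char nickname[11]; /* terminated with $50 */
  unsigned char flags;        /* usually zero */
};

/* Reads the big-endian OT ID as a number. */
unsigned int entry_ot_id(const struct dist_entry *e) {
  return ((unsigned int) e->ot_id[0] << 8) | e->ot_id[1];
}
```

The sizes add up to 8 + 1 + 1 + 4 + 2 + 2 + 1 + 11 + 11 + 1 = 42 bytes.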

This structure explains the differences mentioned a few paragraphs above. In order to keep the system simpler, unnecessary data such as stats or experience points isn’t stored; this data is recalculated when the Pokémon is obtained. Stat experience is also missing, as it would amount to a lot of data that wasn’t practical to obtain; every Pokémon is distributed with zero stat experience, and thus somewhat lower stats than their original counterparts. The missing values are recalculated as follows: experience points are set to the minimum needed for the indicated level, stat experience is set to zero, stats are recalculated on the spot (as it happens when you withdraw a Pokémon from a box), move PPs are calculated as well with PP Ups set to zero, encounter data is set to indicate that the Pokémon was received in a trade and caught at an unknown time and location (as a matter of fact, it is wholly zeroed out), Pokérus data is cleared (i.e., set to the default value of “has never been infected”), and happiness is set to zero. Most of the values set to zero aren’t explicitly set — the distribution code simply zeroes out the whole party slot before regenerating the distributed Pokémon.
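To show why zero stat experience means somewhat lower stats, here is a sketch of the stat recalculation using the commonly documented generation 2 stat formula. It is not taken from the AC source, and calc_stat is a hypothetical name; since stat experience is zero for distributed Pokémon, the stat experience term of the formula is simply left out.

```c
/* Generation 2 stat formula with stat experience fixed at zero
   (sketch based on the commonly documented formula, not on AC's code).
   base is the species' base stat, dv ranges from 0 to 15. */
unsigned int calc_stat(unsigned int base, unsigned int dv,
                       unsigned int level, int is_hp) {
  unsigned int core = ((base + dv) * 2 * level) / 100;
  return is_hp ? core + level + 10 : core + 5;
}
```

For example, a level 50 Pokémon with a base stat of 100 and a DV of 15 would get 120 in a regular stat and 175 HP; the original, with its accumulated stat experience, would have had more.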

We intentionally kept the number of made-up sets low. While making sets may be fun, and the code can support up to 255 distributed Pokémon, the main focus of the system was distributing Pokémon from old TPP runs. That being said, we made three sets in recognition of staff members (namely Koolboyman, PikalaxALT, and our original Streamer). We only chose the OT ID and OT name for those sets (and the DVs for the shiny Slowpoke); the sets themselves were made by the respective staff members.

There is an additional made-up set, and that’s the one for Phancero, which was distributed both as a reference within Prism and through a regular code post. Since this would be the only way to obtain a Phancero in the game (other than trading from Prism, but we already knew that would be a pipe dream; it will happen some day, but not soon), and all distribution Pokémon have fixed DVs, we chose perfect DVs for it; shiny DVs were another option, but obtaining a shiny Phancero is already easy in Prism (all you need is a Shiny Ball). We also wanted the player to actually be able to own the Pokémon, as if they had caught it — and thus the “flags” field was born. The flags field in the distribution data is set to 0 for all other Pokémon, but to 1 for Phancero; setting it to 1 causes the game to set the OT name and ID to the current player’s, and to ask the player to enter a nickname. The three corresponding fields in the data are thus empty, since their values aren’t meaningful.

When it came to sets from our previous runs, the hardest part was gathering the data. In most cases, the data was available directly from savefiles; however, this was not always the case. The parties for previous runs’ trainers contained within AC itself already had the correct DVs, so that made the search effort a lot simpler, since for those Pokémon we could take the DVs from the source code we already had and the rest of the data from twitchplayspokemon.org (which fortunately shows the ID for the player character, the only “obscure” part of the data). Not everything was so easy, though; for instance, reconstructing DUX’s OT data proved problematic. We had successfully dumped the rest of the data, but the OT name seemed to be invalid! It turns out that, in generation 1, all in-game trades have their OT stored as a single byte, $5D, which is shown as TRAINER in game; Bulbapedia thankfully explained this.

A particularly complicated case was M4. In my last post, I posted the original data for M4. I manually recovered that data from the final Emerald save by looking at a hexadecimal dump and a description of the data structures, and writing down the values; that’s why the text file looks like messy scribbled-down notes (which is what they are). But there’s a fundamental problem with that data: the IVs are in newer-gen format, in which all IVs are independent and range from 0 to 31. Those IVs had to be converted to generation 2 format, in which there are only four DVs ranging from 0 to 15: the special attack and special defense DVs are shared, and the HP DV is calculated from the other four. Converting DV values to IV values is as simple as doubling them; the reverse conversion is lossy. M4’s original IVs were HP:11, atk:23, def:19, spe:24, spA:19, spD:26. The first step is to halve the values: HP:5½, atk:11½, def:9½, spe:12, spA:9½, spD:13. Next, since there’s a single special DV, the two special values are averaged: 11¼. The last step is rounding. Ideally, HP would have to be 5 or 6; for this, we’d need the attack DV to be even, the defense DV to be odd, and the remaining two to have different parities (odd speed gives 6, odd special gives 5). Therefore, the attack and defense DVs were rounded to 12 and 9 respectively; since the speed DV was already even and the special DV was closer to an odd value, the special DV was rounded to 11. This way we get the DVs that were finally recorded for M4 in the distribution data: 12 attack, 9 defense, 12 speed, 11 special, giving 5 HP. This would represent generation 3 IVs of 10/24/18/24/22/22, which is as close as we could get to the original 11/23/19/24/19/26. (Note that speed is given after defense because this is the order in which stats appear internally in every single game.)
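The parity rule for the HP DV and the DV-to-IV doubling can be checked with a couple of small C functions. The names hp_dv and dv_to_iv are made up for this sketch; the bit layout follows the description above (attack contributes 8, defense 4, speed 2 and special 1).

```c
/* HP DV in generation 2: one bit from the parity of each of the other DVs. */
unsigned int hp_dv(unsigned int atk, unsigned int def,
                   unsigned int spe, unsigned int spc) {
  return ((atk & 1) << 3) | ((def & 1) << 2) | ((spe & 1) << 1) | (spc & 1);
}

/* DV (0 to 15) to the nearest-scale newer-gen IV (0 to 30): just double it. */
unsigned int dv_to_iv(unsigned int dv) {
  return dv * 2;
}
```

With M4’s final DVs, hp_dv(12, 9, 12, 11) gives 5, and doubling 5/12/9/12/11/11 gives the 10/24/18/24/22/22 mentioned above.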

And that’s all for now. Of course, feel free to ask any questions. In the next post, I’ll finally reveal how the data is encrypted and encoded, and how everything works. I might even post some code, for those who can read it. For now, I’ll leave you with a screenshot of the data file that is parsed and built into the distribution data that goes into the ROM:

Screenshot (with the undistributed Pokémon censored)

(and yes, when the source code is released, you should be able to edit this file to generate your own codes)

Original Reddit thread (some links may no longer work)

AC distribution system revealed – Part 1

by asdf14396

About two years ago, I promised that, once distribution was finished, I would make a lengthy post explaining how it all worked. Up until now, it has been one of those well-kept secrets that somehow never leaked, most likely because there aren’t many people who actually understand the system, even though the data is plainly visible to anyone with access to the source code.

Distribution isn’t finished yet, but it should be done soon. And thus, the time has come to lift the veil on a system that has remained, to my knowledge, uncracked so far. There’s a lot of information to share, so I’m making this a three-part series.

Part 1: The history

I don’t remember where the idea came from. Logs are completely lost, and thus I’m simply going from memory here. But a plan came up to distribute Pokémon after the game was released, as a DLC lookalike. Of course, the initial challenge was determining the list of Pokémon that we would distribute.

And here came the first problem. A quick calculation showed that actually distributing Pokémon was impossible. If we wanted to distribute a Pokémon, we would have to pack the data into a code — and even the shortest encoding we could come up with was 40 characters long, including checksums and validations.

So, the only way we could distribute Pokémon would be by encoding them all into the ROM at the time of release. The idea we came up with was that we’d store a code with each Pokémon, and thus we’d distribute the codes when we were ready to do so. That would put a limit on the number of Pokémon we could distribute, but we didn’t plan on doing it forever anyway.

We went back and forth on the number of Pokémon that we wanted to distribute as we filled up the list, but we eventually settled on 20. Most of the debate at that point was over which Pokémon we’d distribute: we gathered representative Pokémon from generation 1 and 2 runs, a few Pokémon we created honoring some staff members, and a few others (M4 comes to mind). Gathering the data was quite a process in and of itself; in some cases we had to go through old saves and the like to retrieve the exact data for each Pokémon. For instance, the released Pokémon from our Red run come from a collection of dumped saves, which we had to analyze one by one until we found one that contained the relevant information.

At some point while we were filling up the list, someone suggested bringing back Phancero. Phancero had been added to the codebase sometime during development, but it had been scrapped at the last minute before releasing the game to the stream; the only thing that remained at that point was its cry. It seemed like a shame to let it die, considering it had been fully designed, with sprite, base data, moveset and so on, so we came up with the idea of bringing it back in through the distribution system. Of course, this created a problem: we had to hide it. And we did, by not listing it, not talking about it, and only adding it to the latest version of the ROM, the one that actually contained the distribution system itself; I even removed it from my data dumps. Here’s a version with the data unmodified:

Phancero became pretty much AC’s Mew in this regard, and just like Mew, it managed to remain hidden, in this case until we released the code through Prism.

Of course, we had one last hurdle to cross, one that most people already know about. If we had just added the data to the ROM unchanged, anyone would have found the full list within days; we have plenty of tinkerers in the community, and we had even more back then. Many people tried to do just that as soon as the distribution system was announced, proving our point. And so, Pigu came up with an extremely clever system that allowed us to hide the list in a way that would be extremely hard to find and crack without disassembling the ROM. He wrote a Python script that took the raw data bytes and encoded them into this form, which was the way we first generated the distribution data file that would be included in the ROM. This script had one major flaw, though: it required encoding all of the data manually into raw hexadecimal bytes before passing it through the program, which invited accidental encoding errors. We found more than a handful of those errors when testing the system, caused by our own mistakes when transforming the data into hexadecimal; to put a final stop to that, I wrote a C program that would parse a plain text file containing the data in human-readable form and generate the distribution data file. As part of this process, I had two options: either use Pigu’s Python script to encrypt the data, or rewrite it in C. As I don’t know Python, I went for the latter choice; this took a bit of trial and error, but eventually (and with his help) I managed to produce output that matched Pigu’s encryption. This program, as well as the data itself, still lives in the actual source code repository (i.e., not the public copy); this is the reason why the actual repository is still private (as the distribution data is extremely easy to find and read).

And that is the history of the AC distribution system. In the next post, I’ll talk about the data involved: what data is stored, what isn’t, what we gathered, and so on. Coming across a particular data file was what inspired me to make this series in the first place: that file is M4’s raw data from the Emerald save, which I’ll leave you as a sneak peek of what’s to come:

M4’s original data

(shoutout to M4 for obvious reasons, and to LightningXCE for general help related to this post series)

Original Reddit thread (some old file download links may no longer function)

Report Prism bugs

EDIT: This is now outdated. Please use our new form to report bugs.

Anniversary Crystal data dump

Originally posted by asdf14396

Since lots of things have been changed for this hack (wild Pokémon, trainers, and so on), we’ve decided to build some documentation on things like wild encounters, learnsets and the like, since the information available in other sources (e.g., Bulbapedia) won’t always apply here.

That documentation will eventually be available for interactive browsing (on the same website as the ROM patches, where right now there is a download link to a .zip file instead). However, the data is also available as a raw JSON file, in case someone wants to do something with the data in machine-parsable form (perhaps even make your own website to show the data, who knows).

EDIT: latest version 2016-05-14 23:50:02

I’ll keep the JSON updated if anything changes, so make sure to check the version in the JSON itself (which is simply a UTC timestamp) to ensure you have the latest version. (Also, fair warning, it’s 8 MB.)

If you have any questions about the dataset, ask away and I’ll answer to the best of my ability. Enjoy!

Original Reddit thread (old links no longer work)
