Download

AC distribution system revealed – Part 2

by asdf14396

In the previous post, I talked about how the distribution system in Anniversary Crystal came to be, and how it was developed. In the meantime, yet another Pokémon has been distributed. And so, the time has come to talk about…

Part 2: The data

For every Pokémon we intended to distribute, we had to determine the exact data it would contain. The distribution sets consist of sixteen Pokémon from our previous runs and four made-up ones; these were handled differently.

When it came to Pokémon from previous runs, our goal was clear: reconstruct the originals as closely as possible. This required some data diving, going through old saves, and reading the data off hexadecimal dumps. The distributed Pokémon aren’t exact copies of the originals, but they get as close as it was reasonable to achieve. Two instances where this imperfection is visible are experience points (any distributed Pokémon only has enough experience to reach the level they are at, while the originals could have made some progress towards the next level) and damage (Pokémon are distributed fully healed, with full HP, no status conditions, and full PP for all moves).

It is instructive at this point to describe the data that is stored in the distribution system. Every Pokémon is stored as a custom encrypted 42-byte data structure, from which the game rebuilds the full data for the Pokémon when the player enters a distribution code. The first 8 bytes are related to the passwords; the remaining 34 (before encryption) are as follows:

  • Species: 1 byte
  • Held item: 1 byte
  • Moves: 1 byte each (4 total)
  • OT ID: 2 bytes (big-endian)
  • DVs: 2 bytes (in the usual format)
  • Level: 1 byte
  • OT name: 11 bytes (terminated with $50)
  • Nickname: 11 bytes (terminated with $50)
  • Flags: 1 byte (usually zero, more on this later)

This structure explains the differences mentioned a few paragraphs above. In order to keep the system simpler, unnecessary data such as stats or experience points isn’t stored; this data is recalculated when the Pokémon is obtained. Stat experience is also missing, as it would amount to a lot of data that wasn’t practical to obtain; every Pokémon is distributed with zero stat experience, and thus somewhat lower stats than their original counterparts. The missing values are recalculated as follows: experience points are set to the minimum needed for the indicated level, stat experience is set to zero, stats are recalculated on the spot (as it happens when you withdraw a Pokémon from a box), move PPs are calculated as well with PP Ups set to zero, encounter data is set to indicate that the Pokémon was received in a trade and caught at an unknown time and location (as a matter of fact, it is wholly zeroed out), Pokérus data is cleared (i.e., set to the default value of “has never been infected”), and happiness is set to zero. Most of the values set to zero aren’t explicitly set — the distribution code simply zeroes out the whole party slot before regenerating the distributed Pokémon.

We intentionally kept the number of made-up sets low. While making sets may be fun, and the code can support up to 255 distributed Pokémon, the main focus of the system was distributing Pokémon from old TPP runs. That being said, we made three sets in recognition of staff members (namely Koolboyman, PikalaxALT, and our original Streamer). We only chose the OT ID and OT name for those sets (and the DVs for the shiny Slowpoke); the sets themselves were made by the respective staff members.

There is an additional made-up set, and that’s the one for Phancero, which was distributed both as a reference within Prism and through a regular code post. Since this would be the only way to obtain a Phancero in the game (other than trading from Prism, but we already knew that would be a pipe dream; it will happen some day, but not soon), and all distribution Pokémon have fixed DVs, we chose perfect DVs for it; shiny DVs were another option, but obtaining a shiny Phancero is already easy in Prism (all you need is a Shiny Ball). We also wanted the player to actually be able to own the Pokémon, as if they had caught it — and thus the “flags” field was born. The flags field in the distribution data is set to 0 for all other Pokémon, but to 1 for Phancero; setting it to 1 causes the game to set the OT name and ID to the current player’s, and to ask the player to enter a nickname. The three corresponding fields in the data are thus empty, since their values aren’t meaningful.

When it came to sets from our previous runs, the hardest part was gathering the data. In most cases, the data was available directly from savefiles; however, this was not always the case. The parties for previous runs’ trainers contained within AC itself already had the correct DVs, so that made the search effort a lot simpler, since for those Pokémon we could take the DVs from the source code we already had and the rest of the data from twitchplayspokemon.org (which fortunately shows the ID for the player character, the only “obscure” part of the data). Not everything was so easy, though — for instance, reconstructing DUX‘s OT data proved problematic. We had successfully dumped the rest of the data, but the OT name seemed to be invalid! It turns out that, in generation 1, all in-game trades have their OT stored as a single byte, $5D, which is shown as TRAINER in game; Bulbapedia thankfully explained this.

A particularly complicated case was M4. In my last post, I posted the original data for M4. I manually recovered that data from the final Emerald save by looking at a hexadecimal dump and a description of the data structures, and writing down the values; that’s why the text file looks like messy scribbled-down notes (which is what they are). But there’s a fundamental problem with that data: the IVs are in newer-gen format, in which all IVs are independent and range from 0 to 31. Those IVs had to be converted to generation 2 format, in which there are only four DVs ranging from 0 to 15: the special attack and special defense DVs are shared, and the HP DV is calculated from the other four. Converting DV values to IV values is as simple as doubling them; the reverse conversion isn’t as lossless. M4’s original IVs were HP:11, atk:23, def:19, spe:24, spA:19, spD:26. First step, halve the values: HP:5½, atk:11½, def:9½, spe:12, spA:9½, spD:13. Now, since there’s a single special DV, average them: 11¼. Finally, rounding. Ideally, HP would have to be 5 or 6; for this, we’d need the attack DV to be even, the defense DV to be odd, and the remaining two to have different parities (odd speed gives 6, odd special gives 5). Therefore, the attack and defense DVs were rounded to 12 and 9 respectively; since the speed DV was already even and the special DV was closer to an odd value, the special DV was rounded to 11. This way we get the DVs that were finally recorded for M4 in the distribution data: 12 attack, 9 defense, 12 speed, 11 special, giving 5 HP. This would represent generation 3 IVs of 10/24/18/24/22/22, which is as close as we could get to the original 11/23/19/24/19/26. (Note that speed is given after defense because this is the order in which stats appear internally in every single game.)

And that’s all for now. Of course, feel free to ask any questions. In the next post, I’ll finally reveal how the data is encrypted and encoded, and how everything works. I might even post some code, for those who can read it. For now, I’ll leave you with a screenshot of the data file that is parsed and built into the distribution data that goes into the ROM:

Screenshot (with the undistributed Pokémon censored)

(and yes, when the source code is released, you should be able to edit this file to generate your own codes)

Original Reddit thread (some links may no longer work)

AC distribution system revealed – Part 1

by asdf14396

About two years ago, I promised that, once distribution was finished, I would make a lengthy post explaining how it all worked. Up until now, it has been one of those well-kept secrets that somehow never leaked, most likely since there aren’t many people that actually understand the system, despite the data is plainly visible to anyone with access to the source code.

Distribution isn’t finished yet, but it should be done soon. And thus, the time has come to lift the veil on a system that has remained, to my knowledge, uncracked so far. There’s a lot of information to share, so I’m making this a three-part series.

Part 1: The history

I don’t remember where the idea came from. Logs are completely lost, and thus I’m simply going from memory here. But a plan came up to distribute Pokémon after the game was released, as a DLC lookalike. Of course, the initial challenge was determining the list of Pokémon that we would distribute.

And here came the first problem. A quick calculation showed that actually distributing Pokémon was impossible. If we wanted to distribute a Pokémon, we would have to pack the data into a code — and even the shortest encoding we could come up with was 40 characters long, including checksums and validations.

So, the only way we could distribute Pokémon would be by encoding them all into the ROM at the time of release. The idea we came up with was that we’d store a code with each Pokémon, and thus we’d distribute the codes when we were ready to do so. That would put a limit on the number of Pokémon we could distribute, but we didn’t plan on doing it forever anyway.

We went back and forth on the number of Pokémon that we wanted to distribute as we filled up the list, but we eventually settled for 20. Most of the debate at that point was over which Pokémon we’d distribute — we gathered representative Pokémon from generation 1 and 2 runs, a few Pokémon we created honoring some staff members, and a few others (M4 comes to mind). Gathering the data was quite a process in and of itself — in some cases we had to go through old saves and the like to retrieve the exact data for each Pokémon. For instance, the released Pokémon from our Red run come from a collection of dumped saves, which we had to analyze one by one until we found one that contained the relevant information.

At some point while we were filling up the list, someone suggested bringing back Phancero. Phancero had been added to the codebase sometime during development, but it had been scrapped at the last minute before releasing the game to the stream; the only thing that remained at that point was its cry. It seemed like a shame to let it die, considering it had been designed full with sprite, base data, moveset and so on, so we came up with the idea of bringing it back in through the distribution system. Of course, this created a problem — we had to hide it. And we did, by not listing it, not talking about it, and only adding it to the latest version of the ROM, the one that actually contained the distribution system itself — I even removed it from my data dumps. Here’s a version with the data unmodified:

Phancero became pretty much AC’s Mew in this regard, and just like Mew, it managed to remain hidden, in this case until we released the code through Prism.

Of course, we had one last hurdle to cross, one that most people already know about. If we had just added the data to the ROM unchanged, anyone would have found the full list within days; we have plenty of tinkerers in the community, and we had even more back then. Many people tried to do just that as soon as the distribution system was announced, proving our point. And so, Pigu came up with an extremely clever system that allowed us to hide the list in a way it would be extremely hard to find and crack without disassembling the ROM. He wrote a Python script that took the raw data bytes and encoded them into this form, which was the way we first generated the distribution data file that would be included in the ROM. This script had one major flaw, though: it required encoding all of the data manually into raw hexadecimal bytes before passing it through the program, thus creating accidental encoding errors. We found more than a handful of those errors when testing the system, caused by our own mistakes when transforming the data into hexadecimal; to put a final stop to that, I wrote a C program that would parse a plain text file containing the data in human-readable form and generate the distribution data file. As part of this process, I had two options: either use pigu’s Python script to encrypt the data, or rewrite it in C. As I don’t know Python, I went for the latter choice; this took a bit of trial and error, but eventually (and with his help) I managed to produce output that matched pigu’s encryption. This program, as well as the data itself, still lives in the actual source code repository (i.e., not the public copy); this is the reason why the actual repository is still private (as the distribution data is extremely easy to find and read).

And that is the history of the AC distribution system. In the next post, I’ll talk about the data involved: what data is stored, what isn’t, what we gathered, and so on. Coming across a particular data file was what inspired me to make this series in the first place: that file is M4’s raw data from the Emerald save, which I’ll leave you as a sneak peek of what’s to come:

M4’s original data

(shoutout to M4 for obvious reasons, and to LightningXCE for general help related to this post series)

Original Reddit thread (some old file download links may no longer function)

Anniversary Crystal data dump

Originally posted by asdf14396

Since lots of things have been changed for this hack (wild Pokemon, trainers, and so on), we’ve decided to build some documentation on things like wild encounters, learnsets and the like, since the information available in other sources (e.g., Buibapedia) won’t always apply here.

That documentation will eventually be available for interactive browsing (on the same website as the ROM patches, where right now there is a download link to a .zip file instead). However, the data is also available as a raw JSON file, in case someone wants to do something with the data in machine-parsable form (perhaps even make your own website to show the data, who knows).

EDIT: latest version 2016-05-14 23:50:02

I’ll keep the JSON updated if anything changes, so make sure to check the version in the JSON itself (which is simply a UTC timestamp) to ensure you have the latest version. (Also, fair warning, it’s 8 MB.)

If you have any questions about the dataset, ask ahead and I’ll answer to the best of my capacity. Enjoy!

Original Reddit thread (old links no longer work)

Scroll to top