Extracting: Railroad Tycoon

Post by Alexander Kornrumpf » 31.07.2019, 23:44

tl;dr: Wall of text about a very old game with no current relevance. And about this forum. And yet you are here ...


Long story short, I have managed to convert some of the asset files that come with "Sid Meier's Railroad Tycoon Deluxe" into a modern file format (png) and have a pretty good idea of what the other files are about. In the end, this is not the tale of a great feat that I had hoped it would become. My moderate success was not due to some sudden spark of genius but to more sleepless nights than I care to admit, in which I banged my head against the disassembly.

Over the years I have picked up and dropped the problem many times, with pauses of several months between short bursts of renewed interest, the latter often coinciding with updates in Krishty's "Extracting Ace Combat" thread here.

I don't even know what I did wrong over all that time. One evening, without consciously doing anything differently than before, I "just knew" I was, finally, looking at the disassembly of the right function (Figure 1). I ported the assembly verbatim to C (think: uint16_t ax = 0x0;), and I was done. Subsequent restructuring of the code in a more high-level style did not reveal any hidden secrets, at least not to me. Totally anticlimactic.


Code:

_071B:0010 ; =============== S U B R O U T I N E =======================================
_071B:0010 ; Attributes: bp-based frame
_071B:0010 sub_56E0        proc far                ; CODE XREF: sub_1EE2+B2↑P
_071B:0010 arg_0           = dword ptr  6
_071B:0010 arg_4           = dword ptr  0Ah
_071B:0010 arg_8           = word ptr  0Eh
_071B:0010                 push    bp
_071B:0011                 mov     bp, sp
_071B:0013                 push    si
_071B:0014                 push    di
_071B:0015                 push    ds
_071B:0016                 push    es
_071B:0017                 push    bx
_071B:0018                 cld
_071B:0019                 mov     ax, [bp+arg_8]
_071B:001C                 cmp     ax, 1
_071B:001F                 ja      short loc_56F3
_071B:0021                 out     0A6h, al        ; Interrupt Controller #2, 8259A
_071B:0023 loc_56F3:                               ; CODE XREF: sub_56E0+F↑j
_071B:0023                 les     di, [bp+arg_4]
_071B:0026                 lds     si, [bp+arg_0]
_071B:0029                 mov     dx, 10h
_071B:002C                 lodsw
_071B:002D                 mov     bp, ax
_071B:002F loc_56FF:                               ; CODE XREF: sub_56E0+2C↓j
_071B:002F                                         ; sub_56E0+76↓j ...
_071B:002F                 shr     bp, 1
_071B:0031                 dec     dx
_071B:0032                 jnz     short loc_5709
_071B:0034                 lodsw
_071B:0035                 mov     bp, ax
_071B:0037                 mov     dl, 10h
_071B:0039 loc_5709:                               ; CODE XREF: sub_56E0+22↑j
_071B:0039                 jnb     short loc_570E
_071B:003B                 movsb
_071B:003C                 jmp     short loc_56FF
_071B:003E ; ---------------------------------------------------------------------------
Figure 1: The disassembly looks like this. You knew that, of course, because all disassembly looks somewhat like this. That is exactly the problem with finding anything meaningful in it.
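For readers who prefer C to assembly, here is a rough sketch of what the part of the routine visible in Figure 1 appears to do: it reads a 16-bit control word, shifts it out one bit at a time, and copies one literal byte whenever the shifted-out bit is set. All names (flag_reader, next_flag, take_literal) are made up for illustration, the zero padding at the end of the input is an assumption of the sketch, and the branch at loc_570E is cut off in the listing, so it is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the fields mirror registers from Figure 1:
   src/pos ~ ds:si, bits ~ bp (current control word), left ~ dx. */
typedef struct {
    const uint8_t *src;
    size_t pos, len;
    uint16_t bits;
    unsigned left;
} flag_reader;

/* lodsw: read a little-endian 16-bit word.
   Assumption: bytes past the end of the input read as zero. */
static uint16_t read_word(flag_reader *r)
{
    uint16_t lo = r->pos < r->len ? r->src[r->pos] : 0;
    uint16_t hi = r->pos + 1 < r->len ? r->src[r->pos + 1] : 0;
    r->pos += 2;
    return (uint16_t)(lo | (hi << 8));
}

static void flag_reader_init(flag_reader *r, const uint8_t *src, size_t len)
{
    r->src = src;
    r->pos = 0;
    r->len = len;
    r->left = 16;             /* mov dx, 10h */
    r->bits = read_word(r);   /* lodsw / mov bp, ax */
}

/* shr bp, 1 / dec dx / jnz ...: returns the bit shifted out (the carry). */
static int next_flag(flag_reader *r)
{
    int carry = r->bits & 1;
    r->bits >>= 1;
    if (--r->left == 0) {     /* word used up: reload right away, like the ASM */
        r->bits = read_word(r);
        r->left = 16;
    }
    return carry;
}

/* movsb: a set flag means "copy the next stream byte to the output". */
static uint8_t take_literal(flag_reader *r)
{
    return r->pos < r->len ? r->src[r->pos++] : 0;
}
```

A decoder loop would call next_flag() and, on 1, append take_literal() to the output. What happens on 0 is not shown in Figure 1.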

This post will probably make very little sense without the context of Krishty's thread as linked above. I suggest you read that thread first. It is the more interesting read anyway. I will reference that thread a lot, maybe to the point where it gets awkward. To avoid misunderstandings: Krishty and I never met. The number of PMs we exchanged over the years can be counted on one hand, I think. He does not know about this project until now and I do not know anything about him other than that he did a cool thing and I wanted to do the same. I will go into my motivation some more now, but part of me thinks that is all hindsight rationalisation and basically this is about competition. "If Krishty can do it, I will be damned if I can't." Maybe I'm overthinking this.


"Sid Meier's Railroad Tycoon Deluxe" (what a title! let's call it RRT) was not the first videogame I played but the first one I owned. The first physical disc someone bought in a shop and gifted to me. The first disk that was inserted into the newly bought family computer to install a game, or probably any software, with little to no concept among the gathered users of what that was: installing software. It was not a new game when I got it. I think the first release was in 1990 - without the "Deluxe" back then - and this anecdote takes place ca. 1996. Many now classic gaming franchises were founded in that year, you can google for yourself, but the 1996 release surely most relevant for this story is "Sid Meier's Civilization II", arguably marking the passage of a whole generation in the genre. I was late to the party.

Ten more years later, 2K Games made RRT available for free as part of the marketing campaign for "Sid Meier's Railroads!", an otherwise unremarkable game that does not run on Windows versions newer than Windows XP. They bundled RRT with DOSBox and a .pdf of the copy protection sheet that was so common in those days. In preparation for this post, I checked and the link is now dead. There is no longer any sign on the 2K homepage that "Railroads!", let alone RRT, ever happened. So 2K seems to have abandoned RRT almost 30 years after its first release, but RRT never left me again after I downloaded that full digital copy.

You can still get RRT at https://www.pcwelt.de/downloads/Sid-Mei ... 73103.html apparently, which seems legit, but I haven't tried it and you would proceed at your own risk.

As is probably clear from the above, nostalgia is a main driver of what follows. Krishty speaks of "nostalgische Verstimmung" (an untranslatable play on words along the lines of "nostalgic moodiness") and the desire for conservation (here I won't even try to translate his exact words).

For me it is less about conservation and more about evolution. The interest in games, kindled by the one we discuss here, led to an interest in game development - who would have guessed, given where we are - and quickly led to the realisation that I am unable to produce even passable "programmer's art". There are many reasons why I never produced a single game, however small, but this is the one reason which I would have put forward at age 14 or so. An age at which admitting any other reason would have been too much for a developing ego. One day around that time I realised that one of my games (not RRT) stored its assets as bitmaps which I could read. Since that day I have had this fantasy that I could use the assets of an existing game to make a new one. Like I said, this never happened, but the idea somehow sticks with me to this day.

The second motivator, which I apparently also share with Krishty, is the urge to do something comparatively hard with no conceivable utility other than my own entertainment, just because it is possible. The analogy of a jigsaw puzzle comes to mind. Simply knowing that it had to be possible to read these files because, of course, the original game did just that, helped me past many dead ends.

What next?
To manage expectations: Given how long it took to write down only this part, I am not sure I will ever complete the write-up. Then again, given that it took me years to get to the point where I had something to write about, what are a few months for the write-up? If I get around to it you might eventually learn how I obtained Figure 2. For now you can find Part II here.


[Image: screen1.png]
Figure 2: Teaser. Here is something you won't see in the game. A tile marked "Err", probably for debugging purposes.

Post by Krishty » 01.08.2019, 21:10

Krishty Krishty Krishty … cool stuff about RRT … Krishty Krishty Krishty ;)

Great work and very lovely tiles you mined there! I’m keen on details – I wrote decoders for some 90s image codecs (PackBits & Co.) and always like to learn about new ones. I didn’t bother to decode the disassembly, but it looks hand-written – no modern compiler would emit such dense encoding, and I doubt compilers in the 90s were any better.
Alexander Kornrumpf wrote:
One day around that time I realised that one of my games (not RRT) stored its assets as bitmaps which I could read. Since that day I had this fantasy that I could use the assets of an existing game to make a new one.
Well written … this probably got me on ZFX :-)
dissects Ace Combat, Driver, and S.T.A.L.K.E.R. | renders stars

Post by Alexander Kornrumpf » 02.08.2019, 00:03

Krishty wrote:
01.08.2019, 21:10
I didn’t bother to decode the disassembly, but it looks hand-written – no modern compiler would emit such dense encoding, and I doubt compilers in the 90s were any better.
Yes, I'm pretty sure the image decoding was written directly in ASM. The basic unit it operates on is a bit (not a byte). The way this works is to shift and then jump based on the carry flag. I am not aware of a way to express this directly in C. The algorithm is actually easier to express in ASM than it is in C, I think.

In C, to the best of my knowledge, I can do no better than this:

Code:

cf = bp & 0x01;   /* the bit that shr moves into the carry flag */
bp = bp >> 1;
if (cf) { /* branch taken */ }
where in ASM I have this:

Code:

shr bp, 1
jb somewhere
I wasn't able to determine if compilers understand that these are equivalent but in any case the C version is a very indirect way of expressing this. Usually it is the other way round, I assume.
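One way to keep the indirection in one place is to wrap the idiom in a small helper. This is only a sketch with hypothetical names (shr1_carry, count_carries); whether a given compiler turns it back into a shift followed by a jump on carry is not something the sketch guarantees.

```c
#include <stdint.h>

/* Hypothetical helper: shifts *reg right by one and returns the bit that
   fell off, i.e. what SHR would leave in the carry flag. */
static int shr1_carry(uint16_t *reg)
{
    int cf = *reg & 1u;
    *reg = (uint16_t)(*reg >> 1);
    return cf;
}

/* Example use: count how many of the low n bits are set, one carry at a time. */
static unsigned count_carries(uint16_t reg, unsigned n)
{
    unsigned hits = 0;
    while (n--) {
        if (shr1_carry(&reg))   /* plays the role of "shr bp, 1 / jb ..." */
            hits++;
    }
    return hits;
}
```

The call site then reads almost like the assembly, one conditional per shifted-out bit, even though the carry flag itself stays out of reach.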

Post by Krishty » 02.08.2019, 08:06

Yeah, no way you can do this in C. I learned it the hard way while trying to implement integer overflow checks – carry/overflow is totally out of reach for compilers if you are not using the corresponding intrinsics (_addcarry_u32() & Co.). Must be hand-written. Very cool trick, though it probably fails micro-op fusion on modern CPUs.

Post by Alexander Kornrumpf » 15.08.2019, 23:04

tl;dr: Lengthy setting of the stage and revelling in nostalgia. Some minor technical points but mostly backstory.

Versions, Documentation, and other Quirks

In Part I, I abbreviated "Sid Meier's Railroad Tycoon Deluxe" to "RRT". This might cause some confusion for potential readers. Only after writing it did I learn that among some fans "RRT" is reserved for the original game (without the "Deluxe") and the Deluxe version is commonly referred to as "RDX". But to me it will always be just "RRT", and this thread is only about the "Deluxe" version.

Sadly, I never played the original. I would love to, but chances of ever getting my hands on a copy are pretty small. Some consider the original version the better game and from what I read in retrospect some things that seemed to be improvements at the time might indeed have not aged as well as the original. In any case, when I first got the game I did not have this context and so to me it was the Railroad Tycoon, not a second version.

As a curious side note, I think the developers themselves were a bit conflicted about whether the "Deluxe" version was actually an improvement. The manual, likely approved by some marketing department, has one version of the story to tell, but there is a readme1.doc (I wonder where the 1 comes from) with a slightly different phrasing. I think it is a fascinating document of its time and it probably doesn't hurt to quote it here in its entirety, ASCII art and all (Figure 3a). While we are at it, there is another, slightly more official looking readme file called readme.now. Among other things it contains an apology for using the wrong data types (Figure 3b).

Code:

      ______             _   _  ___   __             __
 \/___|    |    | | |   | | | |  |   |_   /\  |\  | |__
 |     [][]|    |-| |   |-  |-   |   |   /__\ | \ |    |
 |         |    | | |   | \ | \  |   |  /    \|  \| ___|
/ oo--OOO-o

Release Notes:  Update for Railroad Tycoon Deluxe to version 1.01

Bug Fixes:

  1)    Save game files are now fixed.  The bug did not show up in
        our QA process.  For those of you who are versed in programming
        we forgot a parameter in a create call.  Since the create call
        was using a return address to fill in the blank it would only
        fail if your machine config was just right. Or wrong in this
        case. (I took time to explain this because someone out there
        was pretty vocal about how we could miss this.  Well now you

  2)    Center Key now works 'C'.

  3)    Roland sound bug fix for 486 33MHz and above.

   I thought some of you would like to know how we came to publish
Railroad Tycoon Deluxe.  Our Japanese development group released
Railroad Tycoon in November of 1991.  The game was a direct port to
the NEC 9801 from the IBM.  Our Japanese customers looked at the EGA
graphics and said "Uggh!" or the Japanese equivalent.  This motivated
us to rework the graphics system for a 640x400 16 color graphics
mode, (NEC Format.)

   In the process of reworking the graphics, we decided to add new
theaters and trains.  After we released the Japanese version of the
game we were asked if we could release this product for the US/UK. 
Because of the interest we had on the BBS's we decided to include it
in our Spring 1993 release.

We have concentrated on new worlds and economies and a sound and art
upgrade.  I have read your posts.  Please continue sending us your
comments and suggestions.  Understanding what you are looking for
will help us include appropriate suggestions in future products.

Thank You for purchasing
    Railroad Tycoon Deluxe

    Jeff Billings
Figure 3a: We listen to our fans: On BBS's!

Code:

        We are unable to fix a money roll over problem that manifests
    in a couple of different ways.  Your cash may mysteriously become
    negative, or you may end up with spikes in your stock graph, or 
    if your Railroad net worth rolls over, you may be kicked out of
    office.  This is an indication that you have done so well as to
    exceed the design parameters of the game.  
        We were also unable to raise the restrictions on the number
    of stations and trains available without completely rewriting the
    entire game.  There is room in the future for a bigger, better,
    faster game in the Railroad Tycoon line and we hope you will be
    with us at that time for more enjoyment.

                      Thank-you for choosing RAILS DELUXE,

                            The RAILS DELUXE Team.
Figure 3b: Did these people learn nothing from Y2K? Oh ... [Aside: is it still safe to assume all readers are familiar with the Y2K problem?]
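The readme does not say which data type is to blame, so the following is only an illustration of the general effect: assuming, hypothetically, a balance kept in a signed 16-bit variable, adding to a large positive value wraps around to a negative one. The function name and the 16-bit width are assumptions, not facts about the game.

```c
#include <stdint.h>

/* Hypothetical: add income to a balance kept in a signed 16-bit variable,
   wrapping the way two's-complement hardware does. */
static int16_t add_wrapping_i16(int16_t balance, int16_t income)
{
    /* Do the addition on the unsigned bit patterns, where wrap-around is
       well defined in C. */
    uint16_t sum = (uint16_t)((uint16_t)balance + (uint16_t)income);
    /* Reinterpret the 16-bit pattern as signed without relying on an
       implementation-defined conversion. */
    return (int16_t)((int32_t)(sum ^ 0x8000u) - 0x8000);
}
```

With these assumptions a balance of 32000 plus an income of 1000 comes out as -32536, which is the kind of "mysteriously negative" cash the readme describes.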

The game is a mixture of old code from 1990 (original) and slightly newer code from 1993 (deluxe). Given the huge leaps of technology in that time, it is interesting to see that parts of the game are fighting for every bit -- using half bytes instead of bytes in places, having hardcoded limits for things at convenient powers of two -- while other parts are incredibly wasteful -- fullscreen renders with very little value to gameplay, redundant content, advertisements for other, long forgotten games.

There is another reason, dating back to 1996, for which I call the game "RRT" and which is strangely relevant to my endeavours years later. Consider this gem from the game manual:
This assumes your machine runs under DOS 5.0 when it boots, which is true of most IBM and compatible machines with hard disks.
(1) Turn on your machine. If it is already on, exit all programs and return to the root directory with the “CD” (change directory) DOS command. For example, if your hard disk is C: then “CD C:\” does this.
(2) Set Speed: If you have a “turbo” or multi-speed computer, use your normal speed setting.
(3) Enter MPS directory: You can enter the MPS directory with the “CD” (change directory) DOS command. For example, if you are in your root directory C:, then “CD MPS” will take you to the MPS directory.
(4) Load Program: Type “RDX” and press return. The simulation will begin loading.
For some readers this will bring back fond memories, others will probably be too young to have any idea what it is about. Both groups can be helped by watching retro computing videos on YouTube, here is one about the Turbo Button. For me, when first starting the game, these instructions were as confusing as they were needed. What wasn't on the Windows 95 start menu didn't exist for me back then and Windows 95 machines did not boot into "DOS 5.0", whatever that was.

When following these instructions the game would greet you with about 90 seconds of credits which, unlike in modern games, didn't even help to hide loading times. Figure 4 shows a screenshot from DOSBox. The credits could be slightly shortened, but not skipped completely, by hitting <ESC> at just the right time, but they got boring quickly.

[Image: intro_010.png]
Figure 4: Animated Credits

Somehow I got my hands on an old manual for MS-DOS and learned that the RDX command, faithfully typed in per the instructions from the manual, was in fact not a command at all but a file, rdx.bat, which could be viewed in an editor. That revealed that two .exe files were called, intro.exe and rrt.exe. Finding this out was probably the first thing I did that remotely resembled "hacking" in the traditional, technophile sense of the word.

From that day on, I always called rrt.exe directly and never had to watch the boring intro again. And therefore, for me the game is called RRT. Until ...

What Next?
intro.exe, disregarded for many years, loads the same media file formats as the full game but is much smaller (34 KB vs 263 KB), figuratively turning a 1000-piece jigsaw puzzle into one with 128 pieces. Anybody who has actually solved a jigsaw puzzle knows that this is a huge difference. In the following I will finally describe finding the image decompression routine in intro.exe. But "Extracting: Intro.exe" would have made for a boring title.

Part III is online but it is not what I thought it would be.

Post by Schrompf » 16.08.2019, 11:03

Good story. Thanks for explaining!
Chief of Dreamworlds. Building something new. Only very rarely helps out with the Open Asset Import Library on the side.

Post by Alexander Kornrumpf » 21.08.2019, 21:11

Schrompf wrote:
16.08.2019, 11:03
Good story. Thanks for explaining!
Thanks for the kind words.

Post by Alexander Kornrumpf » 22.08.2019, 00:06

tl;dr: Sid Meier's Railroad Tycoon Deluxe .pic files are LZSS-encoded. The rest of this post is pure meta.

[EDIT Note: I reworked this post, which is Part III, after writing, but before posting, Part IV]

QuickTime (tm?) won't Open my Files

To most laypersons, "PDF" (the portable document format), ".pdf" (the filename extension commonly associated with it) and "Acrobat Reader" are probably vaguely the same thing. Does this make sense? You decide. Is it understandable, in terms of how Windows displays the underlying information? I think it certainly is. In that respect ".pdf" works completely differently from, let's say, ".py" (Python code). The ".py" filename extension tells us nothing about the actual file format (UTF-8? ANSI? Unix line endings? Tabs? Spaces?) and it is probably wise to have Windows associate the extension with something like Notepad++ even though one does most of the actual editing in an IDE.

All of this is completely obvious once you stop and think about it, but it took me way too long to actually stop and think. I grew up with the misconception that the filename extension is there to tell Windows with what application to open a file, whereas it is actually there to tell the user what, roughly, they might expect the content of the file to be. I think it is interesting how Windows doubles down on the misleading interpretation by hiding the filename extension from the user, which, of course, is a completely terrible idea. So while I turn the filename extension display back on as the first thing whenever I use a new Windows installation, at the same time, unless I am very careful, I find my thoughts about these things being confused by the wrong intuition Windows has been feeding me for more than 20 years now.

The point being that I have no name for the file format which RRT used for its pictures. The files themselves do not tell. They do seem to start with these four bytes 00 48 38 00, which might hint at a format indicator, but that is impossible to google. I'm incredibly tempted, as a result of the confusion I just described, to call the format ".pic-file-format". I had to remind myself several times that this is nonsense, and fight my own confusion. The only thing that ".pic" really tells us is that the file is probably a picture, which is correct.
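Checking for these leading bytes is simple enough. A minimal sketch, assuming the four bytes really are a fixed signature (which is not confirmed) and using a made-up function name:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical check: returns 1 if the buffer starts with 00 48 38 00.
   Whether these bytes are a true format signature is an assumption. */
static int starts_with_00_48_38_00(const uint8_t *data, size_t len)
{
    static const uint8_t lead[4] = {0x00, 0x48, 0x38, 0x00};
    return len >= sizeof lead && memcmp(data, lead, sizeof lead) == 0;
}
```

Running such a check over every .pic file in the game directory would at least show whether the four bytes are constant across files or just a coincidence of the files looked at.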

Writing all of this down, a memory came back to me. Back in the day, several games (Myst, I think, but also others) used QuickTime which, when installed, saw fit to register itself as the application for at least one of the filename extensions which RRT used; I think it was said ".pic". I now remember the disappointment when QuickTime could not actually open the file. So in a way, I'm making good on that disappointment by decoding the files myself.

Non-Standard Experiences

Speaking of disappointment ... I opened this thread on a down note, very obvious in the initial post. I think we all know that feeling when you finish something and for some reason rather than euphoria you think more along the lines of "what? that was it?". A bit like leaving the cinema after a not entirely satisfying movie. When writing the next part I came to realise that more than being a project, extracting RRT was an experience for me. It made me see more clearly where I had no intuition or the wrong intuition for technical facts I thought I understood perfectly, intellectually. I do not know how relatable my confusion over something as simple as a filename extension is, and I should probably not admit to that confusion in public, but it has to be seen through that lens.

I planned this thread to mirror the structure of Krishty's thread, which is why I referenced the latter so much in my first post. I especially planned to point out where my experiences differed from those he describes, and my thoughts on why. To that end I should emphasise at this stage how different the .pic situation is from the TIM situation in Krishty's warmup post. He writes:
The standard texture format of the PSX is TIM. Most tools in the PSX SDK work with this format, and it is tailored directly to the PSX. Not every game uses this format, but the likelihood is fairly high.
From what I saw, this degree of standardisation was unheard of a few years earlier when RRT was developed. There is no standard .pic format, not only in the sense described above, that .pic just means "It's a picture", but also in the narrower sense that even within one development ecosystem, in RRT's case MicroProse, people could not agree on a file format for pictures.

The community has investigated .pic files from "Sid Meier's Covert Action", "Sid Meier's Civilization", and Darklands. I have not made any in-depth comparison recently -- something I would love to do but not as a priority right now -- but as far as I remember they were all different. And different from RRT's format. This is to say I am not aware of any other application that uses exactly RRT's file format. (Of course it would be embarrassing now if I posted the code and someone identified it as a straightforward variant of DEFLATE or the like, but I'm pretty sure that is not the case.)

I am still unsure if I even should count these other .pic adventures as previous work on my topic. Is it really previous work if it's about a different file format? I wouldn't regard other formats of .pic files -- like the one QuickTime apparently knows -- previous work just because of the filename extension. Are the formats somehow closer related because it says "Sid Meier" or "Microprose" on the box? I do not know.

What's next?

Since this is the second draft of this post, unlike in the previous ones I know what will be next. Before that, let me conserve some thoughts from the first draft of this:

When I began to write this series I knew it would take a long time to write it up and that I might lose interest before I finished. Several abandoned attempts at blogging taught me that much. I therefore felt obliged to provide at least one piece of information that brings us closer to the actual point of knowing how to decompress RRT's .pic files. So right in the tl;dr I gave away that the files are LZSS compressed even though this post is not about LZSS. The next one will be ... sort of.

Word of warning though. If you are mostly waiting for the actual format or rather the decompression algorithm, I'm afraid they are pretty boring. I promise I will try to get there eventually, but as you might have noticed if you have read this far, that is not my main goal with this writeup. So until I get to the point, I hope the backstory of my experiences is mildly entertaining.

Post by Krishty » 22.08.2019, 08:04

LZSS, cool – like in Ace Combat 3, but probably not in separate streams.

It’s pretty impressive to see that games in the 90s had just a dozen different compression algorithms in use. That’s probably due to the fact that search engines weren’t quite mature and instead, game programmers all read the same books. (The same reason quaternions were largely unknown to game programmers until the early 2000s.)

Nowadays we have PNG (DEFLATE), JPEG (DCT + Huffman) and Webm derivatives (Wavelets) for textures on the hard drive, DXT (three different families) for textures on the GPU, Zip/gzip (DEFLATE again), BZip (BWT + Huffman) and 7-Zip (LZMA) for game archives, Mermaid/Selkie/Kraken (LZdontknow) for high-performance archives, H.265 (DCT + Motion detection + arithmetic coding) for videos *plus* the in-house stuff for things like terrain compression in flight simulators. Sucks to be a reverse engineer in the 2030s :)
Alexander Kornrumpf wrote:
They do seem to start with these four bytes 00 48 38 00 which might hint at a format indicator but that is impossible to google. I'm incredibly tempted, as a result of the confusion I just described, to call the format ".pic-file-format".
48 38 is H8 in ASCII, so I’d just go by 0-H8-0 or the like.

On the other hand, TFX’s SPR files are entirely different from id’s SPRs, but I call them SPR anyway.

Post by Alexander Kornrumpf » 28.08.2019, 00:46

tl;dr: Wall of text - again, sorry - about how entropy makes me feel stupid.

Part IV -- LZdontknow

[Heading borrowed from here.]

In Part III I said that RRT's .pic files were compressed with LZSS, but what does that mean? Basically all I'm saying is that the rather abstract description on Wikipedia roughly fits what I am seeing in the disassembly. I might be wrong in my classification. I have checked the other early LZ-family algorithms I am aware of and they fit less well.
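To give the abstract description some shape, here is a textbook-style toy LZSS decoder. The token layout is invented for this example (one flag byte for every eight tokens, read LSB first; a 1 bit means one literal byte, a 0 bit means one distance byte followed by one length byte) and is explicitly not the layout RRT uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy LZSS decoder with a made-up token layout (see the text above).
   Returns the number of bytes written, or 0 on malformed input
   (so an empty result and an error look the same in this sketch). */
static size_t lzss_decode_toy(const uint8_t *in, size_t in_len,
                              uint8_t *out, size_t out_cap)
{
    size_t ip = 0, op = 0;
    while (ip < in_len) {
        uint8_t flags = in[ip++];
        for (int bit = 0; bit < 8 && ip < in_len; bit++, flags >>= 1) {
            if (flags & 1) {
                /* Literal: copy one byte from the input as it is. */
                if (op >= out_cap) return 0;
                out[op++] = in[ip++];
            } else {
                /* Back-reference: repeat `count` bytes that were already
                   written, starting `dist` bytes before the current end. */
                if (ip + 1 >= in_len) return 0;
                size_t dist = in[ip++];
                size_t count = in[ip++];
                if (dist == 0 || dist > op || count > out_cap - op) return 0;
                for (size_t i = 0; i < count; i++, op++)
                    out[op] = out[op - dist];   /* may overlap on purpose */
            }
        }
    }
    return op;
}
```

In this toy layout the six input bytes 07 61 62 63 03 06 expand to the nine characters "abcabcabc": three literals, then a reference that reaches three bytes back and copies six, overlapping its own output.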

It is important not to imagine that I found some LZSS reference implementation, ran it on my data and that was it. The reality is much messier. All the algorithms from the LZ family have one thing in common: they have dozens of degrees of freedom in how their central idea could actually be implemented and how the compressed data could be stored in a stream of bytes - or even bits in many cases. For a simple example of a degree of freedom, think endianness. That is determined by the architecture, you say? OK then, when reading several bits from a bitstream, which one is the MSB? And so it goes on and on. [Footnote: In any case RRT apparently uses a variation of LZSS. Maybe that variation has its own name, but if so I don't know it.]
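The bit-order question can be made concrete with two hypothetical readers that pull bits from the same bytes, one starting at the least significant bit of each byte and one at the most significant. Neither is claimed to be what RRT does; the point is only that the same input yields different values. Both functions assume the caller keeps the bit position inside the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Takes the lowest bit of each byte first; the first bit read becomes
   the LSB of the result. No bounds checking in this sketch. */
static unsigned read_bits_lsb_first(const uint8_t *buf, size_t *bitpos, unsigned n)
{
    unsigned value = 0;
    for (unsigned i = 0; i < n; i++, (*bitpos)++) {
        unsigned bit = (buf[*bitpos / 8] >> (*bitpos % 8)) & 1u;
        value |= bit << i;
    }
    return value;
}

/* Takes the highest bit of each byte first; the first bit read becomes
   the MSB of the result. No bounds checking in this sketch. */
static unsigned read_bits_msb_first(const uint8_t *buf, size_t *bitpos, unsigned n)
{
    unsigned value = 0;
    for (unsigned i = 0; i < n; i++, (*bitpos)++) {
        unsigned bit = (buf[*bitpos / 8] >> (7 - *bitpos % 8)) & 1u;
        value = (value << 1) | bit;
    }
    return value;
}
```

For the single byte B4 (binary 10110100), reading three bits gives 4 with the first reader and 5 with the second. A decoder that guesses wrong here produces garbage from the very first token.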

The primary lesson I learned from extracting RRT is that I have no clue whatsoever how cryptanalysis works in practice. Somewhere between "you can easily - in fact automatically - crack a substitution cypher with frequency analysis and a dictionary" and "here is this idealised description of AES and here are some cool theoretically derived properties" - which of course I have forgotten by now - "that show it's pretty secure" I am missing a part of the picture. I assume most of the magic, think side-channel attacks, happens in the part I am missing.

From what I know about cryptanalysis, which is not much, I am facing a pretty standard scenario, without the actually hard part. I have the compressed data, obviously, to me no better than cyphertext. I have the decompressed data (clear text), sort of: I can see the picture on the screen. I have the algorithm, albeit hidden in the blackbox of intro.exe. By poking the blackbox a bit, I arrive at a pretty good idea how the decompressed data is stored in memory.

My task is to figure out what's in the blackbox, not in the abstract sense of naming one algorithm or another, but in terms of implementation details. I have to get it right, down to the level of bits and bytes. Real cryptanalysis has to deal with that mess, a mess my university course absolutely glossed over, and the actual crypto on top of that. I am fascinated by how people do that. I certainly couldn't do it. Like I admitted in Part I, the major breakthrough in my analysis was to find the right section of the disassembly, not some deep insight derived from first principles. I had to open the blackbox.

The problem, if you want to call it a problem, is that real crypto stays hard even if you have the source code, whereas my problem vanishes once you have the source code. And yet even now that I know all the details of how the files are decompressed, if you gave me only the compressed data, the uncompressed data and the info that it is LZSS, I could probably not derive the implementation details from that alone. I have not put too much thought into that after I solved the riddle the other way, and I now have some ideas where I would start looking, but this is after knowing the solution.

I do not really remember when I first knew that the data was compressed. I think that was clear pretty fast. Previous work told me that I was probably facing some combination of an LZ-family algorithm and run-length encoding. Had there been no previous work, I could have learned from here that LZ and RLE was just how 90s programmers rolled. Cf. also Krishty's comment above. But most importantly, ruling out actual encryption as not very likely -- although I think some later games did encrypt their data -- compression was the only explanation for the lack of patterns in the data or rather the nature of the patterns that were there.

We make sense of the world by recognising patterns. This influences how we canonically store data on a computer. There is no law-of-nature saying that text has to be stored as strings of characters and pictures have to be stored as spatial arrays of pixels but that is how humans make sense of text and pictures and so we naively represent and store things that way. We want our texts and pictures to be patterned, otherwise they would be just gibberish and noise to us. Compression, with apologies to Claude Shannon, is the act of replacing patterns by a shorter representation of the whole pattern, making the resulting string of representations shorter and less patterned than the original.

I learned that knowing this in the abstract and looking at relatively high entropy data for relatively long periods of time are two different things. I can somewhat recommend the latter as an exercise. It really puts some things into perspective.


55 55 00 FF F8 FC FF F8 FC FF F8 FC FF F8 FC FF F8 FC FF F8 FC FF F8 FC DB 56 FF FE FF FF F8

Figure 5: The first few bytes of pixel data. Sort of. There is a pattern (FF F8 FC) but what does it mean?

Entropy takes the neatness out of things. That neat trick where you render your data as a bitmap and see what it is? Won't work! That other neat trick to find regularly spaced repetitions in the data? Guess what!

[Image: 2019-08-28 00_09_16-LABS.PIC - IrfanView (Zoom_ 377 x 472).png]
Figure 6: See the pattern?

What's next?

I'm glad to have this out of my system, so to speak. In the following parts I will finally try to get technical.