Let me tell you a tale of two bugs that wasted my afternoon.
One: SDL wants your vkGetInstanceProcAddr
I like using SDL for cross-platform windows and event handling and whatnot. For everything else, I try to write my own libraries. I largely do this to understand how things work. Case in point: I wrote my own Vulkan "meta loader" (think Volk).
My meta loader tries to be simple and straight-forward: each Vulkan function is a global function pointer in a single translation unit with external linkage.
Well, turns out SDL tries to be clever: before it calls dlopen, it first checks if it can dlsym the current process for vkGetInstanceProcAddr. Guess what it finds? My pointer! The problem is, I need SDL to load Vulkan (I could write my own but SDL already knows which libraries to look for) so that I can initialize vkGetInstanceProcAddr in my loader. It's a catch-22.
I got around this by renaming vkGetInstanceProcAddr in my meta loader. I thought about trying to hide the symbols but it didn't seem to work (maybe I just did it wrong).
I could have relied on the application to load all the functions that it needs instead of maintaining one big list in a translation unit somewhere. This is what SDL does under the hood. Or, instead of globals, I could have hidden them in a struct. But then your function calls don't match the docs 1:1.
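A minimal sketch of the rename, assuming stand-in typedefs so it compiles without the Vulkan headers. The ml_ prefix, ml_Init and FakeGetInstanceProcAddr are made-up names for illustration, not the real loader's API.

```cpp
#include <cassert>

// Stand-ins for the real Vulkan typedefs so this compiles without <vulkan/vulkan.h>.
using PFN_vkVoidFunction = void (*)();
using PFN_vkGetInstanceProcAddr = PFN_vkVoidFunction (*)(void* instance, const char* name);

// Before the rename, this global was called vkGetInstanceProcAddr. With external
// linkage, a dlsym() lookup on the process for that name can find this (still
// null) pointer variable instead of a real function.
// After the rename, the exported symbol no longer collides ("ml_" is a made-up prefix).
PFN_vkGetInstanceProcAddr ml_vkGetInstanceProcAddr = nullptr;

// Hypothetical init hook: the application passes in the pointer it got from SDL
// (e.g. the result of SDL_Vulkan_GetVkGetInstanceProcAddr(), cast to this type).
bool ml_Init(PFN_vkGetInstanceProcAddr from_sdl) {
    ml_vkGetInstanceProcAddr = from_sdl;
    return ml_vkGetInstanceProcAddr != nullptr;
}

// Fake entry point, used only to exercise the sketch without a Vulkan driver.
PFN_vkVoidFunction FakeGetInstanceProcAddr(void*, const char*) { return nullptr; }
```

The cost of this approach is the one mentioned above: calls through the renamed pointer no longer match the docs 1:1.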
Two: Stack Corruption via Struct Size Mismatch using #ifdef
This one caught me off guard. I naively thought the linker could handle this but how could it? Some libraries take advantage of the behavior.
When a function was trying to write to a struct pointer, it would raise SIGBUS on an address of a function pointer I had loaded in the same stack. This screams stack corruption but where and by who?
I had a struct member hidden by an #ifdef. If the macro was defined then the struct was 3 ints (24 bytes) large, otherwise the struct was 2 ints (16 bytes) large. The struct was exposed by a header which was used by the library implementation and the client code.
Turns out the client code didn't define the macro but the library did. That meant that the client code only allocated 16 bytes on the stack for the object, whereas the library code thought it was 24 bytes. When the library went to write something to that 3rd int (bytes 17-24) then it wrote past the stack space allocated for the struct and corrupted the stack of the caller!
I solved this by ensuring the client code had the macro defined as well. But I could also have added an assert for the expected struct size in the library. Or I could have removed the macro altogether (seems dangerous to leave it in frankly). Or I could have made the struct opaque and relied on the library to allocate it for the client.
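Here is a rough sketch of the mismatch and of the size check idea. It assumes 8-byte members (to match the 16 and 24 byte sizes) and models the two views of the header as two separate structs, since one translation unit can't hold both. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// What the library saw: the macro was defined, so the third member exists.
struct ConfigAsSeenByLib {
    std::int64_t a;
    std::int64_t b;
    std::int64_t c;  // only present when the macro is defined
};

// What the client saw: same header, macro not defined.
struct ConfigAsSeenByClient {
    std::int64_t a;
    std::int64_t b;
};

static_assert(sizeof(ConfigAsSeenByLib) == 24, "library view is 24 bytes");
static_assert(sizeof(ConfigAsSeenByClient) == 16, "client view is 16 bytes");

// Hypothetical handshake: the client passes sizeof(its view of the struct) and
// the library refuses to touch the object if the sizes disagree.
bool LibAcceptsLayout(std::size_t client_struct_size) {
    return client_struct_size == sizeof(ConfigAsSeenByLib);
}
```

A check like this turns silent stack corruption into an error at the API boundary. It only works if every entry point that takes the struct also takes the size, which is extra ceremony compared to just making the struct opaque.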
If you're having problems shifting on your bicycle and you feel like you've tried everything you can find on the internet, try checking that your wheels are installed correctly.
I was really struggling with the trigger shifter on my bicycle. (Don't ask me what model it is, it's some Shimano 8 gear thing that's so old I can't find a picture of it on the internet). It was taking so much effort to shift down using the thumb lever, especially 2 to 1. So much effort that I injured my thumb and now I can't put too much pressure on it! My bike is my main method of transportation though so I had to do something.
So I put my bike on my bike stand, pulled up Google, and went to work. I checked:
The rear derailleur. This is my primary suspect for anything shifting related. The left-right movement of the gears is mainly dictated by cable tension, which you tune with the barrel adjuster. But even with the barrel adjuster in the sweet spot to make the gear changes, it was still too hard to shift.
Is the rear derailleur hanger bent? I put my bike on my bike stand and, nope, not bent. Looked fine. The gears of the derailleur were in a pretty straight line with the cassette.
The cable. I had replaced the cable a few months ago (when the old cable had shredded itself inside the shifter) so I didn't think it was a bad cable. Had I installed it wrong? I took it out, inspected it, put it back in, no difference.
The cable housing. Maybe there was something clogging the housing, or maybe the cable had created a channel that was causing some resistance? I took the cable housing off the bike but I couldn't see anything. The cable slid up and down the housing well enough.
The shifter. Normally it's not the shifter, but the grease can dry out and get gummy. Also, when the old cable shredded itself, maybe a little piece had gotten lodged inside? I took out the cable and shifted gears but everything was working fine. I tried putting some dry lube inside (it's all I had on hand). It almost felt like it made a difference but when I tried again the next day it was still too hard to shift.
At this point I was flummoxed. I had spent so many hours and all I had was a sore thumb. All the shifting components are fine, what the hell is causing my problem? I was prepared to take it into a bike shop.
My one last ditch effort was the wheel. I had changed my tires last month for winter, which involved taking the wheel off. What if I didn't install the wheel as before, so the cassette (attached to the wheel) is no longer parallel to the derailleur gears? This would effectively cause the same problems as a bent derailleur hanger: a misalignment between the planes of the cassette and derailleur gears.
The wheel has a bolt running through it that slides into a slot in the tines of the rear fork. Then it gets secured with a nut. I loosened the nut slightly and the wheel dropped into a different position, just by a few millimetres. I tightened the nut, tested the shifter and BAM shifted like new.
All this to say: counter-intuitively, if you're having shifting problems, maybe the problem is with your wheel (not your shifter). I had not seen this mentioned anywhere else on the internet but it makes sense. Give this a try before giving up.
Initially, I wasn’t sold on HDR. It seemed like a gimmick: why not stretch the existing SDR colors between the darkest and brightest that the panel can display? (2)
Now, I'm sitting on top of 7000 photos (1) from a recent vacation, a subscription to Lightroom, and no deadline for editing. And I just clicked the HDR button for the first time...
HDR? SDR? Huh?
SDR is standard dynamic range. It basically describes what a CRT can do. 8 bit color, sRGB color space, max 100 nits brightness.
HDR is high dynamic range. There are a bunch of standards right now but basically it’s anything better than SDR: higher bit depth (e.g. 10, 12, 14, 16), wider color space, higher peak brightness, and so on. For comparison, HDR10 specifies 1000 nits of brightness.
In a nutshell, images are just bits on a chip. The bit depth describes how many values a single pixel can take. The more bits, the more expressive your image can be. Think of it like an artist with 10 paints vs 100 paints. However, we see images as lights shone out of a monitor. The bits are turned into color via a colorspace, then the colors are turned into lights by the monitor. The monitor has a color gamut which describes how many different colors it can show, and a calibration that says how accurately it can show those colors.
As display technology has improved, we can represent more and more colors and more levels of brightness. But we still need the old images to look the same.
If none of this makes sense, try watching Noodle’s video.
Why HDR now?
Pure accident. (3) 7000 photos have given me lots of varied data. Some of it is overexposed. Some of it is underexposed. Some of it is flat. (5) Sometimes, I just start clicking random buttons to see what happens. One of those was the HDR button.
I was also curious. I knew something was up: I had some shots that looked overexposed but I was still able to recover details in the highlights. I figured this was due to some magic of RAW but didn’t exactly know what.
Coincidentally, (and most importantly) my display happened to be HDR-compatible. I could see the difference. (4) This is HUGE. You need to SEE the difference in person for the effect to work.
What sold you on HDR?
It looked unnatural at first. The brightness looked unnatural and transitions to “HDR” pixels looked harsh. But after taking a break, I started noticing how “flat” the SDR image was looking in comparison.
The biggest things that stood out are improved contrast, the ability to view "overexposed" images, and the increased image clarity via tone mapping.
First, contrast. My jaw always drops when I toggle HDR for an image that actually has a large difference between the darkest and brightest pixel. I got so much more detail out of the ground, roofs, sides of buildings. It’s almost like magic; how was I not able to see this before? Everything else now looks like it has a film in front of it. Turning on HDR is like removing discolored varnish from an oil painting. Again, you need to see the difference in person.
Second, “overexposed” images. HDR lets you take advantage of those bright bits above the limit of SDR (6). My SX60 HS uses 12 bit RAWs, and my Pixel 6a uses 10 bit RAWs. This means that I have to squish 10 or 12 bits of data into 8 bits. Often, I would just pull down the whites but then you’re just reducing contrast; this can make the image look flat.
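To put rough numbers on the squish: 12 bits gives 4096 levels and 8 bits gives 256, so a straight linear conversion folds every 16 neighbouring RAW levels into one. The sketch below is only that arithmetic. It is not what Lightroom or any camera actually does, and To8Bit is a made-up name.

```cpp
#include <cassert>
#include <cstdint>

// Naive linear squish: drop the 4 least significant bits of a 12-bit sample.
// Every run of 16 adjacent input levels lands on the same 8-bit output level.
std::uint8_t To8Bit(std::uint16_t sample12) {
    return static_cast<std::uint8_t>(sample12 >> 4);
}
```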
Third, tone mapping. From what I can tell, the HDR workflow involves editing two images. First, you edit the HDR image so that it looks good for HDR viewers. Second, you apply modifications on top of the (edited) HDR image so that it looks good for SDR. This second step is called tone mapping. (7)
Tone mapping has different results compared to just editing with SDR. I found that shadows would maintain a lot of detail and the image in general had better contrast. However, sometimes I didn’t want this, and it’s hard to add darkness via the tone mapping process. For one photo, I ended up making some pretty drastic changes to make the SDR version look the way I want.
HDR for everything?
I don't know. This is new to me. I'll continue to toy. I think I might take it on a case by case basis. If the SDR rendition of an HDR auto adjustment looks good then maybe I'll keep it. But if I can't tone map the way I want, or the image doesn't need the improved contrast, then maybe I'll just edit the SDR.
I wrote this mainly because I couldn't find people talking about the difference between editing SDR vs tone mapping.
Footnotes
(1) I shot RAW + JPG so that number is likely closer to 4500. The RAW is for editing when I’m back home, the JPG is for sharing with friends when I’m on the go. Why do this when I have Lightroom on my phone? The SX60 HS can’t transfer RAWs over Canon Camera Connect. To its credit, the camera was sold in 2014; I don’t recall editing RAWs on your phone being common at that time.
(2) The baffling thing is that I’ve done display calibration before. On reflection, I know now that the answer is probably something to do with specs, color gamuts, and calibration.
(3) HDR editing in Lightroom was added “recently” in October 2023. I think that I started using Lightroom before then.
(4) I didn’t explicitly buy an HDR monitor. It just happened that the M1 Macbook Air that I use for editing has limited HDR support (only up to 400 nits of brightness).
(5) I might go into this later, but I made the… uninformed… decision of underexposing a lot of my images. I was afraid of overexposing (not understanding the bit depth of my camera) and thought that I would have better luck pulling colors out of black. It’s actually kinda the opposite: a bias towards underexposing means you’re raising the exposure in post. But that just amplifies grain. It’s almost like using a higher ISO. My camera produces more accurate colors with more light. And, in general, cameras LOVE light (up to the point of saturation). Anyway, I learned a lot on this trip… Just look at your histogram.
(6) Provided that there’s actually data there and the highlights aren’t clipped. Camera sensors can only absorb so much light until they become saturated.
(7) I guess, to some extent, all editing for SDR from a higher bit depth image is tone mapping. Unless you just dither.
(This post was originally written around May, 2019. I found it rotting in my drafts.)
I got into bookbinding before Christmas of last year. It’s a slow and frustrating skill to acquire; most of your time is spent repeatedly folding paper, punching holes, cutting board, or waiting for glue to dry. However, it’s rewarding when everything comes together.
Yesterday, I put the finishing touches on this:
Yes, Korsakovia can now be ingested in book form! I’m going to outline the high level process of taking the book from unformatted script to printed product.
Step 1: Formatting the Script
The script is publicly available on ModDB. Sure, you could take that, print it on A4 paper, then spiral bind it, but that’s too easy and lacks pizzazz. This script deserves better than Courier New!
There are several programs you can use to lay out text:
Word processors, like Microsoft Office (proprietary) or LibreOffice (free). They work OK, but they place emphasis on text layout. This makes them pretty inflexible.
Desktop publishing software, like Adobe InDesign (proprietary) or Microsoft Publisher (proprietary) or Scribus (free). These place emphasis on laying out each individual page, which gives a lot of freedom but also takes a lot of work to ensure everything stays consistent.
I was using LibreOffice but I found Scribus easier to work with.
At this point, you have to think ahead a bit:
What paper will you be printing on? I have an abundance of letter size copy paper. Folded in half, this makes 5.5″ x 8.5″ pages. In Scribus, you can set that as a custom size, then indicate that it’s double-sided (so it previews in 2 columns)
What are the margins of your printer? I have a Brother HL-L2390DW which can’t print anything within 0.166″ (1/6″) of the edge of the paper. You can sort of get around this by choosing “Page size: Fit” if printing in something like Adobe Acrobat then trimming that off after binding. This is what I ended up doing, but beware that it messes with proportions (everything is scaled down) and decreases the size after binding! (Alternatively, don’t use images that need to bleed to the edge of the margin!)
I’ll post my tips for working effectively with Scribus in another post.
Now it’s just a matter of creating pages, styles, and frames, then copying the text over from the PDF. This takes a lot of manual, tedious labour.
I ended up with around 44 formatted pages. This is a pretty small book. I decided at this point to use a perfect binding. I wanted to use case binding but I would have very few signatures and a thin spine. Sometimes you have to figure out how to bind it this early on because it can impact the inner margins, as well as the imposition step.
When you’re done, you can export to PDF... but it doesn’t look like it’s ready for printing 2 column, double-sided. What gives?
Step 2: Imposition
Imposition is the step where you take the individual 5.5″ x 8.5″ single pages and lay them out to be printed on letter size paper. The catch is that it needs to be printed in such a way that it can be read properly after it’s bound! You’ve probably taken this for granted if you’ve printed anything from a word processor.
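To make the reordering concrete, here is a small sketch of the usual booklet page order, assuming two pages per side and a page count that is already a multiple of 4. It is not PdfBooklet's actual algorithm, and BookletOrder is a made-up name.

```cpp
#include <cassert>
#include <vector>

// Returns 1-based page numbers in print order for a fold-in-half booklet.
// Each sheet contributes four entries: front-left, front-right, back-left,
// back-right. An empty result means the page count needs padding first.
std::vector<int> BookletOrder(int page_count) {
    std::vector<int> order;
    if (page_count <= 0 || page_count % 4 != 0) return order;
    for (int sheet = 0; sheet < page_count / 4; ++sheet) {
        order.push_back(page_count - 2 * sheet);      // front, left half
        order.push_back(2 * sheet + 1);               // front, right half
        order.push_back(2 * sheet + 2);               // back, left half
        order.push_back(page_count - 2 * sheet - 1);  // back, right half
    }
    return order;
}
```

For an 8-page document this yields 8, 1, 2, 7 on the first sheet and 6, 3, 4, 5 on the second, so the outermost sheet carries the first and last pages.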
Scribus, very purposefully, does not provide imposition features. I used PdfBooklet. It’s a bit rough around the edges but does a pretty good job.
I used the default settings for a booklet because I’m going to cut the pages in half later to be bound. (This ended up being a bad choice due to how my printer bowed the paper during printing. I have a noticeable split in the middle of my book.)
(Editor me from the future: I would go on to write my own imposition workflow in Python using pdfimpose. Highly recommend!)
Step 3: Printing
PdfBooklet produces a PDF ready for printing. I used Adobe Acrobat because I like the quality with which it renders the pages for preview. I also know what settings to use in its printing dialog.
The HL-L2390DW does double-sided printing, so I checked Print on both sides of paper and selected Flip on short edge. This is important, otherwise half the pages will be upside down when bound!
As mentioned before, under Page Sizing & Handling, I chose Fit.
You can do a quick check at this point to make sure it will read like a book. Since it was printed like a booklet, you can fold the stack gently in half and see if you can flip through it like a book.
Step 4: Binding (finally!)
I ended up with 22 printed sheets. I stacked them and cut them into the 5.5″ x 8.5″ sheets of paper. Until bound, you can hold the resulting stack of 44 sheets together with binder clips. I slapped some PVA glue on the spine, then wrapped it in construction paper.
__declspec(naked) (Adventures in Reverse Engineering)
Even the most basic function which does nothing produces machine code:
The push+mov is called the prolog. The pop+ret are the epilog. This is boilerplate to set up the stack, save registers, handle arguments, etc. based on calling convention.
... but what if I want an empty function? That's where __declspec(naked) comes in.
(int 3 is a debugger breakpoint???)
Now we can do whatever we want!
This still requires a calling convention because the body of the function does not dictate how the client is expected to pass arguments, which registers to save, etc.
(remember, call is just a push eip + jmp; ret is just a pop eip + jmp)
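As a hedged sketch of what a naked function can look like: the example below assumes 32-bit MSVC, since __declspec(naked) and __asm blocks are not available when targeting x64. AddOne is a made-up function, and the fallback branch exists only so the sketch builds elsewhere.

```cpp
#include <cassert>

#if defined(_MSC_VER) && defined(_M_IX86)
// No prolog or epilog is generated, so the body does everything itself,
// including the ret. With __cdecl the caller cleans up the stack; on entry
// [esp] holds the return address and [esp + 4] holds the first argument.
__declspec(naked) int __cdecl AddOne(int x) {
    __asm {
        mov eax, [esp + 4]  // load x
        inc eax             // the return value lives in eax
        ret                 // pop the return address and jump to it
    }
}
#else
// Fallback so the sketch still builds on other compilers and on x64.
int AddOne(int x) { return x + 1; }
#endif
```

The explicit __cdecl is the point made above: the body alone doesn't tell callers where the argument lives or who cleans up the stack.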
The newish type safe Navigation Compose lets you use objects instead of strings for routes. So, typically I start with something like this if there are no parameters for the route:
Then I go "oh, I need to add an ID to the destination to access some resource". So I change the object into a data class:
@Serializable
private data class NewTaskDestination(val resourceId: Int = 0)
// …
navController.navigate(NewTaskDestination)
Then, if I recompile and run, I get this exception trying to navigate:
kotlinx.serialization.SerializationException: Serializer for class 'Companion' is not found.
Please ensure that class is marked as '@Serializable' and that the serialization compiler plugin is applied.
This is the most unhelpful error in the history of ever I swear to God!
wtf is Companion
what does serialization have to do with a navigation problem?
You know what the solution is? Open and closed parentheses when calling navigate:
@Serializable
private data class NewTaskDestination(val resourceId: Int = 0)
// …
navController.navigate(NewTaskDestination())
I just dftkjhodhjpdoghj I want to murder someone now. Why didn't my IDE tell me this?!
I'm trying to learn Kotlin so that I can use Jetpack Compose so that I can make an Android app. Following the Android codelab, I almost immediately hit a roadblock: wtf is this code doing:
As a C++ developer (with a little Java knowledge), some of this is bewildering. Especially:
setContent {
// ...
}
WTF? It looks like a function call but there's no parentheses. It looks like a scope but how is the code run? I was scratching my head: does this have something to do with the annotation (no, that's just metadata), or something to do with accessors (no, you need to spell out get() and set()), or something to do with anonymous objects (no, you need to spell out object). Then I took a look at the setContent method declaration:
public fun ComponentActivity.setContent(
parent: CompositionContext? = null,
content: @Composable () -> Unit
)
Ok that's interesting, the first parameter has a default so it's likely optional, and the last parameter looks like a lambda. Maybe that has something to do with it? (BTW Kotlin Unit == C++ void) So I go to the Kotlin docs and, lo and behold, there it is: trailing lambdas.
According to Kotlin convention, if the last parameter of a function is a function, then a lambda expression passed as the corresponding argument can be placed outside the parentheses [...] If the lambda is the only argument in that call, the parentheses can be omitted entirely
That means that setContent is taking a lambda. That lambda also calls a function that takes a lambda, and so on. The entire UI hierarchy is just a bunch of lambdas that something (?) eventually executes... And, despite looking like types due to PascalCase, they are actually functions (which normally use camelCase). How this all works is a question for another day.
In C++, lambda parameters go before the curly braces. In Kotlin, lambda parameters go inside the curly braces. Here, the Scaffold() function wants a lambda that takes padding parameters.
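For fellow C++ people, here is a rough analogue of the same shape written with ordinary lambdas. SetContent, Scaffold and the padding value 16 are stand-ins invented for this sketch. They are not the Compose API, and nothing here is actually composable.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

std::vector<std::string> g_log;  // records what ran, so the nesting is visible

// Analogue of a Kotlin function whose last (and only) parameter is a lambda.
void SetContent(const std::function<void()>& content) {
    g_log.push_back("SetContent");
    content();
}

// Analogue of Scaffold { innerPadding -> ... }: the lambda takes a parameter.
void Scaffold(const std::function<void(int)>& content) {
    g_log.push_back("Scaffold");
    content(16);  // made-up padding value
}

void BuildUi() {
    // In C++ the parameter list sits before the braces: [](int inner_padding) { ... }
    SetContent([] {
        Scaffold([](int inner_padding) {
            g_log.push_back("padding=" + std::to_string(inner_padding));
        });
    });
}
```

The Kotlin version reads the same way once you mentally put the parentheses back around each lambda.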
EDIT: To their credit, the Android Developer website mentions this in one of their codelabs. However, getting to the right codelab that contains this information is (IMO) not intuitive.
Merging git repos into a monorepo with git subtree
Sometimes you yearn for the simplicity of a monorepo. Maybe you just want everything in one place, maybe you just want to more easily share code.
Regardless of your motivation, git makes it easy to split and combine repos via git subtree. While it isn't part of git core, most distributions package git with subtree anyway.
After your monorepo is set up, you can start importing repos into it with git subtree add:
git subtree add -P <prefix> <repository> <ref>
-P <prefix> says which subdirectory in the monorepo the imported repo should live in. This allows you to organize the code that you're importing.
<repository> is the thing you want to import, e.g. https://github.com/user/repo
<ref> is the branch or tag to import. Typically, this is main or master.
Compared to submodules, subtrees exist independently of their remote. If the remote of the subtree is deleted, the subtree in your monorepo is a copy so it persists. However, if the remote of the submodule is deleted then you can no longer clone it.
Compared to good ol' copy-paste, subtrees preserve git history. Although if you don't want the history you can also --squash.
One word of caution: if you want history then you need to include the merge commit. On GitHub, this means you have to Create a merge commit; do not squash it! If you squash the merge during submission in GitHub then you lose the history. Maybe I'll go into details another day...
I started a new project at work and it got me thinking about some easy things you can do that immediately elevate your code quality.
Source Control (git)
I'm a git guy but any one will do. You just need some way of saving and tracking code over time.
Build System (CMake + Ninja)
While I'd love to recommend Bazel, it's not very cross-platform. Modern CMake is the closest that it gets. My only gripe is that everything gets added to the ALL target by default. If you build your dependencies from scratch via submodules then the usual CMake commands can build too much. So you either need to train people to build specific targets, or not use submodules, or just deal with bloated build times.
Please read Professional CMake. It's the most comprehensive, up-to-date CMake guide.
Style Guide
Google Style is my go-to since it's comprehensive, tested, and makes sense (for the most part).
Dependency Management (conan)
Building everything from source is a good approach since it provides more control. If you find that it doesn't build fast enough, I'd consider setting up Conan.
Auto-format (clang-format)
Just use Google style. Set up to format on save. This makes everything consistent and prevents arguments about code style.
Unit Tests (gtest)
Please write unit tests. I'm begging you.
You catch bugs.
You write fundamentally different code. What behaviors do you want? Are they intuitive? What dependencies do you have? Should they be injected? This forces you to think about SOLID principles and maintainability from the get-go.
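A tiny sketch of what "should they be injected?" can look like in practice. It uses plain assert-style checks so it doesn't need gtest, and Greeting is a made-up function.

```cpp
#include <cassert>
#include <functional>
#include <string>

// The clock is passed in rather than read from a global, so a test can fake
// the time of day instead of waiting for the afternoon.
std::string Greeting(const std::function<int()>& hour_of_day) {
    return hour_of_day() < 12 ? "Good morning" : "Good afternoon";
}
```

A test then becomes a one-liner per behavior, e.g. checking that a fake clock returning 9 produces "Good morning". With gtest the same checks would go inside a TEST() body using EXPECT_EQ.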
Static Analysis (clang-tidy)
C++ is full of ways to shoot yourself in the foot. clang-tidy provides good guardrails.
Code Coverage
This can help you understand which parts of your code could be better tested. Don't use coverage% as a strict metric otherwise you just waste time writing pointless tests.
Sanitizers
This pairs well with static analysis to give you more guardrails. This only works if you have good tests, so please write tests.
Documentation
I used to be a Doxygen fanatic but I got trained on... just reading the source code. Structure your code in ways that can be read by people, then put the docs there in comments. If the headers are too complicated since you're using weird things like templates then maybe don't use templates?
I bought a Steam Deck OLED so I can play gachas while I'm away on vacation.
Why I went with the Steam Deck and not the competition (Lenovo Legion Go or Asus ROG Ally):
Steam Deck seems to be the most stable and reliable. Apparently there are software and hardware issues with the others.
That does mean sacrificing on-paper tech specs considering the competition offers higher resolutions and display refresh rates. However, I'm not looking for a desktop replacement; this is an airport fidget. The increased resolution and refresh rate would eat into battery life for honestly not a whole lot of gain.
I also sacrifice Windows (yes you can install Windows but it doesn't seem to work great) but I'm comfortable with Linux and Proton seems to work well. I prefer an OS that I can hack in a pinch...
Unorganized thoughts:
It defaults into "Native Big Picture" but can be swapped into Desktop Mode. This works with USB-C hubs! Having a keyboard and mouse really helps getting non-Steam games working.
Desktop Mode is Arch running KDE. Time to play pacman. I'm a GNOME guy but KDE is very polished, powerful, and approachable.
I call it "Native Big Picture" because it offers more than Desktop Mode running Big Picture. Primarily, the Steam Deck overlay only works in "Native Big Picture" but not Desktop Mode running Big Picture. This seems important for optimizing battery life. Also, the input remapping only seems to work in "Native Big Picture".
"Native Big Picture" does run a basic window manager, it just lacks decoration and shows up centered.
I got the 90 Hz model. I notice it occasionally but 60 Hz is fine for me. Anything above that is gravy. I worry about the battery life that this eats up...
Notes for getting games:
Want Minecraft? Use PrismLauncher. This makes it slightly easier to add as a non-Steam game. I still had to convert their .desktop to .sh so it can be added as a non-Steam game...
Use Epic? Install Heroic from the app store, then add your games to Steam as non-Steam games.
Play Genshin? Sounds like HoyoPlay works but I used Heroic. I think that I needed to change the install drive from Z: to C:. Then I added the Unity binary as the non-Steam game. Then, in Steam, I needed to turn on compatibility with Proton. This game does not work in Desktop Mode since input remapping doesn't work in that mode. You'll want to log in with a physical mouse and keyboard.
The above applies for ZZZ as well.
Palia? I installed through Heroic then added PaliaClient.exe as a non-Steam game. It was complaining about the C++ Runtime but Reddit to the rescue. I installed protontricks and used that to get vcrun2022. I found this easiest to log in when not docked, otherwise the game resolution got really messed up and input didn't work well. You need to log in every time so I recommend having a password manager like 1Password (which you can install from the app store).
Current pain points and annoyances:
While it basically runs any Windows game, every non-Steam game I've downloaded requires non-zero time investment to add it to Steam. Typically I need to drop into Desktop Mode, install the game, find the binary on disk, launch it once or twice to make sure it works, maybe write a wrapper script, then go into Steam and hope it can be added. Then hope Proton works. I've had trouble importing .desktop files (which is annoying...) but those are trivial to convert into a .sh.
You need to keep the screen on to download games. Burn-in isn't a problem with the OLED under normal use though so this is more a LOL than anything else.
Steam+X is the soft keyboard. This is not easy to discover and necessary for Desktop Mode. It also doesn't work as well as a physical keyboard.
MSVC Boolean Branches (Adventures in Reverse Engineering)
There are many ways of writing the same boolean expression, but some are not the same! Particularly, pay attention to explicit comparisons to TRUE! Anyway, I use this graphic when I get confused about how to decompile something.
x87 FPU and SSE (Adventures in Reverse Engineering)
Game: Diablo Pre-Release Demo (1996)
Language: C++
Toolchain: Visual C++ 4.0 (suspected)
Take a look at this:
These are x87 FPU instructions (described in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Chapter 8). This roughly corresponds to:
if (gamma_correction >= min_gamma) gamma_correction -= gamma_delta;
However, if you put any floating point math into Compiler Explorer, you won't get the same output! That's because the default mode now is to use the SSE instructions.
Fortunately, you can disable SSE on MSVC using /arch:IA32.
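For anyone who wants to try this in Compiler Explorer, here is a compilable form of the snippet above. The variable names come from that snippet; the function name and the values are made up. Note that /arch:IA32 only applies when targeting 32-bit x86.

```cpp
#include <cassert>

// Names follow the one-line snippet above; the values are invented for the sketch.
float gamma_correction = 100.0f;
const float min_gamma = 30.0f;
const float gamma_delta = 5.0f;

// Compile with an x86 MSVC target and /arch:IA32 to get x87 instructions
// for the compare and subtract instead of SSE ones.
void DecreaseGamma() {
    if (gamma_correction >= min_gamma) gamma_correction -= gamma_delta;
}
```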
Game: Diablo Pre-Release Demo (1996)
Language: C++
Toolchain: Visual C++ 4.0 (suspected)
Calling conventions are something that a programmer rarely thinks about but is very important to code interoperability. It covers things like: do I push all my arguments on the stack? Can I use registers to pass arguments? Who cleans up the stack, you or me? What matters most about calling convention is that both caller and callee agree.
The default calling convention for the Visual C++ 4.0 toolchain seems to be __fastcall. Here's the Microsoft documentation: https://learn.microsoft.com/en-us/cpp/cpp/fastcall?view=msvc-170 In essence: the first two arguments (if they fit in 4 bytes) can use ecx and edx; other arguments go on the stack in reverse order (e.g. the third argument is at the top, followed by the fourth, etc).
But let's look at an example:
The prologue allocates 8 bytes on the stack, then saves ecx (4 bytes) and edx (4 bytes) in that space. It does this so it can reuse ecx and edx later (for indexing monster[])
Calling code looks like this:
What happens if I treat this like a __stdcall? I push my arguments on the stack but ignore ecx/edx so they contain garbage. The function though doesn't look at the stack, it looks at ecx and edx! So undefined behavior. Worst case, the i variable accesses monster beyond its bounds, corrupts memory, and eventually the game crashes.
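A small sketch of how the agreement is normally enforced in source: the calling convention is part of the declaration and of the function pointer type, so the compiler passes arguments correctly as long as everyone sees the same prototype. HitMonster, the monster values and the DEMO_FASTCALL macro are invented for this example, not taken from the game.

```cpp
#include <cassert>

#if defined(_MSC_VER) && defined(_M_IX86)
#define DEMO_FASTCALL __fastcall  // first two int-sized args go in ecx and edx
#else
#define DEMO_FASTCALL             // other targets: use the platform default
#endif

struct Monster { int hp; };
Monster monster[4] = {{10}, {20}, {30}, {40}};

// Every caller that sees this prototype passes i in ecx and damage in edx
// on 32-bit MSVC, which is what the function body expects.
void DEMO_FASTCALL HitMonster(int i, int damage) {
    monster[i].hp -= damage;
}

// The convention is also part of the pointer type. Calling HitMonster through
// a pointer declared with a different convention needs a cast, and that cast
// is what sets up the garbage-in-ecx/edx scenario described above.
using HitMonsterFn = void(DEMO_FASTCALL*)(int, int);
HitMonsterFn hit = &HitMonster;
```

When calling into the original binary from new code there is no prototype to lean on, so getting the convention right is on whoever writes the declaration.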
Was this a macro? (Adventures in Reverse Engineering)
Game: Diablo Pre-Release Demo
Language: C++
Toolchain: Visual C++ 4.0 (suspected)
One of the tricky things about decompiling is that a lot is lost when the source goes through a compiler:
comments are thrown away
human-readable names are discarded
the preprocessor kicks in and copy-pastes code
variables can be optimized away
In this case, let's look at the preprocessor. The preprocessor runs pretty early on and expands macros into new code. This is indistinguishable from the developer copy-pasting a block of code. Compiler Explorer backs us up here: if you look at this sample, both functions produce the same assembly despite seemingly being implemented differently.
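A minimal sketch of that kind of pair, assuming a made-up CLAMP_HP macro and Monster struct (neither is from the game). After preprocessing, the compiler sees the same statements in both functions.

```cpp
#include <cassert>

struct Monster { int hp; };

// Hypothetical macro; the compiler only ever sees its expansion.
#define CLAMP_HP(m) do { if ((m).hp < 0) (m).hp = 0; } while (0)

// Version written with the macro.
void HitWithMacro(Monster& m, int damage) {
    m.hp -= damage;
    CLAMP_HP(m);
}

// Version written by pasting the macro body in by hand.
void HitByHand(Monster& m, int damage) {
    m.hp -= damage;
    do { if (m.hp < 0) m.hp = 0; } while (0);
}
```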
So when I see things like this:
or this
or this
Did the developers copy-paste these function calls or did they make a macro?
To some extent, the answer is "who cares". Although it largely depends on what your goals are. If you want source that produces the same binary (which is what I'm doing) then both copy-paste and macros are viable options. If you're aiming to be source exact then... good luck.
For me, compiling to the same binary gives me confidence that I have a working version that I can build on. I don't need the exact same source.
Total reverse engineering including decompilation is the holy grail of any mod project where an SDK is not provided.
I'm currently embarking on a journey of totally decompiling the Diablo Pre-Release Demo using Devilution as a reference. After I have the source code, I can make any change I want anywhere I want. I'll be free from the whims of the original DIABLO.EXE that I'm patching. I won't need binary patching, and I won't need DLL hijacking. I won't need to worry about new code throwing off relative offsets, nor will I need to worry about how to jump in and out of patch code; the patch will be seamlessly integrated.
This is a long way off and I make slow progress. I'm currently investigating what my options are for speeding up this process.
Binary patching isn't the only option for modding. There's also DLL Hijacking. If you think about it, I'm acting like a computer virus; any mechanism that a computer virus uses to hijack a process can be used here. DLL Hijacking is just the simplest one in this case. You'll see why in a moment.
The Diablo Pre-Release Demo relies on DPLAY.DLL, DDRAW.DLL and STORM.DLL. The one that stands out here is DPLAY.DLL; isn't the demo single-player? Yes it is, but it includes some snippets of non-functional multiplayer.* So, since the game isn't using DPLAY.DLL, how about I substitute my own!
What DIABLO.EXE needs from DPLAY.DLL is:
DPLAY.DLL exists in the DLL search path (e.g. the same directory as DIABLO.EXE)
DPLAY.DLL exports two functions: DirectPlayCreate() and DirectPlayEnumerate().
I can make my own DLL that looks like DPLAY.DLL except it has my own C++ code, and the game will accept it and run with it. I very naughtily patch the game's code from DllMain() (which you're not supposed to do but it works so hey why not) using VirtualProtect() with PAGE_EXECUTE_READWRITE.
The downside is that I still need to, to some degree, jump in and out of the functions that I write. Though with __declspec(naked) I don't need to worry about the compiler generating prolog/epilog that tramples the register contents.
So now I write C++ code that gets compiled into a DLL and patches itself into memory when it's loaded. Now I can use more traditional software development workflows and scrap the tedious binary patching method entirely.
(If I need to get multiplayer working in the future, I can move my code over to DDRAW.DLL since I already distribute a custom version of that.)
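A rough sketch of what the patching step can look like, under some assumptions: write_patch() is a hypothetical helper, not the actual Pre-ablo code, and no real addresses from DIABLO.EXE appear here. On Windows it makes the target range writable with VirtualProtect(), copies the new bytes in, and restores the old protection. On other platforms the protection calls are compiled out, so the helper can be tried on an ordinary buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: overwrite `size` bytes at `target` with `bytes`.
bool write_patch(void* target, const void* bytes, std::size_t size) {
#ifdef _WIN32
    // Code pages are normally not writable, so lift the protection first.
    DWORD old_protect = 0;
    if (!VirtualProtect(target, size, PAGE_EXECUTE_READWRITE, &old_protect)) {
        return false;
    }
#endif
    std::memcpy(target, bytes, size);
#ifdef _WIN32
    // Put the original protection back and flush any stale cached instructions.
    DWORD ignored = 0;
    VirtualProtect(target, size, old_protect, &ignored);
    FlushInstructionCache(GetCurrentProcess(), target, size);
#endif
    return true;
}

#ifdef _WIN32
// Entry point the loader calls when the game loads the substitute DLL.
BOOL WINAPI DllMain(HINSTANCE, DWORD reason, LPVOID) {
    if (reason == DLL_PROCESS_ATTACH) {
        // Real patches would be applied here, one write_patch() call per fix.
    }
    return TRUE;
}
#endif
```

A real replacement DLL would also have to export DirectPlayCreate() and DirectPlayEnumerate(), which this sketch leaves out.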
* That the demo has multiplayer is interesting from an archival, historical perspective but I gloss over it here because it's not important to DLL hijacking. I also can't comment on whether or not it actually works...
Pre-ablo is my mod of the Diablo Pre-Release Demo that aims to be as faithful to the source as possible while allowing it to be played from start to finish. I started it because the only existing mod of the Pre-Release Demo was Alpha4 and, while I enjoyed it, I wanted something with fewer creative liberties. Alpha4 is a new game entirely; I just wanted to play the Pre-Release Demo as it was without crashing!
However, I'm relatively new to the world of decompiling and x86 assembly. I figured the best place to start was the most obvious and I could bootstrap my way from there.
I started with binary patching. It was painful. The workflow looked like this:
Load DIABLO.EXE into IDA. I have a running IDA file where I annotate functions and variables based on Devilution.
Identify the broken code. This often requires understanding the x86 assembly, mentally decompiling it to C++, and saving those annotations into the IDA file. Very rarely do I already know exactly what the code is doing, so understanding the code is a large part of this time.
Identify a fix. The best fix is one that doesn't add net new instructions so I tried to favor those. I'd also take some shortcuts which I regret doing later...
Identify where the fix would go. In step 3 I said that I didn't want to add net new instructions. This is largely thanks to the .EXE format. If I add new bytes to the file then all the offsets are now wrong and the game won't work. In these cases I had to repurpose dead code, and figure out how to jump in and out. Every jump is more work down the line...
Turn the fix into asm. My mind works in C++ so I need to mentally translate that into x86 assembly. This is tricky since I'm largely unfamiliar with x86 assembly (though I get better every day).
Turn the asm into machine code. Oh boy this sucked. This sucked so hard. Machine code is meant to be read by the processor, not humans! It ends up being incredibly terse! In addition, x86 has some peculiarities: it uses a multibyte encoding and has a lot of weird edge cases. At one point I switched to using Ghidra to do this for me (but this created its own headaches since I can't use my IDA annotations in Ghidra...). Also I needed to calculate relative offsets to the current instruction pointer...
Insert the machine code into DIABLO.EXE. I did this with a hex editor. By hand. If I made a mistake I'd have to start over.
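The relative-offset arithmetic from the machine code step can be sketched like this. It assumes the common 5-byte near jump (opcode E9 followed by a 32-bit little-endian displacement), where the displacement is measured from the address of the next instruction. encode_jmp_rel32() is a hypothetical helper, and the addresses in the comments are made-up examples, not real locations in DIABLO.EXE.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: encode a `jmp rel32` located at address `from`
// so that execution continues at address `to` (32-bit addresses).
std::array<std::uint8_t, 5> encode_jmp_rel32(std::uint32_t from, std::uint32_t to) {
    // The CPU adds the displacement to the address of the NEXT instruction,
    // which is `from + 5` because the jump itself is 5 bytes long.
    // Unsigned wraparound yields the two's complement form for backward jumps.
    const std::uint32_t rel = to - (from + 5u);

    std::array<std::uint8_t, 5> out{};
    out[0] = 0xE9;                                            // jmp rel32 opcode
    out[1] = static_cast<std::uint8_t>(rel & 0xFFu);          // displacement, little-endian
    out[2] = static_cast<std::uint8_t>((rel >> 8) & 0xFFu);
    out[3] = static_cast<std::uint8_t>((rel >> 16) & 0xFFu);
    out[4] = static_cast<std::uint8_t>((rel >> 24) & 0xFFu);
    return out;
}

// Example (made-up addresses): a jump at 0x00401000 to 0x00402000 has
// displacement 0x00402000 - 0x00401005 = 0x00000FFB, so the bytes are
// E9 FB 0F 00 00.
```

Getting the "+ 5" wrong is an easy mistake when doing this by hand, which is part of why having a tool do the encoding helps.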
I made this reproducible by encoding the binary differences using vcdiff. That way, I could take a fresh DIABLO.EXE and reconstruct Pre-ablo by applying the vcdiff patches in order. It also separated the logical changes into a list of discrete patches; changing one patch (usually) had no impact on the others.
This sucked but it worked. I used this approach until v0.4 when I replaced it with something better...
(The "easy way out" was to pay for IDA Pro, which costs several hundred US dollars. No thanks.)