When you hear the word disassembly, what comes to mind?
For most people, the word summons visions of rogue hackers slaving away in third-world 24/7 chop shops: reverse engineering major commercial applications, robbing people blind, violating encryption, and breaking software licensing mechanisms like Daniel-san chopping through six blocks of ice in the second Karate Kid movie.
But that sort of thing (Hi-Yaaaah!) is the exception, rather than the rule.
More often, disassembly is used not to recreate a commercial application, but rather to glean information about the inner workings of such an application. You've heard about Google's new Chrome web browser. What you might not have heard is the story of how Chrome was built on APIs which were (allegedly) disassembled from the Windows kernel. Scott Hanselman explains:
Looks like The Chromium authors may have disassembled part of the Windows Kernel in order to achieve this security feature under Windows XP SP2. Probably not cool to do that, but they're clearly doing it for good and not evil, as their intent (from reading their code) is to make their browser safer under XP SP2 and prevent unwanted code execution.
It's an excellent article as always from Scott, who was (I believe) the first person to publicize Google's brush with the "dark side" of programming. In all fairness, Google soon denied having performed the disassembly, although they did go on to state:
Disassembling is a common and accepted practice in software development, frequently used to make sure software features are compatible with other software programs or operating systems.
I wouldn't exactly describe disassembly as a common and accepted practice, but let's acknowledge that disassembly has its place in the programmer's arsenal. The only problem: in the grab-bag of marketable technologies, disassembly techniques are...well, they're not even present. Maybe something like 1% of the jobs out there call for this kind of knowledge:
- Assembly language
- CPU architecture
- Optimizing compilers
It may be important to understand low-level programming concepts, but at this level, you're perilously close to eating your circuitry's lunch. Go any lower, and what you're doing can no longer properly be called software.
This is software bedrock.
Down here, as every assembly programmer knows, there are no classes, no generics, no convenient enumerations. There aren't really even any functions, properly speaking. In the trenches of binary executable software, everything is a number:
- Instructions
- Data
We call the numbers that encode these instructions opcodes, and they're intended to be fed, more or less directly, to the CPU, which can be thought of as Pac-Man: chomping his way through a gameboard littered with "dots," or instructions.

Some of these instructions tell him to perform a simple task:
Add two numbers together.
Some of these instructions tell him what other instructions to execute next (causing our Pac-Man to turn left or right):
Start executing the instructions at address 0x4530FFE8.
It's not that much of a stretch to say that all programs, regardless of the language they were written in or the platform they were written for, ultimately condense down to a sequence of opcodes (sitting in one part of memory) and data (sitting in another).
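To make the Pac-Man picture a little more concrete, here's a minimal sketch of a toy CPU in Python. Fair warning: the instruction set is invented for this example (the opcode numbers `LOAD`, `ADD`, `JMP`, and `HALT` are my own, not real x86 encodings), and a real processor has far more registers and far more instructions. The shape is the same, though: a list of numbers for the code, a list of numbers for the data, and a loop that chomps through one and reads the other.

```python
# A made-up instruction set: these opcode numbers are invented for this
# sketch and have nothing to do with real x86 encodings.
LOAD, ADD, JMP, HALT = 0x01, 0x02, 0x03, 0xFF


def run(code, data, max_steps=1000):
    """Chomp through `code` one two-number instruction at a time."""
    ip = 0    # instruction pointer: where Pac-Man is on the board
    acc = 0   # a single accumulator register
    for _ in range(max_steps):
        opcode, operand = code[ip], code[ip + 1]
        ip += 2                      # by default, move on to the next dot
        if opcode == LOAD:
            acc = data[operand]      # copy a number out of data memory
        elif opcode == ADD:
            acc += data[operand]     # "add two numbers together"
        elif opcode == JMP:
            ip = operand             # "start executing at this address"
        elif opcode == HALT:
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at {ip - 2}")
    raise RuntimeError("step limit reached")


# The program and its data are both just lists of numbers.
code = [
    LOAD, 0,    # address 0: acc = data[0]
    ADD, 1,     # address 2: acc += data[1]
    JMP, 8,     # address 4: turn, skipping the next instruction
    ADD, 2,     # address 6: never executed
    HALT, 0,    # address 8: stop and hand back acc
]
data = [2, 3, 100]

result = run(code, data)
print(result)
```

The `JMP` is the "turn left or right" from above: it overwrites the instruction pointer, so the `ADD` at address 6 is never reached and the 100 sitting in data memory never gets added.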
These opcodes are rigorously documented and publicly available:
- Intel 64 and IA-32 Architectures Software Developer's Manuals
- AMD64 Architecture Programmer's Manuals
Which means that modern applications are 100% self-documenting at the binary level. Instructions fed to the CPU can't be hidden or disguised. Memory addresses can't be faked or falsified. The thing that prevents our applications (once publicly deployed) from being reverse-engineered and stolen out from under us is the sheer quantity of instructions required to make a program do anything meaningful.
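Here's roughly what "self-documenting" looks like in practice: a minimal sketch of a disassembler in Python that recognizes a handful of x86 opcodes from the manuals (`push ebp`, `pop ebp`, `nop`, `ret`, `int3`, plus `call` and `jmp` with a 32-bit relative offset). The function name `disassemble`, the sample bytes, and the base address `0x401000` are assumptions made up for the example, and anything the table doesn't know is simply printed as a raw `db` byte. A real disassembler also has to handle prefixes, ModRM/SIB bytes, and hundreds more opcodes, which is exactly where the verbosity comes from.

```python
import struct

# Single-byte opcodes as documented in the Intel manuals (32-bit mnemonics).
# This table is deliberately tiny.
SIMPLE = {
    0x55: "push ebp",
    0x5D: "pop ebp",
    0x90: "nop",
    0xC3: "ret",
    0xCC: "int3",
}


def disassemble(code, base=0):
    """Turn raw bytes into a list of 'address: mnemonic' lines."""
    lines = []
    i = 0
    while i < len(code):
        addr = base + i
        op = code[i]
        if op in SIMPLE:
            text, size = SIMPLE[op], 1
        elif op in (0xE8, 0xE9) and i + 5 <= len(code):
            # call/jmp rel32: a signed, little-endian 32-bit offset,
            # relative to the address of the *next* instruction.
            (rel,) = struct.unpack_from("<i", code, i + 1)
            size = 5
            target = (addr + size + rel) & 0xFFFFFFFF
            name = "call" if op == 0xE8 else "jmp"
            text = f"{name} {target:#010x}"
        else:
            # Not in our tiny table: show the raw byte and move on.
            text, size = f"db {op:#04x}", 1
        lines.append(f"{addr:08x}: {text}")
        i += size
    return lines


# Hypothetical bytes, pretending they were loaded at address 0x401000.
blob = bytes.fromhex("55 90 e9 02 00 00 00 cc cc 5d c3")
listing = disassemble(blob, base=0x401000)
print("\n".join(listing))
```

Eleven bytes in, seven instructions out. Scale that up to a few million bytes and you can see why nobody reads a commercial application this way for fun.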
Hence the title of this article.
Obfuscation through verbosity, it turns out, discourages all but the most determined attackers. Top-to-bottom reverse-engineering of a complex application in a timely enough fashion to hurt that application's market share is a rarity. But partial reverse-engineering happens all the time:
- Make Apple iTunes produce DRM-free music
- Figure out that Minesweeper cheats
- Extract hidden game text in the PokerStars chat window
These exploits and many, many others are possible because binary executable code is still the lingua franca of software development and it's still as wide-open as ever. It just happens to be a language fewer and fewer people bother to learn—for good reason, as programming moves to higher and higher levels of abstraction.
And yet—if a programmer really wants to learn to "see through the Matrix"...
...he should be willing to pierce the comfortable illusion of his favorite language, every once in a while, in order to look at the sometimes ugly but always logical binary code running beneath. As Yeats put it:
I must lie down where all the ladders start,
In the foul rag-and-bone shop of the heart.
Though he might not have been talking about x86 programming per se.
Posted by James Devlin