Obfuscation Through Verbosity

Thursday, January 22, 2009

When you hear the word disassembly, what comes to mind?

For most people, the word summons visions of rogue hackers slaving away in third-world 24/7 chop shops: reverse engineering major commercial applications, robbing people blind, violating encryption, and breaking software licensing mechanisms like Daniel-san chopping through six blocks of ice in the second Karate Kid movie.

But that sort of thing (Hi-Yaaaah!) is the exception, rather than the rule.

More often, disassembly is used not to recreate a commercial application, but rather to glean information about the inner workings of such an application. You've heard about Google's new Chrome web browser. What you might not have heard is the story of how Chrome was built on APIs which were (allegedly) disassembled from the Windows kernel. Scott Hanselman explains:

Looks like The Chromium authors may have disassembled part of the Windows Kernel in order to achieve this security feature under Windows XP SP2. Probably not cool to do that, but they're clearly doing it for good and not evil, as their intent (from reading their code) is to make their browser safer under XP SP2 and prevent unwanted code execution. 

It's an excellent article as always from Scott, who was (I believe) the first person to publicize Google's brush with the "dark side" of programming. In all fairness, Google soon denied having performed the disassembly, although they did go on to state:

Disassembling is a common and accepted practice in software development, frequently used to make sure software features are compatible with other software programs or operating systems.

I wouldn't exactly describe disassembly as a common and accepted practice, but let's acknowledge that disassembly has its place in the programmer's arsenal. The only problem: in the grab-bag of marketable technologies, disassembly techniques are... well, they're not even present. Maybe something like 1% of the jobs out there call for this kind of knowledge:

  • Assembly language
  • CPU architecture
  • Optimizing compilers

It may be important to understand low-level programming concepts, but at this level, you're perilously close to eating your circuitry's lunch. Go any lower, and what you're doing can no longer properly be called software.

This is software bedrock.

Down here, as every assembly programmer knows, there are no classes, no generics, no convenient enumerations. There aren't really even any functions, properly speaking. In the trenches of binary executable software, everything is a number:

  • Instructions
  • Data

We call these instructions opcodes, and they're intended to be fed, more or less directly, to the CPU, which can be thought of as Pac-Man: chomping his way through a gameboard littered with "dots," or instructions.

Some of these instructions tell him to perform a simple task:

Add two numbers together.

Some of these instructions tell him which instructions to execute next (causing our Pac-Man to turn left or right):

Start executing the instructions at address 0x4530FFE8.
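To make the Pac-Man picture concrete, here is a minimal sketch of a fetch-decode-execute loop in Python. The opcode numbers (LOAD, ADD, JMP, HALT) and the single-accumulator machine are invented for illustration; they are not real x86 encodings, and a real CPU is vastly more involved.

```python
# Toy machine: these opcode numbers are made up for illustration, not real x86.
LOAD, ADD, JMP, HALT = 0x01, 0x02, 0x03, 0xFF


def run(memory):
    """Fetch-decode-execute loop over a flat list of numbers."""
    acc = 0  # single accumulator register
    pc = 0   # program counter: Pac-Man's position on the board
    while True:
        opcode = memory[pc]
        if opcode == LOAD:    # LOAD n: acc = n
            acc = memory[pc + 1]
            pc += 2
        elif opcode == ADD:   # ADD n: acc += n
            acc += memory[pc + 1]
            pc += 2
        elif opcode == JMP:   # JMP addr: continue executing at addr
            pc = memory[pc + 1]
        elif opcode == HALT:  # stop and hand back the accumulator
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at address {pc}")


program = [
    LOAD, 2,    # address 0
    JMP, 6,     # address 2: skip over the next instruction
    ADD, 100,   # address 4: never executed
    ADD, 3,     # address 6
    HALT,       # address 8
]
result = run(program)
print(result)
```

Note that the program is nothing but a list of numbers. Whether a given number is an instruction or an operand depends entirely on where the program counter happens to land.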

It's not that much of a stretch to say that all programs, regardless of the language they were written in or the platform they were written for, ultimately condense down to a sequence of opcodes (sitting in one part of memory) and data (sitting in another).

These opcodes are rigorously documented and publicly available:

  • Intel 64 and IA-32 Architectures Software Developer's Manuals
  • AMD64 Architecture Programmer's Manuals

Which means that modern applications are 100% self-documenting at the binary level. Instructions fed to the CPU can't be hidden or disguised. Memory addresses can't be faked or falsified. The thing that prevents our applications (once publicly deployed) from being reverse-engineered and stolen out from under us is the sheer quantity of instructions required to make a program do anything meaningful.
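As a rough illustration of what "self-documenting" means here, the sketch below decodes a handful of single-byte x86 opcodes (NOP, RET, INT3, MOV EAX with a 32-bit immediate, and the short and near forms of JMP) as they are encoded in 32-bit mode. The function name disassemble and the sample bytes are made up for this example, and everything it doesn't recognize is dumped as a raw byte. A real disassembler handles prefixes, ModR/M bytes, and hundreds more instructions.

```python
import struct


def disassemble(code, base=0):
    """Decode a few one-byte x86 (32-bit mode) opcodes into text.

    Covers only NOP, RET, INT3, MOV EAX imm32, JMP rel8 and JMP rel32;
    any other byte is emitted as raw data.
    """
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        addr = base + i
        if op == 0x90:
            text, size = "nop", 1
        elif op == 0xC3:
            text, size = "ret", 1
        elif op == 0xCC:
            text, size = "int3", 1
        elif op == 0xB8 and i + 5 <= len(code):
            # B8 id: MOV EAX, imm32 (little-endian immediate)
            (imm,) = struct.unpack_from("<I", code, i + 1)
            text, size = f"mov eax, {imm:#x}", 5
        elif op == 0xEB and i + 2 <= len(code):
            # EB cb: JMP rel8, relative to the end of this instruction
            (rel,) = struct.unpack_from("<b", code, i + 1)
            text, size = f"jmp {addr + 2 + rel:#x}", 2
        elif op == 0xE9 and i + 5 <= len(code):
            # E9 cd: JMP rel32, relative to the end of this instruction
            (rel,) = struct.unpack_from("<i", code, i + 1)
            text, size = f"jmp {addr + 5 + rel:#x}", 5
        else:
            text, size = f"db {op:#04x}", 1
        out.append((addr, text))
        i += size
    return out


# Hypothetical sample: mov eax, 0x2a / jmp over the nop / nop / ret
code = bytes.fromhex("b82a000000" "eb01" "90" "c3")
listing = disassemble(code, base=0x1000)
for addr, text in listing:
    print(f"{addr:#06x}: {text}")
```

Decoding nine bytes is trivial. Decoding nine million of them, and then working out what they mean, is where the verbosity starts to bite.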

Hence the title of this article.

Obfuscation through verbosity, it turns out, discourages all but the most determined attackers. Top-to-bottom reverse-engineering of a complex application in a timely enough fashion to hurt that application's market share is a rarity. But partial reverse-engineering happens all the time:

These exploits and many, many others are possible because binary executable code is still the lingua franca of software development and it's still as wide-open as ever. It just happens to be a language fewer and fewer people bother to learn—for good reason, as programming moves to higher and higher levels of abstraction.

And yet—if a programmer really wants to learn to "see through the Matrix"...

[Comic: Pong (xkcd.com)]

...he should be willing to pierce the comfortable illusion of his favorite language, every once in a while, in order to look at the sometimes ugly but always logical binary code running beneath. As Yeats put it:

I must lie down where all the ladders start,
In the foul rag-and-bone shop of the heart.

Though he might not have been talking about x86 programming per se.

Tags: disassembly, CPU, opcodes

13 comment(s)

Entertaining article, good "detour" on the botting series.

Gee, you've got us trained to wait weeks for your articles - it'll be days before people check in here!

I remember learning IBM 360 machine language at Humber College in the 70s. Mr Cassell I think...

Great article.

The Pac-Man metaphor is great. I'm going to use that for my next batch of students. Also, any technical post which mentions the Karate Kid gets a thumbs up from me.

I was waiting for the article to start and then it ended. What a disappointment.

It's aimed at a slightly different audience, but

Hoglund, G & McGraw, G (2004) Exploiting Software - How to Break Code Addison Wesley Press

Does an excellent job of explaining disassembly and IDA, the tool for the job. It also shows how to cut through a good portion of the verbosity by zeroing in on the specific DLL and function to hook or disassemble. One example in the book is a single-byte patch that causes all kernel permissions checks to succeed. Another example IIRC explains reverse-engineering a security patch to identify the security flaw it was intended to fix.

Warning: This is not a light-reading "Hacking exposed" book. Don't drop $40 for it unless you are willing to invest some time going through the examples for understanding.


Another enjoyable article. Mr Miyagi is a legend, thanks for the nostalgia too :)

Hoglund, G & McGraw, G (2004) Exploiting Software - How to Break Code

Got bored of it. I had trouble keeping interested as it took a long winded path to explaining software exploitation.

I quite agree with Elizabeth: it's an excellent book, one which has been mentioned before on this blog (look in some of the earlier poker botting articles). I also agree with Poker Rakeback, though. It has a lot of fluff, and I also don't like how it focuses on WoW. The WoW code is outdated but still explanatory.

I remember machine language code from UC Irvine in 82. Yikes! I actually remember the opcodes. Question for the Bot Master: Have the online poker sites been able to detect when a DLL is injected? There is some talk I am hearing that this has happened and if so, can the bot ever be kept undetectable?

... love the series... hanging on your every word.

"Have the online poker sites been able to detect when a DLL is injected?"

Absolutely. Look for calls to such functions as EnumProcessModulesEx(), ReadProcessMemory(), EnumDeviceDrivers(), GetTimestampForLoadedLibrary(), and CreateToolhelp32Snapshot(). Some of these calls are generated dynamically at run time and as such don't show up in static code analysis.

Consider machine code which implements an interpreter; call it interpreter1. This interpreter1 could interpret bytecode (bytecode1) which in turn implements another interpreter, interpreter2, with an incompatible language. In general, when interpreter j executes bytecode j, that bytecode might in turn implement interpreter j+1, until at some level bytecodeN implements the actual desired program (slowly). Now reverse engineer that from the disassembly level, working your way up the semantic chain.


Coding the Wheel has appeared on the New York Times' Freakonomics blog, Jeff Atwood's Coding Horror, and the front pages of Reddit, Slashdot, and Digg.
