Hand-coded image recognition. Checking individual color values to try to figure out if the random smear of pixels forms an "L" or an "H". Dubious OCR schemes. Internet spambots. Click fraud. Hacking.
These are the things most people think of when they hear the phrase screen scraping.
They're wrong, of course.
But you still won't find Screen Scraping 101 up at the local college. You won't find it in books, except here and there, occasionally, mentioned in passing, usually with a smirk. I've said before: screen scraping should always be a last resort. But you know what? Screen scraping has gotten a bad rap and it's not entirely deserved.
My experience?
Getting our applications to talk to one another has been something of a software Holy Grail. I give you DLL exports, COM, DCOM, CORBA, OLE, Remoting, and a dozen other technologies. It would be nice if we lived in a world in which each application (and each electronic gadget) exposed a clean, universal Automation interface allowing for external command and control. It should be a requirement for a well-behaved piece of hardware or software that it allows itself to be controlled by other pieces of software.
```csharp
if (DateTime.Now >= ac.WakeTime)
{
    CoffeeMaker cm = House.Rooms["Kitchen"].Items["Coffee Maker"];
    if (cm != null)
    {
        cm.Wake();
        cm.Prepare(2); // 2 cups
        cm.Brew();     // asynchronously brew the coffee
    }
}
```
But we don't live in that world, at least not yet.
We live in a world which is more like the Tower of Babel.

Our world is a world of one hundred million competing pieces of software slugging it out across a dozen platforms, using different languages, different protocols, different storage mechanisms, different endianness, different everything. It's a huge mess, and in part we're glad, because entire industries are born of these differences. Without them, many of us wouldn't have jobs.
But if I had to say in a word what the single most expensive characteristic of modern software is, complexity would not be my choice. I would choose words like interoperability, migration, conversion, transformation, adjustment—any word describing the enormous friction created by trying to fit billions of square pegs into billions of round holes, line by line of code, across the trillions of lines of code that comprise the world's code base.
In the midst of all this chaos, one thing we can usually count on is the GUI.
No matter what happens in software, no matter how the underlying mechanisms change and improve, end-user applications will always display textual and graphical elements on a surface. That surface might be your computer screen, your iPod, or one day, your visual cortex. But there will always be a surface and there will always be elements painted on that surface, and the programmers responsible for coding those elements will try to make them pleasing to, and easily readable by, the human eye.
In order to do that they'll usually have to rely on publicly available APIs or libraries, because it's enormously difficult to paint good text, or good anything, from scratch. Font rendering is a subject for experts. Drawing anti-aliased lines quickly is a subject for experts. The mathematics of viewing frustums is a subject for experts. If you're looking for an exercise in futility, try to implement a good font-drawing routine from scratch. Get back to me in five years and let me know how it worked out.

No, by and large we're forced to use operating system APIs or third-party libraries because the complexities that they mask would be too costly to implement ourselves. And once we do that, we make our programs accessible not only to humans, but to software. To build an application based on common UI components is to implicitly state that yes, our application can be accessed and manipulated by other applications written by other people.
And this is why I say screen-scraping has gotten a bad rap. Screen scraping, true screen scraping, has very little to do with "pushing pixels" and everything to do with accessing display text and other UI elements in a robust way, regardless of whether such use was planned or intended by the creator(s) of the software. That's right: I'm saying that screen-scraping, implemented properly, is a clean, robust, generic, and powerful addition to the programmer's bag of tricks.
It just suffers from a bad name, occasional misuse, and widespread misunderstanding.
> Screen scraping is a technique in which a computer program extracts data from the display output of another program. The program doing the scraping is called a screen scraper. The key element that distinguishes screen scraping from regular parsing is that the output being scraped was intended for final display to a human user, rather than as input to another program, and is therefore usually neither documented nor structured for convenient parsing. Screen scraping often involves ignoring binary data (usually images or multimedia data) and formatting elements that obscure the essential, desired text data. Optical character recognition software is a kind of visual scraper.
I can't fault Wikipedia's definition of the phrase, but whoever wrote it seems to have thought that screen scraping is about extracting meaning from pixels drawn to the screen. The definition is technically correct, but misleading. I'd like to suggest a different definition.
> Screen scraping is a technique in which a computer program extracts data from the target program by examining the properties and behavior of the target program's GUI, and in particular, by examining and hooking the code structures that lie beneath the GUI.
By that definition, screen scraping has a lot more in common with system-level development techniques, or even disassembly, than it does with OCR or image recognition. And when we look at the code structures beneath the GUI, rather than trying to grok the meaning of pixelated randomness, we find that screen-scraping techniques are very robust indeed:
- DLL Injection
- API Detouring
- Window Subclassing
- Message Processing
In fact, we start to see that screen-scraping is an exercise in conventional development techniques. Windows. DLLs. Processes. Threads.
The stuff of which software is made.
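To make the last of those techniques a little more concrete, here is a minimal sketch of message processing in C#, offered as an illustration rather than a definitive implementation. It assumes a classic Windows Notepad is running, with a top-level window of class `Notepad` and a child edit control of class `Edit`. Newer Notepad builds use different class names, so treat both strings as assumptions. The `WindowTextReader` class and its `ReadText` helper are names invented for this example. The code asks the edit control for its text using the standard `WM_GETTEXTLENGTH` and `WM_GETTEXT` messages, with no pixels involved.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical example class: reads the text of a control owned by another process.
static class WindowTextReader
{
    private const uint WM_GETTEXT = 0x000D;
    private const uint WM_GETTEXTLENGTH = 0x000E;

    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr FindWindow(string lpClassName, string lpWindowName);

    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr FindWindowEx(
        IntPtr hwndParent, IntPtr hwndChildAfter, string lpszClass, string lpszWindow);

    // Overload used for messages that carry no buffer (WM_GETTEXTLENGTH).
    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    private static extern IntPtr SendMessage(
        IntPtr hWnd, uint msg, IntPtr wParam, IntPtr lParam);

    // Overload used for WM_GETTEXT, where lParam is the output buffer.
    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    private static extern IntPtr SendMessage(
        IntPtr hWnd, uint msg, IntPtr wParam, StringBuilder lParam);

    // Ask the window how long its text is, then ask it to copy the text into our buffer.
    public static string ReadText(IntPtr hWnd)
    {
        int length = SendMessage(hWnd, WM_GETTEXTLENGTH, IntPtr.Zero, IntPtr.Zero).ToInt32();
        var buffer = new StringBuilder(length + 1); // +1 for the terminating null
        SendMessage(hWnd, WM_GETTEXT, (IntPtr)buffer.Capacity, buffer);
        return buffer.ToString();
    }

    static void Main()
    {
        // Assumption: a classic Notepad window (class "Notepad") is open.
        IntPtr notepad = FindWindow("Notepad", null);
        if (notepad == IntPtr.Zero)
        {
            Console.WriteLine("No window of class \"Notepad\" was found.");
            return;
        }

        // Assumption: the document text lives in a direct child of class "Edit".
        IntPtr edit = FindWindowEx(notepad, IntPtr.Zero, "Edit", null);
        if (edit == IntPtr.Zero)
        {
            Console.WriteLine("No child window of class \"Edit\" was found.");
            return;
        }

        Console.WriteLine(ReadText(edit));
    }
}
```

Sending `WM_GETTEXT` rather than calling `GetWindowText` is a deliberate choice in this sketch: `GetWindowText` is documented as unable to retrieve the text of a control in another application, whereas the message itself can be sent to another application's window.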
So the next time you hear someone mention screen scraping, think twice before you chuckle. Screen scraping is about writing software with the sensory (visual) and motive (manipulating the mouse and keyboard) capabilities of a human. It's not always (or even usually) appropriate, but like that most famous of martial arts techniques...

...when properly applied, it's extremely powerful.
Posted by James Devlin
