Online poker, meet optical character recognition. Optical character recognition, meet online poker. I'm sure the two of you will get along just fine.

Optical character recognition, or OCR, is one of those cool technologies that occupy a boring space. Document scanning is nifty and all, very useful from an office productivity standpoint, but it's not sexy. But take that same OCR component and embed it as a cog in the gearworks of an intricate real-time online poker botting rig...
That's positively porn star.
Well, whatever. The point is, repurposing a plain vanilla office productivity app for the purposes of online poker degeneracy makes us happy. So let's see what happens when we feed a piece of online poker text...

...to a typical OCR engine such as the one included with recent versions of Microsoft Office: MODI, short for Microsoft Office Document Imaging.

Only let's do it in code, because that's the way we do things 'round here! MODI can be automated over COM and of all the bajillion ways of invoking a COM object from a given language, .NET's COM Interop is one of the cleanest from a usage standpoint. So we'll add a reference (speaking in Visual Studio terms here) to the MODI type library:

And we'll hack out a few lines of C# code to a) create a new MODI document from an image and b) perform the OCR.
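MODI.Document md = new MODI.Document();
md.Create(@"c:\somefolder\pokerstars_chat_text.tif");   // load the cropped TIFF
md.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);     // auto-orient and straighten the image
MODI.Image image = (MODI.Image)md.Images[0];             // first (and only) page
MessageBox.Show(image.Layout.Text);                      // the recognized text
md.Close(false);                                         // close without saving changes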
For this barebones proof of concept, you'll have to take a screenshot of the poker window, crop it down to the piece of text you want to OCR, store the result as a TIFF, and pass the name of that file into the Create method. I used this image:

Later, assuming you choose to use OCR in your poker bot or other real-time strategy assist, you'd want to automate this process:
1. Programmatically snapshot the poker table window.
2. Programmatically convert the resulting image to an in-memory TIFF.
3. Programmatically invoke the OCR.
4. Programmatically parse the returned text.
5. Go to Step 1.
A little messy, but these steps can be performed across multiple tables on commodity hardware without taxing the machine as only a small amount of text is being OCRed. If you currently do any sort of interval-based screen scraping or pixel testing, these steps can likely be factored into your existing input loop. And no, this method does not require that the poker table be visible at all times: it's possible to take a full-sized snapshot of a window which is currently minimized.
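The loop above can be sketched as a small skeleton. This is only one way to arrange it, not a definitive implementation: `TableTextPoller` and its members are names invented for this example, and the capture and OCR stages are passed in as delegates because the right screenshot routine and OCR engine depend on your setup.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical skeleton of the snapshot -> TIFF -> OCR -> parse loop.
public sealed class TableTextPoller
{
    private readonly Func<byte[]> captureTiff;      // snapshot + convert: window -> TIFF bytes
    private readonly Func<byte[], string> runOcr;   // OCR: TIFF bytes -> raw text

    public TableTextPoller(Func<byte[]> captureTiff, Func<byte[], string> runOcr)
    {
        this.captureTiff = captureTiff;
        this.runOcr = runOcr;
    }

    // One pass: capture, OCR, then split the raw text into trimmed, non-empty lines.
    public List<string> PollOnce()
    {
        string raw = runOcr(captureTiff());
        var lines = new List<string>();
        foreach (string line in raw.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string trimmed = line.Trim();
            if (trimmed.Length > 0) lines.Add(trimmed);
        }
        return lines;
    }

    // Repeat at a fixed interval until the caller's stop condition becomes true.
    public void Run(TimeSpan interval, Action<List<string>> onLines, Func<bool> shouldStop)
    {
        while (!shouldStop())
        {
            onLines(PollOnce());
            Thread.Sleep(interval);
        }
    }
}
```

Keeping the stages pluggable means you can swap MODI for another engine, or feed the loop canned images while testing, without touching the loop itself.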
But I digress.
Run against that image, the MODI snippet returns the following text:
Dealer: downgoesdown Folds
Dealer: xactr21 bids
Dealer: Warstar raises $8 to $12
Dealer: iceoholic bIds
Dealer: rdegs2l boids
Dealer: TheYeti calls $10
Comparing that to the original text, we can see that three errors have been introduced.
Dealer: downgoesdown Folds
Dealer: xactr21 folds
Dealer: Warstar raises $8 to $12
Dealer: iceoholic folds
Dealer: rdegs2l folds
Dealer: TheYeti calls $10
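If you plan to tune an OCR engine, it helps to turn that eyeball comparison into something repeatable. Here's a minimal sketch, assuming you keep a hand-typed reference transcript next to each test image; `OcrDiff` and `MismatchedLines` are names made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class OcrDiff
{
    // Returns the 1-based numbers of the lines where the OCR output and the
    // reference transcript differ. A missing line counts as a mismatch.
    public static List<int> MismatchedLines(string[] ocr, string[] expected)
    {
        var bad = new List<int>();
        int n = Math.Max(ocr.Length, expected.Length);
        for (int i = 0; i < n; i++)
        {
            string a = i < ocr.Length ? ocr[i] : null;
            string b = i < expected.Length ? expected[i] : null;
            // Ordinal comparison is case-sensitive, so "bIds" vs "folds" is caught.
            if (!string.Equals(a, b, StringComparison.Ordinal)) bad.Add(i + 1);
        }
        return bad;
    }
}
```

Fed the two six-line listings above, `MismatchedLines` returns 2, 4, and 5.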
MODI has mistaken the word "folds" on lines 2, 4, and 5 for "bids," "bIds," and "boids," respectively. But before you start complaining about how "Microsoft software is crap," consider that you've written a whopping total of six lines of code.
Imagine what you could do with:
- A full-fledged commercial or robust open source OCR solution...
- With suitable training and customization...
- Optimized for the text-display characteristics of online poker.
Ultimately, if you're serious about incorporating OCR into your poker bot or other automata, you won't use MODI at all. It's a nifty application, but for error-free OCR you'll want to take advantage of training, customization, and your complete knowledge of how text is displayed in whatever online poker application you're targeting.

In order to do that, leverage one of the many available open-source and/or commercial OCR packages. I've used tesseract-ocr, tessnet2, OCRopus, and GOCR at different times (not necessarily in an online poker botting capacity) and we haven't even touched the commercial OCR solutions, some of which are rock-solid.
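Whatever engine you pick, knowing the grammar of the text lets you repair many misreads after the fact. Here's a minimal sketch, not a definitive implementation. It assumes every chat line has the shape "Dealer: name verb [amounts]", that player names contain no spaces, and that the verb vocabulary is the short list hard-coded below; check all three against the client you're targeting. `DealerLineFixer` and its members are names invented for this example. The verb is snapped to the nearest legal one by edit distance, and the presence of a dollar amount narrows the candidates. That second part matters: "bids" is actually closer to "bets" than to "folds".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DealerLineFixer
{
    // Assumed vocabulary, split by whether an amount must follow the verb.
    static readonly string[] VerbsWithoutAmount = { "folds", "checks" };
    static readonly string[] VerbsWithAmount = { "calls", "bets", "raises" };

    // Assumed line shape: "Dealer: <name> <verb>[ <amounts>]".
    static readonly Regex Line = new Regex(@"^Dealer:\s+(\S+)\s+(\S+)(.*)$");

    // Classic Levenshtein distance (insert, delete, substitute each cost 1).
    public static int EditDistance(string a, string b)
    {
        int[,] d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;
        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                                   d[i - 1, j - 1] + cost);
            }
        }
        return d[a.Length, b.Length];
    }

    // Replaces the verb with the closest legal verb; lines that do not match
    // the assumed shape are returned unchanged. Verbs come back lowercased.
    public static string Fix(string line)
    {
        Match m = Line.Match(line);
        if (!m.Success) return line;

        string name = m.Groups[1].Value;
        string verb = m.Groups[2].Value.ToLowerInvariant();
        string rest = m.Groups[3].Value;

        string[] candidates = rest.Contains("$") ? VerbsWithAmount : VerbsWithoutAmount;
        string best = candidates.OrderBy(c => EditDistance(verb, c)).First();
        return "Dealer: " + name + " " + best + rest;
    }
}
```

Run over the six lines MODI returned, `Fix` turns "bids," "bIds," and "boids" back into "folds" and leaves the raise and the call alone. It also lowercases "Folds" on the first line, since it normalizes every verb. A nearest-match rule like this can just as easily "correct" a line into the wrong action when the OCR output is badly mangled, so treat it as a complement to a better-trained engine, not a replacement for one.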
I'm not recommending that you actually use OCR. That depends on your specific situation:
- Can you extract the text via conventional methods such as WM_GETTEXT?
- Can you extract the text by detouring the text-output APIs used by the poker client?
- Can you extract the text by snooping around in process memory?
- Can you extract table state by other methods?
But I am saying that OCR is feasible and it does offer certain advantages over mechanical techniques:
- OCR doesn't break with every online poker client update.
- OCR is non-invasive, requiring no injection of DLLs.
- OCR is platform-agnostic; all it needs is a source image to work from.
- OCR is effectively immune to countermeasures (when it comes to online poker).
The online poker user interface is really a sort of implicit CAPTCHA unto itself: easy for humans to solve, not so easy for computers. Over time this CAPTCHA has gotten progressively more sophisticated as poker sites have started employing passive countermeasures aimed at making it more difficult for external tools to interrogate the poker client application for game-related text data.
There are ways around these countermeasures, but maybe you don't want to get involved in a war of escalation. Maybe you shun that sort of bickering. Maybe you grok that one thing you can always count on is the visual interface. Or maybe, like me, you enjoy corrupting innocent office productivity applications just to be perverse.
If that's the case, give OCR a chance. You might not regret it.
UPDATE:
Those interested in the OCR approach should take a look at this fascinating 9-page discussion of real-world OCR in a poker botting context, including the pros/cons of OCR training vs. pixel testing vs. NN-based recognition at PokerAI.org. (You DO read PokerAI.org, don't you?)
Posted by James Devlin