OCR = 0nline P0ken 0pticaL Chanacter Recogrition

Friday, May 29, 2009

Online poker, meet optical character recognition. Optical character recognition, meet online poker. I'm sure the two of you will get along just fine.

Optical character recognition or OCR is one of those cool technologies which occupies a boring space. Document scanning is nifty and all, very useful from an office productivity standpoint, but it's not sexy. But take that same OCR component and embed it as a cog in the gearworks of an intricate real-time online poker botting rig...

That's positively porn star.

Well, whatever. The point is, repurposing a plain vanilla office productivity app for the purposes of online poker degeneracy makes us happy. So let's see what happens when we feed a piece of online poker text...

...to a typical OCR engine such as the one included with recent versions of Microsoft Office: MODI, short for Microsoft Office Document Imaging.

Only let's do it in code, because that's the way we do things 'round here! MODI can be automated over COM and of all the bajillion ways of invoking a COM object from a given language, .NET's COM Interop is one of the cleanest from a usage standpoint. So we'll add a reference (speaking in Visual Studio terms here) to the MODI type library:

And we'll hack out a few lines of C# code to a) create a new MODI document from an image and b) perform the OCR.

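MODI.Document md = new MODI.Document();
md.Create("chat.tif"); // placeholder name for the cropped TIFF described below
md.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true); // orient and straighten, then recognize
MODI.Image image = (MODI.Image)md.Images[0]; // recognized text: image.Layout.Text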

For this barebones proof of concept, you'll have to take a screenshot of the poker window, crop it down to the piece of text you want to OCR, store the result as a TIFF, and pass the name of that file into the Create method. I used this image:

Later, assuming you choose to use OCR in your poker bot or other real-time strategy assist, you'd want to automate this process:

  1. Programmatically snapshot the poker table window.
  2. Programmatically convert the resulting image to an in-memory TIFF.
  3. Programmatically invoke the OCR.
  4. Programmatically parse the returned text.
  5. Go to Step 1.

A little messy, but these steps can be performed across multiple tables on commodity hardware without taxing the machine as only a small amount of text is being OCRed. If you currently do any sort of interval-based screen scraping or pixel testing, these steps can likely be factored into your existing input loop. And no, this method does not require that the poker table be visible at all times: it's possible to take a full-sized snapshot of a window which is currently minimized.
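Step 4 in the list above is the easiest one to sketch in isolation. Here's a minimal parser for the kind of dealer chat shown in this post. Everything in it is an assumption made for illustration: the DealerAction and DealerLineParser names are hypothetical, the line format is inferred from the sample text, and player names are assumed to contain no spaces.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical type for illustration: one parsed line of dealer chat.
public sealed class DealerAction
{
    public string Player { get; }
    public string Verb { get; }        // lower-cased action word, e.g. "folds"
    public decimal? Amount { get; }    // first dollar amount on the line, if any
    public decimal? Total { get; }     // the "to $N" amount on a raise, if any

    public DealerAction(string player, string verb, decimal? amount, decimal? total)
    {
        Player = player;
        Verb = verb;
        Amount = amount;
        Total = total;
    }
}

public static class DealerLineParser
{
    // Assumed format: Dealer: <player> <verb> [$amount] [to $total]
    private static readonly Regex Line = new Regex(
        @"^Dealer:\s+(?<player>\S+)\s+(?<verb>[A-Za-z]+)" +
        @"(?:\s+\$(?<amount>\d+(?:\.\d+)?))?" +
        @"(?:\s+to\s+\$(?<total>\d+(?:\.\d+)?))?\s*$",
        RegexOptions.Compiled);

    // Returns null when the line does not match the assumed format.
    public static DealerAction Parse(string line)
    {
        Match m = Line.Match(line);
        if (!m.Success) return null;

        return new DealerAction(
            m.Groups["player"].Value,
            m.Groups["verb"].Value.ToLowerInvariant(),
            ToMoney(m.Groups["amount"]),
            ToMoney(m.Groups["total"]));
    }

    private static decimal? ToMoney(Group g)
    {
        return g.Success
            ? decimal.Parse(g.Value, CultureInfo.InvariantCulture)
            : (decimal?)null;
    }
}
```

Calling DealerLineParser.Parse on each line of the OCR output gives you a player, a verb, and any dollar amounts. Lines that don't fit the pattern come back as null, which is a reasonable cue to rescan instead of acting on garbage. Amounts with thousands separators are not handled in this sketch.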

But I digress.

After running the above code, MODI will return the following text.

Dealer: downgoesdown Folds
Dealer: xactr21 bids
Dealer: Warstar raises $8 to $12
Dealer: iceoholic bIds
Dealer: rdegs2l boids
Dealer: TheYeti calls $10 

Comparing that to the original text, we can see that three errors have been introduced.

Dealer: downgoesdown Folds
Dealer: xactr21 folds
Dealer: Warstar raises $8 to $12
Dealer: iceoholic folds
Dealer: rdegs2l folds
Dealer: TheYeti calls $10

MODI has mistaken the word "folds" on lines 2, 4, and 5 for "bids," "bIds," and "boids," respectively. But before you start complaining about how "Microsoft software is crap," consider that you've written a whopping total of six lines of code.

Imagine what you could do with:

  • A full-fledged commercial or robust open source OCR solution...
  • With suitable training and customization...
  • Optimized for the text-display characteristics of online poker.

Ultimately, if you're serious about incorporating OCR into your poker bot or other automata, you won't use MODI at all. It's a nifty application, but for error-free OCR you'll want to take advantage of training, customization, and your complete knowledge of how text is displayed in whatever online poker application you're targeting.

To do that, use one of the many available open-source or commercial OCR packages. I've used tesseract-ocr, tessnet2, OCRopus, and GOCR at different times (not necessarily in an online poker botting capacity). And we haven't even touched the commercial OCR solutions, some of which are rock-solid.
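Whichever engine you pick, knowing what the text is allowed to say goes a long way. As a rough sketch (not something from the code above), here's one way to snap a garbled action word back onto a known vocabulary using plain Levenshtein edit distance. The ActionWordFixer name and both word lists are assumptions for illustration. The word lists are split by whether the chat line carries a dollar amount, because edit distance alone would turn "bids" into "bets" (distance 2) rather than "folds" (distance 3).

```csharp
using System;

public static class ActionWordFixer
{
    // Assumed vocabulary, split by whether the line includes a dollar amount.
    private static readonly string[] WithoutAmount = { "folds", "checks" };
    private static readonly string[] WithAmount = { "calls", "bets", "raises" };

    // Returns the vocabulary word closest to the OCRed word.
    public static string Snap(string ocrWord, bool lineHasAmount)
    {
        string[] candidates = lineHasAmount ? WithAmount : WithoutAmount;
        string word = ocrWord.ToLowerInvariant();

        string best = candidates[0];
        int bestDistance = int.MaxValue;
        foreach (string candidate in candidates)
        {
            int distance = Levenshtein(word, candidate);
            if (distance < bestDistance)
            {
                bestDistance = distance;
                best = candidate;
            }
        }
        return best;
    }

    // Classic two-row dynamic-programming edit distance.
    public static int Levenshtein(string a, string b)
    {
        int[] prev = new int[b.Length + 1];
        int[] curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(curr[j - 1] + 1, prev[j] + 1),
                    prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.Length];
    }
}
```

With the no-amount word list, all three of MODI's misreads in this post ("bids," "bIds," and "boids") land on "folds." A real bot would probably also want a distance cutoff so that truly unrecognizable words get rejected instead of being forced onto the nearest match.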

I'm not recommending that you actually use OCR. That depends on your specific situation.

But I am saying that OCR is feasible and it does offer certain advantages over mechanical techniques:

  • OCR doesn't break with every online poker client update.
  • OCR is non-invasive, requiring no injection of DLLs.
  • OCR is platform-agnostic; all it needs is a source image to work from.
  • OCR is effectively immune to countermeasures (when it comes to online poker).

The online poker user interface is really a sort of implicit CAPTCHA unto itself: easy for humans to solve, not so easy for computers. Over time this CAPTCHA has gotten progressively more sophisticated as poker sites have started employing passive countermeasures aimed at making it more difficult for external tools to interrogate the poker client application for game-related text data.

There are ways around these countermeasures, but maybe you don't want to get involved in a war of escalation. Maybe you shun that sort of bickering. Maybe you grok that one thing you can always count on is the visual interface. Or maybe, like me, you enjoy corrupting innocent office productivity applications just to be perverse.

If that's the case, give OCR a chance. You might not regret it.


Those interested in the OCR approach should take a look at this fascinating 9-page discussion of real-world OCR in a poker botting context, including the pros/cons of OCR training vs. pixel testing vs. NN-based recognition at PokerAI.org. (You DO read PokerAI.org, don't you?)

Tags: MODI, OCR, poker bot, online poker, poker

39 comment(s)

I've used tesseract before, although not for anything poker-related, but I have wondered whether typical OCR would be up to the online poker challenge. The small x-size of on-screen fonts taxes a lot of OCR apps which expect to be going off a fairly high DPI image. At least, that's what I've gleaned from the theory I've read.

Still interesting to see it actually done, even just in proof-of-proof-of-concept form. Thanks for the food for thought.

There appear to be 4 errors. It has incorrectly read rdegs21 as rdegs2l (replacing the 1 with a lowercase L). But then that was not your point...

"Marblecake also the game."

Lol. Awesome.

One advantage of OCR is that you can run poker in a virtual machine, with literally zero detectability (other than that it may know it's in a VM).

One wonders whether the online poker sites should just offer a formal API with per-developer license keys. Legit tools could go directly through the API. Poker bots would be left out in the cold but at least there'd be a formal mechanism for the 'real-time strategy assists' as you call them, that are not blocked by EULA.

Just an idea...

I have Microsoft 2003 and 2007 but for some reason, Microsoft Document Imaging isn't installed. Any ideas? I'd rather not go through getting tessnet up and running. Looks pretty hairy.

Some discussion on the OCR topic: http://www.pokerai.org/pf3/viewtopic.php?f=79&t=1682

Another bad mistake is the name xfactr21, which is read as xactr21, without the f.

I used to use a DIY OCR via pixel reading, but reading fonts is a major pain in the ass for a lot of sites due to fuzzy fonts. Someone once suggested that DirectX could be manipulated to disable all the fancy stuff but I never did find out how. Perhaps someone out there knows...

And I'm glad that Linux (and Mac) was mentioned because not everyone uses Windows. The downside is that the list of poker clients is very limited, although web browser clients are becoming more common now. I have managed to get the Entraction (Java Applet) clients to run as a native app on my Ubuntu Linux box with a simple launch script. The main reason for doing that is to enable the saving of Hand History files (which an Applet can't do due to security restrictions). The script and source code can be downloaded from my website: http://bespokebots.com/linux.php.

But James I think your point was correct, and it seems indicated in the pokerai discussion as well, that if you train a robust OCR to a specific font, you can get a 100% success rate. The question is whether that's less work than doing a pixel-test method.

I'm going to have to call bullshit on this one. Nobody is actually going to use some godforsaken "pixel-test" method to read text. If they can't get the text directly from whatever window displays it, then it's easy: they can't run a bot.


Remember, the people who build poker bots aren't qualified software developers. They're Internet script-kiddies trying to get rich quick.


I only see one pile of bullshit here:

Guido on 6/1/2009 4:16:12 AM wrote: "I'm going to have to call bullshit on this one. Nobody is actually going to use some godforsaken "pixel-test" method to read text. If they can't get the text directly from whatever window displays it, then it's easy: they can't run a bot. Remember, the people who build poker bots aren't qualified software developers. They're Internet script-kiddies trying to get rich quick."

Although screen scraping is a last resort tactic, it DOES work. Most of the commercial odds calculators and live advisors use screen scraping methods. Even well known names like Winholdem, OpenHoldem and Calculatem use screen scraping. Most so-called "script kiddies" are able to create working bots. If they are as unprofessional as you claim, then a flood of highly predictable bots can only be good for the rest of us...

Has anybody published source code for a simple pixel-testing method? I don't want to OCR, I just want to look at the individual pixels. I've seen this mentioned various places but there's usually not source code to go with it.

Any ideas?

You mentioned getting a snapshot of a minimised window. Got a link on how that's done? Currently I'm using SwitchToThisWindow and waiting for it to come to the front, which takes an unacceptable amount of time. Thanks.

There's an article on capturing a minimized window here: http://www.codeproject.com/KB/cs/CapturingMinimizedWindow.aspx?display=Print I don't know how good/fast it is though, haven't needed to try it.

Anonymous: If you just want a simple pixel-testing method, you could just use a 32-bit CRC checksum of the pixel, window square, etc., that you wish to test for. You would have to map each card to its own precomputed CRC checksum value. This is not the most robust way of doing it because a CRC will not tolerate a single bit change in the pixels that you check, i.e. it is an exact testing method. If only one color is changed in the pixel window you are testing (compared to the image you computed the checksum from in the first place), you will most likely end up with a completely different checksum value.

Check google for the CRC checksum algorithm, there should be source samples out there...

stalewee, there are many other ways which you can do that.

Have a look at experts-exchange.com

Frode - errr. or you could just get a DC, and use GetPixel.

I test a single pixel to get card rank (2 color tests, one each for black and red), and one for card suit (again one color test for black and red), on AP. But that's probably just the way the anti-aliasing worked out in my favor.

Slightly off topic, since this is an 0ptical chanacter recogrition post. Fortunately for me AP does run drawtext to put text to screen, and the chatbox is pretty much a standard RTF text box. AP isn't gungho-psycho about demolishing third party software development - which is why I won't touch PS. If they change to something more complicated/tricky, me and my little poker calculator are probably shit-outta-luck T_T

Fascinating stuff James.

@ Guido. I think your response is a fairly poorly developed one. Creating bots for anything is highly technical and that goes for poker bots too. It doesn't just involve a few spotty kids trying to earn some money, there's some really great computer science involved here.

Ultimately you could stick a camera in front of the screen and interpret the video image, then generate mouse movements and key presses electronically. I remember seeing a project where a guy had a Tetris-playing bot using these techniques. Let them try and detect that...

Great articles, James! I have a question for you. Did you implement recognition of the board cards on PokerStars? I made a simple recognizer by capturing and recognizing card images from the window, and it works like shit.

aw - I meant calculating the CRC of all the pixels in a fixed-size window area where the cards are displayed (only the upper left corner that displays As, 2h and so on). I don't think I was making myself very clear on that point. I do agree, however, that this is a very naive approach, but if the client doesn't change the card bitmaps, it should work.

Hey, what is the name of the OCR program used in the first picture? And is it detectable?

Nice article. I'll have to look into it. I currently capture the whole screen and hash screen regions against stored images to get the current table state.

In my opinion there is a problem with using OCR: on each new scan, how do you know whether a chat line was already there or not?

True line: Dealer: Johnny34 folds

First OCR: Dealer: Jobnny34 folds
Second OCR: Dealer: Johnni34 folds

If my code compares the two rows, they are different, so I register a fold that never happened in real play.

I have developed the same thing, not using Windows OCR but a faster way: just comparing each letter to its match (all fields are the same font). If anyone is interested, mail me at semmimuzespsat(at)email.cz. I have C# code and it could work in real time with added evaluations and statistics.

Dude you are awesome. I'm a serious HNL deep stack player, an enthusiastic PT3 user, addicted to Excel, and love programming. I have read several of your articles and liked them all, totally get you, keep up the hard work.

Here is a good article which compares the various OCR software: http://ocrworld.com/software/5-in-depth/149-top-ocr-software.html

The ultimate bot would be one that clones the poker client and sends/receives data duplicating the information being sent and received by the client.


PokerStars chat reader recognizes chat and prints it to standard output


Coding the Wheel has appeared on the New York Times' Freakonomics blog, Jeff Atwood's Coding Horror, and the front page of Reddit, Slashdot, and Digg.
