Stranger than Fiction: The Story of PokerTracker, Online Poker, and PostgreSQL

Wednesday, July 08, 2009

I wonder if the PostgreSQL development team had any idea.

After all, when you go about the incredibly difficult task of building (that is, coding) a relational database, you don't think: man, this thing will one day achieve mass popularity as the storage repository of choice for tens of thousands of real-money online poker players.

Do you?

You don't wonder how long it will be until a group of people in Austin, Texas, in a dilapidated basement not far from 6th Street... 

...subvert your open-source database as a cog in the gearworks of an intricate online poker botting rig; nor how many other people will do the same thing in Dallas, and New York, and L.A., and London, and Paris, and Leipzig, and Hyderabad, and Rio, those elusive poker botting hatcheries springing up out of nowhere and from nothing, like underground massage parlors or Northern Cali marijuana grow-houses or seedy all-night poker clubs conducting business in smoke-filled back-offices in nondescript greater metropolitan area office parks. You wouldn't consider or even be aware of the thousands upon thousands of PokerTracker and Hold'em Manager users who'll flock to your database, learning its ins and outs, getting their hands dirty, occasionally even parlaying an end-user poker player's knowledge of a third-party online poker tool into a job as a full-time database administrator. And you'd never (in your hypothetical shoes as a PostgreSQL contributor) be able to predict that certain dedicated poker players would take it upon themselves to learn SQL and relational database theory/practice, delving into the arcane (to a normal person) minutiae of keys and joins and tables and DDL and DML, all in order to better use a third-party online poker tool originally written by a single developer with an MS Access database; now thankfully running on a PostgreSQL back-end.

These things happen, but they can't be predicted.

And I guess that the application of open-source technology to a game as viscerally competitive as poker is not without irony. Software-assisted poker isn't the noblest use of open-source technology that can be imagined...but it is one of the most interesting. The evolution of online poker from its naive roots into the tooled-up search-and-destroy seven-headed acid-spitting monster of today is fascinating, at times profound, colorful, even crazy. Degenerate. But always interesting...

And it all started with those pesky goddamn text files.

Since the dawn of online poker (more or less), players have had the ability to request hand histories—textual, step-by-step descriptions of each and every hand of poker played by the player while playing. After all, it's poker over the Internet for real money; some sort of formal, human-readable game record is essential. And early on, on some sites at least, the way you got these hand histories was over email, usually via sort of automated "Request Hand Histories" window. You'd go in and say "hey, give me my last 100 hands," and you'd get an email containing the textual history of those hands.

Bada bing.

And that was fine, for a while. It meant you could do post-mortem analysis, to a certain extent. If you played an interesting hand or won or lost big in a killer session, you could request the hand history and sanity-check your play or maybe email it to a friend; whatever; you had the record. But the thing which had earth-shaking ramifications for poker as a whole was that textual hand histories enabled the crowdsourcing of poker strategy. (Also, the Internet may have had something to do with it.)

You know what I mean by crowdsourcing. I'm talking about the Online Poker Tactical RFC. Here's how it works:

  1. Take a textual hand history.
  2. Cut-and-paste it into a post in a poker forum.
  3. Ask for advice.

For the first time in history, players could get tactical and strategic feedback from dedicated players around the world, and not just generic advice, but specific advice pertaining to the exact play of a specific hand or poker scenario, taking into account all the factors: stack sizes, position, exact bet amounts, and so forth. And as you know if you've spent more than two seconds in a poker forum like CardsChat, this trend is alive and well to this day. (Sadly, the original home of the Tactical RFC, rec.gambling.poker, is now an empty lot littered with beer cans and cigarette butts. Spam City.)

Anyway, the idea of the hand history as a formal textual record of a single hand of poker gained consensus early on.

And not too much time passed before certain enthusiasts got the idea of subjecting these hand histories to computerized analysis. After all, it doesn't take a genius to figure out that rigorous number-crunching of a player's complete set of hand histories might yield useful and/or interesting data. And although hand history formats differed (and still differ) from site to site, within a site they were fairly uniform, which in turn meant that it wouldn't be too difficult to build a hand-history analyzer: a tool which a) reads in a bunch of hand histories and b) extracts meaningful data from them.

The idea was kicked around on rec.gambling.poker and other forums for a spell, and the early poker/programming pioneers descended into their basements and concocted little one-off pieces of software which could parse this or that kind of hand history from such-and-such an online poker site. I myself wrote (or started writing) no less than three of these. But no single tool achieved real mindshare until PokerStat was released in 2001. The original rec.gambling.poker thread announcing its release is like taking a glimpse into another world:

Hi all,

I've been spending the past several months writing a Paradise Poker hand analysis software for Windows. It will have:

    1. you and other's starting cards play
    2. you and other's statistics like win rate in terms of money and big bets
    3. ability to leave player comments and notes
    4. you and other's play in the blinds (improve your EV when stealing blinds!)
    5. how you and your opponents play when raising preflop

And to be added...

    6. tourney support (results plus hand histories)
    7. filter to only view blinds play, exclude blinds play, view by position, etc...
    8. list of players, see who the top players are and the worst players are in your database
    9. other things to be added before the final release (any ideas?)

Ah, the innocence.

Now it's important to understand that as of 2001, hand history acquisition was still treated as something a player might occasionally want to do. The idea was that if you played an interesting hand or session, or if you had some question about how the money had been awarded in a certain pot, you could request the hand history. It was a manual, email-centric process and that was okay, because what would you do with a mountain of 250,000 hand histories anyway? Write a program to sort through them all? Ridiculous. And so the online poker sites provided no mechanism for generating or storing hand histories locally, on the user's machine, and they provided no mechanism for automatically requesting hand histories from the poker site.

PokerStat didn't change any of this. What PokerStat did do was:

  • Popularize the notion of the hand-history analyzer as a valuable standalone tool.
  • Invent the concept of automated hand history acquisition (see below)
  • Inspire other developers to build bigger-and-better hand history analyzers.

In other words, PokerStat paved the way for PokerTracker. The PokerTracker.com domain name was registered on 19 March 2002, roughly one year after the release of PokerStat. Shortly after that, the community took up the PokerTracker vs. PokerStat debate, as in the following rec.gambling.poker missive from 2003:

Hello,

I am a registered user of PokerStat.  I haven't used it for about three months, and have some questions about it and the new software called PokerTracker.

PokerStat Questions:

1. Is there an update coming that will support Party and Stars?
2. If so, will it automatically capture and process histories from these sites like it does with Paradise (I love this feature)?
3. If #2 is coming, will this new version allow playing on tables from different sites simultaneously?
4. If #1 is true, will it allow different names on different sites (my Paradise login name was already in use when I signed up for Party and Stars)?
5 Am I a "lifetime" member now that I have version 2?  I paid for the software initially, then I had to pay again to upgrade if I'm not mistaken.

PokerTracker Questions:

1. Does it have an automated feature to request hand histories?
2. Does it have an automated feature to log in to a POP mail server (like PokerStat)?
3. Does it track multi-table tournament results on Stars?
4. If the answer to any of the above is "no", are these features planned for future versions?

Questions about both:

1. Do these packages track session statistics?
2. If so, how well do they handle multiple tables at a time?
3. How do they handle hourly win/loss with multiple tables?

Finally, what are your overall impressions of these packages?  How useful are they compared to, say, tracking sessions on a spreadsheet?

Thanks for any info.

Phil 

Again, at this point in time you still had to request hand histories manually from the poker site. But notice two of Phil's questions:

1. Does it have an automated feature to request hand histories?
2. Does it have an automated feature to log in to a POP mail server (like PokerStat)?

PokerStat was the first online poker tool to take a stab at automated hand history acquisition. You gave it your incoming mail server address along with a username and a password, and PokerStat would go out every so often and fetch/analyze your hand history emails. This functionality was duplicated in PokerTracker v1, and still exists today, six years and two major version releases later:

The appearance of this sort of feature in both of the major hand history analyzers of the day is indicative. What had happened is that players had started to realize that hand histories are a potential goldmine of information. It suddenly became important to request ALL of your hand histories, boring hands and interesting hands alike. And as you can imagine, doing this manually (by requesting hand histories, fiddling with email, etc.) over the course of an intensive 8- or 12-hour poker session was both tedious and error-prone. So in response to that, first PokerStat and then PokerTracker incorporated Import from Email and Auto-Request Histories functionality, and for a while this was a big selling point.

Now this idea of automated hand history acquisition and analysis is powerful; even revolutionary, at least within the world of poker.

It meant that hand histories were no longer strictly a subject for post-mortems. If you could automate the process of delivering and analyzing those hand histories, you could run your hand history analyzer on hands as you played them, in punctuated real time. If you could then complete the feedback loop by providing the player with real-time information on the opponents he's facing at the table right now, you'd have a full-fledged online poker assistant, an intelligent agent whose job is to:

  1. Parse incoming hand histories
  2. Analyze the resulting data
  3. Communicate it to the player

This would ultimately lead to the development of the next piece of the online poker equation, the HUD (heads up display), first in primitive versions such as PokerStat's "manual HUD", and later with full-fledged HUDs such as PokerAce HUD (now integrated with PokerTracker) capable of delivering customizable visual information in all the colors of the rainbow. But that's another story.

The thing to take away from all this is that the hand history analyzers created a demand for instantaneous hand histories.

More and more players started requesting more and more hand histories, and more and more programmers built tools to help those players automate the process of requesting those histories (giving rise to an entirely new species of software, the standalone hand history grabber), and in general the number of histories being requested started trending sharply upward, and I think at some point the online poker companies realized that email is a clunky mechanism for hand history delivery. High-volume email consumes server resources, gobbles network bandwidth, and suffers from high latency. It's difficult to secure. It's messy. And there's absolutely no excuse for it, since by definition:

Every last piece of information necessary to construct a complete poker hand history is present locally, in the poker client running on the user's machine.

In other words, people in a position to do something about it realized that hand histories could and should be generated locally rather than transmitted remotely. It's better from the user's perspective, because he gets instant access to his hand histories; and it's better for the site, because all that expensive hand history processing logic is offloaded onto the client. Sort of a classic client/server scenario, really, and one wonders why the poker sites didn't implement it this way from the beginning. Whatever the reason, they eventually came around, and one by one, the major poker venues started generating hand histories locally. (There are still many sites today that do not, however.)

And it was good.

PokerTracker went on to win the First War of the Hand Analyzers, of course, and within a couple years it would corner the hand history analysis market (at least until the arrival of Hold'em Manager). PokerStat continued to enjoy (and still enjoys) a small following, and holds the honor of being the first widely-available hand history analyzer; but PokerTracker was the first hand history analyzer to achieve mass popularity, or what counts for mass popularity in the world of poker. I haven't been able to find reliable figures for how many copies PokerTracker has sold, but based on its sheer ubiquity, I believe it to be the most profitable piece of third-party poker software built to date, and one of the top three or four most interesting examples of indie development success I've ever come across. So Jeff Vogel, eat your heart out.

But to get back to PostgreSQL.

Pretty much every hand history analyzer that's ever been built can be described as a graphical user interface sitting on top of a database.

That database might be a proprietary or embedded data store (as in PokerStat); it might be a simple Microsoft Access database (as in early versions of PokerTracker); it might be a full-fledged PostgreSQL instance (as in Hold'em Manager and v2/v3 of PokerTracker). But there will always be a database of some kind. The mechanism works like this:

  1. Input. The user imports his textual hand histories.
  2. Processing. The analyzer parses each history and stores a compacted representation in the database along with various other computed statistics.
  3. Output. Later on, the analyzer executes SQL against the database and displays the results to the user.

Once the hand history files have been imported and converted to database format (a process which usually involves taking the data for each hand and splintering it out across multiple tables, a process the programmers and DBAs out there will know as normalizing) the hand history analyzer never touches them again. It does all its work (statistical analysis and reporting and the displaying of charts and tables and graphs) from data stored directly in the database.

You might wonder why this sort of two-stage processing is necessary. Why even use a database at all? Why not just analyze the hand history text files directly anytime the user wants to see some data? (After all, the hand history contains all the information. Like a can of Prego spaghetti sauce, "it's in there".) And the answer of course is that:

  • Text files are opaque (they can't be easily queried/manipulated)
  • Parsing text is time-consuming
  • Text files live on a physical hard drive
  • Disk I/O is time-consuming

In other words, as various hobbyists discovered early on, parsing raw text files takes a lot of time. An active player can easily chalk up hundreds of thousands or even millions of hands over the course of his career, which means that somewhere on his machine there will be a folder containing multiple gigabytes of hand histories: endless streams and rivers and oceans of text (a lot like this article, come to think of it). And even though a robust parser can burn through hundreds of hand histories a second, it can still take hours to import a large collection. When it comes to online poker hand history analysis, you could say that everything is slow for large N.

And it means you have to have a storage bin to persist extracted history data in a durable format that allows it to be queried. There are several ways to handle this:

  • You could treat all hand history files as a single flat-file database.
  • You could build your own proprietary data store.
  • You could use an embedded relational database, invisible to the user.
  • You could use a full-fledged relational database, visible to the user.

Well no, I'm sorry. Pet peeve. There's exactly one way to handle this. Hand history data cries out for an open relational database with a documented schema running a standard SQL dialect and is extremely pissed off when it doesn't get it. You have to store everything in an well-documented standard format so that:

  • Players can query and manipulate the database directly
  • Players can (if necessary) administer/maintain very large databases
  • Third-party toolmakers can build add-ons that work with your database
  • You have a mechanism for the creation of custom statistics (custom SQL or SQL-like expressions)

Otherwise your data will kill you. No, I'm not saying give the user the option of using an RDBMS. I'm saying hog-tie the user and force him to use an RDBMS. It's what the user wants, even if he doesn't know it. Consider the following random forum post published two days before Christmas, 2003:

Also, does anyone know what PokerTracker uses as a back-end to store the data. I'm a database guy and was disappointed that I couldn't write my own queries against the PokerStat database because it is not just a standard Access (or other db type) database.

Lastly, I guess I'm wondering if anyone has a stat tracker that is just in a regular db (I'd take Oracle or even DB2 if that's all I had access to...).

Thanks
dave

And it's fascinating from a software development standpoint, because as software developers we've been instilled with a healthy fear of exposing databases to the end-user. I can hear it now.

Product Manager: So wait, let me get this straight. You want us to force our users, who are some of the laziest people on the planet, to install a full-fledged relational database management system??? On their home computer??? Like what, they're going to become DBAs? And you're calling that a feature? Well, why stop there? Why not just ship them our source code directly and force them to compile it on the COMMAND LINE? Every user is a programmer, right? Well? ARE YOU OUT OF YOUR F--KING MIND??

But online poker isn't a typical industry, and online poker players are anything but typical users. And apparently the PokerTracker people (well, at that time I think it was more of a single-man operation) saw the writing on the wall, because when PokerTracker was released, it did, in fact, ship with a relational database.

The only problem? That database was Microsoft Access.

Now the programmers out there will be rolling their eyes. I don't want to rant about how terrible a database Microsoft Access is, because it's not fair to make those kinds of value judgments when it was never intended to be a robust production database. Another mitigating factor is this: when PokerTracker was released, large hand history databases were a rarity. The size limitation of Microsoft Access and its choppy performance under heavy load didn't seem quite the deal-breakers then that they would today, now that multiple-hundreds-of-thousands-of-hands databases are commonplace.

Nevertheless, as PokerTracker started to gain market share, that innocuous Microsoft Access database started causing problems:

  • Its size limitation meant that large hand history collections had to be broken across multiple databases.
  • Performance degraded as the database grew in size
  • There were data durability/corruption issues

It got to the point to where you could almost hear the Microsoft Access back-end JET engine groaning under the strain. And before too long, the thing which had set PokerTracker apart from PokerStat—its open database—became it's single biggest flaw (the unresponsive single-threaded PokerTracker v1/v2 UI being a close second). Armchair programmers around the world started to make comments like: I could build a better PokerTrackerI'm a database professional and if I were to design PokerTracker I'd do X, Y, and Z. I can't BELIEVE they used Microsoft Access. And so on.

And then one day PokerTracker version 2.0 was released.

Poker Tracker comes ready to use with a Microsoft Access database that stores the statistics and text of your hand histories. For most users, the Access database is sufficient and works just fine. However, for more advanced users and users that want to store a huge amount of hands, a more robust database option has been a requested for a long time. Enter PostgreSQL...

Finally, at long last, PostgreSQL joined the fray. 

Now my belief is that the nature of the hand-history-text-file-analysis problem meant that PostgreSQL was destined to enter the picture sooner or later. Consider:

  1. You can't use a commercial database like Oracle or SQL Server because commercial database licenses cost money.
  2. You can't use SQL Server Express, MSDE, or other "light" versions of commercial databases. First, they're hobbled in terms of performance and capacity. Second, your power users would still have capacity problems. Third, "light" versions of commercial databases often have tricky licensing issues. Fourth, light versions of commercial database are almost always structured to upsell you into a commercial database. Fifth, the company behind the database can discontinue it at any time.
  3. You can't use any database which doesn't have a large developer and user community.
  4. You can't use a database which hasn't been field-tested across multiple versions. (I.e., no betas or avant-garde databases.)

There are only two databases that I know of which meet the above requirements. They are:

  • PostgreSQL
  • MySQL

Both of these databases are robust enough out of the box to handle copious hand history data. MySQL has the larger user base and development community. But when you consider MySQL's web-centric history, its lack of transactional and foreign key support, and its relaxed attitude with regard to data integrity, PostgreSQL emerges as the clear winner—at least when it comes to the task of storing and retrieving hand history data in a Windows desktop environment. In fact, let's just go ahead and say it:

PostgreSQL is the ideal database for hand history storage.

And as I was saying, PokerTracker 2 was a hybrid. It had support for Microsoft Access, and it had support for PostgreSQL. And for a long time PokerTracker was the only game in town, the only product that was functional enough and had enough of a user base to allow for the creation of lots of supporting "how to use PokerTracker" content in the form of forum threads, linkbait, blog posts, e-books, and so forth. And more and more players started jumping on the PostgreSQL bandwagon as by this point the problems with MS Access were pretty well known throughout the community. It was a sort of you mean you're still using an Access database? Come on man, you gotta go with Postgre. Get with the program! type thing.

Time passed...

And then in 2007, a guy by the name of Roy Goncalves released the beta for Hold'em Manager, the first serious challenger to the PokerStars throne.

It's funny how history repeats itself. In 2002-2003, PokerTracker was introduced as a "better" PokerStat. Five years later, Hold'em Manager was introduced as a "better" PokerTracker. This sort of technological leap-frogging is great for innovation, and most of the innovations Hold'em Manager brought to the table were intended to address some of the unpopular quirks of PokerTracker v2:

  • The single-threaded user interface which froze whenever hand histories were being imported
  • The lack of an integrated HUD (PokerTracker v2 HUDs were add-ons)
  • Various layout and display issues
  • Various database/schema issues

Now I don't want to get into the subject of whether PokerTracker was/is better than Hold'em Manager or vice-versa. I just want to point out that Hold'em Manager followed PokerTracker's lead and used PostgreSQL natively from day one. Around the same time (2007), the PokerTracker team started gathering feedback for the long-awaited (some would say overdue) PokerTracker v3.0; which is the current version of the product as of this writing. And whereas PokerTracker 2 had given the user a choice, PokerTracker 3 drew a line in the sand:

Attention all users: PostgreSQL is mandatory!

Which meant that the two most popular tools in online poker—a market famous for shoddy software and grumpy, stubborn users—have both standardized on an open-source database for the prosecution of real-money, real-time online poker intelligence, or whatever you call the blend of services that PokerTracker and Hold'em Manager provide. Which means that all that robust open-source code running in the PostgreSQL transactional engine? It's running on behalf of a competitive poker player named Joe Bob living in Utah. And twenty thousand more like him. Those JOINS are helping poker players improve their game.

As if that wasn't random enough: Joe Bob the poker player is actually learning SQL.

With the advent of custom statistics in the PokerTracker 3 (and to a degree, custom reports in Hold'em Manager), with the publishing of the PokerTracker 3 database schema and statistical reference, and with the increasing sophistication of PokerTracker/Hold'em Manager setups and custom HUDs and statistical arrangements in general, the utility of SQL has gone way up for the serious player. You don't need to know a lick of SQL in order to use PokerTracker 3 and/or Hold'em Manager; you can point and click your way to some pretty advanced setups. But if you want to get the most out of PokerTracker... You Will Need To Know:

  • What a table is
  • What a column is
  • What a primary/foreign key is
  • What a JOIN is
  • Etc.

And if you plan on doing any serious custom work with PokerTracker then SQL is an absolute necessity.

All of this, all of the above, if you're still reading, all of it; it makes me realize what I've suspected for years without ever really vocalizing: the online poker industry is without a doubt the strangest application of computer science and software development technique to a particular domain that I've ever witnessed or come across...with the possible exception of EVE Online. Poker has always been a darling of the research crowd, from the days of Johnny von Neumann; but when you add the Internet and the prospect of real money to the equation, the game of poker tumbles down the rabbit hole of weirdness, never to emerge. I haven't even touched on:

  • How the widespread availability of hand histories coupled with a demand for large hand history databases produced a black market for illicit hand histories shared/pooled across many users
  • How that same demand gave rise to a new species of software called hand history data miners: tools (such as the cleverly-named IdleMiner) whose only purpose is to "watch" one or more tables of poker and squirt out a textual hand history.
  • How those hand history data miners enabled the creation of independent and/or unauthorized "player tracking" sites like Sharkscope, OPR, PTR, and the PokerDB.
  • How the poker botting crowd learned to use hand history databases to bootstrap basic strategy and opponent modelling.

But I hope I've made it clear that the online poker user experience has, to a large degree, been shaped by the need to generate, transmit, store, parse, convert, analyze, anonymize, visualize, and broadcast the humble text file; and that this has created a sort of gravity which slowly and over time has pulled users into the orbit of PostgreSQL, a segue which is random, bizarre, slightly perverse, and completely awesome. Hand histories are the pieces of the online poker jigsaw puzzle. We were just waiting for the right database to put that puzzle together...

Tags: PokerStat, Hold'em Manager, PokerTracker, online poker, poker

44 comment(s)

Wow.... what a walk down memory lane. Fascinating stuff; I hadn't thought about rec.gambling.poker in years.

I have to disagree with your statement that PostgreSQL is the "ideal" database for hand histories. I think what happened is the PokerTracker guys started using Postgre pretty much randomly, and nobody gave MySQL a chance. I'd be interested in seeing some benchmarks, maybe duplicating the PT3 schema in a MySQL database so we can run side-by-side queries and get some hard data.

Im working on a pokertracker 'clone' at the moment, finding a decent portable database was proving difficult.. In fact I am still searching.. SQL Server Express is looking like the top choice, regardless of your reasons against it. Performance is still top notch, its incredibly well supported. Only negative is the data storage cap at 4GB, i dont know yet how many hands that will hold, I'm not quite at the research point yet.

SQLite looks promising.

PostGreSQL - meh maybe your right and it is a good choice.

Haven't read your whole post yet, its long and im at work. looks like a good read though.

Rock on.

additional: I just dont want to clog a users computer up with full database servers unless its absolutely necessary. Thats why access was ideal imo, its just a shame its a bit pants. thats why i'm looking at maybe SQLite. Its tiny (its just a file). No need to force a full database down a users throat. its a near fully functional RDBMS. Fairly quick (though i havent tested it with larrrge loads yet). Free, open source, etc, blah blah.

Awesome post!!

I'm in the process of trying to decipher the PT3 schema, and your diagram (from a prev post) has helped hugely. The only thing I would have liked to have seen in this post is some actual code. I thought I was pretty familiar with the evolution of PT3 but you brought up some details I'd forgotten.

Also glad to see you're a Jeff Vogel reader.Guy is my hero.

Your attitude is one of the reasons why Postgres is not accepted in the enterprise. I quote "But when you consider MySQL's web-centric history, its lack of transactional and foreign key support, and its relaxed attitude with regard to data integrity, PostgreSQL emerges as the clear winner". Are you deliberately trying to mislead, or are you just ignorant?

Fanboys like you convince me that MySQL will continue to dominate in the enterprise.

"There are only two databases that I know of which meet the above requirements" - Firebird fits the bill exactly and its embedded edition makes for very easy deployment.

Nice post. Your take on PostgreSQL vs MySQL was bound to generate some heat. For me personally I've worked with both and find pros and cons for both. It is mostly a matter of taste and personal preferences.

One correction though. You must be referring to John von Neumann at the end of the post, not Johnny.

"You wouldn't consider or even be aware of the thousands upon thousands of PokerTracker and Hold'em Manager users who'll flock to your database, learning its ins and outs, getting their hands dirty, occasionally even parlaying an end-user poker player's knowledge of a third-party online poker tool into a job as a full-time database administrator"

Yeah, wow.. this actually happened to me. Ironically, the client's got me working in Access.

The popularity of PostgreSQL as DBMS for handhistories is by no means just a matter of some alleged technological superiority over MySQL.

Let's not forget that Pokertracker, Holdem Manager etc is proprietary software, so they really don't have any other choice but to bundle with postgreSQL. If they were to ship their products with MySQL, they would either have to open-source their products according to the GPL, or pay hefty commercial license fees.

James,

JB mentioned SQLite. I have been using it a lot myself lately, and I like it a lot. Do you have any experience on this? (or other readers of this website) Not sure how unbiased that information is, but I remember seeing a section on the SQLite database doing performance comparisons, and it held its own quite well against all the usual suspects. Also, how big do these hand history databases get?

Cheers, Marc.

MySQL has matured a lot in the past few years, but at the time, PostgreSQL was the only serious choice for a complete SQL experience. Nowadays, MySQL is probably a reasonable option, but I'm not sure what it would offer that PostgreSQL doesn't already have. In other words, there's no reason to change.

Today, I'd be interested to see what SQLite could do. However, some thought would probably have to be put into using it with multiple front-ends -- if you have a substantial poker operation going on, you're probably going to be doing a lot of inserts from different clients at the same time. I don't know how big of a problem this would be in reality.

If I were to make an application from scratch, I would at least try SQLite. I have also noticed that Firebird comes in an embedded edition, perhaps that's also an alternative?

Bogdan's comment is right on the money. There are licensing issues with MySQL. MySQL commercial licenses are contracts with Sun. Not cheap. It had to be PostgreSQL.

Other than that, some great comments here. Fascinating stuff.

Favorite line in the post: "Everything is slow for large N." Not really true, but a funny inversion of the original Atwood quote.

When I wrote my poker tracking software that didn't quite make it past alpha, I was using SQLite. It was pretty good at doing what I needed.

In 2004 and 2005, I played a lot of on-line poker, mainly at Ultimate Bet. Part of the reason for that site choice is that you could (and maybe still can) pull 50 hands of history down after sitting at a table [i]for hands you weren't even in[i]. So after a couple of seconds for that data to process, I already had a decent chunk of stats for anybody who had been sitting there a while, before I made a single bet. In most cases, I had my custom aggressiveness indicator SQL showing up on a heads-up display over each player from the database before I saw their first raise I had to make a hard decision about.

Great fun, but I never got to where it made more than about $1/hour playing so I had to give that up for real work. Nowadays? I work on PostgreSQL...

While it's fine for small databases, I wouldn't expect SQLite would scale up properly to handle the volume of data from the largest hand histories around. What's nice about PostgreSQL is that it's possible to get it running on a small data set without taking up too many resources, yet it goes smoothly up to terabytes worth of data. The main thing to realize is that your entire database is a single file in SQLite, while PostgreSQL splits everything up into sections at most 1GB in size. It's a lot easier to find a single bit of data out of a large data set when it's broken up into multiple levels of smaller pieces for you. I have my doubts about whether the SQLite optimizer has enough information to work with to handle the kinds of decisions needed for complicated queries against large data sets as well.

Also, SQLite locks the entire database for every query. Since even a crappy player like me can make money running 4 tables at once, I wouldn't expect it to handle that very well, or even the two processes you'd expect a minimum--the background tracker application and the playing client are typically both active when you're playing. The site's own [url=http://www.sqlite.org/whentouse.html]Appropriate Uses for SQLite[/url] goes over why it's really not suitable for this job on a large database. You have to understand that as the size of the database grows, the amount of time to service even small queries goes up, which widens the window where locks are active for. That's certainly not usable if you're trying to build a real-time multi-hand playing bot based on a lot of history for example.

Firebird is a much more viable alternative to PostgreSQL for this use case. I suspect you'd still see a performance regression there relative to Postgres for large hand databases with multiple tables running, but I don't have any hard data to support that opinion.

I developed a simple .NET/C# hand history tracker app with SQLite as a backend, so it's definitely possible. [url=http://tervola.com/DonkeyTracker]This app[/url] seems to use it as well. However, what I did was basically to import 500MB of hand histories, gathering them in a single datbase. I.e I did not infer any fancy statistics like V3BA$OSTC*. The size of the database after the import was roughly 400MB, which in my case gives that a 4GB database could store 2.8 million hands. Maybe it could taken as a hint to the max capacity of a SQL server express DB. Speaking of which: There is also a SQL Server Compact edition, an embedded SQL server (not the same as express edition) which is probably worth looking into if you're on the .NET platform.

PS: And if you are a real poker player, you ought to know that V3BA$OSTC reads "Villain's-average-3bet-amount-on-scary-turn-card" ;)

Maybe someone already commented about it, but MySQL's embedded version requires purchasing a license. It's only relevancy to the discussion is to point out that it's not relevant to the discussion :) http://www.mysql.com/about/legal/licensing/oem/

Sqlite has also recieved a lot of mention here. However, since the application domain requires multi threading (import hands while running ad hoc queries and updating the HUD), it's probably not the best choice. http://www.sqlite.org/cvstrac/wiki?p=MultiThreading

Chris, you seemed to have come to the opposite conclusion of what that SQLite page describes. The page even links to a Java library that does the handling required for multi-threading.

You forgot one, James. And it's really a ´goldmine you forgot about. I am talking about PokerEdge. Any serious pokerbotter should check it out!

Hi James. I couldn't find a way to email you, and figured you'll be reading the comments. I've been following your blog for a couple of months now, and really like it. I wanted to speak with you about a few things. Would certainly appreciate an email. Thanks!

Great link to that EVE article, James. Two great reads in one day!

@ Bogdan

"V3BA$OSTC" is a great stat! I don't think PokerTracker understands "Scary Turn Card" though :P

Comments like that are why comment permalinks were invented... ;-) which I notice you don't have.

So wait, is SQLite an EMBEDDED database or EMBEDDABLE database?

http://www.sqlite.org/selfcontained.html

Neither. Its a SQL ENGINE. Doesnt have a server process. Doesnt have external dependencies. So sure it's embedded but it's also OPEN. As opposed to proprietary/closed embedded DBs like SQL Server CE.

James I agree that if you're going to use an RDBMS, it should be open. But I think the fact is that the RDBMS is sort of dying, and it was never really alive for text-based applications to begin with. I'm sure you've seen some of the chatter around this:

http://cacm.acm.org/blogs/blog-cacm/32212-the-end-of-a-dbms-era-might-be-upon-us/fulltext

Why must hand histories be stored relationally? Does the data lend itself to that? Why note some sort of full text or OO database? Why not a column store or a key/value aggregating system?

I wrote a tracking software which uses SQLite. This database has many advantages like good performance and it doesn't requere a installation. [url=http://www.sergeant.org/sqlitevspgsync.html]SQLite vs PostreSQL[/url]

The reason im not keen on bundling postresql (or any installed database service) is because i want to avoid clogging a users pc up with extra software and processes. James I understand your view on allowing users to customise and write there own queries using the database, which for some users is great. For other users, I dont think they dont know or care. The masses just want the information to be given to them, not to create it themselves.

I've not used postre before, so I dont know how intensive it is, but i know SQL server takes up a lot of RAM, even when idle. Its not uncommon for me, an hour after i've finished using the db, it will still be taking up ~1GB+ of my precious RAM, when all its doing is sitting there waiting for someone to connect. I know you can change this, but thats not the point.

If I install a prog like poker tracker, i dont particularly want it to come with a huge database. I like to keep hold of whats installed on my box. Whats next - yahoo toolbar? Thats the reason I refuse to install iTunes, because i dont want quicktime. And realplayer because i dont want all the crap it comes with.

Maybe I'm alone in thinking this.

Or maybe i'm wrong about PostreSQL, maybe its a beautiful app which shuts down easily, uses minimal system resources while still delivering good speed.

In my day job I'm working as a developer of a column-oriented in-memory data store which is used for fast aggregation of BI data. The data model is mostly a star schema, I wonder if this could be used for hand histories? Fast in-memory aggregation could come in very handy there. The downside is that it can require some GB of main memory, so it would probably need a dedicated machine. But it should outperform traditional databases in all cases where there is no appropriate index for a query.

Fascinating post. I enjoy the occasional longer piece of writing, so much on the web nowadays is packaged for instant consumption.

One thing I disagree with is that there's a clear advantage to using PostgreSQL. If these databases are only 3 or 4 GB in size, what does it matter? Any half-decent DB will work fine unless I'm seriously underestimating the performance requirements here.

To JB and others who are considering coding a tracker. Consider contributing code to fpdb. It is an open source tracker/HUD, written in python, that has about 6 semi-active developers. http://fpdb.wiki.sourceforge.net/

As far as PostgreSQL v MySQL: I think the big reason that the PokerTracker boys went with PG way MySQL's dual liscensing thing. At the time you had to buy a MySQL liscense if you wanted to "embed" MySQL in your app and they took a rather broad view of embed.

fpdb supports both MySQL and PG, and SQLite will be supported fairly soon. I think that SQLite will be perfectly adaquate for the less-than-professional poker player who has less than say 100-200k hands in his database.

RB: is there a version which is easy to use for non-programmers?

"RB: is there a version which is easy to use for non-programmers? " Easy to use isn't quite there yet. I think it is pretty easy to install and use on XP or Linux. Difficult on Vista, until someone writes up an Howto. There is lots of install and setup info on the wiki: http://fpdb.wiki.sourceforge.net/

Been using PokerOffice for a few years ... it runs MySQL

Hey James, hello from a fellow RGPer. I was curious if you ever posted about poker botting related stuff on RGP. I remember seeing a few threads on the subject...back in the day.

minor correction, you wrote: And then in 2007, a guy by the name of Roy Goncalves released the beta for Hold'em Manager, the first serious challenger to the PokerStars throne.

when i highly suspect you meant pokertracker.

great article tho, good read as always.

The popularity of PostgreSQL as DBMS for handhistories is by no means just a matter of some alleged technological superiority over MySQL. Let's not forget that Pokertracker, Holdem Manager etc is proprietary software, so they really don't have any other choice but to bundle with postgreSQL. If they were to ship their products with MySQL, they would either have to open-source their products according to the GPL, or pay hefty commercial license fees.

I cannot get enough online poker. I don't play for real money (my wife would kill me) I just play for fun but I love it. Once I get home from work I will usually spend an hour or so playing then I'll study Jiu Jitsu at the Indianapolis Jiu Jitsu academy I go to, then I go to bed. It's a busy life but I love it!

No need to force a full database down a users throat. its a near fully paroxetine dosage functional RDBMS. Fairly quick (though i haven't tested it with large loads yet).

クリントン米国務長官?4日?ミャンマー?国会補?実施???民主化努力を評価???国?対?る金?サービス投資?止や政府高官?渡米?止??一部制??緩和措置を?る用???る?発表??。????ミャンマー米大使を近??指???る?明ら????。  民主化?動指導者アウンサンスー?ー????率?る国民民主連盟(NLD)?補??圧?????を???制?緩和??手?る姿勢を示?????一層?民主化を促???狙?。???緩和?対象を?定??全政治犯釈放や北?鮮???事?力?止を???改?を推進?るよ?圧力を維??る方?も示??。

i Agree in Firebird fits the bill exactly and its embedded edition makes for very easy deployment. Thanks!

Nicholas, that's cool that you study BJJ. I actually do kickboxing at an Indianapolis Muay Thai Academy and absolutely love it! I have yet to really get into BJJ but I hope to try it out sometime soon. I watch UFC and would love to be able to better understand the ground game.

Use the form below to leave a comment.






Share This Article

Coding the Wheel has appeared on the New York Time's Freakonomics blog, Jeff Atwood's Coding Horror, and the front page of Reddit, Slashdot, Digg.

On Twitter

Thanks for reading!

If you enjoyed this post, consider subscribing to Coding the Wheel by RSS or email. You can also follow us on Twitter and Facebook. And even if you didn't enjoy this post, better subscribe anyway. Keep an eye on us.

Question? Ask us.

About

Poker

Here there be Code. And poker. And technology, arrrr. And games. And...


Hire

You've read our technical articles, you've tolerated our rants and raves. Now you can hire us anytime, day or night, for any project large or small.

Learn more

We Like

Speculation, by Edmund Jorgensen.