Avoiding Duplicates in Feedburner and Google Reader

Wednesday, July 27, 2011

Like many sites, we use Feedburner to handle RSS and email subscriptions. Depending on the needs of your site, there are some good reasons to do this—particularly if bandwidth is a concern. But if you're not careful (or if your web developer is this guy), you can trick Feedburner into thinking your feed contains duplicates. If readers subscribe to that feed by email, Feedburner can end up sending them multiple copies of each post, effectively spamming them with your content.

Case In Point

A few weeks ago Feedburner started sending out duplicate emails to people on the Coding the Wheel subscription list. Instead of getting one email per post, subscribers were getting one email per post per day. Well, that's annoying. Certain unlucky readers got seven or eight copies of stuff like A Platitude on Software Failure. Gah!

So a sincere and humble apology to the folks who had to put up with that (if any of you are still reading). Coding the Wheel is a small site, so it's not like hundreds of thousands of readers were affected. But the readers we do have hate spam the way hipsters hate hipster bear traps. When we started getting friendly "WTF" emails...

I've gotten more than a half dozen copies of your latest post so far, typically being delivered around 6am-6:30am daily. Didn't get any for a week and figured you had issues worked out but got another this morning.

Cheers!

...we threw the site into lockdown mode, battened the digital hatches, and brought out the gimp. Er, the source code.

What Went Wrong

A lot of people have had duplication issues on Feedburner as well as in Google Reader, both of which are owned by Google, so it's tempting to simply Blame Google for this. Channeling Jason Calacanis for a second, we could've titled this post:

Feedburner Duplicates Issue Can Decimate Subscribers, Topple Brands, and Impregnate Your Dog.

But that would be a cheap way to repay Feedburner for years of free and mostly reliable service, wouldn't you agree? Plus, it would rob us of yet another chance to point out our own dumbassery in the area of programming. Here's how we Michael Bolton'd it this time:

  • By omitting the <guid> element in the raw XML feed. This didn't affect previous versions of the feed because Feedburner was able to determine uniqueness from other elements. (Per the spec, <guid> is an optional element. But just because it's optional doesn't mean you shouldn't include it.)
  • By changing the meaning of the <pubDate> element. In prior versions of the feed, <pubDate> was filled with the original publication date of the post, which never changed. We modified it to store the last-officially-modified date of the post, which sometimes changed.

RSS consumers (like Feedburner) have to determine the uniqueness of items in a feed somehow. If you don't provide a <guid>, you force them to infer uniqueness from things like <title>, <link>, and/or <pubDate> elements. If you then allow those elements to change, Feedburner can decide:

Hey! The timestamp changed! And there's no <guid>! This must be a new post. Let's send it out! What's that, timestamp changed again? Must be a new post! Send it out! And again! And again! And again!
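
To make that concrete, here's a minimal Python sketch of the failure mode. It is not Feedburner's actual algorithm (we have no idea what that looks like). `Item`, `identity_key`, and `count_sends` are hypothetical names, and the only assumption is that a consumer keys on <guid> when it's present and otherwise falls back on <title>, <link>, and <pubDate>.

```python
from typing import Iterable, NamedTuple, Optional


class Item(NamedTuple):
    """The handful of RSS item fields that matter for this illustration."""
    title: str
    link: str
    pub_date: str
    guid: Optional[str] = None


def identity_key(item: Item) -> tuple:
    """Return the key a hypothetical consumer uses to ask 'seen this before?'."""
    if item.guid is not None:
        # Declared identity: changes to other fields don't create a new item.
        return ("guid", item.guid)
    # Inferred identity (an assumption): any change here looks like a new post.
    return ("inferred", item.title, item.link, item.pub_date)


def count_sends(polls: Iterable[Item]) -> int:
    """Count how many emails go out as the same post is polled repeatedly."""
    seen = set()
    sends = 0
    for item in polls:
        key = identity_key(item)
        if key not in seen:
            seen.add(key)
            sends += 1
    return sends


if __name__ == "__main__":
    # One post whose <pubDate> gets bumped on three consecutive days.
    dates = [
        "Mon, 04 Jul 2011 06:00:00 GMT",
        "Tue, 05 Jul 2011 06:00:00 GMT",
        "Wed, 06 Jul 2011 06:00:00 GMT",
    ]
    url = "http://example.com/some-post"  # placeholder URL
    without_guid = [Item("Some Post", url, d) for d in dates]
    with_guid = [Item("Some Post", url, d, guid=url) for d in dates]
    print(count_sends(without_guid))  # prints 3
    print(count_sends(with_guid))     # prints 1
```

Without a <guid>, three timestamp bumps look like three posts. With one, they collapse into a single item, which is the behavior subscribers actually want.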

The long and short of it is that it's possible to confuse Feedburner if you play fast and loose with notions of uniqueness. Now let's talk about how to un-confuse Feedburner.

How To Avoid Feedburner and Google Reader Duplicates

  1. Always include a <guid> element. Well duh, right? Declarative uniqueness is a Very Good Thing for feeds which get repurposed and transmitted across the web, to hell and back.
  2. Enforce a clean <pubDate> element, because it's sometimes used as a variable in the uniqueness equation. It should reflect the date your post was published, and shouldn't change unless the post itself is updated in a way that will be visible to readers.
  3. Enforce unique and unchanging <title> and <link> elements. Ideally, these should never change for a given post.
  4. Make sure your XML is valid and well-formed, to the extent you can, given the quirks and idiosyncrasies of RSS/Atom.
  5. Enforce a canonical version of your originating feed with a proper 301 redirect, even if it seems redundant or unnecessary, and even if Feedburner is the only party that's ever going to access the feed. If you visit http://codingthewheel.com/syndication.axd (our private, internal feed, used only by Feedburner) you'll be 301-redirected to http://www.codingthewheel.com/syndication.axd.
  6. Subscribe to your own Feedburner feeds on a Gmail account as well as a couple different offsite email accounts.
  7. Consider using another service for email subscriptions. Feedburner's email support is a quick-and-dirty but effective way of repurposing your RSS feeds as emails. But Feedburner isn't really in the business of top-shelf email fulfillment, and it gives you less control over email subscriptions than you can get from other services.
  8. All the cool kids 302-redirect their public feed to Feedburner. Advertise your sovereign feed (http://yoursite.com/yourfeed) to your human readers. When they click it to subscribe, issue a 302 redirect to http://feeds.feedburner.com/yoursite. [This isn't really related to the duplication issue, but as long as we're spouting off Feedburner tips and tricks...]
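
For the code-minded, tips 1, 2, 5, and 8 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code that runs Coding the Wheel (our feed lives in an .axd handler). `build_item` and `redirect_for` are hypothetical helpers, the `yoursite.com` hosts and paths are the placeholders from the list above, and using the permalink as the <guid> is one common choice rather than a requirement.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Optional, Tuple

# Placeholder values (assumptions), borrowed from the tips above.
CANONICAL_HOST = "www.yoursite.com"
PUBLIC_FEED_PATH = "/yourfeed"  # the "sovereign" feed URL readers click
BURNED_FEED_URL = "http://feeds.feedburner.com/yoursite"


def build_item(title: str, permalink: str, published: datetime) -> ET.Element:
    """Build an RSS <item> with a declared <guid> and a stable <pubDate>.

    `published` must be a timezone-aware UTC datetime holding the original
    publication time, not the last-modified time.
    """
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = permalink
    # Tip 1: declare identity explicitly; here the permalink doubles as guid.
    guid = ET.SubElement(item, "guid", isPermaLink="true")
    guid.text = permalink
    # Tip 2: an RFC 822-style date that never changes for a given post.
    ET.SubElement(item, "pubDate").text = format_datetime(published, usegmt=True)
    return item


def redirect_for(host: str, path: str) -> Optional[Tuple[int, str]]:
    """Return (status, location) if the request should redirect, else None."""
    if host != CANONICAL_HOST:
        # Tip 5: permanently redirect non-canonical hosts to the canonical one.
        return 301, f"http://{CANONICAL_HOST}{path}"
    if path == PUBLIC_FEED_PATH:
        # Tip 8: temporarily redirect the public feed to Feedburner.
        return 302, BURNED_FEED_URL
    # Anything else (including the private feed Feedburner polls) is served.
    return None


if __name__ == "__main__":
    when = datetime(2011, 7, 27, 12, 0, tzinfo=timezone.utc)
    item = build_item("Some Post", "http://www.yoursite.com/some-post", when)
    print(ET.tostring(item, encoding="unicode"))
    print(redirect_for("yoursite.com", "/syndication.axd"))
    print(redirect_for("www.yoursite.com", "/yourfeed"))
```

The 302 on the public feed is deliberate: because it's a temporary redirect, subscribers keep pointing at your own URL, so you can swap Feedburner out later without losing them.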

After applying these changes to our feed, not only did the email duplicates stop, but the harmless-but-annoying duplication of posts in Google Reader stopped as well.

In other words, Coding the Wheel finally stepped up to join the ranks of the big boys, also known as "sites that don't screw up their RSS feeds and piss off their readers". (Granted, most sites use an established CMS. We built our stuff from scratch.) So whether or not RSS is dead, at least we can take comfort from the fact that our RSS isn't dirty.

Tags: Atom, email, Feedburner, RSS, spam, code

6 comment(s)

This one only came in once so far for me. Working correctly, I think. I was receiving multiples. A bit of a hassle, but not enough to forgo the useful content you provide...

whats up man. feedburner should have better safeguards in place. an article WITH THE SAME TITLE shouldnt be sent and resent...seems careless [on their part]. shouldn't need to specify a guid and 4 other kinds of uniqueness so they won't spam subscribers. if title and link don't change = same piece of content. at least when it comes to email. that's common sense. there should be a default behavior here which is to err on the side of not spamming.

another thing that bugs me about feedslurper - they pulled down the docs on 302 redirecting user feeds. something tells me the reason is they don't want users having independent feeds. they know if sites just hand out "feeds.feedburner.com/whatthefuckever" as their subscription link, sites will depend on the service forever. so they disappeared any mention of the technique. just like jimmy hoffa

Thanks for the information.

It is indeed my great delight to think about your website also to appreciate your current fantastic content right here. I favor that quite definitely.

I haven't been blogging long and still learning, so I am having a tiny (ok a lot ) of difficulty absorbing your tips. It's not you, it's me. Anyway, feedburner sent out 2 of the same post 20 minutes apart today. Trying to figure out why. Last year it randomly sent out a very old post out of nowhere. I deactivated and then reactivated my feedburner email and it didn't happen again. (I read that as a tip from somewhere). Would that work here?
