Extension-less URLs: the Best Practice that Time Forgot

Tuesday, April 01, 2008

I'm going to say this as simply as I can: web pages shouldn't have file extensions. Ever!*

The Internet is full of best practices: how to structure your web pages, how to handle redirection properly, how to design friendly URLs. For example, we know that the following URL...

      http://www.somesite.com/products/videogames.htm

...is better (in every way) than this one:

      http://www.somesite.com/products.htm?category=234398304&sessionid=029384029348

But both URLs suffer from another problem: they contain a file extension. Including a file extension in your public URLs isn't a best practice, it's a worst practice - or at least, a worse practice. And it's an epidemic one. Today, the vast majority of websites serve .htm files, .aspx files, .jsp files - whatever the flavor of the month happens to be. And it works because HTTP is largely an agnostic protocol: it doesn't much care about things like file extensions, or even files. In fact, the following is a perfectly valid URL to request over HTTP:

      http://www.somesite.com/marypoppins.supercalifragilisticexpialidocious

Does it point to a file of type "supercalifragilisticexpialidocious"? Or some other kind of resource, perhaps one that's generated from a database? HTTP doesn't know and it doesn't care. It simply requests a particular resource, and receives a response.
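To make that concrete, here is a minimal sketch using Python's standard http.server module. The routing table and its two paths are made up for illustration; the point is only that the server decides what a URL means, and that the Content-Type header (not the tail end of the URL) tells the browser what came back.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical routing table: URL path -> (Content-Type, body).
# Nothing here depends on a path having, or lacking, an extension.
ROUTES = {
    "/products/videogames": ("text/html; charset=utf-8", b"<h1>Video games</h1>"),
    "/feed.anything-at-all": ("text/plain; charset=utf-8", b"Just another resource."),
}

def resolve(path):
    """Return (status, content_type, body) for a request path."""
    path = path.split("?", 1)[0]  # ignore any query string
    if path in ROUTES:
        content_type, body = ROUTES[path]
        return 200, content_type, body
    return 404, "text/plain; charset=utf-8", b"Not found"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, content_type, body = resolve(self.path)
        self.send_response(status)
        # The header, not the URL, describes the type of the response.
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To try it locally (assumed host and port):
# HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

Both paths are served the same way; the ".anything-at-all" suffix is just more characters in the resource's name.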

So, if HTTP doesn't care about file extensions, and browsers don't (usually) care about file extensions, why should we? What's so bad about having an innocuous little ".htm" or ".aspx" in our URLs?

For starters, let's look at what Tim Berners-Lee said on the subject in a classic article I've quoted before:

What to Leave Out [of your URLs]

Everything! After the creation date, putting any information in the name is asking for trouble one way or another.

  • .........
  • File name extension. This is a very common one. "cgi", even ".html" is something which will change. You may not be using HTML for that page in 20 years time, but you might want today's links to it to still be valid. The canonical way of making links to the W3C site doesn't use the extension.(how?)
  • Software mechanisms. Look for "cgi", "exec" and other give-away "look what software we are using" bits in URIs. Anyone want to commit to using perl cgi scripts all their lives? Nope? Cut out the .pl. Read the server manual on how to do it.
  • .........

And it's a valid point - who knows how long any of these technologies, with their specific and often proprietary file extensions, will be around? Even the sacrosanct .HTM extension can go extinct - for example, as more and more people switch to ASP.NET, JSP, PHP, or other dynamic content technologies. And if .HTM files can go extinct, you'd better believe that ASPXs, JSPs, and PHPs can too.

Nor is URL longevity the only thing at stake. Perhaps you don't think your pages will be around in twenty years' time, or a hundred. Maybe you don't care. But consider some of the other potential downsides:

File extensions are ugly. There's no reason in particular to confront your users with them. Sure, everybody knows what an .HTM file is. But how about .ASHX or .JSP? Do we expect every user to somehow think, "Aha! JSP, that's a JavaServer Page! They're serving dynamic content using JSP! Yeee haw!"?

File extensions are irrelevant. A file extension contributes zero useful information. It doesn't help identify a particular page, or distinguish it topically from other pages. It doesn't extend your site's identity or branding in any way. It's an irrelevant implementation detail hacked onto the end of what could otherwise be a truly clean URL.

File extensions give away implementation details. In the lingo of object-oriented analysis and design, we'd say that file extensions violate encapsulation. They tell the world, "this is the technology I'm using under the hood."

File extensions make life difficult if you ever decide to switch technologies. Let's say you've assembled a content-rich website with hundreds or thousands of pages, all tagged with a .jsp extension. You spend countless hours marketing your site, getting people to link to you, establishing a position in the search engines. Until one day, you decide it's time to make the leap to ASP.NET (or any other technology). Now you're faced with two ugly choices:

  • Change the extension of all URLs from ".jsp" to ".aspx", invalidating all your incoming links, wreaking havoc with your search engine ranking, and probably forcing you to 301-redirect every page on the site to the "new" version.
  • Somehow get ASP.NET (or whatever technology you're working with) to work with .jsp files. This is possible, if you know what you're doing, but is it clean? Can it be considered anything other than a kludge to make your ASP.NET-generated content masquerade as a .JSP?
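If you do end up on the first path, the redirect can at least be a single rule rather than a per-page chore, and it can point at extension-less URLs so you never have to do it again. Below is a rough sketch in Python of the URL mapping only. The function name and the list of legacy extensions are assumptions for illustration, and wiring the result into your server's 301 response is left to whatever rewriting mechanism you use.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# Hypothetical set of page extensions the site has used over the years.
LEGACY_EXTENSIONS = {".htm", ".html", ".jsp", ".aspx", ".php"}

def canonical_redirect(url):
    """Return the extension-less URL to 301-redirect to,
    or None if the URL carries no legacy page extension."""
    parts = urlsplit(url)
    root, ext = posixpath.splitext(parts.path)
    if ext.lower() not in LEGACY_EXTENSIONS:
        return None  # already clean, or not a page (e.g. an image)
    # Rebuild the URL with the extension dropped; query and fragment survive.
    return urlunsplit((parts.scheme, parts.netloc, root, parts.query, parts.fragment))

print(canonical_redirect("http://www.somesite.com/products/videogames.jsp"))
```

Old .jsp links keep working through the redirect, and the new public URL no longer says anything about what is generating the page.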

File extensions, in other words, are EVIL. By avoiding them, we not only renew our allegiance to all things good, we separate ourselves from the millions of sites that have hardcoded their allegiance to a particular technology. There's a reason why Wikipedia and the W3C, among others, use mostly extension-less URLs. We could do worse than to follow their example.

Thanks for reading, and remember, the Devil's in the details.

* - Okay, well not necessarily never. Usually, though. All other things being equal.

Tags: URL Rewriting, IIS, SEO, ASP.NET

15 comment(s)

Your search page (www.codingthewheel.com/search.aspx) has an .aspx extension. Bad practice! Bad practice! :p

I see you finally got around to implementing them (extension-less rewrites) yourself, good job! It wouldn't have anything to do with the new IIS 7 execution pipeline, would it? ;-) Or did you go the ISAPI filter route?

How are you handling your form tag's action attribute? I.e., even if the URL is extension-less, doesn't the form's action attribute still point to the .aspx extension?

Please do tell us how you managed to remove the aspx extension on your blog.

James, my site is PHP powered (own crappy blog engine) :D, but I'd like to switch to dotnetblogengine.

Are you running on a VPS or shared account?

Why do you still use extensions for images? http://www.codingthewheel.com/pics/smile/smiley-laughing.gif

How do you know you will still use GIF in 20 years?

http://www.codingthewheel.com/image.axd?picture=evilextension.gif

btw, you have a redirect loop in your how? link: http://www.codingthewheel.com/admin/Pages/#remove

Hey, let's remove extensions on image media like .jpg. I know, when I release my new software in self extracting exe and zip format, I'll remove the file extensions there too. Just kidding. Good article.

Great article James Devlin.

I recently decided to remove page extensions on my sites. They are pointless.

I also decided to go without 'www'. Nowadays having 'www' or not is purely a preference; 'www' is in fact deprecated. 'www' used to depict an internet server, while without 'www' would be an intranet server.

'www' is defined by: 'The complete set of documents residing on all Internet servers that use the HTTP protocol, accessible to users via a simple point-and-click system.'

Web browsers nowadays automatically prepend 'http://' onto the requested URL. 'www' is now just a useless subdomain. 'www' can be for offline advertising, when someone hears 'double-you-double-you-double-you' they immediately know you're talking about a website. This is fine, as you can 301 redirect all requests for www.example.com to example.com.

Other than that, the only reason to keep www dot is if you have a large established site with a high pagerank. Your backlinks may become invalid if you go from with-www to without-www, or vice versa.

@Mike: You're missing the point. Hiding extensions like .asp, .php, .htm, etc., is perfectly valid, as the content those pages produce is HTML. In the end, regardless of page extension, you're producing an HTML page. An image could be a GIF, JPEG, PNG, etc.; there the extension conveys meaning.

javex.org - Security & Network Troubleshooting Tools (http://javex.org)

There is actually a website dedicated to Extensionless URLs (http://extensionless.org).

Does this work with PHP? I dislike ASP. I have to switch to IIS because the WebCP software is ASP-based. I know PHP and IIS work together, but how about extension-less PHP files?

Another advantage: visitors will no longer have to REMEMBER the extension.
