Extension-less URLs: the Best Practice that Time Forgot
Tuesday, April 01, 2008   

I'm going to say this as simply as I can: web pages shouldn't have file extensions. Ever!*

The Internet is full of best practices: how to structure your web pages, how to handle redirection properly, how to design friendly URLs. For example, we know that the following URL...

      http://www.somesite.com/products/videogames.htm

...is better (in every way) than this one:

      http://www.somesite.com/products.htm?category=234398304&sessionid=029384029348

But both URLs suffer from another problem: they contain a file extension. Including a file extension in your public URLs isn't a best practice, it's a worst practice - or at least, a worse practice. And it's an epidemic one. Today, the vast majority of web sites serve their pages as .htm files, .aspx files, .jsp files - whatever the flavor of the month happens to be. And it works because HTTP is largely an agnostic protocol: it doesn't much care about things like file extensions, or even files. In fact, the following is a perfectly valid URL to request over HTTP:

      http://www.somesite.com/marypoppins.supercalafrajalisticexpialadocious

Does it point to a file of type "supercalafrajalisticexpialadocious"? Or some other kind of resource, perhaps one that's generated from a database? HTTP doesn't know and it doesn't care. It simply requests a particular resource, and receives a response.
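To make that concrete, here's a minimal sketch (in Python, purely for illustration) of a server-side lookup in which the path is nothing more than an opaque key. What the client receives is described by the Content-Type header of the response, not by anything in the URL. The ROUTES table, the respond function, and the response bodies are all hypothetical; only the two paths come from the examples above.

```python
# Hypothetical routing table: each public path maps to (content type, body).
# Nothing here inspects the "extension" of the path; it is just a lookup key.
ROUTES = {
    "/products/videogames": ("text/html; charset=utf-8", "<h1>Video Games</h1>"),
    "/marypoppins.supercalafrajalisticexpialadocious": (
        "text/plain; charset=utf-8",
        "Spit spot!",
    ),
}


def respond(path):
    """Return (status, headers, body) for a requested path."""
    route = ROUTES.get(path)
    if route is None:
        return 404, {"Content-Type": "text/plain; charset=utf-8"}, "Not Found"
    content_type, body = route
    # The Content-Type header, not the URL, tells the client what it received.
    return 200, {"Content-Type": content_type}, body
```

Whether the body behind a path comes from a file on disk, a database, or a template is entirely the server's business.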

So, if HTTP doesn't care about file extensions, and browsers don't (usually) care about file extensions, why should we? What's so bad about having an innocuous little ".htm" or ".aspx" in our URLs?

For starters, let's look at what Tim Berners-Lee said on the subject in a classic article I've quoted before:

 

What to Leave Out [of your URLs]

 

Everything! After the creation date, putting any information in the name is asking for trouble one way or another.

  • .........
  • File name extension. This is a very common one. "cgi", even ".html" is something which will change. You may not be using HTML for that page in 20 years time, but you might want today's links to it to still be valid. The canonical way of making links to the W3C site doesn't use the extension.(how?)
  • Software mechanisms. Look for "cgi", "exec" and other give-away "look what software we are using" bits in URIs. Anyone want to commit to using perl cgi scripts all their lives? Nope? Cut out the .pl. Read the server manual on how to do it.
  • .........

And it's a valid point - who knows how long any of these technologies, with their specific and often proprietary file extensions, will be around? Even the sacrosanct .HTM extension can go extinct - for example, as more and more people switch to ASP.NET, JSP, PHP, or other dynamic content technologies. And if .HTM files can go extinct, you'd better believe that ASPXs, JSPs, and PHPs can too.

Nor is URL longevity the only thing at stake. Perhaps you don't think your pages will be around in twenty years' time, or a hundred. Maybe you don't care. But consider some of the other potential downsides:

File extensions are ugly. There's no reason in particular to confront your users with them. Sure, everybody knows what an .HTM file is. But how about .ASHX or .JSP? Do we expect every user to somehow think, "Aha! JSP, that's a JavaServer Page! They're serving dynamic content using JSP! Yeee haw!"?

File extensions are irrelevant. A file extension contributes zero useful information. It doesn't help identify a particular page, or distinguish it topically from other pages. It doesn't extend your site's identity or branding in any way. It's an irrelevant implementation detail tacked onto the end of what could otherwise be a truly clean URL.

File extensions give away implementation details. In the lingo of object-oriented analysis and design, we'd say that file extensions violate encapsulation. They tell the world, "this is the technology I'm using, under the hood."

File extensions make life difficult if you ever decide to switch technologies. Let's say you've assembled a content-rich website with hundreds or thousands of pages, all tagged with a .jsp extension. You spend countless hours marketing your site, getting people to link to you, establishing a position in the search engines. Until one day, you decide it's time to make the leap to ASP.NET (or any other technology). Now you're faced with two ugly choices:

  • Change the extension of all URLs from ".jsp" to ".aspx", invalidating all your incoming links, wreaking havoc with your search engine ranking, and probably forcing you to 301-redirect every page on the site to the "new" version.
  • Somehow get ASP.NET (or whatever technology you're working with) to work with .jsp files. This is possible, if you know what you're doing, but is it clean? Can it be considered anything other than a kludge to make your ASP.NET-generated content masquerade as a .JSP?
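The first choice boils down to a mechanical URL mapping that has to be applied, as a 301, to every old link. Here's a rough sketch of that mapping in Python; the function name redirect_target and its parameters are made up for illustration, and a real site would do this in its web server or rewrite module rather than in application code.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def redirect_target(url, old_ext=".jsp", new_ext=".aspx"):
    """Return the URL an old link should 301 to, or None if it needs no redirect.

    Illustrative helper only: it swaps old_ext for new_ext on the path and
    leaves the host, query string, and fragment untouched.
    """
    parts = urlsplit(url)
    stem, ext = posixpath.splitext(parts.path)
    if ext.lower() != old_ext:
        return None  # not one of the old URLs; nothing to redirect
    return urlunsplit(
        (parts.scheme, parts.netloc, stem + new_ext, parts.query, parts.fragment)
    )
```

Note that calling it with new_ext="" maps the old links to extension-less URLs instead, which at least means the next technology switch won't require another round of redirects.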

File extensions, in other words, are EVIL. By avoiding them, we not only renew our allegiance to all things good, we separate ourselves from the millions of sites that have hardcoded their allegiance to a particular technology. There's a reason why Wikipedia and the W3C, among others, use mostly extension-less URLs. We could do worse than to follow their example.

Thanks for reading, and remember, the Devil's in the details.

* - Okay, well not necessarily never. Usually, though. All other things being equal.


Posted by James Devlin   7 comment(s)

COMMENTS

Your search page (www.codingthewheel.com/search.aspx) has an .aspx extension. Bad practice! Bad practice! :p

CodifyTheChad on 4/2/2008 3:29:48 PM (94 days ago)

See you finally got around to implementing them (extless rewrites) yourself, goo' job! Wouldn't have anything to do with the new IIS 7 execution pipeline, would it? ;) Or did you go the ISAPI Filter route?

Keith on 4/3/2008 11:19:00 AM (93 days ago)

How are you handling your form tag's action attribute? I.e., even if the URL is extensionless, doesn't the form's action attribute still point to the .aspx extension?

Anonymous on 4/17/2008 8:00:46 PM (78 days ago)

Please do tell us how you managed to remove the .aspx extension on your blog.

Mike on 4/27/2008 3:28:53 AM (69 days ago)

Mike: your site already has the .aspx extension removed! Right?

I did it by upgrading to IIS 7, enabling Integrated Pipeline mode, and then tweaking my URL rewriter (HTTP module) so that /archives/somepage gets rerouted to /archives/page.aspx?id=1392. Well, that's the idea. In practice my internal scheme is a little different than your typical "fetch a page by querystring id" technique, but it's the same principle.

In order to make sure that my internal .aspx extensions are never visible to users, and prevent URL aliasing, I also hard 404'd (alternately, if you have a lot of outstanding links you could 301 redirect) all incoming requests with an .aspx extension.

And there were a couple other details, such as rewriting the form tag so it doesn't submit to an .aspx extension on postback.
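For the curious, the general shape of that scheme looks something like the following. This is a rough sketch in Python rather than an ASP.NET HTTP module, and it is not the site's actual code: the PAGE_IDS table and the rewrite function are hypothetical, and only the /archives/somepage and /archives/page.aspx?id=1392 paths come from the description above.

```python
# Hypothetical slug -> id table; a real site would look this up in its database.
PAGE_IDS = {"somepage": 1392}

ARCHIVE_PREFIX = "/archives/"


def rewrite(path):
    """Map a public request path to (status, internal_target).

    internal_target is the path the server handles internally; it is never
    sent back to the browser.
    """
    # Requests that expose the internal extension get a hard 404, which
    # prevents two URLs from aliasing the same page. (A site with many
    # existing inbound links might return a 301 here instead.)
    if path.lower().endswith(".aspx"):
        return 404, None

    if path.startswith(ARCHIVE_PREFIX):
        slug = path[len(ARCHIVE_PREFIX):]
        page_id = PAGE_IDS.get(slug)
        if page_id is not None:
            return 200, f"/archives/page.aspx?id={page_id}"

    return 404, None
```

The form-tag detail is separate: the rendered form's action has to point back at the public, extension-less path, or the first postback would leak the .aspx URL.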

How did you implement it over at wakeIM, if I may ask?

James Devlin on 4/27/2008 3:25:16 PM (69 days ago)

James, my site is PHP powered (own crappy blog engine) :D, but I'd like to switch to dotnetblogengine.

Are you running on a VPS or shared account?

Mike on 4/28/2008 6:09:39 AM (68 days ago)

Why do you still use extensions for images?
www.codingthewheel.com/.../smiley-laughing.gif

How do you know you will still use GIF in 20 years?

You may not be using GIF for that page in 20 years time on 5/9/2008 5:15:50 PM (56 days ago)
