Django 0.96 released

The 0.96 version of Django has just been released. Quoting the announcement from the Django weblog,

The primary goal for 0.96 was a cleanup and stabilization of the features introduced in 0.95. The release notes cover the few backwards-incompatible changes, but for most people the upgrade process should be simple.

Unessa.net has been running on the SVN version for a while now and I’m very excited about the next phase that will be starting now that 0.96 is out of the door. The last push for 1.0 will include some major backward-incombatible changes. Most, if not all, of these are described in Version one features wiki page. Among others, these features include the newforms-library that was introduced in 0.96, completely rewritten admin framework based on the newforms-library, generic relations and lots of other interesting stuff.

Congrats to all Django-devs for this release!

PS. “So when will Django 1.0 be released?” – It’ll be ready when it’s ready.

Healing Growing Pains With Django

When Unessa.net launched back in 2000, the interweb looked quite a bit different. During this time Unessa.net has changed a lot, too. I wrote earlier about taking the jump to a Django-powered site. Here are some experiences from the journey so far.

At first there were only a bunch of static HTML pages. Then came the server-side includes, and soon after that came PHP. At first it was .php3, then .phtml, eventually just .php. In 2006 I moved the site to dedicated server and finally got some long lusted Django-love going on. So now everything is finally good, right? Well, not quite. Unlike most of other sites seem to do, I don’t want to lose the old stuff. I hate linkrot and even more I hate sites that delete (good or bad) content just because it’s convenient to do so. This means that I’m now stuck with this great server full of ancient sh*t that needs to be taken care of. (No, not that way.)

Evolving URLs

After learning Django and Python for about a year now, I’m beginning to understand how great these tools really are. Firstly, Djangos URL dispatcher is fan-freakin’-tastic. It’s very easy to set redirects to old URLs and make custom (and smart) 404 handlers to different parts of the site. It’s also very easy to get realtime information about possible broken links, which is important. Writing custom views to handle legacy URLs semi-smartly is an easy way to get rid of old crufty URLs.

For example, I had a Movable Type installation for my mobile photoblog from 2003 that had URLs like /photoapp/archives/2003/06/28/foo.php. Being perfectionist about URLs, I wanted to evolve these pages to something like /photoapp/2003/06/foo/, which is more logical, much shorter and cruft-free. I exported the MT-data to a new Django app, wrote short URLconf for the old URLs and a view that looks something like this:

 def oldphoto_redirect(request, year, month, day, slug):     """     Redirects old MT-URLs to new format "smartly".     If a correct match is not found, raise a 404.     """     try:         photo = NewPhoto.public_objects.get(date_taken__year=year, date_taken__month=month, date_taken__day=day)         return HttpResponsePermanentRedirect('/photoapp/%s/%s/%s/' % (year, month, photo.photo_id))     except NewPhoto.DoesNotExist:         # If no entries match, raise 404         raise Http404     except AssertionError:         # If many entries match         raise Http404

Now 95% of the old URLs are redirected automagically to new URL, with a correct HTTP status code (301). The rest five percent of the cases are URLs that have more than one post in a single day. They will get a custom 404 page that explains why the pages are moved, where they are, and that I have been informed about this 404. When I get a 404 email from Django, I’ll add these few URLs manually to URLconf. This system healed itself in less than a week. Oh, joy!

(Unfortunately this site is almost entirely in Finnish, but it shouldn’t stop you from browsing trough the new photo site that has been knit together from two different photoblogs, added to Flickr, fully tagged, and enhanced in many ways. Among other things, the new app includes full Flickr catalog duplicated locally on the server, automatic synchronization with Flickr, automatic resizing of images in various sizes (see homepage), favourites and ratings for logged-in users, livesearch feature and much more.)

Harmonizing static content

Second great thing about Django and Python is, well, Python 🙂 There are tons of great libraries for Python. One that I’ve totally fallen in love with is Beautiful Soup. In addition of various PHP and Perl powered dynamic parts of old Unessa.net, there are also hundreds of statical pages that I don’t want to keep on the main site anymore. I’ve started to archive these pages to a dedicated archive, and at the same time I’m officially washing my hands about keeping those pages up to date. As a perfectionist, I want to tell this to my visitors too. But I’m just not going to edit hundreds of these files manually.

I’ve been playing with an idea that I’d process all these static HTML-files with a python script that would do something like:

Add a note about archive status (something like “This page has been archived for historical reasons and is no longer maintained. Current content can be found from the front page.“) in a DIV right after <body>-tag
Parse SSI-includes into the page
Check for broken links and fix all trivial internal links
Convert old image links from /images/* to http://images.unessa.net/*
Validate the final output

This all would actually be fairly easy to do with little help from Beautiful Soup and some mind expanding regexps. And how cool would it be to have over seven years worth of archived static content, all valid and with no broken internal links 🙂

To be continued…

Unessa.net is a personal site and a hobby. These kind of things are fun to do. The best part is that sometimes it’s even more fun to do it for paying customers, in more complex projects and under a strict schedule.

I’ll keep on reporting on my progress with Djangofying Unessa.net. My goal is to have the whole site on Django (meaning that all dynamic data is served by Django and all the other data is archived in some way) by the end of this year. At the moment I’m about 40% there so there’s definitely a lot more to do. If you have any comments or ideas, please share them!

Design Your URLs

URLs are the most important, and yet often least designed, part of your website. No matter if it’s a cool 2.0 application or plain vanilla homepage, it’s the URLs that lead the visitors to it. Why is it that almost nobody seem to know (or care about) how to design them?

Any web developer should know that URIs are a part of the sites user interface. Usability guru Jakob Nielsen reminded us about this back in 1999, but very few of us seem to remember.

Learning from bad examples

Here’s a typical example of a very bad URL:

http://www.turunsanomat.fi/viimeuutiset/ts/?ts=0,3:2025:0:0,4:26:0:0:0;4:27:0:0:0;4:28:0:0:0;4:29:0:0:0;4:30:0:0:0;4:31:0:1:2007-01-06;4:35:0:0:0,0,1:0:0:0:0:0:#429985

This is so ugly that it’s impossible to say anything good about it. It makes me want to cry.

Apple usually gets things like these right. Apple.com is generally very good regarding to URLs, but then there is the Apple Store. I’ll buy you a beer if you can read this to your grandma over the phone:

http://store.apple.com/Apple/WebObjects/fistore.woa/6994044/wa/PSLID?mco=E2DBC1A0&nclm=iPod&wosid=ci7FXn07nrcd2jXw2kj1tYj6QqG

Do you know where it points to? Of course not. But guess what’s the best part of it? It doesn’t work. Try it yourself.

As an example of a half-way designed URL, lets look at

http://shop.digitalworldtokyo.com/index.php/shop/product/full_set_of_three_usb_humping_dogs/

This is a good example of an URL as UI. You may not want to visit that page after seeing the URL. But this is not a good URL. First, why on earth is there an /index.php/ in the middle, even though it most definitely could have been omitted?

Also, the word ‘shop’ occurs twice in the URL. And ‘product’ is not useful here in any way, so why display it? Instead there could be some category reference in the URL so that products in different categories would be distinguishable.

With these tweaks a good URL for this product could be something like:

http://shop.digitalworldtokyo.com/usb/full_set_of_three_usb_humping_dogs/

How to design good URLs

There are many preferences of how to build a good URL structure. The most important thing is to be consistent. The small details are not as important as that they are consistent as a whole.

Often the URL structure of a companys web site is similar to the inner structure of the company. This is a very basic mistake that you should avoid. While it may be quite easy for you to understand the inners of your company, it’s not something that your customer knows, or even should know. Try to be as descriptive and simple as you can when designing the URLs.

A good URL:

is short
gives user a hint about the content (ie. use /products/product_name/ instead of /c/1/)
tells user where he/she is (ie. /support/product_name/faq/)
is hackable (ie. if faq is at /support/product_name/faq/, then /support/product_name/ shows the support main page for this product and /support/ is the main support page. Also, if you search faq for product2, you should find it at /support/product2_name/faq/)
does not have any session data or other cruft in it
does not expose the underlying technology (like foo.php and bar.aspx does)

How to redesign URLs

While Cool URIs don’t change, it’s inevitable that sometimes you have to change something. As a matter a fact, in my opinion, you should change the URLs as your organization changes. But when you do, you should do it very wisely.

Here are some tips:

Avoid linkrot with any means necessary. With every changed URL, redirect the old URL permanently (using HTTP status code 301) to the new one. Automate the process if you can.
Just in case, use a good 404 page that informs users about recent site redesigns, so that in case of a missed redirect, they can locate the moved page easily.
Design to future. Try to avoid situations where you have to change URL designs more often than yearly.
Use temporary redirects. For example, when designing a campaign site for spring 2007, you can use /spring_campaign/ or similar as the URL for marketing, and redirect the page temporarily (with HTTP status code 302) to /spring_campaign/2007/. Next year, just change the redirect to /spring_campaign/2008/. This way, when someone links to the campaign page, they’ll probably link to the right permanent URL and you can still use short and easy to remember URLs in marketing.
If, for whatever reason, you absolutely must delete a page, inform users about it on the 404 page. If possible, offer them a way to go to some other relevant page.

Require good URLs

Good URLs don’t just happen — you have to design them. Next time when listening to someone pitch about website design, publishing platform or developing framework, ask the person “what about the URLs”. Remember that the question is very much like “is she pregnant”, because it’s not possible to have a little bit customized URLs (well, it is, but it’s pointless) no more than it’s possible to be a little bit pregnant.

In web design, you can cut the corners in many places, but don’t do it with URLs.

Unicode and Django RSS Framework

Unicode issues are the most annoying thing about Django. Here is one workaround for a bug in Django RSS framework.

I have migrated my Ma.gnolia bookmarks and Flickr photos into this site. Both services have tags that have what Django devs call “funky characters”, that is non-ascii characters in them. Getting these into the database unchanged was one pain in the butt itself, but after that, I wanted to make my own feeds for both Ma-gnolia and Flickr tags with Djangos wonderful syndication framework. Turns out that the framework don’t play well with urls that have funky characters.

The problem is in the feed class that adds automatically appropriate ‘http://’ prefixes in front of any urls that need them. On creation, the feed object it is passed with request object that has unencoded path attribute which throws an uncatched exception when there are funky characters in the url. Adding the site domain to it before passing it to the feed class circumvents the problem.

This is my (stripped down) feeds view:

 from django.contrib.syndication.views import feed  def my_feeds(request, url):     from unessanet.links.feeds import *     from unessanet.photos.feeds import *      unessanet_feed_dict = {         'linkit': LatestBookmarks,         'valokuvat': LatestPhotos,         'valokuvatagi': PhotosForTag,     }      # Fixes a bug in syndication framework     request.path = 'http://www.unessa.net' + request.path     return feed(request, url, unessanet_feed_dict)

Now the feeds render properly. Almost.

A feed with an unquoted url does not validate. It may work, but it doesn’t validate. To fix this, just escape the url with quote function found in urllib module.

This is my feed class for photo tags:

 class PhotosForTag(Feed):      description_template = "feeds/latest_photos_description.html"     title_template = "feeds/latest_photos_title.html"      def get_object(self, bits):         if len(bits) != 1:             raise ObjectDoesNotExist         tag = bits[0]         return PhotoTag.objects.get(tag=tag)      def title(self, obj):         return "Unessa.net Valokuvat: %s" % obj.tag      def link(self, obj):         # Quote the url so the feed validates         from urllib import quote         return 'http://www.unessa.net/valokuvat/tagit/%s/' % quote(obj.tag)      def description(self, obj):         return "Unessa.net Valokuvat: %s" % obj.tag      def items(self, obj):         return obj.flickrphoto_set.filter(is_public=True)[:10]

Note that the quoted part of the url must be unicode or otherwise you’ll end up with a broken url. But after these fixes, the feeds work as expected — with or withouth funky characters.

I really, really hope that Django will be converted to use nothing but unicode strings before the long waited 1.0 release.