Django: Say Hello to Unicode

After weeks of testing, the Django unicode-branch was merged into trunk today. This changeset hugely improves Django's unicode-awareness and also fixes a lot of unicode-related bugs. From the announcement on the django-users list:

This should be backwards-compatible for all practical purposes (providing you only use ASCII data). The only real difference you will notice in that case is that model fields are Unicode strings instead of bytestrings in type, but since they are ASCII data anyway, that shouldn’t make any real difference.

See Unicode data in Django and Porting Applications (The Quick Checklist) for more.

Furthermore, another great commit landed today, fixing a bug that has always been in the top five of my personal “The things I hate most about Django” list. Changeset 5608 finally adds a “unicode-aware slugify filter (in Python) and better non-ASCII handling for the Javascript slug creator in admin”. Until today, the slugify function converted a typical non-English title like “Tässä on älyttömästi ääkkösiä” into the (totally unreadable) “tss-on-lyttmsti-kksi”, which of course sucks big time when every other slugify function on the planet turns it into something like “tassa-on-alyttomasti-aakkosia” (which is totally readable).
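To make the difference concrete, here is a minimal sketch in modern Python 3 of the two behaviours described above. It is not the code from changeset 5608; the function names are made up for illustration, and the unicode-aware variant simply assumes that decomposing accented characters before dropping non-ASCII bytes is good enough.

```python
import re
import unicodedata


def _to_slug(text):
    # Drop punctuation, lowercase, and collapse whitespace/hyphens into "-".
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()
    return re.sub(r"[-\s]+", "-", text)


def naive_slugify(value):
    # Old-style behaviour: non-ASCII characters are simply thrown away.
    ascii_only = value.encode("ascii", "ignore").decode("ascii")
    return _to_slug(ascii_only)


def unicode_aware_slugify(value):
    # NFKD splits "ä" into "a" plus a combining mark; only the mark is
    # lost when the non-ASCII characters are dropped afterwards.
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return _to_slug(ascii_only)


title = "Tässä on älyttömästi ääkkösiä"
print(naive_slugify(title))          # tss-on-lyttmsti-kksi
print(unicode_aware_slugify(title))  # tassa-on-alyttomasti-aakkosia
```

Characters that have no ASCII base letter (for example "ø" or anything outside the Latin script) would still vanish with this approach, so treat it as a sketch rather than a complete transliteration scheme.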

I’m really, really happy that Django is slowly but firmly maturing into a unicode-friendly framework. Kudos to Malcolm Tredinnick for his huge efforts on the unicode-branch, and big thanks to everyone who helped with testing and bugfixes!

Django unicode-branch: testers wanted

The long-awaited unicode-branch is finally at a stage where wider community testing is needed. Read the notification on the django-users mailing list.

Malcolm has done a terrific job with the branch and there is already fairly solid documentation available. For most people, the short checklist (five steps, maximum!) is all you need to convert your applications to handle unicode well. If you want more information, check the detailed documentation in the trunk.

Using this branch means an end to the numerous unicode-related problems (most of them, anyway) when using Django. So, this is a must for every djangonaut who is living in the Real World 😉

Go on, get on with it! 🙂

Unicode and Django RSS Framework

Unicode issues are the most annoying thing about Django. Here is one workaround for a bug in Django's RSS framework.

I have migrated my Ma.gnolia bookmarks and Flickr photos into this site. Both services have tags that contain what Django devs call “funky characters”, that is, non-ASCII characters. Getting these into the database unchanged was a pain in the butt in itself, but after that, I wanted to make my own feeds for both Ma.gnolia and Flickr tags with Django's wonderful syndication framework. Turns out that the framework doesn't play well with urls that have funky characters.

The problem is in the feed class, which automatically adds the appropriate ‘http://’ prefix in front of any urls that need one. On creation, the feed object is passed the request object, whose path attribute is unencoded, and this throws an uncaught exception when there are funky characters in the url. Adding the site domain to the path before passing it to the feed class circumvents the problem.
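Why prepending the domain helps can be illustrated with a rough sketch. This is not the framework's actual code: `add_domain_sketch` is a hypothetical stand-in written in Python 3, and it assumes the prefixing step leaves already-absolute urls untouched and only decodes the path when it has to build a new url. The explicit ASCII decode emulates what Python 2 did implicitly when a bytestring was mixed with a unicode string.

```python
def add_domain_sketch(domain, path):
    """Hypothetical stand-in for the framework's url-prefixing step.

    `path` is a bytestring, as request paths were in Django at the time.
    """
    if path.startswith(b"http://"):
        # Already absolute: returned untouched, so nothing gets decoded.
        return path
    # Building the full url forces the bytes through an ASCII decode,
    # which fails as soon as the path contains a non-ASCII byte.
    return "http://%s%s" % (domain, path.decode("ascii"))


funky_path = "/valokuvat/tagit/ääkkösiä/".encode("utf-8")

try:
    add_domain_sketch("www.unessa.net", funky_path)
except UnicodeDecodeError as exc:
    print("relative path blows up:", exc.reason)

# With the domain already prepended, the decoding branch is never reached.
print(add_domain_sketch("www.unessa.net", b"http://www.unessa.net" + funky_path))
```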

This is my (stripped down) feeds view:

    from django.contrib.syndication.views import feed

    def my_feeds(request, url):
        from unessanet.links.feeds import *
        from unessanet.photos.feeds import *

        unessanet_feed_dict = {
            'linkit': LatestBookmarks,
            'valokuvat': LatestPhotos,
            'valokuvatagi': PhotosForTag,
        }

        # Fixes a bug in syndication framework
        request.path = 'http://www.unessa.net' + request.path
        return feed(request, url, unessanet_feed_dict)

Now the feeds render properly. Almost.

A feed with an unquoted url does not validate. It may work, but it doesn’t validate. To fix this, just escape the url with the quote function found in the urllib module.

This is my feed class for photo tags:

    from urllib import quote

    from django.contrib.syndication.feeds import Feed
    from django.core.exceptions import ObjectDoesNotExist

    class PhotosForTag(Feed):

        description_template = "feeds/latest_photos_description.html"
        title_template = "feeds/latest_photos_title.html"

        def get_object(self, bits):
            # Expect exactly one url bit: the tag name
            if len(bits) != 1:
                raise ObjectDoesNotExist
            tag = bits[0]
            return PhotoTag.objects.get(tag=tag)

        def title(self, obj):
            return "Unessa.net Valokuvat: %s" % obj.tag

        def link(self, obj):
            # Quote the url so the feed validates
            return 'http://www.unessa.net/valokuvat/tagit/%s/' % quote(obj.tag)

        def description(self, obj):
            return "Unessa.net Valokuvat: %s" % obj.tag

        def items(self, obj):
            # The ten first public photos for this tag
            return obj.flickrphoto_set.filter(is_public=True)[:10]

Note that the quoted part of the url must be unicode, or otherwise you’ll end up with a broken url. But after these fixes, the feeds work as expected, with or without funky characters.
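For readers on a current Python, the same quoting step looks roughly like this. The code above uses Python 2's `urllib.quote`; in Python 3 the function lives in `urllib.parse` and encodes text as UTF-8 by default. The helper name `tag_feed_link` is made up for this sketch, and passing `safe=""` is my own choice so that a slash inside a tag would be escaped too.

```python
from urllib.parse import quote, unquote


def tag_feed_link(tag):
    # Percent-encode the tag so the resulting url is plain ASCII.
    # safe="" also escapes "/", which quote() leaves alone by default.
    return "http://www.unessa.net/valokuvat/tagit/%s/" % quote(tag, safe="")


link = tag_feed_link("ääkkösiä")
print(link)
# http://www.unessa.net/valokuvat/tagit/%C3%A4%C3%A4kk%C3%B6si%C3%A4/

# The quoting is reversible, so the view can recover the original tag.
print(unquote("%C3%A4%C3%A4kk%C3%B6si%C3%A4"))  # ääkkösiä
```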

I really, really hope that Django will be converted to use nothing but unicode strings before the long-awaited 1.0 release.