unessanet Archives - Same Con to Hoyci

When Unessa.net launched back in 2000, the interweb looked quite a bit different. During this time Unessa.net has changed a lot, too. I wrote earlier about taking the jump to a Django-powered site. Here are some experiences from the journey so far.

At first there were only a bunch of static HTML pages. Then came the server-side includes, and soon after that came PHP. At first it was .php3, then .phtml, eventually just .php. In 2006 I moved the site to dedicated server and finally got some long lusted Django-love going on. So now everything is finally good, right? Well, not quite. Unlike most of other sites seem to do, I don’t want to lose the old stuff. I hate linkrot and even more I hate sites that delete (good or bad) content just because it’s convenient to do so. This means that I’m now stuck with this great server full of ancient sh*t that needs to be taken care of. (No, not that way.)

Evolving URLs

After learning Django and Python for about a year now, I’m beginning to understand how great these tools really are. Firstly, Djangos URL dispatcher is fan-freakin’-tastic. It’s very easy to set redirects to old URLs and make custom (and smart) 404 handlers to different parts of the site. It’s also very easy to get realtime information about possible broken links, which is important. Writing custom views to handle legacy URLs semi-smartly is an easy way to get rid of old crufty URLs.

For example, I had a Movable Type installation for my mobile photoblog from 2003 that had URLs like /photoapp/archives/2003/06/28/foo.php. Being perfectionist about URLs, I wanted to evolve these pages to something like /photoapp/2003/06/foo/, which is more logical, much shorter and cruft-free. I exported the MT-data to a new Django app, wrote short URLconf for the old URLs and a view that looks something like this:

 def oldphoto_redirect(request, year, month, day, slug):     """     Redirects old MT-URLs to new format "smartly".     If a correct match is not found, raise a 404.     """     try:         photo = NewPhoto.public_objects.get(date_taken__year=year, date_taken__month=month, date_taken__day=day)         return HttpResponsePermanentRedirect('/photoapp/%s/%s/%s/' % (year, month, photo.photo_id))     except NewPhoto.DoesNotExist:         # If no entries match, raise 404         raise Http404     except AssertionError:         # If many entries match         raise Http404

Now 95% of the old URLs are redirected automagically to new URL, with a correct HTTP status code (301). The rest five percent of the cases are URLs that have more than one post in a single day. They will get a custom 404 page that explains why the pages are moved, where they are, and that I have been informed about this 404. When I get a 404 email from Django, I’ll add these few URLs manually to URLconf. This system healed itself in less than a week. Oh, joy!

(Unfortunately this site is almost entirely in Finnish, but it shouldn’t stop you from browsing trough the new photo site that has been knit together from two different photoblogs, added to Flickr, fully tagged, and enhanced in many ways. Among other things, the new app includes full Flickr catalog duplicated locally on the server, automatic synchronization with Flickr, automatic resizing of images in various sizes (see homepage), favourites and ratings for logged-in users, livesearch feature and much more.)

Harmonizing static content

Second great thing about Django and Python is, well, Python 🙂 There are tons of great libraries for Python. One that I’ve totally fallen in love with is Beautiful Soup. In addition of various PHP and Perl powered dynamic parts of old Unessa.net, there are also hundreds of statical pages that I don’t want to keep on the main site anymore. I’ve started to archive these pages to a dedicated archive, and at the same time I’m officially washing my hands about keeping those pages up to date. As a perfectionist, I want to tell this to my visitors too. But I’m just not going to edit hundreds of these files manually.

I’ve been playing with an idea that I’d process all these static HTML-files with a python script that would do something like:

Add a note about archive status (something like “This page has been archived for historical reasons and is no longer maintained. Current content can be found from the front page.“) in a DIV right after <body>-tag
Parse SSI-includes into the page
Check for broken links and fix all trivial internal links
Convert old image links from /images/* to http://images.unessa.net/*
Validate the final output

This all would actually be fairly easy to do with little help from Beautiful Soup and some mind expanding regexps. And how cool would it be to have over seven years worth of archived static content, all valid and with no broken internal links 🙂

To be continued…

Unessa.net is a personal site and a hobby. These kind of things are fun to do. The best part is that sometimes it’s even more fun to do it for paying customers, in more complex projects and under a strict schedule.

I’ll keep on reporting on my progress with Djangofying Unessa.net. My goal is to have the whole site on Django (meaning that all dynamic data is served by Django and all the other data is archived in some way) by the end of this year. At the moment I’m about 40% there so there’s definitely a lot more to do. If you have any comments or ideas, please share them!

I’ve been very busy past two weeks with Real Work (TM) and configuring my new webserver for Unessa.net. Gonna get PAID, man.

Unessa.net is basically a digital representation of me on the web. It’s almost entirely in Finnish, it has couple of thousand pages, several blogs about various topics, forum, frequently updated web-diary that dates back to 1998, popular podcast and a bunch of other things. It’s a huge mess, and mainly knit together with PHP and MySQL. Moving this beast to a new server is not very easy task, especially for a perfectionist like me; I want to see all the pages working fine on the new server.

The old server is a shared plan, and so I haven’t had very much (well, at all) control over the environment. I also got in trouble with bandwith after I started to do podcasting last summer. The new server is a dedicated one, and so I have full control of everything. This is very nice, but also very nerve wrecking. Anyways, running Django on the new server is a breeze. The difficult part has been, and is still, to figure out just how much of the site should be ported to Django at once. On the other hand, it would be very cool to switch everything to Django right away, but then again, it would also be nice to actually get the site moved to the new server sooner than, say, a decade. I’m hoping to find some kind of a compromise for now.

Being able to do anything you want is amazing. During this couple of weeks, there has been so many “wait.. I can do that, too!1″ moments that even the boring set-up work has been fun. I can’t wait to get that server online so I can see what PAID really means.

Tag: unessanet

Healing Growing Pains With Django

Evolving URLs

Harmonizing static content

To be continued…

Busy Getting PAID