2006-03-18

Python and UTF-8 rant

On Wednesday, Thursday and Friday, I felt compelled to rant and scream and make a fuss.




March 15

I need to emit a rant. You can safely ignore this post.

I'm working on some software for work: an application that will take an uncompressed DAISY book (with WAVE audio) and encode it to a compressed book (with MP3 audio) suitable for distributing on a CD, and create MP3s for online distribution.

I'm doing this in Python, because that's my favourite language. However, there are some glitches that keep cropping up, and I'm getting tired of dealing with them. The biggest one is unicode. How come a default installation of Python doesn't just handle unicode so that it just works? I'm constantly having to convert/unconvert unicode to utf-8 or cp1252 or whatever just to get it to not completely fail. This of course leads to lots of repetitive code, something I chose Python to avoid. That wouldn't be so bad in and of itself if the standard libraries also did this. For example, I'd like to be able to use ConfigParser with utf-8 files. If I don't edit the file by hand, then it's okay, actually, since utf-8 codes for the accented characters that show up in French are the same as Latin-1 or whatever the hell it thinks it is. But when I edit the file in, say, Notepad, it helpfully changes the file to utf-8, which is probably the correct behaviour. Of course, when I try to run the app after this, it throws up a big pile of exception gobbledegook on the screen, because it doesn't understand the utf-8 magic number at the start of the file.

So ... don't use Notepad? Do all my configuration through the interface, which at least works? No, of course not, that's why I chose the ConfigParser module in the first place! I wanted to be able to edit .ini files by hand; that's why they exist.

I'm not quite ready to give up on Python yet, since nearly all the programme is written in it (except for the parts I didn't write myself, like the MP3 encoder). But seriously, Guido et al. need to get working on making Python's unicode handling mature.




Addendum, March 16

So I have it partially figured out. I need to use the codecs module to explicitly specify the encoding of the file. I was already passing around a file object instead of a filename (so I could close the file immediately after having read/written it), so this was pretty easy to do.
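
Roughly, the idea looks like the sketch below. It is written against present-day Python 3 rather than the version I was using, so the module is spelled configparser and readfp() has become read_file(). The load_ini name, the [book] section and the sample contents are all made up for illustration, and an in-memory buffer stands in for a real file opened with "rb".

```python
import codecs
import configparser  # spelled ConfigParser in Python 2
import io


def load_ini(binary_file, encoding="utf-8"):
    """Decode an already-open binary file with an explicit codec and parse it."""
    # codecs.getreader returns a StreamReader class for the named codec;
    # wrapping the file in it yields decoded text lines.
    reader = codecs.getreader(encoding)(binary_file)
    parser = configparser.ConfigParser()
    parser.read_file(reader)  # readfp() in Python 2
    return parser


# Hypothetical file contents; io.BytesIO stands in for open(path, "rb").
sample = "[book]\ntitle = Les Mis\u00e9rables\n".encode("utf-8")
config = load_ini(io.BytesIO(sample))
title = config.get("book", "title")
sections = config.sections()
```

Because the function takes a file object rather than a filename, the caller stays in charge of opening and closing the file.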

Except that it still chokes on files from Windows machines, with the byte order mark at the beginning. Apparently this problem isn't present for utf-16; it's just that Python handles utf-8 wrong. There's an easy fix, if you're dealing with the unicode string--take off the BOM, if present. Problem is, I'm not dealing with the unicode strings directly--those are being handled by the module. So it looks like I'm going to have to write a wrapper around the codecs.open(f, m, 'utf-8') function just to read and write utf-friggen-8.
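
A wrapper along these lines might do it. This is a sketch under assumptions, not what ended up in the app: it targets modern Python 3 (configparser, read_file), it expects a seekable file opened in binary mode, and utf8_reader_skipping_bom is a name invented here. The in-memory bytes imitate what Notepad saves.

```python
import codecs
import configparser
import io


def utf8_reader_skipping_bom(binary_file):
    """Wrap a seekable binary file in a utf-8 reader, dropping a leading BOM."""
    start = binary_file.tell()
    if binary_file.read(len(codecs.BOM_UTF8)) != codecs.BOM_UTF8:
        binary_file.seek(start)  # no BOM: rewind so no content is lost
    return codecs.getreader("utf-8")(binary_file)


# Hypothetical Notepad-style file: BOM first, then the utf-8 text.
notepad_bytes = codecs.BOM_UTF8 + "[book]\nlang = fr\n".encode("utf-8")

parser = configparser.ConfigParser()
parser.read_file(utf8_reader_skipping_bom(io.BytesIO(notepad_bytes)))
lang = parser.get("book", "lang")

# Without the wrapper, the BOM is decoded as U+FEFF and stays glued to
# the first section header, so the parser no longer recognises it.
try:
    broken = configparser.ConfigParser()
    broken.read_file(codecs.getreader("utf-8")(io.BytesIO(notepad_bytes)))
    plain_utf8_failed = False
except configparser.MissingSectionHeaderError:
    plain_utf8_failed = True
```

Checking the raw bytes before decoding means the config module never sees the BOM, which matters here because the module handles the decoded strings, not me.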

Further reading indicates that a utf-8-sig decoder will strip the BOM if present. Although the patch was written last April, it wasn't committed until January, i.e., after the release of the most recent version.
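
For any Python that ships the utf-8-sig codec, the difference from plain utf-8 can be seen in a few lines. The sample word is arbitrary, and the variable names are only for illustration.

```python
import codecs

with_bom = codecs.BOM_UTF8 + "caf\u00e9".encode("utf-8")
without_bom = "caf\u00e9".encode("utf-8")

plain = with_bom.decode("utf-8")             # BOM survives as U+FEFF
stripped = with_bom.decode("utf-8-sig")      # leading BOM is removed
unchanged = without_bom.decode("utf-8-sig")  # no BOM, so nothing to strip

# Going the other way, the utf-8-sig encoder writes a BOM first.
written = "caf\u00e9".encode("utf-8-sig")
has_bom = written.startswith(codecs.BOM_UTF8)
```

Since the decoder accepts input with or without the BOM, it should cope with both Notepad-edited files and files the app wrote itself.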

Also, the utf-8 encoder doesn't give me \r\n line terminators, it only gives \n, which means that the files are not editable in Notepad. Guess what the default application is for opening .ini files in Windows?
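
One possible workaround is to do the line-ending translation by hand before encoding. The sketch below assumes modern Python 3 (configparser with dictionary-style access); dump_ini_crlf is a made-up helper name, and io.BytesIO stands in for a file opened with "wb".

```python
import configparser
import io


def dump_ini_crlf(parser, binary_file, encoding="utf-8"):
    """Write the parser's contents with \\r\\n line endings to a binary file."""
    buffer = io.StringIO()
    parser.write(buffer)  # the parser itself emits bare \n
    text = buffer.getvalue().replace("\n", "\r\n")
    binary_file.write(text.encode(encoding))


parser = configparser.ConfigParser()
parser["book"] = {"lang": "fr"}  # hypothetical section and option

out = io.BytesIO()  # stand-in for open(path, "wb")
dump_ini_crlf(parser, out)
written = out.getvalue()
```

Writing to a binary file keeps the translation in one place, so nothing else gets a chance to double up the \r.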




Addendum, March 17

Screw all this. I'm tired of wasting my time on this. I've switched over to utf-16, which is the native Windows Unicode encoding, anyway. If Python ever gets the fix in, or if Windows decides to drop the utf-8 BOM, then I'll switch back.
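
For what it's worth, the utf-16 round trip can be sketched like this. It assumes modern Python 3 names (configparser, read_file), and the sample contents are invented. The point is that the utf-16 encoder writes a BOM and the utf-16 decoder consumes it, so the config module never sees it.

```python
import codecs
import configparser
import io

text = "[book]\r\ntitle = Les Mis\u00e9rables\r\n"
data = text.encode("utf-16")  # BOM followed by text in native byte order
has_bom = data[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

parser = configparser.ConfigParser()
# io.BytesIO stands in for a file opened with "rb".
parser.read_file(codecs.getreader("utf-16")(io.BytesIO(data)))
title = parser.get("book", "title")
```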

2006-03-03

Officially Happy

I'm officially happy with my new scope.

(I bought a 102 mm Mak-Cas with a German equatorial mount a few months ago. I've taken it out only four times so far because of cold and weather and illness and general laziness.)

This time, I actually let it cool down to ambient temperature before using it (yay!) and I was able to get quite good looks at Saturn and the Orion nebula. I also tried Mars and the Pleiades, but they were just above the roof, so the seeing was awful. Orion was the best I've ever seen in the city. I could make out a little bit of structure, and with averted vision some contrast between lighter and darker areas. Saturn was directly overhead, so I got a pretty good view. With my higher-magnification eyepiece, I was able to see a bit of contrast between the rings and the surface of the planet, and a little bit of the shadow on the rings.

I'm still not really happy with the mount. Partly this is because I'm not used to the German equatorial design (the old Questar I used to use has a fork mount), but it is a little rickety. I'm never sure if the scope is mounted securely, and most of the screws don't screw tight without more manual strength than I have. But since the deal at Khan's basically got me the mount for free, I'm not going to complain too much.