Internationalizing the Mac OS X terminal

It's fairly annoying that international characters don't work properly in Terminal.app when everything else in OS X just works out of the box.

Here is my recipe for fixing it, and for good measure we'll fix the X11 xterm and GTK too. I know many people use and love iTerm, but it doesn't appear to have proper support for combining diacritical characters. These instructions work with the Terminal.app that ships with OS X. They also greatly improve iTerm's behaviour, although the result there still isn't perfect.

One of the first things you'll notice when typing an international character such as æ, ü or é in Terminal.app is that it shows up as ae, u and e respectively. This is no good. First, make sure Terminal.app is configured correctly: go to its "Window settings...", make sure that "Wide glyphs for Japanese.." is checked and that Character Set Encoding is set to "Unicode (UTF-8)".

For iTerm, go to the Session Info window and make sure the Encoding setting is "Unicode (UTF-8)".

The culprit here is bash, the default shell that OS X ships with. It's a rather old version without proper support for UTF-8, the text encoding we want to use. Well, it does actually work, but apparently not when it's run as the first shell.

To fix this, you'll need to have MacPorts installed (I suppose Fink will work as well, but I haven't checked). MacPorts includes a newer version of bash that works properly.

Installation instructions for MacPorts can be found on the MacPorts website. MacPorts was previously named DarwinPorts and its URLs have since changed, so after installing you should switch to the MacPorts repository: edit /opt/local/etc/ports/ports.conf and replace the line rsync_server rsync.darwinports.org with rsync_server rsync.macports.org.
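If you'd rather not open an editor for a one-line change, sed can do the replacement. This is a sketch that assumes the default /opt/local prefix and a line that reads exactly rsync_server followed by rsync.darwinports.org; switch_rsync_server is a hypothetical helper name, not something MacPorts provides.

```shell
# Hypothetical helper: rewrite the rsync_server line in ports.conf.
# Assumes the default /opt/local prefix unless a path is passed in.
# -i.bak (suffix attached to -i) is accepted by both BSD and GNU sed
# and keeps the untouched original as ports.conf.bak.
switch_rsync_server() {
    conf="${1:-/opt/local/etc/ports/ports.conf}"
    script='s/^rsync_server[[:space:]][[:space:]]*rsync\.darwinports\.org/rsync_server rsync.macports.org/'
    if [ -w "$conf" ]; then
        sed -i.bak "$script" "$conf"
    else
        # ports.conf normally belongs to root, so fall back to sudo.
        sudo sed -i.bak "$script" "$conf"
    fi
}
```

Paste the function into Terminal.app and run switch_rsync_server with no arguments; it will ask for your password because the file is owned by root.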

Now that MacPorts is installed, we can upgrade bash. In Terminal.app, type

$ sudo port install bash

After a while, bash will be installed and we can start using it.

First, the new bash needs to be added to the system's list of allowed shells. Edit /etc/shells (for instance by typing "sudo nano /etc/shells") and add the line /opt/local/bin/bash to the end.
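The same edit can be scripted. This is a sketch with a hypothetical helper name, add_allowed_shell; it only appends the line if it isn't already there, so running it twice won't leave duplicate entries in /etc/shells.

```shell
# Hypothetical helper: append a shell path to a shells file
# (default /etc/shells) unless that exact line is already present.
add_allowed_shell() {
    shell="$1"
    file="${2:-/etc/shells}"
    if grep -qxF "$shell" "$file"; then
        return 0    # already listed, nothing to do
    fi
    if [ -w "$file" ]; then
        printf '%s\n' "$shell" >> "$file"
    else
        # /etc/shells normally belongs to root, so append through sudo.
        printf '%s\n' "$shell" | sudo tee -a "$file" > /dev/null
    fi
}
```

Usage: add_allowed_shell /opt/local/bin/bash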

Now you can switch to the new bash. Type

$ chsh

and change the line that reads /bin/bash to /opt/local/bin/bash. I assume you're using nano as the editor; press Ctrl-X and confirm that you want to save the change.
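chsh on OS X also accepts the new shell directly with -s, which skips the editor altogether. For an ordinary user, chsh will normally refuse a shell that isn't listed in /etc/shells, so this sketch checks that first; shell_is_allowed is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if $1 is an executable file that is
# listed, as an exact line, in the shells file ($2, default /etc/shells).
shell_is_allowed() {
    [ -x "$1" ] && grep -qxF "$1" "${2:-/etc/shells}"
}
```

Usage: shell_is_allowed /opt/local/bin/bash && chsh -s /opt/local/bin/bash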

If you start a new terminal session now, you'll notice that international characters still don't work.

We need to tell bash that it should use UTF-8 mode. In the terminal, type

$ nano ~/.profile

and add the lines

export LC_ALL=da_DK.UTF-8 
export LANG=da_DK.UTF-8

to the end. Note that this is my preferred locale setting: da means the Danish language and DK means I'm in Denmark. You'll probably want to change these to your preferred language and location. Try typing

$ locale -a

to see a list of the available locales. The locale you choose should always end in UTF-8.
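A typo in the locale name fails silently, and programs typically just stay in the plain C locale. If you want ~/.profile to guard against that, a sketch like the following only exports the setting when `locale -a` lists it. It assumes the name is written exactly as `locale -a` prints it (on OS X that is the da_DK.UTF-8 form); has_locale is a hypothetical helper name.

```shell
# Hypothetical ~/.profile fragment.
# has_locale: succeed only if `locale -a` lists exactly the given name.
has_locale() {
    locale -a 2>/dev/null | grep -qxF "$1"
}

# Switch to the UTF-8 locale only when the system actually provides it.
if has_locale da_DK.UTF-8; then
    export LC_ALL=da_DK.UTF-8
    export LANG=da_DK.UTF-8
fi
```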

Now start a new Terminal session and type some international characters. It should work. Also try deleting what you typed to confirm that the cursor stops at the end of the prompt and not in the middle of it.

It's still not perfect, though. If you try

$ touch æøÜÉ 
$ ls æøÜÉ

you'll notice the output is

$ touch æøÜÉ 
$ ls æøÜÉ 
æø????

That's no good. Again, the culprit is the command-line tools that ship with OS X: they're fairly old, and we need newer ones.

MacPorts to the rescue once again:

$ sudo port install coreutils +with_default_names

+with_default_names instructs the installer to use the regular names such as ls, cp, etc. Without it, the tools are installed with names such as gls and gcp, which is no good. If you prefer, you can still install the tools that way and alias everything, but that's a bit of a hassle.
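If you do go the alias route, a loop saves some typing. This is a sketch with a hypothetical helper name, alias_gnu_names; the command list is only an example. Keep in mind that aliases only apply to commands you type interactively, not to scripts, which is part of why this is a hassle.

```shell
# Hypothetical helper: for each name given, alias it to the g-prefixed
# coreutils binary if one is on PATH (ls -> gls, cp -> gcp, ...).
alias_gnu_names() {
    for cmd in "$@"; do
        if command -v "g$cmd" > /dev/null 2>&1; then
            alias "$cmd=g$cmd"
        fi
    done
}

# Example list only; extend it with whatever tools you use.
alias_gnu_names ls cp mv rm cat
```

Put this in a file your interactive shell reads, such as ~/.bashrc.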

After installing the coreutils, restart the terminal and observe that ls now works properly:

$ touch æøÜÉ 
$ ls æøÜÉ 
æøÜÉ

There are still some minor niggles, though. If you try filename completion on æøÜ, you get nothing; instead you have to complete on æøU (without the diacritical mark). Unfortunately, I have not found a way to fix this.

That's it for the native OS X terminal. If you want to fix the X11 xterm and GTK, read on...

If you launch X11, start xterm (it may already start by default if you haven't changed your .xinitrc) and try ls, you'll notice that the æøÜÉ file shows up as ????U??E??. Apparently your ~/.profile is not being read, so PATH doesn't include /opt/local/bin and you get the old ls. Your ~/.bashrc is being read, however, so add the line "source ~/.profile" to ~/.bashrc and restart xterm.
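The reason is that xterm normally starts bash as an interactive non-login shell (unless you run it with -ls), and such a shell reads ~/.bashrc rather than ~/.profile. A slightly more defensive version of that line, sketched below, skips the file quietly if it's missing.

```shell
# ~/.bashrc fragment: load PATH and locale settings from ~/.profile,
# which bash does not read on its own when started by xterm.
if [ -r "$HOME/.profile" ]; then
    . "$HOME/.profile"
fi
```

Just don't also make ~/.profile source ~/.bashrc unconditionally, or the two files will keep loading each other.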

Now you should get /opt/local/bin/ls when you type ls.
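To check which ls a shell actually picks up, `command -v ls` prints the path it resolves to. This sketch wraps that in a hypothetical helper, resolves_under. Note that if ls is an alias, `command -v` reports the alias instead of a path, so the check fails even when the right binary is installed.

```shell
# Hypothetical helper: succeed if command $2 resolves, via the current
# PATH, to something inside directory $1.
resolves_under() {
    found=$(command -v "$2") || return 1
    case "$found" in
        "$1"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: resolves_under /opt/local/bin ls || echo "ls is still $(command -v ls)"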

As with Terminal.app, filename completion is a bit dodgy: completing æøÜ gives you nothing, and you have to complete æøU (without the diacritical mark) instead.

If xterm isn't quite good enough for you, try mlterm, which is built specifically with internationalization in mind. It's also available in MacPorts; simply type

$ sudo port install mlterm

You'll probably notice straight away that the font in mlterm looks horrible. Ctrl-right-click in the mlterm window and set the font size to 14 instead of 16.

And on to GTK

I like using EasyTAG (a program for easily tagging and renaming MP3, Ogg and other files). EasyTAG uses GTK, and with my locale settings it displays everything in Danish. This is great, except that all the UTF-8 strings are displayed as "raw" UTF-8, so the Danish characters æ, ø and å show up as illegible two-character strings. Not exactly an ideal situation.

GTK needs to know about UTF-8.

Edit the file ~/.gtkrc (for instance by typing "nano ~/.gtkrc") and put this in it

style "default-text" { 
fontset = "-*-arial-medium-r-normal--*-120-*-*-*-*-iso10646-1, 
-*-helvetica-medium-r-normal--*-120-*-*-*-*-*-*" 
}

class "GtkWidget" style "default-text"

That's it! My system now understands UTF-8, I hope yours does too :)