If you're anything like me, you use PuTTY to connect to a lot of the servers you manage. But, if you're seeing odd characters in your man pages, you might need this little fix.
PuTTY has a lot of options available to its users. Some that I use all the time are the option to change my background and foreground colors and the option to change my font size. The colors help to remind me which of my servers I'm logged into at the time and the font makes the display easier to read, especially if I'm trying to demonstrate Unix commands to one of my classes. I had noticed, however, that man pages that I displayed included some odd characters and figured I should look into the cause and see if I could fix the problem. As it turns out, the fix was quite easy. So, here's my fix and some explanation of why it was needed.
One of PuTTY's options that I had been overlooking is the Translation option. This option controls how PuTTY will display the data that it receives from the system it's connecting to. You get to choose what character set it uses for this translation, but the default is ISO-8859-1:1998. This encoding scheme is one of the ISO/IEC 8859 series of single-byte, ASCII-based character encodings. Basically, PuTTY looks at the data sent from the far side of the connection and uses this standard to determine how to display it in your PuTTY window.
The far side of the connection, however, may not be encoding things using the same standard as PuTTY, so you just might run into some oddities in what you see. The fix for me was to choose the encoding that works with the Linux systems that I was connecting to. When I changed my encoding to UTF-8, the man pages looked just fine and the â characters that were getting on my nerves turned into the nicely behaved single quotes that I was expecting to see.
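The mismatch is easy to reproduce away from PuTTY. Here is a minimal sketch, assuming Python 3 is handy; the sample string is only an illustration. It encodes a right single quote as UTF-8, the way the Linux side would send it, and then reads the same bytes back under each of the two character sets.

```python
# What the remote system sends: the text "top’s" encoded as UTF-8.
text = "top\u2019s"               # \u2019 is the right single quote
raw = text.encode("utf-8")

# The same bytes as read by a terminal that expects ISO-8859-1.
misread = raw.decode("iso-8859-1")

# The same bytes as read by a terminal that expects UTF-8.
correct = raw.decode("utf-8")

print(raw)             # five characters travel as seven bytes
print(repr(misread))   # the quote becomes â plus two control characters
print(correct)
```

The â is simply what byte 0xE2 means in ISO-8859-1; the two bytes after it fall in a range of control codes that has no visible glyph.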
UTF-8 is a character encoding scheme that is backward compatible with ASCII: any plain ASCII text is already valid UTF-8. It's also the dominant character encoding scheme on the web. It's an interesting standard in that it can use anywhere from one to four bytes per character. It's as compact as ASCII (one byte per character) if you're using only unaccented English characters, but it's flexible and can handle a wide variety of characters as needed. From basic Latin through to arrows and dingbats, it can encode Greek, Coptic, Cyrillic, currency symbols, etc.
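To see the one-to-four-byte range in practice, here is a small Python sketch. The sample characters are my own picks for illustration, not anything taken from a man page.

```python
# A handful of characters from different parts of Unicode (illustrative picks).
samples = {
    "A": "Latin capital letter A",
    "\u00e9": "e with acute accent",
    "\u03a9": "Greek capital omega",
    "\u20ac": "euro sign",
    "\u2192": "rightwards arrow",
    "\U0001F600": "grinning face emoji",
}

for ch, name in samples.items():
    encoded = ch.encode("utf-8")
    # Show the code point, the number of bytes UTF-8 needs, and the bytes in hex.
    print(f"U+{ord(ch):04X}  {len(encoded)} byte(s)  {encoded.hex(' ')}  {name}")
```

The plain letter needs a single byte, the accented and Greek letters need two, the euro sign and the arrow need three, and the emoji needs four.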
The first 1-5 bits of each character's first (and maybe only) byte determine how many bytes must be used to interpret the character. If the first (highest value) bit is a 0, the character will be represented in the following seven bits. If the first bit is a 1, on the other hand, some additional bits will determine how many bytes are used.
Byte 1      # of bytes needed
0xxxxxxx -- one byte
110xxxxx -- two bytes
1110xxxx -- three bytes
11110xxx -- four bytes
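The chart translates almost directly into code. The following is a sketch in Python, with a function name (utf8_sequence_length) that I made up for the purpose; it checks the leading bits of a first byte and reports how many bytes the character occupies.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return how many bytes a UTF-8 character uses, judged from its first byte."""
    if (lead_byte & 0b10000000) == 0b00000000:   # 0xxxxxxx
        return 1
    if (lead_byte & 0b11100000) == 0b11000000:   # 110xxxxx
        return 2
    if (lead_byte & 0b11110000) == 0b11100000:   # 1110xxxx
        return 3
    if (lead_byte & 0b11111000) == 0b11110000:   # 11110xxx
        return 4
    # 10xxxxxx marks a continuation byte, which can never come first.
    raise ValueError(f"{lead_byte:#04x} cannot start a UTF-8 character")


print(utf8_sequence_length(ord("a")))   # an ASCII letter
print(utf8_sequence_length(0o342))      # octal 342, a three-byte lead byte
```

Bytes of the form 10xxxxxx are not in the chart because they only ever appear as the second, third, or fourth byte of a character.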
If I use the od (octal dump) command to display a portion of the man page for the top command, I see this in its output:
0000340 157 156 040 164 157 160 342 200 231 163
          o   n       t   o   p 342 200 231   s
That 342 200 231 sequence that follows the word "top" is actually a right single quote mark. If you were to look at it in binary, it would look like this:
11100010 10000000 10011001
Compare that to the chart above.
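Since the sequence starts with 1110, the requirement for three bytes should come as no surprise. The two bytes that follow each begin with 10, which marks them as continuation bytes. The rightmost part of the string, if interpreted as UTF-8, shows up as top’s, with a proper right single quote (U+2019).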
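If you'd like to check the arithmetic yourself, here is a short Python sketch that starts from the three octal values in the od output, prints them in binary, and then strips the marker bits by hand to recover the code point. The variable names are my own.

```python
# The three octal values that followed "top" in the od output.
octal_values = ["342", "200", "231"]
sequence = bytes(int(value, 8) for value in octal_values)

# Show each byte as eight binary digits.
bits = [f"{byte:08b}" for byte in sequence]
print(" ".join(bits))            # 11100010 10000000 10011001

# Drop the 1110 marker from the first byte and the 10 marker from
# each continuation byte, then join the remaining payload bits.
payload = bits[0][4:] + bits[1][2:] + bits[2][2:]
code_point = int(payload, 2)
print(f"U+{code_point:04X}")     # U+2019

# Python's own UTF-8 decoder should agree with the manual result.
print(sequence.decode("utf-8") == chr(code_point))
```

The sixteen payload bits, 0010000000011001, are hex 2019, which is the Unicode code point for the right single quotation mark.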
By the way, another little trick for looking at your character codes applies when you're editing a file with vim. Say you have the text "abc123" in a file that you're editing. Type %!xxd at the : prompt, press Enter, and you'll see both the content and its hex representation.
:%!xxd
You'll see this:
0000000: 6162 6331 3233 0a abc123.
You can press u to go back to your normal display mode.
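If vim or xxd isn't available, roughly the same view can be put together in a few lines of Python. This is only a sketch of the idea and leaves out the offset column; it is not a replacement for xxd.

```python
# The file content from the vim example, including the trailing newline.
data = b"abc123\n"

# Hex digits, grouped two bytes (four hex digits) at a time as xxd does.
hex_digits = data.hex()
groups = [hex_digits[i:i + 4] for i in range(0, len(hex_digits), 4)]

# Printable ASCII is shown as-is; everything else becomes a dot.
printable = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in data)

print(" ".join(groups), printable)   # 6162 6331 3233 0a abc123.
```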
Modifying your PuTTY setting, as I've said above, is easy. In the configuration window, go to Window > Translation and select UTF-8 as the character set. Then go back to the Session panel, select your connection from the saved sessions listing, and click Save. From that point on, your man pages should look just the way the authors intended.