Back when I started working with computers, understanding the nature of ASCII was exciting. In fact, just knowing how to convert binary to hex was fun.

That was many years ago, before ASCII had even reached drinking age, but character encoding standards are as important as ever today, with the internet being so much a part of our business and our personal lives. They're also more complex and more numerous than you might imagine. So, let's dive into some of the details of what ASCII is and some of the commands that make it easier to see encoding standards in action.

Why ASCII?

ASCII came about to circumvent the problem that different types of electronic systems were storing text in different ways. They all used some form of ones and zeroes (or ONs and OFFs), but compatibility became an issue when those systems needed to interact. So, ASCII was developed primarily to provide encoding consistency. It became a U.S. standard in 1963. Initially, ASCII characters used only 7 bits. Some years later, ASCII was extended to use all 8 bits in each byte.

That said, it is important to understand that ASCII, the American Standard Code for Information Interchange, is not used on all computers. In fact, most Linux systems today use UTF-8, a standard closely related to ASCII but not quite identical.
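One way to see that relationship is to encode a few characters and count the bytes. This short Python sketch is my own illustration, not part of the article; it shows that plain ASCII text is byte-for-byte identical in UTF-8, while characters outside the ASCII range need more than one byte:

```python
# A quick check (my Python sketch, not from the article) that plain
# ASCII text is byte-for-byte identical in UTF-8, while characters
# outside the ASCII range need more than one byte.
print("A".encode("utf-8"))       # b'A' -- one byte, the ASCII value 0x41
print(len("A".encode("utf-8")))  # 1
print(len("é".encode("utf-8")))  # 2 -- Latin-1 range characters take two bytes
print(len("€".encode("utf-8")))  # 3 -- the euro sign takes three
```
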
In UTF-8, the classic ASCII characters are encoded in a single byte each, while characters with greater values use two to four bytes.

Some of the more important encoding standards in use today include:

ASCII – most widely used for English text before 2000
UTF-8 – the default on Linux, and used across much of the internet
UTF-16 – used by Microsoft Windows, Mac OS X file systems and others
GB 18030 – used in China (contains all Unicode characters)
EUC-JP (Extended Unix Code) – used in Japan
ISO/IEC 8859 series – used for most European languages

According to one source that I describe below, however, there are as many as 1,173 different encoding schemes in use today.

Viewing an ASCII translation table

One of the easiest ways to display an ASCII table on Linux systems is to use the man ascii command. Within the body of the page displayed, you will see a table that starts like this:

 Oct   Dec   Hex   Char                          Oct   Dec   Hex   Char
 ────────────────────────────────────────────────────────────────────────
 000   0     00    NUL '\0' (null character)     100   64    40    @
 001   1     01    SOH (start of heading)        101   65    41    A
 002   2     02    STX (start of text)           102   66    42    B
 003   3     03    ETX (end of text)             103   67    43    C
 004   4     04    EOT (end of transmission)     104   68    44    D
 005   5     05    ENQ (enquiry)                 105   69    45    E
 006   6     06    ACK (acknowledge)             106   70    46    F
 007   7     07    BEL '\a' (bell)               107   71    47    G
 010   8     08    BS  '\b' (backspace)          110   72    48    H
 011   9     09    HT  '\t' (horizontal tab)     111   73    49    I

Notice that the table is split into two four-column halves, with the right half continuing where the left leaves off.
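The values in that table are easy to regenerate with a short Python snippet (my own sketch; the article itself uses the man page). ord() returns the decimal code for a character, and format specifiers give the octal and hex forms:

```python
# Regenerate a few rows of the man ascii table: octal, decimal and
# hex codes for the characters @ through I (the right-hand column).
for ch in "@ABCDEFGHI":
    code = ord(ch)
    print(f"{code:03o} {code:3d} {code:02x} {ch}")
```

The first line printed is "100  64 40 @" and the last is "111  73 49 I", matching the right side of the table above.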
Each side displays the octal, decimal, hexadecimal and character representations of a series of characters. The letter "I" (bottom right), for example, is 1001001 in binary, 111 in octal, 73 in decimal, and 49 in hex.

Looking at file content

To display the content of a file in some format other than its character (ASCII) form, you can use any of a number of commands. These include od (octal dump), hexdump, xxd and iconv.

od

The od -bc command will display a file in both octal and character format. The 012 at the end is the newline character that ends the single line of text.

$ cat testing
Testing 1 2 3
$ od -bc testing
0000000 124 145 163 164 151 156 147 040 061 040 062 040 063 012
          T   e   s   t   i   n   g       1       2       3  \n
0000016

To view the same file in hex, you could use this command, though you'll probably notice that the bytes in each two-byte group appear swapped. For example, T=54 and e=65, so you might expect to see "5465" instead of "6554".

$ od -hc /tmp/testing
0000000    6554    7473    6e69    2067    2031    2032    0a33
          T   e   s   t   i   n   g       1       2       3  \n

Adding an "endian" specification to the od command gets around this issue:

$ od -xc --endian=big testing
0000000    5465    7374    696e    6720    3120    3220    330a
          T   e   s   t   i   n   g       1       2       3  \n
0000016

The big-endian and little-endian designations refer to whether the data values are ordered with the most significant (big-endian) or least significant (little-endian) byte first.

The command below shows the same text in octal. Keep in mind that octal 124 is 01 010 100 in binary (grouped into threes) and 54 (0101 0100) in hex; the same value, just expressed differently.

$ echo Testing 1 2 3 | od -bc
0000000 124 145 163 164 151 156 147 040 061 040 062 040 063 012
          T   e   s   t   i   n   g       1       2       3  \n
0000016

hexdump

Another useful command is hexdump.
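Like od -h, hexdump groups a file's bytes into 16-bit words, so on little-endian hardware the bytes in each pair appear swapped. This Python sketch (mine, not from the article) shows both interpretations of the same 14 bytes:

```python
import struct

# "Testing 1 2 3" plus the trailing newline: the same 14 bytes as
# the sample file used in the article's od examples.
data = b"Testing 1 2 3\n"

# Read the buffer as 16-bit little-endian words, the way od -h and
# hexdump present it on little-endian hardware: byte pairs look swapped.
little = struct.unpack("<7H", data)
print(" ".join(f"{w:04x}" for w in little))  # 6554 7473 6e69 2067 2031 2032 0a33

# A big-endian reading restores the natural byte order.
big = struct.unpack(">7H", data)
print(" ".join(f"{w:04x}" for w in big))     # 5465 7374 696e 6720 3120 3220 330a
```
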
In the examples below, we see hexdump displaying the file in hex, character and octal format.

hex

$ hexdump testing
0000000 6554 7473 6e69 2067 2031 2032 0a33
000000e

character

$ hexdump -c testing
0000000   T   e   s   t   i   n   g       1       2       3  \n
000000e

one-byte octal

$ hexdump -b testing
0000000 124 145 163 164 151 156 147 040 061 040 062 040 063 012
000000e

xxd

xxd is a command that creates a hex dump or converts a hex dump back into binary. It displays a file in big-endian byte order by default.

$ xxd testing
00000000: 5465 7374 696e 6720 3120 3220 330a  Testing 1 2 3.

$ echo "Testing 1 2 3" | xxd
00000000: 5465 7374 696e 6720 3120 3220 330a  Testing 1 2 3.

In continuous hex dump style:

$ echo "Testing 1 2 3" | xxd -p
54657374696e672031203220330a
$ echo 54657374696e672031203220330a | xxd -r -p
Testing 1 2 3

iconv

The iconv command translates content from one character encoding to another. This is the command that, as I promised earlier, suggests that there are 1,173 different encoding schemes. Let's see why.

The --list option gets the command to list all of its encoding schemes:

$ iconv --list | wc -l
1173

You can view the full list with the iconv --list command, or focus solely on the UTF* schemes with this command:

$ iconv --list | grep UTF
ISO-10646/UTF-8/
ISO-10646/UTF8/
UTF-7//
UTF-8//
UTF-16//
UTF-16BE//
UTF-16LE//
UTF-32//
UTF-32BE//
UTF-32LE//
UTF7//
UTF8//
UTF16//
UTF16BE//
UTF16LE//
UTF32//
UTF32BE//
UTF32LE//

The iconv command converts data between encodings using the syntax:

iconv [-f from-encoding] [-t to-encoding] [inputfile]

Here's an example of iconv in action.
Our initial file is a copy of the testing file that I'm calling testing8.

$ cat testing8
Testing 1 2 3

We use the iconv command to convert the file to UTF-16 format:

$ iconv -f utf8 -t utf16 testing8 > testing16

A file listing shows that the resultant file is just over twice the size of the original.

$ ls -l testing*
-rw-rw-r-- 1 shs shs 30 Dec 22 11:06 testing16
-rw-r--r-- 1 shs shs 14 Dec 22 11:05 testing8

Out of curiosity, we look at the new file and see that every other byte is 000; the data we're displaying isn't making use of the extra byte. We also see that two bytes were tacked onto the beginning of the file. These two bytes (377 376 in octal) form a byte-order mark that indicates the endianness of the UTF-16 data, and they are not treated as characters.

$ od -bc testing16
0000000 377 376 124 000 145 000 163 000 164 000 151 000 156 000 147 000
        377 376   T  \0   e  \0   s  \0   t  \0   i  \0   n  \0   g  \0
0000020 040 000 061 000 040 000 062 000 040 000 063 000 012 000
             \0   1  \0      \0   2  \0      \0   3  \0  \n  \0
0000036

Character encoding is a much larger and more complex issue than ASCII. Fortunately, Linux offers nice tools that allow you to peer into coding schemes and see what happens when you convert one to another.
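As a final cross-check, the iconv conversion shown above can be reproduced outside the shell. This Python sketch (mine, not part of the article) encodes the same string both ways; Python's "utf-16" codec prepends a byte-order mark, just like the two extra bytes seen in the od output:

```python
# Reproduce the iconv utf8 -> utf16 conversion in Python (a sketch,
# not part of the article). The "utf-16" codec adds a 2-byte
# byte-order mark before the two-bytes-per-character data.
text = "Testing 1 2 3\n"
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16")   # BOM + two bytes per character
print(len(utf8), len(utf16))    # 14 30 -- matching the ls -l sizes above
print(utf16[:2].hex())          # the BOM (fffe on a little-endian machine)
```
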